feat: add qa_build mode, tests, and region mode support

Router Configuration:
- Add mode='qa_build' routing rule in router-config.yml
- Priority 8, uses local_qwen3_8b for Q&A generation
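
A minimal sketch of how a rule like this might be matched, not the router's actual code. The `RoutingRule` structure, the `select_provider` helper, the `chat` rule, and the choice that a lower priority number wins are all assumptions made for illustration. Only the mode `qa_build`, the priority `8`, and the provider `local_qwen3_8b` come from this commit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical in-memory form of one rule from router-config.yml."""
    mode: str
    priority: int
    provider: str


# Only the qa_build rule reflects this commit; the other rule is a placeholder.
RULES: List[RoutingRule] = [
    RoutingRule(mode="chat", priority=10, provider="hypothetical_default_llm"),
    RoutingRule(mode="qa_build", priority=8, provider="local_qwen3_8b"),
]


def select_provider(mode: str, rules: List[RoutingRule] = RULES) -> Optional[str]:
    """Return the provider of the best matching rule, or None if nothing matches."""
    matches = [rule for rule in rules if rule.mode == mode]
    if not matches:
        return None
    # Assumption: a lower number means higher precedence.
    return min(matches, key=lambda rule: rule.priority).provider
```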

2-Stage Q&A Pipeline Tests:
- Create test_qa_pipeline.py with comprehensive tests
- Test prompt building, JSON parsing, router integration
- Mock DAGI Router responses for testing
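
The kind of test described above could look roughly like the sketch below. It is not the content of test_qa_pipeline.py: `build_qa_prompt`, `parse_qa_json`, `generate_qa_pairs`, and the request payload shape are hypothetical stand-ins, and the router is replaced by a `unittest.mock.Mock` so no network call is made.

```python
import json
from typing import Callable, Dict, List
from unittest.mock import Mock


def build_qa_prompt(text: str) -> str:
    """Hypothetical prompt builder for the Q&A generation stage."""
    return (
        "Generate question-answer pairs for the text below. "
        "Reply with a JSON list of objects with 'question' and 'answer' keys.\n\n"
        + text
    )


def parse_qa_json(raw: str) -> List[Dict[str, str]]:
    """Extract the outermost JSON array from model output, ignoring surrounding text."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only well-formed pairs.
    return [i for i in items if isinstance(i, dict) and "question" in i and "answer" in i]


def generate_qa_pairs(text: str, call_router: Callable[[dict], str]) -> List[Dict[str, str]]:
    """Send a qa_build request through the router callable and parse the reply."""
    raw = call_router({"mode": "qa_build", "prompt": build_qa_prompt(text)})
    return parse_qa_json(raw)


def test_generate_qa_pairs_uses_qa_build_mode() -> None:
    # The mock stands in for the DAGI Router and returns noisy model output.
    router = Mock(return_value='Sure! [{"question": "Q1?", "answer": "A1"}] Done.')
    pairs = generate_qa_pairs("some text", router)
    assert pairs == [{"question": "Q1?", "answer": "A1"}]
    assert router.call_args[0][0]["mode"] == "qa_build"
```

Passing the router in as a callable is a design choice of this sketch; it keeps the test free of patching and makes the mocked response explicit.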

Region Mode (Grounding OCR):
- Add region_bbox and region_page parameters to ParseRequest
- Support region mode in local_runtime with bbox in prompt
- Update endpoints to accept region parameters (x, y, width, height, page)
- Validate region parameters and filter pages for region mode
- Pass region_bbox through inference pipeline
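
The validation and page filtering listed above might be sketched as follows. The helper names, the dict keys of `region_bbox`, the 1-based page numbering, and the default to the first page are assumptions; the commit only states that x, y, width, height and page are accepted and validated.

```python
from typing import Dict, List, Optional


def build_region_bbox(
    x: Optional[float] = None,
    y: Optional[float] = None,
    width: Optional[float] = None,
    height: Optional[float] = None,
) -> Optional[Dict[str, float]]:
    """Validate region parameters and return a bbox dict, or None if no region was given."""
    values = (x, y, width, height)
    if all(v is None for v in values):
        return None  # region mode was not requested
    if any(v is None for v in values):
        raise ValueError("region mode requires x, y, width and height together")
    if x < 0 or y < 0:
        raise ValueError("x and y must be non-negative")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {"x": float(x), "y": float(y), "width": float(width), "height": float(height)}


def filter_region_pages(pages: List, region_page: Optional[int] = None) -> List:
    """Keep only the page the region refers to (assumed 1-based, defaulting to page 1)."""
    page = 1 if region_page is None else region_page
    if not 1 <= page <= len(pages):
        raise ValueError(f"region_page {page} is outside 1..{len(pages)}")
    return [pages[page - 1]]
```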

Updates:
- Update local_runtime to support region_bbox in prompts
- Update inference.py to pass region_bbox to local_runtime
- Update endpoints.py to handle region mode parameters
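
One way local_runtime could fold `region_bbox` into a prompt is sketched below. This is illustrative only: the helper names, the dict keys, the `[x1, y1, x2, y2]` corner format, and the prompt wording are assumptions and are not taken from this commit or from the model's documented prompts.

```python
from typing import Dict, List, Optional


def bbox_to_corners(region_bbox: Dict[str, float]) -> List[int]:
    """Convert an x/y/width/height box into integer [x1, y1, x2, y2] corners."""
    x1 = int(round(region_bbox["x"]))
    y1 = int(round(region_bbox["y"]))
    x2 = int(round(region_bbox["x"] + region_bbox["width"]))
    y2 = int(round(region_bbox["y"] + region_bbox["height"]))
    return [x1, y1, x2, y2]


def build_region_prompt(base_prompt: str, region_bbox: Optional[Dict[str, float]] = None) -> str:
    """Append the bounding box to the prompt when region mode is active."""
    if region_bbox is None:
        return base_prompt  # other output modes keep their prompt unchanged
    return f"{base_prompt}\nBounding Box:\n{bbox_to_corners(region_bbox)}"
```

Keeping `region_bbox` optional with a default of `None` mirrors the signatures in the diff, so callers that do not use region mode are unaffected.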

Author: Apple
Date: 2025-11-16 04:26:35 -08:00
Parent: be22752590
Commit: d3c701f3ff
6 changed files with 266 additions and 12 deletions

@@ -29,7 +29,8 @@ async def parse_document_with_ollama(
 images: List[Image.Image],
 output_mode: Literal["raw_json", "markdown", "qa_pairs", "chunks", "layout_only", "region"] = "raw_json",
 doc_id: Optional[str] = None,
-doc_type: Literal["pdf", "image"] = "image"
+doc_type: Literal["pdf", "image"] = "image",
+region_bbox: Optional[dict] = None
 ) -> ParsedDocument:
 """
 Parse document using Ollama API
@@ -109,7 +110,8 @@ def parse_document_from_images(
 images: List[Image.Image],
 output_mode: Literal["raw_json", "markdown", "qa_pairs", "chunks", "layout_only", "region"] = "raw_json",
 doc_id: Optional[str] = None,
-doc_type: Literal["pdf", "image"] = "image"
+doc_type: Literal["pdf", "image"] = "image",
+region_bbox: Optional[dict] = None
 ) -> ParsedDocument:
 """
 Parse document from list of images using dots.ocr model
@@ -159,7 +161,7 @@ def parse_document_from_images(
 image_bytes = buf.getvalue()
 # Use local_runtime with native prompt modes
-generated_text = parse_document_with_local(image_bytes, output_mode)
+generated_text = parse_document_with_local(image_bytes, output_mode, region_bbox)
 logger.debug(f"Model output for page {idx}: {generated_text[:100]}...")