Commit Graph

2 Commits

Author SHA1 Message Date
Apple
49272b66e6 feat: add RAG converter utilities and update integration guide
RAG Converter:
- Create app/utils/rag_converter.py with conversion functions
- parsed_doc_to_haystack_docs() - convert ParsedDocument to Haystack format
- parsed_chunks_to_haystack_docs() - convert ParsedChunk list to Haystack format
- validate_parsed_doc_for_rag() - validate required fields before conversion
- Automatic metadata extraction (dao_id, doc_id, page, block_type)
- Preserve optional fields (bbox, section, reading_order)
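
A minimal sketch of how the converter described above might look. The function names come from the commit message. The dataclasses, their field layout, and the plain `{"content", "meta"}` dicts standing in for Haystack documents are assumptions for illustration, not the contents of app/utils/rag_converter.py.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical stand-ins for the project's schemas; real field names may differ.
@dataclass
class ParsedBlock:
    text: str
    block_type: str = "paragraph"
    bbox: Optional[list] = None
    section: Optional[str] = None
    reading_order: Optional[int] = None


@dataclass
class ParsedPage:
    page: int
    blocks: list = field(default_factory=list)


@dataclass
class ParsedDocument:
    doc_id: str
    pages: list
    metadata: dict = field(default_factory=dict)


def validate_parsed_doc_for_rag(doc: ParsedDocument) -> list:
    """Return a list of problems; an empty list means the document is convertible."""
    errors = []
    if not doc.doc_id:
        errors.append("doc_id is required")
    if not doc.pages:
        errors.append("pages must not be empty")
    if not doc.metadata.get("dao_id"):
        errors.append("metadata.dao_id is required")
    return errors


def parsed_doc_to_haystack_docs(doc: ParsedDocument) -> list:
    """Emit one {"content", "meta"} dict per non-empty block."""
    errors = validate_parsed_doc_for_rag(doc)
    if errors:
        raise ValueError("; ".join(errors))
    out = []
    for page in doc.pages:
        for block in page.blocks:
            if not block.text.strip():
                continue  # skip whitespace-only blocks
            # Metadata that is always extracted.
            meta: dict = {
                "dao_id": doc.metadata["dao_id"],
                "doc_id": doc.doc_id,
                "page": page.page,
                "block_type": block.block_type,
            }
            # Optional fields are preserved only when present.
            for name in ("bbox", "section", "reading_order"):
                value: Any = getattr(block, name)
                if value is not None:
                    meta[name] = value
            out.append({"content": block.text, "meta": meta})
    return out
```

Each resulting dict could then be passed to a Haystack `Document(content=..., meta=...)`; that step is left out here to keep the sketch free of external dependencies.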

Integration Guide:
- Update with ready-to-use converter functions
- Add validation examples
- Complete workflow examples
2025-11-16 03:03:20 -08:00
Apple
7251e519d6 feat: enhance model output parser and add integration guide
Model Output Parser:
- Support multiple dots.ocr output formats (JSON, structured text, plain text)
- Normalize all formats to standard ParsedBlock structure
- Handle JSON with blocks/pages arrays
- Parse markdown-like structured text
- Fallback to plain text parsing
- Better error handling and logging
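
The format fallback described above could be sketched roughly as follows. The function name `parse_model_output`, the block dict keys, and the heuristics for detecting markdown-like text are all assumptions; the actual parser and the real dots.ocr output shapes may differ.

```python
import json
import re
from typing import Any


def _block(block_type: str, text: str, page: int = 1) -> dict:
    return {"block_type": block_type, "text": text, "page": page}


def _normalize(raw_block: Any, page: int) -> dict:
    # Accept either a dict with "type"/"text" keys or a bare string.
    if isinstance(raw_block, dict):
        return _block(str(raw_block.get("type", "paragraph")),
                      str(raw_block.get("text", "")), page)
    return _block("paragraph", str(raw_block), page)


def parse_model_output(raw: str) -> list:
    """Try JSON first, then markdown-like text, then plain text."""
    # 1. JSON with a "pages" or "blocks" array.
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        data = None
    if isinstance(data, dict):
        blocks = []
        if isinstance(data.get("pages"), list):
            for index, page in enumerate(data["pages"], start=1):
                if not isinstance(page, dict):
                    continue
                number = page.get("page", index)
                blocks += [_normalize(b, number) for b in page.get("blocks", [])]
        elif isinstance(data.get("blocks"), list):
            blocks = [_normalize(b, 1) for b in data["blocks"]]
        if blocks:
            return blocks

    # 2. Markdown-like structured text: headings and list items.
    if re.search(r"^\s*(#{1,6}|[-*])\s+\S", raw, flags=re.MULTILINE):
        blocks = []
        for line in raw.splitlines():
            line = line.strip()
            if not line:
                continue
            heading = re.match(r"^#{1,6}\s+(.*)$", line)
            item = re.match(r"^[-*]\s+(.*)$", line)
            if heading:
                blocks.append(_block("heading", heading.group(1)))
            elif item:
                blocks.append(_block("list_item", item.group(1)))
            else:
                blocks.append(_block("paragraph", line))
        return blocks

    # 3. Plain-text fallback: one paragraph per blank-line-separated chunk.
    chunks = re.split(r"\n\s*\n", raw)
    return [_block("paragraph", c.strip()) for c in chunks if c.strip()]
```

Trying the strictest format first means a malformed JSON payload degrades to text parsing instead of raising, which matches the "fallback" wording of the commit.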

Schemas:
- Document must-have fields for RAG (doc_id, pages, metadata.dao_id)
- ParsedChunk must-have fields (text, metadata.dao_id, metadata.doc_id)
- Add detailed field descriptions for RAG integration
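
As an illustration of the must-have fields listed above, a check over plain dicts might look like this. The dotted field paths are taken from the commit message; the helper name `missing_fields` and the dict representation are hypothetical, and the project's own schemas may enforce these constraints differently.

```python
from typing import Any

# Required field paths as listed in the commit message.
DOCUMENT_REQUIRED = ("doc_id", "pages", "metadata.dao_id")
CHUNK_REQUIRED = ("text", "metadata.dao_id", "metadata.doc_id")


def missing_fields(record: dict, required: tuple) -> list:
    """Return the dotted paths that are absent or empty in `record`."""
    missing = []
    for path in required:
        value: Any = record
        for key in path.split("."):
            # Walk one level down; stop if the parent is not a dict.
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        if value is None or value == "" or value == [] or value == {}:
            missing.append(path)
    return missing
```

For example, `missing_fields({"text": "hi", "metadata": {"dao_id": "dao-1"}}, CHUNK_REQUIRED)` reports that `metadata.doc_id` is absent.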

Integration Guide:
- Create INTEGRATION.md with complete integration guide
- Document dots.ocr output formats
- Show ParsedDocument → Haystack Documents conversion
- Provide DAGI Router integration examples
- RAG pipeline integration with filters
- Complete workflow examples
- RBAC integration recommendations
2025-11-16 03:02:42 -08:00