RAG Converter:
- Create app/utils/rag_converter.py with conversion functions
- parsed_doc_to_haystack_docs() - convert ParsedDocument to Haystack format
- parsed_chunks_to_haystack_docs() - convert ParsedChunk list to Haystack
- validate_parsed_doc_for_rag() - validate required fields before conversion
- Automatic metadata extraction (dao_id, doc_id, page, block_type)
- Preserve optional fields (bbox, section, reading_order)

Integration Guide:
- Update with ready-to-use converter functions
- Add validation examples
- Complete workflow examples
17 lines
310 B
Python
"""
Utility functions for PARSER Service
"""

from app.utils.rag_converter import (
    parsed_doc_to_haystack_docs,
    parsed_chunks_to_haystack_docs,
    validate_parsed_doc_for_rag
)

__all__ = [
    "parsed_doc_to_haystack_docs",
    "parsed_chunks_to_haystack_docs",
    "validate_parsed_doc_for_rag"
]
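
The commit message describes validating required fields, extracting metadata (dao_id, doc_id, page, block_type), and preserving optional fields (bbox, section, reading_order). The following is a minimal sketch of that flow, not the contents of app/utils/rag_converter.py. The `ParsedChunk` dataclass, its `text` field, the helper names `validate_chunk` and `chunks_to_docs`, and the choice to skip invalid chunks are all assumptions made for illustration. Plain dicts stand in for Haystack documents so the example runs without Haystack installed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ParsedChunk:
    """Hypothetical stand-in for the service's ParsedChunk model."""
    text: str
    dao_id: str
    doc_id: str
    page: int
    block_type: str
    bbox: Optional[List[float]] = None
    section: Optional[str] = None
    reading_order: Optional[int] = None


# Field names taken from the commit message.
REQUIRED_FIELDS = ("dao_id", "doc_id", "page", "block_type")
OPTIONAL_FIELDS = ("bbox", "section", "reading_order")


def validate_chunk(chunk: ParsedChunk) -> List[str]:
    """Return a list of problems; an empty list means the chunk is usable."""
    problems = []
    if not chunk.text.strip():
        problems.append("empty text")
    for name in REQUIRED_FIELDS:
        if getattr(chunk, name) in (None, ""):
            problems.append(f"missing {name}")
    return problems


def chunks_to_docs(chunks: List[ParsedChunk]) -> List[Dict[str, Any]]:
    """Convert valid chunks to dicts with 'content' and 'meta' keys."""
    docs = []
    for chunk in chunks:
        if validate_chunk(chunk):
            continue  # assumption: invalid chunks are skipped, not raised on
        # Required fields always go into the metadata.
        meta = {name: getattr(chunk, name) for name in REQUIRED_FIELDS}
        # Optional fields are preserved only when they are set.
        for name in OPTIONAL_FIELDS:
            value = getattr(chunk, name)
            if value is not None:
                meta[name] = value
        docs.append({"content": chunk.text, "meta": meta})
    return docs


chunks = [
    ParsedChunk(text="Hello", dao_id="dao-1", doc_id="doc-1", page=1,
                block_type="paragraph", section="Intro"),
    ParsedChunk(text="  ", dao_id="dao-1", doc_id="doc-1", page=1,
                block_type="paragraph"),
]
docs = chunks_to_docs(chunks)
```

Each resulting dict mirrors the `content` and `meta` fields of a Haystack `Document`, so in a real pipeline the dict construction could presumably be swapped for `Document(content=..., meta=...)`.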