feat: add RAG converter utilities and update integration guide

RAG Converter:
- Create app/utils/rag_converter.py with conversion functions
- parsed_doc_to_haystack_docs() - convert ParsedDocument to Haystack format
- parsed_chunks_to_haystack_docs() - convert ParsedChunk list to Haystack
- validate_parsed_doc_for_rag() - validate required fields before conversion
- Automatic metadata extraction (dao_id, doc_id, page, block_type)
- Preserve optional fields (bbox, section, reading_order)

Integration Guide:
- Update with ready-to-use converter functions
- Add validation examples
- Add complete workflow examples
Apple
2025-11-16 03:03:20 -08:00
parent 7251e519d6
commit 49272b66e6
3 changed files with 208 additions and 0 deletions


@@ -174,6 +174,23 @@ async def route(request: RouterRequest):
### 1. Converting ParsedDocument → Haystack Documents
**Ready-made function:** `app/utils/rag_converter.py`
```python
import logging

from app.utils.rag_converter import parsed_doc_to_haystack_docs, validate_parsed_doc_for_rag

logger = logging.getLogger(__name__)

# Validate before converting (this fragment runs inside a function, hence the return)
is_valid, errors = validate_parsed_doc_for_rag(parsed_doc)
if not is_valid:
    logger.error(f"Document validation failed: {errors}")
    return

# Convert
haystack_docs = parsed_doc_to_haystack_docs(parsed_doc)
```
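To illustrate the `(is_valid, errors)` contract used above, here is a minimal, self-contained sketch of the kind of checks such a validator might perform. It is not the code in `app/utils/rag_converter.py`: the `ParsedDoc`, `Page`, and `Block` dataclasses and the function `validate_for_rag_sketch` are hypothetical stand-ins, and the set of required fields (`doc_id`, `dao_id`, at least one non-empty text block) is an assumption based on the metadata names mentioned in this change.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    # Hypothetical stand-in for a parsed block; the real model may differ.
    text: str
    block_type: str = "paragraph"


@dataclass
class Page:
    # Hypothetical stand-in for a parsed page.
    page: int
    blocks: List[Block] = field(default_factory=list)


@dataclass
class ParsedDoc:
    # Hypothetical stand-in for ParsedDocument.
    doc_id: Optional[str]
    dao_id: Optional[str]
    pages: List[Page] = field(default_factory=list)


def validate_for_rag_sketch(doc: ParsedDoc) -> Tuple[bool, List[str]]:
    """Return (is_valid, errors), mirroring the tuple shape used above."""
    errors: List[str] = []
    if not doc.doc_id:
        errors.append("doc_id is required")
    if not doc.dao_id:
        errors.append("dao_id is required")
    if not doc.pages:
        errors.append("document has no pages")
    elif not any(b.text.strip() for p in doc.pages for b in p.blocks):
        errors.append("document has no non-empty text blocks")
    # Valid only when no errors were collected.
    return (not errors, errors)
```

Collecting every error before returning, rather than stopping at the first one, lets the caller log the full list in one message.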
**Or convert manually:**
```python
from haystack.schema import Document