Commit Graph

3 Commits

Author SHA1 Message Date
Apple
4601c6fca8 feat: add Vision Encoder service + Vision RAG implementation
- Vision Encoder Service (OpenCLIP ViT-L/14, GPU-accelerated)
  - FastAPI app with text/image embedding endpoints (768-dim)
  - Docker support with NVIDIA GPU runtime
  - Port 8001, health checks, model info API

- Qdrant Vector Database integration
  - Port 6333/6334 (HTTP/gRPC)
  - Image embeddings storage (768-dim, Cosine distance)
  - Auto collection creation

- Vision RAG implementation
  - VisionEncoderClient (Python client for API)
  - Image Search module (text-to-image, image-to-image)
  - Vision RAG routing in DAGI Router (mode: image_search)
  - VisionEncoderProvider integration

- Documentation (5000+ lines)
  - SYSTEM-INVENTORY.md - Complete system inventory
  - VISION-ENCODER-STATUS.md - Service status
  - VISION-RAG-IMPLEMENTATION.md - Implementation details
  - vision_encoder_deployment_task.md - Deployment checklist
  - services/vision-encoder/README.md - Deployment guide
  - Updated WARP.md, INFRASTRUCTURE.md, Jupyter Notebook

- Testing
  - test-vision-encoder.sh - Smoke tests (6 tests)
  - Unit tests for client, image search, routing

- Services: 17 total (added Vision Encoder + Qdrant)
- AI Models: 3 (qwen3:8b, OpenCLIP ViT-L/14, BAAI/bge-m3)
- GPU Services: 2 (Vision Encoder, Ollama)
- VRAM Usage: ~10 GB (concurrent)

Status: Production Ready 
2025-11-17 05:24:36 -08:00
Apple
1ed1181105 feat: add RAG quality metrics, optimized prompts, and evaluation tools
Optimized Prompts:
- Create utils/rag_prompt_builder.py with citation-optimized prompts
- Specialized for DAO tokenomics and technical documentation
- Proper citation format [1], [2] with doc_id, page, section
- Memory context integration (facts, events, summaries)
- Token count estimation

RAG Service Metrics:
- Add comprehensive logging in query_pipeline.py
- Log: question, doc_ids, scores, retrieval method, timing
- Track: retrieval_time, total_query_time, documents_found, citations_count
- Add metrics in ingest_pipeline.py: pages_processed, blocks_processed, pipeline_time

Router Improvements:
- Use optimized prompt builder in _handle_rag_query()
- Add graceful fallback: if RAG unavailable, use Memory only
- Log prompt token count, RAG usage, Memory usage
- Return detailed metadata (rag_used, memory_used, citations_count, metrics)

Evaluation Tools:
- Create tests/rag_eval.py for systematic quality testing
- Test fixed questions with expected doc_ids
- Save results to JSON and CSV
- Compare RAG Service vs Router results
- Track: citations, expected docs found, query times

Documentation:
- Create docs/RAG_METRICS_PLAN.md
- Plan for Prometheus metrics collection
- Grafana dashboard panels and alerts
- Implementation guide for metrics
2025-11-16 05:12:19 -08:00
Apple
9b86f9a694 feat: implement RAG Service MVP with PARSER + Memory integration
RAG Service Implementation:
- Create rag-service/ with full structure (config, document_store, embedding, pipelines)
- Document Store: PostgreSQL + pgvector via Haystack
- Embedding: BAAI/bge-m3 (multilingual, 1024 dim)
- Ingest Pipeline: Convert ParsedDocument to Haystack Documents, embed, index
- Query Pipeline: Retrieve documents, generate answers via DAGI Router
- FastAPI endpoints: /ingest, /query, /health

Tests:
- Unit tests for ingest and query pipelines
- E2E test with example parsed JSON
- Test fixtures with real PARSER output example

Router Integration:
- Add mode='rag_query' routing rule in router-config.yml
- Priority 7, uses local_qwen3_8b for RAG queries

Docker:
- Add rag-service to docker-compose.yml
- Configure dependencies (router, city-db)
- Add model cache volume

Documentation:
- Complete README with API examples
- Integration guides for PARSER and Router
2025-11-16 04:41:53 -08:00