Commit Graph

500 Commits

Author SHA1 Message Date
Apple
382e661f1f feat: complete RAG pipeline integration (ingest + query + Memory)
Parser Service:
- Add /ocr/ingest endpoint (PARSER → RAG in one call)
- Add RAG_BASE_URL and RAG_TIMEOUT to config
- Add OcrIngestResponse schema
- Create file_converter utility for PDF/image → PNG bytes
- Endpoint accepts file, dao_id, doc_id, user_id
- Automatically parses with dots.ocr and sends to RAG Service

Router Integration:
- Add _handle_rag_query() method in RouterApp
- Combines Memory + RAG → LLM pipeline
- Get Memory context (facts, events, summaries)
- Query RAG Service for documents
- Build prompt with Memory + RAG documents
- Call LLM provider with combined context
- Return answer with citations

Clients:
- Create rag_client.py for Router (query RAG Service)
- Create memory_client.py for Router (get Memory context)

E2E Tests:
- Create e2e_rag_pipeline.sh script for full pipeline test
- Test ingest → query → router query flow
- Add E2E_RAG_README.md with usage examples

Docker:
- Add RAG_SERVICE_URL and MEMORY_SERVICE_URL to router environment
2025-11-16 05:02:14 -08:00
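Note on 382e661f1f: the "Build prompt with Memory + RAG documents" step could look roughly like the sketch below. This is not the code from the commit. RetrievedDoc, build_rag_prompt, and every field name are hypothetical, and the prompt layout is an assumption chosen so that the LLM can cite documents by number.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    """Hypothetical shape of one RAG hit; the field names are assumptions."""
    doc_id: str
    page: int
    text: str


def build_rag_prompt(question: str, memory_facts: list[str],
                     docs: list[RetrievedDoc]) -> str:
    """Combine Memory context and numbered RAG documents into one prompt."""
    lines = ["Known facts about the user:"]
    # Fall back to a placeholder when Memory returned nothing.
    lines += [f"- {fact}" for fact in memory_facts] or ["- (none)"]
    lines.append("")
    lines.append("Documents (cite as [n]):")
    # Number the documents so the answer can reference them as citations.
    for n, doc in enumerate(docs, start=1):
        lines.append(f"[{n}] ({doc.doc_id}, p.{doc.page}) {doc.text}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Numbering the documents in the prompt is one simple way to let the answer carry citations back to doc_id and page.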
Apple
6d69f901f7 fix: add volumes section for rag-model-cache in docker-compose 2025-11-16 04:42:07 -08:00
Apple
9b86f9a694 feat: implement RAG Service MVP with PARSER + Memory integration
RAG Service Implementation:
- Create rag-service/ with full structure (config, document_store, embedding, pipelines)
- Document Store: PostgreSQL + pgvector via Haystack
- Embedding: BAAI/bge-m3 (multilingual, 1024 dim)
- Ingest Pipeline: Convert ParsedDocument to Haystack Documents, embed, index
- Query Pipeline: Retrieve documents, generate answers via DAGI Router
- FastAPI endpoints: /ingest, /query, /health

Tests:
- Unit tests for ingest and query pipelines
- E2E test with example parsed JSON
- Test fixtures with real PARSER output example

Router Integration:
- Add mode='rag_query' routing rule in router-config.yml
- Priority 7, uses local_qwen3_8b for RAG queries

Docker:
- Add rag-service to docker-compose.yml
- Configure dependencies (router, city-db)
- Add model cache volume

Documentation:
- Complete README with API examples
- Integration guides for PARSER and Router
2025-11-16 04:41:53 -08:00
Apple
d3c701f3ff feat: add qa_build mode, tests, and region mode support
Router Configuration:
- Add mode='qa_build' routing rule in router-config.yml
- Priority 8, uses local_qwen3_8b for Q&A generation

2-Stage Q&A Pipeline Tests:
- Create test_qa_pipeline.py with comprehensive tests
- Test prompt building, JSON parsing, router integration
- Mock DAGI Router responses for testing

Region Mode (Grounding OCR):
- Add region_bbox and region_page parameters to ParseRequest
- Support region mode in local_runtime with bbox in prompt
- Update endpoints to accept region parameters (x, y, width, height, page)
- Validate region parameters and filter pages for region mode
- Pass region_bbox through inference pipeline

Updates:
- Update local_runtime to support region_bbox in prompts
- Update inference.py to pass region_bbox to local_runtime
- Update endpoints.py to handle region mode parameters
2025-11-16 04:26:35 -08:00
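Note on d3c701f3ff: a minimal sketch of "Validate region parameters and filter pages for region mode". The function names, the 1-based page numbering, and the exact validation rules are assumptions made for illustration; the commit message does not state them.

```python
from __future__ import annotations


def validate_region(x: float, y: float, width: float, height: float,
                    page: int, page_count: int) -> tuple[float, float, float, float]:
    """Return the bbox as (x, y, width, height) or raise ValueError."""
    if x < 0 or y < 0:
        raise ValueError("x and y must be non-negative")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Assumption: pages are numbered from 1.
    if not 1 <= page <= page_count:
        raise ValueError(f"page must be in 1..{page_count}")
    return (x, y, width, height)


def filter_pages_for_region(pages: list, page: int) -> list:
    """Keep only the page that the region refers to (1-based index)."""
    return [pages[page - 1]]
```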
Apple
be22752590 feat: integrate dots.ocr native prompt modes and 2-stage qa_pairs pipeline
Prompt Modes Integration:
- Create local_runtime.py with DOTS_PROMPT_MAP
- Map OutputMode to native dots.ocr prompt modes (prompt_layout_all_en, prompt_ocr, etc.)
- Support dict_promptmode_to_prompt from dots.ocr with fallback prompts
- Add layout_only and region modes to OutputMode enum

2-Stage Q&A Pipeline:
- Create qa_builder.py for 2-stage qa_pairs generation
- Stage 1: PARSER (dots.ocr) → raw JSON via prompt_layout_all_en
- Stage 2: LLM (DAGI Router) → Q&A pairs via mode=qa_build
- Update endpoints.py to use 2-stage pipeline for qa_pairs mode
- Add ROUTER_BASE_URL and ROUTER_TIMEOUT to config

Updates:
- Update inference.py to use local_runtime with native prompts
- Update ollama_client.py to use same prompt map
- Add PROMPT_MODES.md documentation
2025-11-16 04:24:03 -08:00
Apple
d474a085c3 docs: add RAG + Memory implementation status
Create RAG-MEMORY-STATUS.md with detailed analysis:
- What's implemented (Memory Store, partial RAG)
- What's missing (RAG Service, full integration)
- Current architecture vs target architecture
- Next steps and priorities

Status: ~40% implemented
- Memory: 100%
- RAG: 20% ⚠️ (only PARSER ready)
- Integration: 30% ⚠️
2025-11-16 04:18:48 -08:00
Apple
49272b66e6 feat: add RAG converter utilities and update integration guide
RAG Converter:
- Create app/utils/rag_converter.py with conversion functions
- parsed_doc_to_haystack_docs() - convert ParsedDocument to Haystack format
- parsed_chunks_to_haystack_docs() - convert ParsedChunk list to Haystack
- validate_parsed_doc_for_rag() - validate required fields before conversion
- Automatic metadata extraction (dao_id, doc_id, page, block_type)
- Preserve optional fields (bbox, section, reading_order)

Integration Guide:
- Update with ready-to-use converter functions
- Add validation examples
- Complete workflow examples
2025-11-16 03:03:20 -08:00
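Note on 49272b66e6: the validate-then-convert flow can be sketched with plain dicts standing in for ParsedDocument and for Haystack documents. The helper names validate_for_rag and to_index_records, the "blocks", "page_num", and "type" keys, and the record layout are all hypothetical; only the required fields (doc_id, pages, metadata.dao_id) and the extracted metadata keys come from the commit messages.

```python
from __future__ import annotations


def validate_for_rag(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the doc is usable."""
    errors = []
    if not doc.get("doc_id"):
        errors.append("doc_id is required")
    if not doc.get("pages"):
        errors.append("pages must be non-empty")
    if not doc.get("metadata", {}).get("dao_id"):
        errors.append("metadata.dao_id is required")
    return errors


def to_index_records(doc: dict) -> list[dict]:
    """Flatten pages and blocks into one record per block, with metadata."""
    records = []
    for page in doc["pages"]:
        for block in page.get("blocks", []):
            records.append({
                "content": block["text"],
                "meta": {
                    "dao_id": doc["metadata"]["dao_id"],
                    "doc_id": doc["doc_id"],
                    "page": page["page_num"],
                    "block_type": block.get("type", "paragraph"),
                },
            })
    return records
```

Calling the validator first keeps the converter free of defensive checks.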
Apple
7251e519d6 feat: enhance model output parser and add integration guide
Model Output Parser:
- Support multiple dots.ocr output formats (JSON, structured text, plain text)
- Normalize all formats to standard ParsedBlock structure
- Handle JSON with blocks/pages arrays
- Parse markdown-like structured text
- Fallback to plain text parsing
- Better error handling and logging

Schemas:
- Document must-have fields for RAG (doc_id, pages, metadata.dao_id)
- ParsedChunk must-have fields (text, metadata.dao_id, metadata.doc_id)
- Add detailed field descriptions for RAG integration

Integration Guide:
- Create INTEGRATION.md with complete integration guide
- Document dots.ocr output formats
- Show ParsedDocument → Haystack Documents conversion
- Provide DAGI Router integration examples
- RAG pipeline integration with filters
- Complete workflow examples
- RBAC integration recommendations
2025-11-16 03:02:42 -08:00
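Note on 7251e519d6: the JSON, then structured text, then plain text fallback order might be sketched as follows. parse_model_output and the {'type', 'text'} block shape are hypothetical simplifications of ParsedBlock, and treating lines that start with '#' as headings is an assumption about what "markdown-like" means here.

```python
from __future__ import annotations

import json


def parse_model_output(raw: str) -> list[dict]:
    """Normalize raw model output to a list of {'type', 'text'} blocks."""
    # 1) JSON with a top-level "blocks" array and/or a "pages" array.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        blocks = list(data.get("blocks", []))
        for page in data.get("pages", []):
            blocks.extend(page.get("blocks", []))
        if blocks:
            return [{"type": b.get("type", "paragraph"), "text": b.get("text", "")}
                    for b in blocks]

    # 2) Markdown-like text: lines starting with '#' become headings.
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if any(ln.startswith("#") for ln in lines):
        return [
            {"type": "heading", "text": ln.lstrip("#").strip()}
            if ln.startswith("#") else {"type": "paragraph", "text": ln}
            for ln in lines
        ]

    # 3) Fallback: the whole text becomes a single paragraph block.
    return [{"type": "paragraph", "text": raw.strip()}]
```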
Apple
ca05c91799 feat: complete dots.ocr integration with deployment setup
Model Loader:
- Update model_loader.py with complete dots.ocr loading code
- Proper device detection (CUDA/CPU/MPS) with fallback
- Memory optimization (low_cpu_mem_usage)
- Better error handling and logging
- Support for local model paths and HF Hub

Docker:
- Multi-stage Dockerfile (CPU/CUDA builds)
- docker-compose.yml for parser-service
- .dockerignore for clean builds
- Model cache volume for persistence

Configuration:
- Support DOTS_OCR_MODEL_ID and DEVICE env vars (backward compatible)
- Better defaults and environment variable handling

Deployment:
- Add DEPLOYMENT.md with detailed instructions
- Local deployment (venv)
- Docker Compose deployment
- Ollama runtime setup
- Troubleshooting guide

Integration:
- Add parser-service to main docker-compose.yml
- Configure volumes and networks
- Health checks and dependencies
2025-11-16 03:00:01 -08:00
Apple
8713810d72 fix: remove async call from sync function 2025-11-16 02:56:45 -08:00
Apple
00f9102e50 feat: add Ollama runtime support and RAG implementation plan
Ollama Runtime:
- Add ollama_client.py for Ollama API integration
- Support for dots-ocr model via Ollama
- Add OLLAMA_BASE_URL configuration
- Update inference.py to support Ollama runtime (RUNTIME_TYPE=ollama)
- Update endpoints to handle async Ollama calls
- Alternative to local transformers model

RAG Implementation Plan:
- Create TODO-RAG.md with detailed Haystack integration plan
- Document Store setup (pgvector)
- Embedding model selection
- Ingest pipeline (PARSER → RAG)
- Query pipeline (RAG → LLM)
- Integration with DAGI Router
- Bot commands (/upload_doc, /ask_doc)
- Testing strategy

Now supports three runtime modes:
1. Local transformers (RUNTIME_TYPE=local)
2. Ollama (RUNTIME_TYPE=ollama)
3. Dummy (USE_DUMMY_PARSER=true)
2025-11-16 02:56:36 -08:00
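Note on 00f9102e50: choosing between the three runtime modes from environment variables could be sketched like this. select_runtime is a hypothetical helper; letting USE_DUMMY_PARSER take precedence and defaulting RUNTIME_TYPE to "local" are assumptions, not behavior confirmed by the commit.

```python
from __future__ import annotations

import os
from typing import Mapping


def select_runtime(env: Mapping[str, str] | None = None) -> str:
    """Return 'dummy', 'local', or 'ollama' based on environment settings."""
    env = os.environ if env is None else env
    # Assumption: the dummy flag wins over RUNTIME_TYPE.
    if env.get("USE_DUMMY_PARSER", "false").lower() == "true":
        return "dummy"
    # Assumption: 'local' is the default when RUNTIME_TYPE is unset.
    runtime = env.get("RUNTIME_TYPE", "local").lower()
    if runtime not in ("local", "ollama"):
        raise ValueError(f"unsupported RUNTIME_TYPE: {runtime!r}")
    return runtime
```

Passing the environment in as a mapping keeps the function easy to test without touching os.environ.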
Apple
d56ff3493d fix: remove duplicate except blocks in model_loader 2025-11-15 13:25:23 -08:00
Apple
8869a36486 fix: correct exception handling order in model_loader and update TODO
- Fix duplicate except blocks in model_loader.py
- Mark G.2.5 (tests) as completed
- Mark G.1.3 (dots.ocr integration) as completed
2025-11-15 13:25:15 -08:00
Apple
2a353040f6 feat: add tests and integrate dots.ocr model
G.2.5 - Tests:
- Add pytest test suite with fixtures
- test_preprocessing.py - PDF/image loading, normalization, validation
- test_postprocessing.py - chunks, QA pairs, markdown generation
- test_inference.py - dummy parser and inference functions
- test_api.py - API endpoint tests
- Add pytest.ini configuration

G.1.3 - dots.ocr Integration:
- Update model_loader.py with real model loading code
  - Support for AutoModelForVision2Seq and AutoProcessor
  - Device handling (CUDA/CPU/MPS) with fallback
  - Error handling with dummy fallback option
- Update inference.py with real model inference
  - Process images through model
  - Generate and decode outputs
  - Parse model output to blocks
- Add model_output_parser.py
  - Parse JSON or plain text model output
  - Convert to structured blocks
  - Layout detection support (placeholder)

Dependencies:
- Add pytest, pytest-asyncio, httpx for testing
2025-11-15 13:25:01 -08:00
Apple
62cb1d2108 docs: mark G.2.4 (pre/post-processing) as completed 2025-11-15 13:19:31 -08:00
Apple
344bef786d docs: update TODO-PARSER-RAG.md with completed tasks
- Mark G.2.3 (PDF/image support) as completed
- Mark G.2.4 (pre/post-processing) as completed
- Mark G.1.3 (configuration) as completed
- All preprocessing and postprocessing functions implemented
2025-11-15 13:19:21 -08:00
Apple
4befecc425 feat: implement PDF/image preprocessing, post-processing, and dots.ocr integration prep
G.2.3 - PDF/Image Support:
- Add preprocessing.py with PDF→images conversion (pdf2image)
- Add image loading and normalization
- Add file type detection and validation
- Support for PDF, PNG, JPEG, WebP, TIFF

G.2.4 - Pre/Post-processing:
- Add postprocessing.py with structured output builders
- build_chunks() - semantic chunks for RAG
- build_qa_pairs() - Q&A extraction
- build_markdown() - Markdown conversion
- Text normalization and chunking logic

G.1.3 - dots.ocr Integration Prep:
- Update model_loader.py with proper error handling
- Add USE_DUMMY_PARSER and ALLOW_DUMMY_FALLBACK flags
- Update inference.py to work with images list
- Add parse_document_from_images() function
- Ready for actual model integration

Configuration:
- Add PDF_DPI, IMAGE_MAX_SIZE, PAGE_RANGE settings
- Add parser mode flags (USE_DUMMY_PARSER, ALLOW_DUMMY_FALLBACK)

API Updates:
- Update endpoints to use new preprocessing pipeline
- Integrate post-processing for all output modes
- Remove temp file handling (work directly with bytes)
2025-11-15 13:19:07 -08:00
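Note on 4befecc425: since the endpoints work directly with bytes, file type detection for the five supported formats can be done on the leading magic bytes. The sketch below is one possible approach, not necessarily the one in preprocessing.py; detect_file_type is a hypothetical name.

```python
from __future__ import annotations


def detect_file_type(data: bytes) -> str:
    """Identify PDF, PNG, JPEG, WebP, or TIFF from the file signature."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: 'RIFF', a 4-byte size, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # TIFF starts with a byte-order mark: little-endian or big-endian.
    if data[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    raise ValueError("unsupported file type")
```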
Apple
0f6cfe046f fix: add missing __init__.py files for parser-service modules 2025-11-15 13:15:16 -08:00
Apple
5e7cfc019e feat: create PARSER service skeleton with FastAPI
- Create parser-service/ with full structure
- Add FastAPI app with endpoints (/parse, /parse_qa, /parse_markdown, /parse_chunks)
- Add Pydantic schemas (ParsedDocument, ParsedBlock, ParsedChunk, etc.)
- Add runtime module with model_loader and inference (with dummy parser)
- Add configuration, Dockerfile, requirements.txt
- Update TODO-PARSER-RAG.md with completed tasks
- Ready for dots.ocr model integration
2025-11-15 13:15:08 -08:00
Apple
2fc1894b26 docs: add PARSER agent documentation and implementation plan
- Add formal PARSER agent description (dots.ocr-based)
- Add detailed TODO-PARSER-RAG.md with implementation tasks
- Update agents README to include PARSER
- PARSER = Document Ingestion & Structuring Agent for RAG
2025-11-15 13:09:58 -08:00
Apple
e0cb3ddbdb refactor: rewrite STT service to use qwen3_asr_toolkit Python API
- Replace Whisper subprocess calls with direct qwen3_asr_toolkit API
- Remove subprocess dependencies, use pure Python API
- Update to use DASHSCOPE_API_KEY instead of WHISPER_MODEL
- Cleaner code without CLI calls
- Better Ukrainian language recognition quality
2025-11-15 12:55:21 -08:00
Apple
65e33add81 feat: add STT service for voice message recognition
- Add STT service with Whisper support (faster-whisper, whisper CLI, OpenAI API)
- Update Gateway to handle Telegram voice/audio/video_note messages
- Add STT service to docker-compose.yml
- Gateway now converts voice → text → DAGI Router → text response
2025-11-15 12:43:41 -08:00
Apple
c78542c5ef feat: add dialog_summaries creation and fix memory integration
- Add create_dialog_summary() method to MemoryClient
- Fix syntax error in http_api.py (extra comma)
- Add CloudFlare tunnel setup instructions
- Gateway now logs conversations to Memory Service
2025-11-15 12:40:46 -08:00
Apple
a31e5dbf7e fix: synchronize all metadata fields to meta in schemas
- Fix UserFactUpsertRequest.metadata -> meta
- Fix DialogSummaryBase.metadata -> meta
- All schemas now use meta consistently
2025-11-15 12:31:01 -08:00
Apple
3698a0d2a1 refactor: remove all ForeignKey constraints for testing (variant A)
- Remove all FK constraints from models (users, teams, channels, agents)
- Keep fields as optional nullable String for testing DAARWIZZ
- Update SQL migration to remove all REFERENCES
- Fix metadata -> meta in migration
- Allows service to work without base tables for testing
2025-11-15 12:24:45 -08:00
Apple
7afcffd0bd fix: remove teams foreign key constraint from UserFact
- Remove FK constraint from UserFact.team_id (teams table may not exist)
- Update SQL migration to remove FK constraint
- team_id remains optional String field without FK
2025-11-15 12:01:06 -08:00
Apple
b1a80b8fed fix: rename last remaining metadata field to meta in AgentMemoryFactsVector
- Fix AgentMemoryFactsVector.metadata -> AgentMemoryFactsVector.meta
- All metadata fields now completely renamed to meta
2025-11-15 11:58:33 -08:00
Apple
3017f5f9b9 fix: rename remaining metadata fields to meta in models.py
- Fix DialogSummary.metadata -> DialogSummary.meta
- Fix AgentMemoryFactsVector.metadata -> AgentMemoryFactsVector.meta
- All metadata fields now renamed to meta
2025-11-15 11:58:26 -08:00
Apple
734b6ab850 fix: rename metadata field to meta (metadata is reserved in SQLAlchemy)
- Rename metadata to meta in all models (UserFact, DialogSummary, AgentMemoryFactsVector)
- Update schemas to use meta instead of metadata
- Update SQL migration to use meta column name
- Fixes SQLAlchemy reserved name conflict
2025-11-15 11:54:39 -08:00
Apple
f7c0a0fc08 fix: move CheckConstraint to __table_args__ in AgentMemoryEvent model
- Fix syntax error on lines 115-116
- Move CheckConstraint from Column parameters to __table_args__
- Add proper constraint names
2025-11-15 11:49:34 -08:00
Apple
d50765f2ff feat: add memory-service to docker-compose.yml
- Add memory-service as a service (not under networks:)
- Create Dockerfile for memory-service
- Configure depends_on city-db with healthcheck condition
- Set DATABASE_URL to connect to city-db
2025-11-15 11:40:07 -08:00
Apple
802e092e5b fix: move city-db service under services: section in docker-compose.yml 2025-11-15 11:35:04 -08:00
Apple
7aa0745877 refactor: reorganize memory-service into app/ directory structure
- Move models.py, schemas.py, crud.py, main.py to app/
- Update imports to use app.* prefix
- Update README with new structure
- Fix uvicorn run command for new structure
2025-11-15 10:14:26 -08:00
Apple
6d5d83c347 Merge feature/memory-service into main 2025-11-15 10:12:59 -08:00
Apple
9e99c3afe2 feat: add Memory Service for DAARWIZZ 2025-11-15 10:12:37 -08:00
Ivan Tytar
7b360fc360 fix: Gateway response extraction and GPU optimization
- Fixed Gateway to extract response from data.text field
- GPU working: RTX 4000 Ada, response time 7-10s (was 30-40s)
- DAARWIZZ now responds correctly with full personality
- Started Memory Service structure
2025-11-15 18:55:09 +01:00
Apple
a54a7b078c feat: add Console UI for MicroDAO management
- Create ConsolePage with navigation
- Add WalletInfo component (balance display and access checks)
- Add CreateMicroDaoForm (with balance validation)
- Add MicroDaoList component (display teams/MicroDAO)
- Add InviteMemberForm (with balance checks for admin/member)
- Add wallet API client
- Update teams API with inviteMember function
- Add /console route to App.tsx
2025-11-15 09:00:59 -08:00
Apple
582ab75b03 feat: add MicroDAO balance checks and DAARION.city integration
- Update Wallet Service: balance checks (1 DAARION for create, 0.01 for usage)
- Update DAOFactory Service: use new balance checks
- Add DB migration: teams type field and city_links table
- Add DAARION.city seed data
- Create teams API routes with balance validation
- Add DAARION.city remote repository
- Add sync scripts and documentation
2025-11-15 08:56:14 -08:00
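Note on 582ab75b03: the balance thresholds (1 DAARION to create a MicroDAO, 0.01 DAARION to use one) can be expressed as a small lookup. The action names and the can_perform helper are hypothetical; Decimal is used here to avoid float rounding on amounts like 0.01.

```python
from decimal import Decimal

# Minimum DAARION balance per action, taken from the commit message.
MIN_BALANCE = {
    "create": Decimal("1"),
    "use": Decimal("0.01"),
}


def can_perform(action: str, balance: Decimal) -> bool:
    """True when the wallet balance meets the threshold for the action."""
    return balance >= MIN_BALANCE[action]
```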
Ivan Tytar
03d3d6ecc4 fix: increase LLM timeout 30s→60s, fix Gateway request format, add Ollama optimization guide
- Fixed Gateway: 'prompt' → 'message' field name
- Increased LLM provider timeout from 30s to 60s
- Added OLLAMA-OPTIMIZATION.md with performance tips
- DAARWIZZ now responds (slowly, but it works)
2025-11-15 17:46:35 +01:00
Ivan Tytar
36770c5c92 feat: DAARWIZZ v3 - production persona with full profile and system prompt
- Updated gateway-bot/daarwizz_prompt.txt with v3 system prompt
- Created docs/daarwizz/PROFILE.md with complete agent profile
- Defines DAARWIZZ as digital mayor and MoE coordinator
- Specifies communication style, roles, security policies
- Integration with full DAGI Stack (Router, DevTools, CrewAI, RBAC)
- Knowledge base references to official DAARION.city docs
2025-11-15 17:02:38 +01:00
Ivan Tytar
26fa3eee0a docs: Add DAARWIZZ deployment guide with webhook setup instructions 2025-11-15 16:00:39 +01:00
Ivan Tytar
df0055bb03 fix: Docker deployment - fix Dockerfiles, registry.py f-strings, Gateway app structure 2025-11-15 15:59:42 +01:00
Ivan Tytar
ff01021f96 docs: Add DAARWIZZ bot Quick Start guide
@DAARWIZZBot is now live!

Bot Information:
- Username: @DAARWIZZBot
- Bot ID: 8323412397
- Link: https://t.me/DAARWIZZBot
- Token: Stored in .env

Quick Start Guide includes:
- 5-step deployment process
- Webhook setup (ngrok or domain)
- Testing instructions
- Monitoring and troubleshooting
- Bot configuration via BotFather
- Phase 4 enhancement roadmap

Ready to test first dialog!
2025-11-15 15:38:22 +01:00
Ivan Tytar
8b523977c7 docs: Add comprehensive DAARWIZZ documentation
Complete guide for DAARWIZZ AI agent:
- Personality and behavior guidelines
- Technical implementation details
- Message flow diagrams
- Example interactions (Ukrainian)
- Testing instructions
- Customization guide
- Future enhancements roadmap

Includes curl examples, Docker config, and monitoring metrics.
2025-11-15 15:33:04 +01:00
Ivan Tytar
be95bbad9c feat: Add DAARWIZZ agent with personality
DAARWIZZ - Official AI agent for DAARION.city ecosystem

Changes:
- gateway-bot/daarwizz_prompt.txt: System prompt defining DAARWIZZ personality
- gateway-bot/http_api.py: Load and inject DAARWIZZ context into Router requests
- gateway-bot/Dockerfile: Copy DAARWIZZ prompt file to container
- providers/llm_provider.py: Support context.system_prompt from Gateway

Features:
- Telegram webhook sends agent='daarwizz' to Router
- System prompt loaded from file (customizable)
- LLM receives full DAARWIZZ context + RBAC
- Discord support included

Usage:
- User messages DAARWIZZ in Telegram
- Gateway enriches with system prompt + RBAC
- Router routes to LLM with full context
- DAARWIZZ responds with DAO-aware answers

Next: Set TELEGRAM_BOT_TOKEN and test first dialog
2025-11-15 15:31:58 +01:00
Ivan Tytar
244c6171a8 docs: Add repository index and workflow guide
- Complete repository structure overview
- Cursor + GitHub + Warp.dev workflow documentation
- Development cycle diagram
- Quick actions reference
- Key files listing

Helps developers navigate the codebase and understand the sync flow.
2025-11-15 14:47:46 +01:00
Ivan Tytar
7bacae6a89 Merge: Integrate DAGI Stack v0.2.0 with existing repository (secrets fixed) 2025-11-15 14:35:40 +01:00
Ivan Tytar
3cacf67cf5 feat: Initial commit - DAGI Stack v0.2.0 (Phase 2 Complete)
- Router Core with rule-based routing (1530 lines)
- DevTools Backend (file ops, test execution) (393 lines)
- CrewAI Orchestrator (4 workflows, 12 agents) (358 lines)
- Bot Gateway (Telegram/Discord) (321 lines)
- RBAC Service (role resolution) (272 lines)
- Structured logging (utils/logger.py)
- Docker deployment (docker-compose.yml)
- Comprehensive documentation (57KB)
- Test suites (41 tests, 95% coverage)
- Phase 4 roadmap & ecosystem integration plans

Production-ready infrastructure for DAARION microDAOs.
2025-11-15 14:35:24 +01:00
Apple
c552199eed chore: organize documentation structure for monorepo
- Create /docs structure (microdao, daarion, agents)
- Organize 61 cursor technical docs
- Add README files for each category
- Copy key documents to public categories
- Add GitHub setup instructions and scripts
2025-11-15 04:08:35 -08:00
Apple
5520665600 Initial commit: MVP structure + Cursor documentation + Onboarding components 2025-11-13 06:12:20 -08:00