Phase 4: Real-World Rollout & Optimization
Objective: Transform DAGI Stack from "deployment-ready" to "battle-tested production system"
Timeline: 2-4 weeks after first live deployment
Status: Planned
Prerequisites: Phase 3 complete, first live deployment successful
🎯 Phase 4 Goals
- Production Stability: 99%+ uptime, predictable performance
- Real-world Validation: 50+ dialogs processed, feedback collected
- Performance Optimization: LLM response < 3s, error rate < 0.5%
- Ecosystem Integration: Dify backend, MCP server ready
📊 Stage 1: First Live Deploy + Feedback Loop (Week 1)
1.1 Deploy to Production
Actions:
- Configure `.env` with production credentials
- Start services: `docker-compose up -d`
- Run smoke tests: `./smoke.sh`
- Set up monitoring cron (every 5 min; see the sketch after this list)
- Configure log rotation (100 MB max)
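A minimal probe the monitoring cron could run, as a sketch: the `monitor.py` name and the service list are assumptions, and only the Router's port 9102 is taken from this plan.
# monitor.py — 5-minute health probe (hypothetical script name and paths)
# crontab: */5 * * * * /opt/dagi-stack/.venv/bin/python /opt/dagi-stack/monitor.py
import json
import sys
import urllib.request
from datetime import datetime, timezone

# Assumed service -> port map; extend with the other services' actual ports.
SERVICES = {"router": 9102}

def check(port: int) -> bool:
    try:
        with urllib.request.urlopen(f"http://localhost:{port}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

results = {name: check(port) for name, port in SERVICES.items()}
print(json.dumps({"ts": datetime.now(timezone.utc).isoformat(), "healthy": results}))
sys.exit(0 if all(results.values()) else 1)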
Success Criteria:
- All 5 services healthy
- Smoke tests passing
- First dialog successful (< 5s response)
- No critical errors in logs
Deliverables:
- Deployment log file (`/tmp/deploy-$(date).log`)
- First dialog screenshot/transcript
- Baseline metrics file
1.2 Collect Real Dialogs (5-10 conversations)
Objective: Understand real user patterns and pain points
Data to Collect:
{
  "dialog_id": "001",
  "timestamp": "2024-11-15T12:00:00Z",
  "user_id": "tg:12345",
  "dao_id": "greenfood-dao",
  "prompts": [
    {
      "text": "Hi! What is this DAO?",
      "response_time_ms": 3200,
      "provider": "llm_local_qwen3_8b",
      "rbac_role": "member",
      "status": "success"
    }
  ],
  "insights": {
    "worked_well": "Fast response, context-aware",
    "issues": "None",
    "suggestions": "Add DAO statistics command"
  }
}
Actions:
- Monitor logs for incoming requests
- Document 5-10 real conversations
- Identify common patterns (greetings, questions, commands)
- Note slow/failed requests
- Collect user feedback (if available)
Save to: /tmp/real-dialogs/dialog-001.json, etc.
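A small helper for writing these records; the field names follow the schema above, while the `utils/dialog_log.py` module path is hypothetical.
# utils/dialog_log.py — sketch for persisting dialog records
import json
from datetime import datetime, timezone
from pathlib import Path

DIALOG_DIR = Path("/tmp/real-dialogs")

def save_dialog(dialog_id: str, user_id: str, dao_id: str,
                prompts: list, insights: dict) -> Path:
    DIALOG_DIR.mkdir(parents=True, exist_ok=True)
    record = {
        "dialog_id": dialog_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "dao_id": dao_id,
        "prompts": prompts,
        "insights": insights,
    }
    path = DIALOG_DIR / f"dialog-{dialog_id}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2))
    return path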
1.3 Analyze Patterns
Questions to Answer:
- What are the most common queries?
- Which features are unused (DevTools, CrewAI)?
- What response times are typical?
- What errors occur in production?
- What new workflows/tools are needed?
Analysis Template:
## Dialog Analysis Summary
### Common Queries
- [ ] Greetings (30%)
- [ ] DAO info requests (25%)
- [ ] Role/permission questions (20%)
- [ ] Proposal questions (15%)
- [ ] Other (10%)
### Performance
- Average response time: 3.5s
- P95 response time: 5.2s
- Error rate: 0.2%
### Unused Features
- DevTools: 0 requests
- CrewAI workflows: 1 request (onboarding)
### Improvement Ideas
1. Add /help command with common queries
2. Cache frequent responses (DAO info)
3. Add workflow triggers (e.g., "review my proposal")
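The performance figures in the template can be derived directly from the saved dialog files; a rough aggregation sketch (field names per the schema in 1.2, crude P95 index):
# analyze_dialogs.py — hypothetical aggregation over /tmp/real-dialogs/*.json
import json
import statistics
from pathlib import Path

prompts = []
for f in Path("/tmp/real-dialogs").glob("dialog-*.json"):
    prompts.extend(json.loads(f.read_text())["prompts"])

times = sorted(p["response_time_ms"] for p in prompts)
errors = sum(1 for p in prompts if p["status"] != "success")

print(f"Average response time: {statistics.mean(times) / 1000:.1f}s")
print(f"P95 response time: {times[int(len(times) * 0.95) - 1] / 1000:.1f}s")  # crude P95
print(f"Error rate: {errors / len(times):.1%}")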
Deliverable: docs/analysis/real-world-feedback-week1.md
1.4 Update SCENARIOS.md
Actions:
- Add "Real World Scenarios" section
- Document 3-5 actual production dialogs
- Include response times, RBAC context, outcomes
Example Entry:
## Real World Scenario #1: DAO Info Request
**Date**: 2024-11-15
**User**: tg:12345 (member role)
**Query**: "What is this DAO and what projects are here?"
**Flow:**
1. Gateway receives message (50ms)
2. Router fetches RBAC (80ms)
3. LLM generates response (3200ms)
4. Total: 3330ms
**Response Quality**: ✅ Accurate DAO description
**Performance**: ✅ Within target (< 5s)
**User Feedback**: Positive
**Insights:**
- Common query pattern identified
- Consider caching DAO info
- RBAC context useful for personalization
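Both the "cache frequent responses" idea from 1.3 and the caching insight above could be prototyped with a small TTL cache in the Router; a sketch, where `fetch_dao_info` is a hypothetical helper:
# utils/ttl_cache.py — minimal TTL cache sketch for frequent responses
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

dao_info_cache = TTLCache(ttl_seconds=300)

async def get_dao_info(dao_id: str) -> str:
    cached = dao_info_cache.get(dao_id)
    if cached is None:
        cached = await fetch_dao_info(dao_id)  # hypothetical RBAC/LLM-backed fetch
        dao_info_cache.set(dao_id, cached)
    return cached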
⚡ Stage 2: Performance & Reliability (Week 2)
2.1 LLM Performance Optimization
Problem: qwen3:8b can time out on long prompts
Solutions:
1. Token Limits

# router-config.yml
llm_providers:
  - name: llm_local_qwen3_8b
    config:
      max_tokens: 200  # Reduced from default
      temperature: 0.7
      timeout_ms: 5000

2. Retry Policy

# providers/ollama_provider.py
@retry(max_attempts=2, delay=1.0)
async def call_llm(self, prompt: str):
    # LLM call with retry
    ...

3. Request Queue

# utils/rate_limiter.py
class RequestQueue:
    def __init__(self, max_concurrent=3):
        self.semaphore = asyncio.Semaphore(max_concurrent)

    async def enqueue(self, request):
        async with self.semaphore:
            return await process_request(request)
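The `@retry(max_attempts=2, delay=1.0)` decorator used above is not a standard-library facility; a minimal asyncio implementation might look like this:
# utils/retry.py — minimal async retry decorator (sketch)
import asyncio
import functools

def retry(max_attempts: int = 2, delay: float = 1.0):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except Exception as exc:  # narrow to timeout/HTTP errors in real code
                    last_exc = exc
                    if attempt + 1 < max_attempts:
                        await asyncio.sleep(delay)
            raise last_exc
        return wrapper
    return decorator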
Actions:
- Add `max_tokens` to all LLM providers
- Implement retry logic (2 attempts, 1 s delay)
- Add request queue (max 3 concurrent)
- Test with high load (10 concurrent requests; see the sketch below)
Expected Improvement:
- Response time P95: 5.2s → 4.0s
- Timeout rate: 5% → 1%
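The high-load test can be a short script firing 10 concurrent requests at the Router; this sketch assumes the `httpx` library and the `/route` endpoint and payload shape shown later in the Dify tools example:
# load_test.py — fire 10 concurrent requests at the Router (sketch)
import asyncio
import time
import httpx

async def one_request(client: httpx.AsyncClient, i: int) -> float:
    start = time.monotonic()
    resp = await client.post(
        "http://localhost:9102/route",
        json={"prompt": f"test #{i}", "mode": "chat", "metadata": {}},
        timeout=30.0,
    )
    resp.raise_for_status()
    return time.monotonic() - start

async def main():
    async with httpx.AsyncClient() as client:
        times = await asyncio.gather(*(one_request(client, i) for i in range(10)))
    print(f"max={max(times):.1f}s avg={sum(times) / len(times):.1f}s")

asyncio.run(main())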
2.2 Production Configuration Profile
Objective: Separate dev and prod configs
Create: config/profiles/prod.yml
version: "0.3.0"
environment: production
debug: false
llm_providers:
  - name: llm_prod_qwen3_8b
    type: ollama
    config:
      base_url: http://localhost:11434
      model: qwen3:8b
      max_tokens: 200
      temperature: 0.7
      timeout_ms: 5000
routing_rules:
  - name: "prod_chat"
    priority: 10
    conditions:
      mode: "chat"
    use_provider: "llm_prod_qwen3_8b"
    timeout_ms: 5000
    fallback_provider: "llm_remote_deepseek"
logging:
  level: INFO
  format: json
  rotation:
    max_size_mb: 100
    max_files: 10
Actions:
- Create `config/profiles/` directory
- Add `prod.yml`, `staging.yml`, `dev.yml`
- Update `config_loader.py` to support profiles
- Add `--profile` flag to `main_v2.py`
Usage:
python main_v2.py --profile prod --port 9102
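Profile support in `config_loader.py` can amount to resolving the profile name to a file; a sketch, where the `load_config` signature and PyYAML usage are assumptions:
# config_loader.py — profile resolution sketch
from pathlib import Path
import yaml  # PyYAML, assumed available

PROFILE_DIR = Path("config/profiles")

def load_config(profile: str = "dev") -> dict:
    path = PROFILE_DIR / f"{profile}.yml"
    if not path.exists():
        raise FileNotFoundError(f"Unknown profile: {profile} ({path})")
    with path.open() as f:
        return yaml.safe_load(f)

# main_v2.py would then wire it to the CLI flag, e.g.:
# parser.add_argument("--profile", default="dev")
# config = load_config(args.profile)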
2.3 Auto-Restart & Watchdog
Systemd Service (Production)
# /etc/systemd/system/dagi-router.service
[Unit]
Description=DAGI Router Service
After=network.target
StartLimitBurst=5
StartLimitIntervalSec=60
[Service]
Type=simple
User=dagi
WorkingDirectory=/opt/dagi-stack
Environment="PATH=/opt/dagi-stack/.venv/bin"
ExecStart=/opt/dagi-stack/.venv/bin/python main_v2.py --profile prod
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Docker Healthcheck Enhancement
# docker-compose.yml
services:
  router:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9102/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
Actions:
- Create systemd service files for all components
- Test auto-restart (kill -9 process)
- Document restart behavior
- Set up alerts for restart events
🌐 Stage 3: Ecosystem Integration (Week 3-4)
3.1 Open Core Model
Objective: Define what's open-source vs proprietary
Open Source (MIT License):
- ✅ Router core (`routing_engine.py`, `config_loader.py`)
- ✅ Provider interfaces (`providers/base_provider.py`)
- ✅ Base LLM providers (Ollama, OpenAI, DeepSeek)
- ✅ DevTools backend (file ops, test execution)
- ✅ RBAC service (role resolution)
- ✅ Gateway bot (Telegram/Discord webhooks)
- ✅ Utils (logging, validation)
- ✅ Documentation (all `.md` files)
- ✅ Test suites (`smoke.sh`, E2E tests)
Proprietary/Private (Optional):
- ⚠️ Custom CrewAI workflows (microDAO-specific)
- ⚠️ Advanced RBAC policies (DAO-specific rules)
- ⚠️ Custom LLM fine-tuning data
- ⚠️ Enterprise features (SSO, audit logs)
Actions:
- Create `docs/open-core-model.md`
- Add LICENSE file (MIT)
- Update README with licensing info
- Add CONTRIBUTING.md guide
Deliverable: docs/open-core-model.md
3.2 Dify Integration
Objective: Use DAGI Router as LLM backend for Dify
Architecture:
Dify UI → Dify Backend → DAGI Router (:9102) → LLM/DevTools/CrewAI
Integration Steps:
1. Router as LLM Provider

# Dify custom LLM provider
{
  "provider": "dagi-router",
  "base_url": "http://localhost:9102",
  "model": "dagi-stack",
  "api_key": "optional"
}

2. Adapter Endpoint

# router_app.py - add Dify-compatible endpoint
@app.post("/v1/chat/completions")
async def dify_compatible(request: DifyRequest):
    # Convert Dify format → DAGI format
    dagi_request = convert_from_dify(request)
    result = await router.handle(dagi_request)
    # Convert DAGI format → Dify format
    return convert_to_dify(result)

3. Tools Integration

# Dify tools.yaml
tools:
  - name: devtools_read
    type: api
    url: http://localhost:9102/route
    method: POST
    params:
      mode: devtools
      metadata:
        tool: fs_read
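If the Router is registered in Dify as an OpenAI-API-compatible provider, the converters referenced in step 2 mostly reshape messages; a sketch, where the DAGI-side field names (`prompt`, `response`) are assumptions:
# router_app.py — Dify/OpenAI-format converters (sketch)
import time

def convert_from_dify(request: dict) -> dict:
    # Take the last user message as the prompt; pass the rest as metadata.
    messages = request.get("messages", [])
    prompt = next((m["content"] for m in reversed(messages) if m["role"] == "user"), "")
    return {"prompt": prompt, "mode": "chat", "metadata": {"history": messages[:-1]}}

def convert_to_dify(result: dict) -> dict:
    # Wrap the Router's response text in a chat.completion envelope.
    return {
        "id": f"dagi-{int(time.time() * 1000)}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": "dagi-stack",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": result.get("response", "")},
            "finish_reason": "stop",
        }],
    }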
Actions:
- Create `/v1/chat/completions` endpoint
- Add Dify format converters
- Test with Dify UI
- Document integration in `docs/dify-integration.md`
Deliverable: docs/dify-integration.md
3.3 MCP Server (Model Context Protocol)
Objective: Expose DAGI Stack as MCP-compatible server
MCP Tools:
{
  "tools": [
    {
      "name": "router_call",
      "description": "Route request to LLM/agent",
      "parameters": {
        "prompt": "string",
        "mode": "chat|crew|devtools",
        "metadata": "object"
      }
    },
    {
      "name": "devtools_task",
      "description": "Execute DevTools task",
      "parameters": {
        "tool": "fs_read|fs_write|run_tests",
        "params": "object"
      }
    },
    {
      "name": "workflow_run",
      "description": "Run CrewAI workflow",
      "parameters": {
        "workflow": "string",
        "inputs": "object"
      }
    },
    {
      "name": "microdao_query",
      "description": "Query microDAO RBAC/metadata",
      "parameters": {
        "dao_id": "string",
        "query_type": "roles|members|proposals"
      }
    }
  ]
}
Implementation:
# mcp-server/main.py
from mcp import Server, Tool

server = Server("dagi-stack")

@server.tool("router_call")
async def router_call(prompt: str, mode: str, metadata: dict):
    # Call DAGI Router
    pass

@server.tool("devtools_task")
async def devtools_task(tool: str, params: dict):
    # Call DevTools
    pass

# ... more tools

if __name__ == "__main__":
    server.run(port=9400)
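Filling in one of the skeleton tools: `router_call` could simply proxy to the Router's HTTP API. This sketch assumes `httpx` plus the `/route` endpoint and payload shape from the Dify tools example:
# mcp-server/main.py — possible body for the router_call tool (sketch)
import httpx

ROUTER_URL = "http://localhost:9102/route"

async def router_call(prompt: str, mode: str, metadata: dict) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            ROUTER_URL,
            json={"prompt": prompt, "mode": mode, "metadata": metadata},
            timeout=30.0,
        )
        resp.raise_for_status()
        return resp.json()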
Actions:
- Create `mcp-server/` directory
- Implement MCP server (Python)
- Define 4-5 core tools
- Test with Claude Desktop / Cursor
- Document in `docs/mcp-integration.md`
Deliverable: mcp-server/main.py, docs/mcp-integration.md
📈 Success Metrics
| Metric | Target | Current | Status |
|---|---|---|---|
| Uptime | 99%+ | TBD | 🟡 |
| Response time (P95) | < 4s | TBD | 🟡 |
| Error rate | < 0.5% | TBD | 🟡 |
| Real dialogs processed | 50+ | 0 | 🔴 |
| Dify integration | Working | Not started | 🔴 |
| MCP server | Beta | Not started | 🔴 |
🗂️ Deliverables
Week 1
- Production deployment successful
- 5-10 real dialogs documented
- `docs/analysis/real-world-feedback-week1.md`
- Updated `SCENARIOS.md` with real-world examples
Week 2
- LLM performance optimized (token limits, retry, queue)
- `config/profiles/prod.yml` created
- Systemd services configured
- Auto-restart tested
Week 3
- `docs/open-core-model.md` published
- LICENSE file added (MIT)
- CONTRIBUTING.md created
Week 4
- `docs/dify-integration.md` published
- `/v1/chat/completions` endpoint implemented
- Dify integration tested
- `mcp-server/` skeleton created
- `docs/mcp-integration.md` published
🔄 Phase 4 → Phase 5 Transition
Phase 5: Scale & Ecosystem Growth
After Phase 4 completion:
- Horizontal scaling (load balancer + multiple Router instances)
- Distributed tracing (Jaeger/Zipkin)
- On-chain governance integration (proposals, voting)
- Public open-source release (GitHub, docs site)
- Community growth (Discord, contributor onboarding)
Phase 4 Start Date: TBD
Phase 4 Target Completion: 4 weeks after first deploy
Owner: DAARION Core Team
Version: 0.3.0 (planned)