Phase 4: Real-World Rollout & Optimization

Objective: Transform DAGI Stack from "deployment-ready" to "battle-tested production system"

Timeline: 2-4 weeks after first live deployment
Status: Planned
Prerequisites: Phase 3 complete, first live deployment successful


🎯 Phase 4 Goals

  1. Production Stability: 99%+ uptime, predictable performance
  2. Real-world Validation: 50+ dialogs processed, feedback collected
  3. Performance Optimization: LLM response < 3s, error rate < 0.5%
  4. Ecosystem Integration: Dify backend, MCP server ready

📊 Stage 1: First Live Deploy + Feedback Loop (Week 1)

1.1 Deploy to Production

Actions:

  • Configure .env with production credentials
  • Start services: docker-compose up -d
  • Run smoke tests: ./smoke.sh
  • Set up monitoring cron (every 5 min; see the health-check sketch below)
  • Configure log rotation (100MB max)

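A minimal health-check script the monitoring cron can call (a sketch: only the Router's port is documented here; the other service URLs are placeholders to fill in from docker-compose.yml):

# scripts/healthcheck.py — cron-friendly probe; exits non-zero if any service is down
import sys
import urllib.request

SERVICES = {
    "router": "http://localhost:9102/health",
    # Add gateway/rbac/devtools/crewai health URLs here (ports are deployment-specific)
}

def is_healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    failed = [name for name, url in SERVICES.items() if not is_healthy(url)]
    if failed:
        print(f"UNHEALTHY: {', '.join(failed)}")
        sys.exit(1)
    print("OK")
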
Success Criteria:

  • All 5 services healthy
  • Smoke tests passing
  • First dialog successful (< 5s response)
  • No critical errors in logs

Deliverables:

  • Deployment log file (/tmp/deploy-$(date).log)
  • First dialog screenshot/transcript
  • Baseline metrics file

1.2 Collect Real Dialogs (5-10 conversations)

Objective: Understand real user patterns and pain points

Data to Collect:

{
  "dialog_id": "001",
  "timestamp": "2024-11-15T12:00:00Z",
  "user_id": "tg:12345",
  "dao_id": "greenfood-dao",
  "prompts": [
    {
      "text": "Привіт! Що це за DAO?",
      "response_time_ms": 3200,
      "provider": "llm_local_qwen3_8b",
      "rbac_role": "member",
      "status": "success"
    }
  ],
  "insights": {
    "worked_well": "Fast response, context-aware",
    "issues": "None",
    "suggestions": "Add DAO statistics command"
  }
}

Actions:

  • Monitor logs for incoming requests
  • Document 5-10 real conversations
  • Identify common patterns (greetings, questions, commands)
  • Note slow/failed requests
  • Collect user feedback (if available)

Save to: /tmp/real-dialogs/dialog-001.json, etc.
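
To keep the records consistent, a small helper can write each dialog in the schema above (a sketch; the script name and numbering scheme are illustrative):

# scripts/log_dialog.py — append one dialog record per file under /tmp/real-dialogs/
import json
from pathlib import Path

DIALOG_DIR = Path("/tmp/real-dialogs")

def save_dialog(record: dict) -> Path:
    DIALOG_DIR.mkdir(parents=True, exist_ok=True)
    seq = len(list(DIALOG_DIR.glob("dialog-*.json"))) + 1
    path = DIALOG_DIR / f"dialog-{seq:03d}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2))
    return path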


1.3 Analyze Patterns

Questions to Answer:

  1. What are the most common queries?
  2. Which features are unused (DevTools, CrewAI)?
  3. What response times are typical?
  4. What errors occur in production?
  5. What new workflows/tools are needed?

Analysis Template:

## Dialog Analysis Summary

### Common Queries
- [ ] Greetings (30%)
- [ ] DAO info requests (25%)
- [ ] Role/permission questions (20%)
- [ ] Proposal questions (15%)
- [ ] Other (10%)

### Performance
- Average response time: 3.5s
- P95 response time: 5.2s
- Error rate: 0.2%

### Unused Features
- DevTools: 0 requests
- CrewAI workflows: 1 request (onboarding)

### Improvement Ideas
1. Add /help command with common queries
2. Cache frequent responses (DAO info)
3. Add workflow triggers (e.g., "review my proposal")

Deliverable: docs/analysis/real-world-feedback-week1.md
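
The template's performance numbers can be derived straight from the collected files (a sketch against the section 1.2 schema):

# scripts/analyze_dialogs.py — compute avg/P95/error rate from /tmp/real-dialogs/*.json
import json
import statistics
from pathlib import Path

times, errors, total = [], 0, 0
for path in Path("/tmp/real-dialogs").glob("dialog-*.json"):
    for prompt in json.loads(path.read_text())["prompts"]:
        total += 1
        times.append(prompt["response_time_ms"])
        if prompt["status"] != "success":
            errors += 1

if times:
    times.sort()
    p95 = times[max(0, int(len(times) * 0.95) - 1)]
    print(f"Average response time: {statistics.mean(times) / 1000:.1f}s")
    print(f"P95 response time: {p95 / 1000:.1f}s")
    print(f"Error rate: {errors / total:.1%}")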


1.4 Update SCENARIOS.md

Actions:

  • Add "Real World Scenarios" section
  • Document 3-5 actual production dialogs
  • Include response times, RBAC context, outcomes

Example Entry:

## Real World Scenario #1: DAO Info Request

**Date**: 2024-11-15  
**User**: tg:12345 (member role)  
**Query**: "What is this DAO and what projects does it have?"

**Flow:**
1. Gateway receives message (50ms)
2. Router fetches RBAC (80ms)
3. LLM generates response (3200ms)
4. Total: 3330ms

**Response Quality**: ✅ Accurate DAO description  
**Performance**: ✅ Within target (< 5s)  
**User Feedback**: Positive

**Insights:**
- Common query pattern identified
- Consider caching DAO info
- RBAC context useful for personalization

⚡ Stage 2: Performance & Reliability (Week 2)

2.1 LLM Performance Optimization

Problem: qwen3:8b can time out on long prompts

Solutions:

  1. Token Limits

    # router-config.yml
    llm_providers:
      - name: llm_local_qwen3_8b
        config:
          max_tokens: 200  # Reduced from default
          temperature: 0.7
          timeout_ms: 5000
    
  2. Retry Policy

    # providers/ollama_provider.py
    @retry(max_attempts=2, delay=1.0)  # minimal decorator implementation sketched below
    async def call_llm(self, prompt: str):
        ...  # LLM call body; the decorator re-invokes it on failure
    
  3. Request Queue

    # utils/rate_limiter.py
    import asyncio

    class RequestQueue:
        def __init__(self, max_concurrent=3):
            self.semaphore = asyncio.Semaphore(max_concurrent)

        async def enqueue(self, request):
            # Block until one of the max_concurrent slots frees up
            async with self.semaphore:
                return await process_request(request)  # downstream request handler
    

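The @retry decorator used in the retry-policy sketch does not exist in the codebase yet; a minimal asyncio-friendly implementation matching that signature could look like this (module path is an assumption):

# utils/retry.py — minimal retry decorator for async callables
import asyncio
import functools

def retry(max_attempts: int = 2, delay: float = 1.0):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except Exception as exc:  # narrow to timeout/connection errors in practice
                    last_exc = exc
                    if attempt < max_attempts - 1:
                        await asyncio.sleep(delay)
            raise last_exc
        return wrapper
    return decorator
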
Actions:

  • Add max_tokens to all LLM providers
  • Implement retry logic (2 attempts, 1s delay)
  • Add request queue (max 3 concurrent)
  • Test with high load (10 concurrent requests; see the load-test sketch below)

Expected Improvement:

  • Response time P95: 5.2s → 4.0s
  • Timeout rate: 5% → 1%
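
For the load-test action, a small script can fire concurrent requests and report tail latency (a sketch; it assumes the Router's /route endpoint from section 3.2 and the httpx client, both swappable):

# scripts/load_test.py — fire N concurrent chat requests and print max/P95 latency
import asyncio
import time
import httpx

async def one_request(client: httpx.AsyncClient) -> float:
    start = time.perf_counter()
    await client.post(
        "http://localhost:9102/route",
        json={"prompt": "ping", "mode": "chat"},
    )
    return time.perf_counter() - start

async def main(n: int = 10) -> None:
    async with httpx.AsyncClient(timeout=30.0) as client:
        latencies = sorted(await asyncio.gather(*(one_request(client) for _ in range(n))))
    p95 = latencies[max(0, int(len(latencies) * 0.95) - 1)]
    print(f"max={latencies[-1]:.2f}s p95={p95:.2f}s")

if __name__ == "__main__":
    asyncio.run(main())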

2.2 Production Configuration Profile

Objective: Separate dev and prod configs

Create: config/profiles/prod.yml

version: "0.3.0"

environment: production
debug: false

llm_providers:
  - name: llm_prod_qwen3_8b
    type: ollama
    config:
      base_url: http://localhost:11434
      model: qwen3:8b
      max_tokens: 200
      temperature: 0.7
      timeout_ms: 5000

routing_rules:
  - name: "prod_chat"
    priority: 10
    conditions:
      mode: "chat"
    use_provider: "llm_prod_qwen3_8b"
    timeout_ms: 5000
    fallback_provider: "llm_remote_deepseek"

logging:
  level: INFO
  format: json
  rotation:
    max_size_mb: 100
    max_files: 10

Actions:

  • Create config/profiles/ directory
  • Add prod.yml, staging.yml, dev.yml
  • Update config_loader.py to support profiles
  • Add --profile flag to main_v2.py

Usage:

python main_v2.py --profile prod --port 9102
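
The loader change and CLI flag from the actions above could be wired roughly like this (a sketch; load_profile and the exact flag handling are assumptions):

# config_loader.py — profile-aware config loading
from pathlib import Path
import yaml

def load_profile(profile: str = "dev") -> dict:
    path = Path("config/profiles") / f"{profile}.yml"
    with path.open() as f:
        return yaml.safe_load(f)

# main_v2.py — wiring for the --profile flag
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--profile", default="dev", choices=["dev", "staging", "prod"])
parser.add_argument("--port", type=int, default=9102)
args = parser.parse_args()
config = load_profile(args.profile)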

2.3 Auto-Restart & Watchdog

Systemd Service (Production)

# /etc/systemd/system/dagi-router.service
[Unit]
Description=DAGI Router Service
After=network.target
StartLimitBurst=5
StartLimitIntervalSec=60

[Service]
Type=simple
User=dagi
WorkingDirectory=/opt/dagi-stack
Environment="PATH=/opt/dagi-stack/.venv/bin:/usr/bin:/bin"
ExecStart=/opt/dagi-stack/.venv/bin/python main_v2.py --profile prod
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Docker Healthcheck Enhancement

# docker-compose.yml
services:
  router:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9102/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped

Actions:

  • Create systemd service files for all components
  • Test auto-restart (kill -9 process)
  • Document restart behavior
  • Set up alerts for restart events

🌐 Stage 3: Ecosystem Integration (Week 3-4)

3.1 Open Core Model

Objective: Define what's open-source vs proprietary

Open Source (MIT License):

  • Router core (routing_engine.py, config_loader.py)
  • Provider interfaces (providers/base_provider.py)
  • Base LLM providers (Ollama, OpenAI, DeepSeek)
  • DevTools backend (file ops, test execution)
  • RBAC service (role resolution)
  • Gateway bot (Telegram/Discord webhooks)
  • Utils (logging, validation)
  • Documentation (all .md files)
  • Test suites (smoke.sh, E2E tests)

Proprietary/Private (Optional):

  • ⚠️ Custom CrewAI workflows (microDAO-specific)
  • ⚠️ Advanced RBAC policies (DAO-specific rules)
  • ⚠️ Custom LLM fine-tuning data
  • ⚠️ Enterprise features (SSO, audit logs)

Actions:

  • Create docs/open-core-model.md
  • Add LICENSE file (MIT)
  • Update README with licensing info
  • Add CONTRIBUTING.md guide

Deliverable: docs/open-core-model.md


3.2 Dify Integration

Objective: Use DAGI Router as LLM backend for Dify

Architecture:

Dify UI → Dify Backend → DAGI Router (:9102) → LLM/DevTools/CrewAI

Integration Steps:

  1. Router as LLM Provider

    # Dify custom LLM provider
    {
      "provider": "dagi-router",
      "base_url": "http://localhost:9102",
      "model": "dagi-stack",
      "api_key": "optional"
    }
    
  2. Adapter Endpoint

    # router_app.py — add an OpenAI-compatible endpoint that Dify can consume
    @app.post("/v1/chat/completions")
    async def dify_compatible(request: DifyRequest):
        # Convert OpenAI-style (Dify) request → DAGI format
        dagi_request = convert_from_dify(request)
        result = await router.handle(dagi_request)
        # Convert DAGI result → OpenAI-style response (converter sketches below)
        return convert_to_dify(result)
    
  3. Tools Integration

    # Dify tools.yaml
    tools:
      - name: devtools_read
        type: api
        url: http://localhost:9102/route
        method: POST
        params:
          mode: devtools
          metadata:
            tool: fs_read
    

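The two converters referenced in step 2 are not specified yet; a rough shape, assuming OpenAI's chat schema on the Dify side and a text field on the Router result (both assumptions):

# router_app.py — converter sketches
def convert_from_dify(request: dict) -> dict:
    # Use the last user message as the prompt; default to chat mode
    return {
        "prompt": request["messages"][-1]["content"],
        "mode": "chat",
        "metadata": {"source": "dify"},
    }

def convert_to_dify(result: dict) -> dict:
    # Wrap the Router's text output in an OpenAI-style chat completion
    return {
        "object": "chat.completion",
        "model": "dagi-stack",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": result["text"]},
            "finish_reason": "stop",
        }],
    }
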
Actions:

  • Create /v1/chat/completions endpoint
  • Add Dify format converters
  • Test with Dify UI
  • Document integration in docs/dify-integration.md

Deliverable: docs/dify-integration.md


3.3 MCP Server (Model Context Protocol)

Objective: Expose DAGI Stack as MCP-compatible server

MCP Tools:

{
  "tools": [
    {
      "name": "router_call",
      "description": "Route request to LLM/agent",
      "parameters": {
        "prompt": "string",
        "mode": "chat|crew|devtools",
        "metadata": "object"
      }
    },
    {
      "name": "devtools_task",
      "description": "Execute DevTools task",
      "parameters": {
        "tool": "fs_read|fs_write|run_tests",
        "params": "object"
      }
    },
    {
      "name": "workflow_run",
      "description": "Run CrewAI workflow",
      "parameters": {
        "workflow": "string",
        "inputs": "object"
      }
    },
    {
      "name": "microdao_query",
      "description": "Query microDAO RBAC/metadata",
      "parameters": {
        "dao_id": "string",
        "query_type": "roles|members|proposals"
      }
    }
  ]
}

Implementation:

# mcp-server/main.py
# Sketch using the FastMCP helper from the official `mcp` Python SDK (API assumed).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("dagi-stack")

@mcp.tool()
async def router_call(prompt: str, mode: str, metadata: dict) -> dict:
    """Route a request to LLM/agent via the DAGI Router."""
    ...  # POST to http://localhost:9102/route

@mcp.tool()
async def devtools_task(tool: str, params: dict) -> dict:
    """Execute a DevTools task."""
    ...  # forward to the DevTools backend

# ... more tools (workflow_run, microdao_query)

if __name__ == "__main__":
    mcp.run()  # stdio by default; an HTTP/SSE transport can expose port 9400

Actions:

  • Create mcp-server/ directory
  • Implement MCP server (Python)
  • Define 4-5 core tools
  • Test with Claude Desktop / Cursor
  • Document in docs/mcp-integration.md

Deliverable: mcp-server/main.py, docs/mcp-integration.md


📈 Success Metrics

Metric                   Target    Current      Status
Uptime                   99%+      TBD          🟡
Response time (P95)      < 4s      TBD          🟡
Error rate               < 0.5%    TBD          🟡
Real dialogs processed   50+       0            🔴
Dify integration         Working   Not started  🔴
MCP server               Beta      Not started  🔴

🗂️ Deliverables

Week 1

  • Production deployment successful
  • 5-10 real dialogs documented
  • docs/analysis/real-world-feedback-week1.md
  • Updated SCENARIOS.md with real-world examples

Week 2

  • LLM performance optimized (token limits, retry, queue)
  • config/profiles/prod.yml created
  • Systemd services configured
  • Auto-restart tested

Week 3

  • docs/open-core-model.md published
  • LICENSE file added (MIT)
  • CONTRIBUTING.md created

Week 4

  • docs/dify-integration.md published
  • /v1/chat/completions endpoint implemented
  • Dify integration tested
  • mcp-server/ skeleton created
  • docs/mcp-integration.md published

🔄 Phase 4 → Phase 5 Transition

Phase 5: Scale & Ecosystem Growth

After Phase 4 completion:

  1. Horizontal scaling (load balancer + multiple Router instances)
  2. Distributed tracing (Jaeger/Zipkin)
  3. On-chain governance integration (proposals, voting)
  4. Public open-source release (GitHub, docs site)
  5. Community growth (Discord, contributor onboarding)

Phase 4 Start Date: TBD
Phase 4 Target Completion: 4 weeks after first deploy
Owner: DAARION Core Team
Version: 0.3.0 (planned)