Phase 4: Real-World Rollout & Optimization

Objective: Transform DAGI Stack from "deployment-ready" to "battle-tested production system"

Timeline: 2-4 weeks after first live deployment
Status: Planned
Prerequisites: Phase 3 complete, first live deployment successful


🎯 Phase 4 Goals

  1. Production Stability: 99%+ uptime, predictable performance
  2. Real-world Validation: 50+ dialogs processed, feedback collected
  3. Performance Optimization: LLM response < 3s, error rate < 0.5%
  4. Ecosystem Integration: Dify backend, MCP server ready

📊 Stage 1: First Live Deploy + Feedback Loop (Week 1)

1.1 Deploy to Production

Actions:

  • Configure .env with production credentials
  • Start services: docker-compose up -d
  • Run smoke tests: ./smoke.sh
  • Set up monitoring cron (every 5 min; see the health-check sketch below)
  • Configure log rotation (100MB max)

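A minimal health-check script the monitoring cron can call (a sketch: only the Router's port is documented here; the other service URLs are placeholders to fill in from docker-compose.yml):

# scripts/healthcheck.py — cron-friendly probe; exits non-zero if any service is down
import sys
import urllib.request

SERVICES = {
    "router": "http://localhost:9102/health",
    # Add gateway/rbac/devtools/crewai health URLs here (ports are deployment-specific)
}

def is_healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    failed = [name for name, url in SERVICES.items() if not is_healthy(url)]
    if failed:
        print(f"UNHEALTHY: {', '.join(failed)}")
        sys.exit(1)
    print("OK")
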
Success Criteria:

  • All 5 services healthy
  • Smoke tests passing
  • First dialog successful (< 5s response)
  • No critical errors in logs

Deliverables:

  • Deployment log file (/tmp/deploy-$(date).log)
  • First dialog screenshot/transcript
  • Baseline metrics file

1.2 Collect Real Dialogs (5-10 conversations)

Objective: Understand real user patterns and pain points

Data to Collect:

{
  "dialog_id": "001",
  "timestamp": "2024-11-15T12:00:00Z",
  "user_id": "tg:12345",
  "dao_id": "greenfood-dao",
  "prompts": [
    {
      "text": "Привіт! Що це за DAO?",
      "response_time_ms": 3200,
      "provider": "llm_local_qwen3_8b",
      "rbac_role": "member",
      "status": "success"
    }
  ],
  "insights": {
    "worked_well": "Fast response, context-aware",
    "issues": "None",
    "suggestions": "Add DAO statistics command"
  }
}

Actions:

  • Monitor logs for incoming requests
  • Document 5-10 real conversations
  • Identify common patterns (greetings, questions, commands)
  • Note slow/failed requests
  • Collect user feedback (if available)

Save to: /tmp/real-dialogs/dialog-001.json, etc.
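
To keep the records consistent, a small helper can write each dialog in the schema above (a sketch; the script name and numbering scheme are illustrative):

# scripts/log_dialog.py — append one dialog record per file under /tmp/real-dialogs/
import json
from pathlib import Path

DIALOG_DIR = Path("/tmp/real-dialogs")

def save_dialog(record: dict) -> Path:
    DIALOG_DIR.mkdir(parents=True, exist_ok=True)
    seq = len(list(DIALOG_DIR.glob("dialog-*.json"))) + 1
    path = DIALOG_DIR / f"dialog-{seq:03d}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2))
    return path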


1.3 Analyze Patterns

Questions to Answer:

  1. What are the most common queries?
  2. Which features are unused (DevTools, CrewAI)?
  3. What response times are typical?
  4. What errors occur in production?
  5. What new workflows/tools are needed?

Analysis Template:

## Dialog Analysis Summary

### Common Queries
- [ ] Greetings (30%)
- [ ] DAO info requests (25%)
- [ ] Role/permission questions (20%)
- [ ] Proposal questions (15%)
- [ ] Other (10%)

### Performance
- Average response time: 3.5s
- P95 response time: 5.2s
- Error rate: 0.2%

### Unused Features
- DevTools: 0 requests
- CrewAI workflows: 1 request (onboarding)

### Improvement Ideas
1. Add /help command with common queries
2. Cache frequent responses (DAO info)
3. Add workflow triggers (e.g., "review my proposal")

Deliverable: docs/analysis/real-world-feedback-week1.md
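
The template's performance numbers can be derived straight from the collected files (a sketch against the section 1.2 schema):

# scripts/analyze_dialogs.py — compute avg/P95/error rate from /tmp/real-dialogs/*.json
import json
import statistics
from pathlib import Path

times, errors, total = [], 0, 0
for path in Path("/tmp/real-dialogs").glob("dialog-*.json"):
    for prompt in json.loads(path.read_text())["prompts"]:
        total += 1
        times.append(prompt["response_time_ms"])
        if prompt["status"] != "success":
            errors += 1

if times:
    times.sort()
    p95 = times[max(0, int(len(times) * 0.95) - 1)]
    print(f"Average response time: {statistics.mean(times) / 1000:.1f}s")
    print(f"P95 response time: {p95 / 1000:.1f}s")
    print(f"Error rate: {errors / total:.1%}")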


1.4 Update SCENARIOS.md

Actions:

  • Add "Real World Scenarios" section
  • Document 3-5 actual production dialogs
  • Include response times, RBAC context, outcomes

Example Entry:

## Real World Scenario #1: DAO Info Request

**Date**: 2024-11-15  
**User**: tg:12345 (member role)  
**Query**: "What is this DAO and what projects does it have?"

**Flow:**
1. Gateway receives message (50ms)
2. Router fetches RBAC (80ms)
3. LLM generates response (3200ms)
4. Total: 3330ms

**Response Quality**: ✅ Accurate DAO description  
**Performance**: ✅ Within target (< 5s)  
**User Feedback**: Positive

**Insights:**
- Common query pattern identified
- Consider caching DAO info
- RBAC context useful for personalization

⚡ Stage 2: Performance & Reliability (Week 2)

2.1 LLM Performance Optimization

Problem: qwen3:8b can time out on long prompts

Solutions:

  1. Token Limits

    # router-config.yml
    llm_providers:
      - name: llm_local_qwen3_8b
        config:
          max_tokens: 200  # Reduced from default
          temperature: 0.7
          timeout_ms: 5000
    
  2. Retry Policy

    # providers/ollama_provider.py
    @retry(max_attempts=2, delay=1.0)  # minimal decorator implementation sketched below
    async def call_llm(self, prompt: str):
        ...  # LLM call body; the decorator re-invokes it on failure
    
  3. Request Queue

    # utils/rate_limiter.py
    import asyncio

    class RequestQueue:
        def __init__(self, max_concurrent=3):
            self.semaphore = asyncio.Semaphore(max_concurrent)

        async def enqueue(self, request):
            # Block until one of the max_concurrent slots frees up
            async with self.semaphore:
                return await process_request(request)  # downstream request handler
    

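The @retry decorator used in the retry-policy sketch does not exist in the codebase yet; a minimal asyncio-friendly implementation matching that signature could look like this (module path is an assumption):

# utils/retry.py — minimal retry decorator for async callables
import asyncio
import functools

def retry(max_attempts: int = 2, delay: float = 1.0):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except Exception as exc:  # narrow to timeout/connection errors in practice
                    last_exc = exc
                    if attempt < max_attempts - 1:
                        await asyncio.sleep(delay)
            raise last_exc
        return wrapper
    return decorator
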
Actions:

  • Add max_tokens to all LLM providers
  • Implement retry logic (2 attempts, 1s delay)
  • Add request queue (max 3 concurrent)
  • Test with high load (10 concurrent requests; see the load-test sketch below)

Expected Improvement:

  • Response time P95: 5.2s → 4.0s
  • Timeout rate: 5% → 1%
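
For the load-test action, a small script can fire concurrent requests and report tail latency (a sketch; it assumes the Router's /route endpoint from section 3.2 and the httpx client, both swappable):

# scripts/load_test.py — fire N concurrent chat requests and print max/P95 latency
import asyncio
import time
import httpx

async def one_request(client: httpx.AsyncClient) -> float:
    start = time.perf_counter()
    await client.post(
        "http://localhost:9102/route",
        json={"prompt": "ping", "mode": "chat"},
    )
    return time.perf_counter() - start

async def main(n: int = 10) -> None:
    async with httpx.AsyncClient(timeout=30.0) as client:
        latencies = sorted(await asyncio.gather(*(one_request(client) for _ in range(n))))
    p95 = latencies[max(0, int(len(latencies) * 0.95) - 1)]
    print(f"max={latencies[-1]:.2f}s p95={p95:.2f}s")

if __name__ == "__main__":
    asyncio.run(main())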

2.2 Production Configuration Profile

Objective: Separate dev and prod configs

Create: config/profiles/prod.yml

version: "0.3.0"

environment: production
debug: false

llm_providers:
  - name: llm_prod_qwen3_8b
    type: ollama
    config:
      base_url: http://localhost:11434
      model: qwen3:8b
      max_tokens: 200
      temperature: 0.7
      timeout_ms: 5000

routing_rules:
  - name: "prod_chat"
    priority: 10
    conditions:
      mode: "chat"
    use_provider: "llm_prod_qwen3_8b"
    timeout_ms: 5000
    fallback_provider: "llm_remote_deepseek"

logging:
  level: INFO
  format: json
  rotation:
    max_size_mb: 100
    max_files: 10

Actions:

  • Create config/profiles/ directory
  • Add prod.yml, staging.yml, dev.yml
  • Update config_loader.py to support profiles
  • Add --profile flag to main_v2.py

Usage:

python main_v2.py --profile prod --port 9102
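
The loader change and CLI flag from the actions above could be wired roughly like this (a sketch; load_profile and the exact flag handling are assumptions):

# config_loader.py — profile-aware config loading
from pathlib import Path
import yaml

def load_profile(profile: str = "dev") -> dict:
    path = Path("config/profiles") / f"{profile}.yml"
    with path.open() as f:
        return yaml.safe_load(f)

# main_v2.py — wiring for the --profile flag
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--profile", default="dev", choices=["dev", "staging", "prod"])
parser.add_argument("--port", type=int, default=9102)
args = parser.parse_args()
config = load_profile(args.profile)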

2.3 Auto-Restart & Watchdog

Systemd Service (Production)

# /etc/systemd/system/dagi-router.service
[Unit]
Description=DAGI Router Service
After=network.target
StartLimitBurst=5
StartLimitIntervalSec=60

[Service]
Type=simple
User=dagi
WorkingDirectory=/opt/dagi-stack
Environment="PATH=/opt/dagi-stack/.venv/bin:/usr/bin:/bin"
ExecStart=/opt/dagi-stack/.venv/bin/python main_v2.py --profile prod
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Docker Healthcheck Enhancement

# docker-compose.yml
services:
  router:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9102/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped

Actions:

  • Create systemd service files for all components
  • Test auto-restart (kill -9 process)
  • Document restart behavior
  • Set up alerts for restart events

🌐 Stage 3: Ecosystem Integration (Week 3-4)

3.1 Open Core Model

Objective: Define what's open-source vs proprietary

Open Source (MIT License):

  • Router core (routing_engine.py, config_loader.py)
  • Provider interfaces (providers/base_provider.py)
  • Base LLM providers (Ollama, OpenAI, DeepSeek)
  • DevTools backend (file ops, test execution)
  • RBAC service (role resolution)
  • Gateway bot (Telegram/Discord webhooks)
  • Utils (logging, validation)
  • Documentation (all .md files)
  • Test suites (smoke.sh, E2E tests)

Proprietary/Private (Optional):

  • ⚠️ Custom CrewAI workflows (microDAO-specific)
  • ⚠️ Advanced RBAC policies (DAO-specific rules)
  • ⚠️ Custom LLM fine-tuning data
  • ⚠️ Enterprise features (SSO, audit logs)

Actions:

  • Create docs/open-core-model.md
  • Add LICENSE file (MIT)
  • Update README with licensing info
  • Add CONTRIBUTING.md guide

Deliverable: docs/open-core-model.md


3.2 Dify Integration

Objective: Use DAGI Router as LLM backend for Dify

Architecture:

Dify UI → Dify Backend → DAGI Router (:9102) → LLM/DevTools/CrewAI

Integration Steps:

  1. Router as LLM Provider

    # Dify custom LLM provider
    {
      "provider": "dagi-router",
      "base_url": "http://localhost:9102",
      "model": "dagi-stack",
      "api_key": "optional"
    }
    
  2. Adapter Endpoint

    # router_app.py — add an OpenAI-compatible endpoint that Dify can consume
    @app.post("/v1/chat/completions")
    async def dify_compatible(request: DifyRequest):
        # Convert OpenAI-style (Dify) request → DAGI format
        dagi_request = convert_from_dify(request)
        result = await router.handle(dagi_request)
        # Convert DAGI result → OpenAI-style response (converter sketches below)
        return convert_to_dify(result)
    
  3. Tools Integration

    # Dify tools.yaml
    tools:
      - name: devtools_read
        type: api
        url: http://localhost:9102/route
        method: POST
        params:
          mode: devtools
          metadata:
            tool: fs_read
    

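The two converters referenced in step 2 are not specified yet; a rough shape, assuming OpenAI's chat schema on the Dify side and a text field on the Router result (both assumptions):

# router_app.py — converter sketches
def convert_from_dify(request: dict) -> dict:
    # Use the last user message as the prompt; default to chat mode
    return {
        "prompt": request["messages"][-1]["content"],
        "mode": "chat",
        "metadata": {"source": "dify"},
    }

def convert_to_dify(result: dict) -> dict:
    # Wrap the Router's text output in an OpenAI-style chat completion
    return {
        "object": "chat.completion",
        "model": "dagi-stack",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": result["text"]},
            "finish_reason": "stop",
        }],
    }
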
Actions:

  • Create /v1/chat/completions endpoint
  • Add Dify format converters
  • Test with Dify UI
  • Document integration in docs/dify-integration.md

Deliverable: docs/dify-integration.md


3.3 MCP Server (Model Context Protocol)

Objective: Expose DAGI Stack as MCP-compatible server

MCP Tools:

{
  "tools": [
    {
      "name": "router_call",
      "description": "Route request to LLM/agent",
      "parameters": {
        "prompt": "string",
        "mode": "chat|crew|devtools",
        "metadata": "object"
      }
    },
    {
      "name": "devtools_task",
      "description": "Execute DevTools task",
      "parameters": {
        "tool": "fs_read|fs_write|run_tests",
        "params": "object"
      }
    },
    {
      "name": "workflow_run",
      "description": "Run CrewAI workflow",
      "parameters": {
        "workflow": "string",
        "inputs": "object"
      }
    },
    {
      "name": "microdao_query",
      "description": "Query microDAO RBAC/metadata",
      "parameters": {
        "dao_id": "string",
        "query_type": "roles|members|proposals"
      }
    }
  ]
}

Implementation:

# mcp-server/main.py
# Sketch using the FastMCP helper from the official `mcp` Python SDK (API assumed).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("dagi-stack")

@mcp.tool()
async def router_call(prompt: str, mode: str, metadata: dict) -> dict:
    """Route a request to LLM/agent via the DAGI Router."""
    ...  # POST to http://localhost:9102/route

@mcp.tool()
async def devtools_task(tool: str, params: dict) -> dict:
    """Execute a DevTools task."""
    ...  # forward to the DevTools backend

# ... more tools (workflow_run, microdao_query)

if __name__ == "__main__":
    mcp.run()  # stdio by default; an HTTP/SSE transport can expose port 9400

Actions:

  • Create mcp-server/ directory
  • Implement MCP server (Python)
  • Define 4-5 core tools
  • Test with Claude Desktop / Cursor
  • Document in docs/mcp-integration.md

Deliverable: mcp-server/main.py, docs/mcp-integration.md


📈 Success Metrics

Metric                   Target    Current      Status
Uptime                   99%+      TBD          🟡
Response time (P95)      < 4s      TBD          🟡
Error rate               < 0.5%    TBD          🟡
Real dialogs processed   50+       0            🔴
Dify integration         Working   Not started  🔴
MCP server               Beta      Not started  🔴

🗂️ Deliverables

Week 1

  • Production deployment successful
  • 5-10 real dialogs documented
  • docs/analysis/real-world-feedback-week1.md
  • Updated SCENARIOS.md with real-world examples

Week 2

  • LLM performance optimized (token limits, retry, queue)
  • config/profiles/prod.yml created
  • Systemd services configured
  • Auto-restart tested

Week 3

  • docs/open-core-model.md published
  • LICENSE file added (MIT)
  • CONTRIBUTING.md created

Week 4

  • docs/dify-integration.md published
  • /v1/chat/completions endpoint implemented
  • Dify integration tested
  • mcp-server/ skeleton created
  • docs/mcp-integration.md published

🔄 Phase 4 → Phase 5 Transition

Phase 5: Scale & Ecosystem Growth

After Phase 4 completion:

  1. Horizontal scaling (load balancer + multiple Router instances)
  2. Distributed tracing (Jaeger/Zipkin)
  3. On-chain governance integration (proposals, voting)
  4. Public open-source release (GitHub, docs site)
  5. Community growth (Discord, contributor onboarding)

Phase 4 Start Date: TBD
Phase 4 Target Completion: 4 weeks after first deploy
Owner: DAARION Core Team
Version: 0.3.0 (planned)