Files

Apple 3de3c8cb36 feat: Add presence heartbeat for Matrix online status

- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component

2025-11-27 00:19:40 -08:00

11 KiB

Raw Blame History

PHASE 3 ROADMAP — Core Agent Services

After Phase 2 Agent Integration

Status: 📋 Planning → ✅ SPEC READY
Master Task: PHASE3_MASTER_TASK.md ⭐
Summary: PHASE3_READY.md
Priority: High
Estimated Time: 6-8 weeks
Dependencies: Phase 2 complete

🎯 Goal

Replace Phase 2 stubs with production-ready services:

Real LLM Proxy (multi-provider routing)
Real Agent Memory (RAG + vector DB)
Tool Registry (agent actions)
Agent Blueprint Management (CRUD + versioning)

📦 Phase 3 Components

1. LLM Proxy Service (2 weeks)

Purpose: Centralized LLM gateway with routing, rate limiting, cost tracking

Features:

Multi-provider support (OpenAI, Anthropic, DeepSeek, Local)
Model selection & routing
Rate limiting per agent/microDAO
Cost tracking & billing
Streaming support
Error handling & retries
Prompt sanitization

API:

POST /internal/llm/proxy
{
  "model": "gpt-4",
  "messages": [...],
  "stream": false,
  "max_tokens": 1000,
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:daarion"
}

GET /internal/llm/models
→ List available models

GET /internal/llm/usage?agent_id=agent:sofia&period=30d
→ Usage statistics

Tech Stack:

FastAPI
httpx for provider calls
Redis for rate limiting
PostgreSQL for usage tracking

Files:

services/llm-proxy/
├── main.py
├── providers/
│   ├── openai.py
│   ├── anthropic.py
│   ├── deepseek.py
│   └── local.py
├── routing.py
├── rate_limiter.py
├── cost_tracker.py
├── models.py
└── config.yaml

2. Agent Memory Service (2 weeks)

Purpose: Persistent memory + RAG for agents

Features:

Short-term memory (recent context)
Mid-term memory (session/task memory)
Long-term memory (knowledge base)
Vector search (RAG)
Memory indexing (from channel history)
Memory pruning (for cost/performance)
Per-agent & per-microDAO isolation

API:

POST /internal/agent-memory/query
{
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:daarion",
  "query": "What did we discuss about Phase 2?",
  "k": 5,
  "memory_types": ["mid_term", "long_term"]
}
→ Top-k relevant memories

POST /internal/agent-memory/store
{
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:daarion",
  "memory_type": "mid_term",
  "content": {
    "user_message": "...",
    "agent_reply": "...",
    "context": {...}
  }
}
→ Store new memory

GET /internal/agent-memory/agents/{agent_id}/stats
→ Memory usage stats

Tech Stack:

FastAPI
PostgreSQL (structured memory)
Qdrant/Weaviate/ChromaDB (vector DB for RAG)
LangChain/LlamaIndex (RAG helpers)

Files:

services/agent-memory/
├── main.py
├── vector_store.py
├── memory_manager.py
├── rag_engine.py
├── indexer.py
├── models.py
└── config.yaml

3. Tool Registry Service (1.5 weeks)

Purpose: Centralized tool definitions & execution for agents

Features:

Tool catalog (list all available tools)
Tool execution (secure sandbox)
Tool permissions (agent → tool mapping)
Tool versioning
Execution logs & auditing

Tools (initial set):

create_task(channel_id, title, description)
create_followup(user_id, message_id, reminder_text, due_date)
search_docs(query)
create_project(microdao_id, name, description)
summarize_channel(channel_id, period)
send_notification(user_id, text)

API:

GET /internal/tools/catalog
→ List all tools

POST /internal/tools/execute
{
  "tool_name": "create_task",
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:daarion",
  "parameters": {
    "channel_id": "...",
    "title": "Review Phase 2",
    "description": "..."
  }
}
→ Execute tool, return result

GET /internal/tools/agents/{agent_id}/permissions
→ List tools agent can use

Tech Stack:

FastAPI
Dynamic tool loading (plugins)
Sandboxed execution (Docker/gVisor)
PostgreSQL (tool definitions, permissions, logs)

Files:

services/tool-registry/
├── main.py
├── catalog.py
├── executor.py
├── sandbox.py
├── permissions.py
├── tools/
│   ├── task_tools.py
│   ├── project_tools.py
│   ├── notification_tools.py
│   └── ...
└── config.yaml

4. Agent Blueprint Service (1 week)

Purpose: CRUD + versioning for agent definitions

Features:

Create/Read/Update/Delete agent blueprints
Blueprint versioning
Blueprint templates (archetypes)
Blueprint validation
Blueprint inheritance

API:

GET /internal/agents/blueprints
→ List all blueprints

POST /internal/agents/blueprints
{
  "code": "sofia_prime_v2",
  "name": "Sofia Prime v2",
  "model": "gpt-4.1",
  "instructions": "...",
  "capabilities": {...},
  "tools": ["create_task", "summarize_channel"]
}
→ Create blueprint

GET /internal/agents/blueprints/{blueprint_id}
→ Get blueprint

GET /internal/agents/{agent_id}/blueprint
→ Get blueprint for specific agent instance

PUT /internal/agents/blueprints/{blueprint_id}
→ Update blueprint (creates new version)

Tech Stack:

FastAPI
PostgreSQL (blueprints, versions)
YAML/JSON schema validation

Files:

services/agents-service/
├── main.py
├── blueprints/
│   ├── crud.py
│   ├── versioning.py
│   ├── validation.py
│   └── templates.py
├── models.py
└── config.yaml

5. Integration Updates (1 week)

Update agent-runtime to use real services:

# Before (Phase 2):
blueprint = await load_agent_blueprint(agent_id)  # Mock
memory = await query_memory(...)  # Stub
llm_response = await generate_response(...)  # Stub

# After (Phase 3):
blueprint = await agents_service.get_blueprint(agent_id)  # Real
memory = await memory_service.query(...)  # Real RAG
llm_response = await llm_proxy.generate(...)  # Real multi-provider

# NEW: Tool usage
if llm_suggests_tool_use:
    tool_result = await tool_registry.execute(tool_name, parameters)
    # Add tool result to context, call LLM again

📅 Timeline

Week 1-2: LLM Proxy

Week 1: Core routing + OpenAI provider
Week 2: Multi-provider + rate limiting + cost tracking

Week 3-4: Agent Memory

Week 3: Vector store setup + basic RAG
Week 4: Memory management + indexing

Week 5-6: Tool Registry

Week 5: Catalog + basic tools (task, followup)
Week 6: Executor + permissions + sandboxing

Week 7: Agent Blueprint Service

CRUD + versioning + validation

Week 8: Integration & Testing

Update agent-runtime
E2E testing
Performance optimization
Documentation

🧪 Testing Strategy

LLM Proxy Testing:

Unit: Each provider (OpenAI, Anthropic, etc.)
Integration: Rate limiting, cost tracking
Load: 100 concurrent requests
Failover: Provider unavailable scenarios

Agent Memory Testing:

RAG accuracy: Retrieve relevant memories
Memory indexing: Auto-index from channels
Vector search performance: < 500ms
Memory pruning: Clean old memories

Tool Registry Testing:

Tool execution: All tools work
Permissions: Agent cannot use unauthorized tools
Sandboxing: Tools cannot escape sandbox
Audit logs: All executions logged

E2E Testing:

User asks agent to create task → Task created
User asks agent to summarize → Summary posted
Agent uses memory correctly in replies
Multiple providers work (switch between OpenAI/DeepSeek)

🎯 Acceptance Criteria

Phase 3 Complete When:

✅ LLM Proxy supports 3+ providers
✅ Agent Memory RAG works (< 500ms queries)
✅ Tool Registry has 5+ working tools
✅ Agent Blueprint CRUD works
✅ agent-runtime integrated with all services
✅ E2E: User → Agent (with tool use) → Result
✅ Cost tracking shows LLM usage per agent
✅ Memory usage shows per agent/microDAO
✅ All services pass health checks
✅ Documentation complete

📊 Success Metrics

Metric	Target
LLM response time	< 2s (non-streaming)
Memory query time	< 500ms
Tool execution time	< 3s
E2E agent reply	< 5s (with tool use)
LLM cost per request	< $0.05
System uptime	> 99.5%

🔗 Dependencies

External Services:

OpenAI API (for GPT-4)
Anthropic API (for Claude, optional)
DeepSeek API (optional)
Qdrant/Weaviate (for vector DB)

Internal Services:

PostgreSQL (for all structured data)
Redis (for rate limiting, caching)
NATS (for events)

💡 Optional Enhancements (Phase 3.5)

LLM Proxy:

Streaming SSE support
Local model support (Ollama, vLLM)
Prompt caching
A/B testing for prompts

Agent Memory:

Hierarchical memory (microDAO → team → agent)
Memory sharing between agents
Memory snapshots (save/restore agent state)
Memory analytics dashboard

Tool Registry:

Tool marketplace (community tools)
Tool composition (chain tools)
Visual tool builder
Tool usage analytics

🚀 Quick Start (After Phase 2)

To prepare for Phase 3:

# 1. Review Phase 3 roadmap
cat docs/tasks/PHASE3_ROADMAP.md

# 2. Set up external services
# - Get OpenAI API key
# - Set up Qdrant (Docker or cloud)
# - Set up Redis

# 3. Start with LLM Proxy
mkdir -p services/llm-proxy
cd services/llm-proxy
# Follow PHASE3_LLM_PROXY_TASK.md (to be created)

📝 Task Files (To Be Created)

After Phase 2 complete, create detailed tasks:

TASK_PHASE3_LLM_PROXY.md (2 weeks)
TASK_PHASE3_AGENT_MEMORY.md (2 weeks)
TASK_PHASE3_TOOL_REGISTRY.md (1.5 weeks)
TASK_PHASE3_BLUEPRINT_SERVICE.md (1 week)
TASK_PHASE3_INTEGRATION.md (1 week)

🎓 Architecture Evolution

Phase 1 (Complete):

User → Frontend → messaging-service → Matrix → Frontend

Phase 2 (Current):

User → Messenger → agent_filter → Router → agent-runtime (stub) → Reply

Phase 3 (Target):

User → Messenger
    ↓
agent_filter → Router → agent-runtime
    ↓
├─ LLM Proxy → [OpenAI | Anthropic | DeepSeek]
├─ Agent Memory → [Vector DB | PostgreSQL]
├─ Tool Registry → [Task | Project | Notification tools]
└─ Agent Blueprint → [Definitions | Versions]
    ↓
Reply with tool results

✅ Current Status

✅ Phase 1: Messenger Core (Complete)
📋 Phase 2: Agent Integration (In Progress)
📋 Phase 3: Core Services (This Roadmap)
🔜 Phase 4: Advanced Features (TBD)

Ready for Phase 3?

First complete Phase 2, then return to this roadmap for detailed implementation tasks.

Version: 1.0.0
Date: 2025-11-24
Status: Planning

11 KiB Raw Blame History

PHASE 3 ROADMAP — Core Agent Services

🎯 Goal

📦 Phase 3 Components

1. LLM Proxy Service (2 weeks)

2. Agent Memory Service (2 weeks)

3. Tool Registry Service (1.5 weeks)

4. Agent Blueprint Service (1 week)

5. Integration Updates (1 week)

📅 Timeline

Week 1-2: LLM Proxy

Week 3-4: Agent Memory

Week 5-6: Tool Registry

Week 7: Agent Blueprint Service

Week 8: Integration & Testing

🧪 Testing Strategy

LLM Proxy Testing:

Agent Memory Testing:

Tool Registry Testing:

E2E Testing:

🎯 Acceptance Criteria

Phase 3 Complete When:

📊 Success Metrics

🔗 Dependencies

External Services:

Internal Services:

💡 Optional Enhancements (Phase 3.5)

LLM Proxy:

Agent Memory:

Tool Registry:

🚀 Quick Start (After Phase 2)

To prepare for Phase 3:

📝 Task Files (To Be Created)

🎓 Architecture Evolution

Phase 1 (Complete):

Phase 2 (Current):

Phase 3 (Target):

✅ Current Status

11 KiB

Raw Blame History