# 🚀 PHASE 3 READY — LLM Proxy + Memory + Tools

**Status:** 📋 Ready to implement

**Dependencies:** Phase 2 complete ✅

**Estimated Time:** 6-8 weeks

**Priority:** High

---

## 🎯 Goal

Make the DAARION agents truly intelligent:

- **LLM Proxy** — a single entry point for all LLM requests (OpenAI, DeepSeek, Local)
- **Memory Orchestrator** — a unified API for short/mid/long-term memory
- **Toolcore** — tool registry + safe execution

**Phase 3 = Infrastructure for Agent Intelligence**

---

## 📦 What Will Be Built

### 1. LLM Proxy Service

**Port:** 7007

**Purpose:** Unified LLM gateway

**Features:**

- ✅ Multi-provider support (OpenAI, DeepSeek, Local)
- ✅ Model routing (logical → physical models)
- ✅ Usage logging (tokens, latency per agent)
- ✅ Rate limiting per agent
- ✅ Cost tracking hooks

**API:**

```http
POST /internal/llm/proxy
{
  "model": "gpt-4.1-mini",
  "messages": [...],
  "metadata": { "agent_id": "...", "microdao_id": "..." }
}
```
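
A request like the one above first has to be routed from a logical model name to a concrete provider and physical model. Here is a minimal sketch of that lookup; the config shape and every name in it (`MODEL_ROUTES`, `resolve_model`, the `llama3:8b` local model) are illustrative assumptions, not taken from the actual `router.py`:

```python
# Hypothetical logical → physical routing table; shape is an assumption.
MODEL_ROUTES = {
    "gpt-4.1-mini": {"provider": "openai", "physical": "gpt-4.1-mini"},
    "deepseek-chat": {"provider": "deepseek", "physical": "deepseek-chat"},
    "local-default": {"provider": "local", "physical": "llama3:8b"},
}

def resolve_model(logical: str) -> tuple[str, str]:
    """Map a logical model name to a (provider, physical model) pair."""
    try:
        route = MODEL_ROUTES[logical]
    except KeyError:
        raise ValueError(f"No route configured for model '{logical}'")
    return route["provider"], route["physical"]

# The proxy would then forward the request body to the resolved provider,
# logging tokens and latency against metadata["agent_id"] along the way.
provider, physical = resolve_model("gpt-4.1-mini")
```

Keeping the table in `config.yaml` means swapping a provider (say, pointing `local-default` at a different Ollama model) needs no code change.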

**Deliverables:** 10 files

- `main.py`, `models.py`, `router.py`
- `providers/` (OpenAI, DeepSeek, Local)
- `config.yaml`, `Dockerfile`, `README.md`

---

### 2. Memory Orchestrator Service

**Port:** 7008

**Purpose:** Unified memory API

**Features:**

- ✅ Short-term memory (channel context)
- ✅ Mid-term memory (agent RAG)
- ✅ Long-term memory (knowledge base)
- ✅ Vector search (embeddings)
- ✅ Memory indexing pipeline

**API:**

```http
POST /internal/agent-memory/query
{
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "query": "What were recent changes?",
  "limit": 5
}

POST /internal/agent-memory/store
{
  "agent_id": "...",
  "content": { "user_message": "...", "agent_reply": "..." }
}
```
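
The acceptance criteria below only require "simple cosine" vector search, so the query endpoint can start as a brute-force scan over stored embeddings. A sketch of that, with illustrative function and field names:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def query_memories(query_vec: list[float], memories: list[dict], limit: int = 5) -> list[dict]:
    """Rank stored memories by similarity to the query embedding.

    Each memory is assumed to carry an "embedding" field; in the real
    service the scan would be pushed down to the vector store instead.
    """
    scored = [(cosine(query_vec, m["embedding"]), m) for m in memories]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [m for _, m in scored[:limit]]
```

Once this works end to end, the same interface can be backed by pgvector without changing callers.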

**Deliverables:** 9 files

- `main.py`, `models.py`, `router.py`
- `backends/` (PostgreSQL, Vector Store, KB)
- `embedding_client.py`, `config.yaml`, `README.md`

---

### 3. Toolcore Service

**Port:** 7009

**Purpose:** Tool registry + execution

**Features:**

- ✅ Tool registry (config-based → DB-backed later)
- ✅ Permission checks (agent → tool mapping)
- ✅ HTTP executor (call external services)
- ✅ Python executor (optional, for internal functions)
- ✅ Error handling + timeouts

**API:**

```http
GET /internal/tools
→ List available tools

POST /internal/tools/call
{
  "tool_id": "projects.list",
  "agent_id": "agent:sofia",
  "args": { "microdao_id": "microdao:7" }
}
```
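
One way the call endpoint above could gate execution: load the registry and an agent → tool allowlist from config, check permission, then dispatch to an executor. The config shapes and names below are assumptions for illustration, not the project's actual `registry.py`:

```python
# Illustrative config-backed registry and allowlist; shapes are assumptions.
REGISTRY = {
    "projects.list": {"executor": "http", "url": "http://dao-service/projects"},
}
ALLOWLIST = {
    "agent:sofia": {"projects.list"},
}

class ToolPermissionError(Exception):
    """Raised when an agent is not allowed to call a tool."""

def call_tool(tool_id: str, agent_id: str, args: dict) -> dict:
    if tool_id not in REGISTRY:
        raise KeyError(f"Unknown tool: {tool_id}")
    if tool_id not in ALLOWLIST.get(agent_id, set()):
        raise ToolPermissionError(f"{agent_id} may not call {tool_id}")
    tool = REGISTRY[tool_id]
    # A real HTTP executor would POST args to tool["url"] with a timeout;
    # stubbed here so the permission path is the focus.
    return {"tool_id": tool_id, "executor": tool["executor"], "args": args}
```

Checking permissions before dispatch keeps the executor itself simple, and the allowlist can move from `config.yaml` to the DB later without touching this flow.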

**Deliverables:** 8 files

- `main.py`, `models.py`, `registry.py`
- `executors/` (HTTP, Python)
- `config.yaml`, `Dockerfile`, `README.md`

---

## 🔄 Updated Architecture

### Before (Phase 2):

```
agent-runtime:
  - Mock LLM responses
  - Optional memory
  - No tools
```

### After (Phase 3):

```
agent-runtime:
  ↓
  ├─ LLM Proxy → [OpenAI | DeepSeek | Local]
  ├─ Memory Orchestrator → [Vector DB | PostgreSQL]
  └─ Toolcore → [projects.list | task.create | ...]
```

---

## 🎯 Acceptance Criteria

### LLM Proxy:

- ✅ 2+ providers working (e.g., OpenAI + Local stub)
- ✅ Model routing from config
- ✅ Usage logging per agent
- ✅ Health checks pass

### Memory Orchestrator:

- ✅ Query returns relevant memories
- ✅ Store saves new memories
- ✅ Vector search works (simple cosine)
- ✅ agent-runtime integration

### Toolcore:

- ✅ Tool registry loaded from config
- ✅ 1+ tool working (e.g., projects.list)
- ✅ Permission checks work
- ✅ HTTP executor functional

### E2E:

- ✅ Agent uses real LLM (not mock)
- ✅ Agent uses memory (RAG)
- ✅ Agent can call tools
- ✅ Full flow: User → Agent (with tool) → Reply

---

## 📅 Timeline

| Week | Focus | Deliverables |
|------|-------|--------------|
| 1-2 | LLM Proxy | Service + 2 providers |
| 3-4 | Memory Orchestrator | Service + vector search |
| 5-6 | Toolcore | Service + 1 tool |
| 7 | Integration | Update agent-runtime |
| 8 | Testing | E2E + optimization |

**Total:** 8 weeks planned (6-8 weeks realistic)

---

## 🚀 How to Start

### Option 1: Cursor AI

```bash
# Copy the Phase 3 master task to the clipboard (pbcopy is macOS-only)
cat docs/tasks/PHASE3_MASTER_TASK.md | pbcopy

# Paste into Cursor AI
# Wait for implementation (~1-2 hours per service)
```

### Option 2: Manual

```bash
# 1. Start with LLM Proxy
mkdir -p services/llm-proxy
cd services/llm-proxy
# Follow PHASE3_MASTER_TASK.md

# 2. Then Memory Orchestrator
mkdir -p services/memory-orchestrator
# ...

# 3. Then Toolcore
mkdir -p services/toolcore
# ...
```

---

## 🔗 Key Files

### Specification:

- [PHASE3_MASTER_TASK.md](docs/tasks/PHASE3_MASTER_TASK.md) ⭐ **Main task**
- [PHASE3_ROADMAP.md](docs/tasks/PHASE3_ROADMAP.md) — Detailed planning

### Phase 2 (Complete):

- [PHASE2_COMPLETE.md](PHASE2_COMPLETE.md) — What's already built
- [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)

---

## 💡 Key Concepts

### LLM Proxy:

- **Logical models** (gpt-4.1-mini) → **Physical providers** (OpenAI API)
- Routing via config
- Cost tracking per agent
- Graceful fallbacks
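
"Graceful fallbacks" could mean trying providers in a configured order until one succeeds. A hedged sketch of that loop; the provider interface (a list of name/callable pairs) is an assumption, not the project's actual abstraction:

```python
def complete_with_fallback(providers, messages):
    """Try each provider in order; return (name, reply) from the first success.

    `providers` is a list of (name, callable) pairs; each callable is an
    assumed interface that returns a reply string or raises on failure.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(messages)
        except Exception as exc:  # in practice, catch provider-specific errors
            errors.append((name, exc))
    raise RuntimeError(f"All providers failed: {errors}")
```

With this shape, an OpenAI outage degrades to the local model instead of failing the agent's reply outright.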

### Memory Orchestrator:

- **Short-term:** Recent channel messages
- **Mid-term:** RAG embeddings (conversations, tasks)
- **Long-term:** Knowledge base (docs, roadmaps)
- Vector search for relevance

### Toolcore:

- **Static registry** (config.yaml) → **Dynamic registry** (DB) later
- **HTTP executor:** Call external services
- **Permission model:** Agent → Tool allowlist
- **Error handling:** Timeouts, retries

---

## 📊 Service Ports

| Service | Port | Purpose |
|---------|------|---------|
| messaging-service | 7004 | REST + WebSocket |
| agent-filter | 7005 | Filtering |
| agent-runtime | 7006 | Agent execution |
| **llm-proxy** | **7007** | **LLM gateway** ✨ |
| **memory-orchestrator** | **7008** | **Memory API** ✨ |
| **toolcore** | **7009** | **Tool execution** ✨ |
| router | 8000 | Event routing |

---

## 🎓 What You'll Learn

### Technologies:

- LLM API integration (OpenAI, DeepSeek)
- Vector embeddings + similarity search
- Tool execution patterns
- Provider abstraction
- Cost tracking
- Rate limiting

### Architecture:

- Gateway pattern (LLM Proxy)
- Orchestrator pattern (Memory)
- Registry pattern (Toolcore)
- Multi-provider routing
- Graceful degradation

---

## 🐛 Expected Challenges

### LLM Proxy:

- API key management
- Rate limits from providers
- Cost control
- Streaming support (Phase 3.5)

**Mitigation:**

- Environment variables for keys
- In-memory rate limiting
- Usage logging
- Streaming as TODO
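
The "in-memory rate limiting" mitigation can start as a per-agent token bucket held in a plain dict. A sketch under that assumption (class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Per-agent token bucket: refills `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per agent; lost on restart, which is acceptable for v1.
buckets: dict[str, TokenBucket] = {}

def check_rate(agent_id: str, rate: float = 1.0, capacity: float = 5) -> bool:
    bucket = buckets.setdefault(agent_id, TokenBucket(rate, capacity))
    return bucket.allow()
```

Because state lives in process memory, this only holds for a single proxy instance; a shared store would be needed once the proxy scales out.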

### Memory Orchestrator:

- Vector search performance
- Embedding generation latency
- Memory indexing pipeline
- Relevance tuning

**Mitigation:**

- Simple cosine similarity first
- Async embedding generation
- Background indexing jobs
- A/B testing for relevance

### Toolcore:

- Tool permission model
- Execution sandboxing
- Error handling
- Tool discovery

**Mitigation:**

- Config-based permissions in v1
- HTTP executor with timeouts
- Comprehensive error types
- Static registry → DB later

---

## 🔜 After Phase 3

### Phase 3.5 (Optional Enhancements):

- Streaming LLM responses
- Advanced memory strategies
- Tool composition
- Agent-to-agent communication

### Phase 4 (Next Major):

- Usage & Billing system
- Security (PDP/PEP)
- Advanced monitoring
- Agent marketplace

---

## ✅ Checklist Before Starting

### Prerequisites:

- ✅ Phase 2 complete and tested
- ✅ NATS running
- ✅ PostgreSQL running
- ✅ Docker Compose working
- ✅ OpenAI API key (optional; a local LLM works too)

### Recommended:

- Local LLM setup (Ollama/vLLM) for testing
- Vector DB exploration (pgvector extension)
- Review existing tools in your stack

---

## 🎉 Success Looks Like

**After Phase 3:**

- ✅ Agent Sofia uses a real GPT-4 model (not a mock)
- ✅ Agent remembers past conversations (RAG)
- ✅ Agent can list projects (tool execution)
- ✅ All flows < 5s latency
- ✅ Usage tracked per agent
- ✅ Production ready

**Example Flow:**

```
User: "Sofia, what's new in project X?"
  ↓
agent-runtime:
  1. Query memory (past discussions about project X)
  2. Call tool: projects.list(microdao_id)
  3. Build prompt with context + tool results
  4. Call LLM Proxy (GPT-4)
  5. Post reply
  ↓
Sofia: "Project X has 3 new tasks:
  1. Finish Phase 2 testing
  2. Start Phase 3 LLM integration
  3. Update the documentation
  The last update was yesterday."
```
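
The numbered steps in the flow above map onto a simple orchestration loop inside agent-runtime. A sketch with stubbed service clients; `memory`, `tools`, and `llm` stand in for HTTP clients to the three Phase 3 services, and every interface here is an assumption:

```python
def handle_message(user_msg, agent_id, microdao_id, memory, tools, llm):
    """Orchestrate one reply: memory → tool → prompt → LLM.

    The client interfaces (`memory.query`, `tools.call`, `llm.complete`)
    are hypothetical stand-ins for the real service APIs.
    """
    # 1. Query memory for relevant past context
    context = memory.query(agent_id=agent_id, query=user_msg, limit=5)
    # 2. Call a tool for live data
    projects = tools.call("projects.list", agent_id, {"microdao_id": microdao_id})
    # 3. Build the prompt from context + tool results
    prompt = [
        {"role": "system", "content": f"Context: {context}\nProjects: {projects}"},
        {"role": "user", "content": user_msg},
    ]
    # 4. Call the LLM Proxy
    reply = llm.complete(model="gpt-4.1-mini", messages=prompt,
                         metadata={"agent_id": agent_id})
    # 5. Return the reply for posting back to the channel
    return reply
```

Keeping the three dependencies injected like this also makes the E2E acceptance test easy: swap in stubs and assert the reply path end to end.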

---

## 📞 Next Actions

### This Week:

1. ✅ Review PHASE3_MASTER_TASK.md
2. ✅ Decide: Cursor AI or manual
3. ✅ Set up an OpenAI API key (or a local LLM)
4. ✅ Review tool requirements

### Next Week:

1. 🔜 Start LLM Proxy implementation
2. 🔜 Test with 2 providers
3. 🔜 Integrate with agent-runtime

---

**Status:** 📋 ALL SPECS READY

**Version:** 1.0.0

**Last Updated:** 2025-11-24

**READY TO BUILD PHASE 3!** 🚀