# 🚀 PHASE 3 READY — LLM Proxy + Memory + Tools

**Status:** 📋 Ready to implement  
**Dependencies:** Phase 2 complete ✅  
**Estimated Time:** 6-8 weeks  
**Priority:** High

---

## 🎯 Goal

Зробити агентів DAARION по-справжньому розумними:
- **LLM Proxy** — єдина точка для всіх LLM запитів (OpenAI, DeepSeek, Local)
- **Memory Orchestrator** — єдиний API для short/mid/long-term памʼяті
- **Toolcore** — реєстр інструментів + безпечне виконання

**Phase 3 = Infrastructure for Agent Intelligence**

---

## 📦 What Will Be Built

### 1. LLM Proxy Service
**Port:** 7007  
**Purpose:** Unified LLM gateway

**Features:**
- ✅ Multi-provider support (OpenAI, DeepSeek, Local)
- ✅ Model routing (logical → physical models)
- ✅ Usage logging (tokens, latency per agent)
- ✅ Rate limiting per agent
- ✅ Cost tracking hooks

**API:**
```http
POST /internal/llm/proxy
{
  "model": "gpt-4.1-mini",
  "messages": [...],
  "metadata": { "agent_id": "...", "microdao_id": "..." }
}
```

**Deliverables:** 10 files
- `main.py`, `models.py`, `router.py`
- `providers/` (OpenAI, DeepSeek, Local)
- `config.yaml`, `Dockerfile`, `README.md`

---

### 2. Memory Orchestrator Service
**Port:** 7008  
**Purpose:** Unified memory API

**Features:**
- ✅ Short-term memory (channel context)
- ✅ Mid-term memory (agent RAG)
- ✅ Long-term memory (knowledge base)
- ✅ Vector search (embeddings)
- ✅ Memory indexing pipeline

**API:**
```http
POST /internal/agent-memory/query
{
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "query": "What were recent changes?",
  "limit": 5
}

POST /internal/agent-memory/store
{
  "agent_id": "...",
  "content": { "user_message": "...", "agent_reply": "..." }
}
```

**Deliverables:** 9 files
- `main.py`, `models.py`, `router.py`
- `backends/` (PostgreSQL, Vector Store, KB)
- `embedding_client.py`, `config.yaml`, `README.md`

---

### 3. Toolcore Service
**Port:** 7009  
**Purpose:** Tool registry + execution

**Features:**
- ✅ Tool registry (config-based → DB-backed later)
- ✅ Permission checks (agent → tool mapping)
- ✅ HTTP executor (call external services)
- ✅ Python executor (optional, for internal functions)
- ✅ Error handling + timeouts

**API:**
```http
GET /internal/tools
→ List available tools

POST /internal/tools/call
{
  "tool_id": "projects.list",
  "agent_id": "agent:sofia",
  "args": { "microdao_id": "microdao:7" }
}
```

**Deliverables:** 8 files
- `main.py`, `models.py`, `registry.py`
- `executors/` (HTTP, Python)
- `config.yaml`, `Dockerfile`, `README.md`

---

## 🔄 Updated Architecture

### Before (Phase 2):
```
agent-runtime:
  - Mock LLM responses
  - Optional memory
  - No tools
```

### After (Phase 3):
```
agent-runtime:
  ↓
  ├─ LLM Proxy → [OpenAI | DeepSeek | Local]
  ├─ Memory Orchestrator → [Vector DB | PostgreSQL]
  └─ Toolcore → [projects.list | task.create | ...]
```

---

## 🎯 Acceptance Criteria

### LLM Proxy:
- ✅ 2+ providers working (e.g., OpenAI + Local stub)
- ✅ Model routing from config
- ✅ Usage logging per agent
- ✅ Health checks pass

### Memory Orchestrator:
- ✅ Query returns relevant memories
- ✅ Store saves new memories
- ✅ Vector search works (simple cosine)
- ✅ agent-runtime integration

### Toolcore:
- ✅ Tool registry loaded from config
- ✅ 1+ tool working (e.g., projects.list)
- ✅ Permission checks work
- ✅ HTTP executor functional

### E2E:
- ✅ Agent uses real LLM (not mock)
- ✅ Agent uses memory (RAG)
- ✅ Agent can call tools
- ✅ Full flow: User → Agent (with tool) → Reply

---

## 📅 Timeline

| Week | Focus | Deliverables |
|------|-------|--------------|
| 1-2 | LLM Proxy | Service + 2 providers |
| 3-4 | Memory Orchestrator | Service + vector search |
| 5-6 | Toolcore | Service + 1 tool |
| 7 | Integration | Update agent-runtime |
| 8 | Testing | E2E + optimization |

**Total:** 8 weeks (6-8 weeks realistic)

---

## 🚀 How to Start

### Option 1: Cursor AI

```bash
# Copy Phase 3 master task
cat docs/tasks/PHASE3_MASTER_TASK.md | pbcopy

# Paste into Cursor AI
# Wait for implementation (~1-2 hours per service)
```

### Option 2: Manual

```bash
# 1. Start with LLM Proxy
mkdir -p services/llm-proxy
cd services/llm-proxy
# Follow PHASE3_MASTER_TASK.md

# 2. Then Memory Orchestrator
mkdir -p services/memory-orchestrator
# ...

# 3. Then Toolcore
mkdir -p services/toolcore
# ...
```

---

## 🔗 Key Files

### Specification:
- [PHASE3_MASTER_TASK.md](docs/tasks/PHASE3_MASTER_TASK.md) ⭐ **Main task**
- [PHASE3_ROADMAP.md](docs/tasks/PHASE3_ROADMAP.md) — Detailed planning

### Phase 2 (Complete):
- [PHASE2_COMPLETE.md](PHASE2_COMPLETE.md) — What's already built
- [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)

---

## 💡 Key Concepts

### LLM Proxy:
- **Logical models** (gpt-4.1-mini) → **Physical providers** (OpenAI API)
- Routing via config
- Cost tracking per agent
- Graceful fallbacks

### Memory Orchestrator:
- **Short-term:** Recent channel messages
- **Mid-term:** RAG embeddings (conversations, tasks)
- **Long-term:** Knowledge base (docs, roadmaps)
- Vector search for relevance

### Toolcore:
- **Static registry** (config.yaml) → **Dynamic registry** (DB) later
- **HTTP executor:** Call external services
- **Permission model:** Agent → Tool allowlist
- **Error handling:** Timeouts, retries

---

## 📊 Service Ports

| Service | Port | Purpose |
|---------|------|---------|
| messaging-service | 7004 | REST + WebSocket |
| agent-filter | 7005 | Filtering |
| agent-runtime | 7006 | Agent execution |
| **llm-proxy** | **7007** | **LLM gateway** ✨ |
| **memory-orchestrator** | **7008** | **Memory API** ✨ |
| **toolcore** | **7009** | **Tool execution** ✨ |
| router | 8000 | Event routing |

---

## 🎓 What You'll Learn

### Technologies:
- LLM API integration (OpenAI, DeepSeek)
- Vector embeddings + similarity search
- Tool execution patterns
- Provider abstraction
- Cost tracking
- Rate limiting

### Architecture:
- Gateway pattern (LLM Proxy)
- Orchestrator pattern (Memory)
- Registry pattern (Toolcore)
- Multi-provider routing
- Graceful degradation

---

## 🐛 Expected Challenges

### LLM Proxy:
- API key management
- Rate limits from providers
- Cost control
- Streaming support (Phase 3.5)

**Mitigation:**
- Environment variables for keys
- In-memory rate limiting
- Usage logging
- Streaming as TODO

### Memory Orchestrator:
- Vector search performance
- Embedding generation latency
- Memory indexing pipeline
- Relevance tuning

**Mitigation:**
- Simple cosine similarity first
- Async embedding generation
- Background indexing jobs
- A/B testing for relevance

### Toolcore:
- Tool permission model
- Execution sandboxing
- Error handling
- Tool discovery

**Mitigation:**
- Config-based permissions v1
- HTTP executor with timeouts
- Comprehensive error types
- Static registry → DB later

---

## 🔜 After Phase 3

### Phase 3.5 (Optional Enhancements):
- Streaming LLM responses
- Advanced memory strategies
- Tool composition
- Agent-to-agent communication

### Phase 4 (Next Major):
- Usage & Billing system
- Security (PDP/PEP)
- Advanced monitoring
- Agent marketplace

---

## ✅ Checklist Before Starting

### Prerequisites:
- ✅ Phase 2 complete and tested
- ✅ NATS running
- ✅ PostgreSQL running
- ✅ Docker Compose working
- ✅ OpenAI API key (optional, can use local)

### Recommended:
- Local LLM setup (Ollama/vLLM) for testing
- Vector DB exploration (pgvector extension)
- Review existing tools in your stack

---

## 🎉 Success Looks Like

**After Phase 3:**
- ✅ Agent Sofia uses real GPT-4 (not mock)
- ✅ Agent remembers past conversations (RAG)
- ✅ Agent can list projects (tool execution)
- ✅ All flows < 5s latency
- ✅ Usage tracked per agent
- ✅ Production ready

**Example Flow:**
```
User: "Sofia, що нового в проєкті X?"
    ↓
agent-runtime:
  1. Query memory (past discussions about project X)
  2. Call tool: projects.list(microdao_id)
  3. Build prompt with context + tool results
  4. Call LLM Proxy (GPT-4)
  5. Post reply
    ↓
Sofia: "В проєкті X є 3 нові задачі:
1. Завершити Phase 2 тестування
2. Почати Phase 3 LLM integration
3. Оновити документацію
Останнє оновлення було вчора."
```

---

## 📞 Next Actions

### This Week:
1. ✅ Review PHASE3_MASTER_TASK.md
2. ✅ Decide: Cursor AI or manual
3. ✅ Set up OpenAI API key (or local LLM)
4. ✅ Review tool requirements

### Next Week:
1. 🔜 Start LLM Proxy implementation
2. 🔜 Test with 2 providers
3. 🔜 Integrate with agent-runtime

---

**Status:** 📋 ALL SPECS READY  
**Version:** 1.0.0  
**Last Updated:** 2025-11-24

**READY TO BUILD PHASE 3!** 🚀