# LLM Proxy Service

**Port:** 7007

**Purpose:** Multi-provider LLM gateway for DAARION agents
## Features

✅ **Multi-provider support:**
- OpenAI (GPT-4, GPT-4-turbo, etc.)
- DeepSeek (DeepSeek-R1)
- Local LLMs (Ollama, vLLM, llama.cpp)

✅ **Model routing:**
- Logical model names → physical provider models
- Config-driven routing (`config.yaml`)

✅ **Usage tracking:**
- Token counting
- Latency monitoring
- Cost estimation
- Per-agent/microDAO tracking

✅ **Rate limiting:**
- Per-agent limits (10 req/min by default)
- In-memory (Phase 3), Redis-backed (Phase 4)

✅ **Security:**
- Internal-only API (`X-Internal-Secret` header)
- API key management via environment variables
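The per-agent rate limiter can be pictured as an in-memory sliding window over request timestamps. The sketch below is illustrative only (class and method names are assumptions, not the service's actual internals):

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per agent."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # agent_id -> timestamps

    def allow(self, agent_id: str) -> bool:
        now = time.monotonic()
        hits = self._hits[agent_id]
        # Evict timestamps that fell outside the window
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A Redis-backed variant (Phase 4) would replace the in-process deque with a shared sorted set so limits hold across replicas.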
## API

### POST /internal/llm/proxy

**Request:**
```json
{
  "model": "gpt-4.1-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "metadata": {
    "agent_id": "agent:sofia",
    "microdao_id": "microdao:7",
    "channel_id": "channel-uuid"
  }
}
```
**Response:**
```json
{
  "content": "Hello! How can I help you today?",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  },
  "provider": "openai",
  "model_resolved": "gpt-4-1106-preview",
  "latency_ms": 1234.5
}
```
### GET /internal/llm/models

List available models:
```json
{
  "models": [
    {
      "name": "gpt-4.1-mini",
      "provider": "openai",
      "physical_name": "gpt-4-1106-preview",
      "max_tokens": 4096
    },
    ...
  ]
}
```
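Resolving a logical name to its provider and physical model is a plain lookup over the `models` section of `config.yaml`. A minimal sketch (the dict mirrors the example entry; the function name is illustrative):

```python
# Mirrors the `models:` section of config.yaml (one entry shown)
MODELS = {
    "gpt-4.1-mini": {
        "provider": "openai",
        "physical_name": "gpt-4-1106-preview",
        "max_tokens": 4096,
    },
}


def resolve_model(logical_name: str):
    """Map a logical model name to its (provider, physical model) pair."""
    entry = MODELS.get(logical_name)
    if entry is None:
        raise KeyError(f"unknown model: {logical_name}")
    return entry["provider"], entry["physical_name"]
```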
### GET /internal/llm/usage?agent_id=agent:sofia

Get usage statistics:
```json
{
  "total_requests": 42,
  "total_tokens": 12345,
  "avg_latency_ms": 987.6,
  "success_rate": 0.98
}
```
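These statistics can be derived by aggregating per-request log records. A hedged sketch — the record fields are assumptions based on the response shape above, not the service's actual log schema:

```python
def summarize_usage(records: list) -> dict:
    """Aggregate per-request logs into the stats shape of /internal/llm/usage."""
    total = len(records)
    if total == 0:
        return {"total_requests": 0, "total_tokens": 0,
                "avg_latency_ms": 0.0, "success_rate": 0.0}
    return {
        "total_requests": total,
        "total_tokens": sum(r["total_tokens"] for r in records),
        "avg_latency_ms": sum(r["latency_ms"] for r in records) / total,
        "success_rate": sum(1 for r in records if r["success"]) / total,
    }
```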
## Configuration

Edit `config.yaml`:

```yaml
providers:
  openai:
    base_url: "https://api.openai.com/v1"
    api_key_env: "OPENAI_API_KEY"

  local:
    base_url: "http://localhost:11434"

models:
  gpt-4.1-mini:
    provider: "openai"
    physical_name: "gpt-4-1106-preview"
    cost_per_1k_prompt: 0.01
    cost_per_1k_completion: 0.03
```
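With the per-1k costs configured above, a request's estimated cost is `prompt_tokens/1000 × cost_per_1k_prompt + completion_tokens/1000 × cost_per_1k_completion`. A minimal sketch (the function name is illustrative):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  cost_per_1k_prompt: float, cost_per_1k_completion: float) -> float:
    """Estimate USD cost of one request from its token counts."""
    return (prompt_tokens / 1000 * cost_per_1k_prompt
            + completion_tokens / 1000 * cost_per_1k_completion)


# Example response above: 25 prompt + 10 completion tokens on gpt-4.1-mini
cost = estimate_cost(25, 10, 0.01, 0.03)  # 0.00025 + 0.0003 = 0.00055
```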
## Environment Variables

```bash
OPENAI_API_KEY=sk-...              # OpenAI API key
DEEPSEEK_API_KEY=sk-...            # DeepSeek API key
LLM_PROXY_SECRET=dev-secret-token  # Internal auth token
```
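Requests without a matching `X-Internal-Secret` header are rejected; the check amounts to a constant-time comparison against `LLM_PROXY_SECRET`. A sketch (the helper name is an assumption):

```python
import hmac
import os
from typing import Optional


def is_authorized(header_value: Optional[str]) -> bool:
    """Constant-time check of the X-Internal-Secret header against the token."""
    secret = os.environ.get("LLM_PROXY_SECRET", "")
    if not header_value or not secret:
        return False
    # hmac.compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(header_value, secret)
```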
## Setup

### Local Development

```bash
cd services/llm-proxy

# Install dependencies
pip install -r requirements.txt

# Set API keys
export OPENAI_API_KEY="sk-..."

# Run
python main.py
```
### Docker

```bash
docker build -t llm-proxy .
docker run -p 7007:7007 \
  -e OPENAI_API_KEY="sk-..." \
  llm-proxy
```
### With docker-compose

```bash
docker-compose -f docker-compose.phase3.yml up llm-proxy
```
## Testing

### Test OpenAI

```bash
curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Say hello!"}
    ],
    "metadata": {
      "agent_id": "agent:test"
    }
  }'
```
### Test Local LLM

```bash
# Start Ollama
ollama serve
ollama pull qwen2.5:8b

# Test
curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "dagi-local-8b",
    "messages": [
      {"role": "user", "content": "Test"}
    ]
  }'
```
## Adding New Providers

1. Create `providers/my_provider.py`:

```python
# ProviderConfig and LLMResponse are the service's shared types
# (see the existing providers for their definitions)
class MyProvider:
    def __init__(self, config: ProviderConfig):
        self.config = config

    async def chat(self, messages, model_name, **kwargs) -> LLMResponse:
        # Implement provider logic: call the provider's API and return
        # an LLMResponse with content, usage, and latency
        ...
```
2. Register in `config.yaml`:

```yaml
providers:
  my_provider:
    base_url: "https://api.myprovider.com"
    api_key_env: "MY_PROVIDER_KEY"

models:
  my-model:
    provider: "my_provider"
    physical_name: "my-model-v1"
```
3. Initialize in `main.py`:

```python
from providers.my_provider import MyProvider

providers["my_provider"] = MyProvider(provider_config)
```
## Integration with agent-runtime

In `agent-runtime`:

```python
import httpx


async def call_llm(agent_blueprint, messages):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://llm-proxy:7007/internal/llm/proxy",
            headers={"X-Internal-Secret": "dev-secret-token"},
            json={
                "model": agent_blueprint.llm_model,
                "messages": messages,
                "metadata": {
                    "agent_id": agent_blueprint.id,
                    "microdao_id": agent_blueprint.microdao_id
                }
            }
        )
        return response.json()
```
## Roadmap

### Phase 3 (current)
- ✅ Multi-provider support
- ✅ Basic rate limiting
- ✅ Usage logging
- ✅ OpenAI + DeepSeek + Local

### Phase 3.5
- 🔜 Streaming responses
- 🔜 Response caching
- 🔜 Function calling support
- 🔜 Redis-backed rate limiting

### Phase 4
- 🔜 Database-backed usage logs
- 🔜 Cost analytics
- 🔜 Billing integration
- 🔜 Advanced routing (fallbacks, load balancing)
## Troubleshooting
|
|
|
|
**Provider not working?**
|
|
```bash
|
|
# Check API key
|
|
docker logs llm-proxy | grep "api_key"
|
|
|
|
# Test directly
|
|
curl https://api.openai.com/v1/models \
|
|
-H "Authorization: Bearer $OPENAI_API_KEY"
|
|
```
|
|
|
|
**Rate limit issues?**
|
|
```bash
|
|
# Check current limits
|
|
curl http://localhost:7007/internal/llm/usage?agent_id=agent:sofia \
|
|
-H "X-Internal-Secret: dev-secret-token"
|
|
```
|
|
|
|
**Local LLM not responding?**
|
|
```bash
|
|
# Check Ollama
|
|
curl http://localhost:11434/api/version
|
|
|
|
# Check logs
|
|
docker logs llm-proxy | grep "local"
|
|
```
|
## Architecture

```
agent-runtime
      ↓
POST /internal/llm/proxy
      ↓
llm-proxy:
  ├─ Rate limiter (check agent quota)
  ├─ Model router (logical → physical)
  ├─ Provider selector (OpenAI/DeepSeek/Local)
  └─ Usage tracker (log tokens, cost, latency)
      ↓
[OpenAI API | DeepSeek API | Local Ollama]
      ↓
Response → agent-runtime
```
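The stages in the diagram compose into a single request path. A simplified synchronous sketch with stub hooks — every name here is illustrative (the real service is async and structured differently):

```python
import time


def handle_proxy(request: dict, models: dict, providers: dict,
                 allow, usage_log: list) -> dict:
    """Illustrative pipeline: rate-limit -> route -> call provider -> track usage."""
    agent_id = request["metadata"]["agent_id"]
    if not allow(agent_id):                   # rate limiter
        return {"error": "rate_limited"}
    entry = models[request["model"]]          # model router (logical -> physical)
    call = providers[entry["provider"]]       # provider selector
    start = time.monotonic()
    content, usage = call(entry["physical_name"], request["messages"])
    latency_ms = (time.monotonic() - start) * 1000
    usage_log.append({"agent_id": agent_id, **usage, "latency_ms": latency_ms})
    return {
        "content": content,
        "usage": usage,
        "provider": entry["provider"],
        "model_resolved": entry["physical_name"],
        "latency_ms": latency_ms,
    }
```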
## License

Internal DAARION service

---

**Status:** ✅ Phase 3 Ready

**Version:** 1.0.0

**Last Updated:** 2025-11-24