# LLM Proxy Service
**Port:** 7007
**Purpose:** Multi-provider LLM gateway for DAARION agents
## Features
**Multi-provider support:**
- OpenAI (GPT-4, GPT-4-turbo, etc.)
- DeepSeek (DeepSeek-R1)
- Local LLMs (Ollama, vLLM, llama.cpp)
**Model routing:**
- Logical model names → Physical provider models
- Config-driven routing (`config.yaml`)
**Usage tracking:**
- Token counting
- Latency monitoring
- Cost estimation
- Per-agent/microDAO tracking
**Rate limiting:**
- Per-agent limits (10 req/min default)
- In-memory in Phase 3; Redis-backed storage is planned for a later phase (see Roadmap)
**Security:**
- Internal-only API (`X-Internal-Secret` header)
- API key management via env vars
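The per-agent limit described above could be implemented in memory roughly as follows. This is a minimal sketch, not the service's actual code: the class name `InMemoryRateLimiter`, the sliding-window strategy, and the injectable `clock` parameter are all assumptions made for illustration. Only the default of 10 requests per minute comes from this README.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class InMemoryRateLimiter:
    """Hypothetical sliding-window limiter: `limit` requests per `window_s` seconds per agent."""

    def __init__(self, limit: int = 10, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be exercised without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, agent_id: str) -> bool:
        """Return True and record the request if the agent is under its quota."""
        now = self.clock()
        hits = self._hits[agent_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the state lives in a process-local dictionary, each replica of the service would count requests separately; a shared store such as Redis removes that limitation.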
## API
### POST /internal/llm/proxy
**Request:**
```json
{
  "model": "gpt-4.1-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "metadata": {
    "agent_id": "agent:sofia",
    "microdao_id": "microdao:7",
    "channel_id": "channel-uuid"
  }
}
```
**Response:**
```json
{
  "content": "Hello! How can I help you today?",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  },
  "provider": "openai",
  "model_resolved": "gpt-4-1106-preview",
  "latency_ms": 1234.5
}
```
### GET /internal/llm/models
List available models:
```json
{
  "models": [
    {
      "name": "gpt-4.1-mini",
      "provider": "openai",
      "physical_name": "gpt-4-1106-preview",
      "max_tokens": 4096
    },
    ...
  ]
}
```
### GET /internal/llm/usage?agent_id=agent:sofia
Get usage statistics:
```json
{
  "total_requests": 42,
  "total_tokens": 12345,
  "avg_latency_ms": 987.6,
  "success_rate": 0.98
}
```
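The four fields in this response can be derived from per-request records. The sketch below shows one way to do that; the `UsageRecord` type and `summarize_usage` function are hypothetical names, and details such as whether failed requests count toward the average latency are assumptions, since this README does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class UsageRecord:
    """Hypothetical per-request log entry."""
    agent_id: str
    total_tokens: int
    latency_ms: float
    success: bool


def summarize_usage(records: Iterable[UsageRecord], agent_id: str) -> Dict[str, float]:
    """Aggregate one agent's records into the shape returned by the usage endpoint."""
    rows = [r for r in records if r.agent_id == agent_id]
    n = len(rows)
    if n == 0:
        # Assumption: an agent with no traffic reports zeros rather than an error.
        return {"total_requests": 0, "total_tokens": 0,
                "avg_latency_ms": 0.0, "success_rate": 0.0}
    return {
        "total_requests": n,
        "total_tokens": sum(r.total_tokens for r in rows),
        "avg_latency_ms": sum(r.latency_ms for r in rows) / n,
        "success_rate": sum(1 for r in rows if r.success) / n,
    }
```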
## Configuration
Edit `config.yaml`:
```yaml
providers:
  openai:
    base_url: "https://api.openai.com/v1"
    api_key_env: "OPENAI_API_KEY"
  local:
    base_url: "http://localhost:11434"
models:
  gpt-4.1-mini:
    provider: "openai"
    physical_name: "gpt-4-1106-preview"
    cost_per_1k_prompt: 0.01
    cost_per_1k_completion: 0.03
```
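To illustrate how a configuration of this shape can drive routing and cost estimation, the sketch below mirrors the YAML above as a plain Python dict (so it runs without a YAML parser). The function names `resolve_model` and `estimate_cost` are hypothetical and not taken from the service's code; the config keys are the ones shown above.

```python
from typing import Any, Dict, Tuple

# The example config.yaml from this section, written as a dict for illustration.
CONFIG: Dict[str, Any] = {
    "providers": {
        "openai": {"base_url": "https://api.openai.com/v1", "api_key_env": "OPENAI_API_KEY"},
        "local": {"base_url": "http://localhost:11434"},
    },
    "models": {
        "gpt-4.1-mini": {
            "provider": "openai",
            "physical_name": "gpt-4-1106-preview",
            "cost_per_1k_prompt": 0.01,
            "cost_per_1k_completion": 0.03,
        },
    },
}


def resolve_model(config: Dict[str, Any], logical_name: str) -> Tuple[str, str]:
    """Map a logical model name to (provider name, physical model name)."""
    try:
        model = config["models"][logical_name]
    except KeyError:
        raise ValueError(f"Unknown model: {logical_name}") from None
    provider = model["provider"]
    if provider not in config["providers"]:
        raise ValueError(f"Model {logical_name} references unknown provider {provider}")
    return provider, model["physical_name"]


def estimate_cost(config: Dict[str, Any], logical_name: str,
                  prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate cost from the per-1k-token rates; models without rates cost 0."""
    model = config["models"][logical_name]
    return (prompt_tokens / 1000 * model.get("cost_per_1k_prompt", 0.0)
            + completion_tokens / 1000 * model.get("cost_per_1k_completion", 0.0))
```

With the example response shown earlier (25 prompt tokens, 10 completion tokens), the estimate is 25/1000 × 0.01 + 10/1000 × 0.03 = 0.00055, in whatever currency unit the rates are expressed.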
## Environment Variables
```bash
OPENAI_API_KEY=sk-... # OpenAI API key
DEEPSEEK_API_KEY=sk-... # DeepSeek API key
LLM_PROXY_SECRET=dev-secret-token # Internal auth token
```
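These variables are typically consumed in two places: validating the `X-Internal-Secret` header and looking up a provider's key through its `api_key_env` setting. The following is a sketch under those assumptions; `is_authorized` and `resolve_api_key` are hypothetical helper names, and the choice to reject all requests when `LLM_PROXY_SECRET` is unset is a design assumption rather than documented behaviour.

```python
import hmac
import os
from typing import Mapping, Optional


def is_authorized(header_value: Optional[str],
                  env: Mapping[str, str] = os.environ) -> bool:
    """Compare the X-Internal-Secret header against LLM_PROXY_SECRET."""
    expected = env.get("LLM_PROXY_SECRET", "")
    if not expected or header_value is None:
        return False  # assumption: fail closed when no secret is configured
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(header_value.encode(), expected.encode())


def resolve_api_key(api_key_env: Optional[str],
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Look up the API key named by a provider's `api_key_env` setting."""
    if not api_key_env:
        return None  # e.g. a local provider that declares no api_key_env
    key = env.get(api_key_env)
    if not key:
        raise RuntimeError(f"Environment variable {api_key_env} is not set")
    return key
```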
## Setup
### Local Development
```bash
cd services/llm-proxy
# Install dependencies
pip install -r requirements.txt
# Set API keys
export OPENAI_API_KEY="sk-..."
# Run
python main.py
```
### Docker
```bash
docker build -t llm-proxy .
docker run -p 7007:7007 \
  -e OPENAI_API_KEY="sk-..." \
  llm-proxy
```
### With docker-compose
```bash
docker-compose -f docker-compose.phase3.yml up llm-proxy
```
## Testing
### Test OpenAI
```bash
curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Say hello!"}
    ],
    "metadata": {
      "agent_id": "agent:test"
    }
  }'
```
### Test Local LLM
```bash
# Start Ollama (`ollama serve` stays in the foreground, so run it in a separate terminal)
ollama serve
ollama pull qwen2.5:8b
# Test
curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "dagi-local-8b",
    "messages": [
      {"role": "user", "content": "Test"}
    ]
  }'
```
## Adding New Providers
1. Create `providers/my_provider.py`:
```python
class MyProvider:
    def __init__(self, config: ProviderConfig):
        self.config = config

    async def chat(self, messages, model_name, **kwargs) -> LLMResponse:
        # Implement provider logic
        ...
```
2. Register in `config.yaml`:
```yaml
providers:
  my_provider:
    base_url: "https://api.myprovider.com"
    api_key_env: "MY_PROVIDER_KEY"
models:
  my-model:
    provider: "my_provider"
    physical_name: "my-model-v1"
```
3. Initialize in `main.py`:
```python
from providers.my_provider import MyProvider
providers["my_provider"] = MyProvider(provider_config)
```
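For reference, a complete but deliberately trivial provider following the interface from step 1 might look like the sketch below. The fields of `ProviderConfig` and `LLMResponse` are assumptions modelled on `config.yaml` and the proxy response; the real definitions in this service may differ. `EchoProvider` is a hypothetical stand-in that calls no external API, which makes it convenient for wiring tests.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProviderConfig:
    """Hypothetical fields, modelled on the provider entries in config.yaml."""
    base_url: str
    api_key_env: Optional[str] = None


@dataclass
class LLMResponse:
    """Hypothetical fields, modelled on the proxy's JSON response."""
    content: str
    prompt_tokens: int
    completion_tokens: int
    provider: str
    model_resolved: str


class EchoProvider:
    """Stand-in provider that echoes the last user message instead of calling an API."""

    def __init__(self, config: ProviderConfig):
        self.config = config

    async def chat(self, messages: List[Dict[str, str]], model_name: str,
                   **kwargs) -> LLMResponse:
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), "")
        # Whitespace-separated word counts stand in for real token counts.
        prompt_tokens = sum(len(m["content"].split()) for m in messages)
        return LLMResponse(
            content=last_user,
            prompt_tokens=prompt_tokens,
            completion_tokens=len(last_user.split()),
            provider="echo",
            model_resolved=model_name,
        )
```

A real provider would replace the body of `chat` with an HTTP call to `self.config.base_url` and map the upstream response into `LLMResponse`.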
## Integration with agent-runtime
In `agent-runtime`:
```python
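import os

import httpx


async def call_llm(agent_blueprint, messages):
    # httpx defaults to a 5 s timeout, which LLM calls can exceed
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "http://llm-proxy:7007/internal/llm/proxy",
            # Assumes LLM_PROXY_SECRET is also set for agent-runtime,
            # rather than hard-coding the development token
            headers={"X-Internal-Secret": os.environ["LLM_PROXY_SECRET"]},
            json={
                "model": agent_blueprint.llm_model,
                "messages": messages,
                "metadata": {
                    "agent_id": agent_blueprint.id,
                    "microdao_id": agent_blueprint.microdao_id,
                },
            },
        )
        # Surface 4xx/5xx responses instead of returning an error body as a result
        response.raise_for_status()
        return response.json()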
```
## Roadmap
### Phase 3 (Current):
- ✅ Multi-provider support
- ✅ Basic rate limiting
- ✅ Usage logging
- ✅ OpenAI + DeepSeek + Local
### Phase 3.5:
- 🔜 Streaming responses
- 🔜 Response caching
- 🔜 Function calling support
- 🔜 Redis-backed rate limiting
### Phase 4:
- 🔜 Database-backed usage logs
- 🔜 Cost analytics
- 🔜 Billing integration
- 🔜 Advanced routing (fallbacks, load balancing)
## Troubleshooting
**Provider not working?**
```bash
# Check API key
docker logs llm-proxy | grep "api_key"
# Test directly
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"
```
**Rate limit issues?**
```bash
# Check the agent's recent usage (URL quoted so the shell does not expand "?")
curl "http://localhost:7007/internal/llm/usage?agent_id=agent:sofia" \
  -H "X-Internal-Secret: dev-secret-token"
```
**Local LLM not responding?**
```bash
# Check Ollama
curl http://localhost:11434/api/version
# Check logs
docker logs llm-proxy | grep "local"
```
## Architecture
```
agent-runtime
    ↓
POST /internal/llm/proxy
    ↓
llm-proxy:
  ├─ Rate limiter (check agent quota)
  ├─ Model router (logical → physical)
  ├─ Provider selector (OpenAI/DeepSeek/Local)
  └─ Usage tracker (log tokens, cost, latency)
    ↓
[OpenAI API | DeepSeek API | Local Ollama]
    ↓
Response → agent-runtime
```
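The order of the four stages in the diagram can be expressed as a single request handler. The sketch below is an illustration only: `handle_proxy_request` and its injected callables (`allow`, `resolve`, `providers`, `usage_log`) are hypothetical, and the `{"error": "rate_limited"}` payload is an assumed error shape, not the service's documented response.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Tuple

Messages = List[Dict[str, str]]
ProviderFn = Callable[[Messages, str], Awaitable[str]]


async def handle_proxy_request(
    body: Dict[str, Any],
    allow: Callable[[str], bool],                 # rate limiter
    resolve: Callable[[str], Tuple[str, str]],    # logical -> (provider, physical)
    providers: Dict[str, ProviderFn],             # provider name -> chat function
    usage_log: List[Dict[str, Any]],              # usage tracker sink
) -> Dict[str, Any]:
    agent_id = body.get("metadata", {}).get("agent_id", "unknown")

    # 1. Rate limiter: reject before any provider work is done.
    if not allow(agent_id):
        return {"error": "rate_limited"}

    # 2. Model router: logical name to provider and physical model.
    provider_name, physical_name = resolve(body["model"])

    # 3. Provider selector: dispatch to the chosen provider and time the call.
    started = time.perf_counter()
    content = await providers[provider_name](body["messages"], physical_name)
    latency_ms = (time.perf_counter() - started) * 1000

    # 4. Usage tracker: record what happened for later aggregation.
    usage_log.append({"agent_id": agent_id, "provider": provider_name,
                      "model": physical_name, "latency_ms": latency_ms})
    return {"content": content, "provider": provider_name,
            "model_resolved": physical_name, "latency_ms": latency_ms}
```

Passing the stages in as parameters keeps each one replaceable, for example swapping the in-memory limiter for a shared one without touching the handler.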
## License
Internal DAARION service

---
**Status:** ✅ Phase 3 Ready
**Version:** 1.0.0
**Last Updated:** 2025-11-24