# LLM Proxy Service

**Port:** 7007
**Purpose:** Multi-provider LLM gateway for DAARION agents
## Features

✅ **Multi-provider support:**
- OpenAI (GPT-4, GPT-4-turbo, etc.)
- DeepSeek (DeepSeek-R1)
- Local LLMs (Ollama, vLLM, llama.cpp)

✅ **Model routing:**
- Logical model names → physical provider models
- Config-driven routing (`config.yaml`)

✅ **Usage tracking:**
- Token counting
- Latency monitoring
- Cost estimation
- Per-agent/microDAO tracking

✅ **Rate limiting:**
- Per-agent limits (10 req/min default)
- In-memory (Phase 3), Redis-backed (Phase 4)

✅ **Security:**
- Internal-only API (`X-Internal-Secret` header)
- API key management via env vars
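The logical → physical routing above amounts to a table lookup. A minimal sketch, assuming hypothetical names (`ModelRoute`, `MODEL_TABLE`) that mirror the `models` section of `config.yaml` rather than the service's actual internals:

```python
from dataclasses import dataclass

@dataclass
class ModelRoute:
    provider: str       # e.g. "openai" or "local"
    physical_name: str  # the model name the provider actually accepts

# Illustrative table mirroring config.yaml's models section (assumed, not the real code)
MODEL_TABLE = {
    "gpt-4.1-mini": ModelRoute("openai", "gpt-4-1106-preview"),
    "dagi-local-8b": ModelRoute("local", "qwen2.5:8b"),
}

def resolve_model(logical_name: str) -> ModelRoute:
    """Map a logical model name to its provider and physical model name."""
    try:
        return MODEL_TABLE[logical_name]
    except KeyError:
        raise ValueError(f"Unknown model: {logical_name}")
```

Keeping callers on logical names means a provider swap is a one-line `config.yaml` change, with no agent-side edits.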
## API

### POST /internal/llm/proxy

Request:

```json
{
  "model": "gpt-4.1-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "metadata": {
    "agent_id": "agent:sofia",
    "microdao_id": "microdao:7",
    "channel_id": "channel-uuid"
  }
}
```

Response:

```json
{
  "content": "Hello! How can I help you today?",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  },
  "provider": "openai",
  "model_resolved": "gpt-4-1106-preview",
  "latency_ms": 1234.5
}
```
### GET /internal/llm/models

List available models:

```json
{
  "models": [
    {
      "name": "gpt-4.1-mini",
      "provider": "openai",
      "physical_name": "gpt-4-1106-preview",
      "max_tokens": 4096
    },
    ...
  ]
}
```
### GET /internal/llm/usage?agent_id=agent:sofia

Get usage statistics:

```json
{
  "total_requests": 42,
  "total_tokens": 12345,
  "avg_latency_ms": 987.6,
  "success_rate": 0.98
}
```
## Configuration

Edit `config.yaml`:

```yaml
providers:
  openai:
    base_url: "https://api.openai.com/v1"
    api_key_env: "OPENAI_API_KEY"
  local:
    base_url: "http://localhost:11434"

models:
  gpt-4.1-mini:
    provider: "openai"
    physical_name: "gpt-4-1106-preview"
    cost_per_1k_prompt: 0.01
    cost_per_1k_completion: 0.03
```
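Given the per-1k rates above, cost estimation presumably reduces to a pro-rated sum. A minimal sketch (the function name is illustrative, not the service's API):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  cost_per_1k_prompt: float, cost_per_1k_completion: float) -> float:
    """Estimate request cost in USD from per-1k-token rates."""
    return (prompt_tokens / 1000) * cost_per_1k_prompt \
         + (completion_tokens / 1000) * cost_per_1k_completion
```

With the `gpt-4.1-mini` rates above and the sample usage from the API section (25 prompt / 10 completion tokens), this comes to roughly $0.00055.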
## Environment Variables

```bash
OPENAI_API_KEY=sk-...              # OpenAI API key
DEEPSEEK_API_KEY=sk-...            # DeepSeek API key
LLM_PROXY_SECRET=dev-secret-token  # Internal auth token
```
## Setup

### Local Development

```bash
cd services/llm-proxy

# Install dependencies
pip install -r requirements.txt

# Set API keys
export OPENAI_API_KEY="sk-..."

# Run
python main.py
```

### Docker

```bash
docker build -t llm-proxy .
docker run -p 7007:7007 \
  -e OPENAI_API_KEY="sk-..." \
  llm-proxy
```

### With docker-compose

```bash
docker-compose -f docker-compose.phase3.yml up llm-proxy
```
## Testing

### Test OpenAI

```bash
curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Say hello!"}
    ],
    "metadata": {
      "agent_id": "agent:test"
    }
  }'
```

### Test Local LLM

```bash
# Start Ollama
ollama serve
ollama pull qwen2.5:8b

# Test
curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "dagi-local-8b",
    "messages": [
      {"role": "user", "content": "Test"}
    ]
  }'
```
## Adding New Providers

1. Create `providers/my_provider.py`:

```python
class MyProvider:
    def __init__(self, config: ProviderConfig):
        self.config = config

    async def chat(self, messages, model_name, **kwargs) -> LLMResponse:
        # Implement provider logic
        ...
```

2. Register in `config.yaml`:

```yaml
providers:
  my_provider:
    base_url: "https://api.myprovider.com"
    api_key_env: "MY_PROVIDER_KEY"

models:
  my-model:
    provider: "my_provider"
    physical_name: "my-model-v1"
```

3. Initialize in `main.py`:

```python
from providers.my_provider import MyProvider

providers["my_provider"] = MyProvider(provider_config)
```
## Integration with agent-runtime

In agent-runtime:

```python
import httpx

async def call_llm(agent_blueprint, messages):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://llm-proxy:7007/internal/llm/proxy",
            headers={"X-Internal-Secret": "dev-secret-token"},
            json={
                "model": agent_blueprint.llm_model,
                "messages": messages,
                "metadata": {
                    "agent_id": agent_blueprint.id,
                    "microdao_id": agent_blueprint.microdao_id
                }
            }
        )
        return response.json()
```
## Roadmap

**Phase 3 (Current):**
- ✅ Multi-provider support
- ✅ Basic rate limiting
- ✅ Usage logging
- ✅ OpenAI + DeepSeek + Local

**Phase 3.5:**
- 🔜 Streaming responses
- 🔜 Response caching
- 🔜 Function calling support
- 🔜 Redis-backed rate limiting

**Phase 4:**
- 🔜 Database-backed usage logs
- 🔜 Cost analytics
- 🔜 Billing integration
- 🔜 Advanced routing (fallbacks, load balancing)
## Troubleshooting

**Provider not working?**

```bash
# Check API key
docker logs llm-proxy | grep "api_key"

# Test the provider directly
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

**Rate limit issues?**

```bash
# Check current usage for the agent
curl "http://localhost:7007/internal/llm/usage?agent_id=agent:sofia" \
  -H "X-Internal-Secret: dev-secret-token"
```

**Local LLM not responding?**

```bash
# Check Ollama
curl http://localhost:11434/api/version

# Check logs
docker logs llm-proxy | grep "local"
```
## Architecture

```
agent-runtime
    ↓
POST /internal/llm/proxy
    ↓
llm-proxy:
├─ Rate limiter (check agent quota)
├─ Model router (logical → physical)
├─ Provider selector (OpenAI/DeepSeek/Local)
└─ Usage tracker (log tokens, cost, latency)
    ↓
[OpenAI API | DeepSeek API | Local Ollama]
    ↓
Response → agent-runtime
```
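The in-memory rate limiter in the diagram could be as simple as a per-agent sliding window. A sketch enforcing the 10 req/min default from the Features section (class name and internals are illustrative, not the service's code):

```python
import time
from collections import defaultdict, deque
from typing import Optional

class InMemoryRateLimiter:
    """Per-agent sliding-window limiter: at most `limit` requests per `window_s` seconds."""

    def __init__(self, limit: int = 10, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits: dict = defaultdict(deque)  # agent_id -> timestamps of recent requests

    def allow(self, agent_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[agent_id]
        # Drop timestamps that have fallen out of the window
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # quota exhausted -> the proxy would return 429
        q.append(now)
        return True

limiter = InMemoryRateLimiter(limit=2, window_s=60.0)
print(limiter.allow("agent:test", now=0.0))  # True
print(limiter.allow("agent:test", now=1.0))  # True
print(limiter.allow("agent:test", now=2.0))  # False
```

Being process-local, this state is lost on restart and is not shared across replicas, which is why the roadmap moves to Redis-backed limiting in Phase 4.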
## License

Internal DAARION service

---

**Status:** ✅ Phase 3 Ready
**Version:** 1.0.0
**Last Updated:** 2025-11-24