LLM Proxy Service

Port: 7007
Purpose: Multi-provider LLM gateway for DAARION agents

Features

Multi-provider support:

  • OpenAI (GPT-4, GPT-4-turbo, etc.)
  • DeepSeek (DeepSeek-R1)
  • Local LLMs (Ollama, vLLM, llama.cpp)

Model routing:

  • Logical model names → Physical provider models
  • Config-driven routing (config.yaml)

Usage tracking:

  • Token counting
  • Latency monitoring
  • Cost estimation
  • Per-agent/microDAO tracking

Rate limiting:

  • Per-agent limits (10 req/min default)
  • In-memory in Phase 3; Redis-backed planned for Phase 3.5 (see Roadmap and the sketch below)
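
For intuition, a per-agent in-memory limit can be as simple as a sliding window over request timestamps. The sketch below is illustrative only (class and method names are hypothetical, not the service's actual code):

import time
from collections import defaultdict, deque

class InMemoryRateLimiter:
    """Sliding-window limiter keyed by agent_id (illustrative)."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests: dict[str, deque] = defaultdict(deque)

    def allow(self, agent_id: str) -> bool:
        now = time.monotonic()
        window = self.requests[agent_id]
        # Evict timestamps older than the window
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False
        window.append(now)
        return True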

Security:

  • Internal-only API (X-Internal-Secret header; a verification sketch follows below)
  • API key management via env vars
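
The internal-auth check amounts to comparing the header against LLM_PROXY_SECRET. A minimal sketch, assuming the service is a FastAPI app (the dependency name is hypothetical):

import os
from fastapi import Header, HTTPException

EXPECTED_SECRET = os.environ.get("LLM_PROXY_SECRET", "dev-secret-token")

async def require_internal_secret(x_internal_secret: str = Header(...)) -> None:
    # FastAPI maps the x_internal_secret parameter to the X-Internal-Secret header
    if x_internal_secret != EXPECTED_SECRET:
        raise HTTPException(status_code=403, detail="invalid internal secret")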

API

POST /internal/llm/proxy

Request:

{
  "model": "gpt-4.1-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "metadata": {
    "agent_id": "agent:sofia",
    "microdao_id": "microdao:7",
    "channel_id": "channel-uuid"
  }
}

Response:

{
  "content": "Hello! How can I help you today?",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  },
  "provider": "openai",
  "model_resolved": "gpt-4-1106-preview",
  "latency_ms": 1234.5
}
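
For reference, the request and response bodies map naturally onto pydantic models. The sketch below is inferred from the JSON examples above, not the service's actual schema definitions:

from pydantic import BaseModel

class Message(BaseModel):
    role: str      # "system" | "user" | "assistant"
    content: str

class ProxyRequest(BaseModel):
    model: str                     # logical model name, e.g. "gpt-4.1-mini"
    messages: list[Message]
    max_tokens: int | None = None
    temperature: float | None = None
    metadata: dict[str, str] = {}  # agent_id, microdao_id, channel_id

class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class ProxyResponse(BaseModel):
    content: str
    usage: Usage
    provider: str        # e.g. "openai"
    model_resolved: str  # physical model name
    latency_ms: float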

GET /internal/llm/models

List available models:

{
  "models": [
    {
      "name": "gpt-4.1-mini",
      "provider": "openai",
      "physical_name": "gpt-4-1106-preview",
      "max_tokens": 4096
    },
    ...
  ]
}

GET /internal/llm/usage?agent_id=agent:sofia

Get usage statistics:

{
  "total_requests": 42,
  "total_tokens": 12345,
  "avg_latency_ms": 987.6,
  "success_rate": 0.98
}

Configuration

Edit config.yaml:

providers:
  openai:
    base_url: "https://api.openai.com/v1"
    api_key_env: "OPENAI_API_KEY"
  
  local:
    base_url: "http://localhost:11434"

models:
  gpt-4.1-mini:
    provider: "openai"
    physical_name: "gpt-4-1106-preview"
    cost_per_1k_prompt: 0.01
    cost_per_1k_completion: 0.03
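
At request time, the routing and cost figures above combine roughly as follows. This is a hedged sketch of the lookup and the cost arithmetic (function names are illustrative, not the service's actual code):

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

def resolve_model(logical_name: str) -> tuple[str, str]:
    """Map a logical model name to (provider, physical_name)."""
    entry = config["models"][logical_name]
    return entry["provider"], entry["physical_name"]

def estimate_cost(logical_name: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate USD cost from the per-1k-token rates in config.yaml."""
    entry = config["models"][logical_name]
    return (prompt_tokens / 1000) * entry["cost_per_1k_prompt"] \
        + (completion_tokens / 1000) * entry["cost_per_1k_completion"]

# Example: 25 prompt + 10 completion tokens on gpt-4.1-mini
# → 0.025 × 0.01 + 0.010 × 0.03 = $0.00055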

Environment Variables

OPENAI_API_KEY=sk-...           # OpenAI API key
DEEPSEEK_API_KEY=sk-...         # DeepSeek API key
LLM_PROXY_SECRET=dev-secret-token  # Internal auth token

Setup

Local Development

cd services/llm-proxy

# Install dependencies
pip install -r requirements.txt

# Set API keys
export OPENAI_API_KEY="sk-..."

# Run
python main.py

Docker

docker build -t llm-proxy .
docker run -p 7007:7007 \
  -e OPENAI_API_KEY="sk-..." \
  llm-proxy

With docker-compose

docker-compose -f docker-compose.phase3.yml up llm-proxy

Testing

Test OpenAI

curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Say hello!"}
    ],
    "metadata": {
      "agent_id": "agent:test"
    }
  }'

Test Local LLM

# Start Ollama
ollama serve
ollama pull qwen2.5:8b

# Test
curl -X POST http://localhost:7007/internal/llm/proxy \
  -H "Content-Type: application/json" \
  -H "X-Internal-Secret: dev-secret-token" \
  -d '{
    "model": "dagi-local-8b",
    "messages": [
      {"role": "user", "content": "Test"}
    ]
  }'

Adding New Providers

  1. Create providers/my_provider.py:

class MyProvider:
    def __init__(self, config: ProviderConfig):
        self.config = config

    async def chat(self, messages, model_name, **kwargs) -> LLMResponse:
        # Implement provider logic
        ...

  2. Register in config.yaml:

providers:
  my_provider:
    base_url: "https://api.myprovider.com"
    api_key_env: "MY_PROVIDER_KEY"

models:
  my-model:
    provider: "my_provider"
    physical_name: "my-model-v1"

  3. Initialize in main.py:

from providers.my_provider import MyProvider

providers["my_provider"] = MyProvider(provider_config)

Integration with agent-runtime

In agent-runtime:

import os
import httpx

async def call_llm(agent_blueprint, messages):
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "http://llm-proxy:7007/internal/llm/proxy",
            headers={
                # Read the shared secret from the environment instead of hardcoding it
                "X-Internal-Secret": os.environ.get("LLM_PROXY_SECRET", "dev-secret-token")
            },
            json={
                "model": agent_blueprint.llm_model,
                "messages": messages,
                "metadata": {
                    "agent_id": agent_blueprint.id,
                    "microdao_id": agent_blueprint.microdao_id
                }
            }
        )
        response.raise_for_status()
        return response.json()

Roadmap

Phase 3 (Current):

  • Multi-provider support
  • Basic rate limiting
  • Usage logging
  • OpenAI + DeepSeek + Local

Phase 3.5:

  • 🔜 Streaming responses
  • 🔜 Response caching
  • 🔜 Function calling support
  • 🔜 Redis-backed rate limiting

Phase 4:

  • 🔜 Database-backed usage logs
  • 🔜 Cost analytics
  • 🔜 Billing integration
  • 🔜 Advanced routing (fallbacks, load balancing)

Troubleshooting

Provider not working?

# Check API key
docker logs llm-proxy | grep "api_key"

# Test directly
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Rate limit issues?

# Check current limits
curl "http://localhost:7007/internal/llm/usage?agent_id=agent:sofia" \
  -H "X-Internal-Secret: dev-secret-token"

Local LLM not responding?

# Check Ollama
curl http://localhost:11434/api/version

# Check logs
docker logs llm-proxy | grep "local"

Architecture

agent-runtime
    ↓
    POST /internal/llm/proxy
    ↓
llm-proxy:
    ├─ Rate limiter (check agent quota)
    ├─ Model router (logical → physical)
    ├─ Provider selector (OpenAI/DeepSeek/Local)
    └─ Usage tracker (log tokens, cost, latency)
    ↓
[OpenAI API | DeepSeek API | Local Ollama]
    ↓
Response → agent-runtime
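
In code, that request path can be pictured as the pipeline below. It reuses the hypothetical helpers sketched in earlier sections (limiter, resolve_model, providers, usage_tracker) and illustrates the flow, not the actual handler:

import time
from fastapi import HTTPException

async def handle_proxy(request: ProxyRequest) -> ProxyResponse:
    # 1. Enforce the per-agent quota
    agent_id = request.metadata.get("agent_id", "unknown")
    if not limiter.allow(agent_id):
        raise HTTPException(status_code=429, detail="rate limit exceeded")

    # 2. Route the logical model name to a provider and physical model
    provider_name, physical_name = resolve_model(request.model)

    # 3. Call the selected provider and time the round trip
    start = time.monotonic()
    llm_response = await providers[provider_name].chat(
        [m.model_dump() for m in request.messages],  # pydantic v2 style
        physical_name,
        max_tokens=request.max_tokens,
        temperature=request.temperature,
    )
    latency_ms = (time.monotonic() - start) * 1000

    # 4. Record usage (tokens, cost, latency) before responding
    usage_tracker.record(agent_id, request.model, llm_response.usage, latency_ms)

    return ProxyResponse(
        content=llm_response.content,
        usage=llm_response.usage,
        provider=provider_name,
        model_resolved=physical_name,
        latency_ms=latency_ms,
    )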

License

Internal DAARION service


Status: Phase 3 Ready
Version: 1.0.0
Last Updated: 2025-11-24