feat: Add presence heartbeat for Matrix online status
- matrix-gateway: POST /internal/matrix/presence/online endpoint - usePresenceHeartbeat hook with activity tracking - Auto away after 5 min inactivity - Offline on page close/visibility change - Integrated in MatrixChatRoom component
This commit is contained in:
334
services/llm-proxy/README.md
Normal file
334
services/llm-proxy/README.md
Normal file
@@ -0,0 +1,334 @@
|
||||
# LLM Proxy Service
|
||||
|
||||
**Port:** 7007
|
||||
**Purpose:** Multi-provider LLM gateway for DAARION agents
|
||||
|
||||
## Features
|
||||
|
||||
✅ **Multi-provider support:**
|
||||
- OpenAI (GPT-4, GPT-4-turbo, etc.)
|
||||
- DeepSeek (DeepSeek-R1)
|
||||
- Local LLMs (Ollama, vLLM, llama.cpp)
|
||||
|
||||
✅ **Model routing:**
|
||||
- Logical model names → Physical provider models
|
||||
- Config-driven routing (`config.yaml`)
|
||||
|
||||
✅ **Usage tracking:**
|
||||
- Token counting
|
||||
- Latency monitoring
|
||||
- Cost estimation
|
||||
- Per-agent/microDAO tracking
|
||||
|
||||
✅ **Rate limiting:**
|
||||
- Per-agent limits (10 req/min default)
|
||||
- In-memory (Phase 3), Redis-backed (Phase 4)
|
||||
|
||||
✅ **Security:**
|
||||
- Internal-only API (`X-Internal-Secret` header)
|
||||
- API key management via env vars
|
||||
|
||||
## API
|
||||
|
||||
### POST /internal/llm/proxy
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"model": "gpt-4.1-mini",
|
||||
"messages": [
|
||||
{"role": "system", "content": "You are a helpful assistant"},
|
||||
{"role": "user", "content": "Hello!"}
|
||||
],
|
||||
"max_tokens": 100,
|
||||
"temperature": 0.7,
|
||||
"metadata": {
|
||||
"agent_id": "agent:sofia",
|
||||
"microdao_id": "microdao:7",
|
||||
"channel_id": "channel-uuid"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"content": "Hello! How can I help you today?",
|
||||
"usage": {
|
||||
"prompt_tokens": 25,
|
||||
"completion_tokens": 10,
|
||||
"total_tokens": 35
|
||||
},
|
||||
"provider": "openai",
|
||||
"model_resolved": "gpt-4-1106-preview",
|
||||
"latency_ms": 1234.5
|
||||
}
|
||||
```
|
||||
|
||||
### GET /internal/llm/models
|
||||
|
||||
List available models:
|
||||
```json
|
||||
{
|
||||
"models": [
|
||||
{
|
||||
"name": "gpt-4.1-mini",
|
||||
"provider": "openai",
|
||||
"physical_name": "gpt-4-1106-preview",
|
||||
"max_tokens": 4096
|
||||
},
|
||||
...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### GET /internal/llm/usage?agent_id=agent:sofia
|
||||
|
||||
Get usage statistics:
|
||||
```json
|
||||
{
|
||||
"total_requests": 42,
|
||||
"total_tokens": 12345,
|
||||
"avg_latency_ms": 987.6,
|
||||
"success_rate": 0.98
|
||||
}
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
Edit `config.yaml`:
|
||||
|
||||
```yaml
|
||||
providers:
|
||||
openai:
|
||||
base_url: "https://api.openai.com/v1"
|
||||
api_key_env: "OPENAI_API_KEY"
|
||||
|
||||
local:
|
||||
base_url: "http://localhost:11434"
|
||||
|
||||
models:
|
||||
gpt-4.1-mini:
|
||||
provider: "openai"
|
||||
physical_name: "gpt-4-1106-preview"
|
||||
cost_per_1k_prompt: 0.01
|
||||
cost_per_1k_completion: 0.03
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```bash
|
||||
OPENAI_API_KEY=sk-... # OpenAI API key
|
||||
DEEPSEEK_API_KEY=sk-... # DeepSeek API key
|
||||
LLM_PROXY_SECRET=dev-secret-token # Internal auth token
|
||||
```
|
||||
|
||||
## Setup
|
||||
|
||||
### Local Development
|
||||
|
||||
```bash
|
||||
cd services/llm-proxy
|
||||
|
||||
# Install dependencies
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Set API keys
|
||||
export OPENAI_API_KEY="sk-..."
|
||||
|
||||
# Run
|
||||
python main.py
|
||||
```
|
||||
|
||||
### Docker
|
||||
|
||||
```bash
|
||||
docker build -t llm-proxy .
|
||||
docker run -p 7007:7007 \
|
||||
-e OPENAI_API_KEY="sk-..." \
|
||||
llm-proxy
|
||||
```
|
||||
|
||||
### With docker-compose
|
||||
|
||||
```bash
|
||||
docker-compose -f docker-compose.phase3.yml up llm-proxy
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
### Test OpenAI
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:7007/internal/llm/proxy \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-Internal-Secret: dev-secret-token" \
|
||||
-d '{
|
||||
"model": "gpt-4.1-mini",
|
||||
"messages": [
|
||||
{"role": "user", "content": "Say hello!"}
|
||||
],
|
||||
"metadata": {
|
||||
"agent_id": "agent:test"
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
### Test Local LLM
|
||||
|
||||
```bash
|
||||
# Start Ollama
|
||||
ollama serve
|
||||
ollama pull qwen2.5:8b
|
||||
|
||||
# Test
|
||||
curl -X POST http://localhost:7007/internal/llm/proxy \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-Internal-Secret: dev-secret-token" \
|
||||
-d '{
|
||||
"model": "dagi-local-8b",
|
||||
"messages": [
|
||||
{"role": "user", "content": "Test"}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
## Adding New Providers
|
||||
|
||||
1. Create `providers/my_provider.py`:
|
||||
|
||||
```python
|
||||
class MyProvider:
|
||||
def __init__(self, config: ProviderConfig):
|
||||
self.config = config
|
||||
|
||||
async def chat(self, messages, model_name, **kwargs) -> LLMResponse:
|
||||
# Implement provider logic
|
||||
...
|
||||
```
|
||||
|
||||
2. Register in `config.yaml`:
|
||||
|
||||
```yaml
|
||||
providers:
|
||||
my_provider:
|
||||
base_url: "https://api.myprovider.com"
|
||||
api_key_env: "MY_PROVIDER_KEY"
|
||||
|
||||
models:
|
||||
my-model:
|
||||
provider: "my_provider"
|
||||
physical_name: "my-model-v1"
|
||||
```
|
||||
|
||||
3. Initialize in `main.py`:
|
||||
|
||||
```python
|
||||
from providers.my_provider import MyProvider
|
||||
|
||||
providers["my_provider"] = MyProvider(provider_config)
|
||||
```
|
||||
|
||||
## Integration with agent-runtime
|
||||
|
||||
In `agent-runtime`:
|
||||
|
||||
```python
|
||||
import httpx
|
||||
|
||||
async def call_llm(agent_blueprint, messages):
|
||||
async with httpx.AsyncClient() as client:
|
||||
response = await client.post(
|
||||
"http://llm-proxy:7007/internal/llm/proxy",
|
||||
headers={"X-Internal-Secret": "dev-secret-token"},
|
||||
json={
|
||||
"model": agent_blueprint.llm_model,
|
||||
"messages": messages,
|
||||
"metadata": {
|
||||
"agent_id": agent_blueprint.id,
|
||||
"microdao_id": agent_blueprint.microdao_id
|
||||
}
|
||||
}
|
||||
)
|
||||
return response.json()
|
||||
```
|
||||
|
||||
## Roadmap
|
||||
|
||||
### Phase 3 (Current):
|
||||
- ✅ Multi-provider support
|
||||
- ✅ Basic rate limiting
|
||||
- ✅ Usage logging
|
||||
- ✅ OpenAI + DeepSeek + Local
|
||||
|
||||
### Phase 3.5:
|
||||
- 🔜 Streaming responses
|
||||
- 🔜 Response caching
|
||||
- 🔜 Function calling support
|
||||
- 🔜 Redis-backed rate limiting
|
||||
|
||||
### Phase 4:
|
||||
- 🔜 Database-backed usage logs
|
||||
- 🔜 Cost analytics
|
||||
- 🔜 Billing integration
|
||||
- 🔜 Advanced routing (fallbacks, load balancing)
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Provider not working?**
|
||||
```bash
|
||||
# Check API key
|
||||
docker logs llm-proxy | grep "api_key"
|
||||
|
||||
# Test directly
|
||||
curl https://api.openai.com/v1/models \
|
||||
-H "Authorization: Bearer $OPENAI_API_KEY"
|
||||
```
|
||||
|
||||
**Rate limit issues?**
|
||||
```bash
|
||||
# Check current limits
|
||||
curl http://localhost:7007/internal/llm/usage?agent_id=agent:sofia \
|
||||
-H "X-Internal-Secret: dev-secret-token"
|
||||
```
|
||||
|
||||
**Local LLM not responding?**
|
||||
```bash
|
||||
# Check Ollama
|
||||
curl http://localhost:11434/api/version
|
||||
|
||||
# Check logs
|
||||
docker logs llm-proxy | grep "local"
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
agent-runtime
|
||||
↓
|
||||
POST /internal/llm/proxy
|
||||
↓
|
||||
llm-proxy:
|
||||
├─ Rate limiter (check agent quota)
|
||||
├─ Model router (logical → physical)
|
||||
├─ Provider selector (OpenAI/DeepSeek/Local)
|
||||
└─ Usage tracker (log tokens, cost, latency)
|
||||
↓
|
||||
[OpenAI API | DeepSeek API | Local Ollama]
|
||||
↓
|
||||
Response → agent-runtime
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
Internal DAARION service
|
||||
|
||||
---
|
||||
|
||||
**Status:** ✅ Phase 3 Ready
|
||||
**Version:** 1.0.0
|
||||
**Last Updated:** 2025-11-24
|
||||
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user