feat: Add presence heartbeat for Matrix online status
- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
This commit is contained in:

24  services/usage-engine/Dockerfile  Normal file

@@ -0,0 +1,24 @@

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Expose port
EXPOSE 7013

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:7013/health').raise_for_status()"

# Run
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7013"]
```
363  services/usage-engine/README.md  Normal file

@@ -0,0 +1,363 @@

# Usage Engine

**Port:** 7013
**Purpose:** Collect and report usage metrics for DAARION

## Features

✅ **Collectors (NATS):**
- `usage.llm` — LLM call tracking
- `usage.tool` — Tool execution tracking
- `usage.agent` — Agent invocation tracking
- `messaging.message.created` — Message tracking

✅ **Aggregators (PostgreSQL):**
- Summary by microDAO, agent, time period
- Model usage breakdown
- Agent activity breakdown
- Tool usage breakdown

✅ **API:**
- `/internal/usage/summary` — Comprehensive usage report
- `/internal/usage/models` — Model-specific usage
- `/internal/usage/agents` — Agent-specific usage
- `/internal/usage/tools` — Tool-specific usage

## API

### GET /internal/usage/summary

Get a comprehensive usage summary:

```bash
curl "http://localhost:7013/internal/usage/summary?microdao_id=microdao:7&period_hours=24"
```

**Response:**
```json
{
  "summary": {
    "period_start": "2025-11-23T12:00:00Z",
    "period_end": "2025-11-24T12:00:00Z",
    "microdao_id": "microdao:7",
    "llm_calls_total": 145,
    "llm_tokens_total": 87432,
    "tool_calls_total": 23,
    "agent_invocations_total": 56,
    "messages_sent": 342
  },
  "models": [
    {
      "model": "gpt-4.1-mini",
      "provider": "openai",
      "calls": 120,
      "tokens": 75000,
      "avg_latency_ms": 1250
    }
  ],
  "agents": [
    {
      "agent_id": "agent:sofia",
      "invocations": 45,
      "llm_calls": 120,
      "tool_calls": 15,
      "total_tokens": 60000
    }
  ],
  "tools": [
    {
      "tool_id": "projects.list",
      "tool_name": "List Projects",
      "calls": 12,
      "success_rate": 0.95,
      "avg_latency_ms": 450
    }
  ]
}
```
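The same query can be issued from Python; a minimal sketch using only the standard library to build the request URL (the base URL default and a running service are assumptions):

```python
from typing import Optional
from urllib.parse import urlencode

def summary_url(base: str = "http://localhost:7013",
                microdao_id: Optional[str] = None,
                agent_id: Optional[str] = None,
                period_hours: int = 24) -> str:
    """Build the summary endpoint URL, including only the filters that are set."""
    params = {"period_hours": period_hours}
    if microdao_id:
        params["microdao_id"] = microdao_id
    if agent_id:
        params["agent_id"] = agent_id
    return f"{base}/internal/usage/summary?{urlencode(params)}"

print(summary_url(microdao_id="microdao:7"))
# http://localhost:7013/internal/usage/summary?period_hours=24&microdao_id=microdao%3A7
```

Note that `urlencode` percent-encodes the `:` in `microdao:7`; the server decodes it back, so this is equivalent to the curl example above.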
### Query Parameters

- `microdao_id` — Filter by microDAO (optional)
- `agent_id` — Filter by agent (optional)
- `period_hours` — Time period (1-720 hours, default 24)

### GET /internal/usage/models

Model usage breakdown:

```bash
curl "http://localhost:7013/internal/usage/models?period_hours=168"
```

### GET /internal/usage/agents

Agent activity breakdown:

```bash
curl "http://localhost:7013/internal/usage/agents?microdao_id=microdao:7"
```

### GET /internal/usage/tools

Tool execution breakdown:

```bash
curl "http://localhost:7013/internal/usage/tools?period_hours=24"
```

## NATS Integration

### Published Events (None)

The Usage Engine only consumes events.

### Consumed Events

#### 1. usage.llm

From: `llm-proxy`

```json
{
  "event_id": "evt-123",
  "timestamp": "2025-11-24T12:00:00Z",
  "actor_id": "user:93",
  "actor_type": "human",
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "model": "gpt-4.1-mini",
  "provider": "openai",
  "prompt_tokens": 450,
  "completion_tokens": 120,
  "total_tokens": 570,
  "latency_ms": 1250,
  "success": true
}
```

#### 2. usage.tool

From: `toolcore`

```json
{
  "event_id": "evt-456",
  "timestamp": "2025-11-24T12:01:00Z",
  "actor_id": "agent:sofia",
  "actor_type": "agent",
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "tool_id": "projects.list",
  "tool_name": "List Projects",
  "success": true,
  "latency_ms": 450
}
```

#### 3. usage.agent

From: `agent-runtime`

```json
{
  "event_id": "evt-789",
  "timestamp": "2025-11-24T12:02:00Z",
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "channel_id": "channel-uuid",
  "trigger": "message",
  "duration_ms": 3450,
  "llm_calls": 2,
  "tool_calls": 1,
  "success": true
}
```

#### 4. messaging.message.created

From: `messaging-service`

```json
{
  "channel_id": "channel-uuid",
  "message_id": "msg-uuid",
  "sender_id": "user:93",
  "sender_type": "human",
  "microdao_id": "microdao:7",
  "created_at": "2025-11-24T12:03:00Z"
}
```
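For producers, a `usage.llm` payload can be assembled from the fields documented above; a sketch (the helper name is illustrative, and the actual publish over NATS is assumed to happen elsewhere):

```python
import uuid
from datetime import datetime, timezone

def llm_usage_payload(model: str, provider: str, prompt_tokens: int,
                      completion_tokens: int, latency_ms: int,
                      actor_id: str, actor_type: str = "human",
                      agent_id: str = None, microdao_id: str = None,
                      success: bool = True) -> dict:
    """Build a usage.llm event matching the schema documented above."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor_id": actor_id,
        "actor_type": actor_type,
        "agent_id": agent_id,
        "microdao_id": microdao_id,
        "model": model,
        "provider": provider,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "latency_ms": latency_ms,
        "success": success,
    }

evt = llm_usage_payload("gpt-4.1-mini", "openai", 450, 120, 1250, "user:93")
print(evt["total_tokens"])
# 570
```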
## Database Schema

### usage_llm
```sql
CREATE TABLE usage_llm (
    event_id TEXT PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    agent_id TEXT,
    microdao_id TEXT,
    model TEXT NOT NULL,
    provider TEXT NOT NULL,
    prompt_tokens INT NOT NULL,
    completion_tokens INT NOT NULL,
    total_tokens INT NOT NULL,
    latency_ms INT NOT NULL,
    success BOOLEAN NOT NULL DEFAULT true,
    error TEXT,
    metadata JSONB
);

CREATE INDEX idx_usage_llm_timestamp ON usage_llm(timestamp DESC);
CREATE INDEX idx_usage_llm_microdao ON usage_llm(microdao_id, timestamp DESC);
CREATE INDEX idx_usage_llm_agent ON usage_llm(agent_id, timestamp DESC);
```

### usage_tool
```sql
CREATE TABLE usage_tool (
    event_id TEXT PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    agent_id TEXT,
    microdao_id TEXT,
    tool_id TEXT NOT NULL,
    tool_name TEXT NOT NULL,
    success BOOLEAN NOT NULL,
    latency_ms INT NOT NULL,
    error TEXT,
    metadata JSONB
);

CREATE INDEX idx_usage_tool_timestamp ON usage_tool(timestamp DESC);
CREATE INDEX idx_usage_tool_microdao ON usage_tool(microdao_id, timestamp DESC);
```

### usage_agent
```sql
CREATE TABLE usage_agent (
    event_id TEXT PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    agent_id TEXT NOT NULL,
    microdao_id TEXT,
    channel_id TEXT,
    trigger TEXT NOT NULL,
    duration_ms INT NOT NULL,
    llm_calls INT DEFAULT 0,
    tool_calls INT DEFAULT 0,
    success BOOLEAN NOT NULL DEFAULT true,
    error TEXT,
    metadata JSONB
);

CREATE INDEX idx_usage_agent_timestamp ON usage_agent(timestamp DESC);
CREATE INDEX idx_usage_agent_id ON usage_agent(agent_id, timestamp DESC);
```

### usage_message
```sql
CREATE TABLE usage_message (
    event_id TEXT PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    microdao_id TEXT NOT NULL,
    channel_id TEXT NOT NULL,
    message_length INT NOT NULL,
    metadata JSONB
);

CREATE INDEX idx_usage_message_timestamp ON usage_message(timestamp DESC);
CREATE INDEX idx_usage_message_microdao ON usage_message(microdao_id, timestamp DESC);
```
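The DDL above has to exist before the collectors can insert rows; one option is to run it idempotently at startup. A sketch for the `usage_message` table (the other tables follow the same pattern; `ensure_schema` is a hypothetical helper taking the service's asyncpg pool, not code from this commit):

```python
# Idempotent DDL for usage_message; repeat the pattern for the other tables.
USAGE_MESSAGE_DDL = """
CREATE TABLE IF NOT EXISTS usage_message (
    event_id TEXT PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    microdao_id TEXT NOT NULL,
    channel_id TEXT NOT NULL,
    message_length INT NOT NULL,
    metadata JSONB
);
CREATE INDEX IF NOT EXISTS idx_usage_message_timestamp
    ON usage_message(timestamp DESC);
CREATE INDEX IF NOT EXISTS idx_usage_message_microdao
    ON usage_message(microdao_id, timestamp DESC);
"""

async def ensure_schema(pool) -> None:
    """Run the DDL once at startup; IF NOT EXISTS makes re-runs safe."""
    async with pool.acquire() as conn:
        await conn.execute(USAGE_MESSAGE_DDL)
```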
## Setup

### Local Development
```bash
cd services/usage-engine
pip install -r requirements.txt
export DATABASE_URL="postgresql://..."
export NATS_URL="nats://localhost:4222"
python main.py
```

### Docker
```bash
docker build -t usage-engine .
docker run -p 7013:7013 \
  -e DATABASE_URL="postgresql://..." \
  -e NATS_URL="nats://nats:4222" \
  usage-engine
```

## Testing

### Publish Test Events
```bash
# LLM event
nats pub usage.llm '{"event_id":"test-1","timestamp":"2025-11-24T12:00:00Z",...}'

# Check aggregation
curl "http://localhost:7013/internal/usage/summary?period_hours=1"
```
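What the aggregation step should report for the events you publish can be modeled in plain Python; this is a reference sketch of the summary semantics (counts, token sums, mean latency), not code from the service:

```python
def summarize_llm(events: list) -> dict:
    """Mirror of the usage_llm aggregate: call count, token sum, mean latency."""
    calls = len(events)
    tokens = sum(e["total_tokens"] for e in events)
    latency_avg = sum(e["latency_ms"] for e in events) / calls if calls else 0.0
    return {
        "llm_calls_total": calls,
        "llm_tokens_total": tokens,
        "llm_latency_avg_ms": latency_avg,
    }

print(summarize_llm([
    {"total_tokens": 570, "latency_ms": 1250},
    {"total_tokens": 430, "latency_ms": 750},
]))
# {'llm_calls_total': 2, 'llm_tokens_total': 1000, 'llm_latency_avg_ms': 1000.0}
```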
## Integration

### llm-proxy Integration

After every LLM call:

```python
await publish_nats_event("usage.llm", {
    "event_id": str(uuid4()),
    "timestamp": datetime.utcnow().isoformat(),
    "model": model,
    "total_tokens": usage.total_tokens,
    # ...
})
```

### toolcore Integration

After every tool execution:

```python
await publish_nats_event("usage.tool", {
    "event_id": str(uuid4()),
    "tool_id": tool_id,
    "success": success,
    # ...
})
```

### agent-runtime Integration

After every agent invocation:

```python
await publish_nats_event("usage.agent", {
    "event_id": str(uuid4()),
    "agent_id": agent_id,
    "llm_calls": llm_call_count,
    "tool_calls": tool_call_count,
    # ...
})
```
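`publish_nats_event` is referenced above but not part of this commit; a minimal sketch, assuming a connected nats-py client is passed in explicitly (the snippets above presumably use a module-level client instead):

```python
import json

async def publish_nats_event(nc, subject: str, payload: dict) -> None:
    """Serialize the payload to JSON and publish it on the given NATS subject."""
    # default=str keeps datetime/UUID values serializable without extra plumbing.
    await nc.publish(subject, json.dumps(payload, default=str).encode())
```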
## Roadmap

### Phase 4 (Current):
- ✅ NATS collectors
- ✅ PostgreSQL storage
- ✅ Basic aggregation API

### Phase 5:
- 🔜 Real-time dashboards (WebSockets)
- 🔜 Cost estimation (per model)
- 🔜 Billing integration
- 🔜 Quota management
- 🔜 Anomaly detection

---

**Status:** ✅ Phase 4 Ready
**Version:** 1.0.0
**Last Updated:** 2025-11-24
239  services/usage-engine/aggregators.py  Normal file

@@ -0,0 +1,239 @@

```python
"""
Usage Data Aggregators
Queries and aggregates usage data from database
"""
import asyncpg
from datetime import datetime, timedelta
from typing import Optional, List

from models import (
    UsageSummary,
    ModelUsage,
    AgentUsage,
    ToolUsage,
    UsageQueryRequest
)


class UsageAggregator:
    """Aggregates usage data for reporting"""

    def __init__(self, db_pool: asyncpg.Pool):
        self.db_pool = db_pool

    async def get_summary(
        self,
        microdao_id: Optional[str] = None,
        agent_id: Optional[str] = None,
        period_hours: int = 24
    ) -> UsageSummary:
        """Get aggregated usage summary"""

        period_start = datetime.utcnow() - timedelta(hours=period_hours)
        period_end = datetime.utcnow()

        async with self.db_pool.acquire() as conn:
            # LLM stats
            llm_stats = await conn.fetchrow("""
                SELECT
                    COUNT(*) as calls,
                    SUM(total_tokens) as tokens_total,
                    SUM(prompt_tokens) as tokens_prompt,
                    SUM(completion_tokens) as tokens_completion,
                    AVG(latency_ms) as latency_avg
                FROM usage_llm
                WHERE timestamp >= $1 AND timestamp <= $2
                  AND ($3::text IS NULL OR microdao_id = $3)
                  AND ($4::text IS NULL OR agent_id = $4)
            """, period_start, period_end, microdao_id, agent_id)

            # Tool stats
            tool_stats = await conn.fetchrow("""
                SELECT
                    COUNT(*) as calls,
                    SUM(CASE WHEN success THEN 1 ELSE 0 END) as success,
                    SUM(CASE WHEN NOT success THEN 1 ELSE 0 END) as failed,
                    AVG(latency_ms) as latency_avg
                FROM usage_tool
                WHERE timestamp >= $1 AND timestamp <= $2
                  AND ($3::text IS NULL OR microdao_id = $3)
                  AND ($4::text IS NULL OR agent_id = $4)
            """, period_start, period_end, microdao_id, agent_id)

            # Agent stats
            agent_stats = await conn.fetchrow("""
                SELECT
                    COUNT(*) as invocations,
                    SUM(CASE WHEN success THEN 1 ELSE 0 END) as success,
                    SUM(CASE WHEN NOT success THEN 1 ELSE 0 END) as failed
                FROM usage_agent
                WHERE timestamp >= $1 AND timestamp <= $2
                  AND ($3::text IS NULL OR microdao_id = $3)
                  AND ($4::text IS NULL OR agent_id = $4)
            """, period_start, period_end, microdao_id, agent_id)

            # Message stats
            message_stats = await conn.fetchrow("""
                SELECT
                    COUNT(*) as sent,
                    SUM(message_length) as total_length
                FROM usage_message
                WHERE timestamp >= $1 AND timestamp <= $2
                  AND ($3::text IS NULL OR microdao_id = $3)
            """, period_start, period_end, microdao_id)

        return UsageSummary(
            period_start=period_start,
            period_end=period_end,
            microdao_id=microdao_id,
            agent_id=agent_id,

            llm_calls_total=llm_stats['calls'] or 0,
            llm_tokens_total=llm_stats['tokens_total'] or 0,
            llm_tokens_prompt=llm_stats['tokens_prompt'] or 0,
            llm_tokens_completion=llm_stats['tokens_completion'] or 0,
            llm_latency_avg_ms=float(llm_stats['latency_avg'] or 0),

            tool_calls_total=tool_stats['calls'] or 0,
            tool_calls_success=tool_stats['success'] or 0,
            tool_calls_failed=tool_stats['failed'] or 0,
            tool_latency_avg_ms=float(tool_stats['latency_avg'] or 0),

            agent_invocations_total=agent_stats['invocations'] or 0,
            agent_invocations_success=agent_stats['success'] or 0,
            agent_invocations_failed=agent_stats['failed'] or 0,

            messages_sent=message_stats['sent'] or 0,
            messages_total_length=message_stats['total_length'] or 0
        )

    async def get_model_breakdown(
        self,
        microdao_id: Optional[str] = None,
        period_hours: int = 24
    ) -> List[ModelUsage]:
        """Get usage breakdown by model"""

        period_start = datetime.utcnow() - timedelta(hours=period_hours)
        period_end = datetime.utcnow()

        async with self.db_pool.acquire() as conn:
            rows = await conn.fetch("""
                SELECT
                    model,
                    provider,
                    COUNT(*) as calls,
                    SUM(total_tokens) as tokens,
                    AVG(latency_ms) as latency_avg
                FROM usage_llm
                WHERE timestamp >= $1 AND timestamp <= $2
                  AND ($3::text IS NULL OR microdao_id = $3)
                GROUP BY model, provider
                ORDER BY tokens DESC
                LIMIT 20
            """, period_start, period_end, microdao_id)

        return [
            ModelUsage(
                model=row['model'],
                provider=row['provider'],
                calls=row['calls'],
                tokens=row['tokens'] or 0,
                avg_latency_ms=float(row['latency_avg'] or 0)
            )
            for row in rows
        ]

    async def get_agent_breakdown(
        self,
        microdao_id: Optional[str] = None,
        period_hours: int = 24
    ) -> List[AgentUsage]:
        """Get usage breakdown by agent"""

        period_start = datetime.utcnow() - timedelta(hours=period_hours)
        period_end = datetime.utcnow()

        async with self.db_pool.acquire() as conn:
            rows = await conn.fetch("""
                SELECT
                    a.agent_id,
                    COUNT(DISTINCT a.event_id) as invocations,
                    COALESCE(SUM(a.llm_calls), 0) as llm_calls,
                    COALESCE(SUM(a.tool_calls), 0) as tool_calls,
                    COALESCE(llm.tokens, 0) as total_tokens,
                    COALESCE(msg.messages, 0) as messages_sent
                FROM usage_agent a
                LEFT JOIN (
                    SELECT agent_id, SUM(total_tokens) as tokens
                    FROM usage_llm
                    WHERE timestamp >= $1 AND timestamp <= $2
                      AND ($3::text IS NULL OR microdao_id = $3)
                    GROUP BY agent_id
                ) llm ON llm.agent_id = a.agent_id
                LEFT JOIN (
                    SELECT actor_id, COUNT(*) as messages
                    FROM usage_message
                    WHERE timestamp >= $1 AND timestamp <= $2
                      AND actor_type = 'agent'
                      AND ($3::text IS NULL OR microdao_id = $3)
                    GROUP BY actor_id
                ) msg ON msg.actor_id = a.agent_id
                WHERE a.timestamp >= $1 AND a.timestamp <= $2
                  AND ($3::text IS NULL OR a.microdao_id = $3)
                GROUP BY a.agent_id, llm.tokens, msg.messages
                ORDER BY invocations DESC
                LIMIT 20
            """, period_start, period_end, microdao_id)

        return [
            AgentUsage(
                agent_id=row['agent_id'],
                invocations=row['invocations'],
                llm_calls=row['llm_calls'],
                tool_calls=row['tool_calls'],
                messages_sent=row['messages_sent'],
                total_tokens=row['total_tokens']
            )
            for row in rows
        ]

    async def get_tool_breakdown(
        self,
        microdao_id: Optional[str] = None,
        period_hours: int = 24
    ) -> List[ToolUsage]:
        """Get usage breakdown by tool"""

        period_start = datetime.utcnow() - timedelta(hours=period_hours)
        period_end = datetime.utcnow()

        async with self.db_pool.acquire() as conn:
            rows = await conn.fetch("""
                SELECT
                    tool_id,
                    tool_name,
                    COUNT(*) as calls,
                    AVG(CASE WHEN success THEN 1.0 ELSE 0.0 END) as success_rate,
                    AVG(latency_ms) as latency_avg
                FROM usage_tool
                WHERE timestamp >= $1 AND timestamp <= $2
                  AND ($3::text IS NULL OR microdao_id = $3)
                GROUP BY tool_id, tool_name
                ORDER BY calls DESC
                LIMIT 20
            """, period_start, period_end, microdao_id)

        return [
            ToolUsage(
                tool_id=row['tool_id'],
                tool_name=row['tool_name'],
                calls=row['calls'],
                success_rate=float(row['success_rate'] or 0),
                avg_latency_ms=float(row['latency_avg'] or 0)
            )
            for row in rows
        ]
```
184  services/usage-engine/collectors.py  Normal file

@@ -0,0 +1,184 @@

```python
"""
Usage Event Collectors (NATS Listeners)
Collects usage events from various services via NATS
"""
import json
import asyncio
import asyncpg
from datetime import datetime
from typing import Optional

from models import (
    LlmUsageEvent,
    ToolUsageEvent,
    AgentInvocationEvent,
    MessageUsageEvent,
    UsageEventType
)


class UsageCollector:
    """Collects and stores usage events from NATS"""

    def __init__(self, nats_client, db_pool: asyncpg.Pool):
        self.nc = nats_client
        self.db_pool = db_pool
        self.subscriptions = []

    async def start(self):
        """Subscribe to all usage subjects"""
        print("🎧 Starting usage collectors...")

        # Subscribe to LLM usage
        sub_llm = await self.nc.subscribe("usage.llm", cb=self._handle_llm_event)
        self.subscriptions.append(sub_llm)
        print("✅ Subscribed to usage.llm")

        # Subscribe to Tool usage
        sub_tool = await self.nc.subscribe("usage.tool", cb=self._handle_tool_event)
        self.subscriptions.append(sub_tool)
        print("✅ Subscribed to usage.tool")

        # Subscribe to Agent invocations
        sub_agent = await self.nc.subscribe("usage.agent", cb=self._handle_agent_event)
        self.subscriptions.append(sub_agent)
        print("✅ Subscribed to usage.agent")

        # Subscribe to Message events
        sub_message = await self.nc.subscribe("messaging.message.created", cb=self._handle_message_event)
        self.subscriptions.append(sub_message)
        print("✅ Subscribed to messaging.message.created")

        print("🎧 All collectors active")

    async def stop(self):
        """Unsubscribe from all subjects"""
        for sub in self.subscriptions:
            await sub.unsubscribe()
        print("🛑 All collectors stopped")

    # ========================================================================
    # Event Handlers
    # ========================================================================

    async def _handle_llm_event(self, msg):
        """Handle LLM usage event"""
        try:
            data = json.loads(msg.data.decode())
            event = LlmUsageEvent(**data)
            await self._store_llm_event(event)
            print(f"📊 LLM usage: {event.model} | {event.total_tokens} tokens | {event.latency_ms}ms")
        except Exception as e:
            print(f"❌ Error handling LLM event: {e}")

    async def _handle_tool_event(self, msg):
        """Handle tool usage event"""
        try:
            data = json.loads(msg.data.decode())
            event = ToolUsageEvent(**data)
            await self._store_tool_event(event)
            print(f"📊 Tool usage: {event.tool_id} | success={event.success} | {event.latency_ms}ms")
        except Exception as e:
            print(f"❌ Error handling tool event: {e}")

    async def _handle_agent_event(self, msg):
        """Handle agent invocation event"""
        try:
            data = json.loads(msg.data.decode())
            event = AgentInvocationEvent(**data)
            await self._store_agent_event(event)
            print(f"📊 Agent invocation: {event.agent_id} | {event.duration_ms}ms | LLM:{event.llm_calls} Tool:{event.tool_calls}")
        except Exception as e:
            print(f"❌ Error handling agent event: {e}")

    async def _handle_message_event(self, msg):
        """Handle message sent event"""
        try:
            data = json.loads(msg.data.decode())
            # Convert messaging event to usage event
            event = MessageUsageEvent(
                event_id=data.get("message_id", "unknown"),
                timestamp=datetime.fromisoformat(data.get("created_at", datetime.utcnow().isoformat())),
                actor_id=data.get("sender_id", "unknown"),
                actor_type=data.get("sender_type", "human"),
                microdao_id=data.get("microdao_id", "unknown"),
                channel_id=data.get("channel_id", "unknown"),
                message_length=len(data.get("content_preview", "")),
                metadata={"matrix_event_id": data.get("matrix_event_id")}
            )
            await self._store_message_event(event)
            print(f"📊 Message sent: {event.actor_id} | {event.message_length} chars")
        except Exception as e:
            print(f"❌ Error handling message event: {e}")

    # ========================================================================
    # Database Storage
    # ========================================================================

    async def _store_llm_event(self, event: LlmUsageEvent):
        """Store LLM usage event to database"""
        async with self.db_pool.acquire() as conn:
            await conn.execute("""
                INSERT INTO usage_llm
                    (event_id, timestamp, actor_id, actor_type, agent_id, microdao_id,
                     model, provider, prompt_tokens, completion_tokens, total_tokens,
                     latency_ms, success, error, metadata)
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)
                ON CONFLICT (event_id) DO NOTHING
            """,
                event.event_id, event.timestamp, event.actor_id, event.actor_type.value,
                event.agent_id, event.microdao_id, event.model, event.provider,
                event.prompt_tokens, event.completion_tokens, event.total_tokens,
                event.latency_ms, event.success, event.error,
                json.dumps(event.metadata or {})
            )

    async def _store_tool_event(self, event: ToolUsageEvent):
        """Store tool usage event to database"""
        async with self.db_pool.acquire() as conn:
            await conn.execute("""
                INSERT INTO usage_tool
                    (event_id, timestamp, actor_id, actor_type, agent_id, microdao_id,
                     tool_id, tool_name, success, latency_ms, error, metadata)
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
                ON CONFLICT (event_id) DO NOTHING
            """,
                event.event_id, event.timestamp, event.actor_id, event.actor_type.value,
                event.agent_id, event.microdao_id, event.tool_id, event.tool_name,
                event.success, event.latency_ms, event.error,
                json.dumps(event.metadata or {})
            )

    async def _store_agent_event(self, event: AgentInvocationEvent):
        """Store agent invocation event to database"""
        async with self.db_pool.acquire() as conn:
            await conn.execute("""
                INSERT INTO usage_agent
                    (event_id, timestamp, agent_id, microdao_id, channel_id,
                     trigger, duration_ms, llm_calls, tool_calls, success, error, metadata)
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
                ON CONFLICT (event_id) DO NOTHING
            """,
                event.event_id, event.timestamp, event.agent_id, event.microdao_id,
                event.channel_id, event.trigger, event.duration_ms, event.llm_calls,
                event.tool_calls, event.success, event.error,
                json.dumps(event.metadata or {})
            )

    async def _store_message_event(self, event: MessageUsageEvent):
        """Store message usage event to database"""
        async with self.db_pool.acquire() as conn:
            await conn.execute("""
                INSERT INTO usage_message
                    (event_id, timestamp, actor_id, actor_type, microdao_id, channel_id,
                     message_length, metadata)
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
                ON CONFLICT (event_id) DO NOTHING
            """,
                event.event_id, event.timestamp, event.actor_id, event.actor_type.value,
                event.microdao_id, event.channel_id, event.message_length,
                json.dumps(event.metadata or {})
            )
```
221
services/usage-engine/main.py
Normal file
221
services/usage-engine/main.py
Normal file
@@ -0,0 +1,221 @@
|
||||
"""
|
||||
DAARION Usage Engine
|
||||
Port: 7013
|
||||
Collects and reports usage metrics (LLM, Tools, Agents, Messages)
|
||||
"""
|
||||
import os
|
||||
import asyncio
|
||||
import asyncpg
|
||||
import nats
|
||||
from contextlib import asynccontextmanager
|
||||
from fastapi import FastAPI, HTTPException, Query
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from typing import Optional
|
||||
|
||||
from models import UsageQueryRequest, UsageQueryResponse
|
||||
from collectors import UsageCollector
|
||||
from aggregators import UsageAggregator
|
||||
|
||||
# ============================================================================
|
||||
# Configuration
|
||||
# ============================================================================
|
||||
|
||||
DATABASE_URL = os.getenv("DATABASE_URL", "postgresql://postgres:postgres@localhost:5432/daarion")
|
||||
NATS_URL = os.getenv("NATS_URL", "nats://nats:4222")
|
||||
|
||||
# ============================================================================
|
||||
# Global State
|
||||
# ============================================================================
|
||||
|
||||
db_pool: Optional[asyncpg.Pool] = None
|
||||
nc: Optional[nats.NATS] = None
|
||||
collector: Optional[UsageCollector] = None
|
||||
aggregator: Optional[UsageAggregator] = None
|
||||
|
||||
# ============================================================================
|
||||
# App Setup
|
||||
# ============================================================================
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
"""Startup and shutdown"""
|
||||
global db_pool, nc, collector, aggregator
|
||||
|
||||
print("🚀 Starting Usage Engine...")
|
||||
|
||||
# Database
|
||||
db_pool = await asyncpg.create_pool(DATABASE_URL, min_size=2, max_size=10)
|
||||
print("✅ Database pool created")
|
||||
|
||||
# NATS
|
||||
try:
|
||||
nc = await nats.connect(NATS_URL)
|
||||
print(f"✅ Connected to NATS at {NATS_URL}")
|
||||
except Exception as e:
|
||||
print(f"❌ Failed to connect to NATS: {e}")
|
||||
nc = None
|
||||
|
||||
# Collector
|
||||
if nc:
|
||||
collector = UsageCollector(nc, db_pool)
|
||||
await collector.start()
|
||||
else:
|
||||
print("⚠️ NATS not available, collector disabled")
|
||||
|
||||
# Aggregator
|
||||
aggregator = UsageAggregator(db_pool)
|
||||
print("✅ Aggregator ready")
|
||||
|
||||
print("✅ Usage Engine ready")
|
||||
|
||||
yield
|
||||
|
||||
# Shutdown
|
||||
print("🛑 Shutting down Usage Engine...")
|
||||
if collector:
|
||||
await collector.stop()
|
||||
if nc:
|
||||
await nc.close()
|
||||
if db_pool:
|
||||
await db_pool.close()
|
||||
|
||||
app = FastAPI(
|
||||
title="DAARION Usage Engine",
|
||||
version="1.0.0",
|
||||
description="Usage tracking and reporting for LLM, Tools, Agents",
|
||||
lifespan=lifespan
|
||||
)
|
||||
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
# ============================================================================
|
||||
# API Endpoints
|
||||
# ============================================================================
|
||||
|
||||
@app.get("/internal/usage/summary", response_model=UsageQueryResponse)
|
||||
async def get_usage_summary(
|
||||
microdao_id: Optional[str] = Query(None),
|
||||
agent_id: Optional[str] = Query(None),
|
||||
    period_hours: int = Query(24, ge=1, le=720)
):
    """
    Get aggregated usage summary

    Query parameters:
    - microdao_id: Filter by microDAO (optional)
    - agent_id: Filter by agent (optional)
    - period_hours: Time period (1-720 hours, default 24)
    """
    if not aggregator:
        raise HTTPException(500, "Aggregator not initialized")

    # Get summary
    summary = await aggregator.get_summary(
        microdao_id=microdao_id,
        agent_id=agent_id,
        period_hours=period_hours
    )

    # Get breakdowns
    models = await aggregator.get_model_breakdown(
        microdao_id=microdao_id,
        period_hours=period_hours
    )

    agents = await aggregator.get_agent_breakdown(
        microdao_id=microdao_id,
        period_hours=period_hours
    )

    tools = await aggregator.get_tool_breakdown(
        microdao_id=microdao_id,
        period_hours=period_hours
    )

    return UsageQueryResponse(
        summary=summary,
        models=models,
        agents=agents,
        tools=tools
    )


@app.get("/internal/usage/models")
async def get_model_usage(
    microdao_id: Optional[str] = Query(None),
    period_hours: int = Query(24, ge=1, le=720)
):
    """Get usage breakdown by model"""
    if not aggregator:
        raise HTTPException(500, "Aggregator not initialized")

    models = await aggregator.get_model_breakdown(
        microdao_id=microdao_id,
        period_hours=period_hours
    )

    return {"models": models}


@app.get("/internal/usage/agents")
async def get_agent_usage(
    microdao_id: Optional[str] = Query(None),
    period_hours: int = Query(24, ge=1, le=720)
):
    """Get usage breakdown by agent"""
    if not aggregator:
        raise HTTPException(500, "Aggregator not initialized")

    agents = await aggregator.get_agent_breakdown(
        microdao_id=microdao_id,
        period_hours=period_hours
    )

    return {"agents": agents}


@app.get("/internal/usage/tools")
async def get_tool_usage(
    microdao_id: Optional[str] = Query(None),
    period_hours: int = Query(24, ge=1, le=720)
):
    """Get usage breakdown by tool"""
    if not aggregator:
        raise HTTPException(500, "Aggregator not initialized")

    tools = await aggregator.get_tool_breakdown(
        microdao_id=microdao_id,
        period_hours=period_hours
    )

    return {"tools": tools}


@app.get("/health")
async def health():
    """Health check"""
    return {
        "status": "ok",
        "service": "usage-engine",
        "nats_connected": nc is not None,
        "collector_active": collector is not None,
        "aggregator_ready": aggregator is not None
    }


# ============================================================================
# Run
# ============================================================================

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=7013)
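The endpoints above are driven entirely by query parameters, so a client only needs to build a URL. A minimal stdlib sketch of assembling a summary request — the microDAO id here is made up for illustration; the host and port match this service's defaults:

```python
from urllib.parse import urlencode

# Hypothetical values: "dao-alpha" is an invented microDAO id;
# 168 hours = last 7 days, within the 1-720 range the endpoint accepts.
BASE = "http://localhost:7013"
params = {"microdao_id": "dao-alpha", "period_hours": 168}

url = f"{BASE}/internal/usage/summary?{urlencode(params)}"
print(url)
```

Omitting `microdao_id` and `agent_id` yields a platform-wide summary, since both filters are optional.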
161
services/usage-engine/models.py
Normal file
@@ -0,0 +1,161 @@
"""
Usage Engine Data Models
Tracks LLM calls, tool executions, agent invocations
"""
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional, Dict, Any
from enum import Enum


class UsageEventType(str, Enum):
    LLM_CALL = "llm_call"
    TOOL_CALL = "tool_call"
    AGENT_INVOCATION = "agent_invocation"
    MESSAGE_SENT = "message_sent"


class ActorType(str, Enum):
    HUMAN = "human"
    AGENT = "agent"
    SERVICE = "service"


# ============================================================================
# Usage Events (inbound from NATS)
# ============================================================================


class LlmUsageEvent(BaseModel):
    """LLM call usage event from llm-proxy"""
    event_id: str
    timestamp: datetime
    actor_id: str
    actor_type: ActorType
    agent_id: Optional[str] = None
    microdao_id: Optional[str] = None
    model: str
    provider: str  # "openai", "deepseek", "local"
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    latency_ms: int
    success: bool = True
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None


class ToolUsageEvent(BaseModel):
    """Tool execution usage event from toolcore"""
    event_id: str
    timestamp: datetime
    actor_id: str
    actor_type: ActorType
    agent_id: Optional[str] = None
    microdao_id: Optional[str] = None
    tool_id: str
    tool_name: str
    success: bool
    latency_ms: int
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None


class AgentInvocationEvent(BaseModel):
    """Agent invocation usage event from agent-runtime"""
    event_id: str
    timestamp: datetime
    agent_id: str
    microdao_id: Optional[str] = None
    channel_id: Optional[str] = None
    trigger: str  # "message", "scheduled", "manual"
    duration_ms: int
    llm_calls: int = 0
    tool_calls: int = 0
    success: bool = True
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None


class MessageUsageEvent(BaseModel):
    """Message sent usage event from messaging-service"""
    event_id: str
    timestamp: datetime
    actor_id: str
    actor_type: ActorType
    microdao_id: str
    channel_id: str
    message_length: int
    metadata: Optional[Dict[str, Any]] = None
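The event models above define the wire format the NATS collectors expect. As a hedged sketch of what a producer such as llm-proxy might publish on the `usage.llm` subject — the field names follow `LlmUsageEvent`, but every value here is invented for illustration, and only the stdlib is used:

```python
import json
from datetime import datetime, timezone

# Hypothetical payload matching the LlmUsageEvent fields; a producer would
# publish this JSON on the "usage.llm" NATS subject as raw bytes.
event = {
    "event_id": "evt-0001",                       # invented id
    "timestamp": datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc).isoformat(),
    "actor_id": "user-42",                        # invented actor
    "actor_type": "human",
    "model": "gpt-4o-mini",                       # example model name
    "provider": "openai",
    "prompt_tokens": 150,
    "completion_tokens": 50,
    "total_tokens": 200,
    "latency_ms": 820,
    "success": True,
}

payload = json.dumps(event).encode()  # NATS message bodies are bytes
print(len(payload), "bytes")
```

On the consumer side, pydantic's validation (`LlmUsageEvent.model_validate_json(payload)` in pydantic v2) would reject payloads with missing or mistyped fields before they reach the aggregator tables.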
# ============================================================================
# Aggregated Usage Reports (outbound API)
# ============================================================================


class UsageSummary(BaseModel):
    """Aggregated usage summary"""
    period_start: datetime
    period_end: datetime
    microdao_id: Optional[str] = None
    agent_id: Optional[str] = None

    # LLM stats
    llm_calls_total: int = 0
    llm_tokens_total: int = 0
    llm_tokens_prompt: int = 0
    llm_tokens_completion: int = 0
    llm_latency_avg_ms: float = 0.0

    # Tool stats
    tool_calls_total: int = 0
    tool_calls_success: int = 0
    tool_calls_failed: int = 0
    tool_latency_avg_ms: float = 0.0

    # Agent stats
    agent_invocations_total: int = 0
    agent_invocations_success: int = 0
    agent_invocations_failed: int = 0

    # Message stats
    messages_sent: int = 0
    messages_total_length: int = 0


class ModelUsage(BaseModel):
    """Usage by model"""
    model: str
    provider: str
    calls: int
    tokens: int
    avg_latency_ms: float


class AgentUsage(BaseModel):
    """Usage by agent"""
    agent_id: str
    invocations: int
    llm_calls: int
    tool_calls: int
    messages_sent: int
    total_tokens: int


class ToolUsage(BaseModel):
    """Usage by tool"""
    tool_id: str
    tool_name: str
    calls: int
    success_rate: float
    avg_latency_ms: float


# ============================================================================
# API Request/Response Models
# ============================================================================


class UsageQueryRequest(BaseModel):
    """Request for usage summary"""
    microdao_id: Optional[str] = None
    agent_id: Optional[str] = None
    period_hours: int = Field(24, ge=1, le=720)  # 1h - 30 days


class UsageQueryResponse(BaseModel):
    """Response for usage summary"""
    summary: UsageSummary
    models: list[ModelUsage] = []
    agents: list[AgentUsage] = []
    tools: list[ToolUsage] = []
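Per the README, the real roll-up from raw events into these report models happens in PostgreSQL. To make the `ToolUsage` shape concrete (`calls`, `success_rate`, `avg_latency_ms`), here is the same aggregation sketched in pure Python — the sample events are invented, and the helper name is not part of the service:

```python
from statistics import mean

def aggregate_tool_usage(events):
    """Roll raw tool events up into the ToolUsage shape.

    Each event is a dict with at least "success" (bool) and "latency_ms" (int),
    mirroring the ToolUsageEvent fields.
    """
    calls = len(events)
    if not calls:
        return {"calls": 0, "success_rate": 0.0, "avg_latency_ms": 0.0}
    successes = sum(1 for e in events if e["success"])
    return {
        "calls": calls,
        "success_rate": successes / calls,
        "avg_latency_ms": mean(e["latency_ms"] for e in events),
    }

# Invented sample: two successful calls and one failure.
events = [
    {"success": True, "latency_ms": 120},
    {"success": True, "latency_ms": 80},
    {"success": False, "latency_ms": 200},
]
print(aggregate_tool_usage(events))
```

In production a `GROUP BY tool_id` with `AVG(latency_ms)` and a filtered count would compute the same figures server-side over the chosen `period_hours` window.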
10
services/usage-engine/requirements.txt
Normal file
@@ -0,0 +1,10 @@
fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.3
asyncpg==0.29.0
nats-py==2.6.0
python-multipart==0.0.6
httpx==0.26.0  # required by the Dockerfile HEALTHCHECK probe