feat: Add presence heartbeat for Matrix online status

- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
2025-11-27 00:19:40 -08:00
parent 5bed515852
commit 3de3c8cb36
6371 changed files with 1317450 additions and 932 deletions
--- a/docs/PHASE4_DETAILED_PLAN.md
+++ b/docs/PHASE4_DETAILED_PLAN.md
@@ -0,0 +1,606 @@
+# 📋 PHASE 4: SECURITY LAYER - Detailed Plan
+
+**Goal:** A complete security layer for DAARION  
+**Timeline:** 4-6 weeks (or 3-4 hours automated)  
+**Dependencies:** Phase 1-3 complete
+
+---
+
+## 🎯 OVERVIEW
+
+Phase 4 adds critical security infrastructure:
+
+```
+┌─────────────────────────────────────────┐
+│ SECURITY LAYER (Phase 4)                │
+├─────────────────────────────────────────┤
+│                                         │
+│  1. AUTH SERVICE                        │
+│     └─ Identity & Sessions              │
+│                                         │
+│  2. PDP SERVICE (Policy Decision)       │
+│     └─ Centralized access control       │
+│                                         │
+│  3. PEP HOOKS (Policy Enforcement)      │
+│     └─ Enforce decisions in services    │
+│                                         │
+│  4. USAGE ENGINE                        │
+│     └─ Track LLM/Tools/Agent usage      │
+│                                         │
+│  5. AUDIT LOG                           │
+│     └─ Security events & compliance     │
+│                                         │
+└─────────────────────────────────────────┘
+```
+
+---
+
+## 📦 DELIVERABLES (40+ files)
+
+### 1. **auth-service** (8 files) ✅ COMPLETE
+```
+services/auth-service/
+├── models.py              ✅ ActorIdentity, SessionToken, ApiKey
+├── actor_context.py       ✅ build_actor_context, require_actor
+├── routes_sessions.py     ✅ /auth/login, /me, /logout
+├── routes_api_keys.py     ✅ /auth/api-keys CRUD
+├── main.py                ✅ FastAPI app + DB tables
+├── requirements.txt       ✅
+├── Dockerfile             ✅
+└── README.md              ✅ Complete documentation
+```
+
+**Port:** 7011  
+**Status:** ✅ Working  
+**Features:**
+- Mock login (3 test users)
+- Session tokens (7-day expiry)
+- API keys with optional expiration
+- ActorContext helper for other services
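
The expiry rules in the feature list can be sketched as follows. This is a minimal illustration, not the code in `models.py`: the field names and the `issue_session` / `is_expired` helpers are assumptions, and only the 7-day session lifetime and the optional API-key expiration come from the plan.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_TTL = timedelta(days=7)  # 7-day expiry, as stated in the plan


@dataclass
class SessionToken:
    # Illustrative fields only; the real model lives in models.py.
    token: str
    actor_id: str
    expires_at: datetime


def issue_session(actor_id: str, now: Optional[datetime] = None) -> SessionToken:
    """Create a session token that expires SESSION_TTL after `now`."""
    now = now or datetime.now(timezone.utc)
    return SessionToken(
        token=secrets.token_urlsafe(32),
        actor_id=actor_id,
        expires_at=now + SESSION_TTL,
    )


def is_expired(expires_at: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """Return True once `now` reaches `expires_at`; None means no expiration."""
    if expires_at is None:
        return False  # e.g. an API key created without an expiration
    now = now or datetime.now(timezone.utc)
    return now >= expires_at
```

Passing `now` explicitly keeps the check deterministic in tests; the same `is_expired` helper covers both session tokens and API keys whose expiration is optional.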
+
+---
+
+### 2. **pdp-service** (8 files) 🔄 20% COMPLETE
+```
+services/pdp-service/
+├── models.py              ✅ PolicyRequest, PolicyDecision
+├── engine.py              🔜 Policy evaluation logic
+├── policy_store.py        🔜 Config-based policy storage
+├── main.py                🔜 FastAPI app
+├── config.yaml            🔜 microDAO/channel policies
+├── requirements.txt       🔜
+├── Dockerfile             🔜
+└── README.md              🔜 Complete documentation
+```
+
+**Port:** 7012  
+**Purpose:** Centralized Policy Decision Point
+
+**Key Features:**
+- Evaluate access requests (actor + action + resource)
+- Config-based policies (v1)
+- Support for:
+  - MicroDAO access (owner/admin/member)
+  - Channel access (SEND_MESSAGE, READ)
+  - Tool execution (EXEC_TOOL)
+  - Agent management (MANAGE)
+  - Usage viewing (VIEW_USAGE)
+
+**Policy Types:**
+
+#### MicroDAO Policies
+```yaml
+microdao_policies:
+  - microdao_id: "microdao:daarion"
+    owners: ["user:1"]
+    admins: ["user:1", "user:93"]
+    members: ["user:*"]  # All users
+```
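
One possible way to resolve an actor's role from a policy entry like the one above is sketched here. It assumes that `user:*` is a glob-style wildcard (the plan only annotates it as "All users") and that the strongest matching role wins; `resolve_role` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Mirrors the YAML entry above after loading it into a dict.
MICRODAO_POLICY = {
    "microdao_id": "microdao:daarion",
    "owners": ["user:1"],
    "admins": ["user:1", "user:93"],
    "members": ["user:*"],
}

# Checked from strongest to weakest role.
_ROLE_KEYS = (("owner", "owners"), ("admin", "admins"), ("member", "members"))


def resolve_role(actor_id: str, policy: dict) -> Optional[str]:
    """Return the strongest role whose pattern list matches actor_id, else None."""
    for role, key in _ROLE_KEYS:
        patterns = policy.get(key, [])
        if any(fnmatchcase(actor_id, pattern) for pattern in patterns):
            return role
    return None
```

With this entry, `user:1` resolves to `owner`, `user:93` to `admin`, any other `user:` id to `member`, and an agent id such as `agent:sofia` to no role at all.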
+
+#### Channel Policies
+```yaml
+channel_policies:
+  - channel_id: "channel-uuid-123"
+    microdao_id: "microdao:daarion"
+    allowed_roles: ["member", "admin", "owner"]
+    blocked_users: []
+```
+
+#### Tool Policies
+```yaml
+tool_policies:
+  - tool_id: "projects.list"
+    allowed_agents: ["agent:sofia", "agent:pm"]
+    allowed_user_roles: ["admin", "owner"]
+```
+
+**Policy Evaluation Logic:**
+
+```python
+# Helpers (permit, deny, is_microdao_owner, is_member, is_blocked, ...) are
+# expected to live in engine.py / policy_store.py. get_tool_policy is a
+# placeholder name for the tool-policy lookup.
+def evaluate(request: PolicyRequest) -> PolicyDecision:
+    actor = request.actor
+    resource = request.resource
+
+    # 1. System Admin bypass (careful!)
+    if "system_admin" in actor.roles:
+        return permit("system_admin")
+
+    # 2. Resource-specific rules
+    if resource.type == "microdao":
+        if is_microdao_owner(actor, resource):
+            return permit("microdao_owner")
+        if is_microdao_admin(actor, resource):
+            return permit("microdao_admin")
+        if request.action == "read" and is_member(actor, resource):
+            return permit("member")
+        return deny("not_authorized")
+
+    if resource.type == "channel":
+        if not is_channel_member(actor, resource):
+            return deny("not_channel_member")
+        if request.action == "send_message":
+            if is_blocked(actor, resource):
+                return deny("blocked")
+            return permit("channel_member")
+        # Other channel actions fall through to the default deny below.
+
+    if resource.type == "tool":
+        tool = get_tool_policy(resource.id)
+        if tool is not None and actor.actor_id in tool.allowed_agents:
+            return permit("allowed_agent")
+        return deny("tool_not_allowed")
+
+    # Default deny
+    return deny("no_matching_policy")
+```
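
The evaluation logic above relies on `permit(...)` and `deny(...)` helpers that the plan does not spell out. A minimal sketch is given below, assuming `PolicyDecision` carries the `effect` and `reason` fields used elsewhere in this plan (`decision.effect`, `decision.reason`); the actual model in `pdp-service/models.py` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDecision:
    # Assumed shape, inferred from how decisions are read in this plan.
    effect: str  # "permit" or "deny"
    reason: str  # short machine-readable reason, e.g. "microdao_owner"


def permit(reason: str) -> PolicyDecision:
    """Build a permit decision with the given reason."""
    return PolicyDecision(effect="permit", reason=reason)


def deny(reason: str) -> PolicyDecision:
    """Build a deny decision with the given reason."""
    return PolicyDecision(effect="deny", reason=reason)
```

Keeping the reason on every decision means the audit log can record why a request was permitted or denied without re-running the evaluation.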
+
+---
+
+### 3. **usage-engine** (8 files) 🔜 0% COMPLETE
+```
+services/usage-engine/
+├── models.py              🔜 LlmUsageEvent, ToolUsageEvent
+├── collectors.py          🔜 NATS listeners
+├── aggregators.py         🔜 Aggregate stats
+├── reporters.py           🔜 API endpoints
+├── main.py                🔜 FastAPI app
+├── requirements.txt       🔜
+├── Dockerfile             🔜
+└── README.md              🔜 Complete documentation
+```
+
+**Port:** 7013  
+**Purpose:** Usage tracking & billing foundation
+
+**NATS Subjects:**
+- `usage.llm` — LLM calls (from llm-proxy)
+- `usage.tool` — Tool executions (from toolcore)
+- `usage.agent` — Agent invocations (from agent-runtime)
+
+**Events:**
+
+#### LLM Usage Event
+```json
+{
+  "event_id": "evt-123",
+  "timestamp": "2025-11-24T12:34:56Z",
+  "actor": {
+    "actor_id": "user:93",
+    "actor_type": "human",
+    "microdao_ids": ["microdao:7"]
+  },
+  "agent_id": "agent:sofia",
+  "microdao_id": "microdao:7",
+  "model": "gpt-4.1-mini",
+  "provider": "openai",
+  "prompt_tokens": 1234,
+  "completion_tokens": 567,
+  "total_tokens": 1801,
+  "latency_ms": 2345,
+  "cost_usd": 0.0234
+}
+```
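
A collector would need to turn this payload into a typed event before storing it. The sketch below is one way to do that with the standard library only; `parse_llm_usage` is a hypothetical helper, the class covers only a subset of the fields, and the consistency check on `total_tokens` is an added assumption rather than a stated requirement.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class LlmUsageEvent:
    event_id: str
    actor_id: str
    agent_id: Optional[str]
    microdao_id: Optional[str]
    model: str
    provider: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost_usd: Decimal


def parse_llm_usage(raw: str) -> LlmUsageEvent:
    """Parse a usage.llm JSON payload and check that the token counts add up."""
    # parse_float=Decimal avoids binary float rounding on cost values.
    data = json.loads(raw, parse_float=Decimal)
    event = LlmUsageEvent(
        event_id=data["event_id"],
        actor_id=data["actor"]["actor_id"],
        agent_id=data.get("agent_id"),
        microdao_id=data.get("microdao_id"),
        model=data["model"],
        provider=data["provider"],
        prompt_tokens=int(data["prompt_tokens"]),
        completion_tokens=int(data["completion_tokens"]),
        total_tokens=int(data["total_tokens"]),
        cost_usd=Decimal(str(data["cost_usd"])),
    )
    if event.total_tokens != event.prompt_tokens + event.completion_tokens:
        raise ValueError("total_tokens must equal prompt_tokens + completion_tokens")
    return event
```

Parsing the cost as `Decimal` matches the `DECIMAL(10, 6)` column planned for `usage_llm`.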
+
+#### Tool Usage Event
+```json
+{
+  "event_id": "evt-456",
+  "timestamp": "2025-11-24T12:35:00Z",
+  "actor": {
+    "actor_id": "agent:sofia",
+    "actor_type": "agent"
+  },
+  "agent_id": "agent:sofia",
+  "microdao_id": "microdao:7",
+  "tool_id": "projects.list",
+  "success": true,
+  "latency_ms": 123,
+  "result_size_bytes": 4567
+}
+```
+
+**API Endpoints:**
+
+```http
+GET /internal/usage/summary?microdao_id=microdao:7&period=24h
+→ Aggregate stats (tokens, calls, cost)
+
+GET /internal/usage/agents?microdao_id=microdao:7&period=7d
+→ Top agents by usage
+
+GET /internal/usage/models?period=24h
+→ Model distribution
+
+GET /internal/usage/costs?microdao_id=microdao:7&period=30d
+→ Cost breakdown
+```
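
The `period` query parameter in these endpoints uses values such as `24h`, `7d`, and `30d`. A small parser along the following lines could convert it into a time window; `parse_period` is a hypothetical helper, and restricting the units to hours and days is an assumption based on the examples shown.

```python
import re
from datetime import timedelta

# "<positive integer><unit>", where unit is h (hours) or d (days).
_PERIOD_RE = re.compile(r"(\d+)([hd])")
_UNIT_TO_KWARG = {"h": "hours", "d": "days"}


def parse_period(period: str) -> timedelta:
    """Convert a period string such as '24h' or '7d' into a timedelta."""
    match = _PERIOD_RE.fullmatch(period)
    if match is None:
        raise ValueError(f"invalid period: {period!r}")
    amount = int(match.group(1))
    if amount == 0:
        raise ValueError("period must be greater than zero")
    return timedelta(**{_UNIT_TO_KWARG[match.group(2)]: amount})
```

A handler could then compute the lower bound of the query as the current time minus `parse_period(period)` and reject malformed values with a client error.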
+
+**Database Tables:**
+
+```sql
+CREATE TABLE usage_llm (
+    id UUID PRIMARY KEY,
+    timestamp TIMESTAMPTZ NOT NULL,
+    actor_id TEXT NOT NULL,
+    agent_id TEXT,
+    microdao_id TEXT,
+    model TEXT NOT NULL,
+    provider TEXT NOT NULL,
+    prompt_tokens INT NOT NULL,
+    completion_tokens INT NOT NULL,
+    total_tokens INT NOT NULL,
+    latency_ms INT,
+    cost_usd DECIMAL(10, 6)
+);
+
+CREATE TABLE usage_tool (
+    id UUID PRIMARY KEY,
+    timestamp TIMESTAMPTZ NOT NULL,
+    actor_id TEXT NOT NULL,
+    agent_id TEXT,
+    microdao_id TEXT,
+    tool_id TEXT NOT NULL,
+    success BOOLEAN NOT NULL,
+    latency_ms INT,
+    result_size_bytes INT
+);
+
+-- Indexes for fast queries
+CREATE INDEX idx_usage_llm_microdao_time ON usage_llm(microdao_id, timestamp DESC);
+CREATE INDEX idx_usage_llm_agent ON usage_llm(agent_id, timestamp DESC);
+CREATE INDEX idx_usage_tool_microdao ON usage_tool(microdao_id, timestamp DESC);
+```
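
The summary endpoint is described as returning aggregate tokens, calls, and cost. The sketch below shows that aggregation in plain Python over rows shaped like `usage_llm`; it is an in-memory illustration only (a real aggregator would more likely run a grouped SQL query), and `summarize_llm_usage` is a hypothetical name.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable


def _empty_bucket() -> dict:
    return {"calls": 0, "total_tokens": 0, "cost_usd": Decimal("0")}


def summarize_llm_usage(rows: Iterable[dict]) -> Dict[str, dict]:
    """Group usage_llm-shaped rows by microdao_id and sum calls, tokens, cost."""
    summary: Dict[str, dict] = defaultdict(_empty_bucket)
    for row in rows:
        bucket = summary[row["microdao_id"]]
        bucket["calls"] += 1
        bucket["total_tokens"] += row["total_tokens"]
        # cost_usd is nullable in the table definition, so treat None as zero.
        bucket["cost_usd"] += row.get("cost_usd") or Decimal("0")
    return dict(summary)
```

Summing `Decimal` values keeps the cost totals exact, which matters once these numbers feed into billing.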
+
+---
+
+### 4. **PEP Integration** (3 services) 🔜 0% COMPLETE
+
+#### 4.1 messaging-service PEP
+**File:** `services/messaging-service/pep_middleware.py`
+
+```python
+import asyncpg
+
+from auth_service_client import get_actor_context
+from pdp_service_client import evaluate_policy
+
+async def check_send_message_permission(
+    actor_id: str,
+    channel_id: str,
+    db_pool: asyncpg.Pool
+) -> bool:
+    """Check if actor can send message to channel"""
+    
+    # 1. Get actor context
+    actor = await get_actor_context(actor_id, db_pool)
+    
+    # 2. Evaluate policy
+    decision = await evaluate_policy(
+        actor=actor,
+        action="send_message",
+        resource={"type": "channel", "id": channel_id}
+    )
+    
+    # 3. Return decision
+    return decision.effect == "permit"
+```
+
+**Integration Points:**
+- `POST /api/messaging/channels/{channel_id}/messages` — check before send
+- `POST /api/messaging/channels` — check MANAGE permission
+- `POST /api/messaging/channels/{channel_id}/members` — check INVITE permission
+
+#### 4.2 agent-runtime PEP
+**File:** `services/agent-runtime/pep_client.py`
+
+```python
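+# ActorIdentity is the model defined in services/auth-service/models.py; how
+# it is shared with agent-runtime is not specified in this plan.
+from pdp_service_client import evaluate_policy
+
+async def check_tool_execution_permission(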
+    agent_id: str,
+    tool_id: str,
+    microdao_id: str
+) -> bool:
+    """Check if agent can execute tool"""
+    
+    # Build agent actor
+    actor = ActorIdentity(
+        actor_id=agent_id,
+        actor_type="agent",
+        microdao_ids=[microdao_id],
+        roles=["agent"]
+    )
+    
+    # Evaluate
+    decision = await evaluate_policy(
+        actor=actor,
+        action="exec_tool",
+        resource={"type": "tool", "id": tool_id}
+    )
+    
+    return decision.effect == "permit"
+```
+
+**Integration:** Before calling toolcore in `handle_invocation()`
+
+#### 4.3 toolcore PEP
+**Already has:** `allowed_agents` in registry  
+**Additional:** Cross-check with PDP for user-initiated tool calls
+
+---
+
+### 5. **Audit Log** (1 migration) 🔜 0% COMPLETE
+
+**File:** `migrations/004_create_security_audit.sql`
+
+```sql
+CREATE TABLE security_audit (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    actor_id TEXT NOT NULL,
+    actor_type TEXT NOT NULL,
+    action TEXT NOT NULL,
+    resource_type TEXT NOT NULL,
+    resource_id TEXT NOT NULL,
+    decision TEXT NOT NULL,  -- permit/deny
+    reason TEXT,
+    context JSONB,
+    ip_address INET,
+    user_agent TEXT
+);
+
+CREATE INDEX idx_audit_timestamp ON security_audit(timestamp DESC);
+CREATE INDEX idx_audit_actor ON security_audit(actor_id, timestamp DESC);
+CREATE INDEX idx_audit_decision ON security_audit(decision, timestamp DESC);
+CREATE INDEX idx_audit_resource ON security_audit(resource_type, resource_id);
+```
+
+**PDP Integration:**
+After every `evaluate()` call, write to audit log:
+
+```python
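+import json
+from typing import Optional
+
+# `db` is assumed to be the service's asyncpg pool or connection.
+async def log_audit_event(
+    request: PolicyRequest,
+    decision: PolicyDecision,
+    context: Optional[dict] = None
+):
+    """Write audit log entry"""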
+    await db.execute("""
+        INSERT INTO security_audit
+        (actor_id, actor_type, action, resource_type, resource_id, 
+         decision, reason, context)
+        VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+    """,
+        request.actor.actor_id,
+        request.actor.actor_type,
+        request.action,
+        request.resource.type,
+        request.resource.id,
+        decision.effect,
+        decision.reason,
+        json.dumps(context or {})
+    )
+```
+
+**NATS Security Events:**
+- `security.suspicious` — Publish on:
+  - Multiple deny events (>5 in 1 min)
+  - Unusual tool execution attempts
+  - Privilege escalation attempts
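
The first trigger above (more than 5 deny events in 1 minute) can be detected with a sliding window. The sketch below assumes the count is kept per actor, which the plan does not state, and leaves the actual NATS publish to the caller; `DenyRateMonitor` is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class DenyRateMonitor:
    """Track deny events per actor within a sliding time window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._denies: Dict[str, Deque[float]] = defaultdict(deque)

    def record_deny(self, actor_id: str, now: float) -> bool:
        """Record a deny at time `now` (in seconds).

        Returns True when the actor has more than `threshold` denies inside
        the window, i.e. when a security.suspicious event should be published.
        """
        events = self._denies[actor_id]
        events.append(now)
        # Drop events that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        return len(events) > self.threshold
```

The PDP could call `record_deny(actor_id, time.monotonic())` after each deny decision and publish to `security.suspicious` whenever it returns `True`.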
+
+---
+
+### 6. **Infrastructure** (3 files) 🔜 0% COMPLETE
+
+#### 6.1 docker-compose.phase4.yml
+```yaml
+services:
+  auth-service:
+    build: ./services/auth-service
+    ports: ["7011:7011"]
+    environment:
+      - DATABASE_URL=postgresql://...
+  
+  pdp-service:
+    build: ./services/pdp-service
+    ports: ["7012:7012"]
+    environment:
+      - DATABASE_URL=postgresql://...
+  
+  usage-engine:
+    build: ./services/usage-engine
+    ports: ["7013:7013"]
+    environment:
+      - DATABASE_URL=postgresql://...
+      - NATS_URL=nats://nats:4222
+  
+  # + All Phase 3 services
+  llm-proxy:
+    environment:
+      - AUTH_SERVICE_URL=http://auth-service:7011
+  
+  # etc...
+```
+
+#### 6.2 scripts/start-phase4.sh
+#### 6.3 scripts/stop-phase4.sh
+
+---
+
+### 7. **Documentation** (4 files) 🔜 0% COMPLETE
+
+#### 7.1 docs/AUTH_SERVICE_SPEC.md
+- Actor model
+- Session management
+- API keys
+- Integration guide
+
+#### 7.2 docs/PDP_SPEC.md
+- Policy model
+- Evaluation logic
+- Policy configuration
+- Adding new rules
+
+#### 7.3 docs/USAGE_ENGINE_SPEC.md
+- Event model
+- NATS integration
+- Aggregation queries
+- Billing foundation
+
+#### 7.4 PHASE4_READY.md
+- Overview
+- Quick start
+- Testing guide
+- Production readiness
+
+---
+
+## 📊 IMPLEMENTATION ROADMAP
+
+### Week 1: Core Services
+- ✅ auth-service (complete)
+- 🔄 pdp-service (20% → 100%)
+- 🔜 usage-engine (0% → 100%)
+
+### Week 2: Integration
+- 🔜 PEP hooks (messaging-service)
+- 🔜 PEP hooks (agent-runtime)
+- 🔜 PEP hooks (toolcore)
+
+### Week 3: Audit & Testing
+- 🔜 Audit log migration
+- 🔜 Security events (NATS)
+- 🔜 E2E testing
+
+### Week 4: Documentation & Polish
+- 🔜 All docs (4 files)
+- 🔜 docker-compose
+- 🔜 Scripts
+- 🔜 PHASE4_READY.md
+
+---
+
+## 🎯 ACCEPTANCE CRITERIA
+
+### Auth Service: ✅
+- [x] Login works with mock users
+- [x] Session tokens created & validated
+- [x] API keys CRUD functional
+- [x] actor_context helper ready
+
+### PDP Service: 🔜
+- [ ] /internal/pdp/evaluate works
+- [ ] MicroDAO access rules
+- [ ] Channel access rules
+- [ ] Tool execution rules
+- [ ] 10+ unit tests
+
+### PEP Integration: 🔜
+- [ ] messaging-service blocks unauthorized sends
+- [ ] agent-runtime checks tool permissions
+- [ ] toolcore enforces allowed_agents
+
+### Usage Engine: 🔜
+- [ ] usage.llm events collected
+- [ ] usage.tool events collected
+- [ ] /internal/usage/summary works
+- [ ] Database tables created
+
+### Audit Log: 🔜
+- [ ] security_audit table exists
+- [ ] PDP writes every decision
+- [ ] Can query last 100 events
+- [ ] security.suspicious events published
+
+### Infrastructure: 🔜
+- [ ] docker-compose.phase4.yml works
+- [ ] All services healthy
+- [ ] Start/stop scripts functional
+- [ ] Documentation complete
+
+---
+
+## 🚀 QUICK START (After Complete)
+
+```bash
+# 1. Start Phase 4
+./scripts/start-phase4.sh
+
+# 2. Test Auth
+curl -X POST http://localhost:7011/auth/login \
+  -H "Content-Type: application/json" \
+  -d '{"email": "user@daarion.city"}'
+
+# 3. Test PDP
+curl -X POST http://localhost:7012/internal/pdp/evaluate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "actor": {...},
+    "action": "send_message",
+    "resource": {"type": "channel", "id": "..."}
+  }'
+
+# 4. Check Usage
+curl "http://localhost:7013/internal/usage/summary?period=24h"
+
+# 5. View Audit
+docker exec daarion-postgres psql -U postgres -d daarion \
+  -c "SELECT * FROM security_audit ORDER BY timestamp DESC LIMIT 10;"
+```
+
+---
+
+## 🔜 AFTER PHASE 4
+
+### Phase 5: Advanced Features
+- Real Passkey integration
+- OAuth2 providers
+- Advanced policy language (ABAC)
+- Dynamic policy updates
+- Cost allocation & billing
+- Security analytics dashboard
+
+### Phase 6: Production Hardening
+- Rate limiting (Redis)
+- DDoS protection
+- Penetration testing
+- Security audit
+- Compliance certification
+
+---
+
+## 📚 RESOURCES
+
+**Specs:**
+- Phase 4 Master Task (user-provided)
+- [PHASE4_STARTED.md](../PHASE4_STARTED.md)
+
+**Related:**
+- [PHASE3_IMPLEMENTATION_COMPLETE.md](../PHASE3_IMPLEMENTATION_COMPLETE.md)
+- [ALL_PHASES_STATUS.md](../ALL_PHASES_STATUS.md)
+
+**Standards:**
+- RBAC (Role-Based Access Control)
+- ABAC (Attribute-Based Access Control)
+- OAuth 2.0 / OpenID Connect
+- Audit logging best practices
+
+---
+
+**Status:** 📋 Detailed Plan Complete  
+**Next:** Continue Implementation  
+**Version:** 1.0.0  
+**Last Updated:** 2025-11-24
+
+
+
+