feat: Add presence heartbeat for Matrix online status

- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
Apple
2025-11-27 00:19:40 -08:00
parent 5bed515852
commit 3de3c8cb36
6371 changed files with 1317450 additions and 932 deletions

@@ -0,0 +1,606 @@
# 📋 PHASE 4: SECURITY LAYER — Detailed Plan

**Goal:** A complete security layer for DAARION

**Timeline:** 4-6 weeks (or 3-4 hours if automated)

**Dependencies:** Phases 1-3 complete

---

## 🎯 OVERVIEW

Phase 4 adds critical security infrastructure:
```
┌─────────────────────────────────────────┐
│ SECURITY LAYER (Phase 4) │
├─────────────────────────────────────────┤
│ │
│ 1. AUTH SERVICE │
│ └─ Identity & Sessions │
│ │
│ 2. PDP SERVICE (Policy Decision) │
│ └─ Centralized access control │
│ │
│ 3. PEP HOOKS (Policy Enforcement) │
│ └─ Enforce decisions in services │
│ │
│ 4. USAGE ENGINE │
│ └─ Track LLM/Tools/Agent usage │
│ │
│ 5. AUDIT LOG │
│ └─ Security events & compliance │
│ │
└─────────────────────────────────────────┘
```
---
## 📦 DELIVERABLES (40+ files)
### 1. **auth-service** (8 files) ✅ COMPLETE
```
services/auth-service/
├── models.py ✅ ActorIdentity, SessionToken, ApiKey
├── actor_context.py ✅ build_actor_context, require_actor
├── routes_sessions.py ✅ /auth/login, /me, /logout
├── routes_api_keys.py ✅ /auth/api-keys CRUD
├── main.py ✅ FastAPI app + DB tables
├── requirements.txt ✅
├── Dockerfile ✅
└── README.md ✅ Complete documentation
```
**Port:** 7011

**Status:** ✅ Working

**Features:**
- Mock login (3 test users)
- Session tokens (7-day expiry)
- API keys with optional expiration
- ActorContext helper for other services
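
The sketch below covers the expiry rules listed above: 7-day session tokens and API keys with optional expiration. It is a minimal version that assumes opaque random tokens, with the stored expiry checked on every request. `IssuedToken`, `issue_session_token` and `issue_api_key` are hypothetical names, not the actual contents of `models.py` or `routes_api_keys.py`:

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_TTL = timedelta(days=7)  # 7-day session expiry, as in the plan


@dataclass
class IssuedToken:
    """Illustrative token record (not the real SessionToken/ApiKey models)."""
    token: str
    actor_id: str
    expires_at: Optional[datetime]  # None = never expires (API keys only)

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return self.expires_at is None or now < self.expires_at


def issue_session_token(actor_id: str, now: Optional[datetime] = None) -> IssuedToken:
    now = now or datetime.now(timezone.utc)
    # 32 random bytes, URL-safe base64 encoded
    return IssuedToken(secrets.token_urlsafe(32), actor_id, now + SESSION_TTL)


def issue_api_key(actor_id: str, ttl: Optional[timedelta] = None,
                  now: Optional[datetime] = None) -> IssuedToken:
    now = now or datetime.now(timezone.utc)
    expires_at = now + ttl if ttl is not None else None  # expiration is optional
    return IssuedToken(secrets.token_urlsafe(32), actor_id, expires_at)
```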
---
### 2. **pdp-service** (8 files) 🔄 20% COMPLETE
```
services/pdp-service/
├── models.py ✅ PolicyRequest, PolicyDecision
├── engine.py 🔜 Policy evaluation logic
├── policy_store.py 🔜 Config-based policy storage
├── main.py 🔜 FastAPI app
├── config.yaml 🔜 microDAO/channel policies
├── requirements.txt 🔜
├── Dockerfile 🔜
└── README.md 🔜 Complete documentation
```
**Port:** 7012

**Purpose:** Centralized Policy Decision Point

**Key Features:**
- Evaluate access requests (actor + action + resource)
- Config-based policies (v1)
- Support for:
  - MicroDAO access (owner/admin/member)
  - Channel access (SEND_MESSAGE, READ)
  - Tool execution (EXEC_TOOL)
  - Agent management (MANAGE)
  - Usage viewing (VIEW_USAGE)

**Policy Types:**
#### MicroDAO Policies
```yaml
microdao_policies:
  - microdao_id: "microdao:daarion"
    owners: ["user:1"]
    admins: ["user:1", "user:93"]
    members: ["user:*"]  # All users
```
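The sketch below shows one possible reading of the wildcard in `members: ["user:*"]`. It assumes shell-style matching with Python's `fnmatch.fnmatchcase` and that the strongest matching role wins. `resolve_microdao_role` and `ROLE_ORDER` are illustrative names, not part of the pdp-service code:

```python
from fnmatch import fnmatchcase
from typing import Optional

# Strongest role first; the first list that matches the actor wins.
ROLE_ORDER = ("owner", "admin", "member")


def resolve_microdao_role(actor_id: str, policy: dict) -> Optional[str]:
    """Return the strongest role an actor holds in one microDAO policy entry, or None."""
    for role in ROLE_ORDER:
        patterns = policy.get(role + "s", [])  # "owners", "admins", "members"
        if any(fnmatchcase(actor_id, p) for p in patterns):
            return role
    return None


policy = {
    "microdao_id": "microdao:daarion",
    "owners": ["user:1"],
    "admins": ["user:1", "user:93"],
    "members": ["user:*"],  # all users
}
```

`fnmatchcase` treats `*`, `?` and `[...]` as wildcards. This approach therefore assumes that actor IDs never contain those characters.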
#### Channel Policies
```yaml
channel_policies:
  - channel_id: "channel-uuid-123"
    microdao_id: "microdao:daarion"
    allowed_roles: ["member", "admin", "owner"]
    blocked_users: []
```
#### Tool Policies
```yaml
tool_policies:
  - tool_id: "projects.list"
    allowed_agents: ["agent:sofia", "agent:pm"]
    allowed_user_roles: ["admin", "owner"]
```
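The evaluation logic below only consults `allowed_agents`. The sketch here is a tool check that also honours `allowed_user_roles`. It assumes the caller has already resolved the human actor's role in the relevant microDAO. `TOOL_POLICIES` and `can_exec_tool` are hypothetical:

```python
from typing import Optional

# In-memory mirror of the tool_policies config above (illustrative only).
TOOL_POLICIES = {
    "projects.list": {
        "allowed_agents": ["agent:sofia", "agent:pm"],
        "allowed_user_roles": ["admin", "owner"],
    },
}


def can_exec_tool(tool_id: str, actor_id: str, actor_type: str,
                  microdao_role: Optional[str] = None) -> bool:
    """Deny by default; agents are matched by id, humans by their microDAO role."""
    policy = TOOL_POLICIES.get(tool_id)
    if policy is None:
        return False  # unknown tool: no matching policy
    if actor_type == "agent":
        return actor_id in policy["allowed_agents"]
    return microdao_role in policy["allowed_user_roles"]
```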
**Policy Evaluation Logic:**
```python
def evaluate(request: PolicyRequest) -> PolicyDecision:
    # permit()/deny() build a PolicyDecision; the is_*() helpers and
    # get_tool_policy() read from policy_store (names illustrative).
    actor, resource = request.actor, request.resource

    # 1. System admin bypass (use sparingly: it skips every rule below)
    if "system_admin" in actor.roles:
        return permit("system_admin")

    # 2. Resource-specific rules
    if resource.type == "microdao":
        if is_microdao_owner(actor, resource):
            return permit("microdao_owner")
        if is_microdao_admin(actor, resource):
            return permit("microdao_admin")
        if request.action == "read" and is_member(actor, resource):
            return permit("member")
        return deny("not_authorized")

    if resource.type == "channel":
        if not is_channel_member(actor, resource):
            return deny("not_channel_member")
        if request.action == "send_message" and is_blocked(actor, resource):
            return deny("blocked")
        # Members may both read and send; other actions need explicit rules
        if request.action in ("read", "send_message"):
            return permit("channel_member")
        return deny("action_not_allowed")

    if resource.type == "tool":
        tool = get_tool_policy(resource.id)  # hypothetical policy_store lookup
        if tool is not None and actor.actor_id in tool.allowed_agents:
            return permit("allowed_agent")
        return deny("tool_not_allowed")

    # 3. Default deny
    return deny("no_matching_policy")
```
---
### 3. **usage-engine** (8 files) 🔜 0% COMPLETE
```
services/usage-engine/
├── models.py 🔜 LlmUsageEvent, ToolUsageEvent
├── collectors.py 🔜 NATS listeners
├── aggregators.py 🔜 Aggregate stats
├── reporters.py 🔜 API endpoints
├── main.py 🔜 FastAPI app
├── requirements.txt 🔜
├── Dockerfile 🔜
└── README.md 🔜 Complete documentation
```
**Port:** 7013

**Purpose:** Usage tracking & billing foundation

**NATS Subjects:**
- `usage.llm` — LLM calls (from llm-proxy)
- `usage.tool` — Tool executions (from toolcore)
- `usage.agent` — Agent invocations (from agent-runtime)

**Events:**
#### LLM Usage Event
```json
{
"event_id": "evt-123",
"timestamp": "2025-11-24T12:34:56Z",
"actor": {
"actor_id": "user:93",
"actor_type": "human",
"microdao_ids": ["microdao:7"]
},
"agent_id": "agent:sofia",
"microdao_id": "microdao:7",
"model": "gpt-4.1-mini",
"provider": "openai",
"prompt_tokens": 1234,
"completion_tokens": 567,
"total_tokens": 1801,
"latency_ms": 2345,
"cost_usd": 0.0234
}
```
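The sketch below shows how a collector might parse and sanity-check this event before inserting it into `usage_llm`. It assumes the JSON shape above, and the dataclass mirrors the example fields. Rejecting events where `total_tokens` differs from `prompt_tokens + completion_tokens` is a suggested check, not something the plan specifies:

```python
import json
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class LlmUsageEvent:
    event_id: str
    timestamp: datetime
    actor_id: str
    agent_id: Optional[str]
    microdao_id: Optional[str]
    model: str
    provider: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    latency_ms: Optional[int]
    cost_usd: Optional[Decimal]


def parse_llm_usage(raw: bytes) -> LlmUsageEvent:
    # parse_float=Decimal keeps cost_usd exact, matching the DECIMAL(10, 6) column
    d = json.loads(raw, parse_float=Decimal)
    prompt, completion = d["prompt_tokens"], d["completion_tokens"]
    if d["total_tokens"] != prompt + completion:
        raise ValueError(f"total_tokens {d['total_tokens']} != {prompt} + {completion}")
    return LlmUsageEvent(
        event_id=d["event_id"],
        # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+
        timestamp=datetime.fromisoformat(d["timestamp"].replace("Z", "+00:00")),
        actor_id=d["actor"]["actor_id"],
        agent_id=d.get("agent_id"),
        microdao_id=d.get("microdao_id"),
        model=d["model"],
        provider=d["provider"],
        prompt_tokens=prompt,
        completion_tokens=completion,
        total_tokens=d["total_tokens"],
        latency_ms=d.get("latency_ms"),
        cost_usd=d.get("cost_usd"),
    )
```

Parsing floats as `Decimal` avoids binary rounding errors in `cost_usd` before the value reaches the database.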
#### Tool Usage Event
```json
{
"event_id": "evt-456",
"timestamp": "2025-11-24T12:35:00Z",
"actor": {
"actor_id": "agent:sofia",
"actor_type": "agent"
},
"agent_id": "agent:sofia",
"microdao_id": "microdao:7",
"tool_id": "projects.list",
"success": true,
"latency_ms": 123,
"result_size_bytes": 4567
}
```
**API Endpoints:**
```http
GET /internal/usage/summary?microdao_id=microdao:7&period=24h
Aggregate stats (tokens, calls, cost)
GET /internal/usage/agents?microdao_id=microdao:7&period=7d
Top agents by usage
GET /internal/usage/models?period=24h
Model distribution
GET /internal/usage/costs?microdao_id=microdao:7&period=30d
Cost breakdown
```
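All four endpoints take a `period` parameter such as `24h`, `7d` or `30d`. The sketch below is a minimal parser for that format and assumes only hour (`h`) and day (`d`) units. `parse_period` and `window_start` are hypothetical helper names:

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_PERIOD_RE = re.compile(r"(\d+)([hd])")  # e.g. "24h", "7d", "30d"
_UNITS = {"h": "hours", "d": "days"}


def parse_period(period: str) -> timedelta:
    """Convert '24h' / '7d' / '30d' into a timedelta; reject anything else."""
    m = _PERIOD_RE.fullmatch(period.strip())
    if m is None or int(m.group(1)) == 0:
        raise ValueError(f"invalid period: {period!r}")
    return timedelta(**{_UNITS[m.group(2)]: int(m.group(1))})


def window_start(period: str, now: Optional[datetime] = None) -> datetime:
    """Lower bound for a `WHERE timestamp >= $start` filter."""
    now = now or datetime.now(timezone.utc)
    return now - parse_period(period)
```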
**Database Tables:**
```sql
CREATE TABLE usage_llm (
id UUID PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL,
actor_id TEXT NOT NULL,
agent_id TEXT,
microdao_id TEXT,
model TEXT NOT NULL,
provider TEXT NOT NULL,
prompt_tokens INT NOT NULL,
completion_tokens INT NOT NULL,
total_tokens INT NOT NULL,
latency_ms INT,
cost_usd DECIMAL(10, 6)
);
CREATE TABLE usage_tool (
id UUID PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL,
actor_id TEXT NOT NULL,
agent_id TEXT,
microdao_id TEXT,
tool_id TEXT NOT NULL,
success BOOLEAN NOT NULL,
latency_ms INT,
result_size_bytes INT
);
-- Indexes for fast queries
CREATE INDEX idx_usage_llm_microdao_time ON usage_llm(microdao_id, timestamp DESC);
CREATE INDEX idx_usage_llm_agent ON usage_llm(agent_id, timestamp DESC);
CREATE INDEX idx_usage_tool_microdao ON usage_tool(microdao_id, timestamp DESC);
```
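The sketch below shows the aggregation behind `GET /internal/usage/summary`, written against the `usage_llm` columns above. To stay self-contained, it runs on an in-memory SQLite database as a stand-in for PostgreSQL and uses ISO-8601 strings in place of `TIMESTAMPTZ`. The real service would use `$1`/`$2` placeholders instead of `?`. The first sample row is the LLM usage event from this plan; the second is made-up illustrative data:

```python
import sqlite3

# Stand-in for the PostgreSQL summary query (SQLite uses "?" placeholders).
SUMMARY_SQL = """
SELECT COUNT(*)                       AS calls,
       COALESCE(SUM(total_tokens), 0) AS tokens,
       COALESCE(SUM(cost_usd), 0)     AS cost_usd
FROM usage_llm
WHERE microdao_id = ? AND timestamp >= ?
"""


def usage_summary(conn: sqlite3.Connection, microdao_id: str, since_iso: str) -> dict:
    calls, tokens, cost = conn.execute(SUMMARY_SQL, (microdao_id, since_iso)).fetchone()
    return {"calls": calls, "tokens": tokens, "cost_usd": round(cost, 6)}


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE usage_llm (
    id TEXT PRIMARY KEY, timestamp TEXT NOT NULL, actor_id TEXT NOT NULL,
    agent_id TEXT, microdao_id TEXT, model TEXT NOT NULL, provider TEXT NOT NULL,
    prompt_tokens INT NOT NULL, completion_tokens INT NOT NULL,
    total_tokens INT NOT NULL, latency_ms INT, cost_usd REAL)""")
rows = [
    ("a", "2025-11-24T12:34:56Z", "user:93", "agent:sofia", "microdao:7",
     "gpt-4.1-mini", "openai", 1234, 567, 1801, 2345, 0.0234),
    ("b", "2025-11-20T09:00:00Z", "user:93", "agent:sofia", "microdao:7",
     "gpt-4.1-mini", "openai", 100, 50, 150, 900, 0.002),  # illustrative row
]
conn.executemany("INSERT INTO usage_llm VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", rows)
```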
---
### 4. **PEP Integration** (3 services) 🔜 0% COMPLETE
#### 4.1 messaging-service PEP
**File:** `services/messaging-service/pep_middleware.py`
```python
import asyncpg

# Planned client wrappers for auth-service and pdp-service
from auth_service_client import get_actor_context
from pdp_service_client import evaluate_policy


async def check_send_message_permission(
    actor_id: str,
    channel_id: str,
    db_pool: asyncpg.Pool,
) -> bool:
    """Check whether the actor may send a message to the channel."""
    # 1. Resolve the actor's identity, roles and microDAO memberships
    actor = await get_actor_context(actor_id, db_pool)

    # 2. Ask the PDP for a decision
    decision = await evaluate_policy(
        actor=actor,
        action="send_message",
        resource={"type": "channel", "id": channel_id},
    )

    # 3. Anything other than an explicit permit counts as a deny
    return decision.effect == "permit"
```
**Integration Points:**
- `POST /api/messaging/channels/{channel_id}/messages` — check before send
- `POST /api/messaging/channels` — check MANAGE permission
- `POST /api/messaging/channels/{channel_id}/members` — check INVITE permission
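
However the PDP client is implemented, a PEP should fail closed. If the PDP errors out or is slow, the request is denied rather than let through. The sketch below is a minimal asyncio version of that wrapper, in which `evaluate` stands in for a call that returns the decision's `effect` string. `enforce`, `PermissionDenied` and the 0.5 s default timeout are assumptions:

```python
import asyncio
from typing import Awaitable, Callable


class PermissionDenied(Exception):
    """Raised by the PEP; a web layer would typically map this to HTTP 403."""


async def enforce(evaluate: Callable[[], Awaitable[str]], timeout_s: float = 0.5) -> None:
    """Call the PDP and raise unless it explicitly answers 'permit'."""
    try:
        effect = await asyncio.wait_for(evaluate(), timeout=timeout_s)
    except Exception as exc:  # timeout, connection error, malformed reply, ...
        raise PermissionDenied("pdp_unavailable") from exc
    if effect != "permit":
        raise PermissionDenied(effect)
```

Only an explicit `"permit"` lets the request through. The timeout value is a deployment choice.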
#### 4.2 agent-runtime PEP
**File:** `services/agent-runtime/pep_client.py`
```python
from auth_service_client import ActorIdentity  # assumed to re-export auth-service's model
from pdp_service_client import evaluate_policy


async def check_tool_execution_permission(
    agent_id: str,
    tool_id: str,
    microdao_id: str,
) -> bool:
    """Check whether the agent may execute the tool."""
    # Build an actor identity for the agent
    actor = ActorIdentity(
        actor_id=agent_id,
        actor_type="agent",
        microdao_ids=[microdao_id],
        roles=["agent"],
    )

    # Ask the PDP for a decision
    decision = await evaluate_policy(
        actor=actor,
        action="exec_tool",
        resource={"type": "tool", "id": tool_id},
    )
    return decision.effect == "permit"
```
**Integration:** Before calling toolcore in `handle_invocation()`
#### 4.3 toolcore PEP
- **Already has:** `allowed_agents` in the registry
- **Additional:** cross-check with the PDP for user-initiated tool calls
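
The sketch below follows one reading of the toolcore cross-check: every call must pass the existing `allowed_agents` registry check, and calls triggered by a human must also get a PDP permit. `ToolCall`, `authorize_tool_call` and the injected `pdp_permits` callable are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    tool_id: str
    agent_id: str                # agent that executes the tool
    initiated_by: Optional[str]  # human actor_id if a user triggered it, else None


def authorize_tool_call(call: ToolCall,
                        registry: Dict[str, List[str]],
                        pdp_permits: Callable[[str, str], bool]) -> bool:
    """Both layers must agree; a missing registry entry means deny."""
    allowed_agents = registry.get(call.tool_id)
    if allowed_agents is None or call.agent_id not in allowed_agents:
        return False  # existing toolcore registry check
    if call.initiated_by is not None:
        return pdp_permits(call.initiated_by, call.tool_id)  # extra PDP cross-check
    return True
```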
---
### 5. **Audit Log** (1 migration) 🔜 0% COMPLETE
**File:** `migrations/004_create_security_audit.sql`
```sql
CREATE TABLE security_audit (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
actor_id TEXT NOT NULL,
actor_type TEXT NOT NULL,
action TEXT NOT NULL,
resource_type TEXT NOT NULL,
resource_id TEXT NOT NULL,
decision TEXT NOT NULL, -- permit/deny
reason TEXT,
context JSONB,
ip_address INET,
user_agent TEXT
);
CREATE INDEX idx_audit_timestamp ON security_audit(timestamp DESC);
CREATE INDEX idx_audit_actor ON security_audit(actor_id, timestamp DESC);
CREATE INDEX idx_audit_decision ON security_audit(decision, timestamp DESC);
CREATE INDEX idx_audit_resource ON security_audit(resource_type, resource_id);
```
**PDP Integration:** after every `evaluate()` call, write an entry to the audit log:
```python
import json
from typing import Optional

# `db` is assumed to be an asyncpg pool shared by the PDP service.


async def log_audit_event(
    request: PolicyRequest,
    decision: PolicyDecision,
    context: Optional[dict] = None,
) -> None:
    """Write one audit log entry per PDP decision."""
    await db.execute(
        """
        INSERT INTO security_audit
            (actor_id, actor_type, action, resource_type, resource_id,
             decision, reason, context)
        VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
        """,
        request.actor.actor_id,
        request.actor.actor_type,
        request.action,
        request.resource.type,
        request.resource.id,
        decision.effect,
        decision.reason,
        # asyncpg's default jsonb codec expects a JSON-encoded string
        json.dumps(context or {}),
    )
```
**NATS Security Events:**
- `security.suspicious` — published on:
  - Multiple deny events (>5 in 1 min)
  - Unusual tool execution attempts
  - Privilege escalation attempts
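
The ">5 denies in 1 min" rule maps naturally onto a per-actor sliding window. The sketch below is a minimal in-process version that assumes one detector per PDP replica; several replicas would need shared state. It leaves the actual NATS publish on `security.suspicious` to the caller. `DenyBurstDetector` is a hypothetical name:

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class DenyBurstDetector:
    """Flags an actor with more than `threshold` denies inside `window_s` seconds."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._denies: Dict[str, Deque[float]] = defaultdict(deque)

    def record_deny(self, actor_id: str, ts: float) -> bool:
        """Record a deny at time `ts` (seconds). Return True while the actor is over
        the threshold; the caller may want to de-duplicate repeated alerts."""
        q = self._denies[actor_id]
        q.append(ts)
        # Evict denies that are older than the window.
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        return len(q) > self.threshold
```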
---
### 6. **Infrastructure** (3 files) 🔜 0% COMPLETE
#### 6.1 docker-compose.phase4.yml
```yaml
services:
  auth-service:
    build: ./services/auth-service
    ports: ["7011:7011"]
    environment:
      - DATABASE_URL=postgresql://...

  pdp-service:
    build: ./services/pdp-service
    ports: ["7012:7012"]
    environment:
      - DATABASE_URL=postgresql://...

  usage-engine:
    build: ./services/usage-engine
    ports: ["7013:7013"]
    environment:
      - DATABASE_URL=postgresql://...
      - NATS_URL=nats://nats:4222

  # + All Phase 3 services
  llm-proxy:
    environment:
      - AUTH_SERVICE_URL=http://auth-service:7011
      # etc...
```
#### 6.2 scripts/start-phase4.sh
#### 6.3 scripts/stop-phase4.sh
---
### 7. **Documentation** (4 files) 🔜 0% COMPLETE
#### 7.1 docs/AUTH_SERVICE_SPEC.md
- Actor model
- Session management
- API keys
- Integration guide
#### 7.2 docs/PDP_SPEC.md
- Policy model
- Evaluation logic
- Policy configuration
- Adding new rules
#### 7.3 docs/USAGE_ENGINE_SPEC.md
- Event model
- NATS integration
- Aggregation queries
- Billing foundation
#### 7.4 PHASE4_READY.md
- Overview
- Quick start
- Testing guide
- Production readiness
---
## 📊 IMPLEMENTATION ROADMAP
### Week 1: Core Services
- ✅ auth-service (complete)
- 🔄 pdp-service (20% → 100%)
- 🔜 usage-engine (0% → 100%)
### Week 2: Integration
- 🔜 PEP hooks (messaging-service)
- 🔜 PEP hooks (agent-runtime)
- 🔜 PEP hooks (toolcore)
### Week 3: Audit & Testing
- 🔜 Audit log migration
- 🔜 Security events (NATS)
- 🔜 E2E testing
### Week 4: Documentation & Polish
- 🔜 All docs (4 files)
- 🔜 docker-compose
- 🔜 Scripts
- 🔜 PHASE4_READY.md
---
## 🎯 ACCEPTANCE CRITERIA
### Auth Service: ✅
- [x] Login works with mock users
- [x] Session tokens created & validated
- [x] API keys CRUD functional
- [x] actor_context helper ready
### PDP Service: 🔜
- [ ] /internal/pdp/evaluate works
- [ ] MicroDAO access rules
- [ ] Channel access rules
- [ ] Tool execution rules
- [ ] 10+ unit tests
### PEP Integration: 🔜
- [ ] messaging-service blocks unauthorized sends
- [ ] agent-runtime checks tool permissions
- [ ] toolcore enforces allowed_agents
### Usage Engine: 🔜
- [ ] usage.llm events collected
- [ ] usage.tool events collected
- [ ] /internal/usage/summary works
- [ ] Database tables created
### Audit Log: 🔜
- [ ] security_audit table exists
- [ ] PDP writes every decision
- [ ] Can query last 100 events
- [ ] security.suspicious events published
### Infrastructure: 🔜
- [ ] docker-compose.phase4.yml works
- [ ] All services healthy
- [ ] Start/stop scripts functional
- [ ] Documentation complete
---
## 🚀 QUICK START (Once Complete)
```bash
# 1. Start Phase 4
./scripts/start-phase4.sh

# 2. Test auth (FastAPI only parses a JSON body when Content-Type says so)
curl -X POST http://localhost:7011/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@daarion.city"}'

# 3. Test PDP
curl -X POST http://localhost:7012/internal/pdp/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "actor": {...},
    "action": "send_message",
    "resource": {"type": "channel", "id": "..."}
  }'

# 4. Check usage (quote the URL so the shell does not glob "?")
curl "http://localhost:7013/internal/usage/summary?period=24h"

# 5. View the audit log
docker exec daarion-postgres psql -U postgres -d daarion \
  -c "SELECT * FROM security_audit ORDER BY timestamp DESC LIMIT 10;"
```
---
## 🔜 AFTER PHASE 4
### Phase 5: Advanced Features
- Real Passkey integration
- OAuth2 providers
- Advanced policy language (ABAC)
- Dynamic policy updates
- Cost allocation & billing
- Security analytics dashboard
### Phase 6: Production Hardening
- Rate limiting (Redis)
- DDoS protection
- Penetration testing
- Security audit
- Compliance certification
---
## 📚 RESOURCES
**Specs:**
- Phase 4 Master Task (user-provided)
- [PHASE4_STARTED.md](../PHASE4_STARTED.md)

**Related:**
- [PHASE3_IMPLEMENTATION_COMPLETE.md](../PHASE3_IMPLEMENTATION_COMPLETE.md)
- [ALL_PHASES_STATUS.md](../ALL_PHASES_STATUS.md)

**Standards:**
- RBAC (Role-Based Access Control)
- ABAC (Attribute-Based Access Control)
- OAuth 2.0 / OpenID Connect
- Audit logging best practices
---
**Status:** 📋 Detailed Plan Complete

**Next:** Continue Implementation

**Version:** 1.0.0

**Last Updated:** 2025-11-24