# 📋 PHASE 4: SECURITY LAYER - Detailed Plan
**Goal:** A full-fledged security layer for DAARION
**Timeline:** 4-6 weeks (or 3-4 hours if automated)
**Dependencies:** Phase 1-3 complete
---
## 🎯 OVERVIEW
Phase 4 adds critical security infrastructure:
```
┌─────────────────────────────────────────┐
│        SECURITY LAYER (Phase 4)         │
├─────────────────────────────────────────┤
│                                         │
│  1. AUTH SERVICE                        │
│     └─ Identity & Sessions              │
│                                         │
│  2. PDP SERVICE (Policy Decision)       │
│     └─ Centralized access control       │
│                                         │
│  3. PEP HOOKS (Policy Enforcement)      │
│     └─ Enforce decisions in services    │
│                                         │
│  4. USAGE ENGINE                        │
│     └─ Track LLM/Tools/Agent usage      │
│                                         │
│  5. AUDIT LOG                           │
│     └─ Security events & compliance     │
│                                         │
└─────────────────────────────────────────┘
```
---
## 📦 DELIVERABLES (40+ files)
### 1. **auth-service** (8 files) ✅ COMPLETE
```
services/auth-service/
├── models.py ✅ ActorIdentity, SessionToken, ApiKey
├── actor_context.py ✅ build_actor_context, require_actor
├── routes_sessions.py ✅ /auth/login, /me, /logout
├── routes_api_keys.py ✅ /auth/api-keys CRUD
├── main.py ✅ FastAPI app + DB tables
├── requirements.txt ✅
├── Dockerfile ✅
└── README.md ✅ Complete documentation
```
**Port:** 7011
**Status:** ✅ Working
**Features:**
- Mock login (3 test users)
- Session tokens (7-day expiry)
- API keys with optional expiration
- ActorContext helper for other services
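The 7-day session expiry and the optional API-key expiration listed above could be handled along these lines. This is a minimal sketch, not the auth-service code: `SESSION_TTL`, `issue_session_token`, and `is_expired` are hypothetical names, and the session is a plain dict rather than the `SessionToken` model.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_TTL = timedelta(days=7)  # assumed constant mirroring the 7-day expiry


def issue_session_token(actor_id: str, now: Optional[datetime] = None) -> dict:
    """Create a session record with an opaque token and a fixed expiry."""
    now = now or datetime.now(timezone.utc)
    return {
        "token": secrets.token_urlsafe(32),
        "actor_id": actor_id,
        "expires_at": now + SESSION_TTL,
    }


def is_expired(expires_at: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """Treat a missing expiry (an API key without expiration) as never expiring."""
    if expires_at is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now >= expires_at
```

Passing `now` explicitly keeps the expiry logic easy to unit test without patching the clock.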
---
### 2. **pdp-service** (8 files) 🔄 20% COMPLETE
```
services/pdp-service/
├── models.py ✅ PolicyRequest, PolicyDecision
├── engine.py 🔜 Policy evaluation logic
├── policy_store.py 🔜 Config-based policy storage
├── main.py 🔜 FastAPI app
├── config.yaml 🔜 microDAO/channel policies
├── requirements.txt 🔜
├── Dockerfile 🔜
└── README.md 🔜 Complete documentation
```
**Port:** 7012
**Purpose:** Centralized Policy Decision Point
**Key Features:**
- Evaluate access requests (actor + action + resource)
- Config-based policies (v1)
- Support for:
  - MicroDAO access (owner/admin/member)
  - Channel access (SEND_MESSAGE, READ)
  - Tool execution (EXEC_TOOL)
  - Agent management (MANAGE)
  - Usage viewing (VIEW_USAGE)
**Policy Types:**
#### MicroDAO Policies
```yaml
microdao_policies:
  - microdao_id: "microdao:daarion"
    owners: ["user:1"]
    admins: ["user:1", "user:93"]
    members: ["user:*"]  # All users
```
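One way the policy above could be resolved into a role is sketched below. It assumes that `user:*` is a glob-style wildcard (the plan only annotates it as "All users") and that the most privileged matching role wins. `MICRODAO_POLICIES` and `resolve_role` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical in-memory copy of the YAML policy above.
MICRODAO_POLICIES = [
    {
        "microdao_id": "microdao:daarion",
        "owners": ["user:1"],
        "admins": ["user:1", "user:93"],
        "members": ["user:*"],
    }
]

# Checked from most to least privileged so an owner is not reported as a member.
_ROLE_KEYS = (("owner", "owners"), ("admin", "admins"), ("member", "members"))


def resolve_role(actor_id: str, microdao_id: str) -> Optional[str]:
    """Return the strongest matching role, or None when nothing matches."""
    for policy in MICRODAO_POLICIES:
        if policy["microdao_id"] != microdao_id:
            continue
        for role, key in _ROLE_KEYS:
            if any(fnmatchcase(actor_id, pattern) for pattern in policy[key]):
                return role
    return None
```

With this data, `resolve_role("user:93", "microdao:daarion")` yields `"admin"`, while an agent ID such as `agent:sofia` matches no pattern and yields `None`.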
#### Channel Policies
```yaml
channel_policies:
  - channel_id: "channel-uuid-123"
    microdao_id: "microdao:daarion"
    allowed_roles: ["member", "admin", "owner"]
    blocked_users: []
```
#### Tool Policies
```yaml
tool_policies:
  - tool_id: "projects.list"
    allowed_agents: ["agent:sofia", "agent:pm"]
    allowed_user_roles: ["admin", "owner"]
```
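A tool policy like the one above could be checked as follows. This sketch assumes agents are matched by ID against `allowed_agents` and human actors by role against `allowed_user_roles`; how `allowed_user_roles` is actually applied is not specified in this plan. `TOOL_POLICIES` and `can_exec_tool` are hypothetical names.

```python
from typing import Iterable

# Hypothetical in-memory copy of the YAML policy above, keyed by tool_id.
TOOL_POLICIES = {
    "projects.list": {
        "allowed_agents": ["agent:sofia", "agent:pm"],
        "allowed_user_roles": ["admin", "owner"],
    }
}


def can_exec_tool(
    tool_id: str,
    actor_id: str,
    actor_type: str,
    roles: Iterable[str] = (),
) -> bool:
    """Match agents by ID and humans by role; deny tools with no policy."""
    policy = TOOL_POLICIES.get(tool_id)
    if policy is None:
        return False  # default deny for unknown tools
    if actor_type == "agent":
        return actor_id in policy["allowed_agents"]
    return any(role in policy["allowed_user_roles"] for role in roles)
```

Denying unknown tools keeps the sketch consistent with the default-deny rule in the evaluation logic.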
**Policy Evaluation Logic:**
```python
def evaluate(request: PolicyRequest) -> PolicyDecision:
    """Return a permit/deny decision; anything unmatched is denied."""
    actor, resource = request.actor, request.resource

    # 1. System Admin bypass (careful!)
    if "system_admin" in actor.roles:
        return permit("system_admin")

    # 2. Resource-specific rules
    if resource.type == "microdao":
        if is_microdao_owner(actor, resource):
            return permit("microdao_owner")
        if is_microdao_admin(actor, resource):
            return permit("microdao_admin")
        if request.action == "read" and is_member(actor, resource):
            return permit("member")
        return deny("not_authorized")

    if resource.type == "channel":
        if not is_channel_member(actor, resource):
            return deny("not_channel_member")
        if request.action == "send_message" and is_blocked(actor, resource):
            return deny("blocked")
        return permit("channel_member")

    if resource.type == "tool":
        # get_tool_policy is an assumed policy_store lookup; the plan does not define it.
        tool = get_tool_policy(resource.id)
        if tool is not None and actor.actor_id in tool.allowed_agents:
            return permit("allowed_agent")
        return deny("tool_not_allowed")

    # Default deny
    return deny("no_matching_policy")
```
---
### 3. **usage-engine** (8 files) 🔜 0% COMPLETE
```
services/usage-engine/
├── models.py 🔜 LlmUsageEvent, ToolUsageEvent
├── collectors.py 🔜 NATS listeners
├── aggregators.py 🔜 Aggregate stats
├── reporters.py 🔜 API endpoints
├── main.py 🔜 FastAPI app
├── requirements.txt 🔜
├── Dockerfile 🔜
└── README.md 🔜 Complete documentation
```
**Port:** 7013
**Purpose:** Usage tracking & billing foundation
**NATS Subjects:**
- `usage.llm` — LLM calls (from llm-proxy)
- `usage.tool` — Tool executions (from toolcore)
- `usage.agent` — Agent invocations (from agent-runtime)
**Events:**
#### LLM Usage Event
```json
{
  "event_id": "evt-123",
  "timestamp": "2025-11-24T12:34:56Z",
  "actor": {
    "actor_id": "user:93",
    "actor_type": "human",
    "microdao_ids": ["microdao:7"]
  },
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "model": "gpt-4.1-mini",
  "provider": "openai",
  "prompt_tokens": 1234,
  "completion_tokens": 567,
  "total_tokens": 1801,
  "latency_ms": 2345,
  "cost_usd": 0.0234
}
```
#### Tool Usage Event
```json
{
  "event_id": "evt-456",
  "timestamp": "2025-11-24T12:35:00Z",
  "actor": {
    "actor_id": "agent:sofia",
    "actor_type": "agent"
  },
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "tool_id": "projects.list",
  "success": true,
  "latency_ms": 123,
  "result_size_bytes": 4567
}
```
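A collector receiving a `usage.llm` message might decode and sanity-check it roughly as shown here. The dataclass is a hypothetical stand-in for the planned `LlmUsageEvent` model, `parse_llm_usage` is an assumed helper name, and the optional fields follow the nullable columns of the `usage_llm` table. The NATS subscription itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmUsageEvent:
    event_id: str
    timestamp: str
    actor_id: str
    model: str
    provider: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    agent_id: Optional[str] = None
    microdao_id: Optional[str] = None
    latency_ms: Optional[int] = None
    cost_usd: Optional[float] = None


def parse_llm_usage(payload: bytes) -> LlmUsageEvent:
    """Decode a usage.llm message and reject inconsistent token counts."""
    data = json.loads(payload)
    event = LlmUsageEvent(
        event_id=data["event_id"],
        timestamp=data["timestamp"],
        actor_id=data["actor"]["actor_id"],
        model=data["model"],
        provider=data["provider"],
        prompt_tokens=data["prompt_tokens"],
        completion_tokens=data["completion_tokens"],
        total_tokens=data["total_tokens"],
        agent_id=data.get("agent_id"),
        microdao_id=data.get("microdao_id"),
        latency_ms=data.get("latency_ms"),
        cost_usd=data.get("cost_usd"),
    )
    if event.total_tokens != event.prompt_tokens + event.completion_tokens:
        raise ValueError("total_tokens does not equal prompt + completion tokens")
    return event
```

Rejecting inconsistent events at the collector keeps bad rows out of the aggregation tables; whether to drop or quarantine such events is a design decision this plan leaves open.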
**API Endpoints:**
```http
# Aggregate stats (tokens, calls, cost)
GET /internal/usage/summary?microdao_id=microdao:7&period=24h

# Top agents by usage
GET /internal/usage/agents?microdao_id=microdao:7&period=7d

# Model distribution
GET /internal/usage/models?period=24h

# Cost breakdown
GET /internal/usage/costs?microdao_id=microdao:7&period=30d
```
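The `period` query parameter in these endpoints uses short forms such as `24h`, `7d`, and `30d`. A small parser for that format might look like the sketch below; `parse_period` is a hypothetical helper, and supporting only hours and days is an assumption based on the examples shown.

```python
import re
from datetime import timedelta

_PERIOD_RE = re.compile(r"(\d+)([hd])")
_UNITS = {"h": "hours", "d": "days"}


def parse_period(period: str) -> timedelta:
    """Convert strings such as '24h', '7d', or '30d' into a timedelta."""
    match = _PERIOD_RE.fullmatch(period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})
```

A reporter could then compute the query window as the current time minus `parse_period(period)`.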
**Database Tables:**
```sql
CREATE TABLE usage_llm (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    agent_id TEXT,
    microdao_id TEXT,
    model TEXT NOT NULL,
    provider TEXT NOT NULL,
    prompt_tokens INT NOT NULL,
    completion_tokens INT NOT NULL,
    total_tokens INT NOT NULL,
    latency_ms INT,
    cost_usd DECIMAL(10, 6)
);

CREATE TABLE usage_tool (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    agent_id TEXT,
    microdao_id TEXT,
    tool_id TEXT NOT NULL,
    success BOOLEAN NOT NULL,
    latency_ms INT,
    result_size_bytes INT
);

-- Indexes for fast queries
CREATE INDEX idx_usage_llm_microdao_time ON usage_llm(microdao_id, timestamp DESC);
CREATE INDEX idx_usage_llm_agent ON usage_llm(agent_id, timestamp DESC);
CREATE INDEX idx_usage_tool_microdao ON usage_tool(microdao_id, timestamp DESC);
```
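The kind of aggregation behind `/internal/usage/summary` can be tried out locally with the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL, a reduced `usage_llm` table with simplified column types, and made-up sample rows; `usage_summary` is a hypothetical helper name.

```python
import sqlite3

# SQLite stands in for PostgreSQL here, so column types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE usage_llm (
        id TEXT PRIMARY KEY,
        timestamp TEXT NOT NULL,
        microdao_id TEXT,
        total_tokens INTEGER NOT NULL,
        cost_usd REAL
    )
    """
)
# Illustrative rows only; these are not real usage data.
conn.executemany(
    "INSERT INTO usage_llm VALUES (?, ?, ?, ?, ?)",
    [
        ("evt-1", "2025-11-24T12:00:00Z", "microdao:7", 1801, 0.0234),
        ("evt-2", "2025-11-24T13:00:00Z", "microdao:7", 199, 0.0016),
        ("evt-3", "2025-11-20T09:00:00Z", "microdao:7", 500, 0.0100),
    ],
)


def usage_summary(microdao_id: str, since: str) -> dict:
    """Aggregate calls, tokens, and cost for one microDAO since a timestamp."""
    calls, tokens, cost = conn.execute(
        """
        SELECT COUNT(*), COALESCE(SUM(total_tokens), 0), COALESCE(SUM(cost_usd), 0)
        FROM usage_llm
        WHERE microdao_id = ? AND timestamp >= ?
        """,
        (microdao_id, since),
    ).fetchone()
    return {"calls": calls, "total_tokens": tokens, "cost_usd": round(cost, 6)}
```

The `WHERE microdao_id = ? AND timestamp >= ?` filter is the access pattern that the `idx_usage_llm_microdao_time` index is meant to serve.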
---
### 4. **PEP Integration** (3 services) 🔜 0% COMPLETE
#### 4.1 messaging-service PEP
**File:** `services/messaging-service/pep_middleware.py`
```python
import asyncpg

from auth_service_client import get_actor_context
from pdp_service_client import evaluate_policy


async def check_send_message_permission(
    actor_id: str,
    channel_id: str,
    db_pool: asyncpg.Pool,
) -> bool:
    """Check if actor can send message to channel."""
    # 1. Get actor context
    actor = await get_actor_context(actor_id, db_pool)

    # 2. Evaluate policy
    decision = await evaluate_policy(
        actor=actor,
        action="send_message",
        resource={"type": "channel", "id": channel_id},
    )

    # 3. Return decision
    return decision.effect == "permit"
```
**Integration Points:**
- `POST /api/messaging/channels/{channel_id}/messages` — check before send
- `POST /api/messaging/channels` — check MANAGE permission
- `POST /api/messaging/channels/{channel_id}/members` — check INVITE permission
#### 4.2 agent-runtime PEP
**File:** `services/agent-runtime/pep_client.py`
```python
# Assumed import path; ActorIdentity is defined in auth-service models.py.
from auth_service_client import ActorIdentity
from pdp_service_client import evaluate_policy


async def check_tool_execution_permission(
    agent_id: str,
    tool_id: str,
    microdao_id: str,
) -> bool:
    """Check if agent can execute tool."""
    # Build agent actor
    actor = ActorIdentity(
        actor_id=agent_id,
        actor_type="agent",
        microdao_ids=[microdao_id],
        roles=["agent"],
    )

    # Evaluate
    decision = await evaluate_policy(
        actor=actor,
        action="exec_tool",
        resource={"type": "tool", "id": tool_id},
    )
    return decision.effect == "permit"
```
**Integration:** Before calling toolcore in `handle_invocation()`
#### 4.3 toolcore PEP
**Already has:** `allowed_agents` in registry
**Additional:** Cross-check with PDP for user-initiated tool calls
---
### 5. **Audit Log** (1 migration) 🔜 0% COMPLETE
**File:** `migrations/004_create_security_audit.sql`
```sql
CREATE TABLE security_audit (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    actor_id TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    action TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_id TEXT NOT NULL,
    decision TEXT NOT NULL,  -- permit/deny
    reason TEXT,
    context JSONB,
    ip_address INET,
    user_agent TEXT
);

CREATE INDEX idx_audit_timestamp ON security_audit(timestamp DESC);
CREATE INDEX idx_audit_actor ON security_audit(actor_id, timestamp DESC);
CREATE INDEX idx_audit_decision ON security_audit(decision, timestamp DESC);
CREATE INDEX idx_audit_resource ON security_audit(resource_type, resource_id);
```
**PDP Integration:**
After every `evaluate()` call, write to audit log:
```python
import json
from typing import Optional


async def log_audit_event(
    request: PolicyRequest,
    decision: PolicyDecision,
    context: Optional[dict] = None,
) -> None:
    """Write audit log entry."""
    # `db` is assumed to be an asyncpg pool or connection created at startup.
    await db.execute(
        """
        INSERT INTO security_audit
            (actor_id, actor_type, action, resource_type, resource_id,
             decision, reason, context)
        VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
        """,
        request.actor.actor_id,
        request.actor.actor_type,
        request.action,
        request.resource.type,
        request.resource.id,
        decision.effect,
        decision.reason,
        json.dumps(context or {}),
    )
```
**NATS Security Events:**
- `security.suspicious`, published on:
  - Multiple deny events (>5 in 1 min)
  - Unusual tool execution attempts
  - Privilege escalation attempts
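
The ">5 denies in 1 min" trigger could be detected with a per-actor sliding window, as in this sketch. `DenyRateDetector`, `DENY_THRESHOLD`, and `WINDOW_SECONDS` are hypothetical names, state is kept in process memory only, and publishing to NATS is left to the caller.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

DENY_THRESHOLD = 5     # more than 5 denies...
WINDOW_SECONDS = 60.0  # ...within 1 minute


class DenyRateDetector:
    """Tracks recent deny timestamps per actor in a sliding window."""

    def __init__(self) -> None:
        self._denies: Dict[str, Deque[float]] = defaultdict(deque)

    def record_deny(self, actor_id: str, now: float) -> bool:
        """Record a deny; return True when the actor exceeds the threshold."""
        window = self._denies[actor_id]
        window.append(now)
        # Drop denies that have fallen out of the one-minute window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > DENY_THRESHOLD
```

When `record_deny` returns `True`, the PDP would publish an event on `security.suspicious`. With several PDP instances, the window state would need to live in shared storage rather than in memory.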
---
### 6. **Infrastructure** (3 files) 🔜 0% COMPLETE
#### 6.1 docker-compose.phase4.yml
```yaml
services:
  auth-service:
    build: ./services/auth-service
    ports: ["7011:7011"]
    environment:
      - DATABASE_URL=postgresql://...

  pdp-service:
    build: ./services/pdp-service
    ports: ["7012:7012"]
    environment:
      - DATABASE_URL=postgresql://...

  usage-engine:
    build: ./services/usage-engine
    ports: ["7013:7013"]
    environment:
      - DATABASE_URL=postgresql://...
      - NATS_URL=nats://nats:4222

  # + All Phase 3 services
  llm-proxy:
    environment:
      - AUTH_SERVICE_URL=http://auth-service:7011
      # etc...
```
#### 6.2 scripts/start-phase4.sh
#### 6.3 scripts/stop-phase4.sh
---
### 7. **Documentation** (4 files) 🔜 0% COMPLETE
#### 7.1 docs/AUTH_SERVICE_SPEC.md
- Actor model
- Session management
- API keys
- Integration guide
#### 7.2 docs/PDP_SPEC.md
- Policy model
- Evaluation logic
- Policy configuration
- Adding new rules
#### 7.3 docs/USAGE_ENGINE_SPEC.md
- Event model
- NATS integration
- Aggregation queries
- Billing foundation
#### 7.4 PHASE4_READY.md
- Overview
- Quick start
- Testing guide
- Production readiness
---
## 📊 IMPLEMENTATION ROADMAP
### Week 1: Core Services
- ✅ auth-service (complete)
- 🔄 pdp-service (20% → 100%)
- 🔜 usage-engine (0% → 100%)
### Week 2: Integration
- 🔜 PEP hooks (messaging-service)
- 🔜 PEP hooks (agent-runtime)
- 🔜 PEP hooks (toolcore)
### Week 3: Audit & Testing
- 🔜 Audit log migration
- 🔜 Security events (NATS)
- 🔜 E2E testing
### Week 4: Documentation & Polish
- 🔜 All docs (4 files)
- 🔜 docker-compose
- 🔜 Scripts
- 🔜 PHASE4_READY.md
---
## 🎯 ACCEPTANCE CRITERIA
### Auth Service: ✅
- [x] Login works with mock users
- [x] Session tokens created & validated
- [x] API keys CRUD functional
- [x] actor_context helper ready
### PDP Service: 🔜
- [ ] /internal/pdp/evaluate works
- [ ] MicroDAO access rules
- [ ] Channel access rules
- [ ] Tool execution rules
- [ ] 10+ unit tests
### PEP Integration: 🔜
- [ ] messaging-service blocks unauthorized sends
- [ ] agent-runtime checks tool permissions
- [ ] toolcore enforces allowed_agents
### Usage Engine: 🔜
- [ ] usage.llm events collected
- [ ] usage.tool events collected
- [ ] /internal/usage/summary works
- [ ] Database tables created
### Audit Log: 🔜
- [ ] security_audit table exists
- [ ] PDP writes every decision
- [ ] Can query last 100 events
- [ ] security.suspicious events published
### Infrastructure: 🔜
- [ ] docker-compose.phase4.yml works
- [ ] All services healthy
- [ ] Start/stop scripts functional
- [ ] Documentation complete
---
## 🚀 QUICK START (After Complete)
```bash
# 1. Start Phase 4
./scripts/start-phase4.sh

# 2. Test Auth
curl -X POST http://localhost:7011/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@daarion.city"}'

# 3. Test PDP
curl -X POST http://localhost:7012/internal/pdp/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "actor": {...},
    "action": "send_message",
    "resource": {"type": "channel", "id": "..."}
  }'

# 4. Check Usage (URL quoted so the shell does not expand "?")
curl "http://localhost:7013/internal/usage/summary?period=24h"

# 5. View Audit
docker exec daarion-postgres psql -U postgres -d daarion \
  -c "SELECT * FROM security_audit ORDER BY timestamp DESC LIMIT 10;"
```
---
## 🔜 AFTER PHASE 4
### Phase 5: Advanced Features
- Real Passkey integration
- OAuth2 providers
- Advanced policy language (ABAC)
- Dynamic policy updates
- Cost allocation & billing
- Security analytics dashboard
### Phase 6: Production Hardening
- Rate limiting (Redis)
- DDoS protection
- Penetration testing
- Security audit
- Compliance certification
---
## 📚 RESOURCES
**Specs:**
- Phase 4 Master Task (user-provided)
- [PHASE4_STARTED.md](../PHASE4_STARTED.md)
**Related:**
- [PHASE3_IMPLEMENTATION_COMPLETE.md](../PHASE3_IMPLEMENTATION_COMPLETE.md)
- [ALL_PHASES_STATUS.md](../ALL_PHASES_STATUS.md)
**Standards:**
- RBAC (Role-Based Access Control)
- ABAC (Attribute-Based Access Control)
- OAuth 2.0 / OpenID Connect
- Audit logging best practices
---
**Status:** 📋 Detailed Plan Complete
**Next:** Continue Implementation
**Version:** 1.0.0
**Last Updated:** 2025-11-24