


📋 PHASE 4: SECURITY LAYER (Detailed Plan)

Goal: A complete security layer for DAARION
Timeline: 4-6 weeks (or 3-4 hours if automated)
Dependencies: Phases 1-3 complete

🎯 OVERVIEW

Phase 4 adds critical security infrastructure:

┌─────────────────────────────────────────┐
│ SECURITY LAYER (Phase 4)                │
├─────────────────────────────────────────┤
│                                         │
│  1. AUTH SERVICE                        │
│     └─ Identity & Sessions              │
│                                         │
│  2. PDP SERVICE (Policy Decision)       │
│     └─ Centralized access control       │
│                                         │
│  3. PEP HOOKS (Policy Enforcement)      │
│     └─ Enforce decisions in services    │
│                                         │
│  4. USAGE ENGINE                        │
│     └─ Track LLM/Tools/Agent usage      │
│                                         │
│  5. AUDIT LOG                           │
│     └─ Security events & compliance     │
│                                         │
└─────────────────────────────────────────┘

📦 DELIVERABLES (40+ files)

1. auth-service (8 files) ✅ COMPLETE

services/auth-service/
├── models.py              ✅ ActorIdentity, SessionToken, ApiKey
├── actor_context.py       ✅ build_actor_context, require_actor
├── routes_sessions.py     ✅ /auth/login, /me, /logout
├── routes_api_keys.py     ✅ /auth/api-keys CRUD
├── main.py                ✅ FastAPI app + DB tables
├── requirements.txt       ✅
├── Dockerfile             ✅
└── README.md              ✅ Complete documentation

Port: 7011
Status: ✅ Working
Features:

Mock login (3 test users)
Session tokens (7-day expiry)
API keys with optional expiration
ActorContext helper for other services
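The session and API-key lifetimes above can be sketched as follows. This is a minimal illustration, not the contents of models.py: the SessionRecord/ApiKeyRecord classes, the helper names, the "dk_" key prefix, and the choice to persist only a SHA-256 hash of each API key are all assumptions.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_TTL = timedelta(days=7)  # sessions expire after 7 days, per the plan


@dataclass
class SessionRecord:
    """Hypothetical stand-in for the SessionToken model."""
    token: str
    actor_id: str
    expires_at: datetime


@dataclass
class ApiKeyRecord:
    """Hypothetical stand-in for the ApiKey model; only a hash of the key is kept."""
    key_hash: str
    actor_id: str
    expires_at: Optional[datetime]  # None means the key never expires


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


def issue_session(actor_id: str, now: Optional[datetime] = None) -> SessionRecord:
    now = now or _utcnow()
    # token_urlsafe(32) draws 32 random bytes from the OS CSPRNG
    return SessionRecord(secrets.token_urlsafe(32), actor_id, now + SESSION_TTL)


def issue_api_key(actor_id: str, ttl: Optional[timedelta] = None,
                  now: Optional[datetime] = None) -> tuple[str, ApiKeyRecord]:
    now = now or _utcnow()
    raw_key = "dk_" + secrets.token_urlsafe(32)  # "dk_" prefix is illustrative
    record = ApiKeyRecord(
        key_hash=hashlib.sha256(raw_key.encode()).hexdigest(),
        actor_id=actor_id,
        expires_at=now + ttl if ttl else None,  # expiration is optional
    )
    # The raw key is shown to the caller once; only the hash would be stored
    return raw_key, record


def is_expired(expires_at: Optional[datetime], now: Optional[datetime] = None) -> bool:
    if expires_at is None:
        return False
    return (now or _utcnow()) >= expires_at


def api_key_matches(raw_key: str, record: ApiKeyRecord) -> bool:
    candidate = hashlib.sha256(raw_key.encode()).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing
    return secrets.compare_digest(candidate, record.key_hash)
```

Hashing API keys means a leaked database row cannot be replayed as a credential; session tokens could get the same treatment if they are stored server-side.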

2. pdp-service (8 files) 🔄 20% COMPLETE

services/pdp-service/
├── models.py              ✅ PolicyRequest, PolicyDecision
├── engine.py              🔜 Policy evaluation logic
├── policy_store.py        🔜 Config-based policy storage
├── main.py                🔜 FastAPI app
├── config.yaml            🔜 microDAO/channel policies
├── requirements.txt       🔜
├── Dockerfile             🔜
└── README.md              🔜 Complete documentation

Port: 7012
Purpose: Centralized Policy Decision Point

Key Features:

Evaluate access requests (actor + action + resource)
Config-based policies (v1)
Support for:
- MicroDAO access (owner/admin/member)
- Channel access (SEND_MESSAGE, READ)
- Tool execution (EXEC_TOOL)
- Agent management (MANAGE)
- Usage viewing (VIEW_USAGE)
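One possible shape for the request and decision types, inferred from the fields the evaluation logic and audit code in this plan read (actor.actor_id, actor.actor_type, actor.roles, resource.type, resource.id, decision.effect, decision.reason). The real models.py may differ (for example, it may use Pydantic); the Actor, Resource, and Action names here are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Literal


class Action(str, Enum):
    # str subclass, so Action.READ == "read" holds in comparisons
    READ = "read"
    SEND_MESSAGE = "send_message"
    EXEC_TOOL = "exec_tool"
    MANAGE = "manage"
    VIEW_USAGE = "view_usage"


@dataclass
class Actor:
    actor_id: str                  # e.g. "user:93" or "agent:sofia"
    actor_type: str                # "human" or "agent"
    microdao_ids: list[str] = field(default_factory=list)
    roles: list[str] = field(default_factory=list)


@dataclass
class Resource:
    type: str                      # "microdao", "channel", "tool", ...
    id: str


@dataclass
class PolicyRequest:
    actor: Actor
    action: Action
    resource: Resource


@dataclass
class PolicyDecision:
    effect: Literal["permit", "deny"]
    reason: str                    # recorded verbatim in the audit log


def permit(reason: str) -> PolicyDecision:
    return PolicyDecision("permit", reason)


def deny(reason: str) -> PolicyDecision:
    return PolicyDecision("deny", reason)
```

Because Action subclasses str, callers that pass plain strings such as "send_message" can be parsed with Action("send_message"), and existing string comparisons keep working.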

Policy Types:

MicroDAO Policies

microdao_policies:
  - microdao_id: "microdao:daarion"
    owners: ["user:1"]
    admins: ["user:1", "user:93"]
    members: ["user:*"]  # All users

Channel Policies

channel_policies:
  - channel_id: "channel-uuid-123"
    microdao_id: "microdao:daarion"
    allowed_roles: ["member", "admin", "owner"]
    blocked_users: []

Tool Policies

tool_policies:
  - tool_id: "projects.list"
    allowed_agents: ["agent:sofia", "agent:pm"]
    allowed_user_roles: ["admin", "owner"]
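The wildcard in members: ["user:*"] implies pattern matching on actor IDs. A dependency-free sketch of how policy_store.py might resolve roles, with the YAML examples above mirrored as Python dicts (loading config.yaml is omitted). Using fnmatch for the wildcard and the microdao_role/can_use_channel helpers are assumptions, not the planned implementation.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Mirrors the YAML examples above; parsing config.yaml is left out of the sketch
POLICIES = {
    "microdao_policies": [
        {"microdao_id": "microdao:daarion", "owners": ["user:1"],
         "admins": ["user:1", "user:93"], "members": ["user:*"]},
    ],
    "channel_policies": [
        {"channel_id": "channel-uuid-123", "microdao_id": "microdao:daarion",
         "allowed_roles": ["member", "admin", "owner"], "blocked_users": []},
    ],
}


def _matches(actor_id: str, patterns: list[str]) -> bool:
    # "user:*" matches any user ID; fnmatchcase is case-sensitive on every OS
    return any(fnmatchcase(actor_id, p) for p in patterns)


def microdao_role(actor_id: str, microdao_id: str,
                  policies: dict = POLICIES) -> Optional[str]:
    """Return the most privileged role the actor holds in the microDAO, or None."""
    for p in policies["microdao_policies"]:
        if p["microdao_id"] != microdao_id:
            continue
        # Check from most to least privileged so user:1 resolves to "owner"
        for role, key in (("owner", "owners"), ("admin", "admins"), ("member", "members")):
            if _matches(actor_id, p.get(key, [])):
                return role
    return None


def can_use_channel(actor_id: str, channel_id: str, policies: dict = POLICIES) -> bool:
    for c in policies["channel_policies"]:
        if c["channel_id"] != channel_id:
            continue
        if actor_id in c.get("blocked_users", []):
            return False
        return microdao_role(actor_id, c["microdao_id"], policies) in c["allowed_roles"]
    return False  # unknown channel: default deny
```

Checking roles from most to least privileged matters because user:1 appears in owners, admins, and (through the wildcard) members at once.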

Policy Evaluation Logic:

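from models import PolicyRequest, PolicyDecision
# Helper predicates are assumed to live in policy_store.py (planned)
from policy_store import (
    is_microdao_owner, is_microdao_admin, is_member,
    is_channel_member, is_blocked, get_tool_policy,
)

def permit(reason: str) -> PolicyDecision:
    return PolicyDecision(effect="permit", reason=reason)

def deny(reason: str) -> PolicyDecision:
    return PolicyDecision(effect="deny", reason=reason)

def evaluate(request: PolicyRequest) -> PolicyDecision:
    actor, resource = request.actor, request.resource

    # 1. System admin bypass: skips every rule below, so grant this role sparingly
    if "system_admin" in actor.roles:
        return permit("system_admin")

    # 2. Resource-specific rules
    if resource.type == "microdao":
        if is_microdao_owner(actor, resource):
            return permit("microdao_owner")
        if is_microdao_admin(actor, resource):
            return permit("microdao_admin")
        if request.action == "read" and is_member(actor, resource):
            return permit("member")
        return deny("not_authorized")

    if resource.type == "channel":
        if not is_channel_member(actor, resource):
            return deny("not_channel_member")
        if request.action == "send_message":
            if is_blocked(actor, resource):
                return deny("blocked")
            return permit("channel_member")
        if request.action == "read":
            return permit("channel_member")
        return deny("action_not_allowed")

    if resource.type == "tool":
        tool = get_tool_policy(resource.id)  # None for unknown tools
        if tool is not None and actor.actor_id in tool.allowed_agents:
            return permit("allowed_agent")
        # allowed_user_roles (user-initiated calls) is not evaluated here yet
        return deny("tool_not_allowed")

    # 3. Default deny
    return deny("no_matching_policy")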

3. usage-engine (8 files) 🔜 0% COMPLETE

services/usage-engine/
├── models.py              🔜 LlmUsageEvent, ToolUsageEvent
├── collectors.py          🔜 NATS listeners
├── aggregators.py         🔜 Aggregate stats
├── reporters.py           🔜 API endpoints
├── main.py                🔜 FastAPI app
├── requirements.txt       🔜
├── Dockerfile             🔜
└── README.md              🔜 Complete documentation

Port: 7013
Purpose: Usage tracking & billing foundation

NATS Subjects:

usage.llm — LLM calls (from llm-proxy)
usage.tool — Tool executions (from toolcore)
usage.agent — Agent invocations (from agent-runtime)

Events:

LLM Usage Event

{
  "event_id": "evt-123",
  "timestamp": "2025-11-24T12:34:56Z",
  "actor": {
    "actor_id": "user:93",
    "actor_type": "human",
    "microdao_ids": ["microdao:7"]
  },
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "model": "gpt-4.1-mini",
  "provider": "openai",
  "prompt_tokens": 1234,
  "completion_tokens": 567,
  "total_tokens": 1801,
  "latency_ms": 2345,
  "cost_usd": 0.0234
}

Tool Usage Event

{
  "event_id": "evt-456",
  "timestamp": "2025-11-24T12:35:00Z",
  "actor": {
    "actor_id": "agent:sofia",
    "actor_type": "agent"
  },
  "agent_id": "agent:sofia",
  "microdao_id": "microdao:7",
  "tool_id": "projects.list",
  "success": true,
  "latency_ms": 123,
  "result_size_bytes": 4567
}
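A sketch of how collectors.py might consume usage.llm with the nats-py client. Only the subject name, the NATS URL from the compose file, and the event fields come from this plan; the LlmUsageEvent flattening, the token-count validation, the store callback, and the "usage-engine" queue group name are assumptions.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class LlmUsageEvent:
    event_id: str
    timestamp: str
    actor_id: str
    agent_id: Optional[str]
    microdao_id: Optional[str]
    model: str
    provider: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    latency_ms: Optional[int]
    cost_usd: Optional[float]


def parse_llm_event(raw: bytes) -> LlmUsageEvent:
    data = json.loads(raw)
    event = LlmUsageEvent(
        event_id=data["event_id"],
        timestamp=data["timestamp"],
        actor_id=data["actor"]["actor_id"],  # flattened to match usage_llm.actor_id
        agent_id=data.get("agent_id"),
        microdao_id=data.get("microdao_id"),
        model=data["model"],
        provider=data["provider"],
        prompt_tokens=int(data["prompt_tokens"]),
        completion_tokens=int(data["completion_tokens"]),
        total_tokens=int(data["total_tokens"]),
        latency_ms=data.get("latency_ms"),
        cost_usd=data.get("cost_usd"),
    )
    # Reject events whose totals disagree instead of storing inconsistent rows
    if event.total_tokens != event.prompt_tokens + event.completion_tokens:
        raise ValueError(f"{event.event_id}: total_tokens mismatch")
    return event


async def run_collector(store: Callable[[LlmUsageEvent], Awaitable[None]],
                        nats_url: str = "nats://nats:4222") -> None:
    import nats  # nats-py, imported lazily so parse_llm_event is testable without it

    nc = await nats.connect(nats_url)

    async def on_llm(msg) -> None:
        try:
            await store(parse_llm_event(msg.data))
        except (ValueError, KeyError, TypeError) as exc:
            print(f"dropping malformed usage.llm event: {exc}")

    # A queue group lets several usage-engine replicas share the stream
    await nc.subscribe("usage.llm", queue="usage-engine", cb=on_llm)
    await asyncio.Event().wait()  # keep the subscription alive until cancelled
```

usage.tool and usage.agent would follow the same pattern with their own parsers and tables.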

API Endpoints:

GET /internal/usage/summary?microdao_id=microdao:7&period=24h
→ Aggregate stats (tokens, calls, cost)

GET /internal/usage/agents?microdao_id=microdao:7&period=7d
→ Top agents by usage

GET /internal/usage/models?period=24h
→ Model distribution

GET /internal/usage/costs?microdao_id=microdao:7&period=30d
→ Cost breakdown
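The period parameter and the summary semantics could look like this in-memory sketch. Accepting only "<n>h" and "<n>d", filtering by microdao_id when one is given, and summing cost as Decimal (matching the DECIMAL(10, 6) column) are assumptions about how the endpoints would behave.

```python
import re
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Optional

_PERIOD_RE = re.compile(r"^(\d+)([hd])$")


def parse_period(period: str) -> timedelta:
    """Turn "24h" / "7d" / "30d" into a timedelta; other formats are rejected."""
    m = _PERIOD_RE.match(period)
    if not m:
        raise ValueError(f"unsupported period: {period!r}")
    value, unit = int(m.group(1)), m.group(2)
    return timedelta(hours=value) if unit == "h" else timedelta(days=value)


def summarize(events: list[dict], microdao_id: Optional[str],
              period: str, now: datetime) -> dict:
    since = now - parse_period(period)
    rows = [
        e for e in events
        if e["timestamp"] >= since
        and (microdao_id is None or e.get("microdao_id") == microdao_id)
    ]
    return {
        "calls": len(rows),
        "total_tokens": sum(e["total_tokens"] for e in rows),
        # Decimal avoids float drift when many small costs are added up
        "cost_usd": sum((Decimal(str(e.get("cost_usd") or 0)) for e in rows), Decimal("0")),
    }
```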

Database Tables:

CREATE TABLE usage_llm (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    agent_id TEXT,
    microdao_id TEXT,
    model TEXT NOT NULL,
    provider TEXT NOT NULL,
    prompt_tokens INT NOT NULL,
    completion_tokens INT NOT NULL,
    total_tokens INT NOT NULL,
    latency_ms INT,
    cost_usd DECIMAL(10, 6)
);

CREATE TABLE usage_tool (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    actor_id TEXT NOT NULL,
    agent_id TEXT,
    microdao_id TEXT,
    tool_id TEXT NOT NULL,
    success BOOLEAN NOT NULL,
    latency_ms INT,
    result_size_bytes INT
);

-- Indexes for fast queries
CREATE INDEX idx_usage_llm_microdao_time ON usage_llm(microdao_id, timestamp DESC);
CREATE INDEX idx_usage_llm_agent ON usage_llm(agent_id, timestamp DESC);
CREATE INDEX idx_usage_tool_microdao ON usage_tool(microdao_id, timestamp DESC);
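In production the summary would more likely be computed in SQL. The sketch below shows a query whose WHERE clause lines up with idx_usage_llm_microdao_time, and runs it against an in-memory sqlite3 database purely as a dependency-free stand-in for Postgres. The query text and the demo_summary helper are assumptions; sqlite stores the timestamps as ISO-8601 text, which only compares correctly when every row uses the same format.

```python
import sqlite3
from datetime import datetime

# Postgres form for asyncpg ($n placeholders); the WHERE clause can use
# idx_usage_llm_microdao_time (microdao_id, timestamp DESC)
SUMMARY_SQL_PG = """
    SELECT COUNT(*), COALESCE(SUM(total_tokens), 0), COALESCE(SUM(cost_usd), 0)
    FROM usage_llm
    WHERE microdao_id = $1 AND timestamp >= $2
"""

# Same query with sqlite3 "?" placeholders, used only for this local demo
SUMMARY_SQL_SQLITE = SUMMARY_SQL_PG.replace("$1", "?").replace("$2", "?")


def demo_summary(rows: list[tuple], microdao_id: str, since: datetime) -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE usage_llm (
        id TEXT PRIMARY KEY, timestamp TEXT NOT NULL, actor_id TEXT NOT NULL,
        agent_id TEXT, microdao_id TEXT, model TEXT NOT NULL, provider TEXT NOT NULL,
        prompt_tokens INT NOT NULL, completion_tokens INT NOT NULL,
        total_tokens INT NOT NULL, latency_ms INT, cost_usd REAL)""")
    conn.execute("CREATE INDEX idx_usage_llm_microdao_time "
                 "ON usage_llm(microdao_id, timestamp DESC)")
    conn.executemany("INSERT INTO usage_llm VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", rows)
    # ISO-8601 UTC strings in one format sort correctly as text
    result = conn.execute(SUMMARY_SQL_SQLITE, (microdao_id, since.isoformat())).fetchone()
    conn.close()
    return result
```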

4. PEP Integration (3 services) 🔜 0% COMPLETE

4.1 messaging-service PEP

File: services/messaging-service/pep_middleware.py

import asyncpg

from auth_service_client import get_actor_context
from pdp_service_client import evaluate_policy

async def check_send_message_permission(
    actor_id: str,
    channel_id: str,
    db_pool: asyncpg.Pool
) -> bool:
    """Check if actor can send message to channel"""
    
    # 1. Get actor context
    actor = await get_actor_context(actor_id, db_pool)
    
    # 2. Evaluate policy
    decision = await evaluate_policy(
        actor=actor,
        action="send_message",
        resource={"type": "channel", "id": channel_id}
    )
    
    # 3. Return decision
    return decision.effect == "permit"

Integration Points:

POST /api/messaging/channels/{channel_id}/messages — check before send
POST /api/messaging/channels — check MANAGE permission
POST /api/messaging/channels/{channel_id}/members — check INVITE permission
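At each integration point, the route has to decide what happens when the PDP itself is unreachable. A minimal fail-closed wrapper, assuming a hypothetical PermissionDenied exception that the HTTP layer would map to 403 and a 2-second timeout chosen for illustration:

```python
import asyncio
from typing import Awaitable, Callable


class PermissionDenied(Exception):
    """Raised by the PEP; the HTTP layer would translate it into 403 Forbidden."""


async def enforce(check: Callable[[], Awaitable[bool]], timeout_s: float = 2.0) -> None:
    """Run a PDP-backed check and fail closed on errors or timeouts."""
    try:
        allowed = await asyncio.wait_for(check(), timeout=timeout_s)
    except Exception as exc:  # PDP down, timeout, malformed response, ...
        raise PermissionDenied("pdp_unavailable") from exc
    if not allowed:
        raise PermissionDenied("denied_by_policy")


# Example use inside a route handler (names from the sketch above):
#   await enforce(lambda: check_send_message_permission(actor_id, channel_id, db_pool))
```

Failing closed means a PDP outage blocks sends rather than silently allowing them; the distinct reasons keep the two cases separable in logs.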

4.2 agent-runtime PEP

File: services/agent-runtime/pep_client.py

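# Assumed client modules, as in the messaging-service example above
from auth_service_client import ActorIdentity  # assumed to re-export auth-service's model
from pdp_service_client import evaluate_policy

async def check_tool_execution_permission(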
    agent_id: str,
    tool_id: str,
    microdao_id: str
) -> bool:
    """Check if agent can execute tool"""
    
    # Build agent actor
    actor = ActorIdentity(
        actor_id=agent_id,
        actor_type="agent",
        microdao_ids=[microdao_id],
        roles=["agent"]
    )
    
    # Evaluate
    decision = await evaluate_policy(
        actor=actor,
        action="exec_tool",
        resource={"type": "tool", "id": tool_id}
    )
    
    return decision.effect == "permit"

Integration: Before calling toolcore in handle_invocation()
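The gate inside handle_invocation() could be as small as the sketch below. The invoke_tool_guarded name, the result dictionary shape, and passing the permission check and toolcore client in as callables (so the gate can be tested without either service) are assumptions.

```python
from typing import Any, Awaitable, Callable

CheckFn = Callable[[str, str, str], Awaitable[bool]]   # (agent_id, tool_id, microdao_id)
ToolFn = Callable[[str, dict], Awaitable[Any]]         # (tool_id, args) -> result


async def invoke_tool_guarded(agent_id: str, tool_id: str, microdao_id: str,
                              args: dict, check: CheckFn, call_toolcore: ToolFn) -> dict:
    # PEP step: ask the PDP first; toolcore is never contacted on a deny
    if not await check(agent_id, tool_id, microdao_id):
        return {"ok": False, "error": "tool_not_allowed", "tool_id": tool_id}
    result = await call_toolcore(tool_id, args)
    return {"ok": True, "result": result}
```

In agent-runtime, check would be check_tool_execution_permission and call_toolcore the existing toolcore client.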

4.3 toolcore PEP

Already has: allowed_agents in registry
Additional: Cross-check with PDP for user-initiated tool calls
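One reading of the toolcore cross-check: the registry's allowed_agents stays authoritative for the executing agent, and when a human initiated the call the PDP must also approve that user. The registry dictionary layout, the authorize_tool_call name, and the pdp_check callable are assumptions for illustration.

```python
from typing import Awaitable, Callable, Optional

PdpCheck = Callable[[str, str], Awaitable[bool]]  # (user_id, tool_id) -> permitted?


async def authorize_tool_call(
    agent_id: str,
    tool_id: str,
    registry: dict,
    pdp_check: PdpCheck,
    initiating_user: Optional[str] = None,
) -> bool:
    entry = registry.get(tool_id)
    if entry is None:
        return False  # unknown tool: default deny
    # Existing toolcore rule: the executing agent must be in allowed_agents
    if agent_id not in entry.get("allowed_agents", []):
        return False
    # Additional rule: a human-initiated call also needs the PDP's approval
    if initiating_user is not None:
        return await pdp_check(initiating_user, tool_id)
    return True
```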

5. Audit Log (1 migration) 🔜 0% COMPLETE

File: migrations/004_create_security_audit.sql

CREATE TABLE security_audit (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    actor_id TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    action TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_id TEXT NOT NULL,
    decision TEXT NOT NULL,  -- permit/deny
    reason TEXT,
    context JSONB,
    ip_address INET,
    user_agent TEXT
);

CREATE INDEX idx_audit_timestamp ON security_audit(timestamp DESC);
CREATE INDEX idx_audit_actor ON security_audit(actor_id, timestamp DESC);
CREATE INDEX idx_audit_decision ON security_audit(decision, timestamp DESC);
CREATE INDEX idx_audit_resource ON security_audit(resource_type, resource_id);

PDP Integration: After every evaluate() call, write to audit log:

import json
from typing import Optional

import asyncpg

from models import PolicyRequest, PolicyDecision

async def log_audit_event(
    db: asyncpg.Pool,
    request: PolicyRequest,
    decision: PolicyDecision,
    context: Optional[dict] = None,
) -> None:
    """Write one audit log entry for a PDP decision"""
    await db.execute("""
        INSERT INTO security_audit
        (actor_id, actor_type, action, resource_type, resource_id,
         decision, reason, context)
        VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    """,
        request.actor.actor_id,
        request.actor.actor_type,
        request.action,
        request.resource.type,
        request.resource.id,
        decision.effect,
        decision.reason,
        json.dumps(context or {})  # asyncpg encodes jsonb parameters from text by default
    )

NATS Security Events:

security.suspicious — Publish on:
- Multiple deny events (>5 in 1 min)
- Unusual tool execution attempts
- Privilege escalation attempts
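The "more than 5 denies in 1 minute" trigger can be tracked per actor with a sliding window. In this sketch (class name and reset-after-alert behaviour are assumptions), record() returns True when a security.suspicious event should be published, for example with nats-py's nc.publish("security.suspicious", payload).

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class DenyBurstDetector:
    """Flags an actor that collects more than `threshold` denies within `window_s` seconds."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self._denies: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, actor_id: str, effect: str, ts: float) -> bool:
        """Record one PDP decision; return True when an alert should fire."""
        if effect != "deny":
            return False
        q = self._denies[actor_id]
        q.append(ts)
        # Drop denies that have fallen out of the window
        while q and ts - q[0] >= self.window_s:
            q.popleft()
        if len(q) > self.threshold:
            q.clear()  # one alert per burst instead of one per extra deny
            return True
        return False
```

Keeping this state in memory only works for a single PDP replica; several replicas would need a shared store, such as the Redis planned for Phase 6.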

6. Infrastructure (3 files) 🔜 0% COMPLETE

6.1 docker-compose.phase4.yml

services:
  auth-service:
    build: ./services/auth-service
    ports: ["7011:7011"]
    environment:
      - DATABASE_URL=postgresql://...
  
  pdp-service:
    build: ./services/pdp-service
    ports: ["7012:7012"]
    environment:
      - DATABASE_URL=postgresql://...
  
  usage-engine:
    build: ./services/usage-engine
    ports: ["7013:7013"]
    environment:
      - DATABASE_URL=postgresql://...
      - NATS_URL=nats://nats:4222
  
  # + All Phase 3 services
  llm-proxy:
    environment:
      - AUTH_SERVICE_URL=http://auth-service:7011
  
  # etc...

6.2 scripts/start-phase4.sh

6.3 scripts/stop-phase4.sh

7. Documentation (4 files) 🔜 0% COMPLETE

7.1 docs/AUTH_SERVICE_SPEC.md

Actor model
Session management
API keys
Integration guide

7.2 docs/PDP_SPEC.md

Policy model
Evaluation logic
Policy configuration
Adding new rules

7.3 docs/USAGE_ENGINE_SPEC.md

Event model
NATS integration
Aggregation queries
Billing foundation

7.4 PHASE4_READY.md

Overview
Quick start
Testing guide
Production readiness

📊 IMPLEMENTATION ROADMAP

Week 1: Core Services

✅ auth-service (complete)
🔄 pdp-service (20% → 100%)
🔜 usage-engine (0% → 100%)

Week 2: Integration

🔜 PEP hooks (messaging-service)
🔜 PEP hooks (agent-runtime)
🔜 PEP hooks (toolcore)

Week 3: Audit & Testing

🔜 Audit log migration
🔜 Security events (NATS)
🔜 E2E testing

Week 4: Documentation & Polish

🔜 All docs (4 files)
🔜 docker-compose
🔜 Scripts
🔜 PHASE4_READY.md

🎯 ACCEPTANCE CRITERIA

Auth Service: ✅

Login works with mock users
Session tokens created & validated
API keys CRUD functional
actor_context helper ready

PDP Service: 🔜

/internal/pdp/evaluate works
MicroDAO access rules
Channel access rules
Tool execution rules
10+ unit tests

PEP Integration: 🔜

messaging-service blocks unauthorized sends
agent-runtime checks tool permissions
toolcore enforces allowed_agents

Usage Engine: 🔜

usage.llm events collected
usage.tool events collected
/internal/usage/summary works
Database tables created

Audit Log: 🔜

security_audit table exists
PDP writes every decision
Can query last 100 events
security.suspicious events published

Infrastructure: 🔜

docker-compose.phase4.yml works
All services healthy
Start/stop scripts functional
Documentation complete

🚀 QUICK START (once Phase 4 is complete)

# 1. Start Phase 4
./scripts/start-phase4.sh

# 2. Test Auth
curl -X POST http://localhost:7011/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@daarion.city"}'

# 3. Test PDP
curl -X POST http://localhost:7012/internal/pdp/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "actor": {...},
    "action": "send_message",
    "resource": {"type": "channel", "id": "..."}
  }'

# 4. Check Usage
curl "http://localhost:7013/internal/usage/summary?period=24h"

# 5. View Audit
docker exec daarion-postgres psql -U postgres -d daarion \
  -c "SELECT * FROM security_audit ORDER BY timestamp DESC LIMIT 10;"

🔜 AFTER PHASE 4

Phase 5: Advanced Features

Real Passkey integration
OAuth2 providers
Advanced policy language (ABAC)
Dynamic policy updates
Cost allocation & billing
Security analytics dashboard

Phase 6: Production Hardening

Rate limiting (Redis)
DDoS protection
Penetration testing
Security audit
Compliance certification

📚 RESOURCES

Specs:

Phase 4 Master Task (user-provided)
PHASE4_STARTED.md

Standards:

RBAC (Role-Based Access Control)
ABAC (Attribute-Based Access Control)
OAuth 2.0 / OpenID Connect
Audit logging best practices

Status: 📋 Detailed Plan Complete
Next: Continue Implementation
Version: 1.0.0
Last Updated: 2025-11-24
