🔧 Node Registry Service — Status & Deployment

Version: 1.0.0
Created: 2025-01-17
Last updated: 2025-01-17
Status: Complete + Integrated. Full-stack implementation ready for production.


📋 Overview

Node Registry Service is a centralized registry for all nodes in the DAGI network (Node #1, Node #2, and future Node #N).

Purpose

  • Node registration: automatic or manual registration of new nodes
  • Heartbeat tracking: monitoring node availability and health
  • Node discovery: finding available nodes and their capabilities
  • Profile management: storing node profiles (LLM configs, services, capabilities)

What Is Ready (Infrastructure by Warp)

1. Service Structure

services/node-registry/
├── app/
│   └── main.py          # FastAPI stub application
├── migrations/
│   └── init_node_registry.sql  # Database schema
├── Dockerfile           # Docker image configuration
├── requirements.txt     # Python dependencies
└── README.md            # Full service documentation

2. FastAPI Application (app/main.py)

  • Health endpoint: GET /health
  • Metrics endpoint: GET /metrics
  • Root endpoint: GET /
  • 🚧 Stub API endpoints (501 Not Implemented at the infrastructure stage; see "Implemented by Cursor" for the later implementation):
    • POST /api/v1/nodes/register
    • POST /api/v1/nodes/{node_id}/heartbeat
    • GET /api/v1/nodes
    • GET /api/v1/nodes/{node_id}

3. PostgreSQL Database

  • Database: node_registry
  • User: node_registry_user
  • Tables created:
    • nodes — Core node registry
    • node_profiles — Node capabilities/configurations
    • heartbeat_log — Historical heartbeat data
  • Initial data: Node #1 and Node #2 pre-registered

4. Docker Configuration

  • Dockerfile with Python 3.11-slim
  • Health check configured
  • Non-root user (noderegistry)
  • Added to docker-compose.yml with dependencies

5. Deployment Script

  • scripts/deploy-node-registry.sh
    • SSH connection check
    • Database initialization
    • Secure password generation
    • Docker image build
    • Service start
    • Firewall configuration
    • Deployment verification

🔌 Service Configuration

Port & Access

  • Port: 9205 (Internal only)
  • Access: Node #1, Node #2, DAGI nodes (LAN/VPN)
  • Public access: Blocked by firewall

Environment Variables

NODE_REGISTRY_DB_HOST=postgres
NODE_REGISTRY_DB_PORT=5432
NODE_REGISTRY_DB_NAME=node_registry
NODE_REGISTRY_DB_USER=node_registry_user
NODE_REGISTRY_DB_PASSWORD=***generated_secure_password***
NODE_REGISTRY_HTTP_PORT=9205
NODE_REGISTRY_ENV=production
NODE_REGISTRY_LOG_LEVEL=info
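
One way the service might read these variables is sketched below. `RegistrySettings` and `load_settings` are hypothetical names, not taken from the actual `app/main.py`; the defaults simply repeat the values listed above, and the password is deliberately required rather than defaulted. The password is URL-quoted because a base64-generated value can contain `/`, `+`, and `=`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class RegistrySettings:
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    http_port: int
    env: str
    log_level: str

    @property
    def dsn(self) -> str:
        """PostgreSQL connection URL; the password is percent-encoded."""
        password = quote(self.db_password, safe="")
        return (
            f"postgresql://{self.db_user}:{password}"
            f"@{self.db_host}:{self.db_port}/{self.db_name}"
        )


def load_settings(environ: Optional[Mapping[str, str]] = None) -> RegistrySettings:
    """Build settings from environment variables (defaults mirror the list above)."""
    env = os.environ if environ is None else environ
    return RegistrySettings(
        db_host=env.get("NODE_REGISTRY_DB_HOST", "postgres"),
        db_port=int(env.get("NODE_REGISTRY_DB_PORT", "5432")),
        db_name=env.get("NODE_REGISTRY_DB_NAME", "node_registry"),
        db_user=env.get("NODE_REGISTRY_DB_USER", "node_registry_user"),
        db_password=env["NODE_REGISTRY_DB_PASSWORD"],  # no default: must be set
        http_port=int(env.get("NODE_REGISTRY_HTTP_PORT", "9205")),
        env=env.get("NODE_REGISTRY_ENV", "production"),
        log_level=env.get("NODE_REGISTRY_LOG_LEVEL", "info"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.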

Firewall Rules (Node #1)

# Allow from local network
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp comment 'Node Registry - LAN'

# Allow from Docker network
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp comment 'Node Registry - Docker'

# Deny from external
ufw deny 9205/tcp comment 'Node Registry - Block external'
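
The effect of these rules can be modelled in a few lines. The sketch below assumes first-match-wins evaluation in the order the rules were added, which is how ufw orders rules; `RULES` and `decision_for` are illustrative names, and the final catch-all entry stands in for the `ufw deny 9205/tcp` rule.

```python
import ipaddress

# (action, source network) pairs in the order the rules are added above.
RULES = [
    ("allow", ipaddress.ip_network("192.168.1.0/24")),  # LAN
    ("allow", ipaddress.ip_network("172.16.0.0/12")),   # Docker networks
    ("deny", ipaddress.ip_network("0.0.0.0/0")),        # everything else
]


def decision_for(source_ip: str) -> str:
    """Return 'allow' or 'deny' for a connection to port 9205 from source_ip."""
    address = ipaddress.ip_address(source_ip)
    for action, network in RULES:
        if address in network:
            return action  # first matching rule wins
    return "deny"
```

Because the deny rule comes last, a LAN address such as `192.168.1.244` is allowed before the catch-all is reached, while a public address falls through to the deny.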

🗄️ Database Schema

Table: nodes

| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| node_id | VARCHAR(255) | Unique identifier (e.g. node-1-hetzner-gex44) |
| node_name | VARCHAR(255) | Human-readable name |
| node_role | VARCHAR(50) | production, development, backup |
| node_type | VARCHAR(50) | router, gateway, worker |
| ip_address | INET | Public IP |
| local_ip | INET | Local network IP |
| hostname | VARCHAR(255) | DNS hostname |
| status | VARCHAR(50) | online, offline, maintenance, degraded |
| last_heartbeat | TIMESTAMP | Last heartbeat timestamp |
| registered_at | TIMESTAMP | Registration time |
| updated_at | TIMESTAMP | Last update time |
| metadata | JSONB | Additional metadata |
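
For readers who prefer code to a column list, the sketch below mirrors the main columns of the `nodes` table as a Python dataclass. It is an illustration only: `NodeRecord` is a hypothetical name, and the value checks assume that the role, type, and status values listed in the Description column are the complete allowed sets, which the schema itself may not enforce.

```python
import ipaddress
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

NODE_ROLES = {"production", "development", "backup"}
NODE_TYPES = {"router", "gateway", "worker"}
NODE_STATUSES = {"online", "offline", "maintenance", "degraded"}


@dataclass
class NodeRecord:
    node_id: str
    node_name: str
    node_role: str
    node_type: str
    hostname: str
    ip_address: Optional[str] = None   # public IP (INET)
    local_ip: Optional[str] = None     # local network IP (INET)
    status: str = "offline"
    last_heartbeat: Optional[datetime] = None
    metadata: dict = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    def __post_init__(self) -> None:
        if self.node_role not in NODE_ROLES:
            raise ValueError(f"unknown node_role: {self.node_role}")
        if self.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node_type: {self.node_type}")
        if self.status not in NODE_STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        for value in (self.ip_address, self.local_ip):
            if value is not None:
                ipaddress.ip_address(value)  # raises ValueError if malformed
```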

Table: node_profiles

| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes |
| profile_name | VARCHAR(255) | Profile identifier |
| profile_type | VARCHAR(50) | llm, service, capability |
| config | JSONB | Profile configuration |
| enabled | BOOLEAN | Active status |

Table: heartbeat_log

| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes |
| timestamp | TIMESTAMP | Heartbeat time |
| status | VARCHAR(50) | Node status |
| metrics | JSONB | System metrics (CPU, RAM, etc.) |
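
Heartbeat timestamps are typically turned into an availability status by comparing them with the current time. The sketch below shows one possible rule; the 90-second threshold and the function name `status_from_heartbeat` are assumptions for illustration, not values taken from the service.

```python
from datetime import datetime, timedelta
from typing import Optional


def status_from_heartbeat(
    last_heartbeat: Optional[datetime],
    now: datetime,
    offline_after: timedelta = timedelta(seconds=90),  # assumed threshold
) -> str:
    """Classify a node as 'online' or 'offline' from its last heartbeat."""
    if last_heartbeat is None:
        return "offline"  # never reported in
    return "online" if now - last_heartbeat <= offline_after else "offline"
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the rule deterministic and easy to test.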

Initial Data

-- Pre-registered nodes
INSERT INTO nodes (node_id, node_name, node_role, node_type, ip_address, local_ip, hostname, status)
VALUES 
    ('node-1-hetzner-gex44', 'Hetzner GEX44 Production', 'production', 'router', '144.76.224.179', NULL, 'gateway.daarion.city', 'offline'),
    ('node-2-macbook-m4max', 'MacBook Pro M4 Max', 'development', 'router', NULL, '192.168.1.244', 'MacBook-Pro.local', 'offline');

🚀 Deployment

Quick Deploy to Node #1 (Production)

# From Node #2 (MacBook)
cd /Users/apple/github-projects/microdao-daarion

# Deploy service
./scripts/deploy-node-registry.sh

# Register Node #1 using bootstrap
python -m tools.dagi_node_agent.bootstrap \
  --role production-router \
  --labels router,gateway,production \
  --registry-url http://144.76.224.179:9205

# Register Node #2 using bootstrap
python -m tools.dagi_node_agent.bootstrap \
  --role development-router \
  --labels router,development,mac,gpu \
  --registry-url http://192.168.1.244:9205

Manual Deployment Steps

1. Initialize Database (on Node #1)

ssh root@144.76.224.179
cd /opt/microdao-daarion

# Copy SQL script to container
docker cp services/node-registry/migrations/init_node_registry.sql dagi-postgres:/tmp/

# Run initialization (reads the copy inside the container, not a host file)
docker exec dagi-postgres psql -U postgres -f /tmp/init_node_registry.sql

2. Generate Secure Password

# Generate and save to .env
PASSWORD=$(openssl rand -base64 32)
echo "NODE_REGISTRY_DB_PASSWORD=$PASSWORD" >> .env
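
If the password needs to be generated from Python rather than the shell, the standard `secrets` module offers an equivalent. This is a sketch: `generate_db_password` and `env_line` are hypothetical helpers. Unlike `openssl rand -base64`, `secrets.token_urlsafe` emits only letters, digits, `-`, and `_`, so the value needs no escaping in a connection URL.

```python
import secrets


def generate_db_password(num_bytes: int = 32) -> str:
    """Return a URL-safe random password built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


def env_line(password: str) -> str:
    """Format the line to append to .env."""
    return f"NODE_REGISTRY_DB_PASSWORD={password}"
```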

3. Build and Start

# Build Docker image
docker-compose build node-registry

# Start service
docker-compose up -d node-registry

# Check status
docker-compose ps | grep node-registry
docker logs dagi-node-registry

4. Configure Firewall

# Allow internal access
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp

# Deny external
ufw deny 9205/tcp

5. Verify Deployment

# Health check
curl http://localhost:9205/health

# Expected response:
# {"status":"healthy","service":"node-registry","version":"0.1.0-stub",...}

🧪 Testing & Verification

Local Testing (Node #2)

# Install dependencies
cd services/node-registry
pip install -r requirements.txt

# Run locally
export NODE_REGISTRY_ENV=development
python -m app.main

# Test endpoints
curl http://localhost:9205/health
curl http://localhost:9205/metrics
open http://localhost:9205/docs  # Interactive API docs

Production Testing (Node #1)

# From Node #2, test internal access
curl http://144.76.224.179:9205/health

# From Node #1
ssh root@144.76.224.179
curl http://localhost:9205/health
curl http://localhost:9205/metrics

# Check logs
docker logs dagi-node-registry --tail 50

📊 Monitoring

Health Endpoint

GET http://localhost:9205/health

{
  "status": "healthy",
  "service": "node-registry",
  "version": "0.1.0-stub",
  "environment": "production",
  "uptime_seconds": 3600.5,
  "timestamp": "2025-01-17T14:30:00Z",
  "database": {
    "connected": true,
    "host": "postgres",
    "port": 5432,
    "database": "node_registry"
  }
}
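
A monitoring script could interpret this payload along the following lines. This is a minimal sketch using only the standard library; `is_healthy` and `fetch_health` are illustrative names, and treating a disconnected database as unhealthy is an assumption about the desired policy.

```python
import json
import urllib.request


def is_healthy(payload: str) -> bool:
    """True when the service reports healthy and its database is connected."""
    data = json.loads(payload)
    database = data.get("database") or {}
    return data.get("status") == "healthy" and database.get("connected") is True


def fetch_health(base_url: str = "http://localhost:9205", timeout: float = 5.0) -> str:
    """Fetch the raw /health body; network errors propagate to the caller."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A caller would combine the two as `is_healthy(fetch_health())` and treat any raised exception as an unhealthy result.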

Metrics Endpoint

GET http://localhost:9205/metrics

{
  "service": "node-registry",
  "uptime_seconds": 3600.5,
  "total_nodes": 2,
  "active_nodes": 1,
  "timestamp": "2025-01-17T14:30:00Z"
}

Prometheus Integration (Future)

# prometheus.yml
scrape_configs:
  - job_name: 'node-registry'
    static_configs:
      - targets: ['node-registry:9205']
    scrape_interval: 30s
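
Prometheus scrapes its text exposition format rather than JSON, so the JSON metrics shown earlier would need converting before this scrape config becomes useful. The sketch below renders the numeric fields as gauges; the metric names (`node_registry_*`) and help strings are assumptions, and the planned `prometheus_client` integration may well choose different ones.

```python
# Numeric fields of the JSON metrics payload and their help strings.
GAUGES = {
    "uptime_seconds": "Seconds since the service started",
    "total_nodes": "Number of registered nodes",
    "active_nodes": "Number of nodes currently active",
}


def to_prometheus_text(metrics: dict) -> str:
    """Render selected JSON metrics in the Prometheus text exposition format."""
    lines = []
    for key, help_text in GAUGES.items():
        name = f"node_registry_{key}"
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(metrics[key])}")
    return "\n".join(lines) + "\n"
```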

Implemented by Cursor

Completed Features

Priority 1: Database Integration

  • SQLAlchemy ORM models (models.py)
    • Node model (node_id, hostname, ip, role, labels, status, heartbeat)
    • NodeProfile model (role-based configuration profiles)
  • Database connection pool
  • SQL migration (001_create_node_registry_tables.sql)
  • Health check with DB connection

Priority 2: Core API Endpoints

  • POST /api/v1/nodes/register — Register/update node with auto node_id generation
  • POST /api/v1/nodes/heartbeat — Update heartbeat timestamp
  • GET /api/v1/nodes — List all nodes with filters (role, label, status)
  • GET /api/v1/nodes/{node_id} — Get specific node details
  • CRUD operations in crud.py:
    • register_node() — Auto-generate node_id
    • update_heartbeat() — Update heartbeat
    • get_node(), list_nodes() — Query nodes
    • get_node_profile() — Get role profile
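
The filtering behaviour of `GET /api/v1/nodes` (role, label, status) can be pictured with an in-memory version. This sketch is not the SQLAlchemy code in `crud.py`; the dictionary keys and the sample data are assumptions based on the bootstrap examples in this document.

```python
from typing import Optional

# Sample data shaped after the bootstrap commands in this document.
NODES = [
    {"node_id": "node-1-hetzner-gex44", "role": "production-router",
     "labels": ["router", "gateway", "production"], "status": "online"},
    {"node_id": "node-2-macbook-m4max", "role": "development-router",
     "labels": ["router", "development", "mac", "gpu"], "status": "offline"},
]


def list_nodes(
    nodes: list,
    role: Optional[str] = None,
    label: Optional[str] = None,
    status: Optional[str] = None,
) -> list:
    """Return nodes matching every filter that is not None."""
    result = []
    for node in nodes:
        if role is not None and node.get("role") != role:
            continue
        if label is not None and label not in node.get("labels", []):
            continue
        if status is not None and node.get("status") != status:
            continue
        result.append(node)
    return result
```

Filters combine with AND semantics here, so `list_nodes(NODES, label="router", status="online")` returns only the first node.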

Priority 3: Node Profiles

  • GET /api/v1/profiles/{role} — Get role-based configuration profile
  • NodeProfile model with role-based configs
  • Per-node profile management (future enhancement)

Priority 4: Security & Auth ⚠️

  • Request validation (Pydantic schemas in schemas.py)
  • API key authentication (future)
  • JWT tokens for inter-node communication (future)
  • Rate limiting (future)

Priority 5: Monitoring & Metrics

  • Health check endpoint with DB connectivity
  • Metrics endpoint (basic)
  • Prometheus metrics export (prometheus_client) (future)
  • Performance metrics (request duration, DB queries) (future)
  • Structured logging (JSON) (future)

Priority 6: Testing

  • Unit tests (tests/test_crud.py) — CRUD operations
  • Integration tests (tests/test_api.py) — API endpoints
  • Load testing (future)

Priority 7: Bootstrap Tool

  • DAGI Node Agent Bootstrap (tools/dagi_node_agent/bootstrap.py)
    • Automatic hostname and IP detection
    • Registration with Node Registry
    • Local node_id storage (/etc/dagi/node_id or ~/.config/dagi/node_id)
    • Initial heartbeat after registration
    • CLI interface with role and labels support
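
The local `node_id` storage with a system-wide location and a per-user fallback could look roughly like this. It is a sketch under assumptions, not the code in `bootstrap.py`: the helper names are hypothetical, and the file is assumed to hold the bare identifier followed by a newline.

```python
import socket
from pathlib import Path
from typing import Iterable, Optional


def default_paths() -> list[Path]:
    """Storage locations named above: system-wide first, then per-user."""
    return [Path("/etc/dagi/node_id"), Path.home() / ".config" / "dagi" / "node_id"]


def save_node_id(node_id: str, paths: Iterable[Path]) -> Path:
    """Write node_id to the first writable location and return that path."""
    for path in paths:
        try:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(node_id + "\n", encoding="utf-8")
            return path
        except OSError:
            continue  # e.g. no permission for /etc as a non-root user
    raise OSError("no writable location for node_id")


def load_node_id(paths: Iterable[Path]) -> Optional[str]:
    """Return the first stored node_id found, or None if none exists."""
    for path in paths:
        try:
            return path.read_text(encoding="utf-8").strip()
        except OSError:
            continue
    return None


def detect_hostname() -> str:
    """Hostname as reported by the operating system."""
    return socket.gethostname()
```

A bootstrap run would call `load_node_id(default_paths())` first and only register with the registry when it returns `None`.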

Priority 8: DAGI Router Integration

  • Node Registry Client (utils/node_registry_client.py)
    • Async HTTP client for Node Registry API
    • Methods: get_nodes(), get_node(), get_nodes_by_role(), get_available_nodes()
    • Graceful degradation when service unavailable
    • Error handling and retries
  • Router Integration (router_app.py)
    • Added get_available_nodes() method
    • Node discovery for routing decisions
  • HTTP API (http_api.py)
    • New endpoint: GET /nodes (with role filter)
    • Proxy to Node Registry service
  • Test Scripts
    • scripts/test_node_registry.sh — API endpoint testing
    • scripts/test_bootstrap.sh — Bootstrap tool testing
    • scripts/init_node_registry_db.sh — Database initialization
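
The "graceful degradation" and "retries" behaviour listed for the client can be sketched as follows. This is not the code in `utils/node_registry_client.py`: the injected `fetch` coroutine, the `status=online` query parameter, and the retry defaults are all assumptions made to keep the example self-contained.

```python
import asyncio
from typing import Awaitable, Callable

# Any coroutine that takes a URL and returns the decoded JSON list of nodes.
Fetch = Callable[[str], Awaitable[list]]


async def get_available_nodes(
    fetch: Fetch,
    base_url: str = "http://localhost:9205",
    retries: int = 2,
    delay: float = 0.5,
) -> list:
    """Return online nodes, or an empty list if the registry stays unreachable."""
    url = f"{base_url}/api/v1/nodes?status=online"
    for attempt in range(retries + 1):
        try:
            return await fetch(url)
        except (OSError, asyncio.TimeoutError):
            if attempt < retries:
                await asyncio.sleep(delay)  # brief pause before retrying
    return []  # degrade gracefully: routing continues without registry data
```

Returning an empty list instead of raising lets the router fall back to its static configuration when the registry is down.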

🔧 Management Commands

Service Control

# Start
docker-compose up -d node-registry

# Stop
docker-compose stop node-registry

# Restart
docker-compose restart node-registry

# Rebuild
docker-compose up -d --build node-registry

# Logs
docker logs -f dagi-node-registry
docker-compose logs -f node-registry

Database Operations

# Connect to database
docker exec -it dagi-postgres psql -U node_registry_user -d node_registry

-- The commands below run inside the psql session opened above

-- List tables
\dt

-- Query nodes
SELECT node_id, node_name, status, last_heartbeat FROM nodes;

-- Query profiles
SELECT n.node_name, p.profile_name, p.profile_type, p.enabled
FROM nodes n
JOIN node_profiles p ON n.id = p.node_id;

📖 Documentation


| Service | Port | Connection | Purpose |
|---|---|---|---|
| PostgreSQL | 5432 | Required | Database storage |
| DAGI Router | 9102 | Optional | Node info for routing |
| Prometheus | 9090 | Optional | Metrics scraping |
| Grafana | 3000 | Optional | Monitoring dashboard |

⚠️ Security Considerations

Network Security

  • Port 9205 accessible only from internal network
  • Firewall rules configured (UFW)
  • ⚠️ No authentication yet (to be added by Cursor)

Database Security

  • Secure password generated automatically
  • Dedicated database user with limited privileges
  • Password stored in .env (not committed to git)

Future Improvements

  • API key authentication
  • TLS/SSL for API communication
  • Rate limiting per node
  • Audit logging for node changes

🎯 Acceptance Criteria Status

| Criteria | Status | Notes |
|---|---|---|
| Database node_registry created | ✅ | With tables and user |
| Environment variables configured | ✅ | In docker-compose.yml |
| Service added to docker-compose | ✅ | With health check |
| Port 9205 listens locally | 🟡 | After deployment |
| Accessible from Node #2 (LAN) | 🟡 | After deployment |
| Firewall blocks external | 🟡 | After deployment |
| INFRASTRUCTURE.md updated | 🟡 | See NODE-REGISTRY-STATUS.md |
| SYSTEM-INVENTORY.md updated | 🚧 | Todo |

Last Updated: 2025-01-17 by WARP AI
Next Steps: Deploy to Node #1, hand over to Cursor for API implementation
Status: Infrastructure Complete — Ready for Cursor Implementation