Files
microdao-daarion/NODE-REGISTRY-DEPLOYMENT-CHECKLIST.md

9.0 KiB

Node Registry Service — Deployment Checklist

Version: 1.0.0
Date: 2025-01-17
Status: Ready for Production


📋 Pre-Deployment Checklist

Local Verification (Node #2)

  • Test service locally

    cd services/node-registry
    pip install -r requirements.txt
    export NODE_REGISTRY_ENV=development
    export NODE_REGISTRY_DB_HOST=localhost
    export NODE_REGISTRY_DB_NAME=node_registry
    python -m app.main
    
  • Verify endpoints

    curl http://localhost:9205/health
    curl http://localhost:9205/metrics
    curl http://localhost:9205/docs  # Interactive API docs
    
  • Test node registration

    curl -X POST http://localhost:9205/api/v1/nodes/register \
      -H "Content-Type: application/json" \
      -d '{"hostname": "test-node", "ip": "192.168.1.100", "role": "test", "labels": ["test"]}'
    
  • Test heartbeat

    curl -X POST http://localhost:9205/api/v1/nodes/heartbeat \
      -H "Content-Type: application/json" \
      -d '{"node_id": "node-test", "status": "online"}'
    
  • List nodes

    curl http://localhost:9205/api/v1/nodes
    
  • Run tests (if available)

    cd services/node-registry
    pytest tests/
    

🚀 Production Deployment (Node #1)

Step 1: Push to GitHub

  • Commit changes
    git add services/node-registry/
    git add docker-compose.yml
    git add scripts/deploy-node-registry.sh
    git add NODE-REGISTRY-*.md
    git commit -m "feat: Node Registry Service - Full implementation"
    git push origin main
    

Step 2: Pull on Node #1

  • SSH to Node #1 and pull latest
    ssh root@144.76.224.179
    cd /opt/microdao-daarion
    git pull origin main
    

Step 3: Initialize Database

  • Run SQL migration

    # On Node #1
    cd /opt/microdao-daarion
    
    # Copy SQL file to container
    docker cp services/node-registry/migrations/001_create_node_registry_tables.sql dagi-postgres:/tmp/
    
    # Execute migration
    docker exec -i dagi-postgres psql -U postgres < /tmp/001_create_node_registry_tables.sql
    
    # Verify tables
    docker exec dagi-postgres psql -U postgres -d node_registry -c "\dt"
    
  • Generate secure password

    # Generate password
    PASSWORD=$(openssl rand -base64 32)
    
    # Add to .env
    echo "NODE_REGISTRY_DB_PASSWORD=$PASSWORD" >> .env
    
    # Verify
    grep NODE_REGISTRY_DB_PASSWORD .env
    

Step 4: Build and Start Service

  • Build Docker image

    docker-compose build node-registry
    
  • Start service

    docker-compose up -d node-registry
    
  • Check container status

    docker-compose ps | grep node-registry
    docker logs dagi-node-registry --tail 50
    

Step 5: Configure Firewall

  • Set UFW rules
    # Allow from local network
    ufw allow from 192.168.1.0/24 to any port 9205 proto tcp comment 'Node Registry - LAN'
    
    # Allow from Docker network
    ufw allow from 172.16.0.0/12 to any port 9205 proto tcp comment 'Node Registry - Docker'
    
    # Deny from external
    ufw deny 9205/tcp comment 'Node Registry - Block external'
    
    # Verify rules
    ufw status | grep 9205
    

Step 6: Verify Deployment

  • Health check

    curl http://localhost:9205/health
    # Expected: {"status":"healthy",...,"database":{"connected":true,...}}
    
  • Metrics check

    curl http://localhost:9205/metrics
    
  • Check database connectivity

    docker exec dagi-postgres psql -U node_registry_user -d node_registry -c "SELECT COUNT(*) FROM nodes;"
    

Step 7: Register Nodes

  • Register Node #1 (Production)

    # Option A: Using bootstrap tool (if installed on Node #1)
    python -m tools.dagi_node_agent.bootstrap \
      --role production-router \
      --labels router,gateway,production \
      --registry-url http://localhost:9205
    
    # Option B: Manual API call
    curl -X POST http://localhost:9205/api/v1/nodes/register \
      -H "Content-Type: application/json" \
      -d '{
        "hostname": "gateway.daarion.city",
        "ip": "144.76.224.179",
        "role": "production-router",
        "labels": ["router", "gateway", "production"]
      }'
    
  • Register Node #2 (Development) from MacBook

    # From Node #2
    python -m tools.dagi_node_agent.bootstrap \
      --role development-router \
      --labels router,development,mac,gpu \
      --registry-url http://192.168.1.244:9205
    
  • Verify node registration

    # List all nodes
    curl http://localhost:9205/api/v1/nodes
    
    # Get specific node
    curl http://localhost:9205/api/v1/nodes/node-1-hetzner-gex44
    

🧪 Post-Deployment Testing

Functional Tests

  • Test node listing

    # All nodes
    curl http://144.76.224.179:9205/api/v1/nodes
    
    # Filter by role
    curl "http://144.76.224.179:9205/api/v1/nodes?role=production-router"
    
    # Filter by label
    curl "http://144.76.224.179:9205/api/v1/nodes?label=gateway"
    
    # Filter by status
    curl "http://144.76.224.179:9205/api/v1/nodes?status=online"
    
  • Test heartbeat updates

    curl -X POST http://144.76.224.179:9205/api/v1/nodes/heartbeat \
      -H "Content-Type: application/json" \
      -d '{"node_id": "node-1-hetzner-gex44", "status": "online"}'
    
    # Verify heartbeat timestamp updated
    curl http://144.76.224.179:9205/api/v1/nodes/node-1-hetzner-gex44 | grep last_heartbeat
    
  • Test role profiles

    curl http://144.76.224.179:9205/api/v1/profiles/production-router
    

Network Access Tests

  • Test from Node #2 (internal network)

    # From MacBook
    curl http://144.76.224.179:9205/health
    
  • Verify external access blocked

    # From external machine (should fail or timeout)
    curl --max-time 5 http://144.76.224.179:9205/health
    

Integration Tests

  • DAGI Router integration (future)

    # Test router can fetch node list
    curl http://dagi-router:9102/api/nodes
    
  • Prometheus scraping (future)

    # Verify metrics endpoint is scrapable
    curl http://144.76.224.179:9205/metrics | grep node_registry
    

📊 Monitoring Setup

Prometheus Configuration

  • Add scrape job to prometheus.yml

    scrape_configs:
      - job_name: 'node-registry'
        static_configs:
          - targets: ['node-registry:9205']
        scrape_interval: 30s
    
  • Reload Prometheus

    docker-compose restart prometheus
    

Grafana Dashboard

  • Create dashboard for Node Registry
    • Panel: Node Registry uptime
    • Panel: Total registered nodes
    • Panel: Active vs offline nodes
    • Panel: Nodes by role
    • Panel: Recent heartbeats

Health Check Alerts

  • Configure alerting (optional)
    # prometheus/alerts/node_registry.yml
    groups:
      - name: node_registry
        rules:
          - alert: NodeRegistryDown
            expr: up{job="node-registry"} == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Node Registry is down"
    

🔄 Operational Tasks

Regular Maintenance

  • Weekly: Check node heartbeats

    docker exec dagi-postgres psql -U postgres -d node_registry -c \
      "SELECT node_id, last_heartbeat, status FROM nodes ORDER BY last_heartbeat DESC;"
    
  • Weekly: Clean old heartbeat logs (if needed)

    docker exec dagi-postgres psql -U postgres -d node_registry -c \
      "DELETE FROM heartbeat_log WHERE timestamp < NOW() - INTERVAL '30 days';"
    
  • Monthly: Review registered nodes

    curl http://144.76.224.179:9205/api/v1/nodes | jq '.[] | {node_id, role, status, last_heartbeat}'
    

Backup

  • Backup node_registry database
    docker exec dagi-postgres pg_dump -U postgres node_registry > backups/node_registry_$(date +%Y%m%d).sql
    

📚 Documentation Updates

  • Update INFRASTRUCTURE.md

    • Add Node Registry to services table (Port 9205)
    • Add environment variables section
  • Update SYSTEM-INVENTORY.md

    • Add node-registry service to inventory
    • Update total service count (17 → 18)
  • Update WARP.md

    • Add Node Registry service restart command
    • Add node registration examples

Final Verification

  • Service running on Node #1
  • Database initialized with schema
  • Firewall configured (internal only)
  • Node #1 registered and heartbeat working
  • Node #2 registered and heartbeat working
  • Health endpoint responding
  • Metrics endpoint responding
  • API endpoints functional
  • Documentation updated
  • Monitoring configured

🎉 Deployment Complete!

Node Registry Service is now live and ready for production use.

Next Steps:

  1. Integrate with DAGI Router for node discovery
  2. Set up automated heartbeat cron jobs for each node
  3. Add authentication/authorization
  4. Implement Prometheus metrics export
  5. Create Grafana dashboard

Deployed by: [Your Name]
Date: [Deployment Date]
Status: Production Ready