✅ Node Registry Service — Deployment Checklist
Version: 1.0.0
Date: 2025-01-17
Status: Ready for Production
📋 Pre-Deployment Checklist
Local Verification (Node #2)
- Test service locally
    cd services/node-registry
    pip install -r requirements.txt
    export NODE_REGISTRY_ENV=development
    export NODE_REGISTRY_DB_HOST=localhost
    export NODE_REGISTRY_DB_NAME=node_registry
    python -m app.main
- Verify endpoints
    curl http://localhost:9205/health
    curl http://localhost:9205/metrics
    curl http://localhost:9205/docs  # Interactive API docs
- Test node registration
    curl -X POST http://localhost:9205/api/v1/nodes/register \
      -H "Content-Type: application/json" \
      -d '{"hostname": "test-node", "ip": "192.168.1.100", "role": "test", "labels": ["test"]}'
- Test heartbeat
    curl -X POST http://localhost:9205/api/v1/nodes/heartbeat \
      -H "Content-Type: application/json" \
      -d '{"node_id": "node-test", "status": "online"}'
- List nodes
    curl http://localhost:9205/api/v1/nodes
- Run tests (if available)
    cd services/node-registry
    pytest tests/
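The registration call above can also be scripted, which helps when the same check is repeated after every change. The following is a minimal sketch, not the project's own client: `build_register_payload` and `post_json` are hypothetical helper names, the field names are taken from the curl example, and the assumption that the service replies with JSON is not confirmed by this checklist.

```python
import ipaddress
import json
import urllib.request

BASE_URL = "http://localhost:9205"  # local port used throughout this checklist


def build_register_payload(hostname, ip, role, labels):
    """Return the registration body, rejecting obviously bad input early."""
    if not hostname:
        raise ValueError("hostname must not be empty")
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return {"hostname": hostname, "ip": ip, "role": role, "labels": list(labels)}


def post_json(path, payload, timeout=5.0):
    """POST a JSON body to the registry and decode the reply.

    Assumption: the reply body is JSON; adjust if the service returns
    something else.
    """
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `post_json("/api/v1/nodes/register", build_register_payload("test-node", "192.168.1.100", "test", ["test"]))` sends the same request as the curl command.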
🚀 Production Deployment (Node #1)
Step 1: Push to GitHub
- Commit changes
    git add services/node-registry/
    git add docker-compose.yml
    git add scripts/deploy-node-registry.sh
    git add NODE-REGISTRY-*.md
    git commit -m "feat: Node Registry Service - Full implementation"
    git push origin main
Step 2: Pull on Node #1
- SSH to Node #1 and pull latest
    ssh root@144.76.224.179
    cd /opt/microdao-daarion
    git pull origin main
Step 3: Initialize Database
- Run SQL migration
    # On Node #1
    cd /opt/microdao-daarion
    # Copy SQL file to container
    docker cp services/node-registry/migrations/001_create_node_registry_tables.sql dagi-postgres:/tmp/
    # Execute migration (-f reads the copy inside the container)
    docker exec dagi-postgres psql -U postgres -f /tmp/001_create_node_registry_tables.sql
    # Verify tables
    docker exec dagi-postgres psql -U postgres -d node_registry -c "\dt"
- Generate secure password
    # Generate password
    PASSWORD=$(openssl rand -base64 32)
    # Add to .env
    echo "NODE_REGISTRY_DB_PASSWORD=$PASSWORD" >> .env
    # Verify
    grep NODE_REGISTRY_DB_PASSWORD .env
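Appending with `>>` adds a second `NODE_REGISTRY_DB_PASSWORD` line each time the step is repeated, and which copy wins then depends on the tool reading `.env`. A minimal sketch of an idempotent alternative follows; `upsert_env_line` and `generate_password` are hypothetical helpers, and `secrets.token_urlsafe` is swapped in for `openssl rand -base64 32` as a comparable source of randomness.

```python
import secrets


def generate_password(num_bytes=32):
    """Return a URL-safe random password built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


def upsert_env_line(env_text, key, value):
    """Return env_text with KEY=value present exactly once.

    Any existing assignment of the key is dropped and the new one is
    appended, so re-running the step never leaves duplicates behind.
    """
    kept = [line for line in env_text.splitlines() if not line.startswith(key + "=")]
    kept.append(f"{key}={value}")
    return "\n".join(kept) + "\n"
```

Changing the value in `.env` does not by itself change the password of the database role, so a rotated password also has to be applied in PostgreSQL.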
Step 4: Build and Start Service
- Build Docker image
    docker-compose build node-registry
- Start service
    docker-compose up -d node-registry
- Check container status
    docker-compose ps | grep node-registry
    docker logs dagi-node-registry --tail 50
Step 5: Configure Firewall
- Set UFW rules
    # Allow from local network
    ufw allow from 192.168.1.0/24 to any port 9205 proto tcp comment 'Node Registry - LAN'
    # Allow from Docker network
    ufw allow from 172.16.0.0/12 to any port 9205 proto tcp comment 'Node Registry - Docker'
    # Deny from external
    ufw deny 9205/tcp comment 'Node Registry - Block external'
    # Verify rules
    ufw status | grep 9205
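The order of these rules matters: UFW checks them in sequence and applies the first one that matches, so the two allow rules have to be added before the deny. The sketch below models that first-match behaviour with the three sources from this step; it only illustrates the logic and does not query the real firewall, and `RULES` and `decide` are hypothetical names.

```python
import ipaddress

# Ordered like the UFW rules in this step: (action, source network).
RULES = [
    ("allow", ipaddress.ip_network("192.168.1.0/24")),  # LAN
    ("allow", ipaddress.ip_network("172.16.0.0/12")),   # Docker networks
    ("deny", ipaddress.ip_network("0.0.0.0/0")),        # everything else
]


def decide(source_ip):
    """Return the action of the first rule whose network contains source_ip."""
    address = ipaddress.ip_address(source_ip)
    for action, network in RULES:
        if address in network:
            return action
    return "deny"  # nothing matched (for example an IPv6 source)
```

For example, `decide("172.17.0.2")` yields `"allow"` and `decide("8.8.8.8")` yields `"deny"`. Ports published by Docker are often handled in iptables chains that UFW does not manage, so the external access check later in this checklist remains the authoritative test.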
Step 6: Verify Deployment
- Health check
    curl http://localhost:9205/health
    # Expected: {"status":"healthy",...,"database":{"connected":true,...}}
- Metrics check
    curl http://localhost:9205/metrics
- Check database connectivity
    docker exec dagi-postgres psql -U node_registry_user -d node_registry -c "SELECT COUNT(*) FROM nodes;"
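Reading the health response by eye is easy to get wrong when the body is long. A small sketch of an automated check follows, relying only on the two fields shown in the expected output above (`status` and `database.connected`); `is_healthy` is a hypothetical helper, and any other fields in the response are ignored.

```python
import json


def is_healthy(body):
    """Return True only if the body reports a healthy service and a connected database."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON at all, e.g. an error page from a proxy
    if not isinstance(data, dict):
        return False
    database = data.get("database")
    return (
        data.get("status") == "healthy"
        and isinstance(database, dict)
        and database.get("connected") is True
    )
```

Feeding it the output of `curl http://localhost:9205/health` gives a single pass/fail result that a deploy script can act on.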
Step 7: Register Nodes
- Register Node #1 (Production)
    # Option A: Using bootstrap tool (if installed on Node #1)
    python -m tools.dagi_node_agent.bootstrap \
      --role production-router \
      --labels router,gateway,production \
      --registry-url http://localhost:9205
    # Option B: Manual API call
    curl -X POST http://localhost:9205/api/v1/nodes/register \
      -H "Content-Type: application/json" \
      -d '{
        "hostname": "gateway.daarion.city",
        "ip": "144.76.224.179",
        "role": "production-router",
        "labels": ["router", "gateway", "production"]
      }'
- Register Node #2 (Development) from MacBook
    # From Node #2
    python -m tools.dagi_node_agent.bootstrap \
      --role development-router \
      --labels router,development,mac,gpu \
      --registry-url http://192.168.1.244:9205
- Verify node registration
    # List all nodes
    curl http://localhost:9205/api/v1/nodes
    # Get specific node
    curl http://localhost:9205/api/v1/nodes/node-1-hetzner-gex44
🧪 Post-Deployment Testing
Functional Tests
- Test node listing
    # All nodes
    curl http://144.76.224.179:9205/api/v1/nodes
    # Filter by role
    curl "http://144.76.224.179:9205/api/v1/nodes?role=production-router"
    # Filter by label
    curl "http://144.76.224.179:9205/api/v1/nodes?label=gateway"
    # Filter by status
    curl "http://144.76.224.179:9205/api/v1/nodes?status=online"
- Test heartbeat updates
    curl -X POST http://144.76.224.179:9205/api/v1/nodes/heartbeat \
      -H "Content-Type: application/json" \
      -d '{"node_id": "node-1-hetzner-gex44", "status": "online"}'
    # Verify heartbeat timestamp updated
    curl http://144.76.224.179:9205/api/v1/nodes/node-1-hetzner-gex44 | grep last_heartbeat
- Test role profiles
    curl http://144.76.224.179:9205/api/v1/profiles/production-router
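The listing filters above can be assembled programmatically instead of hand-editing query strings. This is a minimal sketch using the three parameter names from the curl examples (`role`, `label`, `status`); `nodes_url` is a hypothetical helper, and whether the API accepts several filters in one request is an assumption this checklist does not confirm.

```python
from urllib.parse import urlencode


def nodes_url(base_url, role=None, label=None, status=None):
    """Build the node listing URL, adding only the filters that were given."""
    filters = {"role": role, "label": label, "status": status}
    # urlencode also escapes characters that are unsafe in a query string.
    query = urlencode({name: value for name, value in filters.items() if value is not None})
    url = base_url.rstrip("/") + "/api/v1/nodes"
    return url + ("?" + query if query else "")
```

For instance, `nodes_url("http://144.76.224.179:9205", label="gateway")` produces the same URL as the label example above.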
Network Access Tests
- Test from Node #2 (internal network)
    # From MacBook
    curl http://144.76.224.179:9205/health
- Verify external access blocked
    # From external machine (should fail or timeout)
    curl --max-time 5 http://144.76.224.179:9205/health
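Both network checks reduce to one question: can a TCP connection to port 9205 be opened from this machine? The sketch below answers that with the standard library, treating a refused connection and a timeout alike as "not reachable"; `port_reachable` is a hypothetical helper and the 5 second default mirrors `--max-time 5`.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from Node #2, `port_reachable("144.76.224.179", 9205)` is expected to return `True`; run from a machine outside the allowed networks, it is expected to return `False`.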
Integration Tests
- DAGI Router integration (future)
    # Test router can fetch node list
    curl http://dagi-router:9102/api/nodes
- Prometheus scraping (future)
    # Verify metrics endpoint is scrapable
    curl http://144.76.224.179:9205/metrics | grep node_registry
📊 Monitoring Setup
Prometheus Configuration
- Add scrape job to prometheus.yml
    scrape_configs:
      - job_name: 'node-registry'
        static_configs:
          - targets: ['node-registry:9205']
        scrape_interval: 30s
- Reload Prometheus
    docker-compose restart prometheus
Grafana Dashboard
- Create dashboard for Node Registry
  - Panel: Node Registry uptime
  - Panel: Total registered nodes
  - Panel: Active vs offline nodes
  - Panel: Nodes by role
  - Panel: Recent heartbeats
Health Check Alerts
- Configure alerting (optional)
    # prometheus/alerts/node_registry.yml
    groups:
      - name: node_registry
        rules:
          - alert: NodeRegistryDown
            expr: up{job="node-registry"} == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Node Registry is down"
🔄 Operational Tasks
Regular Maintenance
- Weekly: Check node heartbeats
    docker exec dagi-postgres psql -U postgres -d node_registry -c \
      "SELECT node_id, last_heartbeat, status FROM nodes ORDER BY last_heartbeat DESC;"
- Weekly: Clean old heartbeat logs (if needed)
    docker exec dagi-postgres psql -U postgres -d node_registry -c \
      "DELETE FROM heartbeat_log WHERE timestamp < NOW() - INTERVAL '30 days';"
- Monthly: Review registered nodes
    curl http://144.76.224.179:9205/api/v1/nodes | jq '.[] | {node_id, role, status, last_heartbeat}'
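The weekly heartbeat review can be reduced to a list of nodes that need attention. The following sketch assumes the node listing is a JSON array of objects with `node_id` and `last_heartbeat` (as the `jq` filter above implies) and that `last_heartbeat` is an ISO 8601 timestamp, which this checklist does not confirm. `stale_nodes` and the 5 minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_nodes(nodes, now, max_age=timedelta(minutes=5)):
    """Return the node_ids whose last heartbeat is missing or older than max_age."""
    stale = []
    for node in nodes:
        raw = node.get("last_heartbeat")
        if raw is None:
            stale.append(node["node_id"])  # never sent a heartbeat
            continue
        # Older Python versions reject a trailing "Z", so normalise it first.
        seen = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if seen.tzinfo is None:
            seen = seen.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
        if now - seen > max_age:
            stale.append(node["node_id"])
    return stale
```

Passing `datetime.now(timezone.utc)` as `now` together with the parsed output of the listing endpoint gives the nodes to investigate.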
Backup
- Backup node_registry database
    # Create the target directory first; the redirect fails if it is missing
    mkdir -p backups
    docker exec dagi-postgres pg_dump -U postgres node_registry > backups/node_registry_$(date +%Y%m%d).sql
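Daily dumps named this way accumulate indefinitely unless old ones are removed. A minimal retention sketch follows; it relies only on the `node_registry_YYYYMMDD.sql` naming from the command above, and both `backups_to_prune` and the default of keeping 7 dumps are hypothetical choices rather than project policy.

```python
import re

# Matches only the dump files produced by the backup command.
BACKUP_PATTERN = re.compile(r"^node_registry_(\d{8})\.sql$")


def backups_to_prune(filenames, keep=7):
    """Return the dump filenames to delete, keeping the `keep` most recent ones.

    YYYYMMDD stamps sort chronologically as plain strings, so no date
    parsing is needed. Files that do not match the pattern are never returned.
    """
    dated = sorted(
        (name for name in filenames if BACKUP_PATTERN.match(name)),
        reverse=True,  # newest first
    )
    return dated[keep:]
```

The function only selects candidates; deleting them (for example with `os.remove`) is left to the caller so the list can be reviewed first.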
📚 Documentation Updates
- Update INFRASTRUCTURE.md
  - Add Node Registry to services table (Port 9205)
  - Add environment variables section
- Update SYSTEM-INVENTORY.md
  - Add node-registry service to inventory
  - Update total service count (17 → 18)
- Update WARP.md
  - Add Node Registry service restart command
  - Add node registration examples
✅ Final Verification
- Service running on Node #1
- Database initialized with schema
- Firewall configured (internal only)
- Node #1 registered and heartbeat working
- Node #2 registered and heartbeat working
- Health endpoint responding
- Metrics endpoint responding
- API endpoints functional
- Documentation updated
- Monitoring configured
🎉 Deployment Complete!
Node Registry Service is now live and ready for production use.
Next Steps:
- Integrate with DAGI Router for node discovery
- Set up automated heartbeat cron jobs for each node
- Add authentication/authorization
- Implement Prometheus metrics export
- Create Grafana dashboard
Deployed by: [Your Name]
Date: [Deployment Date]
Status: ✅ Production Ready