# โœ… Node Registry Service โ€” Deployment Checklist **Version:** 1.0.0 **Date:** 2025-01-17 **Status:** Ready for Production --- ## ๐Ÿ“‹ Pre-Deployment Checklist ### Local Verification (Node #2) - [ ] **Test service locally** ```bash cd services/node-registry pip install -r requirements.txt export NODE_REGISTRY_ENV=development export NODE_REGISTRY_DB_HOST=localhost export NODE_REGISTRY_DB_NAME=node_registry python -m app.main ``` - [ ] **Verify endpoints** ```bash curl http://localhost:9205/health curl http://localhost:9205/metrics curl http://localhost:9205/docs # Interactive API docs ``` - [ ] **Test node registration** ```bash curl -X POST http://localhost:9205/api/v1/nodes/register \ -H "Content-Type: application/json" \ -d '{"hostname": "test-node", "ip": "192.168.1.100", "role": "test", "labels": ["test"]}' ``` - [ ] **Test heartbeat** ```bash curl -X POST http://localhost:9205/api/v1/nodes/heartbeat \ -H "Content-Type: application/json" \ -d '{"node_id": "node-test", "status": "online"}' ``` - [ ] **List nodes** ```bash curl http://localhost:9205/api/v1/nodes ``` - [ ] **Run tests** (if available) ```bash cd services/node-registry pytest tests/ ``` --- ## ๐Ÿš€ Production Deployment (Node #1) ### Step 1: Push to GitHub - [ ] **Commit changes** ```bash git add services/node-registry/ git add docker-compose.yml git add scripts/deploy-node-registry.sh git add NODE-REGISTRY-*.md git commit -m "feat: Node Registry Service - Full implementation" git push origin main ``` ### Step 2: Pull on Node #1 - [ ] **SSH to Node #1 and pull latest** ```bash ssh root@144.76.224.179 cd /opt/microdao-daarion git pull origin main ``` ### Step 3: Initialize Database - [ ] **Run SQL migration** ```bash # On Node #1 cd /opt/microdao-daarion # Copy SQL file to container docker cp services/node-registry/migrations/001_create_node_registry_tables.sql dagi-postgres:/tmp/ # Execute migration docker exec -i dagi-postgres psql -U postgres < /tmp/001_create_node_registry_tables.sql # Verify tables docker exec dagi-postgres psql -U postgres -d node_registry -c "\dt" ``` - [ ] **Generate secure password** ```bash # Generate password PASSWORD=$(openssl rand -base64 32) # Add to .env echo "NODE_REGISTRY_DB_PASSWORD=$PASSWORD" >> .env # Verify grep NODE_REGISTRY_DB_PASSWORD .env ``` ### Step 4: Build and Start Service - [ ] **Build Docker image** ```bash docker-compose build node-registry ``` - [ ] **Start service** ```bash docker-compose up -d node-registry ``` - [ ] **Check container status** ```bash docker-compose ps | grep node-registry docker logs dagi-node-registry --tail 50 ``` ### Step 5: Configure Firewall - [ ] **Set UFW rules** ```bash # Allow from local network ufw allow from 192.168.1.0/24 to any port 9205 proto tcp comment 'Node Registry - LAN' # Allow from Docker network ufw allow from 172.16.0.0/12 to any port 9205 proto tcp comment 'Node Registry - Docker' # Deny from external ufw deny 9205/tcp comment 'Node Registry - Block external' # Verify rules ufw status | grep 9205 ``` ### Step 6: Verify Deployment - [ ] **Health check** ```bash curl http://localhost:9205/health # Expected: {"status":"healthy",...,"database":{"connected":true,...}} ``` - [ ] **Metrics check** ```bash curl http://localhost:9205/metrics ``` - [ ] **Check database connectivity** ```bash docker exec dagi-postgres psql -U node_registry_user -d node_registry -c "SELECT COUNT(*) FROM nodes;" ``` ### Step 7: Register Nodes - [ ] **Register Node #1 (Production)** ```bash # Option A: Using bootstrap tool (if installed on Node #1) python -m tools.dagi_node_agent.bootstrap \ --role production-router \ --labels router,gateway,production \ --registry-url http://localhost:9205 # Option B: Manual API call curl -X POST http://localhost:9205/api/v1/nodes/register \ -H "Content-Type: application/json" \ -d '{ "hostname": "gateway.daarion.city", "ip": "144.76.224.179", "role": "production-router", "labels": ["router", "gateway", "production"] }' ``` - [ ] **Register Node #2 (Development) from MacBook** ```bash # From Node #2 python -m tools.dagi_node_agent.bootstrap \ --role development-router \ --labels router,development,mac,gpu \ --registry-url http://192.168.1.244:9205 ``` - [ ] **Verify node registration** ```bash # List all nodes curl http://localhost:9205/api/v1/nodes # Get specific node curl http://localhost:9205/api/v1/nodes/node-1-hetzner-gex44 ``` --- ## ๐Ÿงช Post-Deployment Testing ### Functional Tests - [ ] **Test node listing** ```bash # All nodes curl http://144.76.224.179:9205/api/v1/nodes # Filter by role curl "http://144.76.224.179:9205/api/v1/nodes?role=production-router" # Filter by label curl "http://144.76.224.179:9205/api/v1/nodes?label=gateway" # Filter by status curl "http://144.76.224.179:9205/api/v1/nodes?status=online" ``` - [ ] **Test heartbeat updates** ```bash curl -X POST http://144.76.224.179:9205/api/v1/nodes/heartbeat \ -H "Content-Type: application/json" \ -d '{"node_id": "node-1-hetzner-gex44", "status": "online"}' # Verify heartbeat timestamp updated curl http://144.76.224.179:9205/api/v1/nodes/node-1-hetzner-gex44 | grep last_heartbeat ``` - [ ] **Test role profiles** ```bash curl http://144.76.224.179:9205/api/v1/profiles/production-router ``` ### Network Access Tests - [ ] **Test from Node #2 (internal network)** ```bash # From MacBook curl http://144.76.224.179:9205/health ``` - [ ] **Verify external access blocked** ```bash # From external machine (should fail or timeout) curl --max-time 5 http://144.76.224.179:9205/health ``` ### Integration Tests - [ ] **DAGI Router integration** (future) ```bash # Test router can fetch node list curl http://dagi-router:9102/api/nodes ``` - [ ] **Prometheus scraping** (future) ```bash # Verify metrics endpoint is scrapable curl http://144.76.224.179:9205/metrics | grep node_registry ``` --- ## ๐Ÿ“Š Monitoring Setup ### Prometheus Configuration - [ ] **Add scrape job to prometheus.yml** ```yaml scrape_configs: - job_name: 'node-registry' static_configs: - targets: ['node-registry:9205'] scrape_interval: 30s ``` - [ ] **Reload Prometheus** ```bash docker-compose restart prometheus ``` ### Grafana Dashboard - [ ] **Create dashboard for Node Registry** - Panel: Node Registry uptime - Panel: Total registered nodes - Panel: Active vs offline nodes - Panel: Nodes by role - Panel: Recent heartbeats ### Health Check Alerts - [ ] **Configure alerting** (optional) ```yaml # prometheus/alerts/node_registry.yml groups: - name: node_registry rules: - alert: NodeRegistryDown expr: up{job="node-registry"} == 0 for: 5m labels: severity: critical annotations: summary: "Node Registry is down" ``` --- ## ๐Ÿ”„ Operational Tasks ### Regular Maintenance - [ ] **Weekly: Check node heartbeats** ```bash docker exec dagi-postgres psql -U postgres -d node_registry -c \ "SELECT node_id, last_heartbeat, status FROM nodes ORDER BY last_heartbeat DESC;" ``` - [ ] **Weekly: Clean old heartbeat logs** (if needed) ```bash docker exec dagi-postgres psql -U postgres -d node_registry -c \ "DELETE FROM heartbeat_log WHERE timestamp < NOW() - INTERVAL '30 days';" ``` - [ ] **Monthly: Review registered nodes** ```bash curl http://144.76.224.179:9205/api/v1/nodes | jq '.[] | {node_id, role, status, last_heartbeat}' ``` ### Backup - [ ] **Backup node_registry database** ```bash docker exec dagi-postgres pg_dump -U postgres node_registry > backups/node_registry_$(date +%Y%m%d).sql ``` --- ## ๐Ÿ“š Documentation Updates - [ ] **Update INFRASTRUCTURE.md** - Add Node Registry to services table (Port 9205) - Add environment variables section - [ ] **Update SYSTEM-INVENTORY.md** - Add node-registry service to inventory - Update total service count (17 โ†’ 18) - [ ] **Update WARP.md** - Add Node Registry service restart command - Add node registration examples --- ## โœ… Final Verification - [ ] Service running on Node #1 - [ ] Database initialized with schema - [ ] Firewall configured (internal only) - [ ] Node #1 registered and heartbeat working - [ ] Node #2 registered and heartbeat working - [ ] Health endpoint responding - [ ] Metrics endpoint responding - [ ] API endpoints functional - [ ] Documentation updated - [ ] Monitoring configured --- ## ๐ŸŽ‰ Deployment Complete! **Node Registry Service is now live and ready for production use.** ### Next Steps: 1. Integrate with DAGI Router for node discovery 2. Set up automated heartbeat cron jobs for each node 3. Add authentication/authorization 4. Implement Prometheus metrics export 5. Create Grafana dashboard --- **Deployed by:** [Your Name] **Date:** [Deployment Date] **Status:** โœ… Production Ready