Node Registry Service
Version: 0.1.0-stub
Status: 🟡 Stub Implementation (Infrastructure Ready)
Port: 9205 (Internal only)
Central registry for DAGI network nodes (Node #1, Node #2, Node #N).
Overview
Node Registry Service provides:
- Node Registration — Register new nodes in DAGI network
- Heartbeat Tracking — Monitor node health and availability
- Node Discovery — Query available nodes and their capabilities
- Profile Management — Store node profiles (LLM configs, services, capabilities)
Current Implementation
✅ Completed (Infrastructure)
- FastAPI application with /health and /metrics endpoints
- Docker container configuration
- PostgreSQL database schema
- docker-compose integration
- Deployment script for Node #1
🚧 To Be Implemented (by Cursor)
- Full REST API endpoints
- Node registration logic
- Heartbeat mechanism
- Database integration (SQLAlchemy models)
- Prometheus metrics export
- Node discovery algorithms
Quick Start
Local Development
# Install dependencies
cd services/node-registry
pip install -r requirements.txt
# Set environment variables
export NODE_REGISTRY_DB_HOST=localhost
export NODE_REGISTRY_DB_PORT=5432
export NODE_REGISTRY_DB_NAME=node_registry
export NODE_REGISTRY_DB_USER=node_registry_user
export NODE_REGISTRY_DB_PASSWORD=your_password
export NODE_REGISTRY_HTTP_PORT=9205
export NODE_REGISTRY_ENV=development
export NODE_REGISTRY_LOG_LEVEL=debug
# Run service
python -m app.main
The service starts on http://localhost:9205.
Docker (Recommended)
# Build image
docker-compose build node-registry
# Start service
docker-compose up -d node-registry
# Check logs
docker-compose logs -f node-registry
# Check health
curl http://localhost:9205/health
Deploy to Node #1 (Production)
# From Node #2 (MacBook)
./scripts/deploy-node-registry.sh
This will:
- Initialize PostgreSQL database
- Configure environment variables
- Build Docker image
- Start service
- Configure firewall rules (internal access only)
- Verify deployment
API Endpoints
Health & Monitoring
GET /health
Health check endpoint (used by Docker, Prometheus, etc.)
Response:
{
"status": "healthy",
"service": "node-registry",
"version": "0.1.0-stub",
"environment": "production",
"uptime_seconds": 3600.5,
"timestamp": "2025-01-17T14:30:00Z",
"database": {
"connected": true,
"host": "postgres",
"port": 5432,
"database": "node_registry"
}
}
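A deployment check can interpret this response programmatically. The following is a minimal sketch, assuming the response shape shown above; the helper name `is_healthy` is hypothetical and not part of the service.

```python
import json


def is_healthy(payload: dict) -> bool:
    """Return True when the service and its database both report healthy."""
    database = payload.get("database") or {}
    return payload.get("status") == "healthy" and database.get("connected") is True


# Sample body based on the /health response documented above (trimmed).
sample = json.loads("""
{
  "status": "healthy",
  "service": "node-registry",
  "database": {"connected": true, "host": "postgres", "port": 5432}
}
""")

print(is_healthy(sample))
```

Treating a missing `database` block as unhealthy is a design choice of this sketch; a stricter or looser check may suit the real deployment better.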
GET /metrics
Metrics endpoint. The stub currently returns JSON; export in the Prometheus text format is still to be implemented.
Response:
{
"service": "node-registry",
"uptime_seconds": 3600.5,
"total_nodes": 2,
"active_nodes": 1,
"timestamp": "2025-01-17T14:30:00Z"
}
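The response shown above is JSON, whereas Prometheus scrapes a line-based text format. The sketch below shows one way the numeric fields could be rendered as gauges. The function name `to_prometheus_text` and the `node_registry_` metric prefix are assumptions made for illustration, not names taken from the service.

```python
def to_prometheus_text(metrics: dict, prefix: str = "node_registry") -> str:
    """Render the numeric fields of the JSON metrics body as Prometheus gauges."""
    lines = []
    for key, value in metrics.items():
        # Skip non-numeric fields such as "service" and "timestamp".
        # bool is excluded explicitly because it is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = f"{prefix}_{key}"  # hypothetical naming scheme
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "".join(line + "\n" for line in lines)


# Sample body copied from the /metrics response documented above.
sample_metrics = {
    "service": "node-registry",
    "uptime_seconds": 3600.5,
    "total_nodes": 2,
    "active_nodes": 1,
    "timestamp": "2025-01-17T14:30:00Z",
}

print(to_prometheus_text(sample_metrics), end="")
```

All values are exposed as gauges here for simplicity; a real exporter would more likely use a maintained client library and choose metric types per field.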
Node Management (Stub - To Be Implemented)
POST /api/v1/nodes/register
Register a new node
Status: 501 Not Implemented (stub)
POST /api/v1/nodes/{node_id}/heartbeat
Update node heartbeat
Status: 501 Not Implemented (stub)
GET /api/v1/nodes
List all registered nodes
Status: 501 Not Implemented (stub)
GET /api/v1/nodes/{node_id}
Get specific node information
Status: 501 Not Implemented (stub)
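Until these endpoints are implemented, a client can still be prepared against the documented paths. The sketch below only builds a heartbeat request and does not send it. The request body (`status`, `metrics`) is a hypothetical shape, since the real schema is not defined yet, and `build_heartbeat_request` is an illustrative name.

```python
import json
import urllib.request


def build_heartbeat_request(
    base_url: str, node_id: str, metrics: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the heartbeat endpoint."""
    # Hypothetical body: the real request schema is not defined yet.
    body = json.dumps({"status": "online", "metrics": metrics}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/nodes/{node_id}/heartbeat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_heartbeat_request(
    "http://localhost:9205", "node-1-hetzner-gex44", {"cpu_percent": 12.5}
)
print(request.get_method(), request.full_url)
```

Sending this request with `urllib.request.urlopen` against the current stub would raise `urllib.error.HTTPError` with code 501, so callers should expect and handle that until the endpoint exists.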
Database Schema
Tables
nodes
Core node registry
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| node_id | VARCHAR(255) | Unique node identifier (e.g. node-1-hetzner-gex44) |
| node_name | VARCHAR(255) | Human-readable name |
| node_role | VARCHAR(50) | production, development, backup |
| node_type | VARCHAR(50) | router, gateway, worker, etc. |
| ip_address | INET | Public IP |
| local_ip | INET | Local network IP |
| hostname | VARCHAR(255) | DNS hostname |
| status | VARCHAR(50) | online, offline, maintenance, degraded |
| last_heartbeat | TIMESTAMP | Last heartbeat time |
| registered_at | TIMESTAMP | Registration timestamp |
| updated_at | TIMESTAMP | Last update timestamp |
| metadata | JSONB | Additional node metadata |
node_profiles
Node capabilities and configurations
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| profile_name | VARCHAR(255) | Profile identifier |
| profile_type | VARCHAR(50) | llm, service, capability |
| config | JSONB | Profile configuration |
| enabled | BOOLEAN | Profile active status |
| created_at | TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | Last update timestamp |
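Node discovery is not implemented yet, but the two tables above suggest how a capability lookup could work: find enabled profiles of a given type, then keep the nodes that are online. The sketch below uses plain dictionaries in place of database rows. The function name `discover`, the short ids, and the sample rows (including `node-2-macbook`) are hypothetical.

```python
def discover(nodes: list, profiles: list, profile_type: str) -> list:
    """Return node_id strings of online nodes with an enabled profile of the given type."""
    # Primary keys (nodes.id) of nodes that have a matching, enabled profile.
    matching_ids = {
        profile["node_id"]
        for profile in profiles
        if profile["profile_type"] == profile_type and profile["enabled"]
    }
    return [
        node["node_id"]
        for node in nodes
        if node["status"] == "online" and node["id"] in matching_ids
    ]


# Hypothetical rows; "id" stands in for the UUID primary key.
nodes = [
    {"id": "a1", "node_id": "node-1-hetzner-gex44", "status": "online"},
    {"id": "b2", "node_id": "node-2-macbook", "status": "offline"},
]
profiles = [
    {"node_id": "a1", "profile_type": "llm", "enabled": True},
    {"node_id": "b2", "profile_type": "llm", "enabled": True},
    {"node_id": "a1", "profile_type": "service", "enabled": False},
]

print(discover(nodes, profiles, "llm"))
```

In the real service this would presumably be a SQL join between nodes and node_profiles instead of an in-memory filter.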
heartbeat_log
Historical heartbeat data
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| timestamp | TIMESTAMP | Heartbeat timestamp |
| status | VARCHAR(50) | Node status at heartbeat |
| metrics | JSONB | System metrics (CPU, RAM, etc.) |
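The last_heartbeat and status columns imply that a node's availability can be derived from how recently it reported. A minimal sketch of that rule follows. The 90-second timeout is a placeholder, not a value taken from this service, and `derive_status` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder timeout; the real threshold has not been decided.
HEARTBEAT_TIMEOUT = timedelta(seconds=90)


def derive_status(
    last_heartbeat: Optional[datetime],
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> str:
    """Return "online" if the last heartbeat is recent enough, else "offline"."""
    if last_heartbeat is None:
        return "offline"  # the node has never reported
    return "online" if now - last_heartbeat <= timeout else "offline"


now = datetime(2025, 1, 17, 14, 30, 0, tzinfo=timezone.utc)
print(derive_status(now - timedelta(seconds=30), now))
print(derive_status(now - timedelta(minutes=10), now))
```

Passing `now` explicitly keeps the function deterministic and easy to test. The maintenance and degraded statuses would need extra inputs and are left out of this sketch.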
Environment Variables
| Variable | Default | Description |
|---|---|---|
| NODE_REGISTRY_DB_HOST | postgres | PostgreSQL host |
| NODE_REGISTRY_DB_PORT | 5432 | PostgreSQL port |
| NODE_REGISTRY_DB_NAME | node_registry | Database name |
| NODE_REGISTRY_DB_USER | node_registry_user | Database user |
| NODE_REGISTRY_DB_PASSWORD | - | Database password (required) |
| NODE_REGISTRY_HTTP_PORT | 9205 | HTTP server port |
| NODE_REGISTRY_ENV | production | Environment (development/production) |
| NODE_REGISTRY_LOG_LEVEL | info | Log level (debug/info/warning/error) |
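The table above can be mirrored in a small settings loader that applies the defaults and fails fast when the required password is missing. This is a sketch using only the standard library; the `Settings` class and `load_settings` function are hypothetical and may not match how app/main.py actually reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    http_port: int
    env: str
    log_level: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read NODE_REGISTRY_* variables, applying the defaults from the table."""
    env = os.environ if environ is None else environ
    password = env.get("NODE_REGISTRY_DB_PASSWORD")
    if not password:
        # The password has no default, so refuse to start without it.
        raise RuntimeError("NODE_REGISTRY_DB_PASSWORD is required")
    return Settings(
        db_host=env.get("NODE_REGISTRY_DB_HOST", "postgres"),
        db_port=int(env.get("NODE_REGISTRY_DB_PORT", "5432")),
        db_name=env.get("NODE_REGISTRY_DB_NAME", "node_registry"),
        db_user=env.get("NODE_REGISTRY_DB_USER", "node_registry_user"),
        db_password=password,
        http_port=int(env.get("NODE_REGISTRY_HTTP_PORT", "9205")),
        env=env.get("NODE_REGISTRY_ENV", "production"),
        log_level=env.get("NODE_REGISTRY_LOG_LEVEL", "info"),
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"NODE_REGISTRY_DB_PASSWORD": "example-only"})
print(settings.db_host, settings.http_port, settings.env)
```

Accepting an optional mapping makes the loader testable without touching the process environment.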
Security
Network Access
- Port 9205: Internal network only (Node #1, Node #2, DAGI nodes)
- Public Access: Blocked by firewall (UFW rules)
- Authentication: To be implemented (API keys, JWT)
Firewall Rules (Node #1)
# Allow from local network
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp
# Allow from Docker network
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp
# Deny from external
ufw deny 9205/tcp
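To reason about which source addresses these rules admit, the two allowed ranges can be checked with Python's ipaddress module. This sketch only models the allow rules above; it does not query UFW, and `is_allowed` is a hypothetical helper.

```python
import ipaddress

# The two source ranges allowed by the firewall rules above.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.1.0/24"),  # local network
    ipaddress.ip_network("172.16.0.0/12"),   # Docker network
]


def is_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside one of the allowed ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


for candidate in ("192.168.1.42", "172.17.0.3", "192.168.2.1"):
    print(candidate, is_allowed(candidate))
```

Note that 172.16.0.0/12 spans 172.16.0.0 through 172.31.255.255, so it covers more than a single Docker bridge subnet.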
Database Initialization
Manual Setup
# On Node #1
ssh root@144.76.224.179
# Copy SQL script to container
docker cp services/node-registry/migrations/init_node_registry.sql dagi-postgres:/tmp/
# Run initialization (-f reads the copy inside the container; a "<" redirect would read the host's /tmp instead)
docker exec dagi-postgres psql -U postgres -f /tmp/init_node_registry.sql
# Verify
docker exec dagi-postgres psql -U postgres -d node_registry -c "\dt"
Via Deployment Script
The deploy-node-registry.sh script automatically:
- Checks if database exists
- Creates database and user if needed
- Generates secure password
- Saves password to .env
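How the script generates the password is not shown here. As an illustration only, the last two steps could look like the following in Python, using the standard secrets module. The function names and the idea of returning a single .env line are assumptions, not a description of deploy-node-registry.sh.

```python
import secrets


def generate_password(num_bytes: int = 32) -> str:
    """Return a URL-safe random password built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


def env_line(password: str) -> str:
    """Format the password as a line suitable for appending to a .env file."""
    return f"NODE_REGISTRY_DB_PASSWORD={password}"


password = generate_password()
line = env_line(password)
# Print only the variable name so the secret itself does not land in logs.
print(line.split("=", 1)[0])
```

URL-safe output avoids characters that would need quoting in a .env file or a shell.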
Monitoring & Health
Docker Health Check
docker inspect dagi-node-registry | grep -A 5 Health
Prometheus Scraping
Add to prometheus.yml:
scrape_configs:
  - job_name: 'node-registry'
    static_configs:
      - targets: ['node-registry:9205']
    scrape_interval: 30s
Grafana Dashboard
Add panel with query:
up{job="node-registry"}
Development
Testing Locally
# Run with development settings
export NODE_REGISTRY_ENV=development
python -m app.main
# Access interactive API docs
open http://localhost:9205/docs
Adding New Endpoints
- Edit app/main.py
- Add route with @app.get() or @app.post()
- Add Pydantic models for request/response
- Implement database logic (when ready)
- Test via /docs or curl
- Update this README
Troubleshooting
Service won't start
# Check logs
docker logs dagi-node-registry
# Check database connection
docker exec dagi-postgres pg_isready
# Check environment variables
docker exec dagi-node-registry env | grep NODE_REGISTRY
Database connection error
# Verify database exists
docker exec dagi-postgres psql -U postgres -l | grep node_registry
# Verify user exists
docker exec dagi-postgres psql -U postgres -c "\du" | grep node_registry_user
# Test connection
docker exec dagi-postgres psql -U node_registry_user -d node_registry -c "SELECT 1"
Port not accessible
# Check firewall rules
sudo ufw status | grep 9205
# Check if service is listening
netstat -tlnp | grep 9205
# Test from Node #2
curl http://144.76.224.179:9205/health
Next Steps (for Cursor)
- Implement Database Layer
  - SQLAlchemy models for nodes, profiles, heartbeat
  - Database connection pool
  - Migration system (Alembic)
- Implement API Endpoints
  - Node registration with validation
  - Heartbeat updates with metrics
  - Node listing with filters
  - Profile CRUD operations
- Add Authentication
  - API key-based auth
  - JWT tokens for inter-node communication
  - Rate limiting
- Add Monitoring
  - Prometheus metrics export
  - Health check improvements
  - Performance metrics
- Add Tests
  - Unit tests (pytest)
  - Integration tests
  - API endpoint tests
Links
- INFRASTRUCTURE.md — Infrastructure overview
- WARP.md — Main developer guide
- docker-compose.yml — Service configuration
Last Updated: 2025-01-17
Maintained by: Ivan Tytar & DAARION Team
Status: 🟡 Infrastructure Ready — Awaiting Cursor implementation