# Node Registry Service

**Version:** 0.1.0-stub
**Status:** 🟡 Stub Implementation (Infrastructure Ready)
**Port:** 9205 (Internal only)

Central registry for DAGI network nodes (Node #1, Node #2, Node #N).

---

## Overview

Node Registry Service provides:
- **Node Registration:** Register new nodes in the DAGI network
- **Heartbeat Tracking:** Monitor node health and availability
- **Node Discovery:** Query available nodes and their capabilities
- **Profile Management:** Store node profiles (LLM configs, services, capabilities)

---

## Current Implementation

### ✅ Completed (Infrastructure)
- FastAPI application with /health and /metrics endpoints
- Docker container configuration
- PostgreSQL database schema
- docker-compose integration
- Deployment script for Node #1

### 🚧 To Be Implemented (by Cursor)
- Full REST API endpoints
- Node registration logic
- Heartbeat mechanism
- Database integration (SQLAlchemy models)
- Prometheus metrics export
- Node discovery algorithms

---

## Quick Start

### Local Development

```bash
# Install dependencies
cd services/node-registry
pip install -r requirements.txt

# Set environment variables
export NODE_REGISTRY_DB_HOST=localhost
export NODE_REGISTRY_DB_PORT=5432
export NODE_REGISTRY_DB_NAME=node_registry
export NODE_REGISTRY_DB_USER=node_registry_user
export NODE_REGISTRY_DB_PASSWORD=your_password
export NODE_REGISTRY_HTTP_PORT=9205
export NODE_REGISTRY_ENV=development
export NODE_REGISTRY_LOG_LEVEL=debug

# Run service
python -m app.main
```

The service starts on http://localhost:9205.

### Docker (Recommended)

```bash
# Build image
docker-compose build node-registry

# Start service
docker-compose up -d node-registry

# Check logs
docker-compose logs -f node-registry

# Check health
curl http://localhost:9205/health
```

### Deploy to Node #1 (Production)

```bash
# From Node #2 (MacBook)
./scripts/deploy-node-registry.sh
```

This will:
1. Initialize PostgreSQL database
2. Configure environment variables
3. Build Docker image
4. Start service
5. Configure firewall rules (internal access only)
6. Verify deployment

---

## API Endpoints

### Health & Monitoring

#### GET /health
Health check endpoint (used by Docker, Prometheus, etc.).

**Response:**
```json
{
  "status": "healthy",
  "service": "node-registry",
  "version": "0.1.0-stub",
  "environment": "production",
  "uptime_seconds": 3600.5,
  "timestamp": "2025-01-17T14:30:00Z",
  "database": {
    "connected": true,
    "host": "postgres",
    "port": 5432,
    "database": "node_registry"
  }
}
```

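As an illustration of how a payload with this shape could be assembled, here is a minimal sketch. The function name `build_health` and the choice to report `"degraded"` when the database is unreachable are assumptions, not the service's actual behavior; the sketch also omits the `environment` field and the database connection details.

```python
from datetime import datetime, timezone


def build_health(uptime_seconds: float, now: datetime, db_connected: bool) -> dict:
    """Assemble a /health payload shaped like the sample response (hypothetical helper)."""
    return {
        # Assumption: report "degraded" when the database is unreachable.
        "status": "healthy" if db_connected else "degraded",
        "service": "node-registry",
        "version": "0.1.0-stub",
        "uptime_seconds": round(uptime_seconds, 1),
        # UTC time with a trailing "Z", matching the sample timestamp format.
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "database": {"connected": db_connected},
    }


payload = build_health(3600.5, datetime(2025, 1, 17, 14, 30, tzinfo=timezone.utc), True)
print(payload["status"], payload["timestamp"])
```
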
#### GET /metrics
Metrics endpoint intended for Prometheus. The stub currently returns JSON; export in the Prometheus text format is listed under "To Be Implemented".

**Response:**
```json
{
  "service": "node-registry",
  "uptime_seconds": 3600.5,
  "total_nodes": 2,
  "active_nodes": 1,
  "timestamp": "2025-01-17T14:30:00Z"
}
```

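Prometheus scrapes a line-oriented text format rather than JSON. The sketch below shows one way the numeric fields of this payload could be rendered in that format. The `node_registry_` metric prefix and the decision to expose every value as a gauge are assumptions; the service does not define metric names yet.

```python
def to_prometheus_text(metrics: dict) -> str:
    """Render the numeric fields of a metrics dict as Prometheus gauges."""
    lines = []
    for key, value in metrics.items():
        # Skip non-numeric fields such as "service" and "timestamp".
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = f"node_registry_{key}"  # hypothetical metric prefix
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


sample = {
    "service": "node-registry",
    "uptime_seconds": 3600.5,
    "total_nodes": 2,
    "active_nodes": 1,
    "timestamp": "2025-01-17T14:30:00Z",
}
print(to_prometheus_text(sample), end="")
```
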
### Node Management (Stub - To Be Implemented)

#### POST /api/v1/nodes/register
Register a new node

**Status:** 501 Not Implemented (stub)

#### POST /api/v1/nodes/{node_id}/heartbeat
Update node heartbeat

**Status:** 501 Not Implemented (stub)

#### GET /api/v1/nodes
List all registered nodes

**Status:** 501 Not Implemented (stub)

#### GET /api/v1/nodes/{node_id}
Get specific node information

**Status:** 501 Not Implemented (stub)

---

## Database Schema

### Tables

#### `nodes`
Core node registry

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | VARCHAR(255) | Unique node identifier (e.g. node-1-hetzner-gex44) |
| node_name | VARCHAR(255) | Human-readable name |
| node_role | VARCHAR(50) | production, development, backup |
| node_type | VARCHAR(50) | router, gateway, worker, etc. |
| ip_address | INET | Public IP |
| local_ip | INET | Local network IP |
| hostname | VARCHAR(255) | DNS hostname |
| status | VARCHAR(50) | online, offline, maintenance, degraded |
| last_heartbeat | TIMESTAMP | Last heartbeat time |
| registered_at | TIMESTAMP | Registration timestamp |
| updated_at | TIMESTAMP | Last update timestamp |
| metadata | JSONB | Additional node metadata |

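The value constraints in the `nodes` table can be mirrored in application code before a row is written. The following is a minimal sketch using only the standard library; the class name `NodeRecord`, the default status of `"offline"`, and the example IP address are assumptions, and the real service may use Pydantic or SQLAlchemy models instead.

```python
import ipaddress
from dataclasses import dataclass, field

# Allowed values taken from the table descriptions.
NODE_ROLES = {"production", "development", "backup"}
NODE_STATUSES = {"online", "offline", "maintenance", "degraded"}


@dataclass
class NodeRecord:
    node_id: str
    node_name: str
    node_role: str
    node_type: str
    ip_address: str
    status: str = "offline"  # assumption: nodes start offline until a heartbeat arrives
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.node_id or len(self.node_id) > 255:
            raise ValueError("node_id must be 1-255 characters")
        if self.node_role not in NODE_ROLES:
            raise ValueError(f"unknown node_role: {self.node_role!r}")
        if self.status not in NODE_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # Raises ValueError if the address is not valid IPv4 or IPv6.
        ipaddress.ip_address(self.ip_address)


node = NodeRecord(
    node_id="node-1-hetzner-gex44",
    node_name="Node #1",
    node_role="production",
    node_type="router",
    ip_address="192.0.2.10",  # documentation-range address, for illustration only
)
print(node.status)
```
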
#### `node_profiles`
Node capabilities and configurations

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| profile_name | VARCHAR(255) | Profile identifier |
| profile_type | VARCHAR(50) | llm, service, capability |
| config | JSONB | Profile configuration |
| enabled | BOOLEAN | Profile active status |
| created_at | TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | Last update timestamp |

#### `heartbeat_log`
Historical heartbeat data

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| timestamp | TIMESTAMP | Heartbeat timestamp |
| status | VARCHAR(50) | Node status at heartbeat |
| metrics | JSONB | System metrics (CPU, RAM, etc.) |

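The heartbeat mechanism is not implemented yet. One simple approach, sketched here as an assumption rather than the planned design, is to derive a node's status by comparing `last_heartbeat` with the current time. The function name `derive_status` and the 90-second timeout are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: a node is considered offline after 90 s of silence.
HEARTBEAT_TIMEOUT = timedelta(seconds=90)


def derive_status(
    last_heartbeat: Optional[datetime],
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> str:
    """Return "online" if the last heartbeat is recent enough, else "offline"."""
    if last_heartbeat is None:
        # The node registered but never sent a heartbeat.
        return "offline"
    return "online" if now - last_heartbeat <= timeout else "offline"


now = datetime(2025, 1, 17, 14, 30, tzinfo=timezone.utc)
print(derive_status(now - timedelta(seconds=30), now))
print(derive_status(now - timedelta(minutes=5), now))
```
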
---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| NODE_REGISTRY_DB_HOST | postgres | PostgreSQL host |
| NODE_REGISTRY_DB_PORT | 5432 | PostgreSQL port |
| NODE_REGISTRY_DB_NAME | node_registry | Database name |
| NODE_REGISTRY_DB_USER | node_registry_user | Database user |
| NODE_REGISTRY_DB_PASSWORD | - | Database password (required) |
| NODE_REGISTRY_HTTP_PORT | 9205 | HTTP server port |
| NODE_REGISTRY_ENV | production | Environment (development/production) |
| NODE_REGISTRY_LOG_LEVEL | info | Log level (debug/info/warning/error) |

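The defaults in the table can be applied when the configuration is read at startup. This is a minimal sketch, assuming a hypothetical `Settings` dataclass and `load_settings` helper; how `app/main.py` actually loads its configuration is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    http_port: int
    env: str
    log_level: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read NODE_REGISTRY_* variables, falling back to the documented defaults."""
    prefix = "NODE_REGISTRY_"
    password = environ.get(prefix + "DB_PASSWORD")
    if not password:
        # The password has no default and is marked as required.
        raise RuntimeError("NODE_REGISTRY_DB_PASSWORD is required")
    return Settings(
        db_host=environ.get(prefix + "DB_HOST", "postgres"),
        db_port=int(environ.get(prefix + "DB_PORT", "5432")),
        db_name=environ.get(prefix + "DB_NAME", "node_registry"),
        db_user=environ.get(prefix + "DB_USER", "node_registry_user"),
        db_password=password,
        http_port=int(environ.get(prefix + "HTTP_PORT", "9205")),
        env=environ.get(prefix + "ENV", "production"),
        log_level=environ.get(prefix + "LOG_LEVEL", "info"),
    )


settings = load_settings({"NODE_REGISTRY_DB_PASSWORD": "your_password"})
print(settings.db_host, settings.http_port)
```
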
---

## Security

### Network Access
- **Port 9205:** Internal network only (Node #1, Node #2, DAGI nodes)
- **Public Access:** Blocked by firewall (UFW rules)
- **Authentication:** To be implemented (API keys, JWT)

### Firewall Rules (Node #1)
```bash
# Allow from local network
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp

# Allow from Docker network
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp

# Deny from external
ufw deny 9205/tcp
```

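To check which source addresses these rules admit, the two allowed ranges can be tested with Python's `ipaddress` module. The helper name `is_allowed` and the sample addresses are illustrative only; the sketch models the allow-list, not UFW itself.

```python
import ipaddress

# The two source ranges allowed by the rules above.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.1.0/24"),  # local network
    ipaddress.ip_network("172.16.0.0/12"),   # Docker network
]


def is_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside one of the allowed ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)


for ip in ("192.168.1.42", "172.18.0.5", "203.0.113.7"):
    print(ip, is_allowed(ip))
```
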
---

## Database Initialization

### Manual Setup

```bash
# On Node #1
ssh root@144.76.224.179

# Copy SQL script to container
docker cp services/node-registry/migrations/init_node_registry.sql dagi-postgres:/tmp/

# Run initialization (-f reads the copied file inside the container;
# a host-side "<" redirect would read /tmp on the host instead)
docker exec dagi-postgres psql -U postgres -f /tmp/init_node_registry.sql

# Verify
docker exec dagi-postgres psql -U postgres -d node_registry -c "\dt"
```

### Via Deployment Script

The `deploy-node-registry.sh` script automatically:
1. Checks if database exists
2. Creates database and user if needed
3. Generates secure password
4. Saves password to .env

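The password-generation and save steps could look like the following in Python. This is a sketch only: the deployment script is a shell script and its actual method is not documented here, and the helper names `generate_password` and `env_line` are hypothetical.

```python
import secrets


def generate_password(nbytes: int = 32) -> str:
    """Return a URL-safe random password built from nbytes of OS-level randomness."""
    return secrets.token_urlsafe(nbytes)


def env_line(password: str) -> str:
    """Format the password as a line suitable for appending to a .env file."""
    return f"NODE_REGISTRY_DB_PASSWORD={password}"


password = generate_password()
line = env_line(password)
print(line.split("=", 1)[0])
```
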
---

## Monitoring & Health

### Docker Health Check
```bash
docker inspect dagi-node-registry | grep -A 5 Health
```

### Prometheus Scraping
Add to prometheus.yml:
```yaml
scrape_configs:
  - job_name: 'node-registry'
    static_configs:
      - targets: ['node-registry:9205']
    scrape_interval: 30s
```

### Grafana Dashboard
Add panel with query:
```promql
up{job="node-registry"}
```

---

## Development

### Testing Locally

```bash
# Run with development settings
export NODE_REGISTRY_ENV=development
python -m app.main

# Access interactive API docs
open http://localhost:9205/docs
```

### Adding New Endpoints

1. Edit `app/main.py`
2. Add route with `@app.get()` or `@app.post()`
3. Add Pydantic models for request/response
4. Implement database logic (when ready)
5. Test via /docs or curl
6. Update this README

---

## Troubleshooting

### Service won't start
```bash
# Check logs
docker logs dagi-node-registry

# Check database connection
docker exec dagi-postgres pg_isready

# Check environment variables
docker exec dagi-node-registry env | grep NODE_REGISTRY
```

### Database connection error
```bash
# Verify database exists
docker exec dagi-postgres psql -U postgres -l | grep node_registry

# Verify user exists
docker exec dagi-postgres psql -U postgres -c "\du" | grep node_registry_user

# Test connection
docker exec dagi-postgres psql -U node_registry_user -d node_registry -c "SELECT 1"
```

### Port not accessible
```bash
# Check firewall rules
sudo ufw status | grep 9205

# Check if service is listening
netstat -tlnp | grep 9205

# Test from Node #2
curl http://144.76.224.179:9205/health
```

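When `curl` is not available on a node, the same reachability check can be done from Python. A minimal sketch, assuming a hypothetical helper named `port_open`; it only tests that a TCP connection can be opened and does not validate the /health response.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_open("localhost", 9205)` on the node itself separates "service not listening" from "blocked by the firewall": if the local check succeeds but a remote one fails, the firewall rules are the likely cause.
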
---

## Next Steps (for Cursor)

1. **Implement Database Layer**
   - SQLAlchemy models for nodes, profiles, heartbeat
   - Database connection pool
   - Migration system (Alembic)

2. **Implement API Endpoints**
   - Node registration with validation
   - Heartbeat updates with metrics
   - Node listing with filters
   - Profile CRUD operations

3. **Add Authentication**
   - API key-based auth
   - JWT tokens for inter-node communication
   - Rate limiting

4. **Add Monitoring**
   - Prometheus metrics export
   - Health check improvements
   - Performance metrics

5. **Add Tests**
   - Unit tests (pytest)
   - Integration tests
   - API endpoint tests

---

## Links

- [INFRASTRUCTURE.md](../../INFRASTRUCTURE.md): Infrastructure overview
- [WARP.md](../../WARP.md): Main developer guide
- [docker-compose.yml](../../docker-compose.yml): Service configuration

---

**Last Updated:** 2025-01-17
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** 🟡 Infrastructure Ready, awaiting Cursor implementation