feat: add Node Registry, GreenFood, Monitoring, and Utils
404
services/node-registry/README.md
Normal file
@@ -0,0 +1,404 @@
# Node Registry Service

**Version:** 0.1.0-stub
**Status:** 🟡 Stub Implementation (Infrastructure Ready)
**Port:** 9205 (Internal only)

Central registry for DAGI network nodes (Node #1, Node #2, Node #N).

---

## Overview

Node Registry Service provides:
- **Node Registration**: Register new nodes in the DAGI network
- **Heartbeat Tracking**: Monitor node health and availability
- **Node Discovery**: Query available nodes and their capabilities
- **Profile Management**: Store node profiles (LLM configs, services, capabilities)
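
The responsibilities listed above can be pictured with a small in-memory sketch. It is illustrative only: the class and method names (`InMemoryRegistry`, `register`, `heartbeat`, `discover`) and the 90-second heartbeat timeout are assumptions made for this example, not part of the service, which is meant to persist its state in PostgreSQL.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical in-memory view of a registered node."""
    node_id: str
    capabilities: Set[str] = field(default_factory=set)
    last_heartbeat: Optional[datetime] = None


class InMemoryRegistry:
    """Illustrative registry: registration, heartbeat tracking, discovery."""

    def __init__(self, heartbeat_timeout: timedelta = timedelta(seconds=90)):
        # Assumed timeout; the real service does not define one yet.
        self.heartbeat_timeout = heartbeat_timeout
        self._nodes: Dict[str, Node] = {}

    def register(self, node_id: str, capabilities: Set[str]) -> Node:
        node = Node(node_id=node_id, capabilities=set(capabilities))
        self._nodes[node_id] = node
        return node

    def heartbeat(self, node_id: str, now: datetime) -> None:
        # Record the time of the latest heartbeat for a known node.
        self._nodes[node_id].last_heartbeat = now

    def is_online(self, node_id: str, now: datetime) -> bool:
        # A node counts as online only if it sent a heartbeat recently enough.
        last = self._nodes[node_id].last_heartbeat
        return last is not None and now - last <= self.heartbeat_timeout

    def discover(self, capability: str, now: datetime) -> List[str]:
        # Return the IDs of online nodes that advertise the capability.
        return sorted(
            n.node_id
            for n in self._nodes.values()
            if capability in n.capabilities and self.is_online(n.node_id, now)
        )
```

In this sketch a node that has registered but never sent a heartbeat is treated as offline, so it is never returned by `discover`.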

---

## Current Implementation

### ✅ Completed (Infrastructure)
- FastAPI application with `/health` and `/metrics` endpoints
- Docker container configuration
- PostgreSQL database schema
- docker-compose integration
- Deployment script for Node #1

### 🚧 To Be Implemented (by Cursor)
- Full REST API endpoints
- Node registration logic
- Heartbeat mechanism
- Database integration (SQLAlchemy models)
- Prometheus metrics export
- Node discovery algorithms

---

## Quick Start

### Local Development

```bash
# Install dependencies
cd services/node-registry
pip install -r requirements.txt

# Set environment variables
export NODE_REGISTRY_DB_HOST=localhost
export NODE_REGISTRY_DB_PORT=5432
export NODE_REGISTRY_DB_NAME=node_registry
export NODE_REGISTRY_DB_USER=node_registry_user
export NODE_REGISTRY_DB_PASSWORD=your_password
export NODE_REGISTRY_HTTP_PORT=9205
export NODE_REGISTRY_ENV=development
export NODE_REGISTRY_LOG_LEVEL=debug

# Run service
python -m app.main
```

The service starts on http://localhost:9205

### Docker (Recommended)

```bash
# Build image
docker-compose build node-registry

# Start service
docker-compose up -d node-registry

# Check logs
docker-compose logs -f node-registry

# Check health
curl http://localhost:9205/health
```

### Deploy to Node #1 (Production)

```bash
# From Node #2 (MacBook)
./scripts/deploy-node-registry.sh
```

This will:
1. Initialize PostgreSQL database
2. Configure environment variables
3. Build Docker image
4. Start service
5. Configure firewall rules (internal access only)
6. Verify deployment

---

## API Endpoints

### Health & Monitoring

#### GET /health
Health check endpoint (used by Docker, Prometheus, etc.)

**Response:**
```json
{
  "status": "healthy",
  "service": "node-registry",
  "version": "0.1.0-stub",
  "environment": "production",
  "uptime_seconds": 3600.5,
  "timestamp": "2025-01-17T14:30:00Z",
  "database": {
    "connected": true,
    "host": "postgres",
    "port": 5432,
    "database": "node_registry"
  }
}
```
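
As a rough illustration of how a payload of this shape could be assembled, the sketch below computes `uptime_seconds` from a recorded start time and formats `timestamp` as UTC. The function name `build_health_payload`, its parameters, and the `"unhealthy"` status value are assumptions for this example; the actual handler in `app/main.py` may differ.

```python
from datetime import datetime, timezone

SERVICE_NAME = "node-registry"
SERVICE_VERSION = "0.1.0-stub"


def build_health_payload(
    started_at: float,
    now: float,
    db_connected: bool,
    environment: str = "production",
) -> dict:
    """Build a /health-style payload from epoch timestamps (hypothetical helper)."""
    return {
        # "unhealthy" is an assumed value; the README only shows "healthy".
        "status": "healthy" if db_connected else "unhealthy",
        "service": SERVICE_NAME,
        "version": SERVICE_VERSION,
        "environment": environment,
        "uptime_seconds": round(now - started_at, 1),
        "timestamp": datetime.fromtimestamp(now, tz=timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        "database": {"connected": db_connected},
    }
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the helper easy to test.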

#### GET /metrics
Metrics endpoint intended for Prometheus. The stub currently returns JSON; export in the Prometheus text format is listed under To Be Implemented.

**Response:**
```json
{
  "service": "node-registry",
  "uptime_seconds": 3600.5,
  "total_nodes": 2,
  "active_nodes": 1,
  "timestamp": "2025-01-17T14:30:00Z"
}
```
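
Prometheus scrapes a line-based text format rather than JSON, so the numeric fields above would eventually need to be rendered differently. The sketch below shows one way to do that conversion by hand. The `node_registry_` metric prefix and the choice of the `gauge` type for every field are assumptions; a real implementation would more likely use a Prometheus client library.

```python
def to_prometheus_text(metrics: dict, prefix: str = "node_registry") -> str:
    """Render numeric fields of a metrics dict in Prometheus text format."""
    lines = []
    for key, value in metrics.items():
        # Skip non-numeric fields such as "service" and "timestamp".
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = f"{prefix}_{key}"
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


sample = {
    "service": "node-registry",
    "uptime_seconds": 3600.5,
    "total_nodes": 2,
    "active_nodes": 1,
    "timestamp": "2025-01-17T14:30:00Z",
}
print(to_prometheus_text(sample))
```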

### Node Management (Stub - To Be Implemented)

#### POST /api/v1/nodes/register
Register a new node

**Status:** 501 Not Implemented (stub)

#### POST /api/v1/nodes/{node_id}/heartbeat
Update node heartbeat

**Status:** 501 Not Implemented (stub)

#### GET /api/v1/nodes
List all registered nodes

**Status:** 501 Not Implemented (stub)

#### GET /api/v1/nodes/{node_id}
Get specific node information

**Status:** 501 Not Implemented (stub)
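
The stub behaviour of these four routes can be summarised in a few lines. This is a framework-free sketch, not the FastAPI code in `app/main.py`: the `dispatch` function, the response bodies, and the 404 fallback are assumptions made for illustration.

```python
import re
from typing import Dict, Tuple

# Method and path pattern for each stubbed endpoint listed above.
STUB_ROUTES = [
    ("POST", r"/api/v1/nodes/register"),
    ("POST", r"/api/v1/nodes/(?P<node_id>[^/]+)/heartbeat"),
    ("GET", r"/api/v1/nodes"),
    ("GET", r"/api/v1/nodes/(?P<node_id>[^/]+)"),
]


def dispatch(method: str, path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, body) for a request against the stub API."""
    for route_method, pattern in STUB_ROUTES:
        if method == route_method and re.fullmatch(pattern, path):
            # Every known route answers 501 until it is implemented.
            return 501, {"detail": "Not Implemented"}
    # Simplification: anything else is reported as not found.
    return 404, {"detail": "Not Found"}
```

A client written against the stub can treat a 501 response as "endpoint exists but is not ready yet" and retry after the implementation lands.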

---

## Database Schema

### Tables

#### `nodes`
Core node registry

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | VARCHAR(255) | Unique node identifier (e.g. node-1-hetzner-gex44) |
| node_name | VARCHAR(255) | Human-readable name |
| node_role | VARCHAR(50) | production, development, backup |
| node_type | VARCHAR(50) | router, gateway, worker, etc. |
| ip_address | INET | Public IP |
| local_ip | INET | Local network IP |
| hostname | VARCHAR(255) | DNS hostname |
| status | VARCHAR(50) | online, offline, maintenance, degraded |
| last_heartbeat | TIMESTAMP | Last heartbeat time |
| registered_at | TIMESTAMP | Registration timestamp |
| updated_at | TIMESTAMP | Last update timestamp |
| metadata | JSONB | Additional node metadata |
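
Before SQLAlchemy models exist, the shape of a `nodes` row can be sketched with a plain dataclass that enforces the value sets named in the table. The class name `NodeRecord`, the subset of columns shown, and the default status of `offline` are assumptions for this example; `node_type` is left unvalidated because the table lists it as open-ended.

```python
import ipaddress
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

NODE_ROLES = {"production", "development", "backup"}
NODE_STATUSES = {"online", "offline", "maintenance", "degraded"}


@dataclass
class NodeRecord:
    """Hypothetical in-Python mirror of a row in the `nodes` table."""
    node_id: str
    node_name: str
    node_role: str
    node_type: str
    ip_address: Optional[str] = None
    status: str = "offline"  # assumed default, not taken from the schema
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.node_role not in NODE_ROLES:
            raise ValueError(f"unknown node_role: {self.node_role!r}")
        if self.status not in NODE_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.ip_address is not None:
            # Raises ValueError if the address is malformed.
            ipaddress.ip_address(self.ip_address)
```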

#### `node_profiles`
Node capabilities and configurations

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| profile_name | VARCHAR(255) | Profile identifier |
| profile_type | VARCHAR(50) | llm, service, capability |
| config | JSONB | Profile configuration |
| enabled | BOOLEAN | Profile active status |
| created_at | TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | Last update timestamp |

#### `heartbeat_log`
Historical heartbeat data

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| timestamp | TIMESTAMP | Heartbeat timestamp |
| status | VARCHAR(50) | Node status at heartbeat |
| metrics | JSONB | System metrics (CPU, RAM, etc.) |

---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| NODE_REGISTRY_DB_HOST | postgres | PostgreSQL host |
| NODE_REGISTRY_DB_PORT | 5432 | PostgreSQL port |
| NODE_REGISTRY_DB_NAME | node_registry | Database name |
| NODE_REGISTRY_DB_USER | node_registry_user | Database user |
| NODE_REGISTRY_DB_PASSWORD | - | Database password (required) |
| NODE_REGISTRY_HTTP_PORT | 9205 | HTTP server port |
| NODE_REGISTRY_ENV | production | Environment (development/production) |
| NODE_REGISTRY_LOG_LEVEL | info | Log level (debug/info/warning/error) |
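
A minimal sketch of how these variables and defaults could be read at startup follows. The `Settings` class and `load_settings` function are hypothetical names chosen for this example; the defaults are taken from the table above, and the only required variable is the database password.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    http_port: int
    env: str
    log_level: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read NODE_REGISTRY_* variables, applying the documented defaults."""
    password = environ.get("NODE_REGISTRY_DB_PASSWORD")
    if not password:
        # The password has no default and must be provided.
        raise RuntimeError("NODE_REGISTRY_DB_PASSWORD is required")
    return Settings(
        db_host=environ.get("NODE_REGISTRY_DB_HOST", "postgres"),
        db_port=int(environ.get("NODE_REGISTRY_DB_PORT", "5432")),
        db_name=environ.get("NODE_REGISTRY_DB_NAME", "node_registry"),
        db_user=environ.get("NODE_REGISTRY_DB_USER", "node_registry_user"),
        db_password=password,
        http_port=int(environ.get("NODE_REGISTRY_HTTP_PORT", "9205")),
        env=environ.get("NODE_REGISTRY_ENV", "production"),
        log_level=environ.get("NODE_REGISTRY_LOG_LEVEL", "info"),
    )
```

Accepting the environment as a parameter makes it possible to test the loader with a plain dictionary instead of mutating `os.environ`.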

---

## Security

### Network Access
- **Port 9205:** Internal network only (Node #1, Node #2, DAGI nodes)
- **Public Access:** Blocked by firewall (UFW rules)
- **Authentication:** To be implemented (API keys, JWT)

### Firewall Rules (Node #1)
```bash
# Allow from local network
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp

# Allow from Docker network
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp

# Deny from external
ufw deny 9205/tcp
```
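
UFW checks rules in the order they were added and stops at the first match, which is why the two `allow` rules come before the blanket `deny`. The net effect can be sketched in Python with the standard `ipaddress` module. The helper name `is_allowed` is hypothetical, and the sketch models only the source-address check for port 9205, not the firewall itself.

```python
import ipaddress

# Source ranges permitted by the two `ufw allow` rules above.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.1.0/24"),  # local network
    ipaddress.ip_network("172.16.0.0/12"),   # Docker network range
]


def is_allowed(source_ip: str) -> bool:
    """Return True if the source address falls in an allowed range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Note that `172.16.0.0/12` spans `172.16.0.0` through `172.31.255.255`, so an address such as `172.32.0.1` is outside it.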

---

## Database Initialization

### Manual Setup

```bash
# On Node #1
ssh root@144.76.224.179

# Copy SQL script to container
docker cp services/node-registry/migrations/init_node_registry.sql dagi-postgres:/tmp/

# Run initialization (-f reads the copied file from inside the container;
# a shell redirect with < would read /tmp on the host instead)
docker exec dagi-postgres psql -U postgres -f /tmp/init_node_registry.sql

# Verify
docker exec dagi-postgres psql -U postgres -d node_registry -c "\dt"
```

### Via Deployment Script

The `deploy-node-registry.sh` script automatically:
1. Checks if database exists
2. Creates database and user if needed
3. Generates secure password
4. Saves password to .env
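
Steps 3 and 4 can be sketched in Python as follows. This is not a transcription of `deploy-node-registry.sh`, whose contents are not shown here: the function names, the 24-byte token size, and the `0600` file mode are assumptions chosen for illustration.

```python
import secrets
from pathlib import Path


def generate_password(nbytes: int = 24) -> str:
    """Return a URL-safe random password from the OS entropy source."""
    return secrets.token_urlsafe(nbytes)


def save_password(
    env_path: Path,
    password: str,
    key: str = "NODE_REGISTRY_DB_PASSWORD",
) -> None:
    """Write KEY=password to a .env file, replacing any earlier value."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    # Drop a previous entry for the same key so it is not duplicated.
    lines = [line for line in lines if not line.startswith(f"{key}=")]
    lines.append(f"{key}={password}")
    env_path.write_text("\n".join(lines) + "\n")
    # Restrict the file to its owner, since it now holds a secret.
    env_path.chmod(0o600)
```

Using the `secrets` module rather than `random` matters here, because `random` is not designed for security-sensitive values.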

---

## Monitoring & Health

### Docker Health Check
```bash
docker inspect dagi-node-registry | grep -A 5 Health
```

### Prometheus Scraping
Add to prometheus.yml:
```yaml
scrape_configs:
  - job_name: 'node-registry'
    static_configs:
      - targets: ['node-registry:9205']
    scrape_interval: 30s
```

### Grafana Dashboard
Add panel with query:
```promql
up{job="node-registry"}
```

---

## Development

### Testing Locally

```bash
# Run with development settings
export NODE_REGISTRY_ENV=development
python -m app.main

# Access interactive API docs
open http://localhost:9205/docs
```

### Adding New Endpoints

1. Edit `app/main.py`
2. Add route with `@app.get()` or `@app.post()`
3. Add Pydantic models for request/response
4. Implement database logic (when ready)
5. Test via /docs or curl
6. Update this README

---

## Troubleshooting

### Service won't start
```bash
# Check logs
docker logs dagi-node-registry

# Check database connection
docker exec dagi-postgres pg_isready

# Check environment variables
docker exec dagi-node-registry env | grep NODE_REGISTRY
```

### Database connection error
```bash
# Verify database exists
docker exec dagi-postgres psql -U postgres -l | grep node_registry

# Verify user exists
docker exec dagi-postgres psql -U postgres -c "\du" | grep node_registry_user

# Test connection
docker exec dagi-postgres psql -U node_registry_user -d node_registry -c "SELECT 1"
```

### Port not accessible
```bash
# Check firewall rules
sudo ufw status | grep 9205

# Check if service is listening
netstat -tlnp | grep 9205

# Test from Node #2
curl http://144.76.224.179:9205/health
```
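
When `curl` is not available on the machine doing the check, the same reachability test can be done with a plain TCP connection attempt. The sketch below uses only the standard library; the function name `port_is_open` and the 3-second timeout are assumptions for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("144.76.224.179", 9205)` run from Node #2 distinguishes a blocked or closed port from an HTTP-level failure, since it succeeds as soon as the TCP handshake completes.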

---

## Next Steps (for Cursor)

1. **Implement Database Layer**
   - SQLAlchemy models for nodes, profiles, heartbeat
   - Database connection pool
   - Migration system (Alembic)

2. **Implement API Endpoints**
   - Node registration with validation
   - Heartbeat updates with metrics
   - Node listing with filters
   - Profile CRUD operations

3. **Add Authentication**
   - API key-based auth
   - JWT tokens for inter-node communication
   - Rate limiting

4. **Add Monitoring**
   - Prometheus metrics export
   - Health check improvements
   - Performance metrics

5. **Add Tests**
   - Unit tests (pytest)
   - Integration tests
   - API endpoint tests

---

## Links

- [INFRASTRUCTURE.md](../../INFRASTRUCTURE.md): Infrastructure overview
- [WARP.md](../../WARP.md): Main developer guide
- [docker-compose.yml](../../docker-compose.yml): Service configuration

---

**Last Updated:** 2025-01-17
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** 🟡 Infrastructure Ready, awaiting Cursor implementation