feat: added Node Registry, GreenFood, Monitoring, and Utils

Apple
2025-11-21 00:35:41 -08:00
parent 31f3602047
commit e018b9ab68
74 changed files with 13948 additions and 0 deletions


@@ -0,0 +1,404 @@
# Node Registry Service
**Version:** 0.1.0-stub
**Status:** 🟡 Stub Implementation (Infrastructure Ready)
**Port:** 9205 (Internal only)
Central registry for DAGI network nodes (Node #1, Node #2, Node #N).
---
## Overview
Node Registry Service provides:
- **Node Registration** — Register new nodes in DAGI network
- **Heartbeat Tracking** — Monitor node health and availability
- **Node Discovery** — Query available nodes and their capabilities
- **Profile Management** — Store node profiles (LLM configs, services, capabilities)
---
## Current Implementation
### ✅ Completed (Infrastructure)
- FastAPI application with /health and /metrics endpoints
- Docker container configuration
- PostgreSQL database schema
- docker-compose integration
- Deployment script for Node #1
### 🚧 To Be Implemented (by Cursor)
- Full REST API endpoints
- Node registration logic
- Heartbeat mechanism
- Database integration (SQLAlchemy models)
- Prometheus metrics export
- Node discovery algorithms
---
## Quick Start
### Local Development
```bash
# Install dependencies
cd services/node-registry
pip install -r requirements.txt
# Set environment variables
export NODE_REGISTRY_DB_HOST=localhost
export NODE_REGISTRY_DB_PORT=5432
export NODE_REGISTRY_DB_NAME=node_registry
export NODE_REGISTRY_DB_USER=node_registry_user
export NODE_REGISTRY_DB_PASSWORD=your_password
export NODE_REGISTRY_HTTP_PORT=9205
export NODE_REGISTRY_ENV=development
export NODE_REGISTRY_LOG_LEVEL=debug
# Run service
python -m app.main
```
The service starts on http://localhost:9205.
### Docker (Recommended)
```bash
# Build image
docker-compose build node-registry
# Start service
docker-compose up -d node-registry
# Check logs
docker-compose logs -f node-registry
# Check health
curl http://localhost:9205/health
```
### Deploy to Node #1 (Production)
```bash
# From Node #2 (MacBook)
./scripts/deploy-node-registry.sh
```
The script will:
1. Initialize PostgreSQL database
2. Configure environment variables
3. Build Docker image
4. Start service
5. Configure firewall rules (internal access only)
6. Verify deployment
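
The final verification step can be approximated by polling `/health` until it reports `"status": "healthy"`. The sketch below is an assumption about how such a check might look, not the contents of `deploy-node-registry.sh`; the function names `fetch_health` and `wait_until_healthy`, the retry count, and the delay are all hypothetical. A call such as `wait_until_healthy(lambda: fetch_health("http://localhost:9205/health"))` would return `True` once the service answers.

```python
import json
import time
import urllib.request


def fetch_health(url: str, timeout: float = 3.0) -> dict:
    """Fetch and decode the JSON body returned by the /health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(fetch, attempts: int = 10, delay: float = 2.0) -> bool:
    """Call `fetch` until it reports status "healthy" or attempts run out.

    `fetch` is any zero-argument callable returning the decoded health payload.
    """
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or invalid JSON: not ready yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```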
---
## API Endpoints
### Health & Monitoring
#### GET /health
Health check endpoint (used by Docker, Prometheus, etc.)
**Response:**
```json
{
  "status": "healthy",
  "service": "node-registry",
  "version": "0.1.0-stub",
  "environment": "production",
  "uptime_seconds": 3600.5,
  "timestamp": "2025-01-17T14:30:00Z",
  "database": {
    "connected": true,
    "host": "postgres",
    "port": 5432,
    "database": "node_registry"
  }
}
```
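
As a rough illustration of how a payload of this shape could be assembled, the sketch below derives `uptime_seconds` and `timestamp` from two timezone-aware datetimes. The field names come from the response above; the function name `build_health_payload` and the `"unhealthy"` status for a disconnected database are assumptions, since the stub's actual logic is not shown here.

```python
from datetime import datetime, timezone

SERVICE_NAME = "node-registry"
VERSION = "0.1.0-stub"


def build_health_payload(started_at: datetime, now: datetime, environment: str,
                         db_connected: bool, db_host: str = "postgres",
                         db_port: int = 5432,
                         db_name: str = "node_registry") -> dict:
    """Build a dict shaped like the /health response documented above."""
    return {
        # Assumption: anything other than a connected database is "unhealthy".
        "status": "healthy" if db_connected else "unhealthy",
        "service": SERVICE_NAME,
        "version": VERSION,
        "environment": environment,
        "uptime_seconds": (now - started_at).total_seconds(),
        # ISO 8601 in UTC with a trailing "Z", matching the sample response.
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "database": {
            "connected": db_connected,
            "host": db_host,
            "port": db_port,
            "database": db_name,
        },
    }
```

Passing `now` explicitly instead of reading the clock inside the function keeps the sketch easy to test.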
#### GET /metrics
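Metrics endpoint. The stub response shown here is a JSON summary; Prometheus text-format export is still listed under "To Be Implemented", so Prometheus cannot parse this response yet.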
**Response:**
```json
{
  "service": "node-registry",
  "uptime_seconds": 3600.5,
  "total_nodes": 2,
  "active_nodes": 1,
  "timestamp": "2025-01-17T14:30:00Z"
}
```
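
For reference, the numeric fields above could be rendered in the Prometheus text exposition format along the following lines. This is a sketch only: the metric names (`node_registry_total_nodes` and so on), the help strings, and the function `render_prometheus` are hypothetical and not taken from the service.

```python
# Hypothetical help texts for the numeric fields of the JSON response.
GAUGES = {
    "uptime_seconds": "Seconds since the service started.",
    "total_nodes": "Number of registered nodes.",
    "active_nodes": "Number of nodes currently considered active.",
}


def render_prometheus(metrics: dict, prefix: str = "node_registry") -> str:
    """Render selected numeric fields as Prometheus text-format gauges."""
    lines = []
    for key, help_text in GAUGES.items():
        name = f"{prefix}_{key}"
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(metrics[key])}")
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"
```

In a real service a client library would normally generate this output; the hand-rolled version only shows what the format looks like.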
### Node Management (Stub - To Be Implemented)
#### POST /api/v1/nodes/register
Register a new node
**Status:** 501 Not Implemented (stub)
#### POST /api/v1/nodes/{node_id}/heartbeat
Update node heartbeat
**Status:** 501 Not Implemented (stub)
#### GET /api/v1/nodes
List all registered nodes
**Status:** 501 Not Implemented (stub)
#### GET /api/v1/nodes/{node_id}
Get specific node information
**Status:** 501 Not Implemented (stub)
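
Until these endpoints are implemented, their intended behavior can be prototyped against an in-memory store. The sketch below is a hypothetical stand-in, not the planned implementation: the class names `Node` and `InMemoryRegistry`, the method signatures, and the choice to set `status` to `online` on each heartbeat are all assumptions. Each method is annotated with the endpoint it roughly corresponds to.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Node:
    """Subset of the `nodes` columns; the field selection is illustrative."""
    node_id: str
    node_name: str
    node_role: str
    status: str = "online"
    last_heartbeat: Optional[datetime] = None
    metadata: dict = field(default_factory=dict)


class InMemoryRegistry:
    """Hypothetical stand-in for the future database-backed registry."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def register(self, node_id: str, node_name: str, node_role: str,
                 now: Optional[datetime] = None) -> Node:
        # POST /api/v1/nodes/register: reject duplicate node identifiers.
        if node_id in self._nodes:
            raise ValueError(f"node already registered: {node_id}")
        node = Node(node_id, node_name, node_role,
                    last_heartbeat=now or datetime.now(timezone.utc))
        self._nodes[node_id] = node
        return node

    def heartbeat(self, node_id: str, now: Optional[datetime] = None) -> Node:
        # POST /api/v1/nodes/{node_id}/heartbeat: refresh the timestamp.
        node = self._nodes[node_id]  # raises KeyError for unknown nodes
        node.last_heartbeat = now or datetime.now(timezone.utc)
        node.status = "online"
        return node

    def list_nodes(self, status: Optional[str] = None) -> List[Node]:
        # GET /api/v1/nodes: optionally filter by status.
        return [n for n in self._nodes.values()
                if status is None or n.status == status]

    def get(self, node_id: str) -> Optional[Node]:
        # GET /api/v1/nodes/{node_id}: None when the node is unknown.
        return self._nodes.get(node_id)
```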
---
## Database Schema
### Tables
#### `nodes`
Core node registry
| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | VARCHAR(255) | Unique node identifier (e.g. node-1-hetzner-gex44) |
| node_name | VARCHAR(255) | Human-readable name |
| node_role | VARCHAR(50) | production, development, backup |
| node_type | VARCHAR(50) | router, gateway, worker, etc. |
| ip_address | INET | Public IP |
| local_ip | INET | Local network IP |
| hostname | VARCHAR(255) | DNS hostname |
| status | VARCHAR(50) | online, offline, maintenance, degraded |
| last_heartbeat | TIMESTAMP | Last heartbeat time |
| registered_at | TIMESTAMP | Registration timestamp |
| updated_at | TIMESTAMP | Last update timestamp |
| metadata | JSONB | Additional node metadata |
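
One way the `status` column might be derived from `last_heartbeat` is by comparing the heartbeat age against two thresholds. This is a sketch under stated assumptions: the README does not define when a node becomes `degraded` or `offline`, so the 60-second and 180-second limits, the meaning given to `degraded`, and the function name `derive_status` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def derive_status(last_heartbeat: Optional[datetime], now: datetime,
                  degraded_after: timedelta = timedelta(seconds=60),
                  offline_after: timedelta = timedelta(seconds=180),
                  maintenance: bool = False) -> str:
    """Map heartbeat age to one of the documented status values."""
    if maintenance:
        # Assumption: maintenance is set manually and overrides heartbeats.
        return "maintenance"
    if last_heartbeat is None:
        return "offline"
    age = now - last_heartbeat
    if age >= offline_after:
        return "offline"
    if age >= degraded_after:
        return "degraded"
    return "online"
```

A periodic job could apply this function to every row and write the result back to `status`.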
#### `node_profiles`
Node capabilities and configurations
| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| profile_name | VARCHAR(255) | Profile identifier |
| profile_type | VARCHAR(50) | llm, service, capability |
| config | JSONB | Profile configuration |
| enabled | BOOLEAN | Profile active status |
| created_at | TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | Last update timestamp |
#### `heartbeat_log`
Historical heartbeat data
| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| node_id | UUID | Foreign key to nodes.id |
| timestamp | TIMESTAMP | Heartbeat timestamp |
| status | VARCHAR(50) | Node status at heartbeat |
| metrics | JSONB | System metrics (CPU, RAM, etc.) |
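
The relationship between `nodes` and `heartbeat_log` can be tried out with an in-memory SQLite database. This is only a stand-in for PostgreSQL: `UUID`, `TIMESTAMP`, and `JSONB` are approximated with `TEXT`, only a few columns are included, and the helper names (`open_demo_db`, `add_node`, `record_heartbeat`) are hypothetical. Note that `heartbeat_log.node_id` references the UUID primary key `nodes.id`, not the human-readable `nodes.node_id`.

```python
import json
import sqlite3
import uuid

# Reduced schema; SQLite TEXT replaces PostgreSQL UUID, TIMESTAMP, and JSONB.
SCHEMA = """
CREATE TABLE nodes (
    id TEXT PRIMARY KEY,
    node_id TEXT NOT NULL UNIQUE,
    node_name TEXT NOT NULL,
    status TEXT NOT NULL,
    last_heartbeat TEXT
);
CREATE TABLE heartbeat_log (
    id TEXT PRIMARY KEY,
    node_id TEXT NOT NULL REFERENCES nodes(id),
    timestamp TEXT NOT NULL,
    status TEXT NOT NULL,
    metrics TEXT
);
"""


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def add_node(conn: sqlite3.Connection, node_id: str, node_name: str) -> str:
    """Insert a node and return its generated primary key."""
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO nodes (id, node_id, node_name, status) VALUES (?, ?, ?, 'offline')",
        (row_id, node_id, node_name),
    )
    return row_id


def record_heartbeat(conn: sqlite3.Connection, node_row_id: str,
                     timestamp: str, status: str, metrics: dict) -> None:
    """Append to heartbeat_log, then update the node's current state."""
    conn.execute(
        "INSERT INTO heartbeat_log (id, node_id, timestamp, status, metrics) "
        "VALUES (?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), node_row_id, timestamp, status, json.dumps(metrics)),
    )
    conn.execute(
        "UPDATE nodes SET status = ?, last_heartbeat = ? WHERE id = ?",
        (status, timestamp, node_row_id),
    )
```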
---
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| NODE_REGISTRY_DB_HOST | postgres | PostgreSQL host |
| NODE_REGISTRY_DB_PORT | 5432 | PostgreSQL port |
| NODE_REGISTRY_DB_NAME | node_registry | Database name |
| NODE_REGISTRY_DB_USER | node_registry_user | Database user |
| NODE_REGISTRY_DB_PASSWORD | - | Database password (required) |
| NODE_REGISTRY_HTTP_PORT | 9205 | HTTP server port |
| NODE_REGISTRY_ENV | production | Environment (development/production) |
| NODE_REGISTRY_LOG_LEVEL | info | Log level (debug/info/warning/error) |
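
A loader for these variables might look like the sketch below, which applies the defaults from the table and fails fast when the required password is missing. The `Settings` class and `load_settings` function are hypothetical names; how the service actually reads its configuration is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    http_port: int
    env: str
    log_level: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read NODE_REGISTRY_* variables, using the documented defaults."""
    env = os.environ if environ is None else environ
    password = env.get("NODE_REGISTRY_DB_PASSWORD")
    if not password:
        # The table marks the password as required, so refuse to start.
        raise RuntimeError("NODE_REGISTRY_DB_PASSWORD is required")
    return Settings(
        db_host=env.get("NODE_REGISTRY_DB_HOST", "postgres"),
        db_port=int(env.get("NODE_REGISTRY_DB_PORT", "5432")),
        db_name=env.get("NODE_REGISTRY_DB_NAME", "node_registry"),
        db_user=env.get("NODE_REGISTRY_DB_USER", "node_registry_user"),
        db_password=password,
        http_port=int(env.get("NODE_REGISTRY_HTTP_PORT", "9205")),
        env=env.get("NODE_REGISTRY_ENV", "production"),
        log_level=env.get("NODE_REGISTRY_LOG_LEVEL", "info"),
    )
```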
---
## Security
### Network Access
- **Port 9205:** Internal network only (Node #1, Node #2, DAGI nodes)
- **Public Access:** Blocked by firewall (UFW rules)
- **Authentication:** To be implemented (API keys, JWT)
### Firewall Rules (Node #1)
```bash
# Allow from local network
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp
# Allow from Docker network
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp
# Deny from external
ufw deny 9205/tcp
```
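
To reason about which clients these rules admit, the two allowed ranges can be checked with Python's `ipaddress` module. The sketch only mirrors the CIDR ranges listed above; it does not read or modify UFW, and the names `ALLOWED_NETWORKS` and `is_allowed` are hypothetical.

```python
import ipaddress

# The two source ranges permitted by the UFW rules above.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.1.0/24"),  # local network
    ipaddress.ip_network("172.16.0.0/12"),   # Docker network range
]


def is_allowed(client_ip: str) -> bool:
    """Return True when client_ip falls inside one of the allowed ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```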
---
## Database Initialization
### Manual Setup
```bash
# On Node #1
ssh root@144.76.224.179
# Copy SQL script to container
docker cp services/node-registry/migrations/init_node_registry.sql dagi-postgres:/tmp/
# Run initialization
docker exec dagi-postgres psql -U postgres -f /tmp/init_node_registry.sql
# Verify
docker exec dagi-postgres psql -U postgres -d node_registry -c "\dt"
```
### Via Deployment Script
The `deploy-node-registry.sh` script automatically:
1. Checks if database exists
2. Creates database and user if needed
3. Generates secure password
4. Saves password to .env
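
Steps 3 and 4 could be reproduced in Python roughly as follows. This is an illustrative sketch, not the logic of `deploy-node-registry.sh`, which is not shown here: the helper names `generate_password` and `upsert_env_var` and the 32-byte token size are assumptions. A call such as `upsert_env_var(Path(".env"), "NODE_REGISTRY_DB_PASSWORD", generate_password())` would replace any existing entry for that key.

```python
import secrets
from pathlib import Path


def generate_password(nbytes: int = 32) -> str:
    """Return a URL-safe random token from the OS entropy source."""
    return secrets.token_urlsafe(nbytes)


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set KEY=value in a .env file, replacing an existing entry for KEY."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    kept = [line for line in lines if not line.startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    env_path.write_text("\n".join(kept) + "\n")
```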
---
## Monitoring & Health
### Docker Health Check
```bash
docker inspect dagi-node-registry | grep -A 5 Health
```
### Prometheus Scraping
Add to prometheus.yml:
```yaml
scrape_configs:
  - job_name: 'node-registry'
    static_configs:
      - targets: ['node-registry:9205']
    scrape_interval: 30s
```
### Grafana Dashboard
Add panel with query:
```promql
up{job="node-registry"}
```
---
## Development
### Testing Locally
```bash
# Run with development settings
export NODE_REGISTRY_ENV=development
python -m app.main
# Access interactive API docs
open http://localhost:9205/docs
```
### Adding New Endpoints
1. Edit `app/main.py`
2. Add route with `@app.get()` or `@app.post()`
3. Add Pydantic models for request/response
4. Implement database logic (when ready)
5. Test via /docs or curl
6. Update this README
---
## Troubleshooting
### Service won't start
```bash
# Check logs
docker logs dagi-node-registry
# Check database connection
docker exec dagi-postgres pg_isready
# Check environment variables
docker exec dagi-node-registry env | grep NODE_REGISTRY
```
### Database connection error
```bash
# Verify database exists
docker exec dagi-postgres psql -U postgres -l | grep node_registry
# Verify user exists
docker exec dagi-postgres psql -U postgres -c "\du" | grep node_registry_user
# Test connection
docker exec dagi-postgres psql -U node_registry_user -d node_registry -c "SELECT 1"
```
### Port not accessible
```bash
# Check firewall rules
sudo ufw status | grep 9205
# Check if service is listening
netstat -tlnp | grep 9205
# Test from Node #2 (works only from an allowed network; public access is blocked)
curl http://144.76.224.179:9205/health
```
---
## Next Steps (for Cursor)
1. **Implement Database Layer**
   - SQLAlchemy models for nodes, profiles, heartbeat
   - Database connection pool
   - Migration system (Alembic)
2. **Implement API Endpoints**
   - Node registration with validation
   - Heartbeat updates with metrics
   - Node listing with filters
   - Profile CRUD operations
3. **Add Authentication**
   - API key-based auth
   - JWT tokens for inter-node communication
   - Rate limiting
4. **Add Monitoring**
   - Prometheus metrics export
   - Health check improvements
   - Performance metrics
5. **Add Tests**
   - Unit tests (pytest)
   - Integration tests
   - API endpoint tests
---
## Links
- [INFRASTRUCTURE.md](../../INFRASTRUCTURE.md) — Infrastructure overview
- [WARP.md](../../WARP.md) — Main developer guide
- [docker-compose.yml](../../docker-compose.yml) — Service configuration
---
**Last Updated:** 2025-01-17
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** 🟡 Infrastructure Ready — Awaiting Cursor implementation