feat: додано Node Registry, GreenFood, Monitoring та Utils
This commit is contained in:
488
NODE-REGISTRY-STATUS.md
Normal file
488
NODE-REGISTRY-STATUS.md
Normal file
@@ -0,0 +1,488 @@
|
||||
# 🔧 Node Registry Service — Status & Deployment
|
||||
|
||||
**Версія:** 1.0.0
|
||||
**Дата створення:** 2025-01-17
|
||||
**Останнє оновлення:** 2025-01-17
|
||||
**Статус:** ✅ Complete + Integrated — Full Stack Implementation Ready for Production
|
||||
|
||||
---
|
||||
|
||||
## 📋 Overview
|
||||
|
||||
Node Registry Service — централізований реєстр для всіх нод DAGI мережі (Node #1, Node #2, майбутні Node #N).
|
||||
|
||||
### Призначення
|
||||
- **Реєстрація нод** — автоматична/ручна реєстрація нових нод
|
||||
- **Heartbeat tracking** — моніторинг доступності та здоров'я нод
|
||||
- **Node discovery** — пошук доступних нод та їх можливостей
|
||||
- **Profile management** — збереження профілів нод (LLM configs, services, capabilities)
|
||||
|
||||
---
|
||||
|
||||
## ✅ Що готово (Infrastructure by Warp)
|
||||
|
||||
### 1. Service Structure
|
||||
```
|
||||
services/node-registry/
|
||||
├── app/
|
||||
│ └── main.py # FastAPI stub application
|
||||
├── migrations/
|
||||
│ └── init_node_registry.sql # Database schema
|
||||
├── Dockerfile # Docker image configuration
|
||||
├── requirements.txt # Python dependencies
|
||||
└── README.md # Full service documentation
|
||||
```
|
||||
|
||||
### 2. FastAPI Application (`app/main.py`)
|
||||
- ✅ Health endpoint: `GET /health`
|
||||
- ✅ Metrics endpoint: `GET /metrics`
|
||||
- ✅ Root endpoint: `GET /`
|
||||
- 🚧 Stub API endpoints (501 Not Implemented):
|
||||
- `POST /api/v1/nodes/register`
|
||||
- `POST /api/v1/nodes/{node_id}/heartbeat`
|
||||
- `GET /api/v1/nodes`
|
||||
- `GET /api/v1/nodes/{node_id}`
|
||||
|
||||
### 3. PostgreSQL Database
|
||||
- ✅ Database: `node_registry`
|
||||
- ✅ User: `node_registry_user`
|
||||
- ✅ Tables created:
|
||||
- `nodes` — Core node registry
|
||||
- `node_profiles` — Node capabilities/configurations
|
||||
- `heartbeat_log` — Historical heartbeat data
|
||||
- ✅ Initial data: Node #1 and Node #2 pre-registered
|
||||
|
||||
### 4. Docker Configuration
|
||||
- ✅ Dockerfile with Python 3.11-slim
|
||||
- ✅ Health check configured
|
||||
- ✅ Non-root user (noderegistry)
|
||||
- ✅ Added to `docker-compose.yml` with dependencies
|
||||
|
||||
### 5. Deployment Script
|
||||
- ✅ `scripts/deploy-node-registry.sh`
|
||||
- SSH connection check
|
||||
- Database initialization
|
||||
- Secure password generation
|
||||
- Docker image build
|
||||
- Service start
|
||||
- Firewall configuration
|
||||
- Deployment verification
|
||||
|
||||
---
|
||||
|
||||
## 🔌 Service Configuration
|
||||
|
||||
### Port & Access
|
||||
- **Port:** 9205 (Internal only)
|
||||
- **Access:** Node #1, Node #2, DAGI nodes (LAN/VPN)
|
||||
- **Public access:** ❌ Blocked by firewall
|
||||
|
||||
### Environment Variables
|
||||
```bash
|
||||
NODE_REGISTRY_DB_HOST=postgres
|
||||
NODE_REGISTRY_DB_PORT=5432
|
||||
NODE_REGISTRY_DB_NAME=node_registry
|
||||
NODE_REGISTRY_DB_USER=node_registry_user
|
||||
NODE_REGISTRY_DB_PASSWORD=***generated_secure_password***
|
||||
NODE_REGISTRY_HTTP_PORT=9205
|
||||
NODE_REGISTRY_ENV=production
|
||||
NODE_REGISTRY_LOG_LEVEL=info
|
||||
```
|
||||
|
||||
### Firewall Rules (Node #1)
|
||||
```bash
|
||||
# Allow from local network
|
||||
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp comment 'Node Registry - LAN'
|
||||
|
||||
# Allow from Docker network
|
||||
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp comment 'Node Registry - Docker'
|
||||
|
||||
# Deny from external
|
||||
ufw deny 9205/tcp comment 'Node Registry - Block external'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🗄️ Database Schema
|
||||
|
||||
### Table: `nodes`
|
||||
| Column | Type | Description |
|
||||
|--------|------|-------------|
|
||||
| id | UUID | Primary key |
|
||||
| node_id | VARCHAR(255) | Unique identifier (e.g. node-1-hetzner-gex44) |
|
||||
| node_name | VARCHAR(255) | Human-readable name |
|
||||
| node_role | VARCHAR(50) | production, development, backup |
|
||||
| node_type | VARCHAR(50) | router, gateway, worker |
|
||||
| ip_address | INET | Public IP |
|
||||
| local_ip | INET | Local network IP |
|
||||
| hostname | VARCHAR(255) | DNS hostname |
|
||||
| status | VARCHAR(50) | online, offline, maintenance, degraded |
|
||||
| last_heartbeat | TIMESTAMP | Last heartbeat timestamp |
|
||||
| registered_at | TIMESTAMP | Registration time |
|
||||
| updated_at | TIMESTAMP | Last update time |
|
||||
| metadata | JSONB | Additional metadata |
|
||||
|
||||
### Table: `node_profiles`
|
||||
| Column | Type | Description |
|
||||
|--------|------|-------------|
|
||||
| id | UUID | Primary key |
|
||||
| node_id | UUID | Foreign key to nodes |
|
||||
| profile_name | VARCHAR(255) | Profile identifier |
|
||||
| profile_type | VARCHAR(50) | llm, service, capability |
|
||||
| config | JSONB | Profile configuration |
|
||||
| enabled | BOOLEAN | Active status |
|
||||
|
||||
### Table: `heartbeat_log`
|
||||
| Column | Type | Description |
|
||||
|--------|------|-------------|
|
||||
| id | UUID | Primary key |
|
||||
| node_id | UUID | Foreign key to nodes |
|
||||
| timestamp | TIMESTAMP | Heartbeat time |
|
||||
| status | VARCHAR(50) | Node status |
|
||||
| metrics | JSONB | System metrics (CPU, RAM, etc.) |
|
||||
|
||||
### Initial Data
|
||||
```sql
|
||||
-- Pre-registered nodes
|
||||
INSERT INTO nodes (node_id, node_name, node_role, node_type, ip_address, local_ip, hostname, status)
|
||||
VALUES
|
||||
('node-1-hetzner-gex44', 'Hetzner GEX44 Production', 'production', 'router', '144.76.224.179', NULL, 'gateway.daarion.city', 'offline'),
|
||||
('node-2-macbook-m4max', 'MacBook Pro M4 Max', 'development', 'router', NULL, '192.168.1.244', 'MacBook-Pro.local', 'offline');
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Deployment
|
||||
|
||||
### Quick Deploy to Node #1 (Production)
|
||||
|
||||
```bash
|
||||
# From Node #2 (MacBook)
|
||||
cd /Users/apple/github-projects/microdao-daarion
|
||||
|
||||
# Deploy service
|
||||
./scripts/deploy-node-registry.sh
|
||||
|
||||
# Register Node #1 using bootstrap
|
||||
python -m tools.dagi_node_agent.bootstrap \
|
||||
--role production-router \
|
||||
--labels router,gateway,production \
|
||||
--registry-url http://144.76.224.179:9205
|
||||
|
||||
# Register Node #2 using bootstrap
|
||||
python -m tools.dagi_node_agent.bootstrap \
|
||||
--role development-router \
|
||||
--labels router,development,mac,gpu \
|
||||
--registry-url http://192.168.1.244:9205
|
||||
```
|
||||
|
||||
### Manual Deployment Steps
|
||||
|
||||
#### 1. Initialize Database (on Node #1)
|
||||
```bash
|
||||
ssh root@144.76.224.179
|
||||
cd /opt/microdao-daarion
|
||||
|
||||
# Copy SQL script to container
|
||||
docker cp services/node-registry/migrations/init_node_registry.sql dagi-postgres:/tmp/
|
||||
|
||||
# Run initialization
|
||||
docker exec -i dagi-postgres psql -U postgres < /tmp/init_node_registry.sql
|
||||
```
|
||||
|
||||
#### 2. Generate Secure Password
|
||||
```bash
|
||||
# Generate and save to .env
|
||||
PASSWORD=$(openssl rand -base64 32)
|
||||
echo "NODE_REGISTRY_DB_PASSWORD=$PASSWORD" >> .env
|
||||
```
|
||||
|
||||
#### 3. Build and Start
|
||||
```bash
|
||||
# Build Docker image
|
||||
docker-compose build node-registry
|
||||
|
||||
# Start service
|
||||
docker-compose up -d node-registry
|
||||
|
||||
# Check status
|
||||
docker-compose ps | grep node-registry
|
||||
docker logs dagi-node-registry
|
||||
```
|
||||
|
||||
#### 4. Configure Firewall
|
||||
```bash
|
||||
# Allow internal access
|
||||
ufw allow from 192.168.1.0/24 to any port 9205 proto tcp
|
||||
ufw allow from 172.16.0.0/12 to any port 9205 proto tcp
|
||||
|
||||
# Deny external
|
||||
ufw deny 9205/tcp
|
||||
```
|
||||
|
||||
#### 5. Verify Deployment
|
||||
```bash
|
||||
# Health check
|
||||
curl http://localhost:9205/health
|
||||
|
||||
# Expected response:
|
||||
# {"status":"healthy","service":"node-registry","version":"0.1.0-stub",...}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🧪 Testing & Verification
|
||||
|
||||
### Local Testing (Node #2)
|
||||
|
||||
```bash
|
||||
# Install dependencies
|
||||
cd services/node-registry
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Run locally
|
||||
export NODE_REGISTRY_ENV=development
|
||||
python -m app.main
|
||||
|
||||
# Test endpoints
|
||||
curl http://localhost:9205/health
|
||||
curl http://localhost:9205/metrics
|
||||
open http://localhost:9205/docs # Interactive API docs
|
||||
```
|
||||
|
||||
### Production Testing (Node #1)
|
||||
|
||||
```bash
|
||||
# From Node #2, test internal access
|
||||
curl http://144.76.224.179:9205/health
|
||||
|
||||
# From Node #1
|
||||
ssh root@144.76.224.179
|
||||
curl http://localhost:9205/health
|
||||
curl http://localhost:9205/metrics
|
||||
|
||||
# Check logs
|
||||
docker logs dagi-node-registry --tail 50
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Monitoring
|
||||
|
||||
### Health Endpoint
|
||||
```json
|
||||
GET http://localhost:9205/health
|
||||
|
||||
{
|
||||
"status": "healthy",
|
||||
"service": "node-registry",
|
||||
"version": "0.1.0-stub",
|
||||
"environment": "production",
|
||||
"uptime_seconds": 3600.5,
|
||||
"timestamp": "2025-01-17T14:30:00Z",
|
||||
"database": {
|
||||
"connected": true,
|
||||
"host": "postgres",
|
||||
"port": 5432,
|
||||
"database": "node_registry"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Metrics Endpoint
|
||||
```json
|
||||
GET http://localhost:9205/metrics
|
||||
|
||||
{
|
||||
"service": "node-registry",
|
||||
"uptime_seconds": 3600.5,
|
||||
"total_nodes": 2,
|
||||
"active_nodes": 1,
|
||||
"timestamp": "2025-01-17T14:30:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
### Prometheus Integration (Future)
|
||||
```yaml
|
||||
# prometheus.yml
|
||||
scrape_configs:
|
||||
- job_name: 'node-registry'
|
||||
static_configs:
|
||||
- targets: ['node-registry:9205']
|
||||
scrape_interval: 30s
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## ✅ Implemented by Cursor
|
||||
|
||||
### Completed Features
|
||||
|
||||
### Priority 1: Database Integration ✅
|
||||
- [x] SQLAlchemy ORM models (`models.py`)
|
||||
- `Node` model (node_id, hostname, ip, role, labels, status, heartbeat)
|
||||
- `NodeProfile` model (role-based configuration profiles)
|
||||
- [x] Database connection pool
|
||||
- [x] SQL migration (`001_create_node_registry_tables.sql`)
|
||||
- [x] Health check with DB connection
|
||||
|
||||
### Priority 2: Core API Endpoints ✅
|
||||
- [x] `POST /api/v1/nodes/register` — Register/update node with auto node_id generation
|
||||
- [x] `POST /api/v1/nodes/heartbeat` — Update heartbeat timestamp
|
||||
- [x] `GET /api/v1/nodes` — List all nodes with filters (role, label, status)
|
||||
- [x] `GET /api/v1/nodes/{node_id}` — Get specific node details
|
||||
- [x] CRUD operations in `crud.py`:
|
||||
- `register_node()` — Auto-generate node_id
|
||||
- `update_heartbeat()` — Update heartbeat
|
||||
- `get_node()`, `list_nodes()` — Query nodes
|
||||
- `get_node_profile()` — Get role profile
|
||||
|
||||
### Priority 3: Node Profiles ✅
|
||||
- [x] `GET /api/v1/profiles/{role}` — Get role-based configuration profile
|
||||
- [x] `NodeProfile` model with role-based configs
|
||||
- [ ] Per-node profile management (future enhancement)
|
||||
|
||||
### Priority 4: Security & Auth ⚠️
|
||||
- [x] Request validation (Pydantic schemas in `schemas.py`)
|
||||
- [ ] API key authentication (future)
|
||||
- [ ] JWT tokens for inter-node communication (future)
|
||||
- [ ] Rate limiting (future)
|
||||
|
||||
### Priority 5: Monitoring & Metrics ✅
|
||||
- [x] Health check endpoint with DB connectivity
|
||||
- [x] Metrics endpoint (basic)
|
||||
- [ ] Prometheus metrics export (prometheus_client) (future)
|
||||
- [ ] Performance metrics (request duration, DB queries) (future)
|
||||
- [ ] Structured logging (JSON) (future)
|
||||
|
||||
### Priority 6: Testing ✅
|
||||
- [x] Unit tests (`tests/test_crud.py`) — CRUD operations
|
||||
- [x] Integration tests (`tests/test_api.py`) — API endpoints
|
||||
- [ ] Load testing (future)
|
||||
|
||||
### Priority 7: Bootstrap Tool ✅
|
||||
- [x] DAGI Node Agent Bootstrap (`tools/dagi_node_agent/bootstrap.py`)
|
||||
- Automatic hostname and IP detection
|
||||
- Registration with Node Registry
|
||||
- Local node_id storage (`/etc/dagi/node_id` or `~/.config/dagi/node_id`)
|
||||
- Initial heartbeat after registration
|
||||
- CLI interface with role and labels support
|
||||
|
||||
### Priority 8: DAGI Router Integration ✅
|
||||
- [x] Node Registry Client (`utils/node_registry_client.py`)
|
||||
- Async HTTP client for Node Registry API
|
||||
- Methods: `get_nodes()`, `get_node()`, `get_nodes_by_role()`, `get_available_nodes()`
|
||||
- Graceful degradation when service unavailable
|
||||
- Error handling and retries
|
||||
- [x] Router Integration (`router_app.py`)
|
||||
- Added `get_available_nodes()` method
|
||||
- Node discovery for routing decisions
|
||||
- [x] HTTP API (`http_api.py`)
|
||||
- New endpoint: `GET /nodes` (with role filter)
|
||||
- Proxy to Node Registry service
|
||||
- [x] Test Scripts
|
||||
- `scripts/test_node_registry.sh` — API endpoint testing
|
||||
- `scripts/test_bootstrap.sh` — Bootstrap tool testing
|
||||
- `scripts/init_node_registry_db.sh` — Database initialization
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Management Commands
|
||||
|
||||
### Service Control
|
||||
```bash
|
||||
# Start
|
||||
docker-compose up -d node-registry
|
||||
|
||||
# Stop
|
||||
docker-compose stop node-registry
|
||||
|
||||
# Restart
|
||||
docker-compose restart node-registry
|
||||
|
||||
# Rebuild
|
||||
docker-compose up -d --build node-registry
|
||||
|
||||
# Logs
|
||||
docker logs -f dagi-node-registry
|
||||
docker-compose logs -f node-registry
|
||||
```
|
||||
|
||||
### Database Operations
|
||||
```bash
|
||||
# Connect to database
|
||||
docker exec -it dagi-postgres psql -U node_registry_user -d node_registry
|
||||
|
||||
# List tables
|
||||
\dt
|
||||
|
||||
# Query nodes
|
||||
SELECT node_id, node_name, status, last_heartbeat FROM nodes;
|
||||
|
||||
# Query profiles
|
||||
SELECT n.node_name, p.profile_name, p.profile_type, p.enabled
|
||||
FROM nodes n
|
||||
JOIN node_profiles p ON n.id = p.node_id;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📖 Documentation
|
||||
|
||||
- **Service README:** [services/node-registry/README.md](./services/node-registry/README.md)
|
||||
- **Deployment Script:** [scripts/deploy-node-registry.sh](./scripts/deploy-node-registry.sh)
|
||||
- **Database Schema:** [services/node-registry/migrations/init_node_registry.sql](./services/node-registry/migrations/init_node_registry.sql)
|
||||
- **Docker Compose:** [docker-compose.yml](./docker-compose.yml) (lines 253-282)
|
||||
- **INFRASTRUCTURE.md:** [INFRASTRUCTURE.md](./INFRASTRUCTURE.md) (Add Node Registry section)
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Related Services
|
||||
|
||||
| Service | Port | Connection | Purpose |
|
||||
|---------|------|------------|---------|
|
||||
| PostgreSQL | 5432 | Required | Database storage |
|
||||
| DAGI Router | 9102 | Optional | Node info for routing |
|
||||
| Prometheus | 9090 | Optional | Metrics scraping |
|
||||
| Grafana | 3000 | Optional | Monitoring dashboard |
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ Security Considerations
|
||||
|
||||
### Network Security
|
||||
- ✅ Port 9205 accessible only from internal network
|
||||
- ✅ Firewall rules configured (UFW)
|
||||
- ⚠️ No authentication yet (to be added by Cursor)
|
||||
|
||||
### Database Security
|
||||
- ✅ Secure password generated automatically
|
||||
- ✅ Dedicated database user with limited privileges
|
||||
- ✅ Password stored in `.env` (not committed to git)
|
||||
|
||||
### Future Improvements
|
||||
- [ ] API key authentication
|
||||
- [ ] TLS/SSL for API communication
|
||||
- [ ] Rate limiting per node
|
||||
- [ ] Audit logging for node changes
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Acceptance Criteria Status
|
||||
|
||||
| Criteria | Status | Notes |
|
||||
|----------|--------|-------|
|
||||
| Database `node_registry` created | ✅ | With tables and user |
|
||||
| Environment variables configured | ✅ | In docker-compose.yml |
|
||||
| Service added to docker-compose | ✅ | With health check |
|
||||
| Port 9205 listens locally | 🟡 | After deployment |
|
||||
| Accessible from Node #2 (LAN) | 🟡 | After deployment |
|
||||
| Firewall blocks external | 🟡 | After deployment |
|
||||
| INFRASTRUCTURE.md updated | 🟡 | See NODE-REGISTRY-STATUS.md |
|
||||
| SYSTEM-INVENTORY.md updated | 🚧 | Todo |
|
||||
|
||||
---
|
||||
|
||||
**Last Updated:** 2025-01-17 by WARP AI
|
||||
**Next Steps:** Deploy to Node #1, hand over to Cursor for API implementation
|
||||
**Status:** ✅ Infrastructure Complete — Ready for Cursor Implementation
|
||||
Reference in New Issue
Block a user