Apple ef3473db21 snapshot: NODE1 production state 2026-02-09
Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.

Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles

Excluded from snapshot: venv/, .env, data/, backups, .tgz archives

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 08:46:46 -08:00

# 🏗️ NODA1 Production Stack
- **Version:** 2.1
- **Last Updated:** 2026-01-26
- **Status:** Production Ready ✅
## 📍 Node Information
- **Hostname:** node1-daarion
- **IP Address:** 144.76.224.179
- **IPv6:** 2a01:4f8:201:2a6::2
- **Location:** Hetzner Cloud (Germany)
- **Role:** Production Router + Gateway + All Services
- **Uptime Target:** 24/7
- **SSH:** `ssh root@144.76.224.179`
## 🖥️ Hardware
- **CPU:** Core count not recorded here (check with `nproc`)
- **RAM:** 62GB
- **Disk:** 1.7TB (~1.3TB available)
- **GPU:** NVIDIA RTX 4000 SFF Ada Generation (20GB VRAM)
## 🐳 Docker Services (27+ active)
### Core Services (✅ All Healthy)
| Service | Port | Container | Health |
|---------|------|-----------|--------|
| Router | 9102 | dagi-router-node1 | ✅ |
| Gateway | 9300 | dagi-gateway | ✅ |
| Memory Service | 8000 | dagi-memory-service-node1 | ✅ |
| RAG Service | 9500 | rag-service-node1 | ✅ |
| Swapper Service | 8890-8891 | swapper-service-node1 | ✅ |
| Vision Encoder | 8001 | dagi-vision-encoder-node1 | ✅ |
### Databases (✅ All Healthy)
| Service | Port | Container | Health |
|---------|------|-----------|--------|
| PostgreSQL | 5432 | dagi-postgres | ✅ |
| Qdrant | 6333-6334 | dagi-qdrant-node1 | ✅ |
| Redis | 6379 | dagi-redis-node1 | ✅ |
| Neo4j | 7474, 7687 | dagi-neo4j-node1 | ✅ |
### Supporting Services
| Service | Port | Container | Health |
|---------|------|-----------|--------|
| NATS | 4222 | dagi-nats-node1 | ✅ |
| MinIO | 9000-9001 | dagi-minio-node1 | ✅ |
| Crawl4AI | 11235 | dagi-crawl4ai-node1 | ✅ |
| Parser Pipeline | 8101 | parser-pipeline | ✅ |
| Ingest Service | 8100 | ingest-service | ✅ |
### AI/ML Services
| Service | Port | Container | Status |
|---------|------|-----------|--------|
| CrewAI | - | dagi-crewai-node1 | ✅ |
| CrewAI NATS Worker | 9011 | crewai-nats-worker | ✅ |
### Artifact Services
| Service | Port | Container | Status |
|---------|------|-----------|--------|
| Artifact Registry | 9220 | artifact-registry-node1 | ✅ |
| Brand Registry | 9210 | brand-registry-node1 | ✅ |
| Brand Intake | 9211 | brand-intake-node1 | ✅ |
| Presentation Renderer | 9212 | presentation-renderer-node1 | ✅ |
### Monitoring (✅ All Healthy)
| Service | Port | Container | Health |
|---------|------|-----------|--------|
| Prometheus | 9090 | prometheus | ✅ |
| Grafana | 3030 | grafana | ✅ |
## 🤖 Telegram Bots (7 configured, 6 active)
1. **DAARWIZZ** - Main orchestrator
2. **Helion** - Energy Union AI
3. **GREENFOOD** - Agriculture assistant
4. **AgroMatrix** - Agro analytics
5. **NUTRA** - Nutrition advisor
6. **Druid** - Legal assistant
7. ⚠️ **Alateya** - (token not configured)
## 📊 Health Check Endpoints
```bash
# All services quick check
curl http://localhost:9102/health # Router
curl http://localhost:9300/health # Gateway
curl http://localhost:8000/health # Memory Service
curl http://localhost:9500/health # RAG
curl http://localhost:8890/health # Swapper
curl http://localhost:6333/healthz # Qdrant
curl http://localhost:8001/health # Vision Encoder
curl http://localhost:8101/health # Parser Pipeline
curl http://localhost:9090/-/healthy # Prometheus
curl http://localhost:3030/api/health # Grafana
```
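The same checks can be run in one pass with a non-zero exit code on failure. This is a minimal standard-library sketch, assuming it runs on NODA1 itself and that every endpoint returns HTTP 200 when healthy; the file name `noda1_health.py` and the `main()` entry point are hypothetical, not part of the repository.

```python
# noda1_health.py (hypothetical name): poll the health endpoints listed above.
import http.client
import urllib.request

# (service, URL) pairs copied from the curl checks above; assumes localhost on NODA1.
ENDPOINTS = [
    ("Router", "http://localhost:9102/health"),
    ("Gateway", "http://localhost:9300/health"),
    ("Memory Service", "http://localhost:8000/health"),
    ("RAG", "http://localhost:9500/health"),
    ("Swapper", "http://localhost:8890/health"),
    ("Qdrant", "http://localhost:6333/healthz"),
    ("Vision Encoder", "http://localhost:8001/health"),
    ("Parser Pipeline", "http://localhost:8101/health"),
    ("Prometheus", "http://localhost:9090/-/healthy"),
    ("Grafana", "http://localhost:3030/api/health"),
]

# Ignore any http_proxy settings: these are local endpoints.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, timeout: float = 3.0) -> bool:
    """Return True only if `url` answers HTTP 200 within `timeout` seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, timeouts and urllib's HTTPError (4xx/5xx).
        return False


def check_all(endpoints=ENDPOINTS, timeout: float = 3.0) -> dict:
    """Map each service name to True (healthy) or False (down or erroring)."""
    return {name: check(url, timeout) for name, url in endpoints}


def main() -> int:
    results = check_all()
    for name, ok in results.items():
        print(f"{name:<16} {'OK' if ok else 'DOWN'}")
    # Non-zero exit code lets cron or a CI job alert on any failure.
    return 0 if all(results.values()) else 1
```

For example: `python3 -c 'import sys, noda1_health; sys.exit(noda1_health.main())'` from the directory holding the file.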
## 🔧 Common Operations
### View all services
```bash
docker ps --format "table {{.Names}}\t{{.Status}}"
```
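To spot restarting or unhealthy containers without scanning the whole table, the `docker ps` output can be filtered (`docker ps --filter health=unhealthy` is the shortest built-in option). The hypothetical helper below assumes Docker's usual status strings (`Up 3 hours (healthy)`, `Up 2 minutes (unhealthy)`, `Restarting (1) ...`) and uses a plain tab-separated `--format` instead of `table`, because the `table` prefix pads columns with spaces.

```python
import subprocess

# Go template passed to `docker ps`; a real tab character separates the two fields.
PS_FORMAT = "{{.Names}}\t{{.Status}}"


def parse_ps(output: str) -> dict:
    """Turn `docker ps --format PS_FORMAT` output into {container_name: status}."""
    rows = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        rows[name.strip()] = status.strip()
    return rows


def problem_containers(rows: dict) -> list:
    """Containers that are not 'Up ...' or whose healthcheck reports '(unhealthy)'."""
    return sorted(
        name
        for name, status in rows.items()
        if not status.startswith("Up") or "(unhealthy)" in status
    )


def list_containers() -> dict:
    """Run `docker ps` on the host (requires access to the Docker socket)."""
    out = subprocess.run(
        ["docker", "ps", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```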
### Restart a service
```bash
docker restart <container-name>
```
### View logs
```bash
docker logs <container-name> --tail 50 -f
```
### System status
```bash
nvidia-smi # GPU status
df -h # Disk usage
free -h # Memory usage
uptime # System uptime
```
## 💾 Backups
### PostgreSQL (Auto)
- **Location:** `/opt/backups/postgres/`
- **Schedule:** Every 6 hours (3:00, 9:00, 15:00, 21:00)
- **Retention:** 7 daily, 4 weekly, 6 monthly backups
- **Container:** postgres-backup-node1
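The retention line reads like a grandfather-father-son (GFS) policy. The sketch below shows one way to compute which backup timestamps such a policy keeps, assuming "keep the newest backup in each of the most recent N days, ISO weeks and months that have backups", in the style of restic's `--keep-daily`/`--keep-weekly`/`--keep-monthly`. It is not the pruning logic inside `postgres-backup-node1`, which may count periods differently; `gfs_keep` is a hypothetical name.

```python
from datetime import datetime
from typing import Iterable, Set


def gfs_keep(backups: Iterable[datetime], daily: int = 7, weekly: int = 4,
             monthly: int = 6) -> Set[datetime]:
    """Return the backup timestamps a daily/weekly/monthly GFS policy would keep."""
    ordered = sorted(backups, reverse=True)  # newest first
    rules = (
        (daily, lambda ts: ts.date()),
        (weekly, lambda ts: ts.isocalendar()[:2]),  # (ISO year, ISO week)
        (monthly, lambda ts: (ts.year, ts.month)),
    )
    keep: Set[datetime] = set()
    for limit, bucket_of in rules:
        seen = set()
        for ts in ordered:
            bucket = bucket_of(ts)
            if bucket in seen:
                continue  # a newer backup already represents this period
            if len(seen) == limit:
                break  # enough periods of this kind
            seen.add(bucket)
            keep.add(ts)
    return keep
```

With the 6-hourly schedule, only the newest of each day's four dumps survives the daily rule, so a prune pass deletes everything outside the returned set.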
### Qdrant (Manual)
```bash
# Create snapshot
curl -X POST "http://localhost:6333/snapshots"
# List snapshots
curl "http://localhost:6333/snapshots"
```
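The two curl calls wrap Qdrant's full-storage snapshot API (`POST /snapshots`, `GET /snapshots`). A Python sketch that also picks the newest snapshot, assuming the response shape `{"result": [{"name", "creation_time", "size", ...}]}` returned by recent Qdrant versions and a consistent ISO-8601 `creation_time` format; the helper names are hypothetical.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"


def list_full_snapshots(base: str = QDRANT_URL) -> list:
    """GET /snapshots: full-storage snapshots known to this Qdrant node."""
    with urllib.request.urlopen(f"{base}/snapshots", timeout=10) as resp:
        return json.load(resp)["result"]


def create_full_snapshot(base: str = QDRANT_URL) -> dict:
    """POST /snapshots: create a full snapshot (may take minutes for ~1 GB of data)."""
    req = urllib.request.Request(f"{base}/snapshots", method="POST")
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["result"]


def newest(snapshots: list):
    """Most recent snapshot by creation_time, or None for an empty list."""
    # Same-format ISO-8601 strings sort chronologically; a missing time sorts first.
    return max(snapshots, key=lambda s: s.get("creation_time") or "", default=None)
```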
### Manual backup all
```bash
cd /opt/microdao-daarion
./scripts/backup/backup_all.sh
```
## 🔒 Security Status
- ✅ No suspicious processes
- ✅ No executables in /tmp
- ✅ Firewall configured
- ✅ Daily backups active
- ✅ System load normal (< 1.0)
## ⚙️ Configuration Files
- **Docker Compose:** `docker-compose.node1.yml`
- **Router Config:** `services/router/router_config.yaml`
- **Backup Compose:** `docker-compose.backups.yml`
## 📝 Recent Changes (2026-01-26)
### ✅ Fixed Issues
1. **Memory Service** - Fixed MEMORY_QDRANT_HOST (was `qdrant`, now `dagi-qdrant-node1`)
2. **Qdrant snapshot** created before fix: `full-snapshot-2026-01-26-10-11-31.snapshot`
### ⚠️ Known Issues
- **Control-plane** container port 9200 not published to host (internal only)
- **Image-gen** service not running (use swapper-service instead)
## 🆚 Version History
### v2.1 (2026-01-26)
- Memory Service DNS fix (qdrant → dagi-qdrant-node1)
- Full health check verified
- Documentation updated
### v2.0 (2026-01-22)
- Git repository initialized
- Qdrant healthcheck fixed
- render-pdf-worker disabled
### v1.x (2026-01-10 - 2026-01-19)
- Previous deployment
- Security incidents resolved
## 📞 Support
- **SSH:** root@144.76.224.179
- **Monitoring:** http://144.76.224.179:3030 (Grafana)
- **Metrics:** http://144.76.224.179:9090 (Prometheus)
---
- **Maintained by:** NODA1 System
- **Last Health Check:** 2026-01-26 11:13 UTC
- **Status:** ✅ All systems operational
---
## 🔧 By Design (Expected Behavior)
### Services without published ports
| Service | Port | Status | Explanation |
|--------|------|--------|-----------|
| **RBAC** | 9200 | Internal only | Port is not published; reachable only from the Docker network. |
| **Image-gen** | 8892 | Not used | Image generation goes through `swapper-service (8890)`. |
| **Parser** | 9400 | Absent | Replaced by `parser-pipeline (8101)` as the single parsing entry point. |
### Diagnosing internal services
```bash
# RBAC (from inside the Docker network)
docker exec dagi-gateway curl -sS http://rbac:9200/health
# Check DNS resolution
docker exec dagi-memory-service-node1 python3 -c "import socket; print(socket.gethostbyname('dagi-qdrant-node1'))"
```
### Normal values
- **Qdrant**: 18 collections, 900+ vectors
- **Memory Service**: 200 OK on `/health` (the healthcheck uses Python urllib)
- **Load average**: < 2.0 normal, < 5.0 acceptable
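These normal values can be checked by a script rather than by eye. A sketch, assuming "vectors" in the Qdrant line means `points_count` (one vector per point) and that the load thresholds refer to the 15-minute average, matching the `load15` metric used for alerting; all function names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request


def classify_load(load: float) -> str:
    """Bucket a load average: < 2.0 normal, < 5.0 acceptable, otherwise high."""
    if load < 2.0:
        return "normal"
    if load < 5.0:
        return "acceptable"
    return "high"


def fetch_qdrant_counts(base: str = "http://localhost:6333") -> dict:
    """Return {collection_name: points_count} for every collection on the node."""
    with urllib.request.urlopen(f"{base}/collections", timeout=10) as resp:
        names = [c["name"] for c in json.load(resp)["result"]["collections"]]
    counts = {}
    for name in names:
        url = f"{base}/collections/{urllib.parse.quote(name)}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            counts[name] = json.load(resp)["result"].get("points_count") or 0
    return counts


def qdrant_looks_normal(counts: dict, min_collections: int = 18,
                        min_points: int = 900) -> bool:
    """True if collection count and total points reach the documented normal values."""
    return len(counts) >= min_collections and sum(counts.values()) >= min_points


def load_status() -> str:
    # os.getloadavg() is Unix-only; index 2 is the 15-minute average.
    return classify_load(os.getloadavg()[2])
```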
---
## 📊 Prometheus Alerting
### Configured alerts
| Alert | Condition | Severity |
|-------|-------|----------|
| ServiceDown | `up == 0` > 2m | critical |
| QdrantCollectionsLow | collections < 10 | warning |
| QdrantVectorsDropped | vectors < 500 | warning |
| HostDiskSpaceLow | free < 15% | warning |
| HostMemoryHigh | usage > 90% | warning |
| HostHighLoad | load15 > 10 | warning |
### Checking rules
```bash
curl -sS http://127.0.0.1:9090/api/v1/rules | python3 -m json.tool | head -50
```
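Piping the rules API through `json.tool | head` shows only the first lines of the payload. A Python sketch that instead reports which of the six configured alerts are missing or firing, assuming the standard Prometheus `/api/v1/rules` response (`data.groups[].rules[]` with `type`, `name` and `state`); the function names are hypothetical.

```python
import json
import urllib.request

# Alert names from the table above.
EXPECTED_ALERTS = [
    "ServiceDown", "QdrantCollectionsLow", "QdrantVectorsDropped",
    "HostDiskSpaceLow", "HostMemoryHigh", "HostHighLoad",
]


def fetch_rules(base: str = "http://127.0.0.1:9090") -> dict:
    with urllib.request.urlopen(f"{base}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


def alert_states(payload: dict) -> dict:
    """Map alerting-rule name -> state ('inactive', 'pending' or 'firing')."""
    states = {}
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting":  # skip recording rules
                states[rule["name"]] = rule.get("state", "unknown")
    return states


def firing(payload: dict) -> list:
    return sorted(n for n, s in alert_states(payload).items() if s == "firing")


def missing_alerts(payload: dict) -> list:
    """Expected alerts that Prometheus has not loaded (e.g. a rules file typo)."""
    return sorted(set(EXPECTED_ALERTS) - set(alert_states(payload)))
```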
---
## 🔄 Qdrant Backup/Restore
### Snapshots
- **Location**: `/opt/backups/qdrant/` and via the API
- **Retention**: daily, automatic
- **Latest**: `full-snapshot-2026-01-26-10-11-31.snapshot` (1.2GB)
### Restore Drill (verified 2026-01-26)
- Restore successfully tested on a separate port (16333)
- `helion_messages`: 365 points restored and verified with a search query
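The drill verified one collection by hand. A sketch for comparing exact point counts between production (port 6333) and a restored instance (port 16333), assuming both are reachable from the host; it uses Qdrant's `POST /collections/{name}/points/count` endpoint with `exact: true`, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def count_points(base: str, collection: str) -> int:
    """Exact number of points in `collection` on the Qdrant instance at `base`."""
    url = f"{base}/collections/{urllib.parse.quote(collection)}/points/count"
    body = json.dumps({"exact": True}).encode()
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["result"]["count"]


def compare_counts(prod: dict, restored: dict) -> dict:
    """Return {collection: (prod_count, restored_count)} for mismatched or missing collections."""
    diffs = {}
    for name in sorted(set(prod) | set(restored)):
        a, b = prod.get(name), restored.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs
```

An empty result from `compare_counts` means every collection came back with the same number of points; a search query is still needed to confirm the vectors themselves are usable.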
---
**Last Updated:** 2026-01-26 11:40
---
## Behavior Policy v2.1 CHANGELOG
- **Date:** 2026-02-07
- **Version:** Behavior Policy v2.1 / Global System Prompt v2.1
### Architecture (Source of Truth)
| Layer | Component | Location |
|-------|-----------|----------|
| Policy document | Global System Prompt v2.1 | prompts/global_system_prompt_v2.md |
| Gateway (source of truth) | detect_url + detect_explicit_request | gateway-bot/behavior_policy.py |
| Decision layer | behavior_policy.py v2.1 | gateway-bot/behavior_policy.py |
| HTTP API (gateway) | http_api.py | gateway-bot/http_api.py |
| PromptBuilder | N/A (runtime_context injected at gateway) | services/router/prompt_builder.py |
| Tests | 39 tests | tests/test_behavior_policy.py |
| Runbook | Behavior Policy v2.1 | runbooks/behavior-policy-v2.1.md |
As of v2.1, runtime_context injection happens in the gateway (`http_api.py`), not in PromptBuilder.
### Breaking Changes (from v1.1)
- Bare @mention in public/topic WITHOUT has_explicit_request -> NO_OUTPUT
- Gateway computes has_link and has_explicit_request (behavior_policy does NOT override)
- thread_has_agent_participation is now REQUIRED (fallback: false)
- has_explicit_request contract: imperative OR (? AND (dm OR reply OR mention OR thread))
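The `has_explicit_request` contract and the bare-mention rule reduce to boolean logic. A minimal Python sketch, not the actual `gateway-bot/behavior_policy.py`: the field names, the `chat_type` values `dm`/`public`/`topic`, and reading the contract's "thread" term as `thread_has_agent_participation` are all assumptions.

```python
from dataclasses import dataclass

PUBLIC_CHATS = ("public", "topic")  # hypothetical chat_type values


@dataclass
class MessageContext:
    chat_type: str                   # assumed values: "dm", "public", "topic"
    imperative: bool = False         # message phrased as a command
    question: bool = False           # message contains "?"
    reply_to_agent: bool = False     # message replies to the agent
    mention: bool = False            # message @mentions the agent
    thread_has_agent_participation: bool = False  # v2.1: required, fallback False


def has_explicit_request(ctx: MessageContext) -> bool:
    """imperative OR ('?' AND (dm OR reply OR mention OR thread))."""
    addressed = (
        ctx.chat_type == "dm"
        or ctx.reply_to_agent
        or ctx.mention
        or ctx.thread_has_agent_participation  # assumption: the 'thread' term
    )
    return ctx.imperative or (ctx.question and addressed)


def is_bare_mention_no_output(ctx: MessageContext) -> bool:
    """v2.1: bare @mention in public/topic without an explicit request -> NO_OUTPUT."""
    return ctx.chat_type in PUBLIC_CHATS and ctx.mention and not has_explicit_request(ctx)


def context_from_payload(payload: dict) -> MessageContext:
    """Build the context from a gateway payload; missing flags fall back to False."""
    return MessageContext(
        chat_type=payload["chat_type"],
        imperative=bool(payload.get("imperative", False)),
        question=bool(payload.get("question", False)),
        reply_to_agent=bool(payload.get("reply_to_agent", False)),
        mention=bool(payload.get("mention", False)),
        thread_has_agent_participation=bool(
            payload.get("thread_has_agent_participation", False)
        ),
    )
```

Under this reading, "@Helion" alone in a public topic stays silent, while "@Helion what is the tariff?" counts as an explicit request because the question mark is combined with a mention.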