microdao-daarion/NODA1-README.md
Apple ef3473db21 snapshot: NODE1 production state 2026-02-09
Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.

Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles

Excluded from snapshot: venv/, .env, data/, backups, .tgz archives

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 08:46:46 -08:00


🏗️ NODA1 Production Stack

Version: 2.1
Last Updated: 2026-01-26
Status: Production Ready

📍 Node Information

  • Hostname: node1-daarion
  • IP Address: 144.76.224.179
  • IPv6: 2a01:4f8:201:2a6::2
  • Location: Hetzner Cloud (Germany)
  • Role: Production Router + Gateway + All Services
  • Uptime Target: 24/7
  • SSH: ssh root@144.76.224.179

🖥️ Hardware

  • CPU: core count not recorded here (check with nproc)
  • RAM: 62GB
  • Disk: 1.7TB (~1.3TB available)
  • GPU: NVIDIA RTX 4000 SFF Ada Generation (20GB VRAM)

🐳 Docker Services (27+ active)

Core Services (All Healthy)

| Service | Port | Container | Health |
|---|---|---|---|
| Router | 9102 | dagi-router-node1 | Healthy |
| Gateway | 9300 | dagi-gateway | Healthy |
| Memory Service | 8000 | dagi-memory-service-node1 | Healthy |
| RAG Service | 9500 | rag-service-node1 | Healthy |
| Swapper Service | 8890-8891 | swapper-service-node1 | Healthy |
| Vision Encoder | 8001 | dagi-vision-encoder-node1 | Healthy |

Databases (All Healthy)

| Service | Port | Container | Health |
|---|---|---|---|
| PostgreSQL | 5432 | dagi-postgres | Healthy |
| Qdrant | 6333-6334 | dagi-qdrant-node1 | Healthy |
| Redis | 6379 | dagi-redis-node1 | Healthy |
| Neo4j | 7474, 7687 | dagi-neo4j-node1 | Healthy |

Supporting Services

| Service | Port | Container | Health |
|---|---|---|---|
| NATS | 4222 | dagi-nats-node1 | |
| MinIO | 9000-9001 | dagi-minio-node1 | |
| Crawl4AI | 11235 | dagi-crawl4ai-node1 | |
| Parser Pipeline | 8101 | parser-pipeline | |
| Ingest Service | 8100 | ingest-service | |

AI/ML Services

| Service | Port | Container | Status |
|---|---|---|---|
| CrewAI | - | dagi-crewai-node1 | |
| CrewAI NATS Worker | 9011 | crewai-nats-worker | |

Artifact Services

| Service | Port | Container | Status |
|---|---|---|---|
| Artifact Registry | 9220 | artifact-registry-node1 | |
| Brand Registry | 9210 | brand-registry-node1 | |
| Brand Intake | 9211 | brand-intake-node1 | |
| Presentation Renderer | 9212 | presentation-renderer-node1 | |

Monitoring (All Healthy)

| Service | Port | Container | Health |
|---|---|---|---|
| Prometheus | 9090 | prometheus | Healthy |
| Grafana | 3030 | grafana | Healthy |

🤖 Telegram Bots (7 active)

  1. DAARWIZZ - Main orchestrator
  2. Helion - Energy Union AI
  3. GREENFOOD - Agriculture assistant
  4. AgroMatrix - Agro analytics
  5. NUTRA - Nutrition advisor
  6. Druid - Legal assistant
  7. ⚠️ Alateya - (token not configured)

📊 Health Check Endpoints

# All services quick check
curl http://localhost:9102/health  # Router
curl http://localhost:9300/health  # Gateway
curl http://localhost:8000/health  # Memory Service
curl http://localhost:9500/health  # RAG
curl http://localhost:8890/health  # Swapper
curl http://localhost:6333/healthz # Qdrant
curl http://localhost:8001/health  # Vision Encoder
curl http://localhost:8101/health  # Parser Pipeline
curl http://localhost:9090/-/healthy # Prometheus
curl http://localhost:3030/api/health # Grafana
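
The same checks can be run in one pass from Python. The following is a minimal sketch, not part of the repository: the URLs are copied from the curl commands above, while the dictionary keys and the function names `http_status`, `sweep`, and `failing` are hypothetical. It assumes that an HTTP 200 response means "healthy" for every service.

```python
import urllib.error
import urllib.request

# URLs copied from the curl list above; the keys are labels chosen for this sketch.
ENDPOINTS = {
    "router": "http://localhost:9102/health",
    "gateway": "http://localhost:9300/health",
    "memory-service": "http://localhost:8000/health",
    "rag": "http://localhost:9500/health",
    "swapper": "http://localhost:8890/health",
    "qdrant": "http://localhost:6333/healthz",
    "vision-encoder": "http://localhost:8001/health",
    "parser-pipeline": "http://localhost:8101/health",
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3030/api/health",
}


def http_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return None


def sweep(endpoints, fetch=http_status):
    """Map each service label to True when its endpoint answers 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}


def failing(results):
    """Return the sorted labels of services that did not answer 200."""
    return sorted(name for name, ok in results.items() if not ok)
```

Calling `failing(sweep(ENDPOINTS))` on the node returns an empty list when every endpoint responds with 200. The `fetch` parameter exists only so the sweep can be exercised without network access.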

🔧 Common Operations

View all services

docker ps --format "table {{.Names}}\t{{.Status}}"

Restart a service

docker restart <container-name>

View logs

docker logs <container-name> --tail 50 -f

System status

nvidia-smi  # GPU status
df -h       # Disk usage
free -h     # Memory usage
uptime      # System uptime

💾 Backups

PostgreSQL (Auto)

  • Location: /opt/backups/postgres/
  • Schedule: Every 6 hours (3:00, 9:00, 15:00, 21:00)
  • Retention: daily backups for 7 days, weekly for 4 weeks, monthly for 6 months
  • Container: postgres-backup-node1
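
To sanity-check the schedule, the next expected run and a staleness test can be sketched as below. This is an illustration under stated assumptions: the hours come from the schedule above, but the function names, the 30-minute grace period, and the use of naive local timestamps are hypothetical choices, not taken from the backup container's configuration.

```python
from datetime import datetime, timedelta

# Hours taken from the schedule above (timezone of the host is assumed).
BACKUP_HOURS = (3, 9, 15, 21)


def next_backup(now, hours=BACKUP_HOURS):
    """Return the first scheduled backup time strictly after now."""
    for hour in sorted(hours):
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if candidate > now:
            return candidate
    # No slot left today: use the earliest slot of the following day.
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=min(hours), minute=0, second=0, microsecond=0)


def is_stale(latest_backup, now,
             interval=timedelta(hours=6),
             grace=timedelta(minutes=30)):  # grace period is an assumption
    """True when the newest backup is older than one interval plus grace."""
    return now - latest_backup > interval + grace
```

For example, `next_backup(datetime(2026, 1, 26, 10, 0))` yields 15:00 on the same day, and a newest backup from 03:00 would count as stale at 10:00.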

Qdrant (Manual)

# Create snapshot
curl -X POST "http://localhost:6333/snapshots"

# List snapshots
curl "http://localhost:6333/snapshots"

Manual backup all

cd /opt/microdao-daarion
./scripts/backup/backup_all.sh

🔒 Security Status

  • No suspicious processes
  • No executables in /tmp
  • Firewall configured
  • Daily backups active
  • System load normal (< 1.0)

⚙️ Configuration Files

  • Docker Compose: docker-compose.node1.yml
  • Router Config: services/router/router_config.yaml
  • Backup Compose: docker-compose.backups.yml

📝 Recent Changes (2026-01-26)

Fixed Issues

  1. Memory Service - Fixed MEMORY_QDRANT_HOST (was qdrant, now dagi-qdrant-node1)
  2. Qdrant snapshot created before fix: full-snapshot-2026-01-26-10-11-31.snapshot

⚠️ Known Issues

  • Control-plane container port 9200 not published to host (internal only)
  • Image-gen service not running (use swapper-service instead)

🆚 Version History

v2.1 (2026-01-26)

  • Memory Service DNS fix (qdrant → dagi-qdrant-node1)
  • Full health check verified
  • Documentation updated

v2.0 (2026-01-22)

  • Git repository initialized
  • Qdrant healthcheck fixed
  • render-pdf-worker disabled

v1.x (2026-01-10 - 2026-01-19)

  • Previous deployment
  • Security incidents resolved

📞 Support


Maintained by: NODA1 System
Last Health Check: 2026-01-26 11:13 UTC
Status: All systems operational


🔧 By Design (expected behavior)

Services without public ports

| Service | Port | Status | Explanation |
|---|---|---|---|
| RBAC | 9200 | Internal only | Port is not published. Accessible only from the Docker network. |
| Image-gen | 8892 | Not used | Image generation goes through swapper-service (8890). |
| Parser | 9400 | Absent | Replaced by parser-pipeline (8101) as the single parsing entry point. |

Diagnosing internal services

# RBAC (from inside the Docker network)
docker exec dagi-gateway curl -sS http://rbac:9200/health

# Check DNS resolution
docker exec dagi-memory-service-node1 python3 -c "import socket; print(socket.gethostbyname('dagi-qdrant-node1'))"

Normal values

  • Qdrant: 18 collections, 900+ vectors
  • Memory Service: 200 OK on /health (healthcheck via Python urllib)
  • Load average: < 2.0 is normal, < 5.0 is acceptable
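
The load-average bands above can be expressed as a small classifier. This is a sketch only: the label "high" for values of 5.0 and above is an assumption (the list names only the first two bands), and reading the 1-minute figure by default is likewise a choice made for illustration.

```python
import os


def classify_load(load):
    """Map a load-average value onto the bands listed above."""
    if load < 2.0:
        return "normal"
    if load < 5.0:
        return "acceptable"
    return "high"  # label assumed; the list above does not name this band


def current_status(index=0):
    """Classify the host's load average (index 0/1/2 = 1/5/15 minutes).

    os.getloadavg() is available on Unix-like systems only.
    """
    return classify_load(os.getloadavg()[index])
```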

📊 Prometheus Alerting

Configured alerts

| Alert | Condition | Severity |
|---|---|---|
| ServiceDown | up == 0 for > 2m | critical |
| QdrantCollectionsLow | collections < 10 | warning |
| QdrantVectorsDropped | vectors < 500 | warning |
| HostDiskSpaceLow | free < 15% | warning |
| HostMemoryHigh | usage > 90% | warning |
| HostHighLoad | load15 > 10 | warning |
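
To see how these thresholds relate to the normal values, the conditions can be evaluated against a snapshot of metrics. The sketch below is not the Prometheus rule file: the thresholds come from the table, but the metric key names and the `firing` helper are hypothetical, and the "for 2m" duration of ServiceDown is simplified to a `down_seconds` counter.

```python
# Thresholds copied from the alert table; metric key names are hypothetical.
ALERTS = [
    ("ServiceDown", "critical", lambda m: m["down_seconds"] > 120),
    ("QdrantCollectionsLow", "warning", lambda m: m["qdrant_collections"] < 10),
    ("QdrantVectorsDropped", "warning", lambda m: m["qdrant_vectors"] < 500),
    ("HostDiskSpaceLow", "warning", lambda m: m["disk_free_pct"] < 15),
    ("HostMemoryHigh", "warning", lambda m: m["memory_used_pct"] > 90),
    ("HostHighLoad", "warning", lambda m: m["load15"] > 10),
]


def firing(metrics):
    """Return (alert, severity) pairs whose condition holds for metrics."""
    return [(name, severity) for name, severity, cond in ALERTS if cond(metrics)]


# Example snapshot built from the documented normal values (other fields assumed).
baseline = {
    "down_seconds": 0,
    "qdrant_collections": 18,
    "qdrant_vectors": 900,
    "disk_free_pct": 76.0,
    "memory_used_pct": 40.0,
    "load15": 0.8,
}
```

With the baseline snapshot no alert fires. Dropping `qdrant_vectors` below 500 would trigger QdrantVectorsDropped, which gives a comfortable margin below the 900+ vectors listed as normal.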

Checking rules

curl -sS http://127.0.0.1:9090/api/v1/rules | python3 -m json.tool | head -50

🔄 Qdrant Backup/Restore

Snapshots

  • Location: /opt/backups/qdrant/ and via the API
  • Retention: daily, automatic
  • Latest: full-snapshot-2026-01-26-10-11-31.snapshot (1.2GB)

Restore Drill (verified 2026-01-26)

# Restore tested successfully on a separate port, 16333
# helion_messages: 365 points restored and verified by search
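
A restore drill like this one can be verified by comparing point counts on the temporary instance. The sketch below assumes the restored Qdrant instance listens on http://localhost:16333 and reads `points_count` from Qdrant's collection-info endpoint (`GET /collections/{name}`). The helper names are hypothetical, and `points_count` may be approximate on a busy instance, so treat a mismatch as a prompt for a closer look rather than proof of data loss.

```python
import json
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET url and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def points_count(base_url, collection, fetch=fetch_json):
    """Read points_count from the collection-info response."""
    info = fetch(f"{base_url}/collections/{collection}")
    return info["result"]["points_count"]


def restore_matches(base_url, collection, expected, fetch=fetch_json):
    """True when the restored collection holds the expected number of points."""
    return points_count(base_url, collection, fetch) == expected
```

For the drill recorded above, the check would be `restore_matches("http://localhost:16333", "helion_messages", 365)`.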

Last Updated: 2026-01-26 11:40


Behavior Policy v2.1 CHANGELOG

Date: 2026-02-07
Version: Behavior Policy v2.1 / Global System Prompt v2.1

Architecture (Source of Truth)

| Layer | Component | Location |
|---|---|---|
| Policy document | Global System Prompt v2.1 | prompts/global_system_prompt_v2.md |
| Gateway (source of truth) | detect_url + detect_explicit_request | gateway-bot/behavior_policy.py |
| Decision layer | behavior_policy.py v2.1 | gateway-bot/behavior_policy.py |
| HTTP API (gateway) | http_api.py | gateway-bot/http_api.py |
| PromptBuilder | N/A (runtime_context injected at gateway) | services/router/prompt_builder.py |
| Tests | 39 tests | tests/test_behavior_policy.py |
| Runbook | Behavior Policy v2.1 | runbooks/behavior-policy-v2.1.md |

As of v2.1, runtime_context injection happens in gateway (http_api.py), not PromptBuilder.

Breaking Changes (from v1.1)

  • Bare @mention in public/topic WITHOUT has_explicit_request -> NO_OUTPUT
  • Gateway computes has_link and has_explicit_request (behavior_policy does NOT override)
  • thread_has_agent_participation is now REQUIRED (fallback: false)
  • has_explicit_request contract: imperative OR ("?" AND (dm OR reply OR mention OR thread)), where "?" is read here as "the message contains a question mark"
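
The contract and the bare-mention rule above can be written out as boolean logic. This is a sketch of the stated rules, not the code in gateway-bot/behavior_policy.py: the `MessageContext` fields are hypothetical names, "?" is assumed to mean that the message contains a question mark, "reply" is assumed to mean a reply to the agent, and "thread" is assumed to map to thread_has_agent_participation.

```python
from dataclasses import dataclass


@dataclass
class MessageContext:
    """Signals computed at the gateway (field names are hypothetical)."""
    is_imperative: bool = False
    has_question_mark: bool = False
    is_dm: bool = False
    is_reply_to_agent: bool = False
    mentions_agent: bool = False
    # Required as of v2.1; the documented fallback is False.
    thread_has_agent_participation: bool = False


def has_explicit_request(ctx):
    """imperative OR (? AND (dm OR reply OR mention OR thread))."""
    addressed = (
        ctx.is_dm
        or ctx.is_reply_to_agent
        or ctx.mentions_agent
        or ctx.thread_has_agent_participation
    )
    return ctx.is_imperative or (ctx.has_question_mark and addressed)


def bare_mention_is_silent(ctx):
    """True when a bare @mention outside a DM should produce NO_OUTPUT."""
    return ctx.mentions_agent and not ctx.is_dm and not has_explicit_request(ctx)
```

Under these assumptions, a question mark alone in a public chat does not count as an explicit request, while the same question in a DM or alongside an @mention does.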