# 🏗️ NODA1 Production Stack

**Version:** 2.2

**Last Updated:** 2026-02-11

**Status:** Production (drift-controlled) ✅

## 🔎 Current Reality (2026-02-11)

- Deploy root: `/opt/microdao-daarion` (single runtime root)
- Drift control: `/opt/microdao-daarion/ops/drift-check.sh` → expected output `DRIFT_CHECK: OK`
- Gateway: `agents_count=13` (user-facing)
- Router: 15 active agents (13 user-facing + 2 internal)
- Internal routing defaults:
  - `monitor` → local (`swapper+ollama`, `qwen3-8b`)
  - `devtools` → local (`swapper+ollama`, `qwen3-8b`) + conditional cloud fallback for heavy task types
- Memory service: `/health` and `/stats` return `200`

## 📍 Node Information

- **Hostname:** node1-daarion
- **IP Address:** 144.76.224.179
- **IPv6:** 2a01:4f8:201:2a6::2
- **Location:** Hetzner Cloud (Germany)
- **Role:** Production Router + Gateway + All Services
- **Uptime Target:** 24/7
- **SSH:** `ssh root@144.76.224.179`

## 🖥️ Hardware

- **CPU:** Core count not recorded here (check with `nproc`)
- **RAM:** 62GB
- **Disk:** 1.7TB (~1.3TB available)
- **GPU:** NVIDIA RTX 4000 SFF Ada Generation (20GB VRAM)

## 🐳 Docker Services (27+ active)

### Core Services (✅ All Healthy)

| Service | Port | Container | Health |
|---------|------|-----------|--------|
| Router | 9102 | dagi-router-node1 | ✅ |
| Gateway | 9300 | dagi-gateway | ✅ |
| Memory Service | 8000 | dagi-memory-service-node1 | ✅ |
| RAG Service | 9500 | rag-service-node1 | ✅ |
| Swapper Service | 8890-8891 | swapper-service-node1 | ✅ |
| Vision Encoder | 8001 | dagi-vision-encoder-node1 | ✅ |

### Databases (✅ All Healthy)

| Service | Port | Container | Health |
|---------|------|-----------|--------|
| PostgreSQL | 5432 | dagi-postgres | ✅ |
| Qdrant | 6333-6334 | dagi-qdrant-node1 | ✅ |
| Redis | 6379 | dagi-redis-node1 | ✅ |
| Neo4j | 7474, 7687 | dagi-neo4j-node1 | ✅ |

### Supporting Services

| Service | Port | Container | Health |
|---------|------|-----------|--------|
| NATS | 4222 | dagi-nats-node1 | ✅ |
| MinIO | 9000-9001 | dagi-minio-node1 | ✅ |
| Crawl4AI | 11235 | dagi-crawl4ai-node1 | ✅ |
| Parser Pipeline | 8101 | parser-pipeline | ✅ |
| Ingest Service | 8100 | ingest-service | ✅ |

### AI/ML Services

| Service | Port | Container | Status |
|---------|------|-----------|--------|
| CrewAI | - | dagi-crewai-node1 | ✅ |
| CrewAI NATS Worker | 9011 | crewai-nats-worker | ✅ |

### Artifact Services

| Service | Port | Container | Status |
|---------|------|-----------|--------|
| Artifact Registry | 9220 | artifact-registry-node1 | ✅ |
| Brand Registry | 9210 | brand-registry-node1 | ✅ |
| Brand Intake | 9211 | brand-intake-node1 | ✅ |
| Presentation Renderer | 9212 | presentation-renderer-node1 | ✅ |

### Monitoring (✅ All Healthy)

| Service | Port | Container | Health |
|---------|------|-----------|--------|
| Prometheus | 9090 | prometheus | ✅ |
| Grafana | 3030 | grafana | ✅ |

## 🤖 Telegram Bots (13 user-facing)

The production gateway currently serves these user-facing agents: `daarwizz`, `helion`, `alateya`, `druid`, `nutra`, `agromatrix`, `greenfood`, `clan`, `eonarch`, `yaromir`, `soul`, `senpai`, `sofiia`.
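The gateway health endpoint is a natural place to confirm that all 13 agents above are loaded. The following is a minimal sketch. It assumes that `GET http://localhost:9300/health` returns JSON with a top-level `agents_count` field. The `agents_count=13` figure under Current Reality suggests this, but the exact response shape is an assumption. `check_agents_count` is a hypothetical helper name, not an existing script on the node.

```bash
# Hypothetical helper: reads a gateway /health JSON body on stdin and compares
# its (assumed) top-level "agents_count" field with the expected value.
# Returns 0 on match, 1 on mismatch or missing field, 2 if the body is not a JSON object.
check_agents_count() {
  local expected="${1:-13}"
  local actual
  # python3 is already used elsewhere in this runbook for JSON handling.
  actual=$(python3 -c 'import json, sys; print(json.load(sys.stdin).get("agents_count", "missing"))' 2>/dev/null) || {
    echo "FAIL: /health response is not a JSON object"
    return 2
  }
  if [ "$actual" = "$expected" ]; then
    echo "OK: agents_count=$actual"
    return 0
  fi
  echo "FAIL: agents_count=$actual (expected $expected)"
  return 1
}
```

You can use it as `curl -sS http://localhost:9300/health | check_agents_count 13`. On a match, it prints `OK: agents_count=13` and returns 0. It returns 1 if the count differs or the field is missing, and 2 if the body cannot be parsed as a JSON object.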
Quick check:

```bash
curl -sS http://localhost:9300/health
```

## 📊 Health Check Endpoints

```bash
# Quick check of all services
curl http://localhost:9102/health      # Router
curl http://localhost:9300/health      # Gateway
curl http://localhost:8000/health      # Memory Service
curl http://localhost:9500/health      # RAG
curl http://localhost:8890/health      # Swapper
curl http://localhost:6333/healthz     # Qdrant
curl http://localhost:8001/health      # Vision Encoder
curl http://localhost:8101/health      # Parser Pipeline
curl http://localhost:9090/-/healthy   # Prometheus
curl http://localhost:3030/api/health  # Grafana
```

## 🔧 Common Operations

### View all services

```bash
docker ps --format "table {{.Names}}\t{{.Status}}"
```

### Restart a service

```bash
docker restart <container-name>
```

### View logs

```bash
docker logs --tail 50 -f <container-name>
```

### System status

```bash
nvidia-smi   # GPU status
df -h        # Disk usage
free -h      # Memory usage
uptime       # System uptime
```

## 💾 Backups

### PostgreSQL (Auto)

- **Location:** `/opt/backups/postgres/`
- **Schedule:** Every 6 hours (03:00, 09:00, 15:00, 21:00)
- **Retention:** daily backups for 7 days, weekly for 4 weeks, monthly for 6 months
- **Container:** postgres-backup-node1

### Qdrant (Manual)

```bash
# Create a full snapshot
curl -X POST "http://localhost:6333/snapshots"

# List snapshots
curl "http://localhost:6333/snapshots"
```

### Manual backup of everything

```bash
cd /opt/microdao-daarion
./scripts/backup/backup_all.sh
```

## 🔒 Security Status

- ✅ No suspicious processes
- ✅ No executables in /tmp
- ✅ Firewall configured
- ✅ Automated backups active (PostgreSQL every 6 hours, Qdrant daily)
- ✅ System load normal (< 1.0)

## ⚙️ Configuration Files

- **Docker Compose:** `docker-compose.node1.yml`
- **Router Config:** `services/router/router_config.yaml`
- **Backup Compose:** `docker-compose.backups.yml`

## 📝 Recent Changes (2026-01-26)

### ✅ Fixed Issues

1. **Memory Service**: fixed `MEMORY_QDRANT_HOST` (was `qdrant`, now `dagi-qdrant-node1`)
2. **Qdrant snapshot** created before the fix: `full-snapshot-2026-01-26-10-11-31.snapshot`

### ⚠️ Known Issues

- **Control-plane** container port 9200 is not published to the host (internal only)
- **Image-gen** service is not running (use swapper-service instead)

## 🆚 Version History

### v2.1 (2026-01-26)

- Memory Service DNS fix (qdrant → dagi-qdrant-node1)
- Full health check verified
- Documentation updated

### v2.0 (2026-01-22)

- Git repository initialized
- Qdrant healthcheck fixed
- render-pdf-worker disabled

### v1.x (2026-01-10 to 2026-01-19)

- Previous deployment
- Security incidents resolved

## 📞 Support

- **SSH:** `ssh root@144.76.224.179`
- **Monitoring:** http://144.76.224.179:3030 (Grafana)
- **Metrics:** http://144.76.224.179:9090 (Prometheus)

---

**Maintained by:** NODA1 System

**Last Health Check:** 2026-01-26 11:13 UTC

**Status:** ✅ All systems operational

---

## 🔧 By Design (Expected Behavior)

### Services Without Public Ports

| Service | Port | Status | Explanation |
|---------|------|--------|-------------|
| **RBAC** | 9200 | Internal only | Port is not published; reachable only from inside the Docker network. |
| **Image-gen** | 8892 | Not used | Image generation is handled by `swapper-service` (8890). |
| **Parser** | 9400 | Absent | Replaced by `parser-pipeline` (8101), the single parsing entry point. |

### Diagnosing Internal Services

```bash
# RBAC (from inside the Docker network)
docker exec dagi-gateway curl -sS http://rbac:9200/health

# Check DNS resolution of the Qdrant container from the Memory Service
docker exec dagi-memory-service-node1 python3 -c "import socket; print(socket.gethostbyname('dagi-qdrant-node1'))"
```

### Normal Values

- **Qdrant**: 18 collections, 900+ vectors
- **Memory Service**: 200 OK on `/health` (the container healthcheck uses Python `urllib`)
- **Load average**: < 2.0 is normal, < 5.0 is acceptable

---

## 📊 Prometheus Alerting

### Configured Alerts

| Alert | Condition | Severity |
|-------|-----------|----------|
| ServiceDown | `up == 0` for more than 2m | critical |
| QdrantCollectionsLow | collections < 10 | warning |
| QdrantVectorsDropped | vectors < 500 | warning |
| HostDiskSpaceLow | free space < 15% | warning |
| HostMemoryHigh | usage > 90% | warning |
| HostHighLoad | load15 > 10 | warning |

### Checking Rules

```bash
curl -sS http://127.0.0.1:9090/api/v1/rules | python3 -m json.tool | head -50
```

---

## 🔄 Qdrant Backup/Restore

### Snapshots

- **Location**: `/opt/backups/qdrant/` and via the API
- **Retention**: daily, automatic
- **Latest**: `full-snapshot-2026-01-26-10-11-31.snapshot` (1.2GB)

### Restore Drill (verified 2026-01-26)

- Restore tested successfully on a separate port (16333)
- `helion_messages`: 365 points restored and verified with a search query

---

**Last Updated:** 2026-01-26 11:40

---

## Behavior Policy v2.1 CHANGELOG

**Date:** 2026-02-07

**Version:** Behavior Policy v2.1 / Global System Prompt v2.1

### Architecture (Source of Truth)

| Layer | Component | Location |
|-------|-----------|----------|
| Policy document | Global System Prompt v2.1 | `prompts/global_system_prompt_v2.md` |
| Gateway (source of truth) | `detect_url` + `detect_explicit_request` | `gateway-bot/behavior_policy.py` |
| Decision layer | `behavior_policy.py` v2.1 | `gateway-bot/behavior_policy.py` |
| HTTP API (gateway) | `http_api.py` | `gateway-bot/http_api.py` |
| PromptBuilder | N/A (runtime_context injected at gateway) | `services/router/prompt_builder.py` |
| Tests | 39 tests | `tests/test_behavior_policy.py` |
| Runbook | Behavior Policy v2.1 | `runbooks/behavior-policy-v2.1.md` |

As of v2.1, `runtime_context` is injected in the gateway (`http_api.py`), not in PromptBuilder.

### Breaking Changes (from v1.1)

- A bare @mention in a public chat or topic WITHOUT `has_explicit_request` → `NO_OUTPUT`
- The gateway computes `has_link` and `has_explicit_request` (`behavior_policy` does NOT override them)
- `thread_has_agent_participation` is now REQUIRED (fallback: `false`)
- `has_explicit_request` contract: imperative OR (`?` AND (dm OR reply OR mention OR thread))
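The `has_explicit_request` contract above is a plain boolean expression. The following is a minimal shell sketch of it for reasoning about edge cases. It assumes that each input is already available as a `0`/`1` flag. The function name and argument order are hypothetical. The real logic lives in `gateway-bot/behavior_policy.py` and may derive these flags differently, for example in how an imperative or a `?` is detected.

```bash
# Hypothetical sketch of the v2.1 has_explicit_request contract:
#   imperative OR (question AND (dm OR reply OR mention OR thread))
# Arguments are 0/1 flags in this (assumed) order:
#   imperative question dm reply mention thread
# Missing arguments default to 0. Prints "true"/"false" and returns 0/1.
has_explicit_request() {
  local imperative=${1:-0} question=${2:-0} dm=${3:-0}
  local reply=${4:-0} mention=${5:-0} thread=${6:-0}
  # An imperative is explicit on its own.
  if [ "$imperative" -eq 1 ]; then
    echo true; return 0
  fi
  # A question counts only when the message is addressed to the agent somehow.
  if [ "$question" -eq 1 ] && { [ "$dm" -eq 1 ] || [ "$reply" -eq 1 ] || [ "$mention" -eq 1 ] || [ "$thread" -eq 1 ]; }; then
    echo true; return 0
  fi
  echo false; return 1
}
```

For example, `has_explicit_request 0 1 0 0 1 0` (a question that @mentions the agent) prints `true`. In contrast, `has_explicit_request 0 0 0 0 1 0` (a bare @mention) prints `false`. Under the first breaking change, a bare @mention in a public chat or topic yields `NO_OUTPUT`.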