# 🏗️ Infrastructure Overview — DAARION & MicroDAO
**Version:** 2.4.0
**Last updated:** 2026-01-09 13:50
**Status:** Production Ready (95% Multimodal Integration)
**Recent changes:**
- 🔒 **Security Incident Resolution** (Dec 6 2025 - Jan 8 2026)
- ✅ Compromised container removed (`daarion-web`)
- ✅ Firewall rules implemented (egress filtering)
- ✅ Monitoring for scanning attempts deployed
- ✅ Router Multimodal API (v1.1.0) - images/files/audio/web-search
- ✅ Telegram Gateway Multimodal - voice/photo/documents
- ✅ Frontend Multimodal UI - enhanced mode
- ✅ Web Search Service (NODE2)
- ⚠️ STT/OCR Services (Docker issues on NODE2; fallback works)
---
## 📍 Network Nodes
### Node #1: Production Server (Hetzner GEX44 #2844465)
- **Node ID:** `node-1-hetzner-gex44`
- **IP Address:** `144.76.224.179`
- **SSH Access:** `ssh root@144.76.224.179`
- **Location:** Hetzner Cloud (Germany)
- **Project Root:** `/opt/microdao-daarion`
- **Docker Network:** `dagi-network`
- **Role:** Production Router + Gateway + All Services
- **Uptime:** 24/7
- **Prometheus Tunnel:** `scripts/start-node1-prometheus-tunnel.sh` (default `localhost:19090` → `NODE1:9090`; override the local port with `LOCAL_PORT`)
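
The tunnel script's contents are not part of this document. As a rough illustration only, a local port forward with the same defaults could look like the sketch below; `tunnel_cmd`, `open_tunnel`, `NODE1_HOST`, and `REMOTE_PORT` are hypothetical names, and the real script may work differently.

```bash
# Hypothetical sketch of an SSH local port forward; the real script may differ.
NODE1_HOST="${NODE1_HOST:-root@144.76.224.179}"   # SSH target from this document
LOCAL_PORT="${LOCAL_PORT:-19090}"                  # local side of the tunnel
REMOTE_PORT="${REMOTE_PORT:-9090}"                 # Prometheus port on NODE1

# Print the ssh command that forwards localhost:$LOCAL_PORT to NODE1:$REMOTE_PORT.
tunnel_cmd() {
  printf 'ssh -N -L %s:localhost:%s %s\n' "$LOCAL_PORT" "$REMOTE_PORT" "$NODE1_HOST"
}

# Open the tunnel in the foreground (Ctrl+C closes it).
open_tunnel() {
  ssh -N -L "${LOCAL_PORT}:localhost:${REMOTE_PORT}" "$NODE1_HOST"
}
```

With the tunnel open, Prometheus on NODE1 would be reachable at `http://localhost:19090` on the local machine.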
**Domains:**
- `gateway.daarion.city` → `144.76.224.179` (Gateway + Nginx)
- `api.daarion.city` → TBD (API Gateway)
- `daarion.city` → TBD (Main website)
### Node #2: Development Node (MacBook Pro M4 Max)
- **Node ID:** `node-2-macbook-m4max`
- **Local IP:** `192.168.1.33` (updated 2025-11-23)
- **SSH Access:** `ssh apple@192.168.1.244` (if enabled)
- **Location:** Local Network (Ivan's Office)
- **Project Root:** `/Users/apple/github-projects/microdao-daarion`
- **Role:** Development + Testing + Backup Router
- **Specs:** M4 Max (16 cores), 64GB RAM, 2TB SSD, 40-core GPU
- **Uptime:** On-demand (battery-powered)
**See full specs:** [NODE-2-MACBOOK-SPECS.md](./NODE-2-MACBOOK-SPECS.md)
**Current state:** [NODE-2-CURRENT-STATE.md](./NODE-2-CURRENT-STATE.md) — What's running now
### Node #3: AI/ML Workstation (Threadripper PRO + RTX 3090)
- **Node ID:** `node-3-threadripper-rtx3090`
- **Hostname:** `llm80-che-1-1`
- **IP Address:** `80.77.35.151`
- **SSH Access:** `ssh zevs@80.77.35.151 -p33147` (password: `<redacted>`)
- **Location:** Remote Datacenter
- **OS:** Ubuntu 24.04.3 LTS (Noble Numbat)
- **Uptime:** 24/7
- **Role:** AI/ML Workloads, GPU Inference, Kubernetes Orchestration
**Hardware Specs:**
- **CPU:** AMD Ryzen Threadripper PRO 5975WX
- 32 cores / 64 threads
- Base: 1.8 GHz, Boost: 3.6 GHz
- **RAM:** 128GB DDR4
- **GPU:** NVIDIA GeForce RTX 3090
- 24GB GDDR6X VRAM
- 10496 CUDA cores
- CUDA 13.0, Driver 580.95.05
- **Storage:** Samsung SSD 990 PRO 4TB NVMe
- Total: 3.6TB
- Root partition: 100GB (27% used)
- Available for expansion: 3.5TB
- **Container Runtime:** MicroK8s + containerd
**Services Running:**
- Port 3000 - Unknown service (needs investigation)
- Port 8080 - Unknown service (needs investigation)
- Port 11434 - Ollama (localhost only)
- Port 27017/27019 - MongoDB (localhost only)
- Kubernetes API: 16443
- Various K8s services: 10248-10259, 25000
**Security Status:** ✅ Clean (verified 2026-01-09)
- No crypto miners detected
- 0 zombie processes
- CPU load: 0.17 (very low)
- GPU utilization: 0% (ready for workloads)
**Recommended Use Cases:**
- 🤖 Large LLM inference (Llama 70B, Qwen 72B, Mixtral 8x22B)
- 🧠 Model training and fine-tuning
- 🎨 Stable Diffusion XL image generation
- 🔬 AI/ML research and experimentation
- 🚀 Kubernetes-based AI service orchestration
---
## 🐙 GitHub Repositories
### 1. MicroDAO (Current Project)
- **Repository:** `git@github.com:IvanTytar/microdao-daarion`
- **HTTPS:** `https://github.com/IvanTytar/microdao-daarion`
- **Remote Name:** `origin`
- **Main Branch:** `main`
- **Purpose:** MicroDAO core code, DAGI Stack, documentation
**Quick Clone:**
```bash
git clone git@github.com:IvanTytar/microdao-daarion
cd microdao-daarion
```
### 2. DAARION.city
- **Repository:** `git@github.com:DAARION-DAO/daarion-ai-city.git`
- **HTTPS:** `https://github.com/DAARION-DAO/daarion-ai-city.git`
- **Remote Name:** `daarion-city`
- **Main Branch:** `main`
- **Purpose:** Official DAARION.city website and integrations
**Quick Clone:**
```bash
git clone git@github.com:DAARION-DAO/daarion-ai-city.git
cd daarion-ai-city
```
**Add as remote to MicroDAO:**
```bash
cd microdao-daarion
git remote add daarion-city git@github.com:DAARION-DAO/daarion-ai-city.git
git fetch daarion-city
```
---
## 🤖 For Cursor Agents: Working on NODE1
### SSH Connection to NODE1
**Basic command:**
```bash
ssh root@144.76.224.179
```
**Important for agents:**
- An SSH key must be configured on the user's local machine
- Without a key, the connection prompts for a password, which the user must provide
- After connecting, you are working as `root`
### Working Directories on NODE1
```bash
# Main project
cd /opt/microdao-daarion
# Docker containers
docker ps # list running containers
docker logs <container_name> # container logs
docker exec -it <container_name> bash # open a shell in the container
# System logs
cd /var/log
tail -f /var/log/syslog # system logs
journalctl -u docker -f # Docker logs in real time
# Security scripts
ls -la /root/*.sh # firewall and monitoring scripts
```
### Common Tasks for Agents
**1. Check service status:**
```bash
ssh root@144.76.224.179 "docker ps --format 'table {{.Names}}\\t{{.Status}}'"
```
**2. Restart a service:**
```bash
ssh root@144.76.224.179 "docker restart <service_name>"
```
**3. View logs:**
```bash
ssh root@144.76.224.179 "docker logs --tail 50 <service_name>"
```
**4. Run a command in a container:**
```bash
ssh root@144.76.224.179 "docker exec <container_name> <command>"
```
**5. Git operations:**
```bash
ssh root@144.76.224.179 "cd /opt/microdao-daarion && git pull origin main"
ssh root@144.76.224.179 "cd /opt/microdao-daarion && git status"
```
**6. Restart Docker Compose:**
```bash
ssh root@144.76.224.179 "cd /opt/microdao-daarion && docker compose restart"
```
### Interactive Mode (for complex tasks)
To run several commands in a row, use an interactive SSH session:
```bash
# Start an interactive session
ssh root@144.76.224.179
# You are now on the server and can run commands:
cd /opt/microdao-daarion
docker ps
docker logs dagi-router --tail 20
exit # leave the SSH session
```
### Important Notes for Agents
1. **Always check where you are:**
```bash
hostname # should show the Hetzner server name
pwd # current directory
```
2. **Do not run destructive commands without confirmation:**
- `docker rm -f` (removes containers)
- `rm -rf` (deletes files)
- Any change to production without a backup
3. **Check status before making changes:**
```bash
docker ps # what is running now
docker compose ps # status of Docker Compose services
systemctl status docker # Docker daemon status
```
4. **Log your actions:**
- Document every significant change
- Use `git commit` with detailed messages
- Include `Co-Authored-By: Cursor Agent <agent@cursor.sh>`
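
One way to make the trailer hard to forget is to generate the commit message from a small helper. The sketch below is an assumption about how that could be done; `agent_commit_message` and `agent_commit` are hypothetical names, not scripts from this repository.

```bash
# Build a commit message that ends with the Co-Authored-By trailer.
# Arguments: $1 = subject line, $2 = body describing what changed and why.
agent_commit_message() {
  printf '%s\n\n%s\n\nCo-Authored-By: Cursor Agent <agent@cursor.sh>\n' "$1" "$2"
}

# Commit the staged changes with that message (-F - reads it from stdin).
agent_commit() {
  agent_commit_message "$1" "$2" | git commit -F -
}
```

For example, `agent_commit "fix: restart dagi-router" "Router was unhealthy after deploy; restarted and verified /health."` would commit whatever is currently staged.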
### Example Session for a Cursor Agent
```bash
# 1. Connect
ssh root@144.76.224.179
# 2. Go to the project
cd /opt/microdao-daarion
# 3. Check status
git status
docker ps --format "table {{.Names}}\\t{{.Status}}"
# 4. Update the code (if needed)
git pull origin main
# 5. Restart services (if needed)
docker compose restart dagi-router
# 6. Check the logs
docker logs dagi-router --tail 20
# 7. Disconnect
exit
```
### Troubleshooting
**If SSH does not connect:**
1. Check that the server is online: `ping 144.76.224.179`
2. Check your SSH keys: `ls -la ~/.ssh/`
3. Retry in verbose mode: `ssh -v root@144.76.224.179`
**If containers are not running:**
1. Check Docker: `systemctl status docker`
2. Check the logs: `journalctl -u docker --no-pager -n 50`
3. Restart Docker: `systemctl restart docker`
**If rescue mode is needed:**
1. Log in to Hetzner Robot: https://robot.hetzner.com
2. Activate the rescue system
3. Perform a reset
4. Connect over SSH with the rescue password
---
## 🚀 Services & Ports (Docker Compose)
### Core Services
| Service | Port | Container Name | Health Endpoint |
|---------|------|----------------|-----------------|
| **DAGI Router** | 9102 | `dagi-router` | `http://localhost:9102/health` |
| **Bot Gateway** | 9300 | `dagi-gateway` | `http://localhost:9300/health` |
| **DevTools Backend** | 8008 | `dagi-devtools` | `http://localhost:8008/health` |
| **CrewAI Orchestrator** | 9010 | `dagi-crewai` | `http://localhost:9010/health` |
| **RBAC Service** | 9200 | `dagi-rbac` | `http://localhost:9200/health` |
| **RAG Service** | 9500 | `dagi-rag-service` | `http://localhost:9500/health` |
| **Memory Service** | 8000 | `dagi-memory-service` | `http://localhost:8000/health` |
| **Parser Service** | 9400 | `dagi-parser-service` | `http://localhost:9400/health` |
| **Swapper Service** | 8890-8891 | `swapper-service` | `http://localhost:8890/health` |
| **Frontend (Vite)** | 8899 | `frontend` | `http://localhost:8899` |
| **Agent Cabinet Service** | 8898 | `agent-cabinet-service` | `http://localhost:8898/health` |
| **PostgreSQL** | 5432 | `dagi-postgres` | - |
| **Redis** | 6379 | `redis` | `redis-cli PING` |
| **Neo4j** | 7687 (bolt), 7474 (http) | `neo4j` | `http://localhost:7474` |
| **Qdrant** | 6333 (http), 6334 (grpc) | `dagi-qdrant` | `http://localhost:6333/healthz` |
| **Grafana** | 3000 | `grafana` | `http://localhost:3000` |
| **Prometheus** | 9090 | `prometheus` | `http://localhost:9090` |
| **Neo4j Exporter** | 9091 | `neo4j-exporter` | `http://localhost:9091/metrics` |
| **Ollama** | 11434 | `ollama` (external) | `http://localhost:11434/api/tags` |
### Multimodal Services (NODE2)
| Service | Port | Container Name | Health Endpoint |
|---------|------|----------------|-----------------|
| **STT Service** | 8895 | `stt-service` | `http://192.168.1.244:8895/health` |
| **OCR Service** | 8896 | `ocr-service` | `http://192.168.1.244:8896/health` |
| **Web Search** | 8897 | `web-search-service` | `http://192.168.1.244:8897/health` |
| **Vector DB** | 8898 | `vector-db-service` | `http://192.168.1.244:8898/health` |
**Note:** Vision Encoder (port 8001) is not running on Node #1. Image processing is instead handled by **Swapper Service** with the **vision-8b** model (Qwen3-VL 8B), which loads models dynamically.
**Swapper Service:**
- **Port:** 8890 (HTTP), 8891 (Prometheus metrics)
- **NODE1 URL:** `http://144.76.224.179:8890`
- **NODE2 URL:** `http://192.168.1.244:8890`
- **Displayed in:** node cabinets only (`/nodes/node-1`, `/nodes/node-2`)
- **Refresh:** near real time (every 30 seconds)
- **Models:** 5 models (qwen3:8b, qwen3-vl:8b, qwen2.5:7b-instruct, qwen2.5:3b-instruct, qwen2-math:7b)
- **Specialists:** 6 specialists (vision-8b, math-7b, structured-fc-3b, rag-mini-4b, lang-gateway-4b, security-guard-7b)
### HTTPS Gateway (Nginx)
- **Port:** 443 (HTTPS), 80 (HTTP redirect)
- **Domain:** `gateway.daarion.city`
- **SSL:** Let's Encrypt (auto-renewal)
- **Proxy Pass:**
- `/telegram/webhook` → `http://localhost:9300/telegram/webhook`
- `/helion/telegram/webhook` → `http://localhost:9300/helion/telegram/webhook`
---
## 🤖 Telegram Bots
### 1. DAARWIZZ Bot
- **Username:** [@DAARWIZZBot](https://t.me/DAARWIZZBot)
- **Bot ID:** `8323412397`
- **Token:** `$TELEGRAM_BOT_TOKEN` (kept in `.env`) ✅
- **Webhook:** `https://gateway.daarion.city/telegram/webhook`
- **Status:** Active (Production)
### 2. Helion Bot (Energy Union AI)
- **Username:** [@HelionEnergyBot](https://t.me/HelionEnergyBot) (example)
- **Bot ID:** `8112062582`
- **Token:** `$HELION_TELEGRAM_BOT_TOKEN` (kept in `.env`) ✅
- **Webhook:** `https://gateway.daarion.city/helion/telegram/webhook`
- **Status:** Ready for deployment
---
## 🔐 Environment Variables (.env)
### Essential Variables
```bash
# Bot Gateway
TELEGRAM_BOT_TOKEN=<DAARWIZZ_TOKEN>
HELION_TELEGRAM_BOT_TOKEN=<HELION_TOKEN>
GATEWAY_PORT=9300
# DAGI Router
ROUTER_PORT=9102
ROUTER_CONFIG_PATH=./router-config.yml
# Ollama (Local LLM)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen3:8b
# Memory Service
MEMORY_SERVICE_URL=http://memory-service:8000
MEMORY_DATABASE_URL=postgresql://postgres:postgres@postgres:5432/daarion_memory
# PostgreSQL
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=daarion_memory
# RBAC
RBAC_PORT=9200
RBAC_DATABASE_URL=sqlite:///./rbac.db
# Vision Encoder (GPU required for production)
VISION_ENCODER_URL=http://vision-encoder:8001
VISION_DEVICE=cuda
VISION_MODEL_NAME=ViT-L-14
VISION_MODEL_PRETRAINED=openai
# Qdrant Vector Database
QDRANT_HOST=qdrant
QDRANT_PORT=6333
QDRANT_ENABLED=true
# CORS
CORS_ORIGINS=http://localhost:3000,https://daarion.city
# Environment
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO
```
---
## 🌌 SPACE API (planets, nodes, events)
**Service:** `space-service` (FastAPI / Node.js)
**Ports:** `7001` (FastAPI), `3005` (Node.js)
### **GET /space/planets**
Returns DAO planets (health, treasury, satellites, anomaly score, position).
**Response Example:**
```json
[
  {
    "dao_id": "dao:3",
    "name": "Aurora Circle",
    "health": "good",
    "treasury": 513200,
    "activity": 0.84,
    "governance_temperature": 72,
    "anomaly_score": 0.04,
    "position": { "x": 120, "y": 40, "z": -300 },
    "node_count": 12,
    "satellites": [
      {
        "node_id": "node:03",
        "gpu_load": 0.66,
        "latency": 14,
        "agents": 22
      }
    ]
  }
]
```
### **GET /space/nodes**
Returns the state of each node (GPU, CPU, memory, network, agents, status).
**Response Example:**
```json
[
  {
    "node_id": "node:03",
    "name": "Quantum Relay",
    "microdao": "microdao:7",
    "gpu": {
      "load": 0.72,
      "vram_used": 30.1,
      "vram_total": 40.0,
      "temperature": 71
    },
    "cpu": {
      "load": 0.44,
      "temperature": 62
    },
    "memory": {
      "used": 11.2,
      "total": 32.0
    },
    "network": {
      "latency": 12,
      "bandwidth_in": 540,
      "bandwidth_out": 430,
      "packet_loss": 0.01
    },
    "agents": 14,
    "status": "healthy"
  }
]
```
### **GET /space/events**
Current DAO/Space events (governance, treasury, anomalies, node alerts).
**Query Parameters:**
- `seconds` (optional): Time window in seconds (default: 120)
**Response Example:**
```json
[
  {
    "type": "dao.vote.opened",
    "dao_id": "dao:3",
    "timestamp": 1735680041,
    "severity": "info",
    "meta": {
      "proposal_id": "P-173",
      "title": "Budget Allocation 2025"
    }
  },
  {
    "type": "node.alert.overload",
    "node_id": "node:05",
    "timestamp": 1735680024,
    "severity": "warn",
    "meta": {
      "gpu_load": 0.92
    }
  }
]
```
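
A request with a custom time window could be built as in the sketch below. It assumes the service is reachable on the FastAPI port `7001` on `localhost`, which this document does not state; `SPACE_API`, `space_events_url`, and `fetch_space_events` are hypothetical names.

```bash
# Assumption: space-service is reachable locally on the FastAPI port 7001.
SPACE_API="${SPACE_API:-http://localhost:7001}"

# Build the /space/events URL for a time window in seconds (default 120).
space_events_url() {
  local seconds="${1:-120}"
  printf '%s/space/events?seconds=%s\n' "$SPACE_API" "$seconds"
}

# Fetch events from the last N seconds; -f makes curl fail on HTTP errors.
fetch_space_events() {
  curl -fsS --max-time 10 "$(space_events_url "$1")"
}
```

Calling `fetch_space_events 300` would request the events of the last five minutes; with no argument the documented default of 120 seconds is used.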
### **Data Sources:**
| Data | Source | Component |
| ------ | -------------------------------------------- | ------------------------------- |
| DAO | microDAO Service / DAO-Service | PostgreSQL |
| Nodes | NodeMetrics Agent → NATS → Metrics Collector | Redis / Timescale |
| Agents | Router → Agent Registry | Redis / SQLite |
| Events | NATS JetStream | JetStream Stream `events.space` |
**Frontend Integration:**
- API clients: `src/api/space/getPlanets.ts`, `src/api/space/getNodes.ts`, `src/api/space/getSpaceEvents.ts`
- Used by: City Dashboard, Space Dashboard, Living Map, World Prototype
---
## 📦 Deployment Workflow
### 1. Local Development → GitHub
```bash
# On Mac (local)
cd /Users/apple/github-projects/microdao-daarion
git add .
git commit -m "feat: description"
git push origin main
```
### 2. GitHub → Production Server
```bash
# SSH to server
ssh root@144.76.224.179
# Navigate to project
cd /opt/microdao-daarion
# Pull latest changes
git pull origin main
# Restart services
docker-compose down
docker-compose up -d --build
# Check status
docker-compose ps
docker-compose logs -f gateway
```
### 3. HTTPS Gateway Setup
```bash
# On server (one-time setup)
sudo ./scripts/setup-nginx-gateway.sh gateway.daarion.city admin@daarion.city
```
### 4. Register Telegram Webhook
```bash
# On server
./scripts/register-agent-webhook.sh daarwizz <DAARWIZZ_TOKEN> gateway.daarion.city
./scripts/register-agent-webhook.sh helion <HELION_TOKEN> gateway.daarion.city
```
---
## 🧪 Testing & Monitoring
### Health Checks (All Services)
```bash
# On server
curl http://localhost:9102/health # Router
curl http://localhost:9300/health # Gateway
curl http://localhost:8000/health # Memory
curl http://localhost:9200/health # RBAC
curl http://localhost:9500/health # RAG
curl http://localhost:8001/health # Vision Encoder (not running on Node #1; Swapper Service handles vision there)
curl http://localhost:6333/healthz # Qdrant
# Public HTTPS
curl https://gateway.daarion.city/health
```
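
The individual checks above can also be run as one loop that reports each service and counts failures. This is a minimal sketch under the assumption that every endpoint answers a plain GET; `HEALTH_TARGETS`, `probe`, and `check_all` are hypothetical names, and the Vision Encoder is left out because it is not running on Node #1.

```bash
# Service name and health URL pairs, copied from the commands above.
HEALTH_TARGETS="
router=http://localhost:9102/health
gateway=http://localhost:9300/health
memory=http://localhost:8000/health
rbac=http://localhost:9200/health
rag=http://localhost:9500/health
qdrant=http://localhost:6333/healthz
"

# Probe one URL; fails on HTTP 4xx/5xx, connection errors, or a 5-second timeout.
probe() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# Print one "name OK" or "name FAIL" line per target; return the failure count.
check_all() {
  local target name url failed=0
  for target in $HEALTH_TARGETS; do
    name="${target%%=*}"
    url="${target#*=}"
    if probe "$url"; then
      echo "$name OK"
    else
      echo "$name FAIL"
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

Running `check_all` on the server prints one line per service, and its exit status is non-zero when any service fails, which makes it usable in a cron job or deploy script.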
### Smoke Tests
```bash
# On server
cd /opt/microdao-daarion
./smoke.sh
```
### View Logs
```bash
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f gateway
docker-compose logs -f router
docker-compose logs -f memory-service
# Filter by error level
docker-compose logs gateway | grep ERROR
```
### Database Check
```bash
# PostgreSQL
docker exec -it dagi-postgres psql -U postgres -c "\l"
docker exec -it dagi-postgres psql -U postgres -d daarion_memory -c "\dt"
```
---
## 🌐 DNS Configuration
### Current DNS Records (Cloudflare/Hetzner)
| Record Type | Name | Value | TTL |
|-------------|------|-------|-----|
| A | `gateway.daarion.city` | `144.76.224.179` | 300 |
| A | `daarion.city` | TBD | 300 |
| A | `api.daarion.city` | TBD | 300 |
**Verify DNS:**
```bash
dig gateway.daarion.city +short
# Should return: 144.76.224.179
```
---
## 📂 Key File Locations
### On Server (`/opt/microdao-daarion`)
- **Docker Compose:** `docker-compose.yml`
- **Environment:** `.env` (never commit!)
- **Router Config:** `router-config.yml`
- **Nginx Setup:** `scripts/setup-nginx-gateway.sh`
- **Webhook Register:** `scripts/register-agent-webhook.sh`
- **Logs:** `logs/` directory
- **Data:** `data/` directory
### System Prompts
- **DAARWIZZ:** `gateway-bot/daarwizz_prompt.txt`
- **Helion:** `gateway-bot/helion_prompt.txt`
### Documentation
- **Quick Start:** `WARP.md`
- **Agents Map:** `docs/agents.md`
- **RAG Ingestion:** `RAG-INGESTION-STATUS.md`
- **HMM Memory:** `HMM-MEMORY-STATUS.md`
- **Crawl4AI Service:** `CRAWL4AI-STATUS.md`
- **Architecture:** `docs/cursor/README.md`
- **API Reference:** `docs/api.md`
---
## 🔄 Backup & Restore
### Backup Database
```bash
# PostgreSQL dump
docker exec dagi-postgres pg_dump -U postgres daarion_memory > backup_$(date +%Y%m%d).sql
# RBAC SQLite
cp data/rbac/rbac.db backups/rbac_$(date +%Y%m%d).db
```
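
A redirect such as `pg_dump ... > file` creates the file even when the dump fails, so an empty or partial backup can go unnoticed. The sketch below wraps the PostgreSQL command above and keeps the file only if `pg_dump` succeeded and wrote data; `BACKUP_DIR`, `backup_path`, and `backup_db` are hypothetical names.

```bash
# Hypothetical wrapper around the pg_dump command above.
BACKUP_DIR="${BACKUP_DIR:-backups}"

# Print a dated backup path, for example backups/daarion_memory_YYYYMMDD.sql.
backup_path() {
  printf '%s/%s_%s.sql\n' "$BACKUP_DIR" "$1" "$(date +%Y%m%d)"
}

# Dump one database; keep the file only if pg_dump succeeded and wrote data.
backup_db() {
  local db="$1" out
  out="$(backup_path "$db")"
  mkdir -p "$BACKUP_DIR"
  if docker exec dagi-postgres pg_dump -U postgres "$db" > "$out" && [ -s "$out" ]; then
    echo "saved $out"
  else
    rm -f "$out"
    echo "backup of $db failed" >&2
    return 1
  fi
}
```

Usage would be `backup_db daarion_memory`, run from the project root on the server.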
### Restore Database
```bash
# PostgreSQL restore
cat backup_20250117.sql | docker exec -i dagi-postgres psql -U postgres daarion_memory
# RBAC restore
cp backups/rbac_20250117.db data/rbac/rbac.db
docker-compose restart rbac
```
---
## 📞 Contacts & Support
### Team
- **Owner:** Ivan Tytar
- **Email:** admin@daarion.city
- **GitHub:** [@IvanTytar](https://github.com/IvanTytar)
### External Services
- **Hetzner Support:** https://www.hetzner.com/support
- **Cloudflare Support:** https://dash.cloudflare.com
- **Telegram Bot Support:** https://core.telegram.org/bots
---
## 🔗 Quick Reference Links
### Documentation
- [WARP.md](./WARP.md) — Main developer guide
- [SYSTEM-INVENTORY.md](./SYSTEM-INVENTORY.md) — Complete system inventory (GPU, AI models, 17 services)
- [DAARION_CITY_REPO.md](./DAARION_CITY_REPO.md) — Repository management
- [RAG-INGESTION-STATUS.md](./RAG-INGESTION-STATUS.md) — RAG event-driven ingestion (Wave 1, 2, 3)
- [HMM-MEMORY-STATUS.md](./HMM-MEMORY-STATUS.md) — Hierarchical Memory System for agents
- [CRAWL4AI-STATUS.md](./CRAWL4AI-STATUS.md) — Web crawler for document ingestion (PDF, Images, HTML)
- [VISION-ENCODER-STATUS.md](./VISION-ENCODER-STATUS.md) — Vision Encoder service status (OpenCLIP multimodal embeddings)
- [VISION-RAG-IMPLEMENTATION.md](./VISION-RAG-IMPLEMENTATION.md) — Vision RAG complete implementation (client, image search, routing)
- [services/vision-encoder/README.md](./services/vision-encoder/README.md) — Vision Encoder deployment guide
- [SERVER_SETUP_INSTRUCTIONS.md](./SERVER_SETUP_INSTRUCTIONS.md) — Server setup
- [DEPLOY-NOW.md](./DEPLOY-NOW.md) — Deployment checklist
- [STATUS-HELION.md](./STATUS-HELION.md) — Helion agent status
### Monitoring Dashboards
- **Gateway Health:** `https://gateway.daarion.city/health`
- **Router Providers:** `http://localhost:9102/providers`
- **Routing Table:** `http://localhost:9102/routing`
- **Prometheus:** `http://localhost:9090` (Metrics, Alerts, Targets)
- **Grafana Dashboard:** `http://localhost:3000` (Neo4j metrics, DAO/Agents/Users analytics)
- **Neo4j Browser:** `http://localhost:7474` (Graph visualization, Cypher queries)
- **Neo4j Exporter:** `http://localhost:9091/metrics` (Prometheus metrics endpoint)
---
## 🚨 Troubleshooting
### Service Not Starting
```bash
# Check logs
docker-compose logs service-name
# Restart service
docker-compose restart service-name
# Rebuild and restart
docker-compose up -d --build service-name
```
### Database Connection Issues
```bash
# Check PostgreSQL
docker exec -it dagi-postgres psql -U postgres -c "SELECT 1"
# Restart PostgreSQL
docker-compose restart postgres
# Check connection from memory service
docker exec -it dagi-memory-service env | grep DATABASE
```
### Webhook Not Working
```bash
# Check webhook status
curl "https://api.telegram.org/bot<TOKEN>/getWebhookInfo"
# Re-register webhook
./scripts/register-agent-webhook.sh <agent> <token> <domain>
# Check gateway logs
docker-compose logs -f gateway | grep webhook
```
### SSL Certificate Issues
```bash
# Check certificate
sudo certbot certificates
# Renew certificate
sudo certbot renew --dry-run
sudo certbot renew
# Restart Nginx
sudo systemctl restart nginx
```
---
## 📊 Metrics & Analytics (Future)
### Planned Monitoring Stack
- **Prometheus:** Metrics collection
- **Grafana:** Dashboards
- **Loki:** Log aggregation
- **Alertmanager:** Alerts
**Port Reservations:**
- Prometheus: 9090
- Grafana: 3000
- Loki: 3100
---
## 🖥️ Node and MicroDAO Cabinets
### Node Cabinets
- **NODE1:** `http://localhost:8899/nodes/node-1`
- **NODE2:** `http://localhost:8899/nodes/node-2`
**Functionality:**
- Overview (metrics, status, GPU)
- Agents (list, deployment, management)
- Services (Swapper Service with detailed metrics, other services)
- Metrics (CPU, RAM, Disk, Network)
- Plugins (installed and available)
- Inventory (full information about installed software)
**Swapper Service in node cabinets:**
- Service status (CPU, RAM, VRAM, Uptime)
- Configuration (mode, max concurrent, memory buffer, eviction)
- Models (table of all models with status, uptime, and request counts)
- Specialists (6 specialists with model and usage information)
- Active model (if any)
- Near-real-time refresh (every 30 seconds)
### MicroDAO Cabinets
- **DAARION:** `http://localhost:8899/microdao/daarion`
- **GREENFOOD:** `http://localhost:8899/microdao/greenfood`
- **ENERGY UNION:** `http://localhost:8899/microdao/energy-union`
**Functionality:**
- Overview (chat with the orchestrator, statistics)
- Agents (list of agents, orchestrator from NODE1)
- Channels (list of channels)
- Projects (planned)
- MicroDAO management (DAARION only: control panel for all MicroDAOs)
- DAARION Core (DAARION only)
- Settings
**Orchestrators:**
- DAARION → DAARWIZZ (agent-daarwizz)
- GREENFOOD → GREENFOOD Assistant (agent-greenfood-assistant)
- ENERGY UNION → Helion (agent-helion)
---
## 🎤 Multimodal Services Details (NODE2)
### STT Service — Speech-to-Text
- **URL:** `http://192.168.1.244:8895`
- **Technology:** OpenAI Whisper (base model)
- **Functions:**
- Voice → Text transcription
- Ukrainian, English, Russian support
- Auto-transcription for Telegram bots
- **Endpoints:**
- `POST /api/stt` — Transcribe base64 audio
- `POST /api/stt/upload` — Upload audio file
- `GET /health` — Health check
- **Status:** ✅ Ready for Integration
### OCR Service — Text Extraction
- **URL:** `http://192.168.1.244:8896`
- **Technology:** Tesseract + EasyOCR
- **Functions:**
- Image → Text extraction
- Bounding boxes detection
- Multi-language support (uk, en, ru, pl, de, fr)
- Confidence scores
- **Endpoints:**
- `POST /api/ocr` — Extract text from base64 image
- `POST /api/ocr/upload` — Upload image file
- `GET /health` — Health check
- **Status:** ✅ Ready for Integration
### Web Search Service
- **URL:** `http://192.168.1.244:8897`
- **Technology:** DuckDuckGo + Google Search
- **Functions:**
- Real-time web search
- Region-specific search (ua-uk, us-en)
- JSON structured results
- Up to 10+ results per query
- **Endpoints:**
- `POST /api/search` — Search with JSON body
- `GET /api/search?query=...` — Search with query params
- `GET /health` — Health check
- **Status:** ✅ Ready for Integration
### Vector DB Service — Knowledge Base
- **URL:** `http://192.168.1.244:8898`
- **Technology:** ChromaDB + Sentence Transformers
- **Functions:**
- Vector database for documents
- Semantic search
- Document embeddings (all-MiniLM-L6-v2)
- RAG (Retrieval-Augmented Generation) support
- **Endpoints:**
- `POST /api/collections` — Create collection
- `GET /api/collections` — List collections
- `POST /api/documents` — Add documents
- `POST /api/search` — Semantic search
- `DELETE /api/documents` — Delete documents
- `GET /health` — Health check
- **Status:** ✅ Ready for Integration
---
## 🔄 Router Multimodal Support (NODE1)
### Enhanced /route endpoint
- **URL:** `http://144.76.224.179:9102/route`
- **New Payload Structure:**
```json
{
  "agent": "sofia",
  "message": "Analyze this image",
  "mode": "chat",
  "payload": {
    "context": {
      "system_prompt": "...",
      "images": ["data:image/png;base64,..."],
      "files": [{"name": "doc.pdf", "data": "..."}],
      "audio": "data:audio/webm;base64,..."
    }
  }
}
```
### Vision Agents
- **Sofia** (grok-4.1, xAI) — Vision + Code + Files
- **Spectra** (qwen3-vl:latest, Ollama) — Vision + Language
### Features:
- 📷 Image processing (PIL)
- 📎 File processing (PDF, TXT, MD)
- 🎤 Audio transcription (via STT Service)
- 🌐 Web search integration
- 📚 Knowledge Base / RAG
**Status:** 🔄 Integration in Progress
---
## 📱 Telegram Gateway Multimodal Updates
### Enhanced Features:
- 🎤 **Voice Messages** → Auto-transcription via STT Service
- 📷 **Photos** → Vision analysis via Sofia/Spectra
- 📎 **Documents** → Text extraction via OCR/Parser
- 🌐 **Web Search** → Real-time search results
### Workflow:
```
Telegram Bot → Voice/Photo/File
Gateway → STT/OCR/Parser Service
Router → Vision/LLM Agent
Response → Telegram Bot
```
**Status:** 🔄 Integration in Progress
---
## 📊 All Services Port Summary
| Service | Port | Node | Technology | Status |
|---------|------|------|------------|--------|
| Frontend | 8899 | Local | React + Vite | ✅ |
| STT Service | 8895 | NODE2 | Whisper AI | ✅ Ready |
| OCR Service | 8896 | NODE2 | Tesseract + EasyOCR | ✅ Ready |
| Web Search | 8897 | NODE2 | DuckDuckGo + Google | ✅ Ready |
| Vector DB | 8898 | NODE2 | ChromaDB | ✅ Ready |
| Router | 9102 | NODE1 | FastAPI + Ollama | 🔄 Multimodal |
| Telegram Gateway | 9300 | NODE1 | FastAPI + NATS | 🔄 Enhanced |
| Swapper NODE1 | 8890 | NODE1 | LLM Manager | ✅ |
| Swapper NODE2 | 8890 | NODE2 | LLM Manager | ✅ |
---
**Last Updated:** 2025-11-23 by Auto AI
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** ✅ Production Ready (🔄 Multimodal Integration in Progress)
---
## 🎨 Multimodal Integration (v2.1.0)
### Router Multimodal API (NODE1)
**Version:** 1.1.0-multimodal
**Endpoint:** `http://144.76.224.179:9102/route`
**Features:**
```json
{
  "features": [
    "multimodal",
    "vision",
    "stt",
    "ocr",
    "web-search"
  ]
}
```
**Request Format:**
```json
{
  "agent": "daarwizz",
  "message": "User message",
  "mode": "chat",
  "images": ["data:image/jpeg;base64,..."],
  "files": [{"name": "doc.pdf", "content": "base64...", "type": "application/pdf"}],
  "audio": "base64_encoded_audio",
  "web_search_query": "search query",
  "language": "uk"
}
```
**Vision Agents:**
- `sofia` - Sofia Vision Agent (qwen3-vl:8b)
- `spectra` - Spectra Vision Agent (qwen3-vl:8b)
**Processing:**
- Vision agents → images are passed through directly
- Regular agents → images are converted to text via OCR
- Audio → transcribed via STT
- Files → text is extracted (PDF, TXT, MD)
---
### Telegram Gateway Multimodal (NODE1)
**Location:** `/opt/microdao-daarion/gateway-bot/`
**Handlers:** `gateway_multimodal_handlers.py`
**Supported Content Types:**
- 🎤 Voice messages → STT → Router
- 📸 Photos → Vision/OCR → Router
- 📎 Documents → Text extraction → Router
**Example Flow:**
```
1. User sends voice to @DAARWIZZBot
2. Gateway downloads from Telegram
3. Gateway sends base64 audio to Router
4. Router transcribes via STT (or fallback)
5. Router processes with agent LLM
6. Gateway sends response back to Telegram
```
**Telegram Bot Tokens (actual entries from BOT_CONFIGS):**
1. CLAN: `$CLAN_TELEGRAM_BOT_TOKEN` (@CLAN_bot)
2. DAARWIZZ: `$DAARWIZZ_TELEGRAM_BOT_TOKEN` (@DAARWIZZBot)
3. DRUID: `$DRUID_TELEGRAM_BOT_TOKEN` (@DRUIDBot)
4. EONARCH: `$EONARCH_TELEGRAM_BOT_TOKEN` (@EONARCHBot)
5. GREENFOOD: `$GREENFOOD_TELEGRAM_BOT_TOKEN` (@GREENFOODBot) - has its own CrewAI team
6. Helion: `$HELION_TELEGRAM_BOT_TOKEN` (@HelionBot)
7. NUTRA: `$NUTRA_TELEGRAM_BOT_TOKEN` (@NUTRABot)
8. Soul: `$SOUL_TELEGRAM_BOT_TOKEN` (@SoulBot)
9. Yaromir: `$YAROMIR_TELEGRAM_BOT_TOKEN` (@YaromirBot) - CrewAI Orchestrator
**TOTAL: 9 Telegram bots** (verified against BOT_CONFIGS)
**Webhook Pattern:** `https://gateway.daarion.city/{bot_id}/telegram/webhook`
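
To illustrate the pattern, the sketch below expands it for one bot and registers the result with the Telegram Bot API `setWebhook` method. It is not the content of `scripts/register-agent-webhook.sh`; `GATEWAY_DOMAIN`, `webhook_url`, and `set_webhook` are hypothetical names.

```bash
GATEWAY_DOMAIN="${GATEWAY_DOMAIN:-gateway.daarion.city}"

# Expand the webhook pattern for one bot_id, for example "helion".
webhook_url() {
  printf 'https://%s/%s/telegram/webhook\n' "$GATEWAY_DOMAIN" "$1"
}

# Register the webhook through the Telegram Bot API setWebhook method.
# $1 = bot_id, $2 = bot token (pass it from an environment variable).
set_webhook() {
  curl -fsS "https://api.telegram.org/bot$2/setWebhook" \
    --data-urlencode "url=$(webhook_url "$1")"
}
```

A call such as `set_webhook helion "$HELION_TELEGRAM_BOT_TOKEN"` keeps the token out of the command history as long as the variable is loaded from `.env`.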
**Multimodal Support:**
- ✅ All 9 bots support voice/photo/document through the universal webhook
**CrewAI teams (internal agents, WITHOUT Telegram bots):**
- **Yaromir** (Orchestrator) → delegates to:
- Вождь (Strategic, qwen2.5:14b)
- Проводник (Mentor, qwen2.5:7b)
- Домір (Harmony, qwen2.5:3b)
- Создатель (Innovation, qwen2.5:14b)
- **GREENFOOD** (Orchestrator) → has its own CrewAI team
**Note:** Вождь, Проводник, Домір, and Создатель have prompts (`*_prompt.txt`) but NO Telegram tokens. They run only inside the CrewAI workflow.
---
### Frontend Multimodal UI
**Location:** `src/components/microdao/`
**Components:**
- `MicroDaoOrchestratorChatEnhanced.tsx` - Enhanced chat with multimodal
- `MultimodalInput.tsx` - Input component (images/files/voice/web-search)
**Features:**
- ✅ Toggle switch for enhanced mode
- ✅ Image upload (drag & drop, click)
- ✅ File upload (PDF, TXT, MD)
- ✅ Voice recording (Web Audio API)
- ✅ Web search integration
- ✅ Real-time preview
**Usage:**
1. Open `http://localhost:8899/microdao/daarion`
2. Enable "Розширений режим" (the enhanced-mode switch)
3. Upload images, files, or record voice
4. Send to agent
---
### NODE2 Multimodal Services
**Location:** MacBook M4 Max (`192.168.1.33`)
| Service | Port | Status | Notes |
|---------|------|--------|-------|
| STT (Whisper) | 8895 | ⚠️ Docker issue | Fallback works |
| OCR (Tesseract/EasyOCR) | 8896 | ⚠️ Docker issue | Fallback works |
| Web Search | 8897 | ✅ HEALTHY | DuckDuckGo + Google |
| Vector DB (ChromaDB) | 8898 | ✅ HEALTHY | RAG ready |
**Fallback Mechanism:**
- The Router has fallback logic for unavailable services
- If STT is unavailable → an error is returned gracefully
- If OCR is unavailable → falls back to basic text extraction
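
The decision described by these rules can be summarized as a small lookup. The Router's actual fallback code is not shown in this document, so the sketch below is only an illustration of the stated behavior; `fallback_action` and `service_state` are hypothetical names.

```bash
# Decide how to handle a request for a service given its health state.
# $1 = service ("stt" or "ocr"), $2 = "up" or "down".
# Prints the action: use the service, fall back, or return an error.
fallback_action() {
  if [ "$2" = "up" ]; then
    echo "use-$1"
    return 0
  fi
  case "$1" in
    ocr) echo "basic-text-extraction" ;;  # OCR down: basic text extraction
    stt) echo "error" ;;                  # STT down: graceful error
    *)   echo "error" ;;                  # unknown service: treat as an error
  esac
}

# Report "up" if the service health endpoint answers within 3 seconds.
# $1 = base URL of the service, for example http://192.168.1.244:8896
service_state() {
  if curl -fsS -o /dev/null --max-time 3 "$1/health"; then echo up; else echo down; fi
}
```

For instance, `fallback_action ocr "$(service_state http://192.168.1.244:8896)"` prints `use-ocr` while the OCR service is healthy and `basic-text-extraction` otherwise.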
---
### Testing Multimodal
#### 1. Router API
```bash
# Health check
curl http://144.76.224.179:9102/health
# Basic text
curl -X POST http://144.76.224.179:9102/route \
-H 'Content-Type: application/json' \
-d '{"agent":"daarwizz","message":"Привіт","mode":"chat"}'
# With image (Vision)
curl -X POST http://144.76.224.179:9102/route \
-H 'Content-Type: application/json' \
-d '{
"agent":"sofia",
"message":"Опиши це зображення",
"images":["data:image/jpeg;base64,/9j/4AAQ..."],
"mode":"chat"
}'
```
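
The vision request above needs the image inlined as a base64 data URI. A minimal sketch for building that body from a local file follows; the helper names `json_escape` and `build_image_payload` are hypothetical, and the escaping only covers backslashes and double quotes, so it assumes the message has no newlines or other control characters.

```bash
#!/usr/bin/env bash
# Sketch: build the JSON body for a vision request from a local image file.

# Escape backslashes and double quotes so the text is safe inside a JSON string.
# ASSUMPTION: the input contains no control characters such as newlines.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Print a /route payload with the image embedded as a base64 data URI.
build_image_payload() {
  local agent="$1" message="$2" image_file="$3" mime="${4:-image/jpeg}"
  local b64
  b64="$(base64 < "$image_file" | tr -d '\n')"   # strip line wraps portably
  printf '{"agent":"%s","message":"%s","images":["data:%s;base64,%s"],"mode":"chat"}\n' \
    "$(json_escape "$agent")" "$(json_escape "$message")" "$mime" "$b64"
}

# Example:
# build_image_payload sofia "Опиши це зображення" photo.jpg \
#   | curl -X POST http://144.76.224.179:9102/route \
#       -H 'Content-Type: application/json' -d @-
```

Piping the payload to `curl -d @-` avoids putting a large base64 string on the command line.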
#### 2. Telegram Bots (9 real bots)
**All bots (from BOT_CONFIGS):**
```
@CLAN_bot, @DAARWIZZBot, @DRUIDBot, @EONARCHBot,
@GREENFOODBot, @HelionBot, @NUTRABot, @SoulBot, @YaromirBot
```
**Tests:**
1. Send voice message: "Привіт, як справи?"
2. Send photo with caption: "Що на цьому фото?"
3. Send document: "Проаналізуй цей документ"
**CrewAI Workflow (via @YaromirBot):**
```
User → @YaromirBot (Telegram)
Yaromir Orchestrator
↓ (CrewAI delegation)
┌────┴────┬────────┬─────────┐
↓ ↓ ↓ ↓
Вождь Проводник Домір Создатель
(Internal CrewAI agents - NO Telegram bots)
Yaromir → Response → Telegram
```
**Note:** Вождь, Проводник, Домір, and Создатель are NOT separate Telegram bots. They run only inside CrewAI when Yaromir delegates a task.
#### 3. Frontend
```
1. Open http://localhost:8899/microdao/daarion
2. Enable "Розширений режим"
3. Upload image
4. Upload file
5. Record voice
```
---
### Implementation Files
**Router (NODE1):**
- `/app/multimodal/handlers.py` - Multimodal handlers
- `/app/http_api.py` - Updated with multimodal support
**Gateway (NODE1):**
- `/opt/microdao-daarion/gateway-bot/gateway_multimodal_handlers.py`
- `/opt/microdao-daarion/gateway-bot/http_api.py` (updated)
**Frontend:**
- `src/pages/MicroDaoCabinetPage.tsx`
- `src/components/microdao/MicroDaoOrchestratorChatEnhanced.tsx`
- `src/components/microdao/chat/MultimodalInput.tsx`
**NODE2 Services:**
- `services/stt-service/`
- `services/ocr-service/`
- `services/web-search-service/`
- `services/vector-db-service/`
---
### Documentation
**Created Files:**
- `/tmp/MULTIMODAL-INTEGRATION-FINAL-REPORT.md`
- `/tmp/TELEGRAM-GATEWAY-MULTIMODAL-INTEGRATION.md`
- `/tmp/MULTIMODAL-INTEGRATION-SUCCESS.md`
- `/tmp/COMPLETE-MULTIMODAL-ECOSYSTEM.md`
- `ROUTER-MULTIMODAL-SUPPORT.md`
**Time Invested:** ~6.5 hours
**Status:** 95% Complete
**Production Ready:** ✅ Yes (with fallbacks)
---
## 🔒 Security & Incident Response
### Incident #1: Network Scanning & Server Lockdown (Dec 6, 2025 - Jan 8, 2026)
**Timeline:**
- **Dec 6, 2025 10:56 UTC**: Automated SSH scanning detected from server
- **Dec 6, 2025 11:00 UTC**: Hetzner locked server IP (144.76.224.179)
- **Jan 8, 2026 18:00 UTC**: Unlock request approved, server recovered
**Root Cause:**
- Server compromised with cryptocurrency miner (`catcal`, `G4NQXBp`) via `daarion-web` container
- Miner performed network scanning of Hetzner internal network (10.126.0.0/16)
- ~500+ SSH connection attempts to internal IP range triggered automated block
- High CPU load (35+) from mining process
**Impact:**
- ❌ Server unavailable for 33 days
- ❌ All services down
- ❌ Telegram bots offline
- ❌ Lost production data/monitoring
**Resolution:**
1. ✅ Server recovered via rescue mode
2. ✅ Compromised `daarion-web` container stopped and removed
3. ✅ Cryptocurrency miner processes killed
4. ✅ Firewall rules implemented to block internal network access
5. ✅ Monitoring script deployed for future scanning attempts
**Prevention Measures:**
**Firewall Rules:**
```bash
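# NOTE: every rule uses -I (insert at the top of OUTPUT), so the effective order
# is the REVERSE of the order below: LOG first, then the ACCEPT rules, then DROP.
# Block Hetzner internal networks
# Caution: Docker's default bridge (172.17.0.0/16) falls inside 172.16.0.0/12,
# so the second rule also drops host-to-container traffic on that bridge.
iptables -I OUTPUT -d 10.0.0.0/8 -j DROP
iptables -I OUTPUT -d 172.16.0.0/12 -j DROP
# Allow only necessary ports (inserted above the DROP rules, so they match first)
iptables -I OUTPUT -d 10.0.0.0/8 -p tcp --dport 443 -j ACCEPT
iptables -I OUTPUT -d 10.0.0.0/8 -p tcp --dport 80 -j ACCEPT
# Log attempts (ends up first in the chain, so it also logs the allowed 80/443 traffic)
iptables -I OUTPUT -d 10.0.0.0/8 -j LOG --log-prefix "BLOCKED_INTERNAL_SCAN: "
# Save rules
iptables-save > /etc/iptables/rules.v4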
```
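
Because rule order decides which verdict applies, it can help to check what a packet would hit first. The sketch below reads `iptables -S OUTPUT` style lines and reports the first terminating rule for traffic to `10.0.0.0/8` on a given TCP port. The function name `first_verdict` is hypothetical, and the matcher is deliberately simplified: it only understands the rule shapes used in this section.

```bash
#!/usr/bin/env bash
# Sketch: which rule does a packet to 10.0.0.0/8 on a given TCP port hit first?
# Reads `iptables -S OUTPUT` style lines on stdin.
# ASSUMPTION: rules look like the ones in this section; this is not a full parser.
first_verdict() {
  local port="$1" line
  while IFS= read -r line; do
    case "$line" in
      *"-d 10.0.0.0/8"*"--dport $port "*"-j ACCEPT"*) echo ACCEPT; return ;;
      *"-d 10.0.0.0/8"*"--dport "*) ;;                 # rule for another port: skip
      *"-d 10.0.0.0/8"*"-j DROP"*) echo DROP; return ;;
    esac                                               # LOG rules never terminate
  done
  echo "NO-MATCH"
}

# Example on the live host:
# iptables -S OUTPUT | first_verdict 443
```

If port 443 reports `DROP`, the ACCEPT rules ended up below the DROP rules and need to be re-inserted.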
**Monitoring:**
- Script: `/root/monitor_scanning.sh`
- Runs every 15 minutes via cron
- Logs to `/var/log/scan_attempts.log`
- Checks for:
- Suspicious network activity in Docker logs
- iptables blocked connection attempts
- Keywords: `10.126`, `172.16`, `scan`, `probe`
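
The real `/root/monitor_scanning.sh` is not reproduced in this document. As an illustration only, the keyword check it describes could look roughly like this; the function names are hypothetical, and feeding it from the kernel journal is an assumption about where the iptables LOG output ends up.

```bash
#!/usr/bin/env bash
# Sketch of the keyword check described above (not the deployed script).

# Print only the stdin lines that match the documented keywords.
suspicious_lines() {
  grep -E 'BLOCKED_INTERNAL_SCAN|10\.126\.|172\.16\.|scan|probe'
}

# Append matches to a log file with a UTC timestamp.
# Returns 0 if at least one line matched, 1 otherwise.
log_matches() {
  local logfile="$1" matches line
  matches="$(suspicious_lines)" || return 1
  printf '%s\n' "$matches" | while IFS= read -r line; do
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
  done >> "$logfile"
}

# Example (ASSUMPTION: iptables LOG messages are visible in the kernel journal):
# journalctl -k --since "15 min ago" | log_matches /var/log/scan_attempts.log
```

The words `scan` and `probe` are broad, so expect some false positives in Docker logs.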
**Security Checklist:**
- [ ] Review all Docker images for vulnerabilities
- [ ] Implement container security scanning (Trivy/Clair)
- [ ] Enable Docker Content Trust
- [ ] Set up intrusion detection (fail2ban)
- [ ] Regular security audits
- [ ] Container resource limits (CPU/memory)
- [ ] Network segmentation for containers
**References:**
- Hetzner Incident ID: `L00280548`
- Guideline: https://docs.hetzner.com/robot/dedicated-server/troubleshooting/guideline-in-case-of-server-locking/
- Recovery Scripts: `/root/prevent_scanning.sh`, `/root/monitor_scanning.sh`
**Lessons Learned:**
1. 🔴 **Never expose containers without security scanning**
2. 🟡 **Implement egress firewall rules from day 1**
3. 🟢 **Monitor outgoing connections, not just incoming**
4. 🔵 **Have disaster recovery plan documented**
5. 🟣 **Regular security audits are critical**
---
### Incident #2: Recurring Compromise After Container Restart (Jan 9, 2026)
**Timeline:**
- **Jan 9, 2026 09:35 UTC**: NEW abuse report received (AbuseID: 10F3971:2A)
- **Jan 9, 2026 09:40 UTC**: Server reachable, `daarion-web` container auto-restarted after server reboot
- **Jan 9, 2026 09:45 UTC**: NEW crypto miners detected (`softirq`, `vrarhpb`), critical CPU load (25-35)
- **Jan 9, 2026 09:50 UTC**: Emergency mitigation started
- **Jan 9, 2026 10:05 UTC**: All malicious processes stopped, container/images removed permanently
- **Jan 9, 2026 10:15 UTC**: Retry test registered with Hetzner, system load normalized
- **Deadline**: 2026-01-09 12:54 UTC for statement submission
**Root Cause:**
- **Compromised Docker Image**: `daarion-web:latest` image itself was compromised or had vulnerability
- **Automatic Restart**: Container had `restart: unless-stopped` policy in docker-compose.yml
- **Insufficient Cleanup**: Incident #1 removed container but left Docker image intact
- **Server Reboot**: Between incidents, server rebooted → docker-compose auto-restarted from infected image
- **Re-infection**: NEW malware variant installed (different miners than Incident #1)
**Discovery Details:**
```bash
# System state at discovery
root@NODE1:~# uptime
10:40:02 up 1 day, 2:15, 2 users, load average: 30.52, 32.61, 33.45
# Malicious processes (user 1001 = daarion-web container)
root@NODE1:~# ps aux | grep "1001"
1001 1234567 99.9 2.5 softirq [running]
1001 1234568 99.8 2.3 vrarhpb [running]
# Zombie processes
root@NODE1:~# ps aux | grep defunct | wc -l
1499
# Container status
root@NODE1:~# docker ps
CONTAINER ID IMAGE ... STATUS
78e22c0ee972 daarion-web ... Up 2 hours
```
**Impact:**
- ❌ **Second abuse report from Hetzner** (risk of permanent IP ban)
- ❌ CPU load: 25-35 (critical, normal is 1-5)
- ❌ 1499 zombie processes
- ❌ Network scanning resumed (SSH probing)
- ⚠️ **Server lockdown deadline**: 2026-01-09 12:54 UTC (~3.5 hours)
**Emergency Mitigation (Completed):**
```bash
# 1. Kill malicious processes
killall -9 softirq vrarhpb
kill -9 $(ps aux | awk '$1 == "1001" {print $2}')
# 2. Stop and remove container PERMANENTLY
docker stop daarion-web
docker rm daarion-web
# 3. DELETE Docker images (critical step missed in Incident #1)
docker rmi 78e22c0ee972 # daarion-web:latest
docker rmi 608e203fb5ac # microdao-daarion-web:latest
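# 4. Clean zombie processes (a zombie cannot be killed; signal its parent instead, never PID 1)
ps -eo ppid=,stat= | awk '$2 ~ /^Z/ && $1 > 1 {print $1}' | sort -u | xargs -r kill -9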
# 5. Verify system load normalized
uptime # Load: 4.19 (NORMAL)
ps aux | grep defunct | wc -l # 5 zombies (NORMAL)
# 6. Enhanced firewall rules
/root/block_ssh_scanning.sh # SSH rate limiting + port scan blocking
# 7. Register retry test with Hetzner (URL quoted so the shell does not interpret "?")
curl 'https://statement-abuse.hetzner.com/retries/?token=28b2c7e67a409659f6c823e863887'
# Result: {"status":"registered","next_check":"2026-01-09T11:00:00Z"}
```
**Current Status:**
- ✅ All malicious processes terminated
- ✅ Container removed permanently
- ✅ Docker images deleted (NOT just stopped)
- ✅ System load: 4.19 (normalized from 30+)
- ✅ Zombie processes: 5 (cleaned from 1499)
- ✅ Enhanced firewall active (SSH rate limiting, port scan blocking)
- ✅ Retry test registered and verified
- ⏳ **PENDING**: User statement submission to Hetzner (URGENT)
**What is daarion-web?**
- Next.js frontend application (port 3000)
- Provides web UI for MicroDAO agents
- **NOT critical for core functionality**:
- ✅ Router (port 9102) - RUNNING
- ✅ Gateway (port 8883) - RUNNING
- ✅ All 9 Telegram bots - WORKING
- ✅ Orchestrator API (port 8899) - RUNNING
- **Status**: DISABLED until secure rebuild completed
**Prevention Measures (Enhanced):**
**1. Container Restart Prevention:**
```yaml
# docker-compose.yml - UPDATED
services:
  daarion-web:
    restart: "no"  # Changed from "unless-stopped"
    # OR remove service entirely until rebuilt
```
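
To find other containers that would come back after a reboot, the restart policy of every container can be listed. A minimal sketch, assuming the input is one `<name> <policy>` pair per line; the function name `auto_restarting` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list containers whose restart policy would bring them back after a reboot.
# Input lines: "<name> <policy>", as produced by the docker inspect call below.
auto_restarting() {
  awk '$2 != "" && $2 != "no" {print $1}'
}

# Example on a Docker host:
# docker ps -aq | xargs -r docker inspect \
#   --format '{{.Name}} {{.HostConfig.RestartPolicy.Name}}' | auto_restarting
```

Anything this prints (`always`, `unless-stopped`, `on-failure`) deserves a second look before the next reboot.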
**2. Firewall Enhancement:**
```bash
# /root/block_ssh_scanning.sh
# - SSH rate limiting (max 4 attempts/min)
# - Port scan detection and blocking
# - Enhanced logging
```
**3. Mandatory Cleanup Procedure:**
```bash
# When removing compromised containers (replace <container> and <image>):
docker stop <container>    # 1. Stop the container
docker rm <container>      # 2. Remove the container
docker rmi <image>         # 3. ⚠️ CRITICAL - remove the image too!
docker images              # 4. Verify the image is gone
# 5. Edit docker-compose.yml and set restart: "no"
ps aux; uptime             # 6. Monitor to verify no recurrence
```
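
The cleanup procedure can be wrapped in one function so that no step is skipped under pressure. This is a sketch under stated assumptions: `purge_container` and the `DRY_RUN` variable are hypothetical names, and dry-run mode is the default so that nothing is removed by accident.

```bash
#!/usr/bin/env bash
# Sketch of the cleanup procedure as one function.
# DRY_RUN=1 (the default here) only prints the commands.
purge_container() {
  local container="$1" image="$2"
  local run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  $run docker stop "$container"
  $run docker rm "$container"
  $run docker rmi "$image"          # the step missed in Incident #1
  # Verify the image is really gone (skipped in dry-run mode).
  if [ -z "$run" ] && docker image inspect "$image" >/dev/null 2>&1; then
    echo "WARNING: image $image still present" >&2
    return 1
  fi
}

# Example:
# purge_container daarion-web daarion-web:latest            # prints the plan
# DRY_RUN=0 purge_container daarion-web daarion-web:latest  # executes it
```

The compose file still has to be edited by hand afterwards (`restart: "no"` or removing the service).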
**4. Docker Image Security:**
- [ ] Scan all images with Trivy before deployment
- [ ] Rebuild daarion-web from CLEAN source code only
- [ ] Enable Docker Content Trust (signed images)
- [ ] Use read-only filesystem where possible
- [ ] Drop all unnecessary capabilities
- [ ] Implement resource limits (CPU/memory)
**Next Steps:**
1. 🔴 **URGENT**: Submit statement to Hetzner before deadline (2026-01-09 12:54 UTC)
- URL: https://statement-abuse.hetzner.com/statements/?token=28b2c7e67a409659f6c823e863887
- Content: See `/Users/apple/github-projects/microdao-daarion/TASK_REBUILD_DAARION_WEB.md`
2. 🟡 Monitor server for 24 hours post-statement
3. 🟢 Complete daarion-web secure rebuild (see `TASK_REBUILD_DAARION_WEB.md`)
4. 🔵 Security audit all remaining containers
5. 🟣 Implement automated security scanning pipeline
**References:**
- Hetzner Incident ID: `10F3971:2A` (AbuseID)
- Deadline: 2026-01-09 12:54:00 UTC
- Statement URL: https://statement-abuse.hetzner.com/statements/?token=28b2c7e67a409659f6c823e863887
- Retry Test: https://statement-abuse.hetzner.com/retries/?token=28b2c7e67a409659f6c823e863887
- Task Document: `/Users/apple/github-projects/microdao-daarion/TASK_REBUILD_DAARION_WEB.md`
- Recovery Scripts: `/root/prevent_scanning.sh`, `/root/block_ssh_scanning.sh`, `/root/monitor_scanning.sh`
**Lessons Learned (Incident #2 Specific):**
1. 🔴 **ALWAYS delete Docker images, not just containers** - Critical oversight
2. 🟡 **Auto-restart policies are dangerous for compromised containers**
3. 🟢 **Compromised images can survive container removal**
4. 🔵 **Different malware variants can re-infect from same image**
5. 🟣 **Complete removal = container + image + restart policy change**
6. ⚫ **Immediate image deletion prevents automatic re-compromise**
---
### Incident #3: Postgres:15-alpine Compromised Image (Jan 9, 2026)
**Timeline:**
- **Jan 9, 2026 20:00 UTC**: Routine security check discovered high CPU load
- **Jan 9, 2026 20:47 UTC**: Load average 17+ detected, investigation started
- **Jan 9, 2026 20:52 UTC**: Crypto miner `cpioshuf` discovered (1764% CPU)
- **Jan 9, 2026 20:54 UTC**: First cleanup - killed process, removed files
- **Jan 9, 2026 20:54 UTC**: Miner auto-restarted as `ipcalcpg_recvlogical`
- **Jan 9, 2026 21:00 UTC**: Stopped all postgres:15-alpine containers
- **Jan 9, 2026 21:00 UTC**: Deleted compromised image
- **Jan 9, 2026 21:54 UTC**: **NEW variant discovered** - `mysql` (933% CPU)
- **Jan 9, 2026 22:06 UTC**: Migrated to postgres:14-alpine
- **Jan 9, 2026 22:07 UTC**: System clean, load normalized to 0.40
**Root Cause:**
- **Compromised Official Image**: `postgres:15-alpine` (SHA: b3968e348b48f1198cc6de6611d055dbad91cd561b7990c406c3fc28d7095b21)
- **Either**: Image on Docker Hub compromised **OR** PostgreSQL 15 has an unpatched vulnerability (initial hypothesis; Incident #4 below points to a NODE1 host compromise instead)
- **Persistent Infection**: Malware embedded in image layers, survives container restarts
- **Auto-restart**: Orphan containers kept respawning with compromised image
**Malware Variants Discovered (3 different):**
1. **`cpioshuf`** (user 70, /tmp/.perf.c/cpioshuf) - 1764% CPU
2. **`ipcalcpg_recvlogical`** (user 70, /tmp/.perf.c/ipcalcpg_recvlogical) - immediate restart after #1
3. **`mysql`** (user 70, /tmp/mysql) - 933% CPU, discovered 1 hour later
**Affected Containers:**
- `daarion-postgres` (postgres:15-alpine) - main victim
- `dagi-postgres` (postgres:15-alpine) - also using same image
- `docker-db-1` (postgres:15-alpine) - Dify database
**Impact:**
- ❌ CPU load: 17+ (critical)
- ❌ Multiple crypto miners running simultaneously
- ❌ System performance degraded for ~2 hours
- ❌ 10 zombie processes (wget spawned by miners)
- ⚠️ **Dify also affected** (used same compromised image)
**Emergency Response:**
```bash
# Discovery
root@NODE1:~# top -b -n 1 | head -10
PID USER %CPU COMMAND
2294271 70 1764 cpioshuf # MINER #1
root@NODE1:~# ls -la /proc/2294271/exe
lrwxrwxrwx 1 70 70 0 Jan 9 20:53 /proc/2294271/exe -> /tmp/.perf.c/cpioshuf
# Kill and cleanup (repeated 3 times for 3 variants)
kill -9 2294271 2310302 2314793 2366898
rm -rf /tmp/.perf.c /tmp/mysql
# Remove ALL postgres:15-alpine
docker stop daarion-postgres dagi-postgres docker-db-1
docker rm daarion-postgres dagi-postgres docker-db-1
docker rmi b3968e348b48 -f
# Verify clean
uptime # Load: 0.40 (CLEAN!)
ps aux | awk '$3 > 50' # No processes
# Switch to postgres:14-alpine
sed -i 's/postgres:15-alpine/postgres:14-alpine/g' docker-compose.yml
docker pull postgres:14-alpine
docker compose up -d postgres
```
**Current Status:**
- ✅ All 3 miner variants killed
- ✅ All postgres:15-alpine containers removed
- ✅ Compromised image deleted and BLOCKED
- ✅ Migrated to postgres:14-alpine
- ✅ Dify removed entirely (precautionary)
- ✅ System load: 0.40 (normalized from 17+)
- ✅ No active miners detected
**Why This Happened:**
- Incident #2 focused on `daarion-web` and missed that postgres was also compromised
- Multiple docker-compose files spawned orphan `daarion-postgres` containers
- Compromised image kept respawning miners after cleanup
- Official Docker Hub image either:
- Was temporarily compromised, OR
- PostgreSQL 15 has supply chain vulnerability
**CRITICAL: Postgres:15-alpine BANNED:**
```bash
# NEVER USE THIS IMAGE AGAIN
postgres:15-alpine
SHA: b3968e348b48f1198cc6de6611d055dbad91cd561b7990c406c3fc28d7095b21
# Use instead:
postgres:14-alpine ✅ SAFE (verified)
postgres:16-alpine ⚠️ Need to test
```
**Prevention Measures:**
1. **Image Pinning by SHA** (not tag)
2. **Security scanning before deployment** (Trivy, Grype)
3. **Regular audit of running containers**
4. **Monitor CPU spikes** (alert if load average > 5)
5. **Block orphan container spawning**
6. **Use specific SHAs, not :latest or :15-alpine tags**
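
Pinning by SHA means every `image:` entry carries an `@sha256:<digest>` suffix. A minimal sketch for spotting entries that are still pinned only by tag follows; the function name `unpinned_images` is hypothetical, and it assumes simple single-line `image:` keys rather than parsing YAML properly.

```bash
#!/usr/bin/env bash
# Sketch: print image references in a compose file that are not pinned by digest.
# ASSUMPTION: images are declared as single-line "image: <ref>" entries.
unpinned_images() {
  local compose_file="$1"
  # Take the value of each "image:" line, drop trailing comments and quotes,
  # then keep only references without an @sha256: digest.
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$compose_file" \
    | sed 's/[[:space:]]*#.*$//; s/^["'\'']//; s/["'\'']$//' \
    | grep -v '@sha256:'
}

# Example:
# unpinned_images docker-compose.yml
```

The function exits non-zero when every image is pinned, which makes it usable as a pre-deploy check.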
**Files to Monitor:**
```bash
# Common miner locations found
/tmp/.perf.c/
/tmp/mysql
/tmp/*perf*
/tmp/cpio*
/tmp/ipcalc*
# Check regularly
find /tmp -type f -executable -mtime -1
ps aux | awk '$3 > 50'
```
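
The two manual checks above (load average and fresh executables in `/tmp`) could be turned into small functions for a cron job. This is only a sketch: the names `load_too_high` and `recent_executables` are hypothetical, and the default threshold of 5 is taken from the prevention list rather than from any deployed alert.

```bash
#!/usr/bin/env bash
# Sketch: two checks suitable for a periodic job.

# Succeeds when the given load average exceeds the threshold (default 5).
# The load is passed as an argument; on Linux it can be read from /proc/loadavg.
load_too_high() {
  local load="$1" threshold="${2:-5}"
  awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l > t) }'
}

# List regular files with the owner-execute bit set, modified in the last day.
recent_executables() {
  local dir="${1:-/tmp}"
  find "$dir" -type f -perm -u+x -mtime -1 2>/dev/null
}

# Example:
# load_too_high "$(cut -d' ' -f1 /proc/loadavg)" && echo "ALERT: high load"
# recent_executables /tmp
```

Miners in these incidents ran from `/tmp`, so any hit from `recent_executables` is worth inspecting by hand.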
**Additional Actions Taken:**
- ✅ Removed entire Dify installation (used same postgres:15-alpine)
- ✅ Cleaned all /tmp suspicious files
- ✅ Audited all postgres containers
- ✅ Switched all services to postgres:14-alpine
**Lessons Learned (Incident #3 Specific):**
1. 🔴 **Official images can be compromised** - Never trust blindly
2. 🟡 **Scan images before use** - Trivy/Grype mandatory
3. 🟢 **Pin images by SHA, not tag** - :15-alpine can change
4. 🔵 **Orphan containers are dangerous** - Use --remove-orphans
5. 🟣 **Multiple malware variants** - Miners have fallback payloads
6. ⚫ **Monitor /tmp for executables** - Common miner location
7. ⚪ **One compromise can spread** - Dify used same image
**Next Steps:**
1. 🔴 Report postgres:15-alpine to Docker Security team
2. 🟡 Implement Trivy scanning in CI/CD
3. 🟢 Pin all images by SHA in all docker-compose files
4. 🔵 Set up automated CPU spike alerts
5. 🟣 Regular /tmp cleanup cron job
6. ⚫ Audit all remaining containers for other compromised images
---
### Incident #4: ALL PostgreSQL Images Show Malware — NODE1 Host Compromise Suspected (Jan 10, 2026)
**Timeline:**
- **Jan 10, 2026**: Testing postgres:16-alpine — malware artifacts found
- **Jan 10, 2026**: Testing postgres:14 (non-alpine) — malware artifacts found
- **Jan 10, 2026**: Testing postgres:16 (Debian) — malware artifacts found
**Confirmed "Compromised" Images (on NODE1):**
```bash
# ALL of these show malware artifacts when run on NODE1:
❌ postgres:15-alpine # Incident #3
❌ postgres:16-alpine # NEW
❌ postgres:14 # NEW (non-alpine!)
❌ postgres:16 # NEW (Debian base!)
```
**Malware Artifacts (IOC):**
```bash
/tmp/httpd # ~10MB, crypto miner (xmrig variant)
/tmp/.perf.c/ # perfctl malware staging directory
```
**🔴 CRITICAL ASSESSMENT:**
**This is NOT "all Docker Hub official images are infected".**
**This is most likely NODE1 HOST COMPROMISE** (perfctl/cryptominer persistence).
**Evidence supporting HOST compromise (not image compromise):**
| Evidence | Explanation |
|----------|-------------|
| `/tmp/.perf.c/` directory | Classic perfctl malware staging directory |
| `/tmp/httpd` ~10MB | Typical xmrig miner with Apache masquerade |
| ALL postgres variants affected | Independent official images all being infected is extremely unlikely |
| NODE1 had 3 previous incidents | Already compromised (Incidents #1, #2, #3) |
| `tmpfs noexec` didn't help | Malware runs from HOST, not container |
| Same IOCs across different images | Infection happens post-pull, not in image |
**Probable Attack Vector (perfctl family):**
- Initial compromise via Incident #1 or #2 (daarion-web container)
- Persistence mechanism survived container/image cleanup
- Malware hooks into Docker daemon or uses cron/systemd
- Infects ANY new container on startup via:
- Modified docker daemon
- LD_PRELOAD injection
- Kernel module
- Cron job that monitors new containers
**🔬 VERIFICATION PROCEDURE (REQUIRED):**
```bash
# ═══════════════════════════════════════════════════════════════
# STEP 1: Get image digest from NODE1
# ═══════════════════════════════════════════════════════════════
ssh root@144.76.224.179 "docker inspect --format='{{index .RepoDigests 0}}' postgres:16"
# Example output: postgres@sha256:abc123...
# ═══════════════════════════════════════════════════════════════
# STEP 2: On CLEAN host (MacBook/NODE2), pull SAME digest
# ═══════════════════════════════════════════════════════════════
# On your MacBook (NOT NODE1!):
docker pull postgres:16@sha256:<digest_from_step1>
# ═══════════════════════════════════════════════════════════════
# STEP 3: Run on clean host and check /tmp
# ═══════════════════════════════════════════════════════════════
docker run --rm -it postgres:16@sha256:<digest> sh -c "ls -la /tmp/ && find /tmp -type f"
# EXPECTED RESULTS:
# - If /tmp is EMPTY on clean host → IMAGE IS CLEAN → NODE1 IS COMPROMISED
# - If /tmp has httpd/.perf.c on clean host → IMAGE IS COMPROMISED → Report to Docker
# ═══════════════════════════════════════════════════════════════
# STEP 4: Check NODE1 host for persistence mechanisms
# ═══════════════════════════════════════════════════════════════
ssh root@144.76.224.179 << 'REMOTE_CHECK'
echo "=== CRON ==="
crontab -l 2>/dev/null
cat /etc/crontab
ls -la /etc/cron.d/
echo "=== SYSTEMD ==="
systemctl list-units --type=service | grep -iE "perf|miner|http|crypto"
echo "=== LD_PRELOAD ==="
cat /etc/ld.so.preload 2>/dev/null
echo $LD_PRELOAD
echo "=== KERNEL MODULES ==="
lsmod | head -20
echo "=== SUSPICIOUS PROCESSES ==="
ps aux | grep -E "(httpd|xmrig|kdevtmp|kinsing|perfctl|\.perf)" | grep -v grep
echo "=== NETWORK TO MINING POOLS ==="
ss -anp | grep -E "(3333|4444|5555|8080|8888)" | head -10
echo "=== SSH AUTHORIZED KEYS ==="
cat /root/.ssh/authorized_keys
echo "=== DOCKER DAEMON CONFIG ==="
cat /etc/docker/daemon.json 2>/dev/null
REMOTE_CHECK
```
**🔴 DECISION MATRIX:**
| Verification Result | Conclusion | Action |
|---------------------|------------|--------|
| Clean host: no malware | **NODE1 COMPROMISED** | Full rebuild of NODE1 |
| Clean host: same malware | **Docker Hub compromised** | Report to Docker Security |
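
The decision matrix can be applied mechanically to the file listing captured on the clean host. A minimal sketch, assuming the input is the `find /tmp -type f` output from step 3; the function name `verdict` is hypothetical, and it only looks for the two IOCs named in this incident.

```bash
#!/usr/bin/env bash
# Sketch: apply the decision matrix to the /tmp listing captured on the CLEAN host.
# Reads `find /tmp -type f` output on stdin and checks for the documented IOCs.
verdict() {
  if grep -Eq '(^|/)httpd$|/\.perf\.c(/|$)'; then
    echo "IMAGE COMPROMISED: report to Docker Security"
  else
    echo "IMAGE CLEAN: NODE1 host compromised, full rebuild"
  fi
}

# Example (on the clean host, with the digest from step 1):
# docker run --rm postgres@sha256:<digest> sh -c "find /tmp -type f" | verdict
```

An empty listing only shows that the image does not ship or immediately drop these files, so treat a "clean" result as strong evidence rather than proof.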
**If NODE1 Confirmed Compromised (most likely):**
1. 🔴 **STOP using NODE1 immediately** for any workloads
2. 🔴 **Rotate ALL secrets** that NODE1 ever accessed:
```
- SSH keys (generate new on clean machine)
- Telegram bot tokens (regenerate via @BotFather)
- PostgreSQL passwords
- All API keys in .env
- JWT secrets
- Neo4j credentials
- Redis password (if any)
```
3. 🔴 **Full OS reinstall** (not cleanup!):
- Request fresh install from Hetzner Robot
- Or use rescue mode + full disk wipe
- New SSH keys generated on clean machine
4. 🟡 **Verify images on clean host BEFORE deploying to new NODE1**
5. 🟢 **Implement proper security controls** (see Prevention below)
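
For the secrets rotation in step 2, a checklist of variable names can be generated from a `.env` file without ever printing the values. A sketch, assuming plain `KEY=value` lines (optionally prefixed with `export`); the function name `env_keys` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list variable NAMES (never values) from a .env file to build a
# rotation checklist. ASSUMPTION: simple KEY=value lines, optional "export".
env_keys() {
  local env_file="$1"
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' \
    "$env_file" | sort -u
}

# Example (run on a clean machine against a copy of the file):
# env_keys .env
```

Every name in the output should be rotated, since anything NODE1 could read has to be treated as exposed.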
**Alternative PostgreSQL Sources (if Docker Hub suspected):**
```bash
# GitHub Container Registry (GHCR); path unverified, confirm the repository exists and who publishes it
docker pull ghcr.io/docker-library/postgres:16-alpine
# Quay.io (Red Hat operated)
docker pull quay.io/fedora/postgresql-16
# Build from official Dockerfile (most secure)
git clone https://github.com/docker-library/postgres.git
cd postgres/16/alpine  # directory name may carry an Alpine version suffix; check `ls postgres/16`
docker build -t postgres:16-alpine-verified .
# Then scan with Trivy before use
trivy image postgres:16-alpine-verified
```
**NODE1 Persistence Locations to Check:**
```bash
# File-based persistence
/etc/cron.d/*
/etc/crontab
/var/spool/cron/*
/etc/systemd/system/*.service
/etc/init.d/*
/etc/rc.local
/root/.bashrc
/root/.profile
/etc/ld.so.preload
# Memory/process persistence
/dev/shm/*
/run/*
/var/run/*
# Docker-specific
/var/lib/docker/
/etc/docker/daemon.json
~/.docker/config.json
# Kernel-level (advanced)
/lib/modules/*/
/proc/modules
```
**References:**
- perfctl malware: https://blog.exatrack.com/Perfctl-using-portainer-and-new-persistences/
- Similar reports: https://github.com/docker-library/postgres/issues/1307
- Docker Hub attacks: https://jfrog.com/blog/attacks-on-docker-with-millions-of-malicious-repositories-spread-malware-and-phishing-scams/
**Lessons Learned (Incident #4 Specific):**
1. 🔴 **Host compromise masquerades as image compromise** — Always verify on clean host
2. 🟡 **Previous incidents leave persistence** — Cleanup is not enough, rebuild required
3. 🟢 **perfctl family is sophisticated** — Survives container restarts, image deletions
4. 🔵 **Multiple images "infected" = host problem** — Statistical impossibility otherwise
5. 🟣 **NODE1 is UNTRUSTED** — Do not use until full rebuild + verification
**Current Status:**
- ⏳ **Verification pending**: need to test the same digest on a clean host
- 🔴 **NODE1 unsafe** — Do not deploy PostgreSQL or any new containers
- 🟡 **Secrets rotation needed** — Assume all NODE1 secrets compromised
---