Commit Graph

203 Commits

Author SHA1 Message Date
Apple
6ac7c8f4b5 docs: expand lint scope batch25 (2 files) 2026-02-16 07:01:54 -08:00
Apple
949b0a608e docs: expand lint scope batch24 (2 files) 2026-02-16 07:00:18 -08:00
Apple
842704d7e7 docs: expand lint scope batch23 (2 files) 2026-02-16 06:57:58 -08:00
Apple
1de120ddfc docs: expand lint scope batch22 (2 files) 2026-02-16 06:56:43 -08:00
Apple
d5bd8748a1 docs: expand lint scope batch21 (2 files) 2026-02-16 06:55:23 -08:00
Apple
1fbd3009b8 docs: expand lint scope batch20 (2 files) 2026-02-16 06:41:54 -08:00
Apple
8789487037 docs: expand lint scope batch19 (2 files) 2026-02-16 06:40:21 -08:00
Apple
6e0096eea7 docs: expand lint scope batch18 (2 files) 2026-02-16 06:38:39 -08:00
Apple
db4107ba4a docs: expand lint scope batch17 (2 files) 2026-02-16 06:22:45 -08:00
Apple
162f3567df docs: expand lint scope batch16 (2 files) 2026-02-16 06:21:10 -08:00
Apple
8d2ae25cdc docs: expand lint scope batch15 (2 files) 2026-02-16 06:14:15 -08:00
Apple
e9bbcde418 docs: expand lint scope batch14 (2 files) 2026-02-16 05:49:02 -08:00
Apple
a4aa1c42aa docs: expand lint scope batch13 (2 files) 2026-02-16 05:47:10 -08:00
Apple
f2a450c159 docs: expand lint scope batch12 (2 files) 2026-02-16 05:45:39 -08:00
Apple
d84a83f639 docs: expand lint scope batch11 (2 files) 2026-02-16 05:44:43 -08:00
Apple
fa07356205 docs: expand lint scope batch10 (2 files) 2026-02-16 04:32:13 -08:00
Apple
16574d1db2 docs: expand lint scope batch9 (2 files) 2026-02-16 04:28:01 -08:00
Apple
210c6426b7 docs: expand lint scope batch8 (2 files) 2026-02-16 04:27:12 -08:00
Apple
083a622817 docs: expand lint scope batch7 (2 files) 2026-02-16 04:05:40 -08:00
Apple
e6221fef67 docs: expand lint scope batch6 (2 files) 2026-02-16 04:02:10 -08:00
Apple
de7533f97e docs: add session preflight and expand lint scope batch5 2026-02-16 03:53:56 -08:00
Apple
9c9f4fa182 docs: expand lint scope batch4 (3 files) 2026-02-16 03:47:51 -08:00
Apple
831f361f0f docs: expand lint scope batch3 (6 files) 2026-02-16 03:44:58 -08:00
Apple
1a00cd4413 docs: expand lint scope batch2 (12 files) 2026-02-16 02:53:53 -08:00
Apple
08dcfea960 docs: expand lint scope batch1 (13 files) 2026-02-16 02:40:49 -08:00
Apple
b722e28338 docs: add local scheduled maintenance runner (no auto-push) 2026-02-16 02:37:29 -08:00
Apple
5f2fd7905f docs: sync consolidation and session starter 2026-02-16 02:32:27 -08:00
Apple
3146e74ce8 docs: sync consolidation and session starter 2026-02-16 02:32:27 -08:00
Apple
fc2d86bd1b docs: sync consolidation and session starter 2026-02-16 02:32:08 -08:00
Apple
8ba71f240f docs: sync consolidation and session starter 2026-02-16 02:32:08 -08:00
Apple
a46a70c014 fix(ops): Add network aliases and stabilize DNS for NODA1
- docker-compose.node1.yml: Add network aliases (router, gateway,
  memory-service, qdrant, nats, neo4j) to eliminate manual
  `docker network connect --alias` commands
- docker-compose.node1.yml: ROUTER_URL now uses env variable with
  fallback: ${ROUTER_URL:-http://router:8000}
- docker-compose.node1.yml: Increase router healthcheck start_period
  to 30s and retries to 5
- .gitignore: Add noda1-credentials.local.mdc (local-only SSH creds)
- scripts/node1/verify_agents.sh: Improved output with agent list
- docs: Add NODA1-AGENT-VERIFICATION.md, NODA1-AGENT-ARCHITECTURE.md,
  NODA1-VERIFICATION-REPORT-2026-02-03.md
- config/README.md: How to add new agents
- .cursor/rules/, .cursor/skills/: NODA1 operations skill for Cursor

Root cause fixed: Gateway could not resolve 'router' DNS name when
Router container was named 'dagi-staging-router' without alias.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-03 05:55:56 -08:00
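Note on a46a70c014: the `${ROUTER_URL:-http://router:8000}` form falls back to the default when the variable is unset or empty. A minimal Python sketch of the same rule, for a service that reads the variable itself; the function name `resolve_router_url` is hypothetical and not taken from the repository:

```python
import os

# Default taken from the commit message; "router" is the network alias it adds.
DEFAULT_ROUTER_URL = "http://router:8000"


def resolve_router_url(env=None):
    """Mimic ${ROUTER_URL:-default}: use the default when unset OR empty."""
    env = os.environ if env is None else env
    # `or` treats both a missing key (None) and an empty string as "not set".
    return env.get("ROUTER_URL") or DEFAULT_ROUTER_URL


if __name__ == "__main__":
    print(resolve_router_url())
```

Passing a plain dict instead of `os.environ` keeps the function easy to check without touching the real environment.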
Apple
0c8bef82f4 feat: Add Alateya, Clan, Eonarch agents + fix gateway-router connection
## Agents Added
- Alateya: R&D, biotech, innovations
- Clan (Spirit): Community spirit agent
- Eonarch: Consciousness evolution agent

## Changes
- docker-compose.node1.yml: Added tokens for all 3 new agents
- gateway-bot/http_api.py: Added configs and webhook endpoints
- gateway-bot/clan_prompt.txt: New prompt file
- gateway-bot/eonarch_prompt.txt: New prompt file

## Fixes
- Fixed ROUTER_URL from :9102 to :8000 (internal container port)
- All 9 Telegram agents now working

## Documentation
- Created PROJECT-MASTER-INDEX.md - single entry point
- Added various status documents and scripts

Tokens configured:
- Helion, NUTRA, Agromatrix (existing)
- Alateya, Clan, Eonarch (new)
- Druid, GreenFood, DAARWIZZ (configured)
2026-01-28 06:40:34 -08:00
Apple
5290287058 feat: implement TTS, Document processing, and Memory Service /facts API
- TTS: xtts-v2 integration with voice cloning support
- Document: docling integration for PDF/DOCX/PPTX processing
- Memory Service: added /facts/upsert, /facts/{key}, /facts endpoints
- Added required dependencies (TTS, docling)
2026-01-17 08:16:37 -08:00
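Note on 5290287058: the three `/facts` endpoints suggest an upsert / get-by-key / list-all interface. The sketch below is an in-memory stand-in for that shape only. The class `FactStore`, its method names, and its return values are assumptions for illustration; the commit does not state the HTTP methods, payloads, or storage backend of the real Memory Service.

```python
class FactStore:
    """Hypothetical in-memory model of upsert / get / list semantics."""

    def __init__(self):
        self._facts = {}

    def upsert(self, key, value):
        """Insert or overwrite a fact; return True if the key was new."""
        created = key not in self._facts
        self._facts[key] = value
        return created

    def get(self, key):
        """Return the stored value, or None when the key is unknown."""
        return self._facts.get(key)

    def list(self):
        """Return a copy so callers cannot mutate internal state."""
        return dict(self._facts)


if __name__ == "__main__":
    store = FactStore()
    store.upsert("language", "uk")
    store.upsert("language", "en")  # second call overwrites the first
    print(store.list())
```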
Apple
3478dfce5f 🔒 CRITICAL: Removed passwords/API keys from docs + closed NodePort
- Removed all passwords and API keys from the docs
- Replaced them with references to Vault
- Closed the NodePort for Memory Service (internal only)
- Created SECURITY-ROTATION-PLAN.md
- Created ARCHITECTURE-150-NODES.md (plan for 150 nodes)
- Updated config.py (removed hardcoded Cohere key)
2026-01-10 09:46:03 -08:00
Apple
f7bf935a21 NODE3: Memory Service migrated from Docker to K8s
- NODE3 added to the K3s cluster as a worker (llm80-che-1-1)
- Memory Service runs in K8s on NODE3 (pod: memory-service-node3-*)
- Docker container stopped and removed
- Updated MEMORY-MODULE-STATUS.md v3.1.0
2026-01-10 09:26:59 -08:00
Apple
116bf5f3f3 Memory Service running on all nodes + Cohere API configured
- NODE1: Memory Service in K8s (port 30800)
- NODE2: Memory Service in Docker (port 8001)
- NODE3: Memory Service in Docker (port 8001)
- All nodes: Cohere API configured for embeddings
- NODE2: ComfyUI verified (macOS App, port 8000)
- Updated MEMORY-MODULE-STATUS.md v3.0.0
2026-01-10 09:13:20 -08:00
Apple
6b02349300 🧠 Update Memory Module Status v2.1.0
- NODE2: PostgreSQL + Agent Memory Schema
- NODE3: ComfyUI installed (v0.8.2, PyTorch+CUDA)
- All nodes now have full memory stack
- Added critical TODOs: Memory Service & Cohere API
2026-01-10 09:00:17 -08:00
Apple
f4ccf7c570 🧠 Complete Memory Stack setup across all nodes
- NODE1: Neo4j (K8s), NVIDIA RTX 4000 + CUDA 13.1
- NODE2: Fixed Neo4j & Qdrant containers
- NODE3: Full stack (PostgreSQL + Qdrant + Neo4j)
- Updated MEMORY-MODULE-STATUS.md v2.0.0
2026-01-10 08:26:42 -08:00
Apple
8aee29d42d 📊 Add Memory Module Status Report across all nodes
2026-01-10 08:11:12 -08:00
Apple
1c247ea40c 📝 Update context docs with session logging system
- Added Session Logging System section to INFRASTRUCTURE.md
- Added Git Multi-Remote configuration (GitHub + Gitea + GitLab)
- Updated version to 2.5.0
- Added logging commands reference
- Updated infrastructure_quick_ref.ipynb with new features
- Added SSH tunnel instructions for GitLab access
2026-01-10 04:58:01 -08:00
Apple
744c149300 Add automated session logging system
- Created logs/ structure (sessions, operations, incidents)
- Added session-start/log/end scripts
- Installed Git hooks for auto-logging commits/pushes
- Added shell integration for zsh
- Created CHANGELOG.md
- Documented today's session (2026-01-10)
2026-01-10 04:53:17 -08:00
Apple
778907cf0e docs: add NODE3 (Threadripper PRO + RTX 3090) to infrastructure
Added NODE3 - AI/ML Workstation Specification:

Hardware:
- CPU: AMD Ryzen Threadripper PRO 5975WX (32 cores / 64 threads, 3.6 GHz boost)
- RAM: 128GB DDR4
- GPU: NVIDIA GeForce RTX 3090 24GB GDDR6X
  - 10496 CUDA cores
  - CUDA 13.0, Driver 580.95.05
- Storage: Samsung SSD 990 PRO 4TB NVMe
  - Root: 100GB (27% used)
  - Available for expansion: 3.5TB

System:
- Hostname: llm80-che-1-1
- IP: 80.77.35.151:33147
- OS: Ubuntu 24.04.3 LTS (Noble Numbat)
- Container Runtime: MicroK8s + containerd
- Uptime: 24/7

Security Status: CLEAN (verified 2026-01-09)
- No crypto miners detected
- 0 zombie processes
- CPU load: 0.17 (very low)
- GPU utilization: 0% (ready for workloads)

Services Running:
- Port 3000 - Unknown service (needs investigation)
- Port 8080 - Unknown service (needs investigation)
- Port 11434 - Ollama (localhost only)
- Port 27017/27019 - MongoDB (localhost only)
- Kubernetes API: 16443
- K8s services: 10248-10259, 25000

Recommended Use Cases:
- 🤖 Large LLM inference (Llama 70B, Qwen 72B, Mixtral 8x22B)
- 🧠 Model training and fine-tuning
- 🎨 Stable Diffusion XL image generation
- 🔬 AI/ML research and experimentation
- 🚀 Kubernetes-based AI service orchestration

Files Updated:
- INFRASTRUCTURE.md v2.4.0
- docs/infrastructure_quick_ref.ipynb v2.3.0

NODE3 is the most powerful node in the infrastructure:
- Most CPU cores: 32c/64t (vs 16c M4 Max)
- Most RAM: 128GB (vs 64GB)
- Dedicated GPU: RTX 3090 24GB VRAM
- Largest storage: 4TB NVMe (vs 2TB)

Co-Authored-By: Warp <agent@warp.dev>
2026-01-09 05:53:16 -08:00
Apple
21691aa042 docs: document Security Incident #2 - recurring container compromise
Security Incident #2 Emergency Response (Jan 9, 2026):
- Documented second compromise with NEW crypto miners (softirq, vrarhpb)
- Root cause: Docker image auto-restarted after server reboot
- Emergency mitigation completed (processes killed, container/images removed, load normalized)
- Created comprehensive rebuild task document: TASK_REBUILD_DAARION_WEB.md
- Updated INFRASTRUCTURE.md v2.3.0 with Incident #2 timeline and lessons learned
- Updated infrastructure_quick_ref.ipynb v2.2.0 with security status

Critical Changes:
- daarion-web container permanently disabled until secure rebuild
- Docker images DELETED (not just container stopped)
- Enhanced firewall rules (SSH rate limiting, port scan blocking)
- Retry test registered with Hetzner
- System load normalized: 30+ → 4.19
- Zombie processes cleaned: 1499 → 5

Files Created/Updated:
1. TASK_REBUILD_DAARION_WEB.md - Detailed rebuild instructions for Cursor agent
2. INFRASTRUCTURE.md - Added Incident #2 to Security section
3. docs/infrastructure_quick_ref.ipynb - Updated security status and version

Lessons Learned:
- ALWAYS delete Docker images, not just containers
- Auto-restart policies are dangerous for compromised containers
- Complete removal = container + image + restart policy change

Status: Emergency mitigation complete, statement submission pending (deadline: 2026-01-09 12:54 UTC)

Hetzner Incident ID: 10F3971:2A (AbuseID)

Co-Authored-By: Warp <agent@warp.dev>
2026-01-09 02:08:13 -08:00
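Note on 21691aa042: the lesson "complete removal = container + image + restart policy change" can be written down as an ordered command list. The sketch below only builds the commands and does not run them. The function name `removal_commands` and the image tag `daarion-web:latest` are hypothetical; the container name `daarion-web` comes from the commit message. Disabling the restart policy first is a design choice here, so the container cannot come back while the remaining steps run.

```python
import shlex


def removal_commands(container, images):
    """Return docker CLI invocations for a full removal, in order."""
    cmds = [
        # 1. Disable auto-restart so the container stays down after a reboot.
        ["docker", "update", "--restart=no", container],
        # 2. Stop and delete the container itself.
        ["docker", "stop", container],
        ["docker", "rm", container],
    ]
    # 3. Delete every image it could be recreated from.
    cmds += [["docker", "rmi", image] for image in images]
    return cmds


if __name__ == "__main__":
    # The image tag is an assumed example, not taken from the incident report.
    for cmd in removal_commands("daarion-web", ["daarion-web:latest"]):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings lets an operator review the plan first and pass each entry to `subprocess.run` later without quoting issues.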
Apple
a1091b03a3 docs: add Cursor Agent SSH access instructions for NODE1
- Add detailed SSH connection guide for Cursor agents
- Include common commands, safety checks, and troubleshooting
- Add interactive session example and best practices
- Update INFRASTRUCTURE.md with section for Cursor agents
- Update infrastructure_quick_ref.ipynb with SSH access configuration
- Provide complete workflow examples for remote operations

Co-Authored-By: Warp <agent@warp.dev>
2026-01-09 02:08:13 -08:00
Apple
e829fe66f2 docs: security incident resolution & firewall implementation
- Document network scanning incident (Dec 6 2025 - Jan 8 2026)
- Add firewall rules to prevent internal network access
- Deploy monitoring script for scanning attempts
- Update INFRASTRUCTURE.md v2.2.0 with Security section
- Update infrastructure_quick_ref.ipynb v2.1.0
- Root cause: compromised daarion-web container with crypto miner
- Resolution: container removed, firewall applied, monitoring deployed

Co-Authored-By: Warp <agent@warp.dev>
2026-01-09 02:08:13 -08:00
GitHub Action
e3a8b7464a docs: auto-update repository information [skip ci] 2025-12-08 09:30:23 +00:00
Apple
ad3026e32d docs: Document root cause of daily data loss and fix 2025-12-05 02:42:44 -08:00
Apple
70b528f5cf docs: Add documentation for periodic data loss fix 2025-12-05 02:36:49 -08:00
Apple
db3b74e1ba fix: Integrate asset URL fix into recovery process and update docs 2025-12-03 10:13:19 -08:00
Apple
83b7e8f372 docs: Add database stability fix documentation 2025-12-03 10:00:11 -08:00