# 📋 Updates for INFRASTRUCTURE.md and infrastructure_quick_ref.ipynb

**Date:** 2025-11-23

**Goal:** Add information about the multimodal services to the startup context files.


## 🎯 What to Add

### 1. New services on НОДА2

- **STT Service** (:8895): Speech-to-Text (Whisper AI)
- **OCR Service** (:8896): Optical Character Recognition (Tesseract + EasyOCR)
- **Web Search Service** (:8897): web search (DuckDuckGo + Google)
- **Vector DB Service** (:8898): vector database (ChromaDB)

### 2. Router updates (NODE1)

- **Multimodal support:** images, files, and audio in the payload
- **Vision agents:** Sofia (grok-4.1), Spectra (qwen3-vl)

### 3. Telegram Gateway updates

- **STT integration:** automatic transcription of voice messages
- **Vision integration:** photo processing
- **OCR integration:** text extraction from images

## 📝 Changes for INFRASTRUCTURE.md

Add to the "Services" section:

## 🎤 Multimodal Services (НОДА2)

### STT Service - Speech-to-Text
- **URL:** http://192.168.1.244:8895
- **Technology:** OpenAI Whisper AI (base model)
- **Functions:**
  - Voice → Text transcription
  - Ukrainian, English, Russian support
  - Auto-transcription for Telegram bots
- **Endpoints:**
  - POST /api/stt - Transcribe base64 audio
  - POST /api/stt/upload - Upload audio file
  - GET /health - Health check
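
A minimal client sketch for `POST /api/stt`, using only the Python standard library. The JSON field names `audio` and `language` are assumptions: the endpoint list above only says the body carries base64 audio, so check `services/stt-service/README.md` for the real schema before relying on this.

```python
import base64
import json
import urllib.request
from typing import Any

STT_URL = "http://192.168.1.244:8895"  # STT Service on НОДА2 (from the docs above)


def build_stt_request(audio_bytes: bytes, language: str = "uk") -> urllib.request.Request:
    """Build (but do not send) a POST /api/stt request with base64-encoded audio.

    ASSUMPTION: the service expects {"audio": <base64 string>, "language": <code>};
    the real field names may differ.
    """
    body = {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
    }
    return urllib.request.Request(
        f"{STT_URL}/api/stt",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio_bytes: bytes, language: str = "uk", timeout: float = 60.0) -> Any:
    """Send the request and return the decoded JSON response (shape not documented here)."""
    req = build_stt_request(audio_bytes, language)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload logic testable without network access; `transcribe(open("voice.ogg", "rb").read())` would perform the actual call.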

### OCR Service - Text Extraction
- **URL:** http://192.168.1.244:8896
- **Technology:** Tesseract + EasyOCR
- **Functions:**
  - Image → Text extraction
  - Bounding-box detection
  - Multi-language support (uk, en, ru, pl, de, fr)
  - Confidence scores
- **Endpoints:**
  - POST /api/ocr - Extract text from base64 image
  - POST /api/ocr/upload - Upload image file
  - GET /health - Health check
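
For `POST /api/ocr/upload`, a file upload is normally sent as `multipart/form-data`. The sketch below encodes one file with the standard library only; the form field name `file` is an assumption (a common convention, not confirmed by this document), and the service may expect extra fields such as the OCR languages.

```python
import mimetypes
import urllib.request
import uuid

OCR_URL = "http://192.168.1.244:8896"  # OCR Service on НОДА2 (from the docs above)


def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex  # random boundary, practically never present in the payload
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_ocr_upload(image_bytes: bytes, filename: str = "scan.png") -> urllib.request.Request:
    """Build (but do not send) a POST /api/ocr/upload request.

    ASSUMPTION: the form field is named "file".
    """
    body, content_type = build_multipart("file", filename, image_bytes)
    return urllib.request.Request(
        f"{OCR_URL}/api/ocr/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_ocr_upload(data), timeout=60)`; the JSON response with text, bounding boxes, and confidence scores is not specified here.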

### Web Search Service
- **URL:** http://192.168.1.244:8897
- **Technology:** DuckDuckGo + Google Search
- **Functions:**
  - Real-time web search
  - Region-specific search (ua-uk, us-en)
  - JSON structured results
  - 10+ results per query
- **Endpoints:**
  - POST /api/search - Search with JSON body
  - GET /api/search?query=... - Search with query params
  - GET /health - Health check
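
The query-parameter form `GET /api/search?query=...` needs proper URL encoding, especially for Cyrillic queries. A small sketch, where `query` is the documented parameter and `region`/`limit` are assumed names for the region and result-count options:

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

SEARCH_URL = "http://192.168.1.244:8897"  # Web Search Service on НОДА2 (from the docs above)


def build_search_url(query: str, region: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Build GET /api/search?query=... with URL-encoded parameters.

    `query` is documented; `region` and `limit` are ASSUMED parameter names.
    """
    params = {"query": query}
    if region is not None:
        params["region"] = region  # e.g. "ua-uk" or "us-en"
    if limit is not None:
        params["limit"] = str(limit)
    # urlencode percent-encodes UTF-8 text and turns spaces into "+"
    return f"{SEARCH_URL}/api/search?{urllib.parse.urlencode(params)}"


def search(query: str, region: Optional[str] = None, limit: Optional[int] = None,
           timeout: float = 15.0) -> Any:
    """Send the GET request and return the decoded JSON response."""
    url = build_search_url(query, region, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```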

### Vector DB Service - Knowledge Base
- **URL:** http://192.168.1.244:8898
- **Technology:** ChromaDB + Sentence Transformers
- **Functions:**
  - Vector database for documents
  - Semantic search
  - Document embeddings (all-MiniLM-L6-v2)
  - RAG (Retrieval-Augmented Generation) support
- **Endpoints:**
  - POST /api/collections - Create collection
  - GET /api/collections - List collections
  - POST /api/documents - Add documents
  - POST /api/search - Semantic search
  - DELETE /api/documents - Delete documents
  - GET /health - Health check
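
The Vector DB endpoints map onto a simple RAG flow: create a collection, add documents, run a semantic search, and fold the hits into a prompt. The sketch below only builds the requests; the body field names (`name`, `collection`, `ids`, `documents`, `query`, `n_results`) are assumptions modelled on ChromaDB's own Python API, not the documented schema of this service.

```python
import json
import urllib.request
from typing import Any, Optional

VECTOR_DB_URL = "http://192.168.1.244:8898"  # Vector DB Service on НОДА2 (from the docs above)


def _json_request(method: str, path: str,
                  body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request, JSON-encoding the body if one is given."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(f"{VECTOR_DB_URL}{path}", data=data,
                                  headers=headers, method=method)


# ASSUMPTION: field names below mirror ChromaDB's API; the service may differ.
def create_collection(name: str) -> urllib.request.Request:
    return _json_request("POST", "/api/collections", {"name": name})


def list_collections() -> urllib.request.Request:
    return _json_request("GET", "/api/collections")


def add_documents(collection: str, ids: list[str], documents: list[str]) -> urllib.request.Request:
    return _json_request("POST", "/api/documents",
                         {"collection": collection, "ids": ids, "documents": documents})


def semantic_search(collection: str, query: str, n_results: int = 3) -> urllib.request.Request:
    return _json_request("POST", "/api/search",
                         {"collection": collection, "query": query, "n_results": n_results})


def delete_documents(collection: str, ids: list[str]) -> urllib.request.Request:
    return _json_request("DELETE", "/api/documents", {"collection": collection, "ids": ids})


def rag_system_prompt(snippets: list[str]) -> str:
    """Fold retrieved snippets into a numbered context block (the "augmented" step of RAG)."""
    context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return f"Answer using only the context below.\n\n{context}"
```

The string returned by `rag_system_prompt` is the kind of value that could go into the Router's `payload.context.system_prompt`.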

---

## 🔄 Router Multimodal Support (NODE1)

### Enhanced /route endpoint
- **URL:** http://144.76.224.179:9102/route
- **New Payload Structure:**

```json
{
  "agent": "sofia",
  "message": "Analyze this image",
  "mode": "chat",
  "payload": {
    "context": {
      "system_prompt": "...",
      "images": ["data:image/png;base64,..."],
      "files": [{"name": "doc.pdf", "data": "..."}],
      "audio": "data:audio/webm;base64,..."
    }
  }
}
```

### Vision Agents

- **Sofia** (grok-4.1, xAI): vision + code + files
- **Spectra** (qwen3-vl:latest, Ollama): vision + language

**Features:**

- 📷 Image processing (PIL)
- 📎 File processing (PDF, TXT, MD)
- 🎤 Audio transcription (via the STT Service)
- 🌐 Web search integration
- 📚 Knowledge Base / RAG
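
A sketch of a client that assembles the `/route` payload in the structure shown in the JSON example, encoding media as base64 data URLs. The HTTP method (POST), the default MIME types, and omitting `files` (its `data` format is not specified) are assumptions.

```python
import base64
import json
import urllib.request
from typing import Optional

ROUTER_URL = "http://144.76.224.179:9102/route"  # Router on NODE1 (from the docs above)


def to_data_url(raw: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data URL, e.g. "data:image/png;base64,..."."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode('ascii')}"


def from_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL back into (mime type, raw bytes)."""
    header, _, b64 = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return header[len("data:"):-len(";base64")], base64.b64decode(b64)


def build_route_payload(agent: str, message: str, system_prompt: str = "",
                        images: Optional[list[bytes]] = None, image_mime: str = "image/png",
                        audio: Optional[bytes] = None, audio_mime: str = "audio/webm") -> dict:
    """Assemble the /route body: agent, message, mode, payload.context.{...}."""
    context: dict = {"system_prompt": system_prompt}
    if images:
        context["images"] = [to_data_url(img, image_mime) for img in images]
    if audio is not None:
        context["audio"] = to_data_url(audio, audio_mime)
    return {"agent": agent, "message": message, "mode": "chat",
            "payload": {"context": context}}


def build_route_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the request; ASSUMES the Router accepts POST JSON."""
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

`from_data_url` shows the reverse step a server would need, for example to hand the decoded audio to the STT Service.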

## 📱 Telegram Gateway Updates

**Enhanced features:**

- 🎤 Voice messages → automatic transcription via the STT Service
- 📷 Photos → vision analysis via Sofia/Spectra
- 📎 Documents → text extraction via OCR/Parser
- 🌐 Web search → real-time search results

**Workflow:**

```
Telegram Bot → Voice/Photo/File
    ↓
Gateway → STT/OCR/Parser Service
    ↓
Router → Vision/LLM Agent
    ↓
Response → Telegram Bot
```
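
The workflow above amounts to a dispatch on the type of incoming message. A hypothetical sketch: the `voice`, `photo`, `document`, and `text` keys are real fields of a Telegram Bot API `Message` object, but the step labels and pipeline table are illustrative only (the separate "Parser" path for documents is not modelled).

```python
# HYPOTHETICAL step labels; the gateway's real internals are not documented here.
PIPELINES: dict[str, tuple[str, ...]] = {
    "text": ("router",),
    "voice": ("stt", "router"),      # STT transcript becomes the message text
    "photo": ("router:vision",),     # image goes to payload.context.images (Sofia/Spectra)
    "document": ("ocr", "router"),   # extracted text becomes the message text
}


def detect_kind(message: dict) -> str:
    """Classify a Telegram Bot API Message by the content field it carries."""
    for kind in ("voice", "photo", "document", "text"):
        if kind in message:
            return kind
    raise ValueError("unsupported message type")


def pipeline_for(message: dict) -> tuple[str, ...]:
    """Return the ordered processing steps for one incoming message."""
    return PIPELINES[detect_kind(message)]
```

A photo with a caption carries `photo` and `caption` but no `text`, so it is still routed to the vision path.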

## 📊 Service Ports Summary

| Service | Port | Node | Technology | Status |
|---|---|---|---|---|
| Frontend | 8899 | Local | React + Vite | |
| STT Service | 8895 | НОДА2 | Whisper AI | |
| OCR Service | 8896 | НОДА2 | Tesseract + EasyOCR | |
| Web Search | 8897 | НОДА2 | DuckDuckGo + Google | |
| Vector DB | 8898 | НОДА2 | ChromaDB | |
| Router | 9102 | NODE1 | FastAPI + Ollama | Multimodal |
| Telegram Gateway | 9200 | NODE1 | FastAPI + NATS | Enhanced |
| Swapper NODE1 | 8890 | NODE1 | LLM Manager | |
| Swapper NODE2 | 8890 | НОДА2 | LLM Manager | |

## 🌐 Network Configuration

**НОДА2 → NODE1 communication:**

```bash
# Multimodal Services accessible from NODE1
STT_SERVICE_URL=http://192.168.1.244:8895
OCR_SERVICE_URL=http://192.168.1.244:8896
WEB_SEARCH_URL=http://192.168.1.244:8897
VECTOR_DB_URL=http://192.168.1.244:8898
```

**Firewall rules (НОДА2):**

```bash
sudo ufw allow 8895/tcp  # STT Service
sudo ufw allow 8896/tcp  # OCR Service
sudo ufw allow 8897/tcp  # Web Search
sudo ufw allow 8898/tcp  # Vector DB
```
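
Once the firewall is open, a quick reachability check from NODE1 can read the same environment variables and probe each service's documented `/health` endpoint. A sketch, assuming a healthy service answers `GET /health` with HTTP 200; the defaults are the URLs listed above.

```python
import http.client
import os
import urllib.request
from typing import Mapping

# Env var name and default base URL per service, as in "Network Configuration"
SERVICES = {
    "STT": ("STT_SERVICE_URL", "http://192.168.1.244:8895"),
    "OCR": ("OCR_SERVICE_URL", "http://192.168.1.244:8896"),
    "Web Search": ("WEB_SEARCH_URL", "http://192.168.1.244:8897"),
    "Vector DB": ("VECTOR_DB_URL", "http://192.168.1.244:8898"),
}


def service_urls(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve each base URL, preferring the environment variable over the default."""
    return {name: env.get(var, default).rstrip("/") for name, (var, default) in SERVICES.items()}


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts are all OSError subclasses
        return False
```

Usage: `for name, url in service_urls().items(): print(name, url, is_healthy(url))`. A failure here points at the firewall rules or the network path between the nodes rather than at the services themselves.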

## 🚀 Deployment Status

**Completed:**

- Frontend multimodal UI (Enhanced Chat)
- STT Service (Whisper AI)
- OCR Service (Tesseract + EasyOCR)
- Web Search Service (DuckDuckGo + Google)
- Vector DB Service (ChromaDB)
- Router multimodal code prepared

**🔄 In progress:**

- Router multimodal integration on NODE1
- Telegram Gateway STT integration
- Telegram Gateway Vision integration
- Network setup (firewall, SSH tunnels)
- End-to-end testing

**📅 Timeline:**

- Code ready: 2025-11-23
- Integration: ~7-9 hours
- Expected completion: TBD

## 📖 Documentation

**Created files:**

1. `COMPLETE-MULTIMODAL-ECOSYSTEM.md`: full ecosystem overview
2. `MULTIMODAL-IMPLEMENTATION-COMPLETE.md`: implementation details
3. `ROUTER-MULTIMODAL-SUPPORT.md`: Router documentation
4. `ROUTER-MULTIMODAL-INTEGRATION-GUIDE.md`: integration guide
5. `NODE1-MULTIMODAL-SERVICES-STATUS.md`: current status
6. `services/stt-service/README.md`: STT documentation
7. `services/ocr-service/`: OCR service files
8. `services/web-search-service/`: Web Search service files
9. `services/vector-db-service/`: Vector DB service files

**Integration scripts:**

- `services/router-multimodal/router_multimodal.py`: Router integration code
- `services/*/docker-compose.yml`: Docker deployment configs

## 🎯 Key Features

**For users:**

- 🎤 Voice to text: speak and get a transcript
- 📷 Image analysis: upload images for AI analysis
- 📷 OCR: extract text from images and scans
- 📎 Document processing: upload and analyze documents
- 🌐 Web search: real-time internet search
- 📚 Knowledge Base: store and search documents

**For agents:**

- 👁️ Vision: Sofia (grok-4.1), Spectra (qwen3-vl)
- 👂 Hearing: STT (Whisper AI)
- 📖 Reading: OCR (Tesseract + EasyOCR)
- 🔍 Searching: Web Search (DuckDuckGo + Google)
- 🧠 Memory: Vector DB (ChromaDB)
- 💬 Speaking: text responses

**Related files:**

- `INFRASTRUCTURE.md` ← update this file
- `docs/infrastructure_quick_ref.ipynb` ← update this notebook
- `PROJECT_CONTEXT.md`: quick project context
- `CURSOR_WORKFLOW.md`: workflow guide

## 📞 Integration Support

- **Contact:** DAARION Development Team
- **Date:** 2025-11-23
- **Version:** 2.0.0
- **Status:** Ready for integration

**Next steps:**

1. Update INFRASTRUCTURE.md with the new sections
2. Update infrastructure_quick_ref.ipynb with the new cells
3. Deploy Router Multimodal on NODE1
4. Deploy the multimodal services on НОДА2
5. Configure network access
6. Run end-to-end tests

---

## 📝 INFRASTRUCTURE_QUICK_REF.IPYNB UPDATES

### New section (add after the existing ones):

```python
# %% [markdown]
# ## 🎤 Multimodal Services (НОДА2)

# %%
multimodal_services = {
    "STT Service": {
        "url": "http://192.168.1.244:8895",
        "technology": "OpenAI Whisper AI",
        "features": ["Voice→Text", "Ukrainian/English/Russian", "Telegram integration"],
        "endpoints": ["/api/stt", "/api/stt/upload", "/health"]
    },
    "OCR Service": {
        "url": "http://192.168.1.244:8896",
        "technology": "Tesseract + EasyOCR",
        "features": ["Image→Text", "Bounding boxes", "6 languages", "Confidence scores"],
        "endpoints": ["/api/ocr", "/api/ocr/upload", "/health"]
    },
    "Web Search": {
        "url": "http://192.168.1.244:8897",
        "technology": "DuckDuckGo + Google",
        "features": ["Real-time search", "Region-specific", "10+ results"],
        "endpoints": ["/api/search", "/health"]
    },
    "Vector DB": {
        "url": "http://192.168.1.244:8898",
        "technology": "ChromaDB + Sentence Transformers",
        "features": ["Vector database", "Semantic search", "RAG support"],
        "endpoints": ["/api/collections", "/api/documents", "/api/search", "/health"]
    }
}

import pandas as pd
pd.DataFrame(multimodal_services).T

# %% [markdown]
# ## 🤖 Vision Agents (NODE1)

# %%
vision_agents = {
    "Sofia": {
        "model": "grok-4.1",
        "provider": "xAI",
        "supports_vision": True,
        "supports_files": True,
        "description": "Vision + Code analysis"
    },
    "Spectra": {
        "model": "qwen3-vl:latest",
        "provider": "Ollama",
        "supports_vision": True,
        "supports_files": False,
        "description": "Vision + Language"
    }
}

pd.DataFrame(vision_agents).T

# %% [markdown]
# ## 📊 Ports of All Services

# %%
all_ports = {
    "Frontend": {"port": 8899, "node": "Local", "status": "✅"},
    "STT Service": {"port": 8895, "node": "НОДА2", "status": "✅"},
    "OCR Service": {"port": 8896, "node": "НОДА2", "status": "✅"},
    "Web Search": {"port": 8897, "node": "НОДА2", "status": "✅"},
    "Vector DB": {"port": 8898, "node": "НОДА2", "status": "✅"},
    "Router": {"port": 9102, "node": "NODE1", "status": "✅ Multimodal"},
    "Telegram Gateway": {"port": 9200, "node": "NODE1", "status": "✅ Enhanced"},
    "Swapper NODE1": {"port": 8890, "node": "NODE1", "status": "✅"},
    "Swapper NODE2": {"port": 8890, "node": "НОДА2", "status": "✅"}
}

pd.DataFrame(all_ports).T

# %% [markdown]
# ## 🔄 Multimodal Capabilities

# %%
multimodal_capabilities = {
    "Text": {"frontend": "✅", "telegram": "✅", "status": "WORKING"},
    "Voice→Text": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Image→Vision": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Image→OCR": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Documents": {"frontend": "✅", "telegram": "⚠️", "status": "PARTIAL"},
    "Web search": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Knowledge Base": {"frontend": "✅", "telegram": "❌", "status": "READY"}
}

pd.DataFrame(multimodal_capabilities).T

# %% [markdown]
# ## 📅 Version and Updates

# %%
version_info = {
    "version": "2.0.0",
    "date": "2025-11-23",
    "major_changes": [
        "Added STT Service (Whisper AI)",
        "Added OCR Service (Tesseract + EasyOCR)",
        "Added Web Search Service",
        "Added Vector DB Service (ChromaDB)",
        "Extended Router with multimodal support",
        "Updated Telegram Gateway with STT/Vision"
    ],
    "integration_status": "🔄 In progress (~7-9 hours)"
}

print("Version:", version_info["version"])
print("Date:", version_info["date"])
print("\nMajor changes:")
for change in version_info["major_changes"]:
    print(f"  • {change}")
print(f"\nStatus: {version_info['integration_status']}")
```

## Done

The documentation is ready for updating the startup context files!