# 📋 Updating INFRASTRUCTURE.md and infrastructure_quick_ref.ipynb
Date: 2025-11-23
Goal: Add information about the multimodal services to the starter context files
## 🎯 WHAT TO ADD
1. New services on NODE2:
- STT Service (:8895) - Speech-to-Text (Whisper AI)
- OCR Service (:8896) - Optical Character Recognition (Tesseract + EasyOCR)
- Web Search Service (:8897) - Web Search (DuckDuckGo + Google)
- Vector DB Service (:8898) - Vector Database (ChromaDB)
2. Router updates (NODE1):
- Multimodal Support - images/files/audio supported in the payload
- Vision Agents - Sofia (grok-4.1), Spectra (qwen3-vl)
3. Telegram Gateway updates:
- STT Integration - automatic voice transcription
- Vision Integration - photo processing
- OCR Integration - text extraction from images
## 📝 CHANGES FOR INFRASTRUCTURE.md
Add to the "Services" section:
## 🎤 Multimodal Services (NODE2)
### STT Service - Speech-to-Text
- **URL:** http://192.168.1.244:8895
- **Technology:** OpenAI Whisper AI (base model)
- **Functions:**
  - Voice → Text transcription
  - Ukrainian, English, Russian support
  - Auto-transcription for Telegram bots
- **Endpoints:**
  - POST /api/stt - Transcribe base64 audio
  - POST /api/stt/upload - Upload audio file
  - GET /health - Health check
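
As a rough illustration of the `POST /api/stt` endpoint above, the sketch below builds (without sending) a JSON request that carries base64-encoded audio. The JSON field names `audio` and `language` and the helper `build_stt_request` are assumptions made for this example; the real request schema should be checked in `services/stt-service/README.md`.

```python
import base64
import json
import urllib.request

STT_URL = "http://192.168.1.244:8895/api/stt"  # URL and path from the service entry above


def build_stt_request(audio_bytes: bytes, language: str = "uk") -> urllib.request.Request:
    """Build (but do not send) a POST request with base64-encoded audio.

    The field names "audio" and "language" are assumptions, not the
    documented schema of the STT service.
    """
    body = json.dumps({
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
    }).encode("utf-8")
    return urllib.request.Request(
        STT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request needs network access to NODE2:
# with open("voice.ogg", "rb") as f:
#     request = build_stt_request(f.read())
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the encoding step easy to check without a running service.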
### OCR Service - Text Extraction
- **URL:** http://192.168.1.244:8896
- **Technology:** Tesseract + EasyOCR
- **Functions:**
  - Image → Text extraction
  - Bounding boxes detection
  - Multi-language support (uk, en, ru, pl, de, fr)
  - Confidence scores
- **Endpoints:**
  - POST /api/ocr - Extract text from base64 image
  - POST /api/ocr/upload - Upload image file
  - GET /health - Health check
### Web Search Service
- **URL:** http://192.168.1.244:8897
- **Technology:** DuckDuckGo + Google Search
- **Functions:**
  - Real-time web search
  - Region-specific search (ua-uk, us-en)
  - JSON structured results
  - 10+ results per query
- **Endpoints:**
  - POST /api/search - Search with JSON body
  - GET /api/search?query=... - Search with query params
  - GET /health - Health check
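
The `GET /api/search?query=...` form above can be assembled with the standard library so that spaces and non-ASCII characters are escaped correctly. This is a minimal sketch: `query` is the parameter named in the endpoint list, while the `region` parameter name and the helper `build_search_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

WEB_SEARCH_URL = "http://192.168.1.244:8897/api/search"  # from the service entry above


def build_search_url(query: str, region: str = "") -> str:
    """Return a GET URL for the Web Search service with an escaped query string."""
    params = {"query": query}
    if region:
        # Assumed parameter name; the region codes (ua-uk, us-en) come from the list above.
        params["region"] = region
    return f"{WEB_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("DAARION router", region="ua-uk"))
```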
### Vector DB Service - Knowledge Base
- **URL:** http://192.168.1.244:8898
- **Technology:** ChromaDB + Sentence Transformers
- **Functions:**
  - Vector database for documents
  - Semantic search
  - Document embeddings (all-MiniLM-L6-v2)
  - RAG (Retrieval-Augmented Generation) support
- **Endpoints:**
  - POST /api/collections - Create collection
  - GET /api/collections - List collections
  - POST /api/documents - Add documents
  - POST /api/search - Semantic search
  - DELETE /api/documents - Delete documents
  - GET /health - Health check
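
To make "semantic search" concrete, the sketch below ranks documents by cosine similarity between embedding vectors. Cosine similarity is one common choice; the metric this service actually uses is not stated here. The document ids and the 3-dimensional vectors are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float], docs: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the ids of the top_k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine_similarity(query_vec, docs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy embeddings, not produced by a real model.
toy_docs = {
    "router-guide": [0.9, 0.1, 0.0],
    "stt-readme": [0.1, 0.9, 0.1],
    "ocr-notes": [0.0, 0.2, 0.9],
}
print(semantic_search([0.8, 0.2, 0.0], toy_docs))
```

In a RAG setup, the text of the top-ranked documents would then be placed into the prompt sent to the agent.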
---
## 🔄 Router Multimodal Support (NODE1)
### Enhanced /route endpoint
- **URL:** http://144.76.224.179:9102/route
- **New Payload Structure:**
```json
{
  "agent": "sofia",
  "message": "Analyze this image",
  "mode": "chat",
  "payload": {
    "context": {
      "system_prompt": "...",
      "images": ["data:image/png;base64,..."],
      "files": [{"name": "doc.pdf", "data": "..."}],
      "audio": "data:audio/webm;base64,..."
    }
  }
}
```
### Vision Agents
- Sofia (grok-4.1, xAI) - Vision + Code + Files
- Spectra (qwen3-vl:latest, Ollama) - Vision + Language
Features:
- 📷 Image processing (PIL)
- 📎 File processing (PDF, TXT, MD)
- 🎤 Audio transcription (via STT Service)
- 🌐 Web search integration
- 📚 Knowledge Base / RAG
## 📱 Telegram Gateway Updates
Enhanced Features:
- 🎤 Voice Messages → Auto-transcription via STT Service
- 📷 Photos → Vision analysis via Sofia/Spectra
- 📎 Documents → Text extraction via OCR/Parser
- 🌐 Web Search → Real-time search results
Workflow:
1. Telegram Bot → Voice/Photo/File
2. Gateway → STT/OCR/Parser Service
3. Router → Vision/LLM Agent
4. Response → Telegram Bot
## 📊 Service Ports Summary
| Service | Port | Node | Technology | Status |
|---|---|---|---|---|
| Frontend | 8899 | Local | React + Vite | ✅ |
| STT Service | 8895 | NODE2 | Whisper AI | ✅ |
| OCR Service | 8896 | NODE2 | Tesseract + EasyOCR | ✅ |
| Web Search | 8897 | NODE2 | DuckDuckGo + Google | ✅ |
| Vector DB | 8898 | NODE2 | ChromaDB | ✅ |
| Router | 9102 | NODE1 | FastAPI + Ollama | ✅ Multimodal |
| Telegram Gateway | 9200 | NODE1 | FastAPI + NATS | ✅ Enhanced |
| Swapper NODE1 | 8890 | NODE1 | LLM Manager | ✅ |
| Swapper NODE2 | 8890 | NODE2 | LLM Manager | ✅ |
## 🌐 Network Configuration
NODE2 → NODE1 Communication:

    # Multimodal Services accessible from NODE1
    STT_SERVICE_URL=http://192.168.1.244:8895
    OCR_SERVICE_URL=http://192.168.1.244:8896
    WEB_SEARCH_URL=http://192.168.1.244:8897
    VECTOR_DB_URL=http://192.168.1.244:8898

Firewall Rules (NODE2):

    sudo ufw allow 8895/tcp # STT Service
    sudo ufw allow 8896/tcp # OCR Service
    sudo ufw allow 8897/tcp # Web Search
    sudo ufw allow 8898/tcp # Vector DB
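
Once the ports are open, reachability from NODE1 can be checked against each service's `GET /health` endpoint. The following is a minimal sketch that treats any HTTP 200 response as healthy, which is an assumption about the endpoint's behavior; the helper names are hypothetical.

```python
import urllib.request

NODE2_HOST = "192.168.1.244"
SERVICE_PORTS = {"STT": 8895, "OCR": 8896, "Web Search": 8897, "Vector DB": 8898}


def health_url(port: int, host: str = NODE2_HOST) -> str:
    """Return the /health URL for a service listening on the given port."""
    return f"http://{host}:{port}/health"


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, timeouts, and refused connections are all OSError subclasses.
        return False


# Run from NODE1 (needs network access to NODE2):
# for name, port in SERVICE_PORTS.items():
#     print(name, "OK" if is_healthy(health_url(port)) else "UNREACHABLE")
```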
## 🚀 Deployment Status
✅ Completed:
- Frontend Multimodal UI (Enhanced Chat)
- STT Service (Whisper AI)
- OCR Service (Tesseract + EasyOCR)
- Web Search Service (DuckDuckGo + Google)
- Vector DB Service (ChromaDB)
- Router Multimodal code prepared
🔄 In Progress:
- Router Multimodal integration on NODE1
- Telegram Gateway STT integration
- Telegram Gateway Vision integration
- Network setup (firewall, SSH tunnels)
- End-to-end testing
📅 Timeline:
- Code Ready: 2025-11-23
- Integration: ~7-9 hours
- Expected Complete: TBD
## 📖 Documentation
Created Files:
- COMPLETE-MULTIMODAL-ECOSYSTEM.md - Full ecosystem overview
- MULTIMODAL-IMPLEMENTATION-COMPLETE.md - Implementation details
- ROUTER-MULTIMODAL-SUPPORT.md - Router documentation
- ROUTER-MULTIMODAL-INTEGRATION-GUIDE.md - Integration guide
- NODE1-MULTIMODAL-SERVICES-STATUS.md - Current status
- services/stt-service/README.md - STT documentation
- services/ocr-service/ - OCR service files
- services/web-search-service/ - Web Search files
- services/vector-db-service/ - Vector DB files
Integration Scripts:
- services/router-multimodal/router_multimodal.py - Router integration code
- services/*/docker-compose.yml - Docker deployment configs
## 🎯 Key Features
For Users:
- 🎤 Voice to Text - Speak and get transcribed
- 📷 Image Analysis - Upload images for AI analysis
- 📷 OCR - Extract text from images/scans
- 📎 Document Processing - Upload and analyze documents
- 🌐 Web Search - Real-time internet search
- 📚 Knowledge Base - Store and search documents
For Agents:
- 👁️ Vision - Sofia (grok-4.1), Spectra (qwen3-vl)
- 👂 Hearing - STT (Whisper AI)
- 📖 Reading - OCR (Tesseract + EasyOCR)
- 🔍 Searching - Web Search (DuckDuckGo + Google)
- 🧠 Memory - Vector DB (ChromaDB)
- 💬 Speaking - Text responses
## 🔗 Related Documents
- INFRASTRUCTURE.md ← Update this file
- docs/infrastructure_quick_ref.ipynb ← Update this notebook
- PROJECT_CONTEXT.md - Quick project context
- CURSOR_WORKFLOW.md - Workflow guide
## 📞 Integration Support
Contact: DAARION Development Team
Date: 2025-11-23
Version: 2.0.0
Status: ✅ Ready for Integration
Next Steps:
- Update INFRASTRUCTURE.md with new sections
- Update infrastructure_quick_ref.ipynb with new cells
- Deploy Router Multimodal on NODE1
- Deploy Multimodal Services on NODE2
- Configure network access
- Run end-to-end tests
---
## 📝 INFRASTRUCTURE_QUICK_REF.IPYNB UPDATES
### New section (add after the existing ones):
```python
# %% [markdown]
# ## 🎤 Multimodal Services (NODE2)
# %%
import pandas as pd

multimodal_services = {
    "STT Service": {
        "url": "http://192.168.1.244:8895",
        "technology": "OpenAI Whisper AI",
        "features": ["Voice→Text", "Ukrainian/English/Russian", "Telegram integration"],
        "endpoints": ["/api/stt", "/api/stt/upload", "/health"],
    },
    "OCR Service": {
        "url": "http://192.168.1.244:8896",
        "technology": "Tesseract + EasyOCR",
        "features": ["Image→Text", "Bounding boxes", "6 languages", "Confidence scores"],
        "endpoints": ["/api/ocr", "/api/ocr/upload", "/health"],
    },
    "Web Search": {
        "url": "http://192.168.1.244:8897",
        "technology": "DuckDuckGo + Google",
        "features": ["Real-time search", "Region-specific", "10+ results"],
        "endpoints": ["/api/search", "/health"],
    },
    "Vector DB": {
        "url": "http://192.168.1.244:8898",
        "technology": "ChromaDB + Sentence Transformers",
        "features": ["Vector database", "Semantic search", "RAG support"],
        "endpoints": ["/api/collections", "/api/documents", "/api/search", "/health"],
    },
}

# Transpose so each service is a row and each attribute a column
pd.DataFrame(multimodal_services).T
# %% [markdown]
# ## 🤖 Vision Agents (NODE1)
# %%
vision_agents = {
    "Sofia": {
        "model": "grok-4.1",
        "provider": "xAI",
        "supports_vision": True,
        "supports_files": True,
        "description": "Vision + Code analysis",
    },
    "Spectra": {
        "model": "qwen3-vl:latest",
        "provider": "Ollama",
        "supports_vision": True,
        "supports_files": False,
        "description": "Vision + Language",
    },
}

pd.DataFrame(vision_agents).T
# %% [markdown]
# ## 📊 Ports of all services
# %%
all_ports = {
    "Frontend": {"port": 8899, "node": "Local", "status": "✅"},
    "STT Service": {"port": 8895, "node": "NODE2", "status": "✅"},
    "OCR Service": {"port": 8896, "node": "NODE2", "status": "✅"},
    "Web Search": {"port": 8897, "node": "NODE2", "status": "✅"},
    "Vector DB": {"port": 8898, "node": "NODE2", "status": "✅"},
    "Router": {"port": 9102, "node": "NODE1", "status": "✅ Multimodal"},
    "Telegram Gateway": {"port": 9200, "node": "NODE1", "status": "✅ Enhanced"},
    "Swapper NODE1": {"port": 8890, "node": "NODE1", "status": "✅"},
    "Swapper NODE2": {"port": 8890, "node": "NODE2", "status": "✅"},
}

pd.DataFrame(all_ports).T
# %% [markdown]
# ## 🔄 Multimodal capabilities
# %%
multimodal_capabilities = {
    "Text": {"frontend": "✅", "telegram": "✅", "status": "WORKING"},
    "Voice→Text": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Image→Vision": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Image→OCR": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Documents": {"frontend": "✅", "telegram": "⚠️", "status": "PARTIAL"},
    "Web search": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Knowledge Base": {"frontend": "✅", "telegram": "❌", "status": "READY"},
}

pd.DataFrame(multimodal_capabilities).T
# %% [markdown]
# ## 📅 Version and updates
# %%
version_info = {
    "version": "2.0.0",
    "date": "2025-11-23",
    "major_changes": [
        "Added STT Service (Whisper AI)",
        "Added OCR Service (Tesseract + EasyOCR)",
        "Added Web Search Service",
        "Added Vector DB Service (ChromaDB)",
        "Extended Router with multimodal support",
        "Updated Telegram Gateway with STT/Vision",
    ],
    "integration_status": "🔄 In progress (~7-9 hours)",
}

print("Version:", version_info["version"])
print("Date:", version_info["date"])
print("\nMain changes:")
for change in version_info["major_changes"]:
    print(f" • {change}")
print(f"\nStatus: {version_info['integration_status']}")
```
## ✅ DONE
The documentation is ready for updating the starter context files!