# 📋 Update for INFRASTRUCTURE.md and infrastructure_quick_ref.ipynb
**Date:** 2025-11-23
**Goal:** Add information about the multimodal services to the startup context files
---
## 🎯 WHAT TO ADD
### 1. New services on NODE2:
- **STT Service** (:8895) - Speech-to-Text (Whisper AI)
- **OCR Service** (:8896) - Optical Character Recognition (Tesseract + EasyOCR)
- **Web Search Service** (:8897) - Web Search (DuckDuckGo + Google)
- **Vector DB Service** (:8898) - Vector Database (ChromaDB)
### 2. Router updates (NODE1):
- **Multimodal Support** - support for images/files/audio in the payload
- **Vision Agents** - Sofia (grok-4.1), Spectra (qwen3-vl)
### 3. Telegram Gateway updates:
- **STT Integration** - automatic voice transcription
- **Vision Integration** - photo processing
- **OCR Integration** - text extraction from images
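
The three gateway integrations above amount to picking a preprocessing step by message type. A minimal sketch of that choice, assuming hypothetical kind labels (`voice`, `photo`, `document`) and a hypothetical helper `pick_preprocessor`; routing documents to OCR is an assumption made for illustration, and the actual gateway code may be structured differently:

```python
from typing import Optional

# Hypothetical labels for incoming Telegram content. The voice -> STT and
# photo -> Vision pairs follow the integrations listed above; sending
# documents to OCR is an assumption for illustration only.
PREPROCESSOR_BY_KIND = {
    "voice": "stt",
    "photo": "vision",
    "document": "ocr",
}


def pick_preprocessor(kind: str) -> Optional[str]:
    """Return the preprocessing step for a message kind, or None for plain text."""
    return PREPROCESSOR_BY_KIND.get(kind)


print(pick_preprocessor("voice"))  # stt
print(pick_preprocessor("text"))   # None
```

Keeping the mapping in one table means adding a new message type is a one-line change rather than another branch in the handler.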
---
## 📝 CHANGES FOR INFRASTRUCTURE.md
### Add to the "Services" section:
```markdown
## 🎤 Multimodal Services (NODE2)
### STT Service - Speech-to-Text
- **URL:** http://192.168.1.244:8895
- **Technology:** OpenAI Whisper AI (base model)
- **Functions:**
  - Voice → Text transcription
  - Ukrainian, English, Russian support
  - Auto-transcription for Telegram bots
- **Endpoints:**
  - POST /api/stt - Transcribe base64 audio
  - POST /api/stt/upload - Upload audio file
  - GET /health - Health check
### OCR Service - Text Extraction
- **URL:** http://192.168.1.244:8896
- **Technology:** Tesseract + EasyOCR
- **Functions:**
  - Image → Text extraction
  - Bounding box detection
  - Multi-language support (uk, en, ru, pl, de, fr)
  - Confidence scores
- **Endpoints:**
  - POST /api/ocr - Extract text from base64 image
  - POST /api/ocr/upload - Upload image file
  - GET /health - Health check
### Web Search Service
- **URL:** http://192.168.1.244:8897
- **Technology:** DuckDuckGo + Google Search
- **Functions:**
  - Real-time web search
  - Region-specific search (ua-uk, us-en)
  - JSON structured results
  - 10+ results per query
- **Endpoints:**
  - POST /api/search - Search with JSON body
  - GET /api/search?query=... - Search with query params
  - GET /health - Health check
### Vector DB Service - Knowledge Base
- **URL:** http://192.168.1.244:8898
- **Technology:** ChromaDB + Sentence Transformers
- **Functions:**
  - Vector database for documents
  - Semantic search
  - Document embeddings (all-MiniLM-L6-v2)
  - RAG (Retrieval-Augmented Generation) support
- **Endpoints:**
  - POST /api/collections - Create collection
  - GET /api/collections - List collections
  - POST /api/documents - Add documents
  - POST /api/search - Semantic search
  - DELETE /api/documents - Delete documents
  - GET /health - Health check
---
## 🔄 Router Multimodal Support (NODE1)
### Enhanced /route endpoint
- **URL:** http://144.76.224.179:9102/route
- **New Payload Structure:**
```json
{
  "agent": "sofia",
  "message": "Analyze this image",
  "mode": "chat",
  "payload": {
    "context": {
      "system_prompt": "...",
      "images": ["data:image/png;base64,..."],
      "files": [{"name": "doc.pdf", "data": "..."}],
      "audio": "data:audio/webm;base64,..."
    }
  }
}
```
### Vision Agents
- **Sofia** (grok-4.1, xAI) - Vision + Code + Files
- **Spectra** (qwen3-vl:latest, Ollama) - Vision + Language
### Features:
- 📷 Image processing (PIL)
- 📎 File processing (PDF, TXT, MD)
- 🎤 Audio transcription (via STT Service)
- 🌐 Web search integration
- 📚 Knowledge Base / RAG
---
## 📱 Telegram Gateway Updates
### Enhanced Features:
- 🎤 **Voice Messages** → Auto-transcription via STT Service
- 📷 **Photos** → Vision analysis via Sofia/Spectra
- 📎 **Documents** → Text extraction via OCR/Parser
- 🌐 **Web Search** → Real-time search results
### Workflow:
```
Telegram Bot → Voice/Photo/File
Gateway → STT/OCR/Parser Service
Router → Vision/LLM Agent
Response → Telegram Bot
```
---
## 📊 Service Ports Summary
| Service | Port | Node | Technology | Status |
|---------|------|------|------------|--------|
| Frontend | 8899 | Local | React + Vite | ✅ |
| STT Service | 8895 | NODE2 | Whisper AI | ✅ |
| OCR Service | 8896 | NODE2 | Tesseract + EasyOCR | ✅ |
| Web Search | 8897 | NODE2 | DuckDuckGo + Google | ✅ |
| Vector DB | 8898 | NODE2 | ChromaDB | ✅ |
| Router | 9102 | NODE1 | FastAPI + Ollama | ✅ Multimodal |
| Telegram Gateway | 9200 | NODE1 | FastAPI + NATS | ✅ Enhanced |
| Swapper NODE1 | 8890 | NODE1 | LLM Manager | ✅ |
| Swapper NODE2 | 8890 | NODE2 | LLM Manager | ✅ |
---
## 🌐 Network Configuration
### NODE2 → NODE1 Communication:
```bash
# Multimodal Services accessible from NODE1
STT_SERVICE_URL=http://192.168.1.244:8895
OCR_SERVICE_URL=http://192.168.1.244:8896
WEB_SEARCH_URL=http://192.168.1.244:8897
VECTOR_DB_URL=http://192.168.1.244:8898
```
### Firewall Rules (NODE2):
```bash
sudo ufw allow 8895/tcp # STT Service
sudo ufw allow 8896/tcp # OCR Service
sudo ufw allow 8897/tcp # Web Search
sudo ufw allow 8898/tcp # Vector DB
```
---
## 🚀 Deployment Status
### ✅ Completed:
- [x] Frontend Multimodal UI (Enhanced Chat)
- [x] STT Service (Whisper AI)
- [x] OCR Service (Tesseract + EasyOCR)
- [x] Web Search Service (DuckDuckGo + Google)
- [x] Vector DB Service (ChromaDB)
- [x] Router Multimodal code prepared
### 🔄 In Progress:
- [ ] Router Multimodal integration on NODE1
- [ ] Telegram Gateway STT integration
- [ ] Telegram Gateway Vision integration
- [ ] Network setup (firewall, SSH tunnels)
- [ ] End-to-end testing
### 📅 Timeline:
- **Code Ready:** 2025-11-23
- **Integration:** ~7-9 hours
- **Expected Complete:** TBD
---
## 📖 Documentation
### Created Files:
1. **COMPLETE-MULTIMODAL-ECOSYSTEM.md** - Full ecosystem overview
2. **MULTIMODAL-IMPLEMENTATION-COMPLETE.md** - Implementation details
3. **ROUTER-MULTIMODAL-SUPPORT.md** - Router documentation
4. **ROUTER-MULTIMODAL-INTEGRATION-GUIDE.md** - Integration guide
5. **NODE1-MULTIMODAL-SERVICES-STATUS.md** - Current status
6. **services/stt-service/README.md** - STT documentation
7. **services/ocr-service/** - OCR service files
8. **services/web-search-service/** - Web Search files
9. **services/vector-db-service/** - Vector DB files
### Integration Scripts:
- **services/router-multimodal/router_multimodal.py** - Router integration code
- **services/*/docker-compose.yml** - Docker deployment configs
---
## 🎯 Key Features
### For Users:
- 🎤 **Voice to Text** - Speak and get transcribed
- 📷 **Image Analysis** - Upload images for AI analysis
- 📷 **OCR** - Extract text from images/scans
- 📎 **Document Processing** - Upload and analyze documents
- 🌐 **Web Search** - Real-time internet search
- 📚 **Knowledge Base** - Store and search documents
### For Agents:
- 👁️ **Vision** - Sofia (grok-4.1), Spectra (qwen3-vl)
- 👂 **Hearing** - STT (Whisper AI)
- 📖 **Reading** - OCR (Tesseract + EasyOCR)
- 🔍 **Searching** - Web Search (DuckDuckGo + Google)
- 🧠 **Memory** - Vector DB (ChromaDB)
- 💬 **Speaking** - Text responses
---
## 🔗 Related Documents
- **INFRASTRUCTURE.md** ← Update this file
- **docs/infrastructure_quick_ref.ipynb** ← Update this notebook
- **PROJECT_CONTEXT.md** - Quick project context
- **CURSOR_WORKFLOW.md** - Workflow guide
---
## 📞 Integration Support
**Contact:** DAARION Development Team
**Date:** 2025-11-23
**Version:** 2.0.0
**Status:** ✅ Ready for Integration
---
**Next Steps:**
1. Update INFRASTRUCTURE.md with new sections
2. Update infrastructure_quick_ref.ipynb with new cells
3. Deploy Router Multimodal on NODE1
4. Deploy Multimodal Services on NODE2
5. Configure network access
6. Run end-to-end tests
```
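
The `/route` payload structure in the INFRASTRUCTURE.md changes above carries images and audio as base64 data URIs. A minimal sketch of assembling such a request body in Python, using only the field names from that example; the helper names `to_data_uri` and `build_route_request` are hypothetical, and omitting unused context keys is an assumption about what the Router accepts:

```python
import base64
import json
from typing import Optional, Sequence, Tuple

Blob = Tuple[bytes, str]  # (raw bytes, MIME type)


def to_data_uri(raw: bytes, mime: str) -> str:
    """Encode raw bytes as a data URI, the form used in the payload example."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_route_request(
    agent: str,
    message: str,
    system_prompt: str = "",
    images: Sequence[Blob] = (),
    audio: Optional[Blob] = None,
) -> dict:
    """Build a request body shaped like the documented multimodal payload."""
    context: dict = {"system_prompt": system_prompt}
    if images:
        context["images"] = [to_data_uri(raw, mime) for raw, mime in images]
    if audio is not None:
        context["audio"] = to_data_uri(*audio)
    return {
        "agent": agent,
        "message": message,
        "mode": "chat",
        "payload": {"context": context},
    }


# Placeholder bytes stand in for a real PNG file.
request_body = build_route_request(
    "sofia", "Analyze this image", images=[(b"\x89PNG", "image/png")]
)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a POST to the Router's `/route` endpoint.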
---
## 📝 INFRASTRUCTURE_QUICK_REF.IPYNB UPDATES
### New section (add after the existing ones):
```python
# %% [markdown]
# ## 🎤 Multimodal Services (NODE2)
# %%
import pandas as pd

# Multimodal services on NODE2, keyed by display name.
multimodal_services = {
    "STT Service": {
        "url": "http://192.168.1.244:8895",
        "technology": "OpenAI Whisper AI",
        "features": ["Voice→Text", "Ukrainian/English/Russian", "Telegram integration"],
        "endpoints": ["/api/stt", "/api/stt/upload", "/health"]
    },
    "OCR Service": {
        "url": "http://192.168.1.244:8896",
        "technology": "Tesseract + EasyOCR",
        "features": ["Image→Text", "Bounding boxes", "6 languages", "Confidence scores"],
        "endpoints": ["/api/ocr", "/api/ocr/upload", "/health"]
    },
    "Web Search": {
        "url": "http://192.168.1.244:8897",
        "technology": "DuckDuckGo + Google",
        "features": ["Real-time search", "Region-specific", "10+ results"],
        "endpoints": ["/api/search", "/health"]
    },
    "Vector DB": {
        "url": "http://192.168.1.244:8898",
        "technology": "ChromaDB + Sentence Transformers",
        "features": ["Vector database", "Semantic search", "RAG support"],
        "endpoints": ["/api/collections", "/api/documents", "/api/search", "/health"]
    }
}
# Transpose so that each service becomes a row.
pd.DataFrame(multimodal_services).T
# %% [markdown]
# ## 🤖 Vision Agents (NODE1)
# %%
vision_agents = {
    "Sofia": {
        "model": "grok-4.1",
        "provider": "xAI",
        "supports_vision": True,
        "supports_files": True,
        "description": "Vision + Code analysis"
    },
    "Spectra": {
        "model": "qwen3-vl:latest",
        "provider": "Ollama",
        "supports_vision": True,
        "supports_files": False,
        "description": "Vision + Language"
    }
}
pd.DataFrame(vision_agents).T
# %% [markdown]
# ## 📊 Ports of all services
# %%
all_ports = {
    "Frontend": {"port": 8899, "node": "Local", "status": "✅"},
    "STT Service": {"port": 8895, "node": "NODE2", "status": "✅"},
    "OCR Service": {"port": 8896, "node": "NODE2", "status": "✅"},
    "Web Search": {"port": 8897, "node": "NODE2", "status": "✅"},
    "Vector DB": {"port": 8898, "node": "NODE2", "status": "✅"},
    "Router": {"port": 9102, "node": "NODE1", "status": "✅ Multimodal"},
    "Telegram Gateway": {"port": 9200, "node": "NODE1", "status": "✅ Enhanced"},
    "Swapper NODE1": {"port": 8890, "node": "NODE1", "status": "✅"},
    "Swapper NODE2": {"port": 8890, "node": "NODE2", "status": "✅"}
}
pd.DataFrame(all_ports).T
# %% [markdown]
# ## 🔄 Multimodal capabilities
# %%
multimodal_capabilities = {
    "Text": {"frontend": "✅", "telegram": "✅", "status": "WORKING"},
    "Voice→Text": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Image→Vision": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Image→OCR": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Documents": {"frontend": "✅", "telegram": "⚠️", "status": "PARTIAL"},
    "Web search": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATION"},
    "Knowledge Base": {"frontend": "✅", "telegram": "❌", "status": "READY"}
}
pd.DataFrame(multimodal_capabilities).T
# %% [markdown]
# ## 📅 Version and updates
# %%
version_info = {
    "version": "2.0.0",
    "date": "2025-11-23",
    "major_changes": [
        "Added STT Service (Whisper AI)",
        "Added OCR Service (Tesseract + EasyOCR)",
        "Added Web Search Service",
        "Added Vector DB Service (ChromaDB)",
        "Extended Router with multimodal support",
        "Updated Telegram Gateway with STT/Vision"
    ],
    "integration_status": "🔄 In progress (~7-9 hours)"
}
print("Version:", version_info["version"])
print("Date:", version_info["date"])
print("\nMajor changes:")
# The loop body must be indented, otherwise the cell fails with an IndentationError.
for change in version_info["major_changes"]:
    print(f" • {change}")
print(f"\nStatus: {version_info['integration_status']}")
```
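
Every service in the notebook data lists `/health` among its endpoints, so the same base URLs can drive a quick availability check. A minimal sketch using only the standard library; `health_url` and `is_healthy` are hypothetical helper names, and treating HTTP 200 as "healthy" is an assumption, since the response format of `/health` is not specified here:

```python
import urllib.request

# Base URLs copied from the multimodal_services cell above.
SERVICE_URLS = {
    "STT Service": "http://192.168.1.244:8895",
    "OCR Service": "http://192.168.1.244:8896",
    "Web Search": "http://192.168.1.244:8897",
    "Vector DB": "http://192.168.1.244:8898",
}


def health_url(base_url: str) -> str:
    """Append the /health path that each service lists among its endpoints."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
```

Run from a host that can reach NODE2, `{name: is_healthy(url) for name, url in SERVICE_URLS.items()}` would give a per-service status map that could replace the hard-coded ✅ values.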
---
## ✅ DONE
The documentation is ready for updating the startup context files!