# 📋 Update INFRASTRUCTURE.md and infrastructure_quick_ref.ipynb

**Date:** 2025-11-23
**Purpose:** Add information about the multimodal services to the startup context files

---

## 🎯 WHAT TO ADD

### 1. New services on NODE2:

- **STT Service** (:8895) - Speech-to-Text (Whisper AI)
- **OCR Service** (:8896) - Optical Character Recognition (Tesseract + EasyOCR)
- **Web Search Service** (:8897) - Web Search (DuckDuckGo + Google)
- **Vector DB Service** (:8898) - Vector Database (ChromaDB)

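As a quick sanity check after adding these services, each one can be probed on its port. The sketch below is a minimal example, assuming the services run on 192.168.1.244 and expose `GET /health` as listed in the INFRASTRUCTURE.md changes in this document. The helper names `health_url`, `is_healthy`, and `check_all` are hypothetical, and treating HTTP 200 as "healthy" is an assumption about the response.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Host and ports as listed in this document for NODE2.
NODE2_HOST = "192.168.1.244"
SERVICE_PORTS = {
    "STT Service": 8895,
    "OCR Service": 8896,
    "Web Search Service": 8897,
    "Vector DB Service": 8898,
}


def health_url(port: int, host: str = NODE2_HOST) -> str:
    """Build the health-check URL for a service port."""
    return f"http://{host}:{port}/health"


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True on HTTP 200 (assumed to mean healthy), False on connection errors."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False


def check_all() -> dict:
    """Probe every service once and map its name to the result."""
    return {name: is_healthy(health_url(port)) for name, port in SERVICE_PORTS.items()}
```

Calling `check_all()` from a machine that can reach NODE2 returns one boolean per service; nothing is contacted until that function is called.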
### 2. Router updates (NODE1):

- **Multimodal Support** - support for images/files/audio in the payload
- **Vision Agents** - Sofia (grok-4.1), Spectra (qwen3-vl)

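To make the multimodal payload concrete, here is a minimal sketch of how a client might assemble a request body for the Router's `/route` endpoint. The field layout (`agent`, `message`, `mode`, `payload.context.images`) follows the payload structure given in the INFRASTRUCTURE.md changes in this document. The helper names `to_data_url` and `build_route_payload` are hypothetical, and the sketch only builds the body without sending it.

```python
import base64
import json


def to_data_url(raw: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL such as data:image/png;base64,..."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_route_payload(agent: str, message: str, images: list,
                        mime_type: str = "image/png",
                        system_prompt: str = "") -> dict:
    """Assemble a /route request body with images embedded as data URLs."""
    return {
        "agent": agent,
        "message": message,
        "mode": "chat",
        "payload": {
            "context": {
                "system_prompt": system_prompt,
                "images": [to_data_url(img, mime_type) for img in images],
            }
        },
    }


# Placeholder bytes stand in for real image data.
body = build_route_payload("sofia", "Analyze this image", [b"placeholder image bytes"])
print(json.dumps(body, indent=2))
```

The same `to_data_url` helper would cover the `audio` field by passing a MIME type such as `audio/webm`.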
### 3. Telegram Gateway updates:

- **STT Integration** - automatic voice transcription
- **Vision Integration** - photo processing
- **OCR Integration** - text extraction from images

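The three integrations listed above amount to a dispatch on the type of incoming Telegram content. The sketch below illustrates that idea only: the `kind` labels, the step names, and the `plan_steps` helper are all hypothetical, and the actual gateway code may be organized differently.

```python
# Preprocessing service for each kind of content (labels are illustrative).
PREPROCESSORS = {
    "voice": "STT Service",       # automatic voice transcription
    "image_text": "OCR Service",  # text extraction from images
}


def plan_steps(kind: str) -> list:
    """Return the ordered processing steps for one incoming message kind."""
    steps = ["Gateway"]
    if kind in PREPROCESSORS:
        steps.append(PREPROCESSORS[kind])
    steps.append("Router")
    # Photos are assumed to be answered by a vision agent, everything else by an LLM agent.
    steps.append("Vision Agent" if kind == "photo" else "LLM Agent")
    return steps


for kind in ("voice", "photo", "image_text", "text"):
    print(kind, "->", " -> ".join(plan_steps(kind)))
```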
---

## 📝 CHANGES FOR INFRASTRUCTURE.md

### Add to the "Services" section:

```markdown
## 🎤 Multimodal Services (NODE2)

### STT Service - Speech-to-Text
- **URL:** http://192.168.1.244:8895
- **Technology:** OpenAI Whisper AI (base model)
- **Functions:**
  - Voice → Text transcription
  - Ukrainian, English, Russian support
  - Auto-transcription for Telegram bots
- **Endpoints:**
  - POST /api/stt - Transcribe base64 audio
  - POST /api/stt/upload - Upload audio file
  - GET /health - Health check

### OCR Service - Text Extraction
- **URL:** http://192.168.1.244:8896
- **Technology:** Tesseract + EasyOCR
- **Functions:**
  - Image → Text extraction
  - Bounding box detection
  - Multi-language support (uk, en, ru, pl, de, fr)
  - Confidence scores
- **Endpoints:**
  - POST /api/ocr - Extract text from base64 image
  - POST /api/ocr/upload - Upload image file
  - GET /health - Health check

### Web Search Service
- **URL:** http://192.168.1.244:8897
- **Technology:** DuckDuckGo + Google Search
- **Functions:**
  - Real-time web search
  - Region-specific search (ua-uk, us-en)
  - JSON structured results
  - 10+ results per query
- **Endpoints:**
  - POST /api/search - Search with JSON body
  - GET /api/search?query=... - Search with query params
  - GET /health - Health check

### Vector DB Service - Knowledge Base
- **URL:** http://192.168.1.244:8898
- **Technology:** ChromaDB + Sentence Transformers
- **Functions:**
  - Vector database for documents
  - Semantic search
  - Document embeddings (all-MiniLM-L6-v2)
  - RAG (Retrieval-Augmented Generation) support
- **Endpoints:**
  - POST /api/collections - Create collection
  - GET /api/collections - List collections
  - POST /api/documents - Add documents
  - POST /api/search - Semantic search
  - DELETE /api/documents - Delete documents
  - GET /health - Health check

---

## 🔄 Router Multimodal Support (NODE1)

### Enhanced /route endpoint
- **URL:** http://144.76.224.179:9102/route
- **New Payload Structure:**

```json
{
  "agent": "sofia",
  "message": "Analyze this image",
  "mode": "chat",
  "payload": {
    "context": {
      "system_prompt": "...",
      "images": ["data:image/png;base64,..."],
      "files": [{"name": "doc.pdf", "data": "..."}],
      "audio": "data:audio/webm;base64,..."
    }
  }
}
```

### Vision Agents
- **Sofia** (grok-4.1, xAI) - Vision + Code + Files
- **Spectra** (qwen3-vl:latest, Ollama) - Vision + Language

### Features:
- 📷 Image processing (PIL)
- 📎 File processing (PDF, TXT, MD)
- 🎤 Audio transcription (via STT Service)
- 🌐 Web search integration
- 📚 Knowledge Base / RAG

---

## 📱 Telegram Gateway Updates

### Enhanced Features:
- 🎤 **Voice Messages** → Auto-transcription via STT Service
- 📷 **Photos** → Vision analysis via Sofia/Spectra
- 📎 **Documents** → Text extraction via OCR/Parser
- 🌐 **Web Search** → Real-time search results

### Workflow:
```
Telegram Bot → Voice/Photo/File
    ↓
Gateway → STT/OCR/Parser Service
    ↓
Router → Vision/LLM Agent
    ↓
Response → Telegram Bot
```

---

## 📊 Service Ports Summary

| Service | Port | Node | Technology | Status |
|---------|------|------|------------|--------|
| Frontend | 8899 | Local | React + Vite | ✅ |
| STT Service | 8895 | NODE2 | Whisper AI | ✅ |
| OCR Service | 8896 | NODE2 | Tesseract + EasyOCR | ✅ |
| Web Search | 8897 | NODE2 | DuckDuckGo + Google | ✅ |
| Vector DB | 8898 | NODE2 | ChromaDB | ✅ |
| Router | 9102 | NODE1 | FastAPI + Ollama | ✅ Multimodal |
| Telegram Gateway | 9200 | NODE1 | FastAPI + NATS | ✅ Enhanced |
| Swapper NODE1 | 8890 | NODE1 | LLM Manager | ✅ |
| Swapper NODE2 | 8890 | NODE2 | LLM Manager | ✅ |

---

## 🌐 Network Configuration

### NODE2 → NODE1 Communication:
```bash
# Multimodal Services accessible from NODE1
STT_SERVICE_URL=http://192.168.1.244:8895
OCR_SERVICE_URL=http://192.168.1.244:8896
WEB_SEARCH_URL=http://192.168.1.244:8897
VECTOR_DB_URL=http://192.168.1.244:8898
```

### Firewall Rules (NODE2):
```bash
sudo ufw allow 8895/tcp  # STT Service
sudo ufw allow 8896/tcp  # OCR Service
sudo ufw allow 8897/tcp  # Web Search
sudo ufw allow 8898/tcp  # Vector DB
```

---

## 🚀 Deployment Status

### ✅ Completed:
- [x] Frontend Multimodal UI (Enhanced Chat)
- [x] STT Service (Whisper AI)
- [x] OCR Service (Tesseract + EasyOCR)
- [x] Web Search Service (DuckDuckGo + Google)
- [x] Vector DB Service (ChromaDB)
- [x] Router Multimodal code prepared

### 🔄 In Progress:
- [ ] Router Multimodal integration on NODE1
- [ ] Telegram Gateway STT integration
- [ ] Telegram Gateway Vision integration
- [ ] Network setup (firewall, SSH tunnels)
- [ ] End-to-end testing

### 📅 Timeline:
- **Code Ready:** 2025-11-23
- **Integration:** ~7-9 hours
- **Expected Complete:** TBD

---

## 📖 Documentation

### Created Files:
1. **COMPLETE-MULTIMODAL-ECOSYSTEM.md** - Full ecosystem overview
2. **MULTIMODAL-IMPLEMENTATION-COMPLETE.md** - Implementation details
3. **ROUTER-MULTIMODAL-SUPPORT.md** - Router documentation
4. **ROUTER-MULTIMODAL-INTEGRATION-GUIDE.md** - Integration guide
5. **NODE1-MULTIMODAL-SERVICES-STATUS.md** - Current status
6. **services/stt-service/README.md** - STT documentation
7. **services/ocr-service/** - OCR service files
8. **services/web-search-service/** - Web Search files
9. **services/vector-db-service/** - Vector DB files

### Integration Scripts:
- **services/router-multimodal/router_multimodal.py** - Router integration code
- **services/*/docker-compose.yml** - Docker deployment configs

---

## 🎯 Key Features

### For Users:
- 🎤 **Voice to Text** - Speak and get transcribed
- 📷 **Image Analysis** - Upload images for AI analysis
- 📷 **OCR** - Extract text from images/scans
- 📎 **Document Processing** - Upload and analyze documents
- 🌐 **Web Search** - Real-time internet search
- 📚 **Knowledge Base** - Store and search documents

### For Agents:
- 👁️ **Vision** - Sofia (grok-4.1), Spectra (qwen3-vl)
- 👂 **Hearing** - STT (Whisper AI)
- 📖 **Reading** - OCR (Tesseract + EasyOCR)
- 🔍 **Searching** - Web Search (DuckDuckGo + Google)
- 🧠 **Memory** - Vector DB (ChromaDB)
- 💬 **Speaking** - Text responses

---

## 🔗 Related Documents

- **INFRASTRUCTURE.md** ← Update this file
- **docs/infrastructure_quick_ref.ipynb** ← Update this notebook
- **PROJECT_CONTEXT.md** - Quick project context
- **CURSOR_WORKFLOW.md** - Workflow guide

---

## 📞 Integration Support

**Contact:** DAARION Development Team
**Date:** 2025-11-23
**Version:** 2.0.0
**Status:** ✅ Ready for Integration

---

**Next Steps:**
1. Update INFRASTRUCTURE.md with new sections
2. Update infrastructure_quick_ref.ipynb with new cells
3. Deploy Router Multimodal on NODE1
4. Deploy Multimodal Services on NODE2
5. Configure network access
6. Run end-to-end tests
```
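The Web Search Service entry above lists a `GET /api/search?query=...` form. As a minimal sketch, the query string can be built with the standard library so that spaces and reserved characters are encoded correctly. Only the `query` parameter and the base URL come from this document; the `search_url` helper is a hypothetical name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

WEB_SEARCH_URL = "http://192.168.1.244:8897"  # Web Search Service URL from this document


def search_url(query: str, base_url: str = WEB_SEARCH_URL) -> str:
    """Build a GET /api/search URL with a properly encoded query parameter."""
    return f"{base_url}/api/search?{urlencode({'query': query})}"


url = search_url("cats & dogs")
print(url)
# Round-trip check: decoding the URL gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```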
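On the NODE1 side, the service URLs from the network configuration above would typically be read from the environment. The following is a minimal sketch, assuming the four variable names shown in that block and falling back to the documented addresses when a variable is unset; the `load_service_urls` helper is hypothetical.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the network configuration in this document.
DEFAULT_URLS = {
    "STT_SERVICE_URL": "http://192.168.1.244:8895",
    "OCR_SERVICE_URL": "http://192.168.1.244:8896",
    "WEB_SEARCH_URL": "http://192.168.1.244:8897",
    "VECTOR_DB_URL": "http://192.168.1.244:8898",
}


def load_service_urls(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read each service URL from the environment, using the documented default if unset."""
    source = os.environ if env is None else env
    # Trailing slashes are stripped so that paths can be appended uniformly.
    return {name: source.get(name, default).rstrip("/")
            for name, default in DEFAULT_URLS.items()}


urls = load_service_urls({"OCR_SERVICE_URL": "http://localhost:8896/"})
print(urls["OCR_SERVICE_URL"])  # overridden value, trailing slash removed
print(urls["STT_SERVICE_URL"])  # documented default
```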
---

## 📝 INFRASTRUCTURE_QUICK_REF.IPYNB UPDATES

### New section (add after the existing ones):

```python
# %% [markdown]
# ## 🎤 Multimodal Services (NODE2)

# %%
import pandas as pd

multimodal_services = {
    "STT Service": {
        "url": "http://192.168.1.244:8895",
        "technology": "OpenAI Whisper AI",
        "features": ["Voice→Text", "Ukrainian/English/Russian", "Telegram integration"],
        "endpoints": ["/api/stt", "/api/stt/upload", "/health"]
    },
    "OCR Service": {
        "url": "http://192.168.1.244:8896",
        "technology": "Tesseract + EasyOCR",
        "features": ["Image→Text", "Bounding boxes", "6 languages", "Confidence scores"],
        "endpoints": ["/api/ocr", "/api/ocr/upload", "/health"]
    },
    "Web Search": {
        "url": "http://192.168.1.244:8897",
        "technology": "DuckDuckGo + Google",
        "features": ["Real-time search", "Region-specific", "10+ results"],
        "endpoints": ["/api/search", "/health"]
    },
    "Vector DB": {
        "url": "http://192.168.1.244:8898",
        "technology": "ChromaDB + Sentence Transformers",
        "features": ["Vector database", "Semantic search", "RAG support"],
        "endpoints": ["/api/collections", "/api/documents", "/api/search", "/health"]
    }
}

# One row per service, one column per attribute.
pd.DataFrame(multimodal_services).T

# %% [markdown]
# ## 🤖 Vision Agents (NODE1)

# %%
vision_agents = {
    "Sofia": {
        "model": "grok-4.1",
        "provider": "xAI",
        "supports_vision": True,
        "supports_files": True,
        "description": "Vision + Code analysis"
    },
    "Spectra": {
        "model": "qwen3-vl:latest",
        "provider": "Ollama",
        "supports_vision": True,
        "supports_files": False,
        "description": "Vision + Language"
    }
}

pd.DataFrame(vision_agents).T

# %% [markdown]
# ## 📊 Ports of all services

# %%
all_ports = {
    "Frontend": {"port": 8899, "node": "Local", "status": "✅"},
    "STT Service": {"port": 8895, "node": "NODE2", "status": "✅"},
    "OCR Service": {"port": 8896, "node": "NODE2", "status": "✅"},
    "Web Search": {"port": 8897, "node": "NODE2", "status": "✅"},
    "Vector DB": {"port": 8898, "node": "NODE2", "status": "✅"},
    "Router": {"port": 9102, "node": "NODE1", "status": "✅ Multimodal"},
    "Telegram Gateway": {"port": 9200, "node": "NODE1", "status": "✅ Enhanced"},
    "Swapper NODE1": {"port": 8890, "node": "NODE1", "status": "✅"},
    "Swapper NODE2": {"port": 8890, "node": "NODE2", "status": "✅"}
}

pd.DataFrame(all_ports).T

# %% [markdown]
# ## 🔄 Multimodal capabilities

# %%
multimodal_capabilities = {
    "Text": {"frontend": "✅", "telegram": "✅", "status": "WORKING"},
    "Voice→Text": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Image→Vision": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Image→OCR": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Documents": {"frontend": "✅", "telegram": "⚠️", "status": "PARTIAL"},
    "Web search": {"frontend": "✅", "telegram": "🔄", "status": "INTEGRATING"},
    "Knowledge Base": {"frontend": "✅", "telegram": "❌", "status": "READY"}
}

pd.DataFrame(multimodal_capabilities).T

# %% [markdown]
# ## 📅 Version and updates

# %%
version_info = {
    "version": "2.0.0",
    "date": "2025-11-23",
    "major_changes": [
        "Added STT Service (Whisper AI)",
        "Added OCR Service (Tesseract + EasyOCR)",
        "Added Web Search Service",
        "Added Vector DB Service (ChromaDB)",
        "Extended Router with multimodal support",
        "Updated Telegram Gateway with STT/Vision"
    ],
    "integration_status": "🔄 In progress (~7-9 hours)"
}

print("Version:", version_info["version"])
print("Date:", version_info["date"])
print("\nMain changes:")
for change in version_info["major_changes"]:
    print(f"  • {change}")
print(f"\nStatus: {version_info['integration_status']}")
```
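Because two entries in the `all_ports` cell share port 8890 (the Swapper on each node), a small check can confirm that no two services claim the same port on the same node. The sketch below is a hedged example: `find_port_conflicts` is a hypothetical helper, and `sample_ports` is only an excerpt of the ports data with the `status` field omitted.

```python
from collections import defaultdict


def find_port_conflicts(ports: dict) -> dict:
    """Map each (node, port) pair used by more than one service to those services."""
    by_location = defaultdict(list)
    for service, info in ports.items():
        by_location[(info["node"], info["port"])].append(service)
    return {loc: names for loc, names in by_location.items() if len(names) > 1}


# The same port on different nodes is not a conflict.
sample_ports = {
    "Router": {"port": 9102, "node": "NODE1"},
    "Swapper NODE1": {"port": 8890, "node": "NODE1"},
    "Swapper NODE2": {"port": 8890, "node": "NODE2"},
}
print(find_port_conflicts(sample_ports))  # prints {}
```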

---

## ✅ DONE

The documentation is ready for updating the startup context files!