feat(production): sync all modified production files to git

Includes updates across gateway, router, node-worker, memory-service, aurora-service, swapper, sofiia-console UI and node2 infrastructure:

- gateway-bot: Dockerfile, http_api.py, druid/aistalk prompts, doc_service
- services/router: main.py, router-config.yml, fabric_metrics, memory_retrieval, offload_client, prompt_builder
- services/node-worker: worker.py, main.py, config.py, fabric_metrics
- services/memory-service: Dockerfile, database.py, main.py, requirements
- services/aurora-service: main.py (+399), kling.py, quality_report.py
- services/swapper-service: main.py, swapper_config_node2.yaml
- services/sofiia-console: static/index.html (console UI update)
- config: agent_registry, crewai_agents/teams, router_agents
- ops/fabric_preflight.sh: updated preflight checks
- router-config.yml, docker-compose.node2.yml: infra updates
- docs: NODA1-AGENT-ARCHITECTURE, fabric_contract updated

Made-with: Cursor

@@ -155,5 +155,180 @@ STT/TTS/OCR/Image **можуть бути різними** на різних н
- **14 containers** (router, node-worker, node-capabilities, nats, gateway, memory, qdrant, postgres, neo4j, redis, open-webui, sofiia-console, swapper)
- **13 served models** (Ollama: 12 + llama_server: 1)
- **29 installed artifacts** on disk (150.3GB LLM + 0.3GB TTS kokoro-v1_0)
- **capabilities**: llm=Y, vision=Y, ocr=Y, stt=N, tts=N, image=N
- `OCR_PROVIDER=vision_prompted`
- **capabilities**: llm=Y, vision=Y, ocr=Y, stt=Y, tts=Y, image=N ← Phase 1 enabled
- `STT_PROVIDER=memory_service`, `TTS_PROVIDER=memory_service`, `OCR_PROVIDER=vision_prompted`

---

## Phase 1: STT/TTS via Memory Service delegation (2026-02-27)

### Motivation

Enable `stt=true` / `tts=true` in Fabric without adding new microservices and without the risk of MLX dependencies.

### Architecture

```
Fabric Router → find_nodes_with_capability("stt"/"tts") → NODA2 node-worker
  → STT_PROVIDER=memory_service → stt_memory_service.transcribe()
  → POST http://memory-service:8000/voice/stt (faster-whisper)
  → {text, segments, language, meta}

Fabric Router → NODA2 node-worker
  → TTS_PROVIDER=memory_service → tts_memory_service.synthesize()
  → POST http://memory-service:8000/voice/tts (edge-tts: Polina/Ostap Neural uk-UA)
  → {audio_b64, format="mp3", meta}
```

### Contracts

**STT input:**
```json
{
  "audio_b64": "<base64>",      // OR
  "audio_url": "http://...",    // one is required
  "language": "uk",             // optional
  "filename": "audio.wav"       // optional
}
```

**STT output (fabric contract):**
```json
{"text": "...", "segments": [], "language": "uk", "meta": {...}, "provider": "memory_service"}
```

**TTS input:**
```json
{"text": "...", "voice": "Polina", "speed": 1.0}
```

**TTS output (fabric contract):**
```json
{"audio_b64": "<base64-mp3>", "format": "mp3", "meta": {...}, "provider": "memory_service"}
```

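The STT contracts above can be sketched as two small helpers: one that checks an incoming request and one that shapes a provider response into the fabric output. This is only an illustration. The function names `validate_stt_input` and `to_fabric_stt` are hypothetical, and accepting a request that carries both `audio_b64` and `audio_url` is an assumption, since the contract only says that one of them is required.

```python
from typing import Any, Dict

# Keys named in the STT input contract; anything else is dropped (an assumption).
STT_INPUT_KEYS = ("audio_b64", "audio_url", "language", "filename")


def validate_stt_input(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the contract fields of an STT request, or raise ValueError."""
    if not (payload.get("audio_b64") or payload.get("audio_url")):
        raise ValueError("one of audio_b64 or audio_url is required")
    return {k: payload[k] for k in STT_INPUT_KEYS if payload.get(k) is not None}


def to_fabric_stt(raw: Dict[str, Any], provider: str = "memory_service") -> Dict[str, Any]:
    """Shape a provider response into the fabric STT output contract."""
    return {
        "text": raw.get("text", ""),
        "segments": raw.get("segments", []),
        "language": raw.get("language"),
        "meta": raw.get("meta", {}),
        "provider": provider,
    }
```

Filling missing fields with empty defaults keeps the output shape stable for callers; whether the real node-worker does the same is not stated in the source.
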
### Phase 1 limitations

- **ffmpeg=false**: only formats that Memory Service accepts natively (WAV recommended)
- **TTS text**: max 500 characters (Memory Service limit)
- **TTS voices**: Polina (uk-UA-PolinaNeural), Ostap (uk-UA-OstapNeural), en-US-GuyNeural
- **NODA1**: stays at `STT_PROVIDER=none` / `TTS_PROVIDER=none` (does not interfere with routing)

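A caller could enforce the text limit and voice list before sending a TTS request. The sketch below is an assumption about how that might look: `prepare_tts_request` is a hypothetical name, and rejecting over-long text (rather than truncating it) is a design choice made here, not something the source specifies.

```python
from typing import Any, Dict

TTS_MAX_CHARS = 500  # Memory Service limit listed above

# Short aliases and their edge-tts voice names, as listed above.
VOICE_ALIASES = {
    "Polina": "uk-UA-PolinaNeural",
    "Ostap": "uk-UA-OstapNeural",
}
ALLOWED_VOICES = set(VOICE_ALIASES.values()) | {"en-US-GuyNeural"}


def prepare_tts_request(text: str, voice: str = "Polina", speed: float = 1.0) -> Dict[str, Any]:
    """Build a TTS input payload, raising ValueError on limit violations."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > TTS_MAX_CHARS:
        raise ValueError(f"text exceeds {TTS_MAX_CHARS} characters")
    # Accept either the short alias or the full voice name.
    if VOICE_ALIASES.get(voice, voice) not in ALLOWED_VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    return {"text": text, "voice": voice, "speed": speed}
```

Failing early on the caller side avoids spending a routed request on input that Memory Service would reject anyway.
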
### Phase 2 (optional MLX upgrade)

Set `STT_PROVIDER=mlx_whisper` and/or `TTS_PROVIDER=mlx_kokoro` in docker-compose when:
- ffmpeg is available or the accepted formats are strictly limited
- a higher-quality local TTS is needed instead of edge-tts
- NODA2 (Apple Silicon) benefits from MLX

---

## Voice HA (Multi-node routing) — PR1–PR3

### Architecture

```
Browser → sofiia-console /api/voice/tts
  ↓ VOICE_HA_ENABLED=false (default)
  memory-service:8000/voice/tts ← legacy direct

  ↓ VOICE_HA_ENABLED=true
  Router /v1/capability/voice_tts
  ↓ (caps + scoring)
  node.{id}.voice.tts.request (NATS)
  ↓
  node-worker (voice semaphore)
  ↓
  memory-service/voice/tts
```

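The "voice semaphore" step in the diagram can be pictured as one independent semaphore per voice capability, so that a saturated capability does not block the others. The sketch below is a guess at the shape, not the node-worker's actual code: `VoiceGate` is a hypothetical class, and it uses `threading.BoundedSemaphore` for brevity even though the real worker may well be asyncio-based.

```python
import threading
from typing import Dict


class VoiceGate:
    """Non-blocking admission gate with one semaphore per voice capability."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self._sems = {cap: threading.BoundedSemaphore(n) for cap, n in limits.items()}

    def try_acquire(self, cap: str) -> bool:
        # Return False immediately when the capability is saturated
        # instead of queueing the job behind running ones.
        return self._sems[cap].acquire(blocking=False)

    def release(self, cap: str) -> None:
        self._sems[cap].release()


# Illustrative limits only; real values come from the worker's configuration.
gate = VoiceGate({"voice_tts": 4, "voice_llm": 2, "voice_stt": 2})
```

A job that fails `try_acquire` would be rejected rather than queued, which lets the Router try another node.
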
### NATS Subjects (Voice HA, separate from generic)

| Subject | Purpose |
|---|---|
| `node.{id}.voice.tts.request` | Voice TTS offload (separate semaphore) |
| `node.{id}.voice.llm.request` | Voice LLM inference (voice guardrails) |
| `node.{id}.voice.stt.request` | Voice STT transcription |

**Compatibility:** generic subjects (`node.{id}.tts.request` etc.) are unchanged.

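The naming scheme in the table is regular enough to generate. A minimal sketch, assuming hypothetical helper names `voice_subject` and `generic_subject` and a made-up node id:

```python
VOICE_CAPS = ("tts", "llm", "stt")


def voice_subject(node_id: str, cap: str) -> str:
    """Subject for the Voice HA path, e.g. node.<id>.voice.tts.request."""
    if cap not in VOICE_CAPS:
        raise ValueError(f"unknown voice capability: {cap}")
    return f"node.{node_id}.voice.{cap}.request"


def generic_subject(node_id: str, cap: str) -> str:
    """Subject for the unchanged generic path, e.g. node.<id>.tts.request."""
    return f"node.{node_id}.{cap}.request"


# "noda2" is an illustrative node id, not one confirmed by the source.
subjects = [voice_subject("noda2", cap) for cap in VOICE_CAPS]
```

Because the voice subjects carry an extra `voice` token, a subscriber to the generic subject never receives voice traffic, and vice versa.
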
### Capability Flags

Node Worker `/caps` returns:
```json
{
  "capabilities": {
    "tts": true,
    "voice_tts": true,
    "voice_llm": true,
    "voice_stt": true
  },
  "voice_concurrency": {
    "voice_tts": 4,
    "voice_llm": 2,
    "voice_stt": 2
  }
}
```

`voice_tts=true` only when `TTS_PROVIDER != none` **and** the NATS subscription is active.
NCS aggregates these flags via `_derive_capabilities()`.

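The `voice_tts` rule is a conjunction of two conditions, which a short sketch makes explicit. The helper names here are hypothetical, and deriving the plain `tts` flag from the provider alone is an assumption; the source states the rule only for `voice_tts`.

```python
from typing import Dict


def voice_cap_enabled(provider: str, subscription_active: bool) -> bool:
    """True only when a provider is configured AND the NATS subscription is live."""
    return provider.strip().lower() != "none" and subscription_active


def build_tts_caps(tts_provider: str, tts_subscription_active: bool) -> Dict[str, bool]:
    """TTS-related part of a /caps response (sketch)."""
    return {
        # Assumption: the generic flag depends on the provider only.
        "tts": tts_provider.strip().lower() != "none",
        "voice_tts": voice_cap_enabled(tts_provider, tts_subscription_active),
    }
```

Tying the flag to the live subscription means a node stops advertising `voice_tts` as soon as it can no longer receive requests on the voice subject.
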
### Router Endpoints

| Endpoint | Deadline | Subject |
|---|---|---|
| `POST /v1/capability/voice_tts` | 3000ms | `node.{id}.voice.tts.request` |
| `POST /v1/capability/voice_llm` | 9000ms (fast) / 12000ms (quality) | `node.{id}.voice.llm.request` |
| `POST /v1/capability/voice_stt` | 6000ms | `node.{id}.voice.stt.request` |

Response headers: `X-Voice-Node`, `X-Voice-Mode` (local|remote), `X-Voice-Cap`.

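The deadlines in the table could be held in a single lookup. In this sketch the values come from the table, while the `deadline_ms` helper and the `profile` argument name are assumptions about how a caller might select between the fast and quality LLM deadlines.

```python
from typing import Dict, Union

# Per-capability deadlines in milliseconds, taken from the table above.
VOICE_DEADLINES_MS: Dict[str, Union[int, Dict[str, int]]] = {
    "voice_tts": 3000,
    "voice_llm": {"fast": 9000, "quality": 12000},
    "voice_stt": 6000,
}


def deadline_ms(cap: str, profile: str = "fast") -> int:
    """Return the request deadline for a voice capability.

    `profile` only matters for capabilities with more than one deadline.
    Raises KeyError for an unknown capability or profile.
    """
    entry = VOICE_DEADLINES_MS[cap]
    if isinstance(entry, dict):
        return entry[profile]
    return entry
```

For example, `deadline_ms("voice_llm", "quality")` yields the longer LLM deadline, while `deadline_ms("voice_tts")` ignores the profile.
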
### Scoring

```
score = wait_ms + rtt_ms + p95_ms + mem_penalty - local_bonus
mem_penalty = 300 if mem_pressure == "high"
local_bonus = VOICE_PREFER_LOCAL_BONUS (default 200ms)
```

If `score_local <= score_best_remote + LOCAL_THRESHOLD_MS`, the local node is selected.

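The formula and the local-preference rule can be worked through in a few lines. This is a sketch under stated assumptions: `NodeStats`, `score` and `pick_node` are hypothetical names, the bonus is assumed to apply only to the local node, and `local_threshold_ms` is left as a required argument because the source gives no default for `LOCAL_THRESHOLD_MS`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStats:
    node_id: str
    wait_ms: float
    rtt_ms: float
    p95_ms: float
    mem_pressure: str = "normal"
    is_local: bool = False


def score(n: NodeStats, local_bonus: float = 200.0) -> float:
    """Lower is better: wait + rtt + p95 + mem_penalty - local_bonus."""
    mem_penalty = 300.0 if n.mem_pressure == "high" else 0.0
    bonus = local_bonus if n.is_local else 0.0  # assumption: local node only
    return n.wait_ms + n.rtt_ms + n.p95_ms + mem_penalty - bonus


def pick_node(nodes: List[NodeStats], local_threshold_ms: float,
              local_bonus: float = 200.0) -> Optional[NodeStats]:
    """Prefer the local node unless the best remote beats it by more than the threshold."""
    if not nodes:
        return None
    by_score = lambda n: score(n, local_bonus)
    local = min((n for n in nodes if n.is_local), key=by_score, default=None)
    remote = min((n for n in nodes if not n.is_local), key=by_score, default=None)
    if local is None or remote is None:
        return local or remote
    if by_score(local) <= by_score(remote) + local_threshold_ms:
        return local
    return remote
```

With a local node under high memory pressure (wait 100, rtt 0, p95 400) the score is 100 + 0 + 400 + 300 - 200 = 600, while a remote node with wait 50, rtt 20, p95 400 scores 470. A threshold of 100 ms sends the request to the remote node (600 > 570); a threshold of 150 ms keeps it local (600 <= 620).
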
### BFF Feature Flag

```yaml
# docker-compose.node2-sofiia.yml
VOICE_HA_ENABLED: "false"                  # default: legacy direct path
VOICE_HA_ROUTER_URL: "http://router:8000"  # Router for HA offload
```

Activation: `VOICE_HA_ENABLED=true` + rebuild `sofiia-console`.
Deactivation: `VOICE_HA_ENABLED=false` reverts to the direct memory-service path.

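On the BFF side the flag reduces to choosing a target URL for TTS. A minimal sketch, assuming a hypothetical `resolve_tts_target` helper and assuming that only the literal string `true` (case-insensitive) enables the HA path; the URLs are the ones shown in this document.

```python
from typing import Mapping

LEGACY_TTS_URL = "http://memory-service:8000/voice/tts"
DEFAULT_ROUTER_URL = "http://router:8000"


def resolve_tts_target(env: Mapping[str, str]) -> str:
    """Pick the TTS endpoint from VOICE_HA_ENABLED / VOICE_HA_ROUTER_URL."""
    enabled = env.get("VOICE_HA_ENABLED", "false").strip().lower() == "true"
    if not enabled:
        return LEGACY_TTS_URL  # legacy direct path (default)
    router = env.get("VOICE_HA_ROUTER_URL", DEFAULT_ROUTER_URL).rstrip("/")
    return f"{router}/v1/capability/voice_tts"
```

In a running service the mapping would typically be `os.environ`; passing it in as an argument keeps the function easy to test.
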
### Metrics (Prometheus)

**node-worker** (`/prom_metrics`):
- `node_worker_voice_jobs_total{cap,status}`
- `node_worker_voice_inflight{cap}`
- `node_worker_voice_latency_ms{cap}` (histogram)

**router** (`/fabric_metrics`):
- `fabric_voice_capability_requests_total{cap,status}`
- `fabric_voice_offload_total{cap,node,status}`
- `fabric_voice_breaker_state{cap,node}` (1=open)
- `fabric_voice_score_ms{cap}` (histogram)

### Contract: No Silent Fallback

- Any fallback (busy, broken, timeout) logs a `WARNING` and increments a Prometheus counter
- `TOO_BUSY` includes a `retry_after_ms` hint for Router failover
- The circuit breaker is per `node+voice_cap` and is not mixed with the generic CB

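The first two rules can be illustrated with a small helper that never lets a fallback pass unrecorded. This is a sketch, not the project's code: `record_fallback` and `too_busy` are hypothetical names, a `collections.Counter` stands in for the Prometheus counter so the example stays dependency-free, and the `error` and `cap` response fields are assumptions (only `retry_after_ms` is named in the source).

```python
import logging
from collections import Counter
from typing import Any, Dict

logger = logging.getLogger("voice_ha")

# Stand-in for a Prometheus counter labelled by (cap, node, reason).
fallback_counter: Counter = Counter()


def record_fallback(cap: str, node: str, reason: str) -> None:
    """Log at WARNING and count every fallback, whatever the reason."""
    logger.warning("voice fallback cap=%s node=%s reason=%s", cap, node, reason)
    fallback_counter[(cap, node, reason)] += 1


def too_busy(cap: str, retry_after_ms: int) -> Dict[str, Any]:
    """Rejection payload carrying the retry hint used for Router failover."""
    return {"error": "TOO_BUSY", "cap": cap, "retry_after_ms": retry_after_ms}
```

Routing every fallback through one function makes the invariant easy to test: any code path that falls back without calling it is a bug.
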
### Tests

`tests/test_voice_ha.py` contains 28 tests:
- Node Worker voice caps + semaphore isolation
- Router fabric_metrics voice helpers
- BFF `VOICE_HA_ENABLED` feature flag
- Voice scoring logic (local prefer, mem penalty, remote wins when saturated)
- No silent fallback invariants
