Apple
0603184524
feat(sofiia-console): add safe script executor for allowlisted runbook steps
- adds safe_executor.py: REPO_ROOT confinement, strict script allowlist,
env key allowlist (STRICT/SOFIIA_URL/BFF_A/BFF_B/NODE_ID/AGENT_ID),
stdin=DEVNULL, 8KB output cap, timeout clamp (max 300s), warning when run as root
- integrates script action_type into runbook_runner: next_step handles
http_check and script branches; running_as_root -> step_status=warn
- extends runbook_parser: rehearsal-v1 now includes 3 built-in script steps
(preflight, idempotency smoke, generate evidence) after http_checks
- adds tests/test_sofiia_safe_executor.py: 12 tests covering path traversal,
absolute path, non-allowlist, env drop, timeout, exit_code, mocked subprocess
Made-with: Cursor
2026-03-03 04:57:22 -08:00
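The confinement rules listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the contents of safe_executor.py: the helper names (`resolve_script`, `filter_env`, `clamp_timeout`, `run_script`), the allowlisted script path, and the idea of passing `PATH` through are all assumptions. Only the env keys, the 300s clamp, the 8KB cap, and stdin=DEVNULL come from the commit message.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical values for illustration only.
REPO_ROOT = Path.cwd().resolve()
SCRIPT_ALLOWLIST = {"ops/preflight.sh"}  # assumed script path
ENV_ALLOWLIST = {"STRICT", "SOFIIA_URL", "BFF_A", "BFF_B", "NODE_ID", "AGENT_ID"}
MAX_TIMEOUT_S = 300
MAX_OUTPUT_BYTES = 8 * 1024


def resolve_script(rel_path: str) -> Path:
    """Reject absolute paths, paths escaping REPO_ROOT, and non-allowlisted scripts."""
    if Path(rel_path).is_absolute():
        raise ValueError("absolute paths are not allowed")
    resolved = (REPO_ROOT / rel_path).resolve()
    try:
        relative = resolved.relative_to(REPO_ROOT)
    except ValueError:
        raise ValueError("path escapes REPO_ROOT") from None
    if relative.as_posix() not in SCRIPT_ALLOWLIST:
        raise ValueError("script is not allowlisted")
    return resolved


def filter_env(requested: dict) -> dict:
    """Drop every env key that is not explicitly allowlisted."""
    return {k: str(v) for k, v in requested.items() if k in ENV_ALLOWLIST}


def clamp_timeout(timeout_s: float) -> int:
    """Clamp the requested timeout into the range 1..MAX_TIMEOUT_S seconds."""
    return max(1, min(int(timeout_s), MAX_TIMEOUT_S))


def run_script(rel_path: str, env: dict, timeout_s: float) -> dict:
    """Run an allowlisted script with no stdin, a filtered env, and capped output."""
    script = resolve_script(rel_path)
    child_env = {"PATH": os.environ.get("PATH", ""), **filter_env(env)}
    try:
        proc = subprocess.run(
            [str(script)],
            cwd=REPO_ROOT,
            env=child_env,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=clamp_timeout(timeout_s),
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "timed_out": True, "output": ""}
    output = proc.stdout[:MAX_OUTPUT_BYTES].decode("utf-8", "replace")
    return {"exit_code": proc.returncode, "timed_out": False, "output": output}
```

Resolving the path before checking it against REPO_ROOT is what catches `..` traversal; checking the raw string alone would not.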
Apple
ad8bddf595
feat(sofiia-console): add guided runbook runner with http checks and audit integration
adds runbook_runs/runbook_steps state machine
parses markdown runbooks into guided steps
supports allowlisted http_check (health/metrics/audit)
integrates runbook execution with audit trail
exposes authenticated runbook runs API
Made-with: Cursor
2026-03-03 04:49:19 -08:00
Apple
4db1774a34
feat(sofiia-console): rank runbook search results with bm25
FTS path: score = bm25(docs_chunks_fts), ORDER BY score ASC (lower is a better match); LIKE fallback: score is null; test asserts the score key is present
Made-with: Cursor
2026-03-03 04:36:52 -08:00
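A rough sketch of the two search paths described in this commit, using the standard-library sqlite3 module. The table name docs_chunks_fts comes from the commit message; the plain docs_chunks table, the column names, and the function names are assumptions made for illustration.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, chunks: list) -> bool:
    """Create a plain table plus an FTS5 table; return False if FTS5 is unavailable."""
    conn.execute("CREATE TABLE docs_chunks (path TEXT, text TEXT)")
    conn.executemany("INSERT INTO docs_chunks (path, text) VALUES (?, ?)", chunks)
    try:
        conn.execute("CREATE VIRTUAL TABLE docs_chunks_fts USING fts5(path, text)")
    except sqlite3.OperationalError:
        return False
    conn.executemany("INSERT INTO docs_chunks_fts (path, text) VALUES (?, ?)", chunks)
    return True


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    """FTS path ranks by bm25 (ascending); LIKE fallback returns score=None."""
    try:
        rows = conn.execute(
            "SELECT path, text, bm25(docs_chunks_fts) AS score "
            "FROM docs_chunks_fts WHERE docs_chunks_fts MATCH ? "
            "ORDER BY score ASC LIMIT ?",
            (query, limit),
        ).fetchall()
    except sqlite3.OperationalError:
        # No FTS table (or an unparsable MATCH query): fall back to substring search.
        rows = conn.execute(
            "SELECT path, text, NULL AS score FROM docs_chunks "
            "WHERE text LIKE ? LIMIT ?",
            (f"%{query}%", limit),
        ).fetchall()
    return [{"path": p, "text": t, "score": s} for p, t, s in rows]
```

FTS5's bm25() returns smaller values for better matches, which is why the ordering is ascending.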
Apple
63fec4371a
feat(sofiia-console): add runbooks index status endpoint
GET /api/runbooks/status returns docs_root, indexed_files, indexed_chunks, last_indexed_at, fts_available; adds docs_index_meta table, updated on rebuild
Made-with: Cursor
2026-03-03 04:35:18 -08:00
Apple
ef3ff80645
feat(sofiia-console): add docs index and runbook search API (FTS5)
adds SQLite docs index (files/chunks + FTS5) and CLI rebuild
exposes authenticated runbook search/preview/raw endpoints
Made-with: Cursor
2026-03-03 04:26:34 -08:00
Apple
bddb6cd75a
docs(dev): index release evidence template in runbook README
Made-with: Cursor
2026-03-03 04:00:15 -08:00
Apple
3c199be6d3
docs(dev): index release and rehearsal runbooks in docs/runbook
Made-with: Cursor
2026-03-03 03:55:29 -08:00
Apple
55a5e541df
docs(dev): add v1 30-min rehearsal execution checklist
includes preflight, restart, smoke, observation, evidence steps
defines success criteria and metrics to collect for next-step decision
Made-with: Cursor
2026-03-03 03:54:53 -08:00
Apple
ad74e4c0ba
docs(dev): add sofiia-console post-release review template
Made-with: Cursor
2026-03-02 10:20:24 -08:00
Apple
3df414d35a
docs(dev): add sofiia-console v1 technical release announcement
Made-with: Cursor
2026-03-02 10:17:53 -08:00
Apple
e75fd334bf
ops(dev): add release evidence auto-generator script
Made-with: Cursor
2026-03-02 10:13:06 -08:00
Apple
47073ba761
docs(dev): add release runbook for sofiia-console
Made-with: Cursor
2026-03-02 10:00:08 -08:00
Apple
6a0d2ff103
ops(dev): extend preflight with audit retention checks
Made-with: Cursor
2026-03-02 09:59:22 -08:00
Apple
1d18634c01
ops(dev): add audit retention pruning script
Made-with: Cursor
2026-03-02 09:47:39 -08:00
Apple
e2c2333b6f
feat(sofiia-console): protect audit endpoint with admin token
Made-with: Cursor
2026-03-02 09:42:10 -08:00
Apple
11e0ba7264
feat(sofiia-console): add audit query endpoint with cursor pagination
Made-with: Cursor
2026-03-02 09:36:11 -08:00
Apple
9e70fc83d2
ops(dev): add secrets rotation runbook and sofiia-console preflight checks
Made-with: Cursor
2026-03-02 09:32:18 -08:00
Apple
3246440ac8
feat(sofiia-console): add audit trail for operator actions
Made-with: Cursor
2026-03-02 09:29:14 -08:00
Apple
9b89ace2fc
feat(sofiia-console): add rate limiting for chat send (per-chat and per-operator)
Made-with: Cursor
2026-03-02 09:24:21 -08:00
Apple
de8002eacd
ops(dev): add redis idempotency A/B smoke script
Made-with: Cursor
2026-03-02 09:14:28 -08:00
Apple
d85aa507a2
docs(dev): add redis docker-compose smoke snippet for sofiia-console
Made-with: Cursor
2026-03-02 09:11:45 -08:00
Apple
9f085509dd
test(sofiia-console): cover redis idempotency backend
Made-with: Cursor
2026-03-02 09:08:54 -08:00
Apple
3b16739671
feat(sofiia-console): add RedisIdempotencyStore backend
Made-with: Cursor
2026-03-02 09:08:52 -08:00
Apple
0b30775ac1
feat(sofiia-console): add structured json logging for chat ops
Made-with: Cursor
2026-03-02 08:24:54 -08:00
Apple
98555aa483
test(sofiia-console): add multi-node e2e routing test
Made-with: Cursor
2026-03-02 08:18:59 -08:00
Apple
e504df7dfa
feat(sofiia-console): harden cursor pagination with tie-breaker
Version cursor payloads and keep backward compatibility while adding dedicated tie-breaker regression coverage for equal timestamps to prevent pagination duplicates and gaps.
Made-with: Cursor
2026-03-02 08:12:19 -08:00
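To illustrate why a tie-breaker matters for equal timestamps, here is a small sketch of a versioned cursor that orders by (timestamp, id). The payload field names, the version number, and the handling of legacy cursors are assumptions; the commit message does not describe the actual cursor format.

```python
import base64
import json

CURSOR_VERSION = 2  # hypothetical version number


def encode_cursor(ts: int, item_id: str) -> str:
    """Encode the last-seen (ts, id) pair together with a payload version."""
    payload = {"v": CURSOR_VERSION, "ts": ts, "id": item_id}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()


def decode_cursor(cursor: str) -> tuple:
    """Decode a cursor; an older payload without 'id' is assumed to carry ts only."""
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return payload["ts"], payload.get("id", "")


def page(items: list, cursor=None, limit: int = 2) -> tuple:
    """Return one page ordered by (ts, id) and the cursor for the next page."""
    ordered = sorted(items, key=lambda m: (m["ts"], m["id"]))
    if cursor:
        after = decode_cursor(cursor)
        # Comparing the full (ts, id) tuple avoids skipping or repeating rows
        # that share a timestamp with the last row of the previous page.
        ordered = [m for m in ordered if (m["ts"], m["id"]) > after]
    chunk = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(chunk[-1]["ts"], chunk[-1]["id"]) if has_more else None
    return chunk, next_cursor
```

With a timestamp-only cursor, a page boundary falling inside a group of equal timestamps would either repeat that group or drop its remainder.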
Apple
0c626943d6
refactor(sofiia-console): extract idempotency store abstraction
Move idempotency TTL/LRU logic into a dedicated store module with a swap-ready interface and wire chat send flow to use store get/set semantics without changing API behavior.
Made-with: Cursor
2026-03-02 08:11:13 -08:00
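A store with get/set semantics and TTL/LRU bounds, as described in this commit, might look something like the sketch below. The class name, default limits, and injectable clock are assumptions for illustration and are not taken from the actual store module.

```python
import time
from collections import OrderedDict


class InMemoryIdempotencyStore:
    """Bounded in-memory store: entries expire after ttl_s, oldest-used evicted first."""

    def __init__(self, ttl_s: float = 600.0, max_entries: int = 1024,
                 clock=time.monotonic):
        self._ttl_s = ttl_s
        self._max_entries = max_entries
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str):
        """Return the stored value, or None if the key is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value) -> None:
        """Store a value and evict least recently used entries beyond the bound."""
        self._entries[key] = (self._clock() + self._ttl_s, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
```

Keeping the interface down to get and set is what makes a later swap to another backend (for example Redis) straightforward.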
Apple
b9c548f1a6
test(sofiia-console): cover noda2 router_url fallback in legacy local run
Add regression coverage for router URL resolution when NODE_ID is unset and ROUTER_URL is present, and verify explicit NODES_NODA2_ROUTER_URL keeps higher priority.
Made-with: Cursor
2026-03-02 08:00:35 -08:00
Apple
93f94030f4
feat(sofiia-console): expose /metrics and add basic ops counters
Expose Prometheus-style metrics endpoint and add counters for send requests, idempotency replays, and cursor pagination calls, including a safe in-process fallback exposition when prometheus_client is unavailable.
Made-with: Cursor
2026-03-02 04:52:04 -08:00
Apple
d9ce366538
feat(sofiia-console): idempotency_key, cursor pagination, and noda2 router fallback
Add BFF runtime support for chat idempotency (header priority over body) with bounded in-memory TTL/LRU replay cache, implement cursor-based pagination for chats and messages, and add a safe NODA2 local router fallback for legacy runs without NODE_ID.
Made-with: Cursor
2026-03-02 04:14:58 -08:00
Apple
5a886a56ca
test(sofiia-console): cover idempotency and cursor pagination contracts
Add focused API contract tests for chat idempotency, cursor pagination, and node routing behavior using isolated local fixtures and mocked upstream inference.
Made-with: Cursor
2026-03-02 04:03:30 -08:00
Apple
f16bab2cb9
chore(aurora): support keychain/env loading for kling credentials on launchd
2026-03-01 06:26:17 -08:00
Apple
1ea4464838
feat(aurora-smart): add dual-stack orchestration with policy, audit, and UI toggle
2026-03-01 06:21:17 -08:00
Apple
5b4c4f92ba
feat(aurora): add detection overlays with face/plate boxes in compare UI
2026-03-01 05:00:29 -08:00
Apple
79f26ab683
feat(aurora-ui): add interactive pre-analysis controls and quality report
2026-03-01 04:10:10 -08:00
Apple
fe0f2e23c2
feat(aurora): expose quality report API and proxy via sofiia console
2026-03-01 03:59:54 -08:00
Apple
c230abe9cf
fix(aurora): harden Kling integration and surface config diagnostics
2026-03-01 03:55:16 -08:00
Apple
ff97d3cf4a
fix(console): route Aurora Kling enhance via standard proxy base URL
2026-03-01 03:48:19 -08:00
Apple
4e9091b96c
fix(aurora): avoid port clash with native launchd instance on NODA2
2026-03-01 03:36:47 -08:00
Apple
91559a720b
fix(node2): mount config into router for tool governance policies
2026-03-01 03:27:08 -08:00
Apple
49afb1df99
docs(audit): add NODA2 Sofiia tools audit and full matrix
2026-03-01 01:42:57 -08:00
Apple
57632699c0
chore(cleanup): remove obsolete compose version and trim router Dockerfile
2026-03-01 01:37:30 -08:00
Apple
de234112f3
feat(node2): wire calendar-service and core automation tools in router
2026-03-01 01:37:13 -08:00
Apple
9a36020316
P3.5-P3.7: 2-layer inventory, capability routing, STT/TTS adapters, Dev Contract
NCS:
- _collect_worker_caps() fetches capability flags from node-worker /caps
- _derive_capabilities() merges served model types + worker provider flags
- installed_artifacts replaces inventory_only (disk scan with DISK_SCAN_PATHS env)
- New endpoints: /capabilities/caps, /capabilities/installed
Node Worker:
- STT_PROVIDER, TTS_PROVIDER, OCR_PROVIDER, IMAGE_PROVIDER env flags
- /caps endpoint returns capabilities + providers for NCS aggregation
- STT adapter (providers/stt_mlx_whisper.py) — remote + local mode
- TTS adapter (providers/tts_mlx_kokoro.py) — remote + local mode
- OCR handler via vision_prompted (ollama_vision with OCR prompt)
- NATS subjects: node.{id}.stt/tts/ocr/image.request
Router:
- POST /v1/capability/{stt,tts,ocr,image} — capability-based offload routing
- GET /v1/capabilities — global view with capabilities_by_node
- require_fresh_caps(ttl) preflight guard
- find_nodes_with_capability(cap) + load-based node selection
Ops:
- ops/fabric_snapshot.py — full runtime snapshot collector
- ops/fabric_preflight.sh — quick check + snapshot save + diff
- docs/fabric_contract.md — Dev Contract v0.1 (preflight-first)
- tests/test_fabric_contract.py — CI enforcement (6 tests)
Made-with: Cursor
2026-02-27 05:24:09 -08:00
Apple
194c87f53c
feat(fabric): decommission Swapper from critical path, NCS = source of truth
- Node Worker: replace swapper_vision with ollama_vision (direct Ollama API)
- Node Worker: add NATS subjects for stt/tts/image (stubs ready)
- Node Worker: remove SWAPPER_URL dependency from config
- Router: vision calls go directly to Ollama /api/generate with images
- Router: local LLM calls go directly to Ollama /api/generate
- Router: add OLLAMA_URL and PREFER_NODE_WORKER=true feature flag
- Router: /v1/models now uses NCS global capabilities pool
- NCS: SWAPPER_URL="" -> skip Swapper probing (status=disabled)
- Swapper configs: remove all hardcoded model lists, keep only runtime
URLs, timeouts, limits
- docker-compose.node1.yml: add OLLAMA_URL, PREFER_NODE_WORKER for router;
SWAPPER_URL= for NCS; remove swapper-service from node-worker depends_on
- docker-compose.node2-sofiia.yml: same changes for NODA2
Swapper service still runs but is NOT in the critical inference path.
Source of truth for models is now NCS -> Ollama /api/tags.
Made-with: Cursor
2026-02-27 04:16:16 -08:00
Apple
90080c632a
fix(fabric): use broadcast subject for NATS capabilities discovery
NATS wildcards (node.*.capabilities.get) only work for subscriptions,
not for publish. Switch to a dedicated broadcast subject
(fabric.capabilities.discover) that all NCS instances subscribe to,
enabling proper scatter-gather discovery across nodes.
Made-with: Cursor
2026-02-27 03:20:13 -08:00
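The wildcard behaviour behind this fix can be illustrated without a NATS server. The function below is a simplified model of NATS subject matching written for this note, not part of any NATS client library; the node name noda1 is only an example.

```python
def subject_matches(subscription: str, published: str) -> bool:
    """Model NATS matching: wildcards are interpreted in the subscription only.

    '*' matches exactly one token; '>' (last token) matches one or more tokens.
    The published subject is always compared as literal tokens.
    """
    sub_tokens = subscription.split(".")
    pub_tokens = published.split(".")
    for i, token in enumerate(sub_tokens):
        if token == ">":
            return len(pub_tokens) > i
        if i >= len(pub_tokens):
            return False
        if token != "*" and token != pub_tokens[i]:
            return False
    return len(sub_tokens) == len(pub_tokens)


# A wildcard subscription receives messages published to concrete subjects.
print(subject_matches("node.*.capabilities.get", "node.noda1.capabilities.get"))

# Publishing to a subject containing '*' does not reach per-node subscribers,
# because '*' in a published subject is just a literal token.
print(subject_matches("node.noda1.capabilities.get", "node.*.capabilities.get"))

# A shared broadcast subject reaches every instance subscribed to it.
print(subject_matches("fabric.capabilities.discover", "fabric.capabilities.discover"))
```

This is why scatter-gather discovery needs one subject that every NCS instance subscribes to, rather than a wildcard on the publishing side.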
Apple
a6531507df
merge: integrate remote codex/sync-node1-runtime with fabric layer changes
Resolve conflicts in docker-compose.node1.yml, services/router/main.py,
and gateway-bot/services/doc_service.py — keeping both fabric layer
(NCS, node-worker, Prometheus) and document ingest/query endpoints.
Made-with: Cursor
2026-02-27 03:09:12 -08:00
Apple
ed7ad49d3a
P3.2+P3.3+P3.4: NODA1 node-worker + NATS auth config + Prometheus counters
P3.2 — Multi-node deployment:
- Added node-worker service to docker-compose.node1.yml (NODE_ID=noda1)
- NCS NODA1 now has NODE_WORKER_URL for metrics collection
- Fixed NODE_ID consistency: router NODA1 uses 'noda1'
- NODA2 node-worker/NCS gets NCS_REPORT_URL for latency reporting
P3.3 — NATS accounts/auth (opt-in config):
- config/nats-server.conf with 3 accounts: SYS, FABRIC, APP
- Per-user topic permissions (router, ncs, node_worker)
- Leafnode listener :7422 with auth
- Not yet activated (requires credential provisioning)
P3.4 — Prometheus counters:
- Router /fabric_metrics: caps_refresh, caps_stale, model_select,
offload_total, breaker_state, score_ms histogram
- Node Worker /prom_metrics: jobs_total, inflight gauge, latency_ms histogram
- NCS /prom_metrics: runtime_health, runtime_p50/p95, node_wait_ms
- All bound to 127.0.0.1 (not externally exposed)
Made-with: Cursor
2026-02-27 03:03:18 -08:00
Apple
a605b8c43e
P3.1: GPU/Queue-aware routing — NCS metrics + scoring-based model selection
NCS (services/node-capabilities/metrics.py):
- NodeLoad: inflight_jobs, queue_depth, concurrency_limit, estimated_wait_ms,
cpu_load_1m, mem_pressure (macOS + Linux), rtt_ms_to_hub
- RuntimeLoad: per-runtime healthy, p50_ms, p95_ms from rolling 50-sample window
- POST /capabilities/report_latency for node-worker → NCS reporting
- NCS fetches worker metrics via NODE_WORKER_URL
Node Worker:
- GET /metrics endpoint (inflight, concurrency, latency buffers)
- Latency tracking per job type (llm/vision) with rolling buffer
- Fire-and-forget latency reporting to NCS after each successful job
Router (model_select v3):
- score_candidate(): wait + model_latency + cross_node_penalty + prefer_bonus
- LOCAL_THRESHOLD_MS=250: prefer local if within threshold of remote
- ModelSelection.score field for observability
- Structured [score] logs with chosen node, model, and score breakdown
Tests: 19 new (12 scoring + 7 NCS metrics), 36 total pass
Docs: ops/runbook_p3_1.md, ops/CHANGELOG_FABRIC.md
No breaking changes to JobRequest/JobResponse or capabilities schema.
Made-with: Cursor
2026-02-27 02:55:44 -08:00
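The scoring terms named in this commit could combine along the following lines. Only the four terms and LOCAL_THRESHOLD_MS=250 come from the commit message; the penalty and bonus values, the Candidate fields, and the select() helper are assumptions, and the real model_select v3 may weigh things differently.

```python
from dataclasses import dataclass

LOCAL_THRESHOLD_MS = 250
CROSS_NODE_PENALTY_MS = 50   # hypothetical value
PREFER_BONUS_MS = -25        # hypothetical value (negative lowers the score)


@dataclass
class Candidate:
    node: str
    model: str
    local: bool
    wait_ms: float       # estimated queue wait on the node
    latency_ms: float    # observed model latency (e.g. p50)
    preferred: bool = False


def score_candidate(c: Candidate) -> float:
    """Lower is better: wait + model latency + cross-node penalty + prefer bonus."""
    score = c.wait_ms + c.latency_ms
    if not c.local:
        score += CROSS_NODE_PENALTY_MS
    if c.preferred:
        score += PREFER_BONUS_MS
    return score


def select(candidates: list) -> Candidate:
    """Pick the lowest score, but keep work local if local is within the threshold."""
    best = min(candidates, key=score_candidate)
    if best.local:
        return best
    local_candidates = [c for c in candidates if c.local]
    if local_candidates:
        best_local = min(local_candidates, key=score_candidate)
        if score_candidate(best_local) - score_candidate(best) <= LOCAL_THRESHOLD_MS:
            return best_local
    return best
```

The threshold trades a small amount of expected latency for avoiding a network hop when the remote advantage is marginal.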
Apple
c4b94a327d
P2.2+P2.3: NATS offload node-worker + router offload integration
Node Worker (services/node-worker/):
- NATS subscriber for node.{NODE_ID}.llm.request / vision.request
- Canonical JobRequest/JobResponse envelope (Pydantic)
- Idempotency cache (TTL 10min) with inflight dedup
- Deadline enforcement (DEADLINE_EXCEEDED on expired jobs)
- Concurrency limiter (semaphore, returns busy)
- Ollama + Swapper vision providers
Router offload (services/router/offload_client.py):
- NATS req/reply with configurable retries
- Circuit breaker per node+type (3 fails/60s → open 120s)
- Concurrency semaphore for remote requests
Model selection (services/router/model_select.py):
- exclude_nodes parameter for circuit-broken nodes
- force_local flag for fallback re-selection
- Integrated circuit breaker state awareness
Router /infer pipeline:
- Remote offload path when NCS selects remote node
- Automatic fallback: exclude failed node → force_local re-select
- Deadline propagation from router to node-worker
Tests: 17 unit tests (idempotency, deadline, circuit breaker)
Docs: ops/offload_routing.md (subjects, envelope, verification)
Made-with: Cursor
2026-02-27 02:44:05 -08:00
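The breaker policy stated in this commit (3 failures within 60s opens the circuit for 120s, tracked per node and job type) can be sketched as below. The class and method names and the injectable clock are assumptions; this is not the code in offload_client.py.

```python
import time
from collections import defaultdict, deque


class CircuitBreaker:
    """Per (node, job_type) breaker: max_fails within window_s opens for open_s."""

    def __init__(self, max_fails: int = 3, window_s: float = 60.0,
                 open_s: float = 120.0, clock=time.monotonic):
        self._max_fails = max_fails
        self._window_s = window_s
        self._open_s = open_s
        self._clock = clock
        self._fails = defaultdict(deque)  # key -> timestamps of recent failures
        self._open_until = {}             # key -> time at which the circuit closes

    def record_failure(self, node: str, job_type: str) -> None:
        key = (node, job_type)
        now = self._clock()
        fails = self._fails[key]
        fails.append(now)
        # Forget failures that fell out of the sliding window.
        while fails and now - fails[0] > self._window_s:
            fails.popleft()
        if len(fails) >= self._max_fails:
            self._open_until[key] = now + self._open_s
            fails.clear()

    def record_success(self, node: str, job_type: str) -> None:
        self._fails[(node, job_type)].clear()

    def is_open(self, node: str, job_type: str) -> bool:
        """True while requests to this node and job type should be skipped."""
        return self._clock() < self._open_until.get((node, job_type), float("-inf"))
```

A router could pass the currently open nodes as exclude_nodes during model selection, which matches the fallback flow described in this commit.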