feat: MD pipeline — market-data-service hardening + SenpAI NATS consumer

Producer (market-data-service):
- Backpressure: smart drop policy (heartbeats→quotes→trades preserved)
- Heartbeat monitor: synthetic HeartbeatEvent on provider silence
- Graceful shutdown: WS→bus→storage→DB engine cleanup sequence
- Bybit V5 public WS provider (backup for Binance, no API key needed)
- FailoverManager: health-based provider switching with recovery
- NATS output adapter: md.events.{type}.{symbol} for SenpAI
- /bus-stats endpoint for backpressure monitoring
- Dockerfile + docker-compose.node1.yml integration
- 36 tests (parsing + bus + failover), requirements.lock

Consumer (senpai-md-consumer):
- NATSConsumer: subscribe md.events.>, queue group senpai-md, backpressure
- State store: LatestState + RollingWindow (deque, 60s)
- Feature engine: 11 features (mid, spread, VWAP, return, vol, latency)
- Rule-based signals: long/short on return+volume+spread conditions
- Publisher: rate-limited features + signals + alerts to NATS
- HTTP API: /health, /metrics, /state/latest, /features/latest, /stats
- 10 Prometheus metrics
- Dockerfile + docker-compose.senpai.yml
- 41 tests (parsing + state + features + rate-limit), requirements.lock

CI: ruff + pytest + smoke import for both services
Tests: 77 total passed, lint clean
Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Apple
2026-02-09 11:46:15 -08:00
parent c50843933f
commit 09dee24342
47 changed files with 3930 additions and 56 deletions


@@ -0,0 +1,8 @@
.venv/
__pycache__/
*.pyc
.pytest_cache/
.ruff_cache/
.env
tests/
.git/


@@ -0,0 +1,38 @@
# SenpAI Market-Data Consumer Configuration
# Copy to .env and adjust as needed
# ── NATS ──────────────────────────────────────────────────────────────
NATS_URL=nats://localhost:4222
NATS_SUBJECT=md.events.>
NATS_QUEUE_GROUP=senpai-md
USE_JETSTREAM=false
# ── Internal queue ────────────────────────────────────────────────────
QUEUE_SIZE=50000
QUEUE_DROP_THRESHOLD=0.9
# ── Features / signals ───────────────────────────────────────────────
FEATURES_ENABLED=true
FEATURES_PUB_RATE_HZ=10
FEATURES_PUB_SUBJECT=senpai.features
SIGNALS_PUB_SUBJECT=senpai.signals
ALERTS_PUB_SUBJECT=senpai.alerts
# ── Rolling window ───────────────────────────────────────────────────
ROLLING_WINDOW_SECONDS=60.0
# ── Signal rules ─────────────────────────────────────────────────────
SIGNAL_RETURN_THRESHOLD=0.003
SIGNAL_VOLUME_THRESHOLD=1.0
SIGNAL_SPREAD_MAX_BPS=20.0
# ── Alert thresholds ─────────────────────────────────────────────────
ALERT_LATENCY_MS=1000.0
ALERT_GAP_SECONDS=30.0
# ── HTTP API ─────────────────────────────────────────────────────────
HTTP_HOST=0.0.0.0
HTTP_PORT=8892
# ── Logging ──────────────────────────────────────────────────────────
LOG_LEVEL=INFO


@@ -0,0 +1,20 @@
# ── SenpAI Market-Data Consumer ─────────────────────────────────────────
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY senpai/ ./senpai/
COPY pyproject.toml .
HEALTHCHECK --interval=15s --timeout=5s --start-period=10s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8892/health')" || exit 1
EXPOSE 8892
ENTRYPOINT ["python", "-m", "senpai.md_consumer"]


@@ -0,0 +1,242 @@
# SenpAI Market-Data Consumer
NATS subscriber + feature engine + signal bus for the SenpAI/Gordon trading agent.
Consumes normalised events from `market-data-service`, computes real-time features, and publishes signals back to NATS.
## Architecture
```
market-data-service SenpAI MD Consumer
┌──────────────┐ ┌────────────────────────────────┐
│ Binance WS │ │ │
│ Bybit WS │──► NATS ──────────► NATSConsumer │
│ Alpaca WS │ md.events.> │ ↓ (bounded queue) │
└──────────────┘ │ State Store │
│ ├─ LatestState (trade/quote)│
│ └─ RollingWindow (60s deque)│
│ ↓ │
│ Feature Engine │
│ ├─ mid, spread, vwap │
│ ├─ return_10s, vol_60s │
│ └─ latency p50/p95 │
│ ↓ │
│ Publisher ──► NATS │
│ ├─ senpai.features.{symbol} │
│ ├─ senpai.signals.{symbol} │
│ └─ senpai.alerts │
│ │
│ HTTP API (:8892) │
│ /health /metrics /stats │
│ /state/latest /features │
└────────────────────────────────┘
```
## Quick Start
### 1. Install
```bash
cd services/senpai-md-consumer
pip install -r requirements.txt
cp .env.example .env
```
### 2. Start NATS (if not running)
```bash
docker run -d --name nats -p 4222:4222 -p 8222:8222 nats:2.10-alpine --js -m 8222
```
### 3. Start market-data-service (producer)
```bash
cd ../market-data-service
python -m app run --provider binance --symbols BTCUSDT,ETHUSDT
```
### 4. Start SenpAI MD Consumer
```bash
cd ../senpai-md-consumer
python -m senpai.md_consumer
```
### 5. Verify
```bash
# Health
curl http://localhost:8892/health
# Stats
curl http://localhost:8892/stats
# Latest state
curl "http://localhost:8892/state/latest?symbol=BTCUSDT"
# Computed features
curl "http://localhost:8892/features/latest?symbol=BTCUSDT"
# Prometheus metrics
curl http://localhost:8892/metrics
```
## Docker
### Standalone (with NATS)
```bash
docker-compose -f docker-compose.senpai.yml up -d
```
### Part of NODE1 stack
```bash
docker-compose -f docker-compose.node1.yml up -d market-data-service senpai-md-consumer
```
## NATS Subjects
### Consumed (from market-data-service)
| Subject | Description |
|---|---|
| `md.events.trade.{symbol}` | Trade events |
| `md.events.quote.{symbol}` | Quote events |
| `md.events.book_l2.{symbol}` | L2 book snapshots |
| `md.events.heartbeat.__system__` | Provider heartbeats |
### Published (for SenpAI/other consumers)
| Subject | Description |
|---|---|
| `senpai.features.{symbol}` | Feature snapshots (rate-limited to 10Hz/symbol) |
| `senpai.signals.{symbol}` | Trade signals (long/short) |
| `senpai.alerts` | System alerts (latency, gaps, backpressure) |
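A minimal downstream subscriber can be sketched with `nats-py`. This is illustrative, not part of the service; the `parse_signal` helper assumes the signal payload carries at least `direction` and `confidence` fields, as described below.

```python
import asyncio
import json


def parse_signal(subject: str, payload: bytes) -> dict:
    """Extract the symbol from `senpai.signals.{symbol}` and decode the JSON body."""
    symbol = subject.rsplit(".", 1)[-1]
    data = json.loads(payload)
    return {"symbol": symbol, **data}


async def main() -> None:
    # Requires a running NATS server and `pip install nats-py`.
    import nats

    nc = await nats.connect("nats://localhost:4222")

    async def on_signal(msg) -> None:
        sig = parse_signal(msg.subject, msg.data)
        print(sig["symbol"], sig.get("direction"), sig.get("confidence"))

    await nc.subscribe("senpai.signals.>", cb=on_signal)
    await asyncio.Event().wait()  # run until cancelled

# Run with: asyncio.run(main())
```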
## Features Computed
| Feature | Description |
|---|---|
| `mid` | (bid + ask) / 2 |
| `spread_abs` | ask - bid |
| `spread_bps` | spread in basis points |
| `trade_vwap_10s` | VWAP over 10 seconds |
| `trade_vwap_60s` | VWAP over 60 seconds |
| `trade_count_10s` | Number of trades in 10s |
| `trade_volume_10s` | Total volume in 10s |
| `return_10s` | Price return over 10 seconds |
| `realized_vol_60s` | Realised volatility (60s log-return std) |
| `latency_ms_p50` | p50 exchange-to-receive latency |
| `latency_ms_p95` | p95 exchange-to-receive latency |
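A worked example of the quote-derived features (prices are illustrative):

```python
bid, ask = 43120.5, 43121.0
mid = (bid + ask) / 2                   # 43120.75
spread_abs = ask - bid                  # 0.5
spread_bps = spread_abs / mid * 10_000  # ~0.116 bps (very tight)

mid_10s_ago = 43000.0
return_10s = mid / mid_10s_ago - 1      # ~0.0028, i.e. +0.28%
```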
## Signal Rules (MVP)
**Long signal** — emitted when ALL conditions are met:
- `return_10s > 0.3%` (configurable)
- `trade_volume_10s > 1.0` (configurable)
- `spread_bps < 20` (configurable)
**Short signal**: same but `return_10s < -0.3%`
## Backpressure Policy
- Queue < 90% → accept all events
- Queue >= 90% → drop heartbeats, quotes, book snapshots
- **Trades are NEVER dropped**
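The drop decision amounts to a priority check against the fill ratio. A simplified model of the policy above (the consumer's real implementation may differ in detail):

```python
DROPPABLE = {"heartbeat", "quote", "book_l2"}  # trades are exempt


def should_drop(event_type: str, queue_len: int, queue_max: int,
                threshold: float = 0.9) -> bool:
    """Shed low-priority events once the queue crosses the fill threshold."""
    if event_type == "trade":
        return False  # trades are NEVER dropped
    return queue_len / queue_max >= threshold and event_type in DROPPABLE
```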
## HTTP Endpoints
| Endpoint | Description |
|---|---|
| `GET /health` | Service health + tracked symbols |
| `GET /metrics` | Prometheus metrics |
| `GET /state/latest?symbol=` | Latest trade + quote |
| `GET /features/latest?symbol=` | Current computed features |
| `GET /stats` | Queue fill, drops, events/sec |
## Prometheus Metrics
| Metric | Type | Description |
|---|---|---|
| `senpai_events_in_total` | Counter | Events received {type, provider} |
| `senpai_events_dropped_total` | Counter | Dropped events {reason, type} |
| `senpai_queue_fill_ratio` | Gauge | Queue fill 0..1 |
| `senpai_processing_latency_ms` | Histogram | Processing latency |
| `senpai_feature_publish_total` | Counter | Feature publishes {symbol} |
| `senpai_signals_emitted_total` | Counter | Signals {symbol, direction} |
| `senpai_nats_connected` | Gauge | NATS connection status |
## Tests
```bash
pytest tests/ -v
```
41 tests:
- 11 model parsing tests (tolerant parsing, edge cases)
- 10 state/rolling window tests (eviction, lookup)
- 16 feature math tests (VWAP, vol, signals, percentile)
- 5 rate-limit tests (publish throttling, error handling)
## Troubleshooting
### NATS connection refused
```
nats.error: error=could not connect to server
```
Ensure NATS is running:
```bash
docker run -d --name nats -p 4222:4222 nats:2.10-alpine --js
```
Or check `NATS_URL` in `.env`.
### No events arriving (queue stays at 0)
1. Verify `market-data-service` is running and `NATS_ENABLED=true`
2. Check subject match: producer publishes to `md.events.trade.BTCUSDT`, consumer subscribes to `md.events.>`
3. Check NATS monitoring: `curl http://localhost:8222/connz` — both services should appear
### JetStream errors
If `USE_JETSTREAM=true` but NATS started without `--js`:
```bash
# Restart NATS with JetStream
docker rm -f nats
docker run -d -p 4222:4222 -p 8222:8222 nats:2.10-alpine --js -m 8222
```
Or set `USE_JETSTREAM=false` for core NATS (simpler, works for MVP).
### Port 8892 already in use
```bash
lsof -ti:8892 | xargs kill -9
```
### Features show `null` for all values
Normal on startup — features populate after first trade+quote arrive. Wait a few seconds for Binance data to flow through.
### No signals emitted
Signal rules require ALL conditions simultaneously:
- `return_10s > 0.3%` — needs price movement
- `volume_10s > 1.0` — needs trading activity
- `spread_bps < 20` — needs tight spread
In low-volatility markets, signals may be rare. Lower thresholds in `.env` for testing:
```env
SIGNAL_RETURN_THRESHOLD=0.001
SIGNAL_VOLUME_THRESHOLD=0.1
```
### High memory usage
Rolling windows grow per symbol. With many symbols, reduce window:
```env
ROLLING_WINDOW_SECONDS=30
```
## Configuration (ENV)
See `.env.example` for all available settings.
Key settings:
- `NATS_URL` — NATS server URL
- `FEATURES_PUB_RATE_HZ` — max feature publishes per symbol per second
- `SIGNAL_RETURN_THRESHOLD` — min return for signal trigger
- `ROLLING_WINDOW_SECONDS` — rolling window duration


@@ -0,0 +1,40 @@
# SenpAI Market-Data Consumer + NATS
# Usage: docker-compose -f docker-compose.senpai.yml up -d
version: "3.8"
services:
nats:
image: nats:2.10-alpine
container_name: senpai-nats
ports:
- "4222:4222"
- "8222:8222" # monitoring
command: ["--js", "-m", "8222"]
restart: unless-stopped
senpai-md-consumer:
container_name: senpai-md-consumer
build:
context: .
dockerfile: Dockerfile
environment:
- NATS_URL=nats://nats:4222
- NATS_SUBJECT=md.events.>
- NATS_QUEUE_GROUP=senpai-md
- FEATURES_ENABLED=true
- FEATURES_PUB_RATE_HZ=10
- LOG_LEVEL=INFO
- HTTP_PORT=8892
ports:
- "8892:8892"
depends_on:
- nats
restart: unless-stopped
healthcheck:
test:
- CMD-SHELL
- python -c "import urllib.request; urllib.request.urlopen('http://localhost:8892/health')"
interval: 15s
timeout: 5s
retries: 3
start_period: 10s


@@ -0,0 +1,13 @@
[project]
name = "senpai-md-consumer"
version = "0.1.0"
description = "SenpAI market-data consumer — NATS subscriber, feature engine, signal bus"
requires-python = ">=3.11"
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
[tool.ruff]
target-version = "py311"
line-length = 100


@@ -0,0 +1,16 @@
# Auto-generated pinned dependencies — 2026-02-09
# Install: pip install -r requirements.txt -c requirements.lock
annotated-types==0.7.0
nats-py==2.13.1
prometheus_client==0.24.1
pydantic==2.12.5
pydantic-settings==2.12.0
pydantic_core==2.41.5
python-dotenv==1.2.1
structlog==25.5.0
typing-inspection==0.4.2
typing_extensions==4.15.0
# Dev
pytest==9.0.2
pytest-asyncio==1.3.0
ruff==0.15.0


@@ -0,0 +1,22 @@
# SenpAI Market-Data Consumer
# Python 3.11+
# Core
pydantic>=2.5
pydantic-settings>=2.1
# NATS
nats-py>=2.7
# Logging
structlog>=24.1
# Metrics
prometheus_client>=0.20
# Testing
pytest>=8.0
pytest-asyncio>=0.23
# Linting
ruff>=0.3


@@ -0,0 +1,4 @@
"""Allow running as: python -m senpai.md_consumer"""
from senpai.md_consumer.main import cli
cli()


@@ -0,0 +1,166 @@
"""
Minimal HTTP API — lightweight asyncio server (no framework dependency).
Endpoints:
GET /health → service health
GET /metrics → Prometheus metrics
GET /state/latest → latest trade/quote per symbol (?symbol=BTCUSDT)
GET /features/latest → latest computed features (?symbol=BTCUSDT)
GET /stats → queue fill, drops, events/sec
"""
from __future__ import annotations
import asyncio
import json
import logging
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from senpai.md_consumer.config import settings
from senpai.md_consumer.features import compute_features
from senpai.md_consumer.state import LatestState
logger = logging.getLogger(__name__)
# These are set by main.py at startup
_state: LatestState | None = None
_stats_fn = None # callable → dict
def set_state(state: LatestState) -> None:
global _state
_state = state
def set_stats_fn(fn) -> None:
global _stats_fn
_stats_fn = fn
async def _handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
"""Minimal HTTP request handler."""
try:
request_line = await asyncio.wait_for(reader.readline(), timeout=5.0)
request_str = request_line.decode("utf-8", errors="replace").strip()
parts = request_str.split()
if len(parts) < 2:
writer.close()
return
path = parts[1]
# Consume headers
while True:
line = await reader.readline()
if line in (b"\r\n", b"\n", b""):
break
# Parse query params
query_params: dict[str, str] = {}
if "?" in path:
base_path, query = path.split("?", 1)
for param in query.split("&"):
if "=" in param:
k, v = param.split("=", 1)
query_params[k] = v
else:
base_path = path
body, content_type, status = await _route(base_path, query_params)
response = (
f"HTTP/1.1 {status}\r\n"
f"Content-Type: {content_type}\r\n"
f"Content-Length: {len(body)}\r\n"
f"Connection: close\r\n"
f"\r\n"
)
writer.write(response.encode() + body)
await writer.drain()
except Exception:
pass
finally:
try:
writer.close()
await writer.wait_closed()
except Exception:
pass
async def _route(
path: str, params: dict[str, str]
) -> tuple[bytes, str, str]:
"""Route request to handler. Returns (body, content_type, status)."""
if path == "/health":
body = json.dumps({
"status": "ok",
"service": "senpai-md-consumer",
"symbols": _state.symbols if _state else [],
}).encode()
return body, "application/json", "200 OK"
elif path == "/metrics":
body = generate_latest()
return body, CONTENT_TYPE_LATEST, "200 OK"
elif path == "/state/latest":
symbol = params.get("symbol", "")
if not symbol:
body = json.dumps({"error": "missing ?symbol=XXX"}).encode()
return body, "application/json", "400 Bad Request"
if not _state:
body = json.dumps({"error": "not initialized"}).encode()
return body, "application/json", "503 Service Unavailable"
data = _state.to_dict(symbol)
body = json.dumps(data, ensure_ascii=False).encode()
return body, "application/json", "200 OK"
elif path == "/features/latest":
symbol = params.get("symbol", "")
if not symbol:
body = json.dumps({"error": "missing ?symbol=XXX"}).encode()
return body, "application/json", "400 Bad Request"
if not _state:
body = json.dumps({"error": "not initialized"}).encode()
return body, "application/json", "503 Service Unavailable"
features = compute_features(_state, symbol)
data = {"symbol": symbol.upper(), "features": features}
body = json.dumps(data, ensure_ascii=False).encode()
return body, "application/json", "200 OK"
elif path == "/stats":
if _stats_fn:
data = _stats_fn()
else:
data = {"error": "not initialized"}
body = json.dumps(data, ensure_ascii=False).encode()
return body, "application/json", "200 OK"
else:
body = json.dumps({"error": "not found"}).encode()
return body, "application/json", "404 Not Found"
async def start_api() -> asyncio.Server:
"""Start the HTTP server."""
server = await asyncio.start_server(
_handler,
settings.http_host,
settings.http_port,
)
logger.info(
"api.started",
extra={
"host": settings.http_host,
"port": settings.http_port,
"endpoints": [
"/health",
"/metrics",
"/state/latest?symbol=",
"/features/latest?symbol=",
"/stats",
],
},
)
return server


@@ -0,0 +1,55 @@
"""
Configuration via pydantic-settings.
All settings from ENV or .env file.
"""
from __future__ import annotations
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
model_config = SettingsConfigDict(
env_file=".env",
env_file_encoding="utf-8",
extra="ignore",
)
# ── NATS ──────────────────────────────────────────────────────────
nats_url: str = "nats://localhost:4222"
nats_subject: str = "md.events.>"
nats_queue_group: str = "senpai-md"
use_jetstream: bool = False
# ── Internal queue ────────────────────────────────────────────────
queue_size: int = 50_000
queue_drop_threshold: float = 0.9 # start dropping at 90%
# ── Features / signals ────────────────────────────────────────────
features_enabled: bool = True
features_pub_rate_hz: float = 10.0 # max publish rate per symbol
features_pub_subject: str = "senpai.features"
signals_pub_subject: str = "senpai.signals"
alerts_pub_subject: str = "senpai.alerts"
# ── Rolling window ────────────────────────────────────────────────
rolling_window_seconds: float = 60.0
# ── Signal rules (rule-based MVP) ─────────────────────────────────
signal_return_threshold: float = 0.003 # 0.3%
signal_volume_threshold: float = 1.0 # min volume in 10s
signal_spread_max_bps: float = 20.0 # max spread in bps
# ── Alert thresholds ──────────────────────────────────────────────
alert_latency_ms: float = 1000.0 # alert if p95 latency > this
alert_gap_seconds: float = 30.0 # alert if no events for N sec
# ── HTTP ──────────────────────────────────────────────────────────
http_host: str = "0.0.0.0"
http_port: int = 8892
# ── Logging ───────────────────────────────────────────────────────
log_level: str = "INFO"
settings = Settings()


@@ -0,0 +1,248 @@
"""
Feature engine — incremental feature computation from rolling windows.
Features (per symbol):
- mid: (bid+ask)/2
- spread_abs: ask - bid
- spread_bps: spread_abs / mid * 10000
- trade_vwap_10s: VWAP over last 10 seconds
- trade_vwap_60s: VWAP over last 60 seconds
- trade_count_10s: number of trades in 10s
- trade_volume_10s: total volume in 10s
- return_10s: mid_now / mid_10s_ago - 1
- realized_vol_60s: std of log-returns over 60s
- latency_ms_p50: p50 exchange-to-receive latency
- latency_ms_p95: p95 exchange-to-receive latency
Rule-based signal (MVP):
- if return_10s > threshold AND volume_10s > threshold AND spread_bps < threshold
→ emit TradeSignal(direction="long")
- opposite for short
"""
from __future__ import annotations
import logging
import math
from senpai.md_consumer.config import settings
from senpai.md_consumer.models import FeatureSnapshot, TradeSignal
from senpai.md_consumer.state import LatestState, TradeRecord
logger = logging.getLogger(__name__)
def compute_features(state: LatestState, symbol: str) -> dict[str, float | None]:
"""
Compute all features for a symbol from current state.
Returns a flat dict of feature_name → value (None if not computable).
"""
sym = symbol.upper()
features: dict[str, float | None] = {}
quote = state.get_latest_quote(sym)
window = state.get_window(sym)
# ── Mid / Spread ──────────────────────────────────────────────────
if quote and quote.bid > 0 and quote.ask > 0:
mid = (quote.bid + quote.ask) / 2
spread_abs = quote.ask - quote.bid
spread_bps = (spread_abs / mid * 10_000) if mid > 0 else None
features["mid"] = mid
features["spread_abs"] = spread_abs
features["spread_bps"] = spread_bps
else:
features["mid"] = None
features["spread_abs"] = None
features["spread_bps"] = None
if not window:
# No rolling data yet — fill with None
features.update({
"trade_vwap_10s": None,
"trade_vwap_60s": None,
"trade_count_10s": None,
"trade_volume_10s": None,
"return_10s": None,
"realized_vol_60s": None,
"latency_ms_p50": None,
"latency_ms_p95": None,
})
return features
# ── VWAP ──────────────────────────────────────────────────────────
trades_10s = window.trades_since(10.0)
trades_60s = list(window.trades)
features["trade_vwap_10s"] = _vwap(trades_10s)
features["trade_vwap_60s"] = _vwap(trades_60s)
# ── Trade count / volume (10s) ────────────────────────────────────
features["trade_count_10s"] = float(len(trades_10s))
features["trade_volume_10s"] = sum(t.size for t in trades_10s) if trades_10s else 0.0
# ── Return 10s ────────────────────────────────────────────────────
features["return_10s"] = _return_over(window, features.get("mid"), 10.0)
# ── Realised volatility 60s ───────────────────────────────────────
features["realized_vol_60s"] = _realized_vol(trades_60s)
# ── Latency ───────────────────────────────────────────────────────
latencies = _latencies_ms(trades_60s)
if latencies:
latencies.sort()
features["latency_ms_p50"] = _percentile(latencies, 50)
features["latency_ms_p95"] = _percentile(latencies, 95)
else:
features["latency_ms_p50"] = None
features["latency_ms_p95"] = None
return features
def make_feature_snapshot(
state: LatestState, symbol: str
) -> FeatureSnapshot:
"""Create a FeatureSnapshot for publishing."""
features = compute_features(state, symbol)
return FeatureSnapshot(symbol=symbol.upper(), features=features)
def check_signal(
features: dict[str, float | None], symbol: str
) -> TradeSignal | None:
"""
Rule-based signal MVP.
Long if:
- return_10s > signal_return_threshold
- trade_volume_10s > signal_volume_threshold
- spread_bps < signal_spread_max_bps
Short if opposite return condition met.
"""
ret = features.get("return_10s")
vol = features.get("trade_volume_10s")
spread = features.get("spread_bps")
if ret is None or vol is None or spread is None:
return None
# Spread filter (both directions)
if spread > settings.signal_spread_max_bps:
return None
# Volume filter
if vol < settings.signal_volume_threshold:
return None
# Direction
if ret > settings.signal_return_threshold:
confidence = min(1.0, ret / (settings.signal_return_threshold * 3))
return TradeSignal(
symbol=symbol.upper(),
direction="long",
confidence=confidence,
reason=f"return_10s={ret:.4f} vol_10s={vol:.2f} spread={spread:.1f}bps",
features=features,
)
elif ret < -settings.signal_return_threshold:
confidence = min(1.0, abs(ret) / (settings.signal_return_threshold * 3))
return TradeSignal(
symbol=symbol.upper(),
direction="short",
confidence=confidence,
reason=f"return_10s={ret:.4f} vol_10s={vol:.2f} spread={spread:.1f}bps",
features=features,
)
return None
# ── Internal helpers ───────────────────────────────────────────────────
def _vwap(trades: list[TradeRecord]) -> float | None:
"""Volume-weighted average price."""
if not trades:
return None
total_value = sum(t.price * t.size for t in trades)
total_volume = sum(t.size for t in trades)
if total_volume <= 0:
return None
return total_value / total_volume
def _return_over(
window, current_mid: float | None, seconds: float
) -> float | None:
"""
Return over last N seconds.
Uses mid price from quotes if available, else latest trade price.
"""
if current_mid is None or current_mid <= 0:
return None
# Find the quote mid from N seconds ago
quotes = window.quotes_since(seconds)
if quotes:
oldest = quotes[0]
old_mid = (oldest.bid + oldest.ask) / 2
if old_mid > 0:
return current_mid / old_mid - 1
# Fallback: use trade prices
trades = window.trades_since(seconds)
if trades:
old_price = trades[0].price
if old_price > 0:
return current_mid / old_price - 1
return None
def _realized_vol(trades: list[TradeRecord]) -> float | None:
"""
Simple realised volatility: std of log-returns of trade prices.
"""
if len(trades) < 3:
return None
prices = [t.price for t in trades if t.price > 0]
if len(prices) < 3:
return None
log_returns = []
for i in range(1, len(prices)):
if prices[i - 1] > 0:
lr = math.log(prices[i] / prices[i - 1])
log_returns.append(lr)
if len(log_returns) < 2:
return None
mean = sum(log_returns) / len(log_returns)
variance = sum((r - mean) ** 2 for r in log_returns) / (len(log_returns) - 1)
return math.sqrt(variance)
def _latencies_ms(trades: list[TradeRecord]) -> list[float]:
"""Extract exchange-to-receive latencies in ms."""
latencies = []
for t in trades:
if t.ts_exchange is not None and t.ts_recv is not None:
lat = (t.ts_recv.timestamp() - t.ts_exchange.timestamp()) * 1000
if 0 < lat < 60_000: # sanity: 0-60s
latencies.append(lat)
return latencies
def _percentile(sorted_data: list[float], p: int) -> float:
"""Simple percentile from sorted list."""
if not sorted_data:
return 0.0
k = (len(sorted_data) - 1) * p / 100
f = math.floor(k)
c = math.ceil(k)
if f == c:
return sorted_data[int(k)]
return sorted_data[f] * (c - k) + sorted_data[c] * (k - f)


@@ -0,0 +1,270 @@
"""
SenpAI Market-Data Consumer — entry point.
Orchestrates:
1. NATS subscription (md.events.>)
2. Event processing → state updates → feature computation
3. Feature/signal/alert publishing back to NATS
4. HTTP API for monitoring
Usage:
python -m senpai.md_consumer
"""
from __future__ import annotations
import asyncio
import logging
import signal
import time
import structlog
from senpai.md_consumer import api
from senpai.md_consumer import metrics as m
from senpai.md_consumer.config import settings
from senpai.md_consumer.features import (
check_signal,
make_feature_snapshot,
compute_features,
)
from senpai.md_consumer.models import (
AlertEvent,
EventType,
TradeEvent,
QuoteEvent,
)
from senpai.md_consumer.nats_consumer import NATSConsumer
from senpai.md_consumer.publisher import Publisher
from senpai.md_consumer.state import LatestState
logger = structlog.get_logger()
# ── Logging setup ──────────────────────────────────────────────────────
def setup_logging() -> None:
log_level = getattr(logging, settings.log_level.upper(), logging.INFO)
structlog.configure(
processors=[
structlog.contextvars.merge_contextvars,
structlog.processors.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.dev.ConsoleRenderer(),
],
wrapper_class=structlog.make_filtering_bound_logger(log_level),
context_class=dict,
logger_factory=structlog.PrintLoggerFactory(),
)
logging.basicConfig(level=log_level, format="%(message)s")
# ── Processing pipeline ───────────────────────────────────────────────
async def process_events(
consumer: NATSConsumer,
state: LatestState,
publisher: Publisher,
) -> None:
"""
Main processing loop:
1. Read event from queue
2. Update state
3. Compute features
4. Publish features + check signals
5. Check alerts
"""
last_alert_check = time.monotonic()
events_per_sec_count = 0
while True:
try:
event = await consumer.queue.get()
except asyncio.CancelledError:
break
proc_start = time.monotonic()
try:
# Update state based on event type
if event.event_type == EventType.TRADE:
assert isinstance(event, TradeEvent)
state.update_trade(event)
symbol = event.symbol
elif event.event_type == EventType.QUOTE:
assert isinstance(event, QuoteEvent)
state.update_quote(event)
symbol = event.symbol
elif event.event_type == EventType.HEARTBEAT:
# Heartbeats don't update state, just track
symbol = None
elif event.event_type == EventType.BOOK_L2:
# TODO: book updates
symbol = None
else:
symbol = None
# Compute features + publish (only for trade/quote events)
if symbol and settings.features_enabled:
snapshot = make_feature_snapshot(state, symbol)
await publisher.publish_features(snapshot)
# Check for trade signal
sig = check_signal(snapshot.features, symbol)
if sig:
await publisher.publish_signal(sig)
# Processing latency metric
proc_ms = (time.monotonic() - proc_start) * 1000
m.PROCESSING_LATENCY.observe(proc_ms)
# Events/sec tracking
events_per_sec_count += 1
except Exception as e:
logger.error(
"process.error",
error=str(e),
event_type=event.event_type.value if event else "?",
)
# Periodic alert checks (every 5 seconds)
now = time.monotonic()
if now - last_alert_check > 5.0:
last_alert_check = now
await _check_alerts(state, publisher, consumer)
async def _check_alerts(
state: LatestState,
publisher: Publisher,
consumer: NATSConsumer,
) -> None:
"""Check alert conditions and emit if needed."""
# Backpressure alert
fill = consumer.queue_fill_ratio
if fill > 0.8:
await publisher.publish_alert(
AlertEvent(
alert_type="backpressure",
level="warning" if fill < 0.95 else "critical",
message=f"Queue fill at {fill:.0%}",
details={"fill_ratio": fill},
)
)
# Latency alert (per symbol)
for sym in state.symbols:
features = compute_features(state, sym)
p95 = features.get("latency_ms_p95")
if p95 is not None and p95 > settings.alert_latency_ms:
await publisher.publish_alert(
AlertEvent(
alert_type="latency",
level="warning",
message=f"{sym} p95 latency {p95:.0f}ms > {settings.alert_latency_ms}ms",
details={"symbol": sym, "p95_ms": p95},
)
)
# ── Main ───────────────────────────────────────────────────────────────
async def main() -> None:
setup_logging()
logger.info("service.starting", nats_url=settings.nats_url)
# State store
state = LatestState(window_seconds=settings.rolling_window_seconds)
# NATS consumer
consumer = NATSConsumer()
await consumer.connect()
await consumer.subscribe()
# Publisher (reuses same NATS connection)
publisher = Publisher(consumer._nc)
# Wire up API
api.set_state(state)
def _get_stats() -> dict:
return {
"queue_size": consumer.queue.qsize(),
"queue_fill_ratio": round(consumer.queue_fill_ratio, 3),
"queue_max": settings.queue_size,
"events_processed": state.event_count,
"symbols_tracked": state.symbols,
"features_enabled": settings.features_enabled,
"nats_connected": bool(consumer._nc and consumer._nc.is_connected),
}
api.set_stats_fn(_get_stats)
# Start HTTP API
http_server = await api.start_api()
# Start processing loop
process_task = asyncio.create_task(
process_events(consumer, state, publisher)
)
# Graceful shutdown
shutdown_event = asyncio.Event()
def _signal_handler():
logger.info("service.shutdown_signal")
shutdown_event.set()
loop = asyncio.get_running_loop()
for sig in (signal.SIGINT, signal.SIGTERM):
try:
loop.add_signal_handler(sig, _signal_handler)
except NotImplementedError:
pass
logger.info(
"service.ready",
subject=settings.nats_subject,
queue_group=settings.nats_queue_group,
http_port=settings.http_port,
features_enabled=settings.features_enabled,
)
# Wait for shutdown
await shutdown_event.wait()
# ── Cleanup ───────────────────────────────────────────────────────
logger.info("service.shutting_down")
process_task.cancel()
try:
await process_task
except asyncio.CancelledError:
pass
await consumer.close()
http_server.close()
await http_server.wait_closed()
logger.info(
"service.stopped",
events_processed=state.event_count,
symbols=state.symbols,
)
def cli():
asyncio.run(main())
if __name__ == "__main__":
cli()


@@ -0,0 +1,72 @@
"""
Prometheus metrics for SenpAI market-data consumer.
"""
from prometheus_client import Counter, Gauge, Histogram
# ── Inbound events ─────────────────────────────────────────────────────
EVENTS_IN = Counter(
"senpai_events_in_total",
"Total events received from NATS",
["event_type", "provider"],
)
EVENTS_DROPPED = Counter(
"senpai_events_dropped_total",
"Events dropped due to backpressure or errors",
["reason", "event_type"],
)
# ── Queue ──────────────────────────────────────────────────────────────
QUEUE_FILL = Gauge(
"senpai_queue_fill_ratio",
"Internal processing queue fill ratio (0..1)",
)
QUEUE_SIZE = Gauge(
"senpai_queue_size",
"Current number of items in processing queue",
)
# ── Processing ─────────────────────────────────────────────────────────
PROCESSING_LATENCY = Histogram(
"senpai_processing_latency_ms",
"End-to-end processing latency (NATS receive to feature publish) in ms",
buckets=[0.1, 0.5, 1, 2, 5, 10, 25, 50, 100, 250],
)
# ── Feature publishing ─────────────────────────────────────────────────
FEATURE_PUBLISH = Counter(
"senpai_feature_publish_total",
"Total feature snapshots published to NATS",
["symbol"],
)
FEATURE_PUBLISH_ERRORS = Counter(
"senpai_feature_publish_errors_total",
"Failed feature publishes",
["symbol"],
)
# ── Signals ────────────────────────────────────────────────────────────
SIGNALS_EMITTED = Counter(
"senpai_signals_emitted_total",
"Trade signals emitted",
["symbol", "direction"],
)
ALERTS_EMITTED = Counter(
"senpai_alerts_emitted_total",
"Alerts emitted",
["alert_type"],
)
# ── NATS connection ───────────────────────────────────────────────────
NATS_CONNECTED = Gauge(
"senpai_nats_connected",
"Whether NATS connection is alive (1=yes, 0=no)",
)
NATS_RECONNECTS = Counter(
"senpai_nats_reconnects_total",
"Number of NATS reconnections",
)

View File

@@ -0,0 +1,139 @@
"""
Domain models — mirrors market-data-service event contracts.
Tolerant parsing: unknown fields ignored, partial data accepted.
"""
from __future__ import annotations
import time
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field
class EventType(str, Enum):
TRADE = "trade"
QUOTE = "quote"
BOOK_L2 = "book_l2"
HEARTBEAT = "heartbeat"
def _utc_now() -> datetime:
return datetime.now(timezone.utc)
def _mono_ns() -> int:
return time.monotonic_ns()
class BaseEvent(BaseModel, extra="ignore"):
"""Common fields — extra fields silently ignored."""
event_type: EventType
provider: str
ts_recv: datetime = Field(default_factory=_utc_now)
ts_recv_mono_ns: int = Field(default_factory=_mono_ns)
class TradeEvent(BaseEvent):
event_type: EventType = EventType.TRADE
symbol: str
price: float
size: float
ts_exchange: Optional[datetime] = None
side: Optional[str] = None
trade_id: Optional[str] = None
class QuoteEvent(BaseEvent):
event_type: EventType = EventType.QUOTE
symbol: str
bid: float
ask: float
bid_size: float
ask_size: float
ts_exchange: Optional[datetime] = None
class BookLevel(BaseModel, extra="ignore"):
price: float
size: float
class BookL2Event(BaseEvent):
event_type: EventType = EventType.BOOK_L2
symbol: str
bids: list[BookLevel] = Field(default_factory=list)
asks: list[BookLevel] = Field(default_factory=list)
ts_exchange: Optional[datetime] = None
class HeartbeatEvent(BaseEvent):
event_type: EventType = EventType.HEARTBEAT
# Union for parsing
Event = TradeEvent | QuoteEvent | BookL2Event | HeartbeatEvent
# ── Output models ──────────────────────────────────────────────────────
class FeatureSnapshot(BaseModel):
"""Published to senpai.features.{symbol}."""
symbol: str
ts: datetime = Field(default_factory=_utc_now)
features: dict[str, float | None]
class TradeSignal(BaseModel):
"""Published to senpai.signals.{symbol}."""
symbol: str
ts: datetime = Field(default_factory=_utc_now)
direction: str # "long" | "short"
confidence: float = 0.0 # 0..1
reason: str = ""
features: dict[str, float | None] = Field(default_factory=dict)
class AlertEvent(BaseModel):
"""Published to senpai.alerts."""
ts: datetime = Field(default_factory=_utc_now)
level: str = "warning" # "warning" | "critical"
alert_type: str # "latency" | "gap" | "backpressure"
message: str
details: dict = Field(default_factory=dict)
# ── Parsing helper ─────────────────────────────────────────────────────
_EVENT_MAP: dict[str, type[BaseEvent]] = {
"trade": TradeEvent,
"quote": QuoteEvent,
"book_l2": BookL2Event,
"heartbeat": HeartbeatEvent,
}
def parse_event(data: dict) -> Event | None:
"""
Parse a dict (from JSON) into the appropriate Event model.
Returns None if event_type is unknown or data is invalid.
"""
event_type = data.get("event_type")
if not event_type:
return None
cls = _EVENT_MAP.get(event_type)
if cls is None:
return None
try:
return cls.model_validate(data)
except Exception:
return None
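The tolerant-parsing contract above (unknown type or invalid payload yields None, never an exception) can be sketched without pydantic. The field names mirror `TradeEvent`, but the helper names are hypothetical:

```python
from typing import Callable, Optional

def _parse_trade(d: dict) -> Optional[dict]:
    # Required fields for a trade; reject on missing keys or bad types.
    try:
        return {
            "event_type": "trade",
            "provider": str(d["provider"]),
            "symbol": str(d["symbol"]),
            "price": float(d["price"]),
            "size": float(d["size"]),
        }
    except (KeyError, TypeError, ValueError):
        return None

_PARSERS: dict[str, Callable[[dict], Optional[dict]]] = {"trade": _parse_trade}

def parse_event_sketch(data: dict) -> Optional[dict]:
    # Unknown event_type or invalid payload → None, never an exception.
    parser = _PARSERS.get(data.get("event_type", ""))
    return parser(data) if parser else None
```

Extra fields are simply never read, which gives the same "unknown fields ignored" behaviour that `extra="ignore"` provides in the pydantic models.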

View File

@@ -0,0 +1,229 @@
"""
NATS consumer — subscribes to md.events.> and feeds the processing pipeline.
Features:
- Queue group subscription (horizontal scaling)
- Bounded asyncio.Queue with backpressure drop policy
- Auto-reconnect via nats-py
- Optional JetStream durable consumer
"""
from __future__ import annotations
import asyncio
import json
import logging
import nats
from nats.aio.client import Client as NatsClient
from nats.aio.msg import Msg
from senpai.md_consumer.config import settings
from senpai.md_consumer.models import EventType, parse_event, Event
from senpai.md_consumer import metrics as m
logger = logging.getLogger(__name__)
# Event types that may be shed under backpressure (trades are never in this set)
_DROPPABLE = {EventType.HEARTBEAT, EventType.QUOTE, EventType.BOOK_L2}
class NATSConsumer:
"""
Reads normalised events from NATS, validates, and puts into
a bounded asyncio.Queue for downstream processing.
    Backpressure policy:
    - Queue below the drop threshold (default 90%) → accept all events
    - At or above the threshold → drop heartbeats, quotes, book snapshots
    - Trades are kept whenever possible and dropped only as a last resort
      when the queue is completely full (they are critical for analytics)
"""
def __init__(self) -> None:
self._nc: NatsClient | None = None
self._sub = None
self._js_sub = None
self._queue: asyncio.Queue[Event] = asyncio.Queue(
maxsize=settings.queue_size
)
self._running = False
self._drop_count: dict[str, int] = {}
@property
def queue(self) -> asyncio.Queue[Event]:
return self._queue
@property
def queue_fill_ratio(self) -> float:
if settings.queue_size <= 0:
return 0.0
return self._queue.qsize() / settings.queue_size
async def connect(self) -> None:
"""Connect to NATS with auto-reconnect."""
self._nc = await nats.connect(
self._url,
reconnect_time_wait=2,
max_reconnect_attempts=-1,
name="senpai-md-consumer",
error_cb=self._on_error,
disconnected_cb=self._on_disconnected,
reconnected_cb=self._on_reconnected,
closed_cb=self._on_closed,
)
m.NATS_CONNECTED.set(1)
logger.info(
"nats.connected",
extra={"url": self._url, "subject": settings.nats_subject},
)
@property
def _url(self) -> str:
return settings.nats_url
async def subscribe(self) -> None:
"""Subscribe to market data events."""
if not self._nc:
raise RuntimeError("Not connected. Call connect() first.")
if settings.use_jetstream:
await self._subscribe_jetstream()
else:
await self._subscribe_core()
async def _subscribe_core(self) -> None:
"""Core NATS subscription with queue group."""
self._sub = await self._nc.subscribe(
settings.nats_subject,
queue=settings.nats_queue_group,
cb=self._on_message,
)
logger.info(
"nats.subscribed_core",
extra={
"subject": settings.nats_subject,
"queue_group": settings.nats_queue_group,
},
)
async def _subscribe_jetstream(self) -> None:
"""JetStream durable subscription."""
js = self._nc.jetstream()
# Try to create or bind to existing consumer
self._js_sub = await js.subscribe(
settings.nats_subject,
queue=settings.nats_queue_group,
durable="senpai-md-durable",
cb=self._on_message,
manual_ack=True,
)
logger.info(
"nats.subscribed_jetstream",
extra={
"subject": settings.nats_subject,
"durable": "senpai-md-durable",
},
)
async def _on_message(self, msg: Msg) -> None:
"""
Callback for each NATS message.
Parse → backpressure check → enqueue.
"""
try:
data = json.loads(msg.data)
except (json.JSONDecodeError, UnicodeDecodeError) as e:
m.EVENTS_DROPPED.labels(reason="parse_error", event_type="unknown").inc()
logger.warning("nats.parse_error", extra={"error": str(e)})
if settings.use_jetstream:
await msg.ack()
return
event = parse_event(data)
if event is None:
m.EVENTS_DROPPED.labels(reason="invalid_event", event_type="unknown").inc()
if settings.use_jetstream:
await msg.ack()
return
# Track inbound
m.EVENTS_IN.labels(
event_type=event.event_type.value,
provider=event.provider,
).inc()
# Backpressure check
fill = self.queue_fill_ratio
m.QUEUE_FILL.set(fill)
m.QUEUE_SIZE.set(self._queue.qsize())
if fill >= settings.queue_drop_threshold:
# Under pressure: only accept trades
if event.event_type in _DROPPABLE:
et = event.event_type.value
self._drop_count[et] = self._drop_count.get(et, 0) + 1
m.EVENTS_DROPPED.labels(
reason="backpressure",
event_type=et,
).inc()
if self._drop_count[et] % 1000 == 1:
logger.warning(
"nats.backpressure_drop",
extra={
"type": et,
"fill": f"{fill:.0%}",
"total_drops": self._drop_count,
},
)
if settings.use_jetstream:
await msg.ack()
return
        # Enqueue
        try:
            self._queue.put_nowait(event)
        except asyncio.QueueFull:
            # Queue is completely full: drop the incoming event and count it
            m.EVENTS_DROPPED.labels(
                reason="queue_full", event_type=event.event_type.value
            ).inc()
        # With manual_ack, JetStream redelivers anything left unacked, so
        # ack here for both the enqueued and the queue-full case.
        if settings.use_jetstream:
            await msg.ack()
async def close(self) -> None:
"""Graceful shutdown."""
self._running = False
if self._sub:
try:
await self._sub.unsubscribe()
except Exception:
pass
if self._nc:
try:
await self._nc.flush(timeout=5)
await self._nc.close()
except Exception:
pass
m.NATS_CONNECTED.set(0)
logger.info("nats.closed", extra={"drops": self._drop_count})
# ── NATS callbacks ────────────────────────────────────────────────
async def _on_error(self, e: Exception) -> None:
logger.error("nats.error", extra={"error": str(e)})
async def _on_disconnected(self) -> None:
m.NATS_CONNECTED.set(0)
logger.warning("nats.disconnected")
async def _on_reconnected(self) -> None:
m.NATS_CONNECTED.set(1)
m.NATS_RECONNECTS.inc()
logger.info("nats.reconnected")
async def _on_closed(self) -> None:
m.NATS_CONNECTED.set(0)
logger.info("nats.closed_callback")
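The drop policy inside `_on_message` reduces to a small pure decision function. A standalone sketch, assuming the same default 0.9 threshold as `QUEUE_DROP_THRESHOLD`:

```python
# Event types that may be shed under pressure; trades are protected.
DROPPABLE = {"heartbeat", "quote", "book_l2"}

def should_drop(event_type: str, fill_ratio: float, threshold: float = 0.9) -> bool:
    """Return True if the event should be dropped under backpressure.

    Trades are never dropped by this policy; lower-priority events are
    shed once the queue fill ratio reaches the threshold.
    """
    return fill_ratio >= threshold and event_type in DROPPABLE
```

Keeping the policy in a pure function like this makes it trivial to unit-test without a running NATS connection or queue.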

View File

@@ -0,0 +1,119 @@
"""
Signal bus publisher — publishes features, signals, and alerts to NATS.
Rate-limiting: max N publishes per second per symbol (configurable).
"""
from __future__ import annotations
import logging
import time
from nats.aio.client import Client as NatsClient
from senpai.md_consumer.config import settings
from senpai.md_consumer.models import AlertEvent, FeatureSnapshot, TradeSignal
from senpai.md_consumer import metrics as m
logger = logging.getLogger(__name__)
class Publisher:
"""
Publishes FeatureSnapshots and TradeSignals to NATS.
Built-in per-symbol rate limiter.
"""
def __init__(self, nc: NatsClient) -> None:
self._nc = nc
self._last_publish: dict[str, float] = {} # symbol → monotonic time
self._min_interval = (
1.0 / settings.features_pub_rate_hz
if settings.features_pub_rate_hz > 0
else 0.1
)
def _rate_ok(self, symbol: str) -> bool:
"""Check if we can publish for this symbol (rate limiter)."""
now = time.monotonic()
last = self._last_publish.get(symbol, 0.0)
if now - last >= self._min_interval:
self._last_publish[symbol] = now
return True
return False
async def publish_features(self, snapshot: FeatureSnapshot) -> bool:
"""
Publish feature snapshot if rate limit allows.
Returns True if published, False if rate-limited or error.
"""
if not settings.features_enabled:
return False
symbol = snapshot.symbol.upper()
if not self._rate_ok(symbol):
return False
subject = f"{settings.features_pub_subject}.{symbol}"
try:
payload = snapshot.model_dump_json().encode("utf-8")
await self._nc.publish(subject, payload)
m.FEATURE_PUBLISH.labels(symbol=symbol).inc()
return True
except Exception as e:
m.FEATURE_PUBLISH_ERRORS.labels(symbol=symbol).inc()
logger.warning(
"publisher.feature_error",
extra={"symbol": symbol, "error": str(e)},
)
return False
async def publish_signal(self, signal: TradeSignal) -> bool:
"""Publish trade signal (no rate limit — signals are rare)."""
subject = f"{settings.signals_pub_subject}.{signal.symbol}"
try:
payload = signal.model_dump_json().encode("utf-8")
await self._nc.publish(subject, payload)
m.SIGNALS_EMITTED.labels(
symbol=signal.symbol,
direction=signal.direction,
).inc()
logger.info(
"publisher.signal_emitted",
extra={
"symbol": signal.symbol,
"direction": signal.direction,
"confidence": f"{signal.confidence:.2f}",
"reason": signal.reason,
},
)
return True
except Exception as e:
logger.error(
"publisher.signal_error",
extra={"symbol": signal.symbol, "error": str(e)},
)
return False
async def publish_alert(self, alert: AlertEvent) -> bool:
"""Publish alert event."""
subject = settings.alerts_pub_subject
try:
payload = alert.model_dump_json().encode("utf-8")
await self._nc.publish(subject, payload)
m.ALERTS_EMITTED.labels(alert_type=alert.alert_type).inc()
logger.warning(
"publisher.alert",
extra={
"type": alert.alert_type,
"level": alert.level,
"message": alert.message,
},
)
return True
except Exception as e:
logger.error(
"publisher.alert_error",
extra={"error": str(e)},
)
return False
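The per-symbol limiter in `Publisher._rate_ok` generalises to any key. A minimal standalone sketch (the class name is illustrative):

```python
import time

class PerKeyRateLimiter:
    """Allow at most one event per key within min_interval seconds."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._last: dict[str, float] = {}  # key → monotonic time of last allow

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        last = self._last.get(key)
        if last is None or now - last >= self._min_interval:
            self._last[key] = now  # record the time only on success
            return True
        return False
```

Because the timestamp is updated only when a publish is allowed, a burst of rate-limited attempts cannot push the next allowed publish further into the future.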

View File

@@ -0,0 +1,238 @@
"""
State management — LatestState + RollingWindow.
All structures are asyncio-safe (no locks needed — single-threaded event loop).
"""
from __future__ import annotations
import time
from collections import deque
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from senpai.md_consumer.models import QuoteEvent, TradeEvent
@dataclass
class LatestTrade:
symbol: str
price: float
size: float
side: Optional[str]
provider: str
ts_recv: datetime
ts_exchange: Optional[datetime] = None
@dataclass
class LatestQuote:
symbol: str
bid: float
ask: float
bid_size: float
ask_size: float
provider: str
ts_recv: datetime
ts_exchange: Optional[datetime] = None
@dataclass
class TradeRecord:
"""Compact trade record for rolling window."""
price: float
size: float
ts: float # monotonic seconds
ts_exchange: Optional[datetime] = None
ts_recv: Optional[datetime] = None
@dataclass
class QuoteRecord:
"""Compact quote record for rolling window."""
bid: float
ask: float
bid_size: float
ask_size: float
ts: float # monotonic seconds
ts_exchange: Optional[datetime] = None
ts_recv: Optional[datetime] = None
class RollingWindow:
"""
Fixed-duration rolling window using deque.
Efficient: O(1) append, amortised O(1) eviction.
No pandas dependency.
"""
def __init__(self, window_seconds: float = 60.0) -> None:
self._window = window_seconds
self._trades: deque[TradeRecord] = deque()
self._quotes: deque[QuoteRecord] = deque()
def add_trade(self, trade: TradeRecord) -> None:
self._trades.append(trade)
self._evict_trades()
def add_quote(self, quote: QuoteRecord) -> None:
self._quotes.append(quote)
self._evict_quotes()
def _evict_trades(self) -> None:
cutoff = time.monotonic() - self._window
while self._trades and self._trades[0].ts < cutoff:
self._trades.popleft()
def _evict_quotes(self) -> None:
cutoff = time.monotonic() - self._window
while self._quotes and self._quotes[0].ts < cutoff:
self._quotes.popleft()
@property
def trades(self) -> deque[TradeRecord]:
self._evict_trades()
return self._trades
@property
def quotes(self) -> deque[QuoteRecord]:
self._evict_quotes()
return self._quotes
def trades_since(self, seconds_ago: float) -> list[TradeRecord]:
"""Return trades within the last N seconds."""
cutoff = time.monotonic() - seconds_ago
return [t for t in self._trades if t.ts >= cutoff]
def quotes_since(self, seconds_ago: float) -> list[QuoteRecord]:
"""Return quotes within the last N seconds."""
cutoff = time.monotonic() - seconds_ago
return [q for q in self._quotes if q.ts >= cutoff]
class LatestState:
"""
Maintains latest trade/quote per symbol + rolling windows.
"""
def __init__(self, window_seconds: float = 60.0) -> None:
self._window_seconds = window_seconds
self._latest_trade: dict[str, LatestTrade] = {}
self._latest_quote: dict[str, LatestQuote] = {}
self._windows: dict[str, RollingWindow] = {}
self._event_count = 0
def _get_window(self, symbol: str) -> RollingWindow:
if symbol not in self._windows:
self._windows[symbol] = RollingWindow(self._window_seconds)
return self._windows[symbol]
def update_trade(self, event: TradeEvent) -> None:
"""Update latest trade and rolling window."""
sym = event.symbol.upper()
self._latest_trade[sym] = LatestTrade(
symbol=sym,
price=event.price,
size=event.size,
side=event.side,
provider=event.provider,
ts_recv=event.ts_recv,
ts_exchange=event.ts_exchange,
)
self._get_window(sym).add_trade(
TradeRecord(
price=event.price,
size=event.size,
ts=time.monotonic(),
ts_exchange=event.ts_exchange,
ts_recv=event.ts_recv,
)
)
self._event_count += 1
def update_quote(self, event: QuoteEvent) -> None:
"""Update latest quote and rolling window."""
sym = event.symbol.upper()
self._latest_quote[sym] = LatestQuote(
symbol=sym,
bid=event.bid,
ask=event.ask,
bid_size=event.bid_size,
ask_size=event.ask_size,
provider=event.provider,
ts_recv=event.ts_recv,
ts_exchange=event.ts_exchange,
)
self._get_window(sym).add_quote(
QuoteRecord(
bid=event.bid,
ask=event.ask,
bid_size=event.bid_size,
ask_size=event.ask_size,
ts=time.monotonic(),
ts_exchange=event.ts_exchange,
ts_recv=event.ts_recv,
)
)
self._event_count += 1
def get_latest_trade(self, symbol: str) -> LatestTrade | None:
return self._latest_trade.get(symbol.upper())
def get_latest_quote(self, symbol: str) -> LatestQuote | None:
return self._latest_quote.get(symbol.upper())
def get_window(self, symbol: str) -> RollingWindow | None:
return self._windows.get(symbol.upper())
@property
def symbols(self) -> list[str]:
return sorted(
set(list(self._latest_trade.keys()) + list(self._latest_quote.keys()))
)
@property
def event_count(self) -> int:
return self._event_count
def to_dict(self, symbol: str) -> dict:
"""Serialise latest state for API."""
sym = symbol.upper()
result: dict = {"symbol": sym}
trade = self._latest_trade.get(sym)
if trade:
result["latest_trade"] = {
"price": trade.price,
"size": trade.size,
"side": trade.side,
"provider": trade.provider,
"ts_recv": trade.ts_recv.isoformat() if trade.ts_recv else None,
}
quote = self._latest_quote.get(sym)
if quote:
result["latest_quote"] = {
"bid": quote.bid,
"ask": quote.ask,
"bid_size": quote.bid_size,
"ask_size": quote.ask_size,
"provider": quote.provider,
"ts_recv": quote.ts_recv.isoformat() if quote.ts_recv else None,
}
window = self._windows.get(sym)
if window:
result["window"] = {
"trades_count": len(window.trades),
"quotes_count": len(window.quotes),
}
return result

View File

@@ -0,0 +1,212 @@
"""
Test feature computations — deterministic scenarios.
"""
import pytest
from senpai.md_consumer.features import (
_percentile,
_realized_vol,
_vwap,
check_signal,
compute_features,
)
from senpai.md_consumer.models import QuoteEvent, TradeEvent
from senpai.md_consumer.state import LatestState, TradeRecord
# ── VWAP ───────────────────────────────────────────────────────────────
def test_vwap_basic():
trades = [
TradeRecord(price=100.0, size=10.0, ts=0),
TradeRecord(price=200.0, size=10.0, ts=0),
]
# VWAP = (100*10 + 200*10) / (10+10) = 150
assert _vwap(trades) == 150.0
def test_vwap_weighted():
trades = [
TradeRecord(price=100.0, size=90.0, ts=0),
TradeRecord(price=200.0, size=10.0, ts=0),
]
# VWAP = (100*90 + 200*10) / 100 = 110
assert _vwap(trades) == 110.0
def test_vwap_empty():
assert _vwap([]) is None
def test_vwap_zero_volume():
trades = [TradeRecord(price=100.0, size=0.0, ts=0)]
assert _vwap(trades) is None
# ── Realized volatility ───────────────────────────────────────────────
def test_realized_vol_constant_price():
"""Constant price → 0 volatility."""
trades = [TradeRecord(price=100.0, size=1.0, ts=0) for _ in range(10)]
vol = _realized_vol(trades)
assert vol is not None
assert vol == 0.0
def test_realized_vol_two_prices():
"""Not enough data points → None."""
trades = [
TradeRecord(price=100.0, size=1.0, ts=0),
TradeRecord(price=101.0, size=1.0, ts=0),
]
assert _realized_vol(trades) is None # needs at least 3
def test_realized_vol_positive():
"""Variable prices should give positive volatility."""
trades = [
TradeRecord(price=100.0, size=1.0, ts=0),
TradeRecord(price=102.0, size=1.0, ts=0),
TradeRecord(price=99.0, size=1.0, ts=0),
TradeRecord(price=103.0, size=1.0, ts=0),
]
vol = _realized_vol(trades)
assert vol is not None
assert vol > 0
# ── Percentile ─────────────────────────────────────────────────────────
def test_percentile_basic():
data = [1.0, 2.0, 3.0, 4.0, 5.0]
assert _percentile(data, 50) == 3.0
assert _percentile(data, 0) == 1.0
assert _percentile(data, 100) == 5.0
def test_percentile_p95():
data = list(range(1, 101)) # 1..100
data_float = [float(x) for x in data]
p95 = _percentile(data_float, 95)
assert 95 <= p95 <= 96
# ── Full feature computation ──────────────────────────────────────────
def test_compute_features_with_state():
state = LatestState(window_seconds=60.0)
# Add quote
state.update_quote(QuoteEvent(
provider="binance",
symbol="BTCUSDT",
bid=70000.0,
ask=70002.0,
bid_size=5.0,
ask_size=3.0,
))
# Add some trades
for i in range(5):
state.update_trade(TradeEvent(
provider="binance",
symbol="BTCUSDT",
price=70000.0 + i * 10,
size=1.0,
))
features = compute_features(state, "BTCUSDT")
# Mid
assert features["mid"] == pytest.approx(70001.0)
# Spread
assert features["spread_abs"] == pytest.approx(2.0)
assert features["spread_bps"] is not None
assert features["spread_bps"] > 0
# Trade count
assert features["trade_count_10s"] == 5.0
# Volume
assert features["trade_volume_10s"] == 5.0
# VWAP should be defined
assert features["trade_vwap_10s"] is not None
assert features["trade_vwap_60s"] is not None
def test_compute_features_no_data():
state = LatestState(window_seconds=60.0)
features = compute_features(state, "BTCUSDT")
# All should be None
assert features["mid"] is None
assert features["spread_abs"] is None
assert features["trade_vwap_10s"] is None
# ── Signal detection ──────────────────────────────────────────────────
def test_check_signal_long():
"""Strong positive return + volume + tight spread → long signal."""
features = {
"return_10s": 0.005, # 0.5% (> 0.3% threshold)
"trade_volume_10s": 5.0, # > 1.0 threshold
"spread_bps": 3.0, # < 20 bps threshold
}
signal = check_signal(features, "BTCUSDT")
assert signal is not None
assert signal.direction == "long"
assert signal.confidence > 0
def test_check_signal_short():
"""Strong negative return → short signal."""
features = {
"return_10s": -0.005,
"trade_volume_10s": 5.0,
"spread_bps": 3.0,
}
signal = check_signal(features, "BTCUSDT")
assert signal is not None
assert signal.direction == "short"
def test_check_signal_no_trigger():
"""Small return → no signal."""
features = {
"return_10s": 0.0001,
"trade_volume_10s": 5.0,
"spread_bps": 3.0,
}
signal = check_signal(features, "BTCUSDT")
assert signal is None
def test_check_signal_wide_spread():
"""Wide spread → no signal (even with strong return)."""
features = {
"return_10s": 0.01,
"trade_volume_10s": 5.0,
"spread_bps": 50.0, # > 20 bps
}
signal = check_signal(features, "BTCUSDT")
assert signal is None
def test_check_signal_low_volume():
"""Low volume → no signal."""
features = {
"return_10s": 0.01,
"trade_volume_10s": 0.1, # < 1.0
"spread_bps": 3.0,
}
signal = check_signal(features, "BTCUSDT")
assert signal is None
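Minimal standalone implementations consistent with the expectations encoded in the tests above; treat them as sketches. In particular, the assumption that `_realized_vol` is the population standard deviation of consecutive log returns is inferred from the tests, not confirmed by the source:

```python
import math
from typing import Optional

def vwap(trades: list[tuple[float, float]]) -> Optional[float]:
    """Volume-weighted average price over (price, size) pairs."""
    volume = sum(size for _, size in trades)
    if volume <= 0:
        return None  # empty window or zero volume → undefined
    return sum(price * size for price, size in trades) / volume

def percentile(data: list[float], q: float) -> float:
    """Linear-interpolation percentile over the sorted data (q in 0..100)."""
    s = sorted(data)
    idx = (q / 100.0) * (len(s) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (idx - lo)

def realized_vol(prices: list[float]) -> Optional[float]:
    """Population std-dev of consecutive log returns; None if < 3 prices."""
    if len(prices) < 3:
        return None
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(rets) / len(rets)
    return math.sqrt(sum((r - mean) ** 2 for r in rets) / len(rets))
```

These reproduce the documented arithmetic in the tests, e.g. VWAP of (100×10 + 200×10) / 20 = 150 and a zero volatility for a constant price series.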

View File

@@ -0,0 +1,154 @@
"""
Test event parsing from JSON payloads (mirrors market-data-service contracts).
"""
import json
from senpai.md_consumer.models import (
EventType,
TradeEvent,
QuoteEvent,
HeartbeatEvent,
parse_event,
)
# ── Trade events ───────────────────────────────────────────────────────
def test_parse_trade_basic():
data = {
"event_type": "trade",
"provider": "binance",
"symbol": "BTCUSDT",
"price": 70500.0,
"size": 1.5,
"ts_recv": "2026-02-09T12:00:00+00:00",
}
event = parse_event(data)
assert event is not None
assert isinstance(event, TradeEvent)
assert event.event_type == EventType.TRADE
assert event.symbol == "BTCUSDT"
assert event.price == 70500.0
assert event.size == 1.5
assert event.provider == "binance"
def test_parse_trade_with_extra_fields():
"""Unknown fields should be silently ignored (tolerant parsing)."""
data = {
"event_type": "trade",
"provider": "bybit",
"symbol": "ETHUSDT",
"price": 2100.0,
"size": 10.0,
"ts_recv": "2026-02-09T12:00:00+00:00",
"unknown_field": "should_be_ignored",
"another_extra": 42,
}
event = parse_event(data)
assert event is not None
assert event.symbol == "ETHUSDT"
def test_parse_trade_with_side_and_exchange_ts():
data = {
"event_type": "trade",
"provider": "binance",
"symbol": "BTCUSDT",
"price": 70000.0,
"size": 0.5,
"side": "buy",
"ts_exchange": "2026-02-09T12:00:00+00:00",
"ts_recv": "2026-02-09T12:00:00.100+00:00",
"trade_id": "t12345",
}
event = parse_event(data)
assert event.side == "buy"
assert event.trade_id == "t12345"
assert event.ts_exchange is not None
# ── Quote events ───────────────────────────────────────────────────────
def test_parse_quote_basic():
data = {
"event_type": "quote",
"provider": "binance",
"symbol": "BTCUSDT",
"bid": 70000.0,
"ask": 70001.0,
"bid_size": 5.0,
"ask_size": 3.0,
"ts_recv": "2026-02-09T12:00:00+00:00",
}
event = parse_event(data)
assert isinstance(event, QuoteEvent)
assert event.bid == 70000.0
assert event.ask == 70001.0
def test_parse_quote_zero_values():
data = {
"event_type": "quote",
"provider": "binance",
"symbol": "BTCUSDT",
"bid": 0.0,
"ask": 0.0,
"bid_size": 0.0,
"ask_size": 0.0,
}
event = parse_event(data)
assert event is not None
assert event.bid == 0.0
# ── Heartbeat events ──────────────────────────────────────────────────
def test_parse_heartbeat():
data = {
"event_type": "heartbeat",
"provider": "alpaca",
"ts_recv": "2026-02-09T12:00:00+00:00",
}
event = parse_event(data)
assert isinstance(event, HeartbeatEvent)
assert event.provider == "alpaca"
# ── Edge cases ─────────────────────────────────────────────────────────
def test_parse_unknown_type():
data = {"event_type": "unknown_type", "provider": "test"}
event = parse_event(data)
assert event is None
def test_parse_missing_type():
data = {"provider": "test", "symbol": "BTC"}
event = parse_event(data)
assert event is None
def test_parse_invalid_data():
data = {"event_type": "trade"} # missing required fields
event = parse_event(data)
assert event is None
def test_parse_empty_dict():
event = parse_event({})
assert event is None
def test_parse_from_json_bytes():
"""Simulate actual NATS message deserialization."""
raw = b'{"event_type":"trade","provider":"binance","symbol":"BTCUSDT","price":70500.0,"size":1.5}'
data = json.loads(raw)
event = parse_event(data)
assert event is not None
assert event.price == 70500.0

View File

@@ -0,0 +1,111 @@
"""
Test publisher rate limiting.
"""
from unittest.mock import AsyncMock
import pytest
from senpai.md_consumer.publisher import Publisher
from senpai.md_consumer.models import FeatureSnapshot, TradeSignal
@pytest.fixture
def mock_nc():
"""Mock NATS client."""
nc = AsyncMock()
nc.publish = AsyncMock()
return nc
@pytest.fixture
def publisher(mock_nc):
return Publisher(mock_nc)
@pytest.mark.asyncio
async def test_publish_features_respects_rate_limit(mock_nc, publisher):
"""Second publish for same symbol within rate window should be skipped."""
snapshot = FeatureSnapshot(
symbol="BTCUSDT",
features={"mid": 70000.0},
)
# First publish should succeed
result1 = await publisher.publish_features(snapshot)
assert result1 is True
# Immediate second publish should be rate-limited
result2 = await publisher.publish_features(snapshot)
assert result2 is False # rate-limited
# Only one actual NATS publish
assert mock_nc.publish.call_count == 1
@pytest.mark.asyncio
async def test_publish_features_different_symbols(mock_nc, publisher):
"""Different symbols have independent rate limiters."""
snap1 = FeatureSnapshot(symbol="BTCUSDT", features={"mid": 70000.0})
snap2 = FeatureSnapshot(symbol="ETHUSDT", features={"mid": 2000.0})
r1 = await publisher.publish_features(snap1)
r2 = await publisher.publish_features(snap2)
assert r1 is True
assert r2 is True
assert mock_nc.publish.call_count == 2
@pytest.mark.asyncio
async def test_publish_signal_no_rate_limit(mock_nc, publisher):
"""Signals are NOT rate limited."""
signal = TradeSignal(
symbol="BTCUSDT",
direction="long",
confidence=0.8,
reason="test",
)
r1 = await publisher.publish_signal(signal)
r2 = await publisher.publish_signal(signal)
assert r1 is True
assert r2 is True
assert mock_nc.publish.call_count == 2
@pytest.mark.asyncio
async def test_publish_features_after_rate_window(mock_nc, publisher):
"""After rate window passes, publish should succeed again."""
# Override min interval to something very small for testing
publisher._min_interval = 0.01 # 10ms
snapshot = FeatureSnapshot(
symbol="BTCUSDT",
features={"mid": 70000.0},
)
r1 = await publisher.publish_features(snapshot)
assert r1 is True
# Wait for rate window to pass
import asyncio
await asyncio.sleep(0.02)
r2 = await publisher.publish_features(snapshot)
assert r2 is True
assert mock_nc.publish.call_count == 2
@pytest.mark.asyncio
async def test_publish_handles_nats_error(mock_nc, publisher):
"""NATS publish error should not raise, just return False."""
mock_nc.publish.side_effect = Exception("NATS down")
snapshot = FeatureSnapshot(
symbol="BTCUSDT",
features={"mid": 70000.0},
)
result = await publisher.publish_features(snapshot)
assert result is False

View File

@@ -0,0 +1,138 @@
"""
Test state management — LatestState and RollingWindow.
"""
import time
from senpai.md_consumer.state import (
LatestState,
RollingWindow,
TradeRecord,
)
from senpai.md_consumer.models import TradeEvent, QuoteEvent
# ── RollingWindow ──────────────────────────────────────────────────────
def test_rolling_window_add_trade():
w = RollingWindow(window_seconds=60.0)
t = TradeRecord(price=100.0, size=1.0, ts=time.monotonic())
w.add_trade(t)
assert len(w.trades) == 1
assert w.trades[0].price == 100.0
def test_rolling_window_eviction():
"""Old records should be evicted."""
w = RollingWindow(window_seconds=1.0) # 1 second window
old_ts = time.monotonic() - 2.0 # 2 seconds ago
w.add_trade(TradeRecord(price=100.0, size=1.0, ts=old_ts))
w.add_trade(TradeRecord(price=200.0, size=2.0, ts=time.monotonic()))
# Old record should be evicted
trades = list(w.trades)
assert len(trades) == 1
assert trades[0].price == 200.0
def test_rolling_window_trades_since():
w = RollingWindow(window_seconds=60.0)
now = time.monotonic()
# Add trades at different times
w.add_trade(TradeRecord(price=100.0, size=1.0, ts=now - 30)) # 30s ago
w.add_trade(TradeRecord(price=200.0, size=2.0, ts=now - 5)) # 5s ago
w.add_trade(TradeRecord(price=300.0, size=3.0, ts=now)) # now
last_10s = w.trades_since(10.0)
assert len(last_10s) == 2 # 5s ago + now
assert last_10s[0].price == 200.0
def test_rolling_window_empty():
w = RollingWindow(window_seconds=60.0)
assert len(w.trades) == 0
assert len(w.quotes) == 0
assert w.trades_since(10.0) == []
# ── LatestState ────────────────────────────────────────────────────────
def test_latest_state_update_trade():
state = LatestState(window_seconds=60.0)
event = TradeEvent(
provider="binance",
symbol="BTCUSDT",
price=70500.0,
size=1.5,
side="buy",
)
state.update_trade(event)
latest = state.get_latest_trade("BTCUSDT")
assert latest is not None
assert latest.price == 70500.0
assert latest.side == "buy"
assert state.event_count == 1
def test_latest_state_update_quote():
state = LatestState(window_seconds=60.0)
event = QuoteEvent(
provider="binance",
symbol="BTCUSDT",
bid=70000.0,
ask=70001.0,
bid_size=5.0,
ask_size=3.0,
)
state.update_quote(event)
latest = state.get_latest_quote("BTCUSDT")
assert latest is not None
assert latest.bid == 70000.0
assert latest.ask == 70001.0
def test_latest_state_symbols():
state = LatestState(window_seconds=60.0)
state.update_trade(TradeEvent(
provider="binance", symbol="BTCUSDT", price=100.0, size=1.0
))
state.update_quote(QuoteEvent(
provider="binance", symbol="ETHUSDT",
bid=2000.0, ask=2001.0, bid_size=1.0, ask_size=1.0,
))
assert "BTCUSDT" in state.symbols
assert "ETHUSDT" in state.symbols
def test_latest_state_to_dict():
state = LatestState(window_seconds=60.0)
state.update_trade(TradeEvent(
provider="binance", symbol="BTCUSDT", price=70500.0, size=1.0
))
state.update_quote(QuoteEvent(
provider="binance", symbol="BTCUSDT",
bid=70000.0, ask=70001.0, bid_size=1.0, ask_size=1.0,
))
d = state.to_dict("BTCUSDT")
assert d["symbol"] == "BTCUSDT"
assert "latest_trade" in d
assert "latest_quote" in d
assert d["latest_trade"]["price"] == 70500.0
def test_latest_state_missing_symbol():
state = LatestState(window_seconds=60.0)
assert state.get_latest_trade("NOPE") is None
assert state.get_latest_quote("NOPE") is None