feat: MD pipeline — market-data-service hardening + SenpAI NATS consumer

Producer (market-data-service):
- Backpressure: smart drop policy (drop heartbeats first, then quotes; trades preserved)
- Heartbeat monitor: synthetic HeartbeatEvent on provider silence
- Graceful shutdown: WS→bus→storage→DB engine cleanup sequence
- Bybit V5 public WS provider (backup for Binance, no API key needed)
- FailoverManager: health-based provider switching with recovery
- NATS output adapter: md.events.{type}.{symbol} for SenpAI
- /bus-stats endpoint for backpressure monitoring
- Dockerfile + docker-compose.node1.yml integration
- 36 tests (parsing + bus + failover), requirements.lock

Consumer (senpai-md-consumer):
- NATSConsumer: subscribe md.events.>, queue group senpai-md, backpressure
- State store: LatestState + RollingWindow (deque, 60s)
- Feature engine: 11 features (mid, spread, VWAP, return, vol, latency)
- Rule-based signals: long/short on return+volume+spread conditions
- Publisher: rate-limited features + signals + alerts to NATS
- HTTP API: /health, /metrics, /state/latest, /features/latest, /stats
- 10 Prometheus metrics
- Dockerfile + docker-compose.senpai.yml
- 41 tests (parsing + state + features + rate-limit), requirements.lock

CI: ruff + pytest + smoke import for both services
Tests: 77 total passed, lint clean
Co-authored-by: Cursor <cursoragent@cursor.com>
Apple
2026-02-09 11:46:15 -08:00
parent c50843933f
commit 09dee24342
47 changed files with 3930 additions and 56 deletions

@@ -42,12 +42,37 @@ First, get free paper-trading API keys:
Without keys, Alpaca runs in **dry-run mode** (heartbeats only).
### 5. Run both providers
### 5. Run (Bybit — backup crypto, no keys needed)
```bash
python -m app run --provider all --symbols BTCUSDT,AAPL
python -m app run --provider bybit --symbols BTCUSDT,ETHUSDT
```
### 6. Run all providers
```bash
python -m app run --provider all --symbols BTCUSDT,ETHUSDT,AAPL,TSLA
```
## Docker
### Build & run standalone
```bash
docker build -t market-data-service .
docker run --rm -v mds-data:/data market-data-service run --provider binance --symbols BTCUSDT,ETHUSDT
```
### As part of NODE1 stack
The service is included in `docker-compose.node1.yml`:
```bash
docker-compose -f docker-compose.node1.yml up -d market-data-service
```
Default config: Binance+Bybit on BTCUSDT,ETHUSDT with NATS output enabled.
## HTTP Endpoints
Once running, the service exposes:
@@ -57,9 +82,36 @@ Once running, the service exposes:
| `GET /health` | Service health check |
| `GET /metrics` | Prometheus metrics |
| `GET /latest?symbol=BTCUSDT` | Latest trade + quote from SQLite |
| `GET /bus-stats` | Queue size, fill percent, backpressure status |
Default port: `8891` (configurable via `HTTP_PORT`).
## SenpAI Integration (NATS)
Enable NATS output to push events directly to SenpAI:
```env
NATS_URL=nats://localhost:4222
NATS_ENABLED=true
NATS_SUBJECT_PREFIX=md.events
```
Subject schema:
- `md.events.trade.BTCUSDT` — trade events
- `md.events.quote.AAPL` — quote events
- `md.events.heartbeat.__system__` — heartbeats
- `md.events.>` — subscribe to all events
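
To make the subject schema concrete, here is a minimal sketch of how a subject could be built and how NATS-style wildcards select it. The helper names `build_subject` and `subject_matches` are hypothetical and not part of the service; the matcher only illustrates the wildcard rules (`*` for one token, `>` for one or more trailing tokens) and is not how the NATS server is implemented.

```python
def build_subject(event_type: str, symbol: str, prefix: str = "md.events") -> str:
    """Build a subject of the form <prefix>.<type>.<symbol> (hypothetical helper)."""
    return f"{prefix}.{event_type}.{symbol}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Check a subject against a NATS-style pattern.

    `*` matches exactly one token; `>` matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # `>` is only valid as the last token and needs at least one token left
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```

With these helpers, `subject_matches("md.events.trade.*", build_subject("trade", "BTCUSDT"))` is true, while the same pattern rejects `md.events.quote.AAPL`.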
## Backpressure & Reliability
- **Backpressure**: Smart drop policy when queue fills up
- 80%+ → drop heartbeat events
- 90%+ → drop quotes (trades are preserved)
- 100% → drop oldest event
- **Heartbeat monitor**: Emits synthetic heartbeat if provider goes silent
- **Auto-reconnect**: Exponential backoff with resubscribe
- **Failover**: Bybit as backup for Binance with health-based switching
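
The drop thresholds above can be sketched as a small bounded buffer. This is an illustration only: `BoundedBuffer`, `Event`, and `offer` are hypothetical names, and checking the fill level before enqueueing is an assumption about ordering, not a description of the service's EventBus.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # "trade", "quote", or "heartbeat"
    symbol: str


class BoundedBuffer:
    """Toy queue applying the 80% / 90% / 100% drop policy (hypothetical)."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self.items = deque()
        self.dropped = 0

    def fill_ratio(self) -> float:
        return len(self.items) / self.maxsize

    def offer(self, event: Event) -> bool:
        """Try to enqueue; return False if the incoming event was dropped."""
        fill = self.fill_ratio()
        if fill >= 0.8 and event.kind == "heartbeat":
            self.dropped += 1  # 80%+: heartbeats are dropped first
            return False
        if fill >= 0.9 and event.kind == "quote":
            self.dropped += 1  # 90%+: quotes are dropped, trades still pass
            return False
        if len(self.items) >= self.maxsize:
            self.items.popleft()  # 100%: evict the oldest event to make room
            self.dropped += 1
        self.items.append(event)
        return True
```

In this sketch a trade is never rejected on arrival; at 100% fill it displaces the oldest queued event instead.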
## View Data
### SQLite
@@ -81,20 +133,25 @@ Key metrics:
- `market_events_total` — events by provider/type/symbol
- `market_exchange_latency_ms` — exchange-to-receive latency
- `market_events_per_second` — throughput gauge
- `market_gaps_total` — detected gaps per provider
## Architecture
```
Provider (Binance/Alpaca)
Provider (Binance/Bybit/Alpaca)
│ raw WebSocket messages
Adapter (_parse → domain Event)
│ TradeEvent / QuoteEvent / BookL2Event
EventBus (asyncio.Queue fan-out)
EventBus (asyncio.Queue fan-out + backpressure + heartbeat)
├─▶ StorageConsumer → SQLite + JSONL
├─▶ MetricsConsumer → Prometheus counters/histograms
└─▶ PrintConsumer → structured log (sampled 1/100)
├─▶ PrintConsumer → structured log (sampled 1/N)
└─▶ NatsConsumer → NATS PubSub (for SenpAI)
FailoverManager
monitors provider health → switches source on degradation
```
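
As a rough illustration of the health-based switching shown at the bottom of the diagram, the sketch below treats a provider as degraded when it has been silent for too long and switches back once the primary has stayed healthy for a recovery period. The class `FailoverSketch`, its method names, and the `stale_after` / `recover_after` thresholds are all assumptions made for this example; the actual FailoverManager may use different health signals.

```python
class FailoverSketch:
    """Toy primary/backup selector driven by event timestamps (hypothetical)."""

    def __init__(self, primary: str, backup: str,
                 stale_after: float = 10.0, recover_after: float = 30.0) -> None:
        self.primary = primary
        self.backup = backup
        self.stale_after = stale_after      # seconds of silence before switching
        self.recover_after = recover_after  # seconds of health before switching back
        self.active = primary
        self.last_event = {primary: 0.0, backup: 0.0}
        self._healthy_since = None

    def on_event(self, provider: str, now: float) -> None:
        """Record that a provider delivered an event at time `now`."""
        self.last_event[provider] = now

    def evaluate(self, now: float) -> str:
        """Re-check primary health and return the provider to use."""
        primary_stale = now - self.last_event[self.primary] > self.stale_after
        if self.active == self.primary:
            if primary_stale:
                self.active = self.backup
                self._healthy_since = None
        elif primary_stale:
            self._healthy_since = None  # recovery window restarts
        else:
            if self._healthy_since is None:
                self._healthy_since = now
            if now - self._healthy_since >= self.recover_after:
                self.active = self.primary
        return self.active
```

Passing the clock in as `now` keeps the logic deterministic and easy to unit-test; a real monitor would typically call `evaluate(time.monotonic())` on a timer.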
## Adding a New Provider
@@ -109,35 +166,23 @@ from app.domain.events import Event, TradeEvent
class YourProvider(MarketDataProvider):
name = "your_provider"
async def connect(self) -> None:
# Establish connection
...
async def subscribe(self, symbols: list[str]) -> None:
# Subscribe to streams
...
async def connect(self) -> None: ...
async def subscribe(self, symbols: list[str]) -> None: ...
async def stream(self) -> AsyncIterator[Event]:
# Yield normalized events, handle reconnect
while True:
raw = await self._receive()
yield self._parse(raw)
async def close(self) -> None:
...
async def close(self) -> None: ...
```
3. Register in `app/providers/__init__.py`:
```python
from app.providers.your_provider import YourProvider
registry = {
...
"your_provider": YourProvider,
}
registry["your_provider"] = YourProvider
```
4. Run: `python -m app run --provider your_provider --symbols ...`
4. Add config to `app/config.py` if needed
5. Run: `python -m app run --provider your_provider --symbols ...`
## Tests
@@ -145,9 +190,55 @@ registry = {
pytest tests/ -v
```
36 tests covering:
- Binance message parsing (7 tests)
- Alpaca message parsing (8 tests)
- Bybit message parsing (9 tests)
- Event bus: fanout, backpressure, heartbeat (7 tests)
- Failover manager (5 tests)
## CI
Included in `.github/workflows/python-services-ci.yml`:
- `ruff check` — lint
- `pytest` — unit tests
- `compileall` — syntax check
## Troubleshooting
### Port 8891 already in use
```bash
lsof -ti:8891 | xargs kill -9
```
### NATS connection refused
If `NATS_ENABLED=true` but NATS is not running, the service starts normally — NATS output is skipped with a warning log. To run without NATS:
```env
NATS_ENABLED=false
```
### SQLite "database is locked"
Expected under heavy load: SQLite allows only one writer at a time, and the service holds that slot with a single async writer. If the error appears in an external tool such as the `sqlite3` CLI, wait for the service to stop or query the `/latest` HTTP endpoint instead.
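
If you need to inspect the database from a script while the service is running, one option is to open it read-only with a busy timeout so the reader waits briefly instead of failing at once. This is a sketch using the standard-library `sqlite3` module; the helper name `open_readonly` is hypothetical, and whether it avoids every lock error depends on the journal mode the service uses, which is not stated here.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str, timeout_s: float = 5.0) -> sqlite3.Connection:
    """Open an existing SQLite file read-only, waiting up to timeout_s on locks."""
    # mode=ro makes SQLite refuse writes on this connection
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True, timeout=timeout_s)
```

Call it as `open_readonly("path/to/market.db")`, where the path is a placeholder for wherever your deployment stores the SQLite file.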
### Binance WebSocket disconnects
Auto-reconnect is built in with exponential backoff (1s → 60s max). Check logs for `binance.reconnecting`. If persistent, verify DNS/firewall access to `stream.binance.com:9443`.
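
For reference, a 1s → 60s exponential backoff schedule can be sketched as a generator. The doubling factor and the function name `backoff_delays` are assumptions; the text above only states the 1s start and the 60s cap, and the provider may add jitter or use a different multiplier.

```python
from typing import Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays that grow by `factor` and never exceed `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

A reconnect loop would sleep for the next value after each failed attempt and create a fresh generator once a connection succeeds, so the delay resets to the base.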
### Bybit "subscribe_failed"
Verify symbol names match Bybit spot conventions (e.g. `BTCUSDT`, not `BTC-USDT`). Check `bybit.subscribe_failed` in logs.
### No data for Alpaca symbols
Without API keys, Alpaca runs in **dry-run mode** (heartbeats only). Set `ALPACA_KEY`, `ALPACA_SECRET` and `ALPACA_DRY_RUN=false` in `.env`.
### JetStream not available
If `USE_JETSTREAM=true` but NATS was started without `--js`, you'll see a connection error. Start NATS with JetStream:
```bash
docker run -d -p 4222:4222 nats:2.10-alpine --js
```
## TODO: Future Providers
- [ ] CoinAPI (REST + WebSocket, paid tier)
- [ ] IQFeed (US equities, DTN subscription)
- [ ] Polygon.io (real-time + historical)
- [ ] Interactive Brokers TWS API
- [ ] Coinbase WebSocket (backup crypto #2)