feat: MD pipeline — market-data-service hardening + SenpAI NATS consumer
Producer (market-data-service):
- Backpressure: smart drop policy (heartbeats→quotes→trades preserved)
- Heartbeat monitor: synthetic HeartbeatEvent on provider silence
- Graceful shutdown: WS→bus→storage→DB engine cleanup sequence
- Bybit V5 public WS provider (backup for Binance, no API key needed)
- FailoverManager: health-based provider switching with recovery
- NATS output adapter: md.events.{type}.{symbol} for SenpAI
- /bus-stats endpoint for backpressure monitoring
- Dockerfile + docker-compose.node1.yml integration
- 36 tests (parsing + bus + failover), requirements.lock
Consumer (senpai-md-consumer):
- NATSConsumer: subscribe md.events.>, queue group senpai-md, backpressure
- State store: LatestState + RollingWindow (deque, 60s)
- Feature engine: 11 features (mid, spread, VWAP, return, vol, latency)
- Rule-based signals: long/short on return+volume+spread conditions
- Publisher: rate-limited features + signals + alerts to NATS
- HTTP API: /health, /metrics, /state/latest, /features/latest, /stats
- 10 Prometheus metrics
- Dockerfile + docker-compose.senpai.yml
- 41 tests (parsing + state + features + rate-limit), requirements.lock
CI: ruff + pytest + smoke import for both services
Tests: 77 total passed, lint clean
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -42,12 +42,37 @@ First, get free paper-trading API keys:

Without keys, Alpaca runs in **dry-run mode** (heartbeats only).

### 5. Run both providers
### 5. Run (Bybit — backup crypto, no keys needed)

```bash
python -m app run --provider all --symbols BTCUSDT,AAPL
python -m app run --provider bybit --symbols BTCUSDT,ETHUSDT
```

### 6. Run all providers

```bash
python -m app run --provider all --symbols BTCUSDT,ETHUSDT,AAPL,TSLA
```

## Docker

### Build & run standalone

```bash
docker build -t market-data-service .
docker run --rm -v mds-data:/data market-data-service run --provider binance --symbols BTCUSDT,ETHUSDT
```

### As part of NODE1 stack

The service is included in `docker-compose.node1.yml`:

```bash
docker-compose -f docker-compose.node1.yml up -d market-data-service
```

Default config: Binance+Bybit on BTCUSDT,ETHUSDT with NATS output enabled.

## HTTP Endpoints

Once running, the service exposes:
@@ -57,9 +82,36 @@ Once running, the service exposes:
| `GET /health` | Service health check |
| `GET /metrics` | Prometheus metrics |
| `GET /latest?symbol=BTCUSDT` | Latest trade + quote from SQLite |
| `GET /bus-stats` | Queue size, fill percent, backpressure status |

Default port: `8891` (configurable via `HTTP_PORT`).

## SenpAI Integration (NATS)

Enable NATS output to push events directly to SenpAI:

```env
NATS_URL=nats://localhost:4222
NATS_ENABLED=true
NATS_SUBJECT_PREFIX=md.events
```

Subject schema:
- `md.events.trade.BTCUSDT` — trade events
- `md.events.quote.AAPL` — quote events
- `md.events.heartbeat.__system__` — heartbeats
- `md.events.>` — subscribe to all events
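
The subject schema above can be sketched in a few lines of Python. This is an illustration only: `build_subject` and `subject_matches` are hypothetical helper names, not functions from the service, and the token validation is an assumption about what a producer might want to enforce. The matcher mirrors the standard NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens).

```python
def build_subject(event_type: str, symbol: str, prefix: str = "md.events") -> str:
    """Return '<prefix>.<event_type>.<symbol>' following the schema listed above."""
    for token in (event_type, symbol):
        # Assumption: reject tokens that would break NATS subject parsing.
        if not token or any(ch in token for ch in ".*> \t"):
            raise ValueError(f"invalid subject token: {token!r}")
    return f"{prefix}.{event_type}.{symbol}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Minimal NATS-style matcher: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)
```

For example, `subject_matches("md.events.>", build_subject("trade", "BTCUSDT"))` is true, while `subject_matches("md.events.trade.*", "md.events.quote.AAPL")` is false, so a consumer that only wants trades can subscribe to the narrower pattern.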

## Backpressure & Reliability

- **Backpressure**: Smart drop policy when queue fills up
  - 80%+ → drop heartbeat events
  - 90%+ → drop quotes (trades are preserved)
  - 100% → drop oldest event
- **Heartbeat monitor**: Emits synthetic heartbeat if provider goes silent
- **Auto-reconnect**: Exponential backoff with resubscribe
- **Failover**: Bybit as backup for Binance with health-based switching
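
The drop thresholds listed above can be illustrated with a small, self-contained sketch. It is not the service's EventBus: `DropPolicyQueue`, the `offer` method, and the event-kind strings are hypothetical names chosen for this example, and the real implementation may evaluate the thresholds differently.

```python
from collections import deque


class DropPolicyQueue:
    """Hypothetical bounded queue applying the 80% / 90% / 100% drop rules."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self.items: deque = deque()
        self.dropped = 0

    def offer(self, kind: str, payload: object) -> bool:
        """Try to enqueue an event; return False if the policy dropped it."""
        fill_pct = 100 * len(self.items) // self.maxsize
        if fill_pct >= 80 and kind == "heartbeat":
            self.dropped += 1          # 80%+: heartbeats go first
            return False
        if fill_pct >= 90 and kind == "quote":
            self.dropped += 1          # 90%+: quotes go next, trades survive
            return False
        if len(self.items) >= self.maxsize:
            self.items.popleft()       # 100%: evict the oldest event
            self.dropped += 1
        self.items.append((kind, payload))
        return True
```

With `maxsize=10` and eight trades queued, a heartbeat is rejected while a quote is still accepted. Once nine items are queued, quotes are rejected too, and at ten items a new trade evicts the oldest entry.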

## View Data

### SQLite
@@ -81,20 +133,25 @@ Key metrics:
- `market_events_total` — events by provider/type/symbol
- `market_exchange_latency_ms` — exchange-to-receive latency
- `market_events_per_second` — throughput gauge
- `market_gaps_total` — detected gaps per provider

## Architecture

```
Provider (Binance/Alpaca)
Provider (Binance/Bybit/Alpaca)
    │ raw WebSocket messages
    ▼
Adapter (_parse → domain Event)
    │ TradeEvent / QuoteEvent / BookL2Event
    ▼
EventBus (asyncio.Queue fan-out)
EventBus (asyncio.Queue fan-out + backpressure + heartbeat)
    ├─▶ StorageConsumer → SQLite + JSONL
    ├─▶ MetricsConsumer → Prometheus counters/histograms
    └─▶ PrintConsumer → structured log (sampled 1/100)
    ├─▶ PrintConsumer → structured log (sampled 1/N)
    └─▶ NatsConsumer → NATS PubSub (for SenpAI)

FailoverManager
    monitors provider health → switches source on degradation
```
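
The fan-out step in the diagram can be sketched with plain `asyncio.Queue` objects. This is a minimal illustration under one assumption: each consumer gets its own queue and every published event is copied to all of them. `FanOutBus`, `subscribe`, and `publish` are hypothetical names, and the actual EventBus may be structured differently.

```python
import asyncio


class FanOutBus:
    """Hypothetical fan-out bus: one asyncio.Queue per subscribed consumer."""

    def __init__(self) -> None:
        self._queues: list[asyncio.Queue] = []

    def subscribe(self, maxsize: int = 1000) -> asyncio.Queue:
        """Register a consumer and return the queue it should read from."""
        queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self._queues.append(queue)
        return queue

    async def publish(self, event: dict) -> None:
        """Deliver the same event object to every consumer queue."""
        for queue in self._queues:
            await queue.put(event)


async def demo() -> list:
    bus = FanOutBus()
    storage_q = bus.subscribe()   # stands in for StorageConsumer
    metrics_q = bus.subscribe()   # stands in for MetricsConsumer
    await bus.publish({"type": "trade", "symbol": "BTCUSDT"})
    return [storage_q.get_nowait(), metrics_q.get_nowait()]
```

Running `asyncio.run(demo())` returns the published event twice, once per consumer. Note that `await queue.put(...)` blocks when a consumer queue is full, which is the situation a drop policy is meant to avoid.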

## Adding a New Provider
@@ -109,35 +166,23 @@ from app.domain.events import Event, TradeEvent
class YourProvider(MarketDataProvider):
    name = "your_provider"

    async def connect(self) -> None:
        # Establish connection
        ...

    async def subscribe(self, symbols: list[str]) -> None:
        # Subscribe to streams
        ...

    async def connect(self) -> None: ...
    async def subscribe(self, symbols: list[str]) -> None: ...
    async def stream(self) -> AsyncIterator[Event]:
        # Yield normalized events, handle reconnect
        while True:
            raw = await self._receive()
            yield self._parse(raw)

    async def close(self) -> None:
        ...
    async def close(self) -> None: ...
```

3. Register in `app/providers/__init__.py`:
```python
from app.providers.your_provider import YourProvider

registry = {
    ...
    "your_provider": YourProvider,
}
registry["your_provider"] = YourProvider
```

4. Run: `python -m app run --provider your_provider --symbols ...`
4. Add config to `app/config.py` if needed
5. Run: `python -m app run --provider your_provider --symbols ...`

## Tests

@@ -145,9 +190,55 @@ registry = {
pytest tests/ -v
```

36 tests covering:
- Binance message parsing (7 tests)
- Alpaca message parsing (8 tests)
- Bybit message parsing (9 tests)
- Event bus: fanout, backpressure, heartbeat (7 tests)
- Failover manager (5 tests)

## CI

Included in `.github/workflows/python-services-ci.yml`:
- `ruff check` — lint
- `pytest` — unit tests
- `compileall` — syntax check

## Troubleshooting

### Port 8891 already in use
```bash
lsof -ti:8891 | xargs kill -9
```

### NATS connection refused
If `NATS_ENABLED=true` but NATS is not running, the service starts normally — NATS output is skipped with a warning log. To run without NATS:
```env
NATS_ENABLED=false
```

### SQLite "database is locked"
Normal under heavy load — SQLite does not support concurrent writers. The service uses a single async writer. If you see this in external tools (`sqlite3` CLI), wait for the service to stop or use the `/latest` HTTP endpoint instead.

### Binance WebSocket disconnects
Auto-reconnect is built in with exponential backoff (1s → 60s max). Check logs for `binance.reconnecting`. If persistent, verify DNS/firewall access to `stream.binance.com:9443`.
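
To make the backoff schedule concrete, here is a minimal sketch of a delay generator that starts at 1s and is capped at 60s. The doubling factor and the absence of jitter are assumptions made for the example, and `backoff_delays` is a hypothetical name, not the provider's actual reconnect code.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0, factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays in seconds: base, base*factor, ... capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First eight delays under the assumed doubling factor.
schedule = list(islice(backoff_delays(), 8))
```

Under these assumptions the schedule is 1, 2, 4, 8, 16, 32, 60, 60 seconds, so a persistent outage settles into one reconnect attempt per minute.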

### Bybit "subscribe_failed"
Verify symbol names match Bybit spot conventions (e.g. `BTCUSDT`, not `BTC-USDT`). Check `bybit.subscribe_failed` in logs.

### No data for Alpaca symbols
Without API keys, Alpaca runs in **dry-run mode** (heartbeats only). Set `ALPACA_KEY`, `ALPACA_SECRET` and `ALPACA_DRY_RUN=false` in `.env`.

### JetStream not available
If `USE_JETSTREAM=true` but NATS was started without `--js`, you'll see a connection error. Start NATS with JetStream:
```bash
docker run -d -p 4222:4222 nats:2.10-alpine --js
```

## TODO: Future Providers

- [ ] CoinAPI (REST + WebSocket, paid tier)
- [ ] IQFeed (US equities, DTN subscription)
- [ ] Polygon.io (real-time + historical)
- [ ] Interactive Brokers TWS API
- [ ] Coinbase WebSocket (backup crypto #2)