Producer (market-data-service):
- Backpressure: smart drop policy (heartbeats→quotes→trades preserved)
- Heartbeat monitor: synthetic HeartbeatEvent on provider silence
- Graceful shutdown: WS→bus→storage→DB engine cleanup sequence
- Bybit V5 public WS provider (backup for Binance, no API key needed)
- FailoverManager: health-based provider switching with recovery
- NATS output adapter: md.events.{type}.{symbol} for SenpAI
- /bus-stats endpoint for backpressure monitoring
- Dockerfile + docker-compose.node1.yml integration
- 36 tests (parsing + bus + failover), requirements.lock
Consumer (senpai-md-consumer):
- NATSConsumer: subscribe md.events.>, queue group senpai-md, backpressure
- State store: LatestState + RollingWindow (deque, 60s)
- Feature engine: 11 features (mid, spread, VWAP, return, vol, latency)
- Rule-based signals: long/short on return+volume+spread conditions
- Publisher: rate-limited features + signals + alerts to NATS
- HTTP API: /health, /metrics, /state/latest, /features/latest, /stats
- 10 Prometheus metrics
- Dockerfile + docker-compose.senpai.yml
- 41 tests (parsing + state + features + rate-limit), requirements.lock
CI: ruff + pytest + smoke import for both services
Tests: 77 total passed, lint clean
Co-authored-by: Cursor <cursoragent@cursor.com>
Market Data Service (SenpAI)
Real-time market data collection and normalization for the SenpAI/Gordon trading agent.
Quick Start
1. Install
cd services/market-data-service
pip install -r requirements.txt
2. Copy config
cp .env.example .env
3. Run (Binance — no keys needed)
python -m app run --provider binance --symbols BTCUSDT,ETHUSDT
4. Run (Alpaca — paper trading)
First, get free paper-trading API keys:
- Sign up at https://app.alpaca.markets
- Switch to Paper Trading in the dashboard
- Go to API Keys → Generate New Key
- Add to .env:
ALPACA_KEY=your_key_here
ALPACA_SECRET=your_secret_here
ALPACA_DRY_RUN=false
- Run:
python -m app run --provider alpaca --symbols AAPL,TSLA
Without keys, Alpaca runs in dry-run mode (heartbeats only).
5. Run (Bybit — backup crypto, no keys needed)
python -m app run --provider bybit --symbols BTCUSDT,ETHUSDT
6. Run all providers
python -m app run --provider all --symbols BTCUSDT,ETHUSDT,AAPL,TSLA
Docker
Build & run standalone
docker build -t market-data-service .
docker run --rm -v mds-data:/data market-data-service run --provider binance --symbols BTCUSDT,ETHUSDT
As part of NODE1 stack
The service is included in docker-compose.node1.yml:
docker-compose -f docker-compose.node1.yml up -d market-data-service
Default config: Binance+Bybit on BTCUSDT,ETHUSDT with NATS output enabled.
HTTP Endpoints
Once running, the service exposes:
| Endpoint | Description |
|---|---|
| GET /health | Service health check |
| GET /metrics | Prometheus metrics |
| GET /latest?symbol=BTCUSDT | Latest trade + quote from SQLite |
| GET /bus-stats | Queue size, fill percent, backpressure status |
Default port: 8891 (configurable via HTTP_PORT).
SenpAI Integration (NATS)
Enable NATS output to push events directly to SenpAI:
NATS_URL=nats://localhost:4222
NATS_ENABLED=true
NATS_SUBJECT_PREFIX=md.events
Subject schema:
- md.events.trade.BTCUSDT: trade events
- md.events.quote.AAPL: quote events
- md.events.heartbeat.__system__: heartbeats
- md.events.>: subscribe to all events
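The subject schema can be sketched as two small helpers. This is an illustration only: `build_subject` and `subject_matches` are hypothetical names, not functions from the service. The matcher follows standard NATS wildcard semantics (`*` matches exactly one token, a trailing `>` matches one or more tokens).

```python
def build_subject(prefix: str, event_type: str, symbol: str = "__system__") -> str:
    """Compose a subject of the form {prefix}.{type}.{symbol}.

    Heartbeats are not tied to a symbol, so the schema's "__system__"
    placeholder is used as the default (hypothetical helper).
    """
    return f"{prefix}.{event_type}.{symbol}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Check a subject against a NATS-style pattern ("*" and ">" wildcards)."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # ">" must match at least one remaining token.
            return len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


trade_subject = build_subject("md.events", "trade", "BTCUSDT")
heartbeat_subject = build_subject("md.events", "heartbeat")
```

A consumer subscribed to `md.events.>` receives both subjects above, while `md.events.trade.*` would receive only trade events.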
Backpressure & Reliability
- Backpressure: Smart drop policy when queue fills up
  - 80%+ → drop heartbeat events
  - 90%+ → drop quotes (trades are preserved)
  - 100% → drop oldest event
- Heartbeat monitor: Emits synthetic heartbeat if provider goes silent
- Auto-reconnect: Exponential backoff with resubscribe
- Failover: Bybit as backup for Binance with health-based switching
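The drop thresholds above can be sketched as follows. This is a minimal illustration, not the service's EventBus code: it uses a plain `deque` rather than `asyncio.Queue`, and `offer`, `HEARTBEAT_DROP_AT` and `QUOTE_DROP_AT` are hypothetical names.

```python
from collections import deque

# Thresholds taken from the policy described above.
HEARTBEAT_DROP_AT = 0.80
QUOTE_DROP_AT = 0.90


def offer(queue: deque, maxsize: int, event_type: str, event: object) -> bool:
    """Try to enqueue an event; return True if it was accepted."""
    fill = len(queue) / maxsize
    if fill >= HEARTBEAT_DROP_AT and event_type == "heartbeat":
        return False  # heartbeats are the first to be shed
    if fill >= QUOTE_DROP_AT and event_type == "quote":
        return False  # quotes are shed next; trades are still accepted
    if len(queue) >= maxsize:
        queue.popleft()  # queue is full: evict the oldest event to make room
    queue.append(event)
    return True
```

With `maxsize=10` and eight queued events, a heartbeat is rejected while a quote is still accepted. Once the queue holds nine events, quotes are rejected as well, and only trades get through.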
View Data
SQLite
sqlite3 market_data.db "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT 5;"
JSONL Event Log
tail -5 events.jsonl | python -m json.tool
Prometheus Metrics
curl http://localhost:8891/metrics
Key metrics:
- market_events_total: events by provider/type/symbol
- market_exchange_latency_ms: exchange-to-receive latency
- market_events_per_second: throughput gauge
- market_gaps_total: detected gaps per provider
Architecture
Provider (Binance/Bybit/Alpaca)
│ raw WebSocket messages
▼
Adapter (_parse → domain Event)
│ TradeEvent / QuoteEvent / BookL2Event
▼
EventBus (asyncio.Queue fan-out + backpressure + heartbeat)
├─▶ StorageConsumer → SQLite + JSONL
├─▶ MetricsConsumer → Prometheus counters/histograms
├─▶ PrintConsumer → structured log (sampled 1/N)
└─▶ NatsConsumer → NATS PubSub (for SenpAI)
FailoverManager
monitors provider health → switches source on degradation
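One way to picture health-based switching with recovery is the sketch below. It is an assumption-laden illustration, not the actual FailoverManager: the class name `FailoverSketch`, the `max_silence_s` threshold, and the use of "time since last event" as the only health signal are all hypothetical.

```python
import time
from typing import Optional


class FailoverSketch:
    """Prefer the primary provider; fall back to the backup while the primary is silent."""

    def __init__(self, primary: str, backup: str, max_silence_s: float = 10.0) -> None:
        self.primary = primary
        self.backup = backup
        self.max_silence_s = max_silence_s  # hypothetical health threshold
        self.active = primary
        self._last_seen: dict[str, float] = {}

    def record_event(self, provider: str, now: Optional[float] = None) -> None:
        """Note that a provider delivered an event at time `now`."""
        self._last_seen[provider] = time.monotonic() if now is None else now

    def _healthy(self, provider: str, now: float) -> bool:
        last = self._last_seen.get(provider)
        return last is not None and (now - last) <= self.max_silence_s

    def evaluate(self, now: Optional[float] = None) -> str:
        """Return the provider that should currently be the active source."""
        now = time.monotonic() if now is None else now
        if self._healthy(self.primary, now):
            self.active = self.primary  # recovery: switch back once primary is healthy
        elif self._healthy(self.backup, now):
            self.active = self.backup  # degradation: primary silent, backup alive
        # If neither is healthy, keep the current source unchanged.
        return self.active
```

Passing `now` explicitly keeps the sketch deterministic in tests. A real implementation would likely also weigh reconnect counts or latency.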
Adding a New Provider
- Create app/providers/your_provider.py
- Subclass MarketDataProvider:
from collections.abc import AsyncIterator

from app.providers import MarketDataProvider
from app.domain.events import Event, TradeEvent


class YourProvider(MarketDataProvider):
    name = "your_provider"

    async def connect(self) -> None: ...

    async def subscribe(self, symbols: list[str]) -> None: ...

    async def stream(self) -> AsyncIterator[Event]:
        # Async generator: turn each raw message into a domain Event.
        while True:
            raw = await self._receive()
            yield self._parse(raw)

    async def close(self) -> None: ...
- Register in app/providers/__init__.py:
from app.providers.your_provider import YourProvider
registry["your_provider"] = YourProvider
- Add config to app/config.py if needed
- Run:
python -m app run --provider your_provider --symbols ...
Tests
pytest tests/ -v
36 tests covering:
- Binance message parsing (7 tests)
- Alpaca message parsing (8 tests)
- Bybit message parsing (9 tests)
- Event bus: fanout, backpressure, heartbeat (7 tests)
- Failover manager (5 tests)
CI
Included in .github/workflows/python-services-ci.yml:
- ruff check: lint
- pytest: unit tests
- compileall: syntax check
Troubleshooting
Port 8891 already in use
lsof -ti:8891 | xargs kill -9
NATS connection refused
If NATS_ENABLED=true but NATS is not running, the service starts normally — NATS output is skipped with a warning log. To run without NATS:
NATS_ENABLED=false
SQLite "database is locked"
Normal under heavy load — SQLite does not support concurrent writers. The service uses a single async writer. If you see this in external tools (sqlite3 CLI), wait for the service to stop or use the /latest HTTP endpoint instead.
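If you need to read the database from a script while the service is writing, a read-only connection with a busy timeout is one option. The sketch below assumes the `trades` table and `ts_recv` column from the query shown under View Data. `read_latest_trades` is a hypothetical helper and not part of the service.

```python
import sqlite3
from pathlib import Path


def read_latest_trades(db_path: str, limit: int = 5, timeout_s: float = 5.0) -> list[dict]:
    """Read the newest trades without taking write locks."""
    # mode=ro opens the file read-only; timeout makes SQLite wait for a
    # lock to clear instead of failing immediately with "database is locked".
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True, timeout=timeout_s)
    try:
        conn.row_factory = sqlite3.Row
        cursor = conn.execute(
            "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT ?", (limit,)
        )
        return [dict(row) for row in cursor.fetchall()]
    finally:
        conn.close()
```

The timeout only reduces the chance of a lock error under load; the /latest endpoint remains the simpler route when the service is running.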
Binance WebSocket disconnects
Auto-reconnect is built in with exponential backoff (1s → 60s max). Check logs for binance.reconnecting. If persistent, verify DNS/firewall access to stream.binance.com:9443.
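The reconnect schedule can be pictured with the sketch below. The 1s start and 60s cap come from the description above; the doubling factor and the name `backoff_delays` are assumptions, and the service may also add jitter.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0, factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays that grow geometrically and never exceed `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


# First eight delays under the assumed doubling factor:
first_delays = list(islice(backoff_delays(), 8))
# → [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

A reconnect loop would sleep for each delay in turn and restart the generator after a successful connection.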
Bybit "subscribe_failed"
Verify symbol names match Bybit spot conventions (e.g. BTCUSDT, not BTC-USDT). Check bybit.subscribe_failed in logs.
No data for Alpaca symbols
Without API keys, Alpaca runs in dry-run mode (heartbeats only). Set ALPACA_KEY, ALPACA_SECRET and ALPACA_DRY_RUN=false in .env.
JetStream not available
If USE_JETSTREAM=true but NATS was started without --js, you'll see a connection error. Start NATS with JetStream:
docker run -d -p 4222:4222 nats:2.10-alpine --js
TODO: Future Providers
- CoinAPI (REST + WebSocket, paid tier)
- IQFeed (US equities, DTN subscription)
- Polygon.io (real-time + historical)
- Interactive Brokers TWS API
- Coinbase WebSocket (backup crypto #2)