microdao-daarion/services/market-data-service/README.md
Apple 09dee24342 feat: MD pipeline — market-data-service hardening + SenpAI NATS consumer
Producer (market-data-service):
- Backpressure: smart drop policy (heartbeats→quotes→trades preserved)
- Heartbeat monitor: synthetic HeartbeatEvent on provider silence
- Graceful shutdown: WS→bus→storage→DB engine cleanup sequence
- Bybit V5 public WS provider (backup for Binance, no API key needed)
- FailoverManager: health-based provider switching with recovery
- NATS output adapter: md.events.{type}.{symbol} for SenpAI
- /bus-stats endpoint for backpressure monitoring
- Dockerfile + docker-compose.node1.yml integration
- 36 tests (parsing + bus + failover), requirements.lock

Consumer (senpai-md-consumer):
- NATSConsumer: subscribe md.events.>, queue group senpai-md, backpressure
- State store: LatestState + RollingWindow (deque, 60s)
- Feature engine: 11 features (mid, spread, VWAP, return, vol, latency)
- Rule-based signals: long/short on return+volume+spread conditions
- Publisher: rate-limited features + signals + alerts to NATS
- HTTP API: /health, /metrics, /state/latest, /features/latest, /stats
- 10 Prometheus metrics
- Dockerfile + docker-compose.senpai.yml
- 41 tests (parsing + state + features + rate-limit), requirements.lock

CI: ruff + pytest + smoke import for both services
Tests: 77 total passed, lint clean
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 11:46:15 -08:00

Market Data Service (SenpAI)

Real-time market data collection and normalization for the SenpAI/Gordon trading agent.

Quick Start

1. Install

cd services/market-data-service
pip install -r requirements.txt

2. Copy config

cp .env.example .env

3. Run (Binance — no keys needed)

python -m app run --provider binance --symbols BTCUSDT,ETHUSDT

4. Run (Alpaca — paper trading)

First, get free paper-trading API keys:

  1. Sign up at https://app.alpaca.markets
  2. Switch to Paper Trading in the dashboard
  3. Go to API Keys → Generate New Key
  4. Add to .env:
    ALPACA_KEY=your_key_here
    ALPACA_SECRET=your_secret_here
    ALPACA_DRY_RUN=false
    
  5. Run:
    python -m app run --provider alpaca --symbols AAPL,TSLA
    

Without keys, Alpaca runs in dry-run mode (heartbeats only).

5. Run (Bybit — backup crypto, no keys needed)

python -m app run --provider bybit --symbols BTCUSDT,ETHUSDT

6. Run all providers

python -m app run --provider all --symbols BTCUSDT,ETHUSDT,AAPL,TSLA

Docker

Build & run standalone

docker build -t market-data-service .
docker run --rm -v mds-data:/data market-data-service run --provider binance --symbols BTCUSDT,ETHUSDT

As part of NODE1 stack

The service is included in docker-compose.node1.yml:

docker-compose -f docker-compose.node1.yml up -d market-data-service

Default config: Binance+Bybit on BTCUSDT,ETHUSDT with NATS output enabled.

HTTP Endpoints

Once running, the service exposes:

| Endpoint | Description |
| --- | --- |
| GET /health | Service health check |
| GET /metrics | Prometheus metrics |
| GET /latest?symbol=BTCUSDT | Latest trade + quote from SQLite |
| GET /bus-stats | Queue size, fill percent, backpressure status |

Default port: 8891 (configurable via HTTP_PORT).
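
The endpoints can also be polled from Python. The sketch below uses only the standard library and assumes the endpoints return JSON; the response fields are not documented in this README, so the helper names (endpoint_url, fetch_json) are illustrative and the payload is returned as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8891"  # default HTTP_PORT from this README


def endpoint_url(path: str, base: str = BASE_URL, **params: str) -> str:
    """Build a URL for a service endpoint, e.g. /latest?symbol=BTCUSDT."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base.rstrip('/')}{path}{query}"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the body as JSON (assumption: the endpoint returns JSON)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires a running service):
#   fetch_json(endpoint_url("/health"))
#   fetch_json(endpoint_url("/latest", symbol="BTCUSDT"))
#   fetch_json(endpoint_url("/bus-stats"))
```

/metrics is left out on purpose: it serves the Prometheus text format, not JSON.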

SenpAI Integration (NATS)

Enable NATS output to push events directly to SenpAI:

NATS_URL=nats://localhost:4222
NATS_ENABLED=true
NATS_SUBJECT_PREFIX=md.events

Subject schema:

  • md.events.trade.BTCUSDT — trade events
  • md.events.quote.AAPL — quote events
  • md.events.heartbeat.__system__ — heartbeats
  • md.events.> — subscribe to all events
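
To illustrate the subject schema, here is a small pure-Python sketch that builds and parses subjects and mimics NATS wildcard matching ("*" matches exactly one token, ">" matches one or more trailing tokens). The function names are hypothetical and not part of the service; a real consumer would let the NATS server do the matching.

```python
PREFIX = "md.events"  # same value as NATS_SUBJECT_PREFIX above


def build_subject(event_type: str, symbol: str, prefix: str = PREFIX) -> str:
    """Compose a subject of the form {prefix}.{type}.{symbol}."""
    return f"{prefix}.{event_type}.{symbol}"


def parse_subject(subject: str, prefix: str = PREFIX) -> tuple[str, str]:
    """Split a subject back into (event_type, symbol)."""
    head = prefix + "."
    if not subject.startswith(head):
        raise ValueError(f"unexpected prefix in {subject!r}")
    event_type, _, symbol = subject[len(head):].partition(".")
    if not event_type or not symbol:
        raise ValueError(f"malformed subject {subject!r}")
    return event_type, symbol


def subject_matches(pattern: str, subject: str) -> bool:
    """Approximate NATS wildcard matching on dot-separated tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' must be the last token and needs at least one token to match
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```

For example, subject_matches("md.events.trade.*", "md.events.trade.BTCUSDT") is True, while the same pattern does not match a quote subject.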

Backpressure & Reliability

  • Backpressure: Smart drop policy when queue fills up
    • 80%+ → drop heartbeat events
    • 90%+ → drop quotes (trades are preserved)
    • 100% → drop oldest event
  • Heartbeat monitor: Emits synthetic heartbeat if provider goes silent
  • Auto-reconnect: Exponential backoff with resubscribe
  • Failover: Bybit as backup for Binance with health-based switching
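
The drop policy above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the service's EventBus: the class and method names (BoundedBus, offer) are hypothetical, and only the 80% / 90% / 100% thresholds come from this README.

```python
from collections import deque

HEARTBEAT_DROP_AT = 0.80  # at 80%+ fill, heartbeats are dropped
QUOTE_DROP_AT = 0.90      # at 90%+ fill, quotes are dropped too


class BoundedBus:
    """Toy bounded queue that sheds low-priority events as it fills up."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self.queue: deque = deque()
        self.dropped = 0

    def fill(self) -> float:
        return len(self.queue) / self.maxsize

    def offer(self, event_type: str, payload: object) -> bool:
        """Return True if the event was enqueued, False if it was dropped."""
        fill = self.fill()
        if event_type == "heartbeat" and fill >= HEARTBEAT_DROP_AT:
            self.dropped += 1
            return False
        if event_type == "quote" and fill >= QUOTE_DROP_AT:
            self.dropped += 1
            return False
        if len(self.queue) >= self.maxsize:
            self.queue.popleft()  # queue is full: evict the oldest event
            self.dropped += 1
        self.queue.append((event_type, payload))
        return True
```

Trades are never rejected on arrival in this sketch; at 100% fill the oldest queued event is evicted to make room.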

View Data

SQLite

sqlite3 market_data.db "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT 5;"

JSONL Event Log

tail -5 events.jsonl | python -m json.tool --json-lines
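
The same log can be read from Python. The sketch below returns the last n events and skips blank or incomplete lines (for example, a line the service is still writing). The helper name tail_events is illustrative, and no particular event fields are assumed.

```python
import json
from collections import deque


def tail_events(path: str, n: int = 5) -> list[dict]:
    """Return the last n JSON objects from a JSONL file."""
    events: deque = deque(maxlen=n)  # keeps only the newest n entries
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip a partially written or corrupt line
    return list(events)


# Usage: for event in tail_events("events.jsonl"): print(event)
```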

Prometheus Metrics

curl http://localhost:8891/metrics

Key metrics:

  • market_events_total — events by provider/type/symbol
  • market_exchange_latency_ms — exchange-to-receive latency
  • market_events_per_second — throughput gauge
  • market_gaps_total — detected gaps per provider
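
For a quick check without a Prometheus server, the text output of /metrics can be summed per metric name. This is a deliberately naive sketch: it assumes one sample per line with no trailing timestamp, and the label names in the sample text (provider, type, symbol) are hypothetical.

```python
def sum_metric(text: str, name: str) -> float:
    """Sum all samples of one metric across its label combinations."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        sample, _, value = line.rpartition(" ")
        metric_name = sample.split("{", 1)[0]
        if metric_name == name:
            total += float(value)
    return total


SAMPLE = """\
# TYPE market_events_total counter
market_events_total{provider="binance",type="trade",symbol="BTCUSDT"} 12.0
market_events_total{provider="bybit",type="quote",symbol="ETHUSDT"} 3.0
market_gaps_total{provider="binance"} 1.0
"""
```

With the sample text above, sum_metric(SAMPLE, "market_events_total") adds the two counter lines and ignores the rest.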

Architecture

Provider (Binance/Bybit/Alpaca)
    │ raw WebSocket messages
    ▼
Adapter (_parse → domain Event)
    │ TradeEvent / QuoteEvent / BookL2Event
    ▼
EventBus (asyncio.Queue fan-out + backpressure + heartbeat)
    ├─▶ StorageConsumer  → SQLite + JSONL
    ├─▶ MetricsConsumer  → Prometheus counters/histograms
    ├─▶ PrintConsumer    → structured log (sampled 1/N)
    └─▶ NatsConsumer     → NATS PubSub (for SenpAI)
    
FailoverManager
    monitors provider health → switches source on degradation
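
As a rough illustration of the fan-out step in the diagram, the sketch below reads events from one asyncio.Queue and hands each event to every registered consumer. It is an assumption about the general shape, not the service's EventBus: MiniBus, the None stop sentinel, and the consumer signatures are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Consumer = Callable[[dict], Awaitable[None]]


class MiniBus:
    """One inbound queue, many consumers; every consumer sees every event."""

    def __init__(self, maxsize: int = 1000) -> None:
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.consumers: list[Consumer] = []

    async def run(self) -> None:
        while True:
            event = await self.queue.get()
            if event is None:  # sentinel used here to stop the loop
                break
            # return_exceptions=True so one failing consumer does not stop the others
            await asyncio.gather(
                *(consumer(event) for consumer in self.consumers),
                return_exceptions=True,
            )


async def demo() -> tuple[list, list]:
    stored, counted = [], []

    async def storage(event: dict) -> None:
        stored.append(event)

    async def metrics(event: dict) -> None:
        counted.append(event["type"])

    bus = MiniBus()
    bus.consumers += [storage, metrics]
    await bus.queue.put({"type": "trade", "symbol": "BTCUSDT"})
    await bus.queue.put(None)
    await bus.run()
    return stored, counted
```

Running asyncio.run(demo()) delivers the single trade event to both the storage and the metrics consumer.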

Adding a New Provider

  1. Create app/providers/your_provider.py
  2. Subclass MarketDataProvider:
    from typing import AsyncIterator

    from app.providers import MarketDataProvider
    from app.domain.events import Event, TradeEvent

    class YourProvider(MarketDataProvider):
        name = "your_provider"

        async def connect(self) -> None: ...
        async def subscribe(self, symbols: list[str]) -> None: ...
        async def stream(self) -> AsyncIterator[Event]:
            # Yield one normalized domain event per raw message.
            # _receive and _parse are provider-specific helpers you implement.
            while True:
                raw = await self._receive()
                yield self._parse(raw)
        async def close(self) -> None: ...
  3. Register in app/providers/__init__.py:
    from app.providers.your_provider import YourProvider
    registry["your_provider"] = YourProvider
  4. Add config to app/config.py if needed
  5. Run: python -m app run --provider your_provider --symbols ...

Tests

pytest tests/ -v

36 tests covering:

  • Binance message parsing (7 tests)
  • Alpaca message parsing (8 tests)
  • Bybit message parsing (9 tests)
  • Event bus: fanout, backpressure, heartbeat (7 tests)
  • Failover manager (5 tests)

CI

Included in .github/workflows/python-services-ci.yml:

  • ruff check — lint
  • pytest — unit tests
  • compileall — syntax check

Troubleshooting

Port 8891 already in use

lsof -ti:8891 | xargs kill -9

NATS connection refused

If NATS_ENABLED=true but NATS is not running, the service starts normally — NATS output is skipped with a warning log. To run without NATS:

NATS_ENABLED=false

SQLite "database is locked"

Expected under heavy load: SQLite allows only one writer at a time, and the service uses a single async writer. If you see this error in external tools (such as the sqlite3 CLI), wait for the service to stop or use the /latest HTTP endpoint instead.
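
If you need to inspect the database from a script while the service is running, one option is to open it read-only with a busy timeout, so the script never takes a write lock and waits briefly if the writer is active. This is a sketch using Python's standard sqlite3 module; only the trades table and the ts_recv column are taken from this README, and the helper name latest_trades is illustrative.

```python
import sqlite3
from pathlib import Path


def latest_trades(db_path: str, limit: int = 5) -> list[tuple]:
    """Read the newest rows from the trades table without write access."""
    # file: URI with mode=ro opens the database read-only
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True, timeout=5.0)  # wait up to 5 s on a lock
    try:
        cur = conn.execute(
            "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT ?", (limit,)
        )
        return cur.fetchall()
    finally:
        conn.close()


# Usage: latest_trades("market_data.db")
```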

Binance WebSocket disconnects

Auto-reconnect is built in with exponential backoff (1s → 60s max). Check logs for binance.reconnecting. If persistent, verify DNS/firewall access to stream.binance.com:9443.
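
The reconnect schedule can be pictured with a small generator. The 1 s start and 60 s cap come from the description above; the doubling factor is an assumption, and the service may also add jitter.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(
    base: float = 1.0, cap: float = 60.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield reconnect delays in seconds: base, base*factor, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First eight delays: 1, 2, 4, 8, 16, 32, 60, 60
first_delays = list(islice(backoff_delays(), 8))
```

A reconnect loop would sleep for the next delay after each failed attempt and restart the generator after a successful connection.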

Bybit "subscribe_failed"

Verify symbol names match Bybit spot conventions (e.g. BTCUSDT, not BTC-USDT). Check bybit.subscribe_failed in logs.

No data for Alpaca symbols

Without API keys, Alpaca runs in dry-run mode (heartbeats only). Set ALPACA_KEY, ALPACA_SECRET and ALPACA_DRY_RUN=false in .env.

JetStream not available

If USE_JETSTREAM=true but NATS was started without --js, you'll see a connection error. Start NATS with JetStream:

docker run -d -p 4222:4222 nats:2.10-alpine --js

TODO: Future Providers

  • CoinAPI (REST + WebSocket, paid tier)
  • IQFeed (US equities, DTN subscription)
  • Polygon.io (real-time + historical)
  • Interactive Brokers TWS API
  • Coinbase WebSocket (backup crypto #2)