# Market Data Service (SenpAI)

Real-time market data collection and normalization for the SenpAI/Gordon trading agent.

## Quick Start

### 1. Install

```bash
cd services/market-data-service
pip install -r requirements.txt
```

### 2. Copy config

```bash
cp .env.example .env
```

### 3. Run (Binance — no keys needed)

```bash
python -m app run --provider binance --symbols BTCUSDT,ETHUSDT
```

### 4. Run (Alpaca — paper trading)

First, get free paper-trading API keys:

1. Sign up at https://app.alpaca.markets
2. Switch to **Paper Trading** in the dashboard
3. Go to API Keys → Generate New Key
4. Add to `.env`:

   ```
   ALPACA_KEY=your_key_here
   ALPACA_SECRET=your_secret_here
   ALPACA_DRY_RUN=false
   ```

5. Run:

   ```bash
   python -m app run --provider alpaca --symbols AAPL,TSLA
   ```

Without keys, Alpaca runs in **dry-run mode** (heartbeats only).

### 5. Run (Bybit — backup crypto, no keys needed)

```bash
python -m app run --provider bybit --symbols BTCUSDT,ETHUSDT
```

### 6. Run all providers

```bash
python -m app run --provider all --symbols BTCUSDT,ETHUSDT,AAPL,TSLA
```

## Docker

### Build & run standalone

```bash
docker build -t market-data-service .
docker run --rm -v mds-data:/data market-data-service run --provider binance --symbols BTCUSDT,ETHUSDT
```

### As part of NODE1 stack

The service is included in `docker-compose.node1.yml`:

```bash
docker-compose -f docker-compose.node1.yml up -d market-data-service
```

Default config: Binance + Bybit on BTCUSDT,ETHUSDT with NATS output enabled.

## HTTP Endpoints

Once running, the service exposes:

| Endpoint | Description |
|---|---|
| `GET /health` | Service health check |
| `GET /metrics` | Prometheus metrics |
| `GET /latest?symbol=BTCUSDT` | Latest trade + quote from SQLite |
| `GET /bus-stats` | Queue size, fill percentage, backpressure status |

Default port: `8891` (configurable via `HTTP_PORT`).
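The `/latest` endpoint can be queried from any HTTP client. Below is a minimal Python sketch using only the standard library; it assumes the service is running on the default port and returns JSON. The `latest_url` and `fetch_latest` helpers are illustrative, not part of the service.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8891"  # default HTTP_PORT


def latest_url(symbol: str, base_url: str = BASE_URL) -> str:
    """Build the /latest query URL for a symbol (URL-encodes the symbol)."""
    return f"{base_url}/latest?symbol={urllib.parse.quote(symbol)}"


def fetch_latest(symbol: str) -> dict:
    """Fetch the latest trade + quote for `symbol`; assumes a JSON response body."""
    with urllib.request.urlopen(latest_url(symbol), timeout=5) as resp:
        return json.load(resp)
```

With a running instance, `fetch_latest("BTCUSDT")` returns the decoded JSON payload.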
## SenpAI Integration (NATS)

Enable NATS output to push events directly to SenpAI:

```env
NATS_URL=nats://localhost:4222
NATS_ENABLED=true
NATS_SUBJECT_PREFIX=md.events
```

Subject schema:

- `md.events.trade.BTCUSDT` — trade events
- `md.events.quote.AAPL` — quote events
- `md.events.heartbeat.__system__` — heartbeats
- `md.events.>` — subscribe to all events

## Backpressure & Reliability

- **Backpressure**: smart drop policy when the queue fills up
  - 80%+ → drop heartbeat events
  - 90%+ → drop quotes (trades are preserved)
  - 100% → drop the oldest event
- **Heartbeat monitor**: emits a synthetic heartbeat if a provider goes silent
- **Auto-reconnect**: exponential backoff with resubscribe
- **Failover**: Bybit as backup for Binance with health-based switching

## View Data

### SQLite

```bash
sqlite3 market_data.db "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT 5;"
```

### JSONL Event Log

```bash
tail -5 events.jsonl | python -m json.tool --json-lines
```

(The `--json-lines` flag is required: each line of `events.jsonl` is a separate JSON document, and without it `json.tool` fails with "Extra data" after the first line.)

### Prometheus Metrics

```bash
curl http://localhost:8891/metrics
```

Key metrics:

- `market_events_total` — events by provider/type/symbol
- `market_exchange_latency_ms` — exchange-to-receive latency
- `market_events_per_second` — throughput gauge
- `market_gaps_total` — detected gaps per provider

## Architecture

```
Provider (Binance/Bybit/Alpaca)
  │  raw WebSocket messages
  ▼
Adapter (_parse → domain Event)
  │  TradeEvent / QuoteEvent / BookL2Event
  ▼
EventBus (asyncio.Queue fan-out + backpressure + heartbeat)
  ├─▶ StorageConsumer → SQLite + JSONL
  ├─▶ MetricsConsumer → Prometheus counters/histograms
  ├─▶ PrintConsumer   → structured log (sampled 1/N)
  └─▶ NatsConsumer    → NATS PubSub (for SenpAI)

FailoverManager monitors provider health → switches source on degradation
```

## Adding a New Provider

1. Create `app/providers/your_provider.py`
2.
   Subclass `MarketDataProvider`:

   ```python
   from collections.abc import AsyncIterator

   from app.providers import MarketDataProvider
   from app.domain.events import Event


   class YourProvider(MarketDataProvider):
       name = "your_provider"

       async def connect(self) -> None: ...

       async def subscribe(self, symbols: list[str]) -> None: ...

       async def stream(self) -> AsyncIterator[Event]:
           while True:
               raw = await self._receive()
               yield self._parse(raw)

       async def close(self) -> None: ...
   ```

3. Register in `app/providers/__init__.py`:

   ```python
   from app.providers.your_provider import YourProvider

   registry["your_provider"] = YourProvider
   ```

4. Add config to `app/config.py` if needed
5. Run: `python -m app run --provider your_provider --symbols ...`

## Tests

```bash
pytest tests/ -v
```

36 tests covering:

- Binance message parsing (7 tests)
- Alpaca message parsing (8 tests)
- Bybit message parsing (9 tests)
- Event bus: fan-out, backpressure, heartbeat (7 tests)
- Failover manager (5 tests)

## CI

Included in `.github/workflows/python-services-ci.yml`:

- `ruff check` — lint
- `pytest` — unit tests
- `compileall` — syntax check

## Troubleshooting

### Port 8891 already in use

```bash
lsof -ti:8891 | xargs kill -9
```

### NATS connection refused

If `NATS_ENABLED=true` but NATS is not running, the service still starts normally — NATS output is skipped with a warning log. To run without NATS:

```env
NATS_ENABLED=false
```

### SQLite "database is locked"

Normal under heavy load — SQLite does not support concurrent writers, so the service uses a single async writer. If you see this error in external tools (e.g. the `sqlite3` CLI), wait for the service to stop, or use the `/latest` HTTP endpoint instead.

### Binance WebSocket disconnects

Auto-reconnect is built in with exponential backoff (1s → 60s max). Check the logs for `binance.reconnecting`. If disconnects persist, verify DNS/firewall access to `stream.binance.com:9443`.

### Bybit "subscribe_failed"

Verify symbol names match Bybit spot conventions (e.g. `BTCUSDT`, not `BTC-USDT`).
Check the logs for `bybit.subscribe_failed`.

### No data for Alpaca symbols

Without API keys, Alpaca runs in **dry-run mode** (heartbeats only). Set `ALPACA_KEY`, `ALPACA_SECRET`, and `ALPACA_DRY_RUN=false` in `.env`.

### JetStream not available

If `USE_JETSTREAM=true` but NATS was started without `--js`, you'll see a connection error. Start NATS with JetStream enabled:

```bash
docker run -d -p 4222:4222 nats:2.10-alpine --js
```

## TODO: Future Providers

- [ ] CoinAPI (REST + WebSocket, paid tier)
- [ ] IQFeed (US equities, DTN subscription)
- [ ] Polygon.io (real-time + historical)
- [ ] Interactive Brokers TWS API
- [ ] Coinbase WebSocket (backup crypto #2)
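## Appendix: Consuming events from NATS

The subject schema described in the SenpAI Integration section above (`md.events.<type>.<symbol>`) can be consumed with the `nats-py` client. A minimal sketch, assuming `nats-py` is installed and NATS is on the default port; the `parse_subject` helper is illustrative, not part of the service.

```python
import asyncio

SUBJECT_PREFIX = "md.events"


def parse_subject(subject: str) -> tuple[str, str]:
    """Split e.g. 'md.events.trade.BTCUSDT' into ('trade', 'BTCUSDT')."""
    _prefix, event_type, symbol = subject.rsplit(".", 2)
    return event_type, symbol


async def consume(url: str = "nats://localhost:4222") -> None:
    # Requires nats-py (`pip install nats-py`); imported lazily so that
    # parse_subject stays usable without the dependency.
    import nats

    nc = await nats.connect(url)

    async def on_msg(msg) -> None:
        event_type, symbol = parse_subject(msg.subject)
        print(event_type, symbol, msg.data[:80])

    # `md.events.>` matches every event subject, per the schema above.
    await nc.subscribe(f"{SUBJECT_PREFIX}.>", cb=on_msg)
    await asyncio.sleep(30)  # consume for 30s, then drain and exit
    await nc.drain()
```

Run with `asyncio.run(consume())` against a NATS server that the service is publishing to.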