Apple 09dee24342 feat: MD pipeline — market-data-service hardening + SenpAI NATS consumer
Producer (market-data-service):
- Backpressure: smart drop policy (heartbeats→quotes→trades preserved)
- Heartbeat monitor: synthetic HeartbeatEvent on provider silence
- Graceful shutdown: WS→bus→storage→DB engine cleanup sequence
- Bybit V5 public WS provider (backup for Binance, no API key needed)
- FailoverManager: health-based provider switching with recovery
- NATS output adapter: md.events.{type}.{symbol} for SenpAI
- /bus-stats endpoint for backpressure monitoring
- Dockerfile + docker-compose.node1.yml integration
- 36 tests (parsing + bus + failover), requirements.lock

Consumer (senpai-md-consumer):
- NATSConsumer: subscribe md.events.>, queue group senpai-md, backpressure
- State store: LatestState + RollingWindow (deque, 60s)
- Feature engine: 11 features (mid, spread, VWAP, return, vol, latency)
- Rule-based signals: long/short on return+volume+spread conditions
- Publisher: rate-limited features + signals + alerts to NATS
- HTTP API: /health, /metrics, /state/latest, /features/latest, /stats
- 10 Prometheus metrics
- Dockerfile + docker-compose.senpai.yml
- 41 tests (parsing + state + features + rate-limit), requirements.lock

CI: ruff + pytest + smoke import for both services
Tests: 77 total passed, lint clean
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 11:46:15 -08:00


# Market Data Service (SenpAI)
Real-time market data collection and normalization for the SenpAI/Gordon trading agent.
## Quick Start
### 1. Install
```bash
cd services/market-data-service
pip install -r requirements.txt
```
### 2. Copy config
```bash
cp .env.example .env
```
### 3. Run (Binance — no keys needed)
```bash
python -m app run --provider binance --symbols BTCUSDT,ETHUSDT
```
### 4. Run (Alpaca — paper trading)
First, get free paper-trading API keys:
1. Sign up at https://app.alpaca.markets
2. Switch to **Paper Trading** in the dashboard
3. Go to API Keys → Generate New Key
4. Add to `.env`:
```env
ALPACA_KEY=your_key_here
ALPACA_SECRET=your_secret_here
ALPACA_DRY_RUN=false
```
5. Run:
```bash
python -m app run --provider alpaca --symbols AAPL,TSLA
```
Without keys, Alpaca runs in **dry-run mode** (heartbeats only).
### 5. Run (Bybit — backup crypto, no keys needed)
```bash
python -m app run --provider bybit --symbols BTCUSDT,ETHUSDT
```
### 6. Run all providers
```bash
python -m app run --provider all --symbols BTCUSDT,ETHUSDT,AAPL,TSLA
```
## Docker
### Build & run standalone
```bash
docker build -t market-data-service .
docker run --rm -v mds-data:/data market-data-service run --provider binance --symbols BTCUSDT,ETHUSDT
```
### As part of NODE1 stack
The service is included in `docker-compose.node1.yml`:
```bash
docker-compose -f docker-compose.node1.yml up -d market-data-service
```
Default config: Binance+Bybit on BTCUSDT,ETHUSDT with NATS output enabled.
## HTTP Endpoints
Once running, the service exposes:
| Endpoint | Description |
|---|---|
| `GET /health` | Service health check |
| `GET /metrics` | Prometheus metrics |
| `GET /latest?symbol=BTCUSDT` | Latest trade + quote from SQLite |
| `GET /bus-stats` | Queue size, fill percent, backpressure status |
Default port: `8891` (configurable via `HTTP_PORT`).
## SenpAI Integration (NATS)
Enable NATS output to push events directly to SenpAI:
```env
NATS_URL=nats://localhost:4222
NATS_ENABLED=true
NATS_SUBJECT_PREFIX=md.events
```
Subject schema:
- `md.events.trade.BTCUSDT` — trade events
- `md.events.quote.AAPL` — quote events
- `md.events.heartbeat.__system__` — heartbeats
- `md.events.>` — subscribe to all events
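The `*` and `>` wildcards follow standard NATS token matching: `*` matches exactly one token, `>` matches one or more trailing tokens. NATS does this matching server-side; the helper below is only an illustration of the semantics, not part of the service:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style subject matching: '*' matches exactly one token,
    '>' matches one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # at least one token must remain
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

# md.events.> catches every event the service publishes:
assert subject_matches("md.events.>", "md.events.trade.BTCUSDT")
assert subject_matches("md.events.quote.*", "md.events.quote.AAPL")
assert not subject_matches("md.events.trade.*", "md.events.quote.AAPL")
```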
## Backpressure & Reliability
- **Backpressure**: Smart drop policy when the queue fills up
  - 80%+ → drop heartbeat events
  - 90%+ → drop quotes (trades are preserved)
  - 100% → drop oldest event
- **Heartbeat monitor**: Emits synthetic heartbeat if provider goes silent
- **Auto-reconnect**: Exponential backoff with resubscribe
- **Failover**: Bybit as backup for Binance with health-based switching
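The drop thresholds above reduce to a few lines of pure logic. A simplified sketch with hypothetical `choose_drop`/`publish` helpers (not the service's actual `EventBus` code):

```python
import asyncio


def choose_drop(event_type: str, fill: float) -> bool:
    """Apply the tiered drop policy: heartbeats go first, then quotes;
    trades are never refused, only displaced when the queue is full."""
    if fill >= 1.0:
        return False  # nothing refused outright; the oldest event is evicted instead
    if fill >= 0.9 and event_type in ("heartbeat", "quote"):
        return True
    if fill >= 0.8 and event_type == "heartbeat":
        return True
    return False


def publish(queue: asyncio.Queue, event_type: str, event) -> None:
    fill = queue.qsize() / queue.maxsize
    if choose_drop(event_type, fill):
        return  # dropped under backpressure
    if queue.full():
        queue.get_nowait()  # evict the oldest event to make room
    queue.put_nowait(event)
```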
## View Data
### SQLite
```bash
sqlite3 market_data.db "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT 5;"
```
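The same query from Python, assuming only the `trades` table and `ts_recv` column shown in the CLI one-liner (other column names are not assumed, hence `SELECT *`):

```python
import sqlite3
from contextlib import closing


def latest_trades(db_path: str = "market_data.db", limit: int = 5) -> list[tuple]:
    """Read the most recent trades, equivalent to the sqlite3 CLI one-liner."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT ?", (limit,)
        ).fetchall()
```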
### JSONL Event Log
```bash
tail -5 events.jsonl | python -m json.tool --json-lines
```
### Prometheus Metrics
```bash
curl http://localhost:8891/metrics
```
Key metrics:
- `market_events_total` — events by provider/type/symbol
- `market_exchange_latency_ms` — exchange-to-receive latency
- `market_events_per_second` — throughput gauge
- `market_gaps_total` — detected gaps per provider
## Architecture
```
Provider (Binance/Bybit/Alpaca)
│ raw WebSocket messages
Adapter (_parse → domain Event)
│ TradeEvent / QuoteEvent / BookL2Event
EventBus (asyncio.Queue fan-out + backpressure + heartbeat)
├─▶ StorageConsumer → SQLite + JSONL
├─▶ MetricsConsumer → Prometheus counters/histograms
├─▶ PrintConsumer → structured log (sampled 1/N)
└─▶ NatsConsumer → NATS PubSub (for SenpAI)
FailoverManager
monitors provider health → switches source on degradation
```
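The fan-out stage boils down to one bounded `asyncio.Queue` per consumer, so a slow consumer cannot stall the producer or its siblings. A minimal sketch of the pattern (the `FanOutBus` below is illustrative, not the service's `EventBus`):

```python
import asyncio


class FanOutBus:
    """Minimal fan-out: every subscriber gets its own bounded queue."""

    def __init__(self, maxsize: int = 1000) -> None:
        self._queues: list[asyncio.Queue] = []
        self._maxsize = maxsize

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=self._maxsize)
        self._queues.append(q)
        return q

    def publish(self, event) -> None:
        for q in self._queues:
            if q.full():
                q.get_nowait()  # evict oldest rather than block the producer
            q.put_nowait(event)


async def demo() -> list:
    bus = FanOutBus(maxsize=10)
    storage_q, metrics_q = bus.subscribe(), bus.subscribe()
    bus.publish({"type": "trade", "symbol": "BTCUSDT"})
    # each consumer receives its own copy of the event
    return [storage_q.get_nowait(), metrics_q.get_nowait()]
```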
## Adding a New Provider
1. Create `app/providers/your_provider.py`
2. Subclass `MarketDataProvider`:
```python
from collections.abc import AsyncIterator

from app.providers import MarketDataProvider
from app.domain.events import Event


class YourProvider(MarketDataProvider):
    name = "your_provider"

    async def connect(self) -> None: ...

    async def subscribe(self, symbols: list[str]) -> None: ...

    async def stream(self) -> AsyncIterator[Event]:
        while True:
            raw = await self._receive()
            yield self._parse(raw)

    async def close(self) -> None: ...
```
3. Register in `app/providers/__init__.py`:
```python
from app.providers.your_provider import YourProvider
registry["your_provider"] = YourProvider
```
4. Add config to `app/config.py` if needed
5. Run: `python -m app run --provider your_provider --symbols ...`
## Tests
```bash
pytest tests/ -v
```
36 tests covering:
- Binance message parsing (7 tests)
- Alpaca message parsing (8 tests)
- Bybit message parsing (9 tests)
- Event bus: fanout, backpressure, heartbeat (7 tests)
- Failover manager (5 tests)
## CI
Included in `.github/workflows/python-services-ci.yml`:
- `ruff check` — lint
- `pytest` — unit tests
- `compileall` — syntax check
## Troubleshooting
### Port 8891 already in use
```bash
lsof -ti:8891 | xargs kill -9
```
### NATS connection refused
If `NATS_ENABLED=true` but NATS is not running, the service starts normally — NATS output is skipped with a warning log. To run without NATS:
```env
NATS_ENABLED=false
```
### SQLite "database is locked"
Normal under heavy load — SQLite does not support concurrent writers. The service uses a single async writer. If you see this in external tools (`sqlite3` CLI), wait for the service to stop or use the `/latest` HTTP endpoint instead.
### Binance WebSocket disconnects
Auto-reconnect is built in with exponential backoff (1s → 60s max). Check logs for `binance.reconnecting`. If persistent, verify DNS/firewall access to `stream.binance.com:9443`.
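The 1s → 60s schedule corresponds to plain doubling with a cap. A sketch of that schedule (the service's exact reconnect logic may differ, and production implementations often add random jitter to avoid thundering-herd reconnects):

```python
def backoff_schedule(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Deterministic doubling: 1s, 2s, 4s, ... capped at 60s."""
    return min(cap, base * (2 ** attempt))


# First eight delays: 1, 2, 4, 8, 16, 32, 60, 60
delays = [backoff_schedule(i) for i in range(8)]
```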
### Bybit "subscribe_failed"
Verify symbol names match Bybit spot conventions (e.g. `BTCUSDT`, not `BTC-USDT`). Check `bybit.subscribe_failed` in logs.
### No data for Alpaca symbols
Without API keys, Alpaca runs in **dry-run mode** (heartbeats only). Set `ALPACA_KEY`, `ALPACA_SECRET` and `ALPACA_DRY_RUN=false` in `.env`.
### JetStream not available
If `USE_JETSTREAM=true` but NATS was started without `--js`, you'll see a connection error. Start NATS with JetStream:
```bash
docker run -d -p 4222:4222 nats:2.10-alpine --js
```
## TODO: Future Providers
- [ ] CoinAPI (REST + WebSocket, paid tier)
- [ ] IQFeed (US equities, DTN subscription)
- [ ] Polygon.io (real-time + historical)
- [ ] Interactive Brokers TWS API
- [ ] Coinbase WebSocket (backup crypto #2)