Producer (market-data-service):
- Backpressure: smart drop policy (heartbeats→quotes→trades preserved)
- Heartbeat monitor: synthetic HeartbeatEvent on provider silence
- Graceful shutdown: WS→bus→storage→DB engine cleanup sequence
- Bybit V5 public WS provider (backup for Binance, no API key needed)
- FailoverManager: health-based provider switching with recovery
- NATS output adapter: md.events.{type}.{symbol} for SenpAI
- /bus-stats endpoint for backpressure monitoring
- Dockerfile + docker-compose.node1.yml integration
- 36 tests (parsing + bus + failover), requirements.lock
Consumer (senpai-md-consumer):
- NATSConsumer: subscribe md.events.>, queue group senpai-md, backpressure
- State store: LatestState + RollingWindow (deque, 60s)
- Feature engine: 11 features (mid, spread, VWAP, return, vol, latency)
- Rule-based signals: long/short on return+volume+spread conditions
- Publisher: rate-limited features + signals + alerts to NATS
- HTTP API: /health, /metrics, /state/latest, /features/latest, /stats
- 10 Prometheus metrics
- Dockerfile + docker-compose.senpai.yml
- 41 tests (parsing + state + features + rate-limit), requirements.lock
CI: ruff + pytest + smoke import for both services
Tests: 77 total passed, lint clean
Co-authored-by: Cursor <cursoragent@cursor.com>
# Market Data Service (SenpAI)

Real-time market data collection and normalization for the SenpAI/Gordon trading agent.

## Quick Start

### 1. Install

```bash
cd services/market-data-service
pip install -r requirements.txt
```

### 2. Copy config

```bash
cp .env.example .env
```

### 3. Run (Binance, no keys needed)

```bash
python -m app run --provider binance --symbols BTCUSDT,ETHUSDT
```

### 4. Run (Alpaca, paper trading)

First, get free paper-trading API keys:

1. Sign up at https://app.alpaca.markets
2. Switch to **Paper Trading** in the dashboard.
3. Go to **API Keys** → **Generate New Key**.
4. Add the keys to `.env`:

   ```env
   ALPACA_KEY=your_key_here
   ALPACA_SECRET=your_secret_here
   ALPACA_DRY_RUN=false
   ```

5. Run:

   ```bash
   python -m app run --provider alpaca --symbols AAPL,TSLA
   ```

Without keys, Alpaca runs in **dry-run mode** (heartbeats only).

### 5. Run (Bybit: backup crypto provider, no keys needed)

```bash
python -m app run --provider bybit --symbols BTCUSDT,ETHUSDT
```

### 6. Run all providers

```bash
python -m app run --provider all --symbols BTCUSDT,ETHUSDT,AAPL,TSLA
```

## Docker

### Build & run standalone

```bash
docker build -t market-data-service .
docker run --rm -v mds-data:/data market-data-service run --provider binance --symbols BTCUSDT,ETHUSDT
```

### As part of NODE1 stack

The service is included in `docker-compose.node1.yml`:

```bash
docker-compose -f docker-compose.node1.yml up -d market-data-service
```

Default configuration: Binance and Bybit on BTCUSDT and ETHUSDT, with NATS output enabled.

## HTTP Endpoints

Once running, the service exposes:

| Endpoint | Description |
|---|---|
| `GET /health` | Service health check |
| `GET /metrics` | Prometheus metrics |
| `GET /latest?symbol=BTCUSDT` | Latest trade and quote for `symbol`, read from SQLite |
| `GET /bus-stats` | Event bus queue size, fill percentage, and backpressure status |

Default port: `8891` (configurable via `HTTP_PORT`).

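
For scripting against these endpoints, a small stdlib-only helper is enough. This sketch assumes that `/health`, `/latest` and `/bus-stats` return JSON bodies. Their response fields are not documented here, so it returns the decoded object as-is. `/metrics` returns Prometheus text, not JSON. `get_json` is a hypothetical name.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def get_json(path: str, params: Optional[dict[str, str]] = None,
             base_url: str = "http://localhost:8891", timeout: float = 5.0) -> Any:
    """GET base_url + path (plus an optional query string) and decode a JSON body.

    Assumption: the endpoint responds with UTF-8 encoded JSON.
    """
    url = base_url + path
    if params:
        # urlencode handles escaping, e.g. {"symbol": "BTCUSDT"} -> "symbol=BTCUSDT"
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Example calls are `get_json("/latest", {"symbol": "BTCUSDT"})` and `get_json("/bus-stats")`. `urlopen` raises `urllib.error.HTTPError` for error statuses, so a failing health check surfaces as an exception.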
## SenpAI Integration (NATS)

Enable NATS output to push events directly to SenpAI:

```env
NATS_URL=nats://localhost:4222
NATS_ENABLED=true
NATS_SUBJECT_PREFIX=md.events
```

Subject schema (`{NATS_SUBJECT_PREFIX}.{event_type}.{symbol}`):

- `md.events.trade.BTCUSDT`: trade events
- `md.events.quote.AAPL`: quote events
- `md.events.heartbeat.__system__`: heartbeats
- `md.events.>`: wildcard subscription matching all events

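
In NATS, `*` matches exactly one subject token and a trailing `>` matches one or more tokens. The sketch below builds subjects in the `{prefix}.{event_type}.{symbol}` layout and mimics that wildcard matching, so a subject layout can be unit-tested. The helper names are hypothetical. In practice the NATS server does the matching for subscriptions such as `md.events.>` or `md.events.trade.*`.

```python
def build_subject(prefix: str, event_type: str, symbol: str) -> str:
    """Compose `{prefix}.{event_type}.{symbol}`, e.g. md.events.trade.BTCUSDT."""
    for part in (event_type, symbol):
        # '.' separates NATS tokens and '*'/'>' are wildcards, so those
        # characters (and whitespace) must not appear inside a single token.
        if not part or any(c in part for c in ".*> \t"):
            raise ValueError(f"invalid subject token: {part!r}")
    return f"{prefix}.{event_type}.{symbol}"


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' matches one token, a final '>' matches one or more."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and must consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    # Without '>', pattern and subject must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)
```

For example, `subject_matches("md.events.*.BTCUSDT", "md.events.quote.BTCUSDT")` is true, while `md.events.>` does not match the bare `md.events`.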
## Backpressure & Reliability

- **Backpressure**: a smart drop policy applies when the event queue fills up:
  - at 80% full or more, heartbeat events are dropped;
  - at 90% or more, quotes are dropped as well (trades are preserved);
  - at 100%, the oldest queued event is dropped.
- **Heartbeat monitor**: emits a synthetic heartbeat if a provider goes silent.
- **Auto-reconnect**: exponential backoff with resubscribe.
- **Failover**: Bybit acts as the backup for Binance, with health-based switching.

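
The following is a minimal sketch of the drop thresholds. It assumes a single bounded `asyncio.Queue` and events tagged by a `kind` string. The real `EventBus` fans out to several consumers, and its class and field names are not shown here, so `Event`, `DropPolicyQueue` and `offer` are hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical minimal event; the service's real domain events carry more fields.
    kind: str  # "trade", "quote", "heartbeat", ...
    payload: object = None


class DropPolicyQueue:
    """Bounded queue applying 80% / 90% / 100% drop thresholds (a sketch)."""

    def __init__(self, maxsize: int) -> None:
        if maxsize <= 0:
            raise ValueError("maxsize must be positive (0 would mean unbounded)")
        self._q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped: dict[str, int] = {}

    def _fill(self) -> float:
        return self._q.qsize() / self._q.maxsize

    def _count_drop(self, kind: str) -> None:
        self.dropped[kind] = self.dropped.get(kind, 0) + 1

    def offer(self, event: Event) -> bool:
        """Enqueue without blocking; return False if `event` itself was dropped."""
        fill = self._fill()
        if event.kind == "heartbeat" and fill >= 0.8:
            self._count_drop("heartbeat")
            return False
        if event.kind == "quote" and fill >= 0.9:
            self._count_drop("quote")
            return False
        if self._q.full():
            # 100% full: evict the oldest queued event to make room.
            oldest = self._q.get_nowait()
            self._q.task_done()  # keep Queue.join() accounting correct for the evicted item
            self._count_drop(oldest.kind)
        self._q.put_nowait(event)
        return True

    async def get(self) -> Event:
        return await self._q.get()
```

Quotes and heartbeats are rejected before the queue is completely full. As a result, only trades (or other event kinds) ever trigger eviction of the oldest entry.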
## View Data

### SQLite

```bash
sqlite3 market_data.db "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT 5;"
```

### JSONL Event Log

```bash
tail -5 events.jsonl | python -m json.tool --json-lines
```
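
The same view is available from Python. This sketch assumes only that `events.jsonl` holds one JSON object per line. The event field names are not documented here, so it does not depend on them. `tail_events` is a hypothetical helper.

```python
import json
from collections import deque


def tail_events(path: str, n: int = 5) -> list[dict]:
    """Return the last `n` non-empty JSON lines of `path`, oldest first."""
    with open(path, encoding="utf-8") as f:
        # deque(maxlen=n) streams the file but keeps only the final n lines in memory.
        last_lines = deque((line for line in f if line.strip()), maxlen=n)
    return [json.loads(line) for line in last_lines]
```

Calling `for event in tail_events("events.jsonl"): print(event)` prints the five most recent events without loading the whole log into memory.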
### Prometheus Metrics

```bash
curl http://localhost:8891/metrics
```

Key metrics:

- `market_events_total`: events by provider, type, and symbol
- `market_exchange_latency_ms`: latency from the exchange timestamp to local receipt
- `market_events_per_second`: throughput gauge
- `market_gaps_total`: detected gaps per provider
## Architecture

```
Provider (Binance/Bybit/Alpaca)
    │ raw WebSocket messages
    ▼
Adapter (_parse → domain Event)
    │ TradeEvent / QuoteEvent / BookL2Event
    ▼
EventBus (asyncio.Queue fan-out + backpressure + heartbeat)
    ├─▶ StorageConsumer → SQLite + JSONL
    ├─▶ MetricsConsumer → Prometheus counters/histograms
    ├─▶ PrintConsumer   → structured log (sampled 1/N)
    └─▶ NatsConsumer    → NATS PubSub (for SenpAI)

FailoverManager
    monitors provider health → switches source on degradation
```
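
The health criteria used by the real `FailoverManager` are not documented here. The sketch below assumes "healthy" means the provider delivered an event within `stale_after` seconds. It also takes the clock as a parameter so the logic can be tested. `FailoverSketch` and its methods are hypothetical names.

```python
class FailoverSketch:
    """Health-based switching between a primary and a backup provider (a sketch)."""

    def __init__(self, primary: str, backup: str, stale_after: float = 10.0) -> None:
        self.primary = primary
        self.backup = backup
        self.stale_after = stale_after
        self.active = primary
        self._last_seen: dict[str, float] = {}

    def record_event(self, provider: str, now: float) -> None:
        """Call for every event received; `now` would be time.monotonic() in production."""
        self._last_seen[provider] = now

    def is_healthy(self, provider: str, now: float) -> bool:
        last = self._last_seen.get(provider)
        return last is not None and now - last <= self.stale_after

    def evaluate(self, now: float) -> str:
        """Return the provider that should be active at time `now`."""
        if self.is_healthy(self.primary, now):
            self.active = self.primary  # recovery: return to the primary once it is healthy
        elif self.is_healthy(self.backup, now):
            self.active = self.backup   # primary degraded, backup still delivering
        # If neither is healthy, keep the current source rather than flapping.
        return self.active
```

A production version would likely add hysteresis, such as requiring the primary to stay healthy for some time before switching back, so that a flaky primary does not cause rapid back-and-forth switching.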
## Adding a New Provider

1. Create `app/providers/your_provider.py`.
2. Subclass `MarketDataProvider`:

   ```python
   from collections.abc import AsyncIterator

   from app.domain.events import Event
   from app.providers import MarketDataProvider


   class YourProvider(MarketDataProvider):
       name = "your_provider"

       async def connect(self) -> None: ...

       async def subscribe(self, symbols: list[str]) -> None: ...

       async def stream(self) -> AsyncIterator[Event]:
           # _receive() and _parse() stand for your own transport read and
           # raw-message-to-Event conversion.
           while True:
               raw = await self._receive()
               yield self._parse(raw)

       async def close(self) -> None: ...
   ```

3. Register it in `app/providers/__init__.py`:

   ```python
   from app.providers.your_provider import YourProvider

   registry["your_provider"] = YourProvider
   ```

4. Add config to `app/config.py` if needed.
5. Run: `python -m app run --provider your_provider --symbols ...`
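
A replay provider like the one below can exercise a provider's connect → subscribe → stream → close lifecycle without network access. This is a hypothetical, standalone sketch. It does not subclass the real `MarketDataProvider`, so it runs without the `app` package. `FakeTrade` and the raw keys `s`/`p`/`q` are illustrative stand-ins for `TradeEvent` and a provider's wire format.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class FakeTrade:
    # Hypothetical stand-in for app.domain.events.TradeEvent; real fields may differ.
    symbol: str
    price: float
    size: float


class ReplayProvider:
    """Replays canned raw messages instead of reading a WebSocket."""

    name = "replay"

    def __init__(self, messages: list[dict]) -> None:
        self._messages = messages
        self._symbols: set[str] = set()

    async def connect(self) -> None:
        pass  # nothing to open for an in-memory replay

    async def subscribe(self, symbols: list[str]) -> None:
        self._symbols = set(symbols)

    def _parse(self, raw: dict) -> FakeTrade:
        return FakeTrade(raw["s"], float(raw["p"]), float(raw["q"]))

    async def stream(self) -> AsyncIterator[FakeTrade]:
        for raw in self._messages:
            await asyncio.sleep(0)  # yield control, as a real network read would
            if raw["s"] in self._symbols:
                yield self._parse(raw)

    async def close(self) -> None:
        pass


async def collect(provider: ReplayProvider, symbols: list[str]) -> list[FakeTrade]:
    """Drive the full lifecycle and gather every streamed event."""
    await provider.connect()
    await provider.subscribe(symbols)
    try:
        return [event async for event in provider.stream()]
    finally:
        await provider.close()
```

In a pytest test, `asyncio.run(collect(ReplayProvider(messages), ["BTCUSDT"]))` returns the parsed events, so the parsing logic can be asserted directly.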
## Tests

```bash
pytest tests/ -v
```

36 tests covering:

- Binance message parsing (7 tests)
- Alpaca message parsing (8 tests)
- Bybit message parsing (9 tests)
- Event bus: fan-out, backpressure, heartbeat (7 tests)
- Failover manager (5 tests)

## CI

Included in `.github/workflows/python-services-ci.yml`:

- `ruff check`: lint
- `pytest`: unit tests
- `compileall`: syntax check

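## Troubleshooting

### Port 8891 already in use

```bash
lsof -ti:8891 | xargs kill -9
```

`kill -9` force-kills every process that `lsof` reports for port 8891, including clients connected to it. Add `-sTCP:LISTEN` to target only the listener. If the port belongs to something other than a stale instance of this service, set `HTTP_PORT` to a free port instead.
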
### NATS connection refused

If `NATS_ENABLED=true` but NATS is not running, the service still starts normally: NATS output is skipped and a warning is logged. To run without NATS:

```env
NATS_ENABLED=false
```

### SQLite "database is locked"

This is normal under heavy load. SQLite allows only one writer at a time, and the service writes continuously through a single async writer. If an external tool such as the `sqlite3` CLI reports this error, wait for the service to stop or use the `/latest` HTTP endpoint instead.
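
For reading from a script while the service is running, a read-only connection with a busy timeout reduces these errors. A read-only connection never takes the write lock. The timeout makes SQLite retry for a while instead of failing immediately. This is a sketch: only the `trades` table and its `ts_recv` column come from this README, and `latest_trades` is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def latest_trades(db_path: str, limit: int = 5, timeout_s: float = 5.0) -> list[tuple]:
    """Fetch the newest rows from `trades` over a read-only connection."""
    # as_uri() needs an absolute path and percent-encodes special characters.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    # `timeout` is how long SQLite keeps retrying while another connection holds
    # a lock, instead of raising "database is locked" at once.
    conn = sqlite3.connect(uri, uri=True, timeout=timeout_s)
    try:
        cur = conn.execute(
            "SELECT * FROM trades ORDER BY ts_recv DESC LIMIT ?", (limit,)
        )
        return cur.fetchall()
    finally:
        conn.close()
```

Depending on the journal mode the service uses, readers can still be blocked briefly while a write commits. A longer `timeout_s` helps, but the `/latest` endpoint avoids the issue entirely.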

### Binance WebSocket disconnects

Auto-reconnect is built in, with exponential backoff (1s → 60s max). Check the logs for `binance.reconnecting`. If disconnects persist, verify DNS and firewall access to `stream.binance.com:9443`.
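
The 1s → 60s schedule can be sketched as a doubling delay with a cap. The README does not say whether the service adds jitter or how it classifies connection errors. This sketch uses no jitter and assumes that failures surface as `OSError` subclasses such as `ConnectionError`. Both function names are hypothetical. After a successful connect, the real provider also resubscribes, which is not shown here.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator
from typing import Optional


def backoff_delays(initial: float = 1.0, maximum: float = 60.0,
                   factor: float = 2.0) -> Iterator[float]:
    """With the defaults, yield 1, 2, 4, ..., 32, then 60 forever (no jitter)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


async def connect_with_backoff(
    connect: Callable[[], Awaitable[None]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_attempts: Optional[int] = None,
) -> int:
    """Retry `connect` until it succeeds; return the number of attempts used."""
    delays = backoff_delays()
    attempt = 0
    while True:
        attempt += 1
        try:
            await connect()
            return attempt
        except OSError:  # assumption: network failures are OSError subclasses
            if max_attempts is not None and attempt >= max_attempts:
                raise
            await sleep(next(delays))
```

Injecting `sleep` keeps tests fast. A fake that records delays can confirm that the schedule doubles and then saturates at 60 seconds.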

### Bybit "subscribe_failed"

Verify that symbol names match Bybit spot conventions (e.g. `BTCUSDT`, not `BTC-USDT`). Check the logs for `bybit.subscribe_failed`.
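
If symbols come from a source that writes pairs with separators, a small normalizer can turn them into the concatenated spot form before they reach `--symbols`. This is a hypothetical helper, not part of the service. It only strips common separators and upper-cases the result, and it does not check that the pair is actually listed on Bybit.

```python
import re


def to_bybit_spot_symbol(symbol: str) -> str:
    """Hypothetical helper: 'BTC-USDT', 'btc/usdt' or 'BTC_USDT' -> 'BTCUSDT'."""
    # Remove common base/quote separators and whitespace, then upper-case.
    normalized = re.sub(r"[-/_\s]", "", symbol).upper()
    if not normalized.isalnum():
        raise ValueError(f"unexpected characters in symbol: {symbol!r}")
    return normalized
```

For a whole list, `",".join(to_bybit_spot_symbol(s) for s in "BTC-USDT,eth/usdt".split(","))` yields `BTCUSDT,ETHUSDT`.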

### No data for Alpaca symbols

Without API keys, Alpaca runs in **dry-run mode** (heartbeats only). Set `ALPACA_KEY`, `ALPACA_SECRET`, and `ALPACA_DRY_RUN=false` in `.env`.

### JetStream not available

If `USE_JETSTREAM=true` but NATS was started without `--js`, you'll see a connection error. Start NATS with JetStream enabled:

```bash
docker run -d -p 4222:4222 nats:2.10-alpine --js
```

## TODO: Future Providers

- [ ] CoinAPI (REST + WebSocket, paid tier)
- [ ] IQFeed (US equities, DTN subscription)
- [ ] Polygon.io (real-time + historical)
- [ ] Interactive Brokers TWS API
- [ ] Coinbase WebSocket (backup crypto #2)