Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
# Secrets Rotation Runbook

**Last Updated:** 2026-01-19
**Owner:** Platform Team

---

## Overview

This runbook describes the procedure for rotating secrets in the DAARION platform without service downtime.

---

## Secrets Inventory

| Secret | Location | Service | Rotation Frequency |
|--------|----------|---------|--------------------|
| NATS Creds | `.env` files | All services | Quarterly |
| JWT Secrets | Gateway `.env` | Gateway | Quarterly |
| DeepSeek API Key | Router `.env` | Router | As needed |
| Mistral API Key | Router `.env` | Router | As needed |
| Grok API Key | Router `.env` | Router | As needed |
| Cohere API Key | Router `.env` | Router | As needed |
| Qdrant API Key | Memory `.env` | Memory Service | Quarterly |
| PostgreSQL Password | `.env` files | All services | Quarterly |
| Neo4j Password | `.env` files | Router, Memory | Quarterly |
| Telegram Bot Tokens | Gateway `.env` | Gateway | As needed |

---

## Rotation Procedure

### Phase 1: Preparation (5 min)

1. **Back up current secrets:**

   ```bash
   # Back up all .env files. The archive holds plaintext secrets, so create it
   # with owner-only permissions (the umask applies only inside the subshell).
   cd /opt/microdao-daarion
   (umask 077 && tar -czf secrets_backup_$(date +%Y%m%d).tar.gz \
     gateway-bot/.env \
     services/*/.env \
     docker-compose.node1.yml)
   ```

2. **Create new secrets:**
   - Generate new passwords/keys
   - Store them in a secure location (not in git)

3. **Verify services are healthy:**

   ```bash
   docker ps --format "{{.Names}}: {{.Status}}" | grep -E "(gateway|router|memory)"
   ```
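For step 2 of Phase 1, locally generated secrets such as database passwords or JWT secrets can come from Python's standard-library `secrets` module. This is a minimal sketch; the 32-byte default is an assumption, not a platform requirement. Provider API keys (DeepSeek, Mistral, Grok, Cohere) and Telegram bot tokens are issued by the provider or BotFather and cannot be generated this way. `token_urlsafe` output contains only letters, digits, `-` and `_`, so it can be pasted into `.env` files, connection URLs, or a `sed` replacement without escaping.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a random URL-safe secret built from `num_bytes` random bytes.

    The 32-byte default is an assumption; adjust it to your own policy.
    Output uses only A-Z, a-z, 0-9, '-' and '_'.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_secret())
```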
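
---

### Phase 2: Dual-Validity Period (10 min)

**For API Keys (DeepSeek, Mistral, etc.):**

1. **Add the new key alongside the old one:**

   ```bash
   # In Router .env: write the current key's literal value (not every loader expands $VAR)
   DEEPSEEK_API_KEY_OLD=<current_key>
   DEEPSEEK_API_KEY_NEW=<new_key>
   ```

2. **Update code to try the new key first and fall back to the old one:**

   ```python
   import os
   api_key = os.getenv("DEEPSEEK_API_KEY_NEW") or os.getenv("DEEPSEEK_API_KEY_OLD")  # NEW if set, else OLD
   ```

   This expression falls back only when `DEEPSEEK_API_KEY_NEW` is unset or empty, not when the provider rejects the new key.

3. **Restart Router:**

   ```bash
   docker restart dagi-router-node1
   ```

   `docker restart` keeps the environment the container was created with. If the Router receives its `.env` values through Compose (`env_file` or variable substitution), recreate the container instead, e.g. `docker-compose -f docker-compose.node1.yml up -d --no-deps --force-recreate <router-service>`, where `<router-service>` is the Router's service name in that file.

4. **Monitor for 5 minutes:**

   ```bash
   # Check logs for errors. docker logs sends the container's stderr to stderr,
   # so merge it before grep; timeout stops following after 300 s.
   timeout 300 docker logs -f --since 1m dagi-router-node1 2>&1 | grep -i error
   ```

5. **If stable, remove the old key:**

   ```bash
   # Remove DEEPSEEK_API_KEY_OLD from .env
   docker restart dagi-router-node1
   ```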
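The `or` expression in step 2 picks a key once, when the variables are read. To keep requests succeeding while the provider is still activating (or has just revoked) a key, the fallback has to happen per request. A minimal sketch, assuming a hypothetical `send` callable that performs the provider request with a given key and returns the HTTP status code, and assuming the provider signals a bad key with 401 or 403. Other failures such as 5xx are returned to the caller rather than retried with the old key, because they say nothing about whether the key is valid.

```python
import os
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: the provider rejects an invalid or revoked key with 401 or 403.
AUTH_FAILURES = {401, 403}


def candidate_keys(
    names: Sequence[str] = ("DEEPSEEK_API_KEY_NEW", "DEEPSEEK_API_KEY_OLD"),
) -> List[str]:
    """Return the configured keys in preference order, skipping unset/empty ones."""
    return [os.environ[name] for name in names if os.environ.get(name)]


def call_with_fallback(
    send: Callable[[str], int], keys: Sequence[str]
) -> Tuple[Optional[int], Optional[str]]:
    """Try each key in order, moving on only when the provider rejects it.

    `send` is a hypothetical wrapper around the provider request that returns
    the HTTP status code. Returns (last_status, key_used); key_used is None
    when every key was rejected or no key was configured.
    """
    status: Optional[int] = None
    for key in keys:
        status = send(key)
        if status not in AUTH_FAILURES:
            # Success, or an error unrelated to the key (e.g. 5xx): stop here.
            return status, key
    return status, None
```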

---

### Phase 3: Database Password Rotation (15 min)

**For PostgreSQL:**

1. **Update the password in Postgres:**

   ```bash
   docker exec dagi-postgres psql -U daarion -c \
     "ALTER USER daarion WITH PASSWORD '<new_password>';"
   ```

   Existing connections stay open, but any new connection made with the old password fails from this point until the service is restarted with the new one, so carry out steps 2 and 3 immediately.

2. **Update all .env files with the new password:**

   ```bash
   # Update in all services; ^ keeps keys that merely end in DB_PASSWORD intact.
   # <new_password> must not contain / & or \ (special in sed replacements).
   find /opt/microdao-daarion -name ".env" \
     -exec sed -i 's/^DB_PASSWORD=.*/DB_PASSWORD=<new_password>/' {} \;
   ```

3. **Rolling restart (one service at a time):**

   ```bash
   docker restart dagi-memory-service-node1
   sleep 5
   docker restart dagi-router-node1
   sleep 5
   # ... continue for all services
   ```

4. **Verify connectivity:**

   ```bash
   # Assumes psql exists in this image; add -h <postgres-host> so the new password is tested over TCP
   docker exec dagi-memory-service-node1 psql -U daarion -d daarion_main -c "SELECT 1;"
   ```
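Because `sed` treats `/`, `&` and `\` in the replacement text specially, the step 2 one-liner limits which characters the new password may contain. As an alternative, here is a minimal Python sketch that rewrites exact `KEY=value` lines and leaves the rest of each file untouched. It assumes plain `KEY=value` lines (no `export` prefix, no quoted values); `rotate_env.py` is a hypothetical script name, and it makes no backup of its own, so run it only after the Phase 1 archive exists.

```python
import sys
from pathlib import Path
from typing import List


def replace_env_value(text: str, key: str, new_value: str) -> str:
    """Set every `KEY=...` line in `text` to `KEY=new_value`.

    Matching is on the exact `KEY=` line prefix, so other keys that merely end
    in `key` are left alone, and `new_value` needs no escaping.
    """
    prefix = key + "="
    out = []
    for line in text.splitlines(keepends=True):
        if line.startswith(prefix):
            ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            out.append(prefix + new_value + ending)
        else:
            out.append(line)
    return "".join(out)


def rotate_in_tree(root: str, key: str, new_value: str) -> List[str]:
    """Rewrite `key` in every regular file named .env under `root`; return changed paths."""
    changed = []
    for path in Path(root).rglob(".env"):
        if not path.is_file():
            continue  # e.g. a virtualenv directory that happens to be named .env
        old = path.read_text()
        new = replace_env_value(old, key, new_value)
        if new != old:
            path.write_text(new)  # rewrites in place; the file keeps its permissions
            changed.append(str(path))
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rotate_env.py /opt/microdao-daarion DB_PASSWORD
    # The new value is read from stdin so it does not end up in shell history.
    value = sys.stdin.readline().rstrip("\r\n")
    for changed_path in rotate_in_tree(sys.argv[1], sys.argv[2], value):
        print("updated", changed_path)
```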

---

### Phase 4: NATS Credentials Rotation (10 min)

1. **Generate new NATS credentials:**

   ```bash
   # On NATS server
   # NOTE: confirm this subcommand exists in your nats CLI version and that the CLI
   # is installed in the container; .creds files are usually produced with nsc.
   docker exec dagi-nats-node1 nats server generate-credentials \
     --name worker \
     --output /data/worker.creds
   ```

2. **Update NATS config:**

   ```bash
   # Update docker-compose.node1.yml
   # Add new credentials path
   ```

3. **Rolling restart services:**

   ```bash
   # Restart workers first (they reconnect automatically)
   docker restart crewai-nats-worker
   docker restart parser-pipeline

   # Then restart Gateway/Router
   docker restart dagi-gateway-node1
   docker restart dagi-router-node1
   ```

4. **Verify NATS connectivity:**

   ```bash
   # Check JetStream streams
   curl http://localhost:8222/jsz
   ```
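Since `/jsz` returns JSON, the check in step 4 can be made comparative: save a snapshot before rotating (for example during Phase 1) and compare it with one taken after the restarts. This is a minimal sketch, assuming the payload exposes top-level `streams` and `consumers` counts and a `disabled` flag when JetStream is off; confirm those field names against your nats-server version. Ephemeral consumers come back only as clients reconnect, so a brief dip in `consumers` right after step 3 does not necessarily indicate a problem.

```python
import json
import urllib.request
from typing import Dict, List

# Assumed top-level counters in the /jsz JSON; verify against your nats-server version.
FIELDS = ("streams", "consumers")


def fetch_jsz(url: str = "http://localhost:8222/jsz") -> Dict:
    """Fetch the JetStream monitoring summary from the NATS monitoring port."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def compare_jsz(before: Dict, after: Dict) -> List[str]:
    """Describe any drop in stream/consumer counts between two /jsz snapshots."""
    problems = []
    if after.get("disabled"):
        problems.append("JetStream reports disabled")
    for field in FIELDS:
        old, new = before.get(field, 0), after.get(field, 0)
        if new < old:
            problems.append(f"{field}: {old} before, {new} after")
    return problems
```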

---

### Phase 5: Verification (5 min)

1. **Health checks:**

   ```bash
   # -f: exit non-zero on HTTP errors; -sS: hide the progress bar but keep error messages
   curl -fsS http://localhost:9300/health  # Gateway
   curl -fsS http://localhost:9102/health  # Router
   curl -fsS http://localhost:8000/health  # Memory
   ```

2. **Test critical flows:**
   - Send a test message to the Telegram bot
   - Verify the agent responds
   - Check memory storage

3. **Monitor logs:**

   ```bash
   # 2>&1: docker logs writes the container's stderr to stderr, which grep would otherwise miss
   docker logs --tail 100 dagi-gateway-node1 2>&1 | grep -i error
   docker logs --tail 100 dagi-router-node1 2>&1 | grep -i error
   ```
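For scripting, the three `curl` calls can be folded into a single pass/fail check. This is a minimal sketch that uses only the standard library. The ports come from the health-check step. Treating any 2xx status as healthy is an assumption about what the `/health` endpoints return. Passing a different `fetch` function lets you test the logic without the services running, and `failing_services(ENDPOINTS)` returns an empty list when every service answers.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Endpoints from the Phase 5 health checks.
ENDPOINTS = {
    "gateway": "http://localhost:9300/health",
    "router": "http://localhost:9102/health",
    "memory": "http://localhost:8000/health",
}


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses raise HTTPError
        return err.code
    except (urllib.error.URLError, OSError):  # refused connection, DNS, timeout
        return 0


def failing_services(
    endpoints: Dict[str, str], fetch: Callable[[str], int] = http_status
) -> List[str]:
    """Names of services whose health endpoint did not answer with a 2xx status."""
    return [name for name, url in endpoints.items() if not 200 <= fetch(url) < 300]
```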
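
---

## Rollback Procedure

If rotation fails:

1. **Restore old secrets:**

   ```bash
   cd /opt/microdao-daarion  # archive paths are relative to this directory
   tar -xzf secrets_backup_YYYYMMDD.tar.gz
   ```

   Restoring the files does not undo a Phase 3 password change inside PostgreSQL. If that phase already ran, set the old password again with the same `ALTER USER` command before restarting services.

2. **Restart services:**

   ```bash
   # Recreate (not just restart) containers so they pick up the restored .env values
   docker-compose -f docker-compose.node1.yml up -d --force-recreate
   ```

3. **Verify recovery:**

   ```bash
   # Run health checks
   # Test critical flows
   ```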
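Before step 1 overwrites the current files, you may want to confirm that the archive actually contains what Phase 1 put into it. This is a minimal sketch using Python's `tarfile` module. The required paths mirror the Phase 1 `tar` command. Because the set of `services/*/.env` files varies per host, the sketch only checks that at least one is present.

```python
import tarfile
from typing import Iterable, List

# Paths the Phase 1 tar command always adds; services/*/.env varies per host.
REQUIRED = ("gateway-bot/.env", "docker-compose.node1.yml")


def missing_members(names: Iterable[str], required: Iterable[str] = REQUIRED) -> List[str]:
    """Return required paths that are absent from an archive listing."""
    present = set(names)
    problems = [path for path in required if path not in present]
    if not any(n.startswith("services/") and n.endswith("/.env") for n in present):
        problems.append("services/*/.env")
    return problems


def check_backup(path: str) -> List[str]:
    """List the archive without extracting anything and report what is missing."""
    with tarfile.open(path, "r:gz") as tar:
        return missing_members(tar.getnames())
```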

---

## Emergency Contacts

- **Platform Lead:** @platform-lead
- **On-Call:** Check PagerDuty
- **Slack Channel:** #platform-ops

---

## Notes

- **Never commit secrets to git**
- **Use environment variables, not hardcoded values**
- **Test rotation in staging first**
- **Keep backup of old secrets for 30 days**
- **Document any custom rotation procedures**
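The 30-day retention rule can be applied mechanically to the archives created in Phase 1. Below is a minimal sketch that only selects the files due for deletion and deletes nothing itself. It assumes the backups keep the `secrets_backup_YYYYMMDD.tar.gz` naming, and any file with a different name is ignored. A backup exactly 30 days old is kept.

```python
import re
from datetime import date, datetime, timedelta
from typing import Iterable, List

# Archive names produced by the Phase 1 backup command.
BACKUP_RE = re.compile(r"^secrets_backup_(\d{8})\.tar\.gz$")


def expired_backups(names: Iterable[str], today: date, keep_days: int = 30) -> List[str]:
    """Return backup names dated more than `keep_days` days before `today`.

    Only selects files; deleting them is left to the operator.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = BACKUP_RE.match(name)
        if not match:
            continue  # not a rotation backup: never select it
        try:
            taken = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits but not a real calendar date
        if taken < cutoff:
            expired.append(name)
    return expired
```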