# Secrets Rotation Runbook

**Last Updated:** 2026-01-19

**Owner:** Platform Team

---

## Overview

This runbook describes how to rotate secrets on the DAARION platform without service downtime.

---

## Secrets Inventory

| Secret | Location | Service | Rotation Frequency |
|--------|----------|---------|--------------------|
| NATS Creds | `.env` files | All services | Quarterly |
| JWT Secrets | Gateway `.env` | Gateway | Quarterly |
| DeepSeek API Key | Router `.env` | Router | As needed |
| Mistral API Key | Router `.env` | Router | As needed |
| Grok API Key | Router `.env` | Router | As needed |
| Cohere API Key | Router `.env` | Router | As needed |
| Qdrant API Key | Memory `.env` | Memory Service | Quarterly |
| PostgreSQL Password | `.env` files | All services | Quarterly |
| Neo4j Password | `.env` files | Router, Memory | Quarterly |
| Telegram Bot Tokens | Gateway `.env` | Gateway | As needed |

---

## Rotation Procedure

### Phase 1: Preparation (5 min)

1. **Back up current secrets:**
   ```bash
   # Back up all .env files (archive paths are relative to the deployment root)
   cd /opt/microdao-daarion
   BACKUP=secrets_backup_$(date +%Y%m%d).tar.gz
   tar -czf "$BACKUP" \
     gateway-bot/.env \
     services/*/.env \
     docker-compose.node1.yml
   # The archive holds live credentials: make it readable by the owner only
   chmod 600 "$BACKUP"
   ```

2. **Create new secrets:**
   - Generate new passwords and keys
   - Store them in a secure location (never in git)

3. **Verify that services are healthy:**
   ```bash
   docker ps --format "{{.Names}}: {{.Status}}" | grep -E "(gateway|router|memory)"
   ```

---

### Phase 2: Dual-Validity Period (10 min)

**For API keys (DeepSeek, Mistral, etc.):**

1. **Add the new key alongside the old one:**
   ```bash
   # In Router .env: keep the current key as _OLD, add the new key as _NEW
   DEEPSEEK_API_KEY_OLD=<current-key>
   DEEPSEEK_API_KEY_NEW=<new-key>
   ```

2. **Update the code to prefer the new key and fall back to the old one:**
   ```python
   import os

   # Use the new key when it is set and non-empty; otherwise use the old key
   api_key = os.getenv("DEEPSEEK_API_KEY_NEW") or os.getenv("DEEPSEEK_API_KEY_OLD")
   ```

3. **Restart Router:**
   ```bash
   docker restart dagi-router-node1
   ```
   `docker restart` keeps the environment the container was created with. If a service receives its `.env` through the Compose `env_file` option, recreate it instead so the new values are loaded: `docker-compose -f docker-compose.node1.yml up -d <service>`. This applies to every restart in this runbook.

4. **Monitor for 5 minutes:**
   ```bash
   # Check logs for errors (docker logs sends the container's stderr to stderr)
   docker logs -f dagi-router-node1 2>&1 | grep -i error
   ```

5. **If stable, remove the old key:**
   ```bash
   # Remove DEEPSEEK_API_KEY_OLD from .env, then restart
   docker restart dagi-router-node1
   ```

---

### Phase 3: Database Password Rotation (15 min)

**For PostgreSQL:**

1. **Change the password in Postgres:**
   ```bash
   docker exec dagi-postgres psql -U daarion -c \
     "ALTER USER daarion WITH PASSWORD '<new-password>';"
   ```
   Connections that are already open stay valid. Only new connections need the new password.

2. **Update all .env files with the new password:**
   ```bash
   # Rewrite the DB_PASSWORD line in every .env file
   # (escape any /, & or \ in the password first, or sed will mangle it)
   find /opt/microdao-daarion -name ".env" -exec sed -i 's/^DB_PASSWORD=.*/DB_PASSWORD=<new-password>/' {} \;
   ```

3. **Rolling restart (one service at a time):**
   ```bash
   docker restart dagi-memory-service-node1
   sleep 5
   docker restart dagi-router-node1
   sleep 5
   # ... continue for all services
   ```

4. **Verify connectivity:**
   ```bash
   docker exec dagi-memory-service-node1 psql -U daarion -d daarion_main -c "SELECT 1;"
   ```

---

### Phase 4: NATS Credentials Rotation (10 min)

1. **Generate new NATS credentials:**
   ```bash
   # On NATS server
   docker exec dagi-nats-node1 nats server generate-credentials \
     --name worker \
     --output /data/worker.creds
   ```

2. **Update the NATS config:**
   ```bash
   # Update docker-compose.node1.yml
   # Add the new credentials path
   ```

3. **Rolling restart of services:**
   ```bash
   # Restart workers first (they reconnect automatically)
   docker restart crewai-nats-worker
   docker restart parser-pipeline

   # Then restart Gateway/Router
   docker restart dagi-gateway-node1
   docker restart dagi-router-node1
   ```

4. **Verify NATS connectivity:**
   ```bash
   # Check JetStream streams
   curl http://localhost:8222/jsz
   ```

---

### Phase 5: Verification (5 min)

1. **Health checks:**
   ```bash
   curl http://localhost:9300/health  # Gateway
   curl http://localhost:9102/health  # Router
   curl http://localhost:8000/health  # Memory
   ```

2. **Test critical flows:**
   - Send a test message to the Telegram bot
   - Verify that the agent responds
   - Check that memory is stored

3. **Monitor logs:**
   ```bash
   docker logs --tail 100 dagi-gateway-node1 2>&1 | grep -i error
   docker logs --tail 100 dagi-router-node1 2>&1 | grep -i error
   ```

---

## Rollback Procedure

If rotation fails:

1. **Restore old secrets:**
   ```bash
   cd /opt/microdao-daarion
   tar -xzf secrets_backup_YYYYMMDD.tar.gz
   ```
   If Phase 3 already changed the PostgreSQL password, also set it back to the old value with the same `ALTER USER` command. Restoring the `.env` files alone does not change the password stored in Postgres.

2. **Restart services:**
   ```bash
   docker-compose -f docker-compose.node1.yml restart
   ```
   If services receive their `.env` through `env_file`, run `docker-compose -f docker-compose.node1.yml up -d` instead, as noted in Phase 2.

3. **Verify recovery:**
   ```bash
   # Run health checks
   # Test critical flows
   ```

---

## Emergency Contacts

- **Platform Lead:** @platform-lead
- **On-Call:** Check PagerDuty
- **Slack Channel:** #platform-ops

---

## Notes

- **Never commit secrets to git**
- **Use environment variables, not hardcoded values**
- **Test rotation in staging first**
- **Keep backups of old secrets for 30 days**
- **Document any custom rotation procedures**
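The Phase 1 backup can also be scripted. The sketch below mirrors the `tar` command: same file list, same `secrets_backup_YYYYMMDD.tar.gz` name, and member paths relative to the deployment root. It also creates the archive with mode `0600` from the start. `backup_secrets` and its parameters are illustrative names, not an existing platform tool.

```python
import glob
import os
import tarfile
from datetime import date
from typing import Optional

# Secret-bearing files from Phase 1, relative to the deployment root.
BACKUP_PATTERNS = ["gateway-bot/.env", "services/*/.env", "docker-compose.node1.yml"]


def backup_secrets(root: str, dest_dir: str, today: Optional[date] = None) -> str:
    """Archive the files matched by BACKUP_PATTERNS into dest_dir and return its path.

    Member names stay relative to `root`, so `tar -xzf` run from `root`
    restores the files in place. The archive gets mode 0600 because it
    contains live credentials.
    """
    today = today or date.today()
    archive = os.path.join(dest_dir, f"secrets_backup_{today:%Y%m%d}.tar.gz")
    members = []
    for pattern in BACKUP_PATTERNS:
        matches = glob.glob(os.path.join(root, pattern))
        members.extend(sorted(os.path.relpath(m, root) for m in matches))
    if not members:
        raise FileNotFoundError(f"no secret files found under {root}")
    # Open with owner-only permissions before any secret bytes are written.
    fd = os.open(archive, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)  # also tightens an archive that already existed
    with os.fdopen(fd, "wb") as fh, tarfile.open(fileobj=fh, mode="w:gz") as tar:
        for rel in members:
            tar.add(os.path.join(root, rel), arcname=rel)
    return archive
```

Run it with `root="/opt/microdao-daarion"` to get the same archive as the shell command. The `tar` rollback step works unchanged on the result.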
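Phase 1 says only "Generate new passwords/keys". Some secrets the platform creates itself, such as database passwords and the Gateway JWT secret. For those, a minimal sketch using Python's standard `secrets` module might look like this. Several choices are assumptions rather than platform policy:

- the 32-character default length and the 16-character minimum;
- the letters-and-digits alphabet, chosen so values can go into `.env` files, `sed` expressions, and SQL literals without escaping.

Provider API keys (DeepSeek, Mistral, Grok, Cohere) are issued by the providers and are not generated locally.

```python
import secrets
import string

# Letters and digits only (an assumption): such values need no escaping in
# .env files, sed replacement strings, or single-quoted SQL literals.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 32) -> str:
    """Return a random password drawn from ALPHABET with a CSPRNG."""
    if length < 16:
        raise ValueError("use at least 16 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def generate_token(num_bytes: int = 48) -> str:
    """Return a URL-safe random string, e.g. for a JWT signing secret."""
    return secrets.token_urlsafe(num_bytes)
```

`secrets` draws from the operating system's CSPRNG. Unlike `random`, it is intended for credentials.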
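The Phase 2 snippet chooses a key once, at startup. "Fallback" might also need to cover a provider that rejects the new key. In that case each request has to retry with the old key. The sketch below relies on two assumptions:

- a hypothetical `AuthError` raised by the provider client on an authentication failure (for example HTTP 401);
- a caller-supplied `send(api_key)` function.

Real SDKs raise their own exception types, which would replace `AuthError` here.

```python
import os
from typing import Callable, List, Mapping, Optional, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical: raised by the provider client when a key is rejected (e.g. HTTP 401)."""


def candidate_keys(prefix: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the configured keys in priority order: <prefix>_NEW, then <prefix>_OLD."""
    env = os.environ if env is None else env
    keys: List[str] = []
    for name in (f"{prefix}_NEW", f"{prefix}_OLD"):
        value = env.get(name)
        # Skip unset or empty values and never try the same key twice.
        if value and value not in keys:
            keys.append(value)
    return keys


def call_with_fallback(prefix: str, send: Callable[[str], T],
                       env: Optional[Mapping[str, str]] = None) -> T:
    """Call send(api_key) with the new key; retry with the old key on AuthError."""
    keys = candidate_keys(prefix, env)
    if not keys:
        raise RuntimeError(f"neither {prefix}_NEW nor {prefix}_OLD is set")
    last_error: Optional[AuthError] = None
    for key in keys:
        try:
            return send(key)
        except AuthError as exc:
            last_error = exc  # try the next key; other exceptions propagate
    raise last_error
```

Only authentication failures trigger the retry. A timeout or a 5xx response from the provider is not a key problem and is passed through unchanged.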
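The `sed` command in Phase 3 places the password inside a substitution expression, so `/`, `&` or `\` in the password break it. The sketch below sets a variable in every `.env` file under the deployment root without any escaping. It writes each file atomically and keeps the file's permissions. `rotate_env_value` is a hypothetical helper; `DB_PASSWORD` is the variable name used in Phase 3.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def rotate_env_value(root: str, key: str, new_value: str) -> List[Path]:
    """Set `key=new_value` in every .env file under root and return the changed files.

    Only lines that start with `key=` are replaced, and the value is written
    literally, so special characters need no escaping (unlike sed).
    """
    prefix = f"{key}="
    changed: List[Path] = []
    for env_file in sorted(Path(root).rglob(".env")):
        if not env_file.is_file():
            continue
        lines = env_file.read_text().splitlines(keepends=True)
        updated = [f"{prefix}{new_value}\n" if line.startswith(prefix) else line
                   for line in lines]
        if updated == lines:
            continue  # no matching line, or the value is already current
        # Write a temp file next to the original, then atomically swap it in,
        # keeping the original file's permission bits.
        mode = env_file.stat().st_mode & 0o777
        fd, tmp_path = tempfile.mkstemp(dir=env_file.parent, prefix=".env.")
        with os.fdopen(fd, "w") as fh:
            fh.writelines(updated)
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, env_file)
        changed.append(env_file)
    return changed
```

For example, `rotate_env_value("/opt/microdao-daarion", "DB_PASSWORD", new_password)` returns the list of files it changed. That list is a convenient checklist of which services still need a restart.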
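The Phase 5 health checks can run as one script that reports every failing service rather than stopping at the first failure. This sketch uses only the standard library's `urllib`. It treats anything other than HTTP 200 from `/health` as a failure, which is an assumption about what the services return. The URLs are the ones listed in Phase 5.

```python
import urllib.error
import urllib.request
from typing import Dict

# Health endpoints from Phase 5.
HEALTH_ENDPOINTS = {
    "gateway": "http://localhost:9300/health",
    "router": "http://localhost:9102/health",
    "memory": "http://localhost:8000/health",
}


def check_health(endpoints: Dict[str, str], timeout: float = 5.0) -> Dict[str, str]:
    """Return {service: problem} for every endpoint that does not answer HTTP 200."""
    failures: Dict[str, str] = {}
    for name, url in endpoints.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status != 200:
                    failures[name] = f"HTTP {resp.status}"
        except urllib.error.HTTPError as exc:  # 4xx/5xx responses
            failures[name] = f"HTTP {exc.code}"
        except (urllib.error.URLError, OSError) as exc:  # refused, DNS, timeout
            failures[name] = str(exc)
    return failures
```

`check_health(HEALTH_ENDPOINTS)` returns an empty dict when all three services answer 200. Any entries it returns name the services to investigate before moving on to the critical-flow tests.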
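The Notes ask for backups of old secrets to be kept for 30 days. The sketch below deletes the expired ones, assuming the backups sit in one directory and use the Phase 1 naming scheme (`secrets_backup_YYYYMMDD.tar.gz`). It reads the backup date from the file name rather than the modification time, and it ignores files that do not match the scheme. `prune_backups` is a hypothetical helper name.

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List, Optional

# Matches the archive names produced by the Phase 1 backup command.
BACKUP_NAME = re.compile(r"^secrets_backup_(\d{8})\.tar\.gz$")


def prune_backups(backup_dir: str, keep_days: int = 30,
                  today: Optional[date] = None) -> List[Path]:
    """Delete secret backups older than keep_days and return the deleted paths."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    deleted: List[Path] = []
    for path in sorted(Path(backup_dir).iterdir()):
        match = BACKUP_NAME.match(path.name)
        if not match:
            continue  # never touch files outside the backup naming scheme
        taken = datetime.strptime(match.group(1), "%Y%m%d").date()
        if taken < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

A backup taken exactly `keep_days` ago is kept. Only strictly older archives are removed.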