# Data Retention Policy

**Last Updated:** 2026-01-19

**Owner:** Platform Team

---
## Overview
This document defines data retention policies for all data stores in the DAARION platform to prevent unbounded storage growth.

---
## 1. NATS JetStream Streams
| Stream | Max Age | Max Size | Retention Policy | Action |
|--------|---------|----------|------------------|--------|
| MESSAGES | 7 days | 10GB | limits | Auto-purge after 7d |
| AGENT_RUNS | 7 days | 5GB | limits | Auto-purge after 7d |
| ATTACHMENTS | 30 days | 50GB | limits | Auto-purge after 30d |
| MEMORY | 30 days | 20GB | limits | Auto-purge after 30d |
| AUDIT | 90 days | 100GB | limits | Archive before the 90d auto-purge |
### Configuration
```yaml
# In NATS config
streams:
  MESSAGES:
    max_age: 7d
    max_bytes: 10GB
  AGENT_RUNS:
    max_age: 7d
    max_bytes: 5GB
  ATTACHMENTS:
    max_age: 30d
    max_bytes: 50GB
  MEMORY:
    max_age: 30d
    max_bytes: 20GB
  AUDIT:
    max_age: 90d
    max_bytes: 100GB
```
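
The JetStream stream configuration API takes `max_age` as a duration (nanoseconds on the wire) and `max_bytes` as a plain byte count, so the human-readable limits in the table need converting before they can be applied. The following is a minimal sketch of that conversion, not the platform's actual tooling. The helper names `parse_age`, `parse_size`, and `stream_limits` are hypothetical, and it assumes binary units (1GB = 1024³ bytes), which the policy does not specify.

```python
import re

# Retention limits copied from the stream table in this section.
STREAM_LIMITS = {
    "MESSAGES": ("7d", "10GB"),
    "AGENT_RUNS": ("7d", "5GB"),
    "ATTACHMENTS": ("30d", "50GB"),
    "MEMORY": ("30d", "20GB"),
    "AUDIT": ("90d", "100GB"),
}

_AGE_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
# Assumption: sizes are binary multiples (1GB = 1024**3 bytes).
_SIZE_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_age(text: str) -> int:
    """Convert a value such as '7d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised age: {text!r}")
    return int(match.group(1)) * _AGE_UNITS[match.group(2)]


def parse_size(text: str) -> int:
    """Convert a value such as '10GB' to bytes."""
    match = re.fullmatch(r"(\d+)(KB|MB|GB|TB)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _SIZE_UNITS[match.group(2)]


def stream_limits() -> dict:
    """Return per-stream limits as {'max_age_seconds': ..., 'max_bytes': ...}."""
    return {
        name: {"max_age_seconds": parse_age(age), "max_bytes": parse_size(size)}
        for name, (age, size) in STREAM_LIMITS.items()
    }


if __name__ == "__main__":
    for name, limits in stream_limits().items():
        print(name, limits)
```
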
---
## 2. Application Logs
| Service | Retention | Location | Action |
|---------|-----------|----------|--------|
| Gateway | 7 days | `/opt/microdao-daarion/logs/gateway/` | Rotate daily, delete after 7d |
| Router | 7 days | `/opt/microdao-daarion/logs/router/` | Rotate daily, delete after 7d |
| Memory Service | 7 days | `/opt/microdao-daarion/logs/memory/` | Rotate daily, delete after 7d |
| All Workers | 7 days | `/opt/microdao-daarion/logs/workers/` | Rotate daily, delete after 7d |
### Log Rotation Script
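```bash
#!/bin/bash
# /opt/microdao-daarion/scripts/rotate_logs.sh
set -euo pipefail

LOG_DIR="/opt/microdao-daarion/logs"
RETENTION_DAYS=7

# Delete rotated logs (app.log.1, app.log.gz, ...) older than the retention window
find "$LOG_DIR" -name "*.log.*" -type f -mtime +"$RETENTION_DAYS" -delete
# Compress plain logs that have not been written to within the retention window
find "$LOG_DIR" -name "*.log" -type f -mtime +"$RETENTION_DAYS" -exec gzip -f {} +
```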
**Cron:** `0 2 * * * /opt/microdao-daarion/scripts/rotate_logs.sh`
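
Before a destructive job like this runs under cron, it can help to list what it would remove. The sketch below is a dry-run helper under stated assumptions, not part of the deployed scripts: the function name `expired_logs` is hypothetical, and it uses a plain 7 × 24 h cutoff, so it may list slightly more files than `find -mtime +7`, which rounds file age down to whole days.

```python
import time
from pathlib import Path
from typing import List, Optional

LOG_DIR = Path("/opt/microdao-daarion/logs")
RETENTION_DAYS = 7


def expired_logs(log_dir: Path, retention_days: int,
                 now: Optional[float] = None) -> List[Path]:
    """Return rotated log files (*.log.*) last modified more than retention_days ago."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    return sorted(
        path
        for path in log_dir.rglob("*.log.*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Print candidates only; nothing is deleted here.
    if LOG_DIR.is_dir():
        for path in expired_logs(LOG_DIR, RETENTION_DAYS):
            print(path)
```
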

---
## 3. Attachments & Artifacts
| Type | Retention | Location | Action |
|------|-----------|----------|--------|
| Uploaded files | 30 days | `/data/uploads/` | Delete after 30d of inactivity |
| Processed artifacts | 90 days | `/data/artifacts/` | Archive to cold storage after 90d |
| Temporary files | 1 day | `/tmp/` | Auto-cleanup on service restart |
### Cleanup Script
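```bash
#!/bin/bash
# /opt/microdao-daarion/scripts/cleanup_attachments.sh
set -euo pipefail

UPLOAD_DIR="/data/uploads"
ARTIFACT_DIR="/data/artifacts"
ARCHIVE_DIR="/data/archive"
mkdir -p "$ARCHIVE_DIR"

# Delete uploads older than 30 days
find "$UPLOAD_DIR" -type f -mtime +30 -delete

# Archive artifacts older than 90 days. {} expands to the full path, so the
# archive is named after the basename; -delete runs only if tar succeeded.
find "$ARTIFACT_DIR" -type f -mtime +90 \
  -exec sh -c 'tar -czf "$1/$(basename "$2").tar.gz" -C "$(dirname "$2")" "$(basename "$2")"' sh "$ARCHIVE_DIR" {} \; \
  -delete
```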
**Cron:** `0 3 * * * /opt/microdao-daarion/scripts/cleanup_attachments.sh`

---
## 4. Redis Cache
| Key Pattern | TTL | Action |
|-------------|-----|--------|
| `idemp:*` | 24 hours | Auto-expire (handled by Redis) |
| `cache:*` | 1 hour | Auto-expire |
| `session:*` | 7 days | Auto-expire |
| `rate_limit:*` | 1 hour | Auto-expire |

**No manual cleanup needed:** Redis expires keys automatically, provided every key is written with a TTL.
### Monitoring
```bash
# Check Redis memory usage
redis-cli INFO memory
# Check key counts by pattern
redis-cli --scan --pattern "idemp:*" | wc -l
```
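
Running one `--scan` per pattern walks the keyspace once for each prefix. A single scan can be tallied for all prefixes at once, as in the sketch below. It is illustrative only: the function name `count_by_prefix` is hypothetical, and it works on any iterable of key names, for example the lines produced by `redis-cli --scan` read from standard input. Keys that match none of the documented prefixes are counted under `other`, which is a quick way to spot key families that may have no TTL policy.

```python
from collections import Counter
from typing import Iterable

# Key prefixes from the TTL table in this section.
PREFIXES = ("idemp:", "cache:", "session:", "rate_limit:")


def count_by_prefix(keys: Iterable[str]) -> Counter:
    """Count keys per known prefix; anything else is tallied under 'other'."""
    counts: Counter = Counter()
    for key in keys:
        key = key.strip()
        if not key:
            continue  # skip blank lines in piped input
        prefix = next((p for p in PREFIXES if key.startswith(p)), "other")
        counts[prefix] += 1
    return counts
```
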
---
## 5. PostgreSQL
| Table | Retention | Action |
|-------|-----------|--------|
| `sessions` | 30 days | `DELETE FROM sessions WHERE created_at < NOW() - INTERVAL '30 days'` |
| `audit_log` | 90 days | `DELETE FROM audit_log WHERE timestamp < NOW() - INTERVAL '90 days'` |
| `facts` | Keep all | No deletion (critical data) |
| `helion_mentors` | Keep all | No deletion (critical data) |
### Cleanup Script
```sql
-- /opt/microdao-daarion/scripts/cleanup_postgres.sql
BEGIN;
DELETE FROM sessions
WHERE created_at < NOW() - INTERVAL '30 days';
DELETE FROM audit_log
WHERE timestamp < NOW() - INTERVAL '90 days';
COMMIT;
-- VACUUM cannot run inside a transaction block, so it follows the COMMIT
VACUUM ANALYZE sessions;
VACUUM ANALYZE audit_log;
```
**Cron:** `0 4 * * 0 psql -U daarion -d daarion_main -f /opt/microdao-daarion/scripts/cleanup_postgres.sql`

---
## 6. Qdrant Collections
| Collection | Retention | Action |
|------------|-----------|--------|
| `*_messages` | 90 days | Delete points older than 90d |
| `*_docs` | 180 days | Delete points older than 180d |
| `*_kb` | Keep all | No deletion (knowledge base) |
### Cleanup Script
```python
# /opt/microdao-daarion/scripts/cleanup_qdrant.py
from datetime import datetime, timedelta, timezone

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://qdrant:6333")
collections = ["helion_messages", "nutra_messages"]
cutoff_date = datetime.now(timezone.utc) - timedelta(days=90)

for collection in collections:
    # Delete points whose "timestamp" payload is older than the cutoff.
    # DatetimeRange needs Qdrant and qdrant-client 1.8+ and RFC 3339 timestamps.
    client.delete(
        collection_name=collection,
        points_selector=models.FilterSelector(
            filter=models.Filter(
                must=[
                    models.FieldCondition(
                        key="timestamp",
                        range=models.DatetimeRange(lt=cutoff_date),
                    )
                ]
            )
        ),
    )
```
**Cron:** `0 5 * * 0 python3 /opt/microdao-daarion/scripts/cleanup_qdrant.py`
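
The cleanup script lists only two `*_messages` collections, while the table also sets a 180-day limit for `*_docs` and exempts `*_kb`. One way to keep the script aligned with the table is to derive the cutoff from the collection name. The sketch below shows that mapping on its own, without any Qdrant calls. The names `RETENTION_BY_SUFFIX` and `cutoff_for` are hypothetical, and leaving unknown collections untouched is an assumption about the safest default.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention in days by collection-name suffix, from the table in this section.
# None means "keep all".
RETENTION_BY_SUFFIX = {
    "_messages": 90,
    "_docs": 180,
    "_kb": None,
}


def cutoff_for(collection: str, now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the deletion cutoff for a collection, or None if nothing should be deleted."""
    now = now or datetime.now(timezone.utc)
    for suffix, days in RETENTION_BY_SUFFIX.items():
        if collection.endswith(suffix):
            return None if days is None else now - timedelta(days=days)
    # Assumption: collections that match no rule are left untouched.
    return None
```
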

---
## 7. Neo4j Graph Database
| Node Type | Retention | Action |
|-----------|-----------|--------|
| Events | 90 days | Delete nodes with `timestamp < NOW() - 90 days` |
| Relationships | Keep all | No deletion (structural data), except relationships attached to deleted Event nodes, which `DETACH DELETE` removes |
| User nodes | Keep all | No deletion (identity data) |
### Cleanup Script
```cypher
// /opt/microdao-daarion/scripts/cleanup_neo4j.cypher
MATCH (e:Event)
WHERE e.timestamp < datetime() - duration({days: 90})
DETACH DELETE e;
```
**Cron:** `0 6 * * 0 cypher-shell -u neo4j -f /opt/microdao-daarion/scripts/cleanup_neo4j.cypher` (supply the password through the `NEO4J_PASSWORD` environment variable rather than inline in the crontab)

---
## 8. Monitoring & Alerts
### Storage Usage Alerts
- **Disk usage > 80%**: Alert to ops channel
- **Stream size > 80% of max**: Alert to ops channel
- **Log directory > 10GB**: Alert to ops channel
### Metrics to Track
- Total storage used per service
- Retention policy compliance (age of oldest data)
- Cleanup job success/failure rates
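
The disk and log-directory thresholds above can be checked with the standard library alone. The following is a rough sketch rather than the platform's monitoring code: the function names are hypothetical, 10GB is assumed to mean 10 × 1024³ bytes, and both the stream-size check and delivery to the ops channel are left out.

```python
import os
import shutil
from typing import List

DISK_ALERT_RATIO = 0.80             # disk usage > 80%
LOG_DIR_ALERT_BYTES = 10 * 1024**3  # log directory > 10GB (binary GB assumed)


def disk_usage_ratio(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def directory_size(path: str) -> int:
    """Total size in bytes of regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def storage_alerts(disk_ratio: float, log_bytes: int) -> List[str]:
    """Return one message per threshold that is exceeded."""
    alerts = []
    if disk_ratio > DISK_ALERT_RATIO:
        alerts.append(f"Disk usage at {disk_ratio:.0%} (threshold 80%)")
    if log_bytes > LOG_DIR_ALERT_BYTES:
        alerts.append(f"Log directory at {log_bytes / 1024**3:.1f}GB (threshold 10GB)")
    return alerts
```
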
---
## 9. Emergency Cleanup
If storage is critically low:
1. **Immediate actions:**
   ```bash
   # Delete rotated logs older than 3 days
   find /opt/microdao-daarion/logs -name "*.log.*" -type f -mtime +3 -delete
   # Delete uploads older than 7 days
   find /data/uploads -type f -mtime +7 -delete
   # Reclaim space in PostgreSQL. VACUUM FULL takes an exclusive lock on each table
   # and needs free disk space for the rewritten copy, so run it last.
   psql -U daarion -d daarion_main -c "VACUUM FULL;"
   ```
2. **Reduce retention temporarily:**
   - Set NATS stream max_age to 3 days
   - Reduce log retention to 3 days
   - Archive old attachments immediately
---
## 10. Compliance Notes
- **Audit logs**: Keep for 90 days minimum (compliance requirement)
- **User data**: Follow GDPR - delete on request
- **Backups**: Keep for 30 days, then archive to cold storage
---
## Review Schedule
- **Weekly**: Check storage usage metrics
- **Monthly**: Review retention policy effectiveness
- **Quarterly**: Adjust retention periods based on usage patterns