ROOT CAUSE: The monitor ran DROP DATABASE whenever NODE2 agents were missing,
but the backup it restored from did not contain NODE2 agents either, so every
restore re-triggered the check, causing an infinite loop.
FIX:
- FULL RECOVERY (DROP DATABASE) only when MicroDAOs < 5 (critical data loss)
- SOFT RECOVERY (just sync agents) when MicroDAOs exist but agents missing
- Prefer backup with NODE2 agents (full_backup_with_node2*.sql)
- Never DROP DATABASE if MicroDAOs exist
This prevents the daily data loss issue.
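The backup preference can be sketched in Python as follows. Only the `full_backup_with_node2*.sql` pattern comes from the change above. Picking the newest file by modification time and falling back to any other `.sql` dump are assumptions, and `choose_backup` / `choose_backup_in_dir` are hypothetical names.

```python
import fnmatch
from pathlib import Path
from typing import Dict, Optional

PREFERRED_PATTERN = "full_backup_with_node2*.sql"


def choose_backup(mtimes: Dict[str, float]) -> Optional[str]:
    """Pick the newest dump, preferring ones that include NODE2 agents.

    `mtimes` maps file name -> modification time (seconds since epoch).
    """
    # fnmatchcase keeps matching case-sensitive on every platform.
    preferred = [n for n in mtimes if fnmatch.fnmatchcase(n, PREFERRED_PATTERN)]
    # Fall back to any .sql dump only when no NODE2-aware backup exists;
    # such a dump may lack NODE2 agents, so agents must be re-synced afterwards.
    pool = preferred or [n for n in mtimes if n.endswith(".sql")]
    return max(pool, key=mtimes.__getitem__) if pool else None


def choose_backup_in_dir(backup_dir: str) -> Optional[Path]:
    """Apply choose_backup() to the *.sql files in a directory."""
    root = Path(backup_dir)
    mtimes = {p.name: p.stat().st_mtime for p in root.glob("*.sql")}
    name = choose_backup(mtimes)
    return root / name if name else None
```

Keeping the selection logic separate from the filesystem scan makes the preference rule easy to test without real dump files.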
- Check for at least 45 NODE2 agents (out of 50 expected)
- This prevents false positives when only core agents exist
- Better detection of actual data loss
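Combined with the FIX rules (full recovery only below 5 MicroDAOs), the 45-agent threshold amounts to a small decision function. This is a sketch that assumes the monitor has already counted MicroDAOs and NODE2 agents (how it queries them is not shown); `decide_recovery` and `Recovery` are hypothetical names.

```python
from enum import Enum

MIN_MICRODAOS = 5        # below this, treat as critical data loss
MIN_NODE2_AGENTS = 45    # out of 50 expected NODE2 agents


class Recovery(Enum):
    NONE = "none"
    SOFT = "soft"   # re-sync NODE2 agents only
    FULL = "full"   # DROP DATABASE + restore from backup


def decide_recovery(microdao_count: int, node2_agent_count: int) -> Recovery:
    """Map the monitor's two counts to a recovery action."""
    if microdao_count < MIN_MICRODAOS:
        # Critical data loss: the only case where a full restore is allowed.
        return Recovery.FULL
    if node2_agent_count < MIN_NODE2_AGENTS:
        # MicroDAOs are present, so never drop the database; just re-sync agents.
        return Recovery.SOFT
    return Recovery.NONE
```

Checking MicroDAOs first is what breaks the loop: missing agents alone can never escalate to a DROP DATABASE.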
- Add monitor-db-stability.sh for automatic recovery
- Improve PostgreSQL shutdown settings to prevent data loss
- Add checkpoint and WAL settings for better persistence
- This script was trying to assign test agents (ag_atlas, etc.) to NODE2
- Use sync-node2-dagi-agents.py instead for loading real agents
- Test agents are now automatically removed by health check
- Add apply-migrations.sh for automatic migration application
- Add ensure-db-persistence.sh for database integrity checks
- Add db-health-check.sh for periodic health monitoring
- Improve PostgreSQL configuration in docker-compose.db.yml
- Add proper shutdown settings to prevent data loss
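One way apply-migrations.sh could decide what to run is to order files by their numeric prefix and skip those already applied. The sketch below is in Python for illustration. It assumes applied file names are recorded somewhere (for example a bookkeeping table), which the change description does not specify; `pending_migrations` is a hypothetical name.

```python
import re
from typing import Iterable, List, Set

# Migration files look like NNN_description.sql (e.g. 041_node_local_endpoints.sql).
MIGRATION_RE = re.compile(r"^(\d+)_.+\.sql$")


def pending_migrations(files: Iterable[str], applied: Set[str]) -> List[str]:
    """Return migration files not yet applied, in numeric-prefix order."""
    numbered = []
    for name in files:
        m = MIGRATION_RE.match(name)
        if m and name not in applied:
            numbered.append((int(m.group(1)), name))
    # Sort on the integer prefix so 013 runs before 041, and 041 before 042.
    return [name for _, name in sorted(numbered)]
```

Sorting on the parsed integer rather than the raw string keeps the order correct even if prefix widths ever differ.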
- Fix get_dagi_router_agents to use router_healthy from node_cache first
- Fall back to a direct API call only if the cache is unavailable
- This fixes NODE2 agents showing as 'stale' when router is actually healthy
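The cache-first lookup can be sketched as below. Treating a `node_cache` row as a mapping and injecting the direct API call as a `probe_router` callable are assumptions made to keep the example self-contained. The real `get_dagi_router_agents` may read the cache differently, and `get_router_healthy` is a hypothetical name.

```python
from typing import Callable, Mapping, Optional


def get_router_healthy(
    node_cache: Optional[Mapping[str, object]],
    probe_router: Callable[[], bool],
) -> bool:
    """Prefer router_healthy from node_cache; probe the router only as a fallback."""
    if node_cache is not None:
        cached = node_cache.get("router_healthy")
        # A cached False is a real answer, so only None means "no cached value".
        if cached is not None:
            return bool(cached)
    # Cache missing or empty: fall back to a direct API call.
    try:
        return bool(probe_router())
    except Exception:
        # A failed probe is reported as unhealthy rather than raising.
        return False
```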
- Fix CITY_SERVICE_URL in scripts (remove /api/city, use /api)
- Add NGINX reverse proxy config for assets.daarion.space
- Add script to migrate assets from /static/uploads to MinIO
- Add script to update asset URLs in database after migration
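The URL update step boils down to rewriting a path prefix. In this sketch the mapping `/static/uploads/<key>` → `https://assets.daarion.space/<key>` is an assumption (the object keys or bucket prefix actually used in MinIO may differ), and `rewrite_asset_url` is a hypothetical helper that a migration script would apply to each stored URL.

```python
import re

# Assumed public base for migrated objects; the real bucket/path layout may differ.
ASSETS_BASE = "https://assets.daarion.space"
# Matches "/static/uploads/" at the start, optionally preceded by any scheme+host.
OLD_PREFIX = re.compile(r"^(?:https?://[^/]+)?/static/uploads/")


def rewrite_asset_url(url: str) -> str:
    """Point a legacy /static/uploads URL at the assets host; leave others unchanged."""
    return OLD_PREFIX.sub(ASSETS_BASE + "/", url, count=1)
```

URLs that already point at the assets host do not match the pattern, so re-running the update is harmless.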
- Created city-lobby room as main public chat with DAARWIZZ
- Fixed /api/city/rooms proxy to use correct backend path (/api/v1/city/rooms)
- Updated district rooms with zone keys (leadership, system, engineering, etc.)
- Set MicroDAO lobbies as primary rooms
- Created seed_city_rooms.py script
- Created TASK_PHASE_CITY_ROOMS_AND_PUBLIC_CHAT_v1.md
Total: 35 rooms, 31 public, 10 districts
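The proxy fix is a prefix rewrite from the frontend path to the backend path. Generalizing it from `/api/city/rooms` to every path under `/api/city` is an assumption, and the real proxy is probably not written in Python. The hypothetical `to_backend_path` below only illustrates the mapping.

```python
CITY_PREFIX = "/api/city"
BACKEND_PREFIX = "/api/v1/city"


def to_backend_path(frontend_path: str) -> str:
    """Map a frontend proxy path under /api/city to the backend's /api/v1/city path."""
    # Match the prefix exactly or as a full path segment, so "/api/cityscape" is untouched.
    if frontend_path == CITY_PREFIX or frontend_path.startswith(CITY_PREFIX + "/"):
        return BACKEND_PREFIX + frontend_path[len(CITY_PREFIX):]
    return frontend_path
```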
- Add migration 042_node_cache_router_metrics.sql
- Node guardian now collects router health and sends it in the heartbeat
- City-service uses cached router_healthy from node_cache
- This allows NODE2 router status to be displayed correctly
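On the guardian side, the idea is to probe the router and put the result into the heartbeat so that city-service can store it in `node_cache`. The payload fields other than `router_healthy`, and the health URL a caller would pass to `probe_http`, are assumptions. `probe_http` and `build_heartbeat` are hypothetical names, and the sketch uses only the standard library.

```python
import time
import urllib.request
from typing import Callable, Dict


def probe_http(url: str, timeout: float = 3.0) -> bool:
    """Return True if GET url answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def build_heartbeat(node_id: str, probe: Callable[[], bool]) -> Dict[str, object]:
    """Assemble a heartbeat payload carrying the router health check result."""
    return {
        "node_id": node_id,
        "router_healthy": probe(),
        "sent_at": time.time(),
    }
```

A caller might pass `lambda: probe_http(router_url + "/health")`, where the `/health` path is hypothetical. Injecting the probe keeps the payload builder testable without a running router.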
- Add migration 041_node_local_endpoints.sql
- Add get_node_endpoints() to repo_city.py
- Update routes_city.py to use DB endpoints instead of hardcoded URLs
- Update node-guardian-loop.py to use NODE_SWAPPER_URL/NODE_ROUTER_URL env vars
- Update launchd plist for NODE2 with router URL
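For node-guardian-loop.py, reading the endpoints from the environment could look like the sketch below. Only the two variable names come from the change. Failing fast when a variable is unset (rather than falling back to a default URL) is a design assumption, as are the `router`/`swapper` keys and the `load_node_endpoints` name.

```python
import os
from typing import Dict, List, Mapping

# Env vars named in the change; the "router"/"swapper" keys are illustrative.
ENDPOINT_VARS = {"router": "NODE_ROUTER_URL", "swapper": "NODE_SWAPPER_URL"}


def load_node_endpoints(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read node-local service URLs from the environment, failing fast if any is unset."""
    endpoints: Dict[str, str] = {}
    missing: List[str] = []
    for service, var in ENDPOINT_VARS.items():
        url = env.get(var, "").strip()
        if url:
            endpoints[service] = url.rstrip("/")
        else:
            missing.append(var)
    if missing:
        # No hardcoded fallback: a misconfigured node should fail loudly, not silently.
        raise RuntimeError("missing endpoint env vars: " + ", ".join(missing))
    return endpoints
```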
- Add migration 013_city_map_coordinates.sql with map coordinates, zones, and agents table
- Add /city/map API endpoint in city-service
- Add /city/agents and /city/agents/online endpoints
- Extend presence aggregator to include agents[] in snapshot
- Add AgentsSource for fetching agent data from DB
- Create CityMap component with interactive room tiles
- Add useCityMap hook for fetching map data
- Update useGlobalPresence to include agents
- Add map/list view toggle on /city page
- Add agent badges to room cards and map tiles
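The badge data is essentially online agents grouped by room. The sketch below is in Python for illustration only (the frontend hooks are not), and the `Agent` fields are assumptions about what the presence snapshot's `agents[]` carries. `badges_by_room` is a hypothetical name.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Agent:
    id: str
    name: str
    room_id: Optional[str]   # room the agent is currently in, if any
    online: bool


def badges_by_room(agents: Iterable[Agent]) -> Dict[str, List[str]]:
    """Group online agents by room so each room card or map tile can render badges."""
    rooms: Dict[str, List[str]] = defaultdict(list)
    for agent in agents:
        if agent.online and agent.room_id:
            rooms[agent.room_id].append(agent.name)
    # Sort names so badges render in a stable order.
    return {room: sorted(names) for room, names in rooms.items()}
```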
- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
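The client-side presence rules reduce to a small state function. This Python sketch is illustrative only (the actual hook runs in the browser). The 5-minute threshold and the offline-on-close/hidden behavior follow the bullets above, but treating exactly 5 minutes as already away is an assumption, and `presence_status` is a hypothetical name.

```python
AWAY_AFTER_SECONDS = 5 * 60  # auto "away" after 5 minutes without activity


def presence_status(now: float, last_activity: float,
                    page_open: bool, page_visible: bool) -> str:
    """Derive the presence value a heartbeat would report."""
    if not page_open or not page_visible:
        # Page closed or hidden: report offline.
        return "offline"
    if now - last_activity >= AWAY_AFTER_SECONDS:
        return "away"
    return "online"
```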