feat: Add Alateya, Clan, Eonarch agents + fix gateway-router connection

## Agents Added
- Alateya: R&D, biotech, innovations
- Clan (Spirit): Community spirit agent
- Eonarch: Consciousness evolution agent

## Changes
- docker-compose.node1.yml: Added tokens for all 3 new agents
- gateway-bot/http_api.py: Added configs and webhook endpoints
- gateway-bot/clan_prompt.txt: New prompt file
- gateway-bot/eonarch_prompt.txt: New prompt file

## Fixes
- Fixed ROUTER_URL from :9102 to :8000 (internal container port)
- All 9 Telegram agents now working

## Documentation
- Created PROJECT-MASTER-INDEX.md - single entry point
- Added various status documents and scripts

Tokens configured:
- Helion, NUTRA, Agromatrix (existing)
- Alateya, Clan, Eonarch (new)
- Druid, GreenFood, DAARWIZZ (configured)
Apple
2026-01-28 06:40:34 -08:00
parent 4aeb69e7ae
commit 0c8bef82f4
120 changed files with 21905 additions and 425 deletions

@@ -0,0 +1,183 @@
# Qdrant Canonical Migration - Cutover Checklist
**Status:** GO
**Date:** 2026-01-26
**Risk Level:** Operational (security invariants verified)
---
## Pre-Cutover Verification
### Security Invariants (VERIFIED ✅)
| Invariant | Status |
|-----------|--------|
| `tenant_id` always required | ✅ |
| `agent_ids ⊆ allowed_agent_ids` (or admin) | ✅ |
| Admin default: no private | ✅ |
| Empty `should` → error | ✅ |
| Private only via owner | ✅ |
| Qdrant `match.any` format | ✅ |
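
A minimal sketch of how a query filter could enforce several of these invariants before a request reaches Qdrant. The function name `build_agent_filter` and its parameters are hypothetical and are not taken from `services/memory/qdrant/`; only the filter layout (`must`, `must_not`, `match.value`, `match.any`) follows Qdrant's filter format. The empty-list check stands in for the "empty `should`" invariant, and private points are simply excluded rather than served to their owner.

```python
from typing import Iterable


def build_agent_filter(
    tenant_id: str,
    agent_ids: Iterable[str],
    allowed_agent_ids: Iterable[str],
    is_admin: bool = False,
) -> dict:
    """Build a Qdrant filter dict (hypothetical helper, for illustration)."""
    if not tenant_id:
        raise ValueError("tenant_id is required")

    requested = list(dict.fromkeys(agent_ids))  # de-duplicate, keep order
    if not requested:
        # An empty match list must never degrade into "match everything".
        raise ValueError("agent_ids must not be empty")

    if not is_admin:
        extra = set(requested) - set(allowed_agent_ids)
        if extra:
            raise PermissionError(f"agents not allowed: {sorted(extra)}")

    return {
        "must": [
            {"key": "tenant_id", "match": {"value": tenant_id}},
            {"key": "agent_id", "match": {"any": requested}},
        ],
        # Assumption: private points need a separate owner-scoped query.
        "must_not": [{"key": "visibility", "match": {"value": "private"}}],
    }
```

Raising on an empty agent list, instead of returning a filter without the `agent_id` clause, keeps a caller bug from silently widening the query to the whole tenant.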
---
## Cutover Steps
### 1. Deploy
```bash
# Copy code to NODE1
scp -r services/memory/qdrant/ root@NODE1:/opt/microdao-daarion/services/memory/
scp docs/memory/canonical_collections.yaml root@NODE1:/opt/microdao-daarion/docs/memory/
scp scripts/qdrant_*.py root@NODE1:/opt/microdao-daarion/scripts/
# IMPORTANT: Verify that dim and metric in canonical_collections.yaml match the live embedding model
# Current: dim=1024, metric=cosine
# Restart service that owns Qdrant reads/writes
docker compose restart memory-service
# OR
systemctl restart memory-service
```
### 2. Migration
```bash
# Dry run - MUST pass before real migration
python3 scripts/qdrant_migrate_to_canonical.py --all --dry-run 2>&1 | tee migration_dry_run.log
# Verify dry run output:
# - Target collection name(s) shown
# - Per-collection counts listed
# - Zero dim/metric mismatches (unless --skip-dim-check used)
# Real migration
python3 scripts/qdrant_migrate_to_canonical.py --all --continue-on-error 2>&1 | tee migration_$(date +%Y%m%d_%H%M%S).log
# Review summary:
# - Collections processed: X/Y
# - Points migrated: N
# - Errors: should be 0 or minimal
```
### 3. Parity Check
```bash
python3 scripts/qdrant_parity_check.py --agents helion,nutra,druid 2>&1 | tee parity_check.log
# Requirements:
# - Count parity within tolerance
# - topK overlap threshold passes
# - Schema validation passes
```
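The two numeric requirements above can be sketched as follows. This is an illustration only: the function names, the default `k`, and the 1% tolerance are assumptions, and `scripts/qdrant_parity_check.py` may define overlap and tolerance differently.

```python
from typing import Sequence


def topk_overlap(legacy_ids: Sequence[str], canonical_ids: Sequence[str], k: int = 10) -> float:
    """Fraction of the legacy top-k IDs that also appear in the canonical top-k."""
    legacy_top = list(legacy_ids)[:k]
    canonical_top = set(list(canonical_ids)[:k])
    if not legacy_top:
        return 1.0  # nothing to compare, treat as full agreement
    hits = sum(1 for point_id in legacy_top if point_id in canonical_top)
    return hits / len(legacy_top)


def counts_within_tolerance(legacy_count: int, canonical_count: int, tolerance: float = 0.01) -> bool:
    """True if the relative difference in point counts is at most `tolerance`."""
    if legacy_count == 0:
        return canonical_count == 0
    return abs(legacy_count - canonical_count) / legacy_count <= tolerance
```

Overlap is measured on IDs rather than on rank order, so two result lists that contain the same points in a different order still score 1.0.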
### 4. Dual-Read Window
```bash
# Enable dual-read for validation
export DUAL_READ_OLD=true
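# Recreate the container so it picks up the env change
# (`docker compose restart` reuses the existing container environment;
# this assumes the compose file passes DUAL_READ_OLD through to the service)
docker compose up -d --force-recreate memory-service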
```
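As a rough illustration of what a dual-read window does, the sketch below reads from the canonical collection, and when `DUAL_READ_OLD` is set also reads from legacy and reports which legacy results are missing. The helper names and the callable-based interface are hypothetical; the real memory service may wire this differently.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple


def env_flag(name: str, default: bool = False, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a boolean environment variable such as DUAL_READ_OLD=true."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def dual_read(
    query: str,
    read_canonical: Callable[[str], List[str]],
    read_legacy: Callable[[str], List[str]],
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Optional[List[str]]]:
    """Return canonical results plus the legacy IDs they are missing (or None)."""
    canonical = read_canonical(query)
    if not env_flag("DUAL_READ_OLD", environ=environ):
        return canonical, None  # dual-read disabled: legacy is not queried
    canonical_set = set(canonical)
    missing = [pid for pid in read_legacy(query) if pid not in canonical_set]
    return canonical, missing
```

Callers always receive the canonical results; the legacy read only produces a diff for logging, so disabling the flag at cutover does not change what users see.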
**Validation queries (must pass):**
| Query Type | Expected Result |
|------------|-----------------|
| Agent-only (Helion) | Returns own docs, no other agents |
| Multi-agent (DAARWIZZ) | Returns from allowed agents only |
| Private visibility | Only owner sees private |
```bash
# Run smoke test
python3 scripts/qdrant_smoke_test.py --host localhost
```
### 5. Cutover
```bash
# Disable dual-read
export DUAL_READ_OLD=false
# Ensure no legacy writes
export DUAL_WRITE_OLD=false
# OR remove these env vars entirely
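# Recreate the container so the new env values take effect
# (`docker compose restart` reuses the existing container environment)
docker compose up -d --force-recreate memory-service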
# Verify service is healthy
curl -fsS http://localhost:8000/health  # -f: non-zero exit on HTTP errors
```
### 6. Post-Cutover Guard
```bash
# Keep legacy collections for rollback window (recommended: 7 days)
# DO NOT delete legacy collections yet
# After rollback window (7 days):
# 1. Run one more parity check
python3 scripts/qdrant_parity_check.py --all
# 2. If parity passes, delete legacy collections
# WARNING: This is irreversible
# python3 -c "
# from qdrant_client import QdrantClient
# client = QdrantClient(host='localhost', port=6333)
# legacy = ['helion_docs', 'nutra_messages', ...] # list all legacy
# for col in legacy:
#     client.delete_collection(col)
#     print(f'Deleted: {col}')
# "
```
---
## Rollback Procedure
If issues arise after cutover:
```bash
# 1. Re-enable dual-read from legacy
export DUAL_READ_OLD=true
export DUAL_WRITE_OLD=true # if needed
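# 2. Recreate the container so the env change takes effect
#    (`docker compose restart` reuses the existing container environment)
docker compose up -d --force-recreate memory-service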
# 3. Investigate issues
# - Check migration logs
# - Check parity results
# - Review error messages
# 4. If canonical data is corrupted, switch to legacy-only mode:
# (requires code change to bypass canonical reads)
```
---
## Files Reference
| File | Purpose |
|------|---------|
| `services/memory/qdrant/` | Canonical Qdrant module |
| `docs/memory/canonical_collections.yaml` | Collection config |
| `docs/memory/cm_payload_v1.md` | Payload schema docs |
| `scripts/qdrant_migrate_to_canonical.py` | Migration tool |
| `scripts/qdrant_parity_check.py` | Parity verification |
| `scripts/qdrant_smoke_test.py` | Security smoke test |
---
## Sign-off
- [ ] Dry run passed
- [ ] Migration completed
- [ ] Parity check passed
- [ ] Dual-read validation passed
- [ ] Cutover completed
- [ ] Post-cutover health verified
- [ ] Rollback window started (7 days)
- [ ] Legacy collections deleted (after rollback window)

@@ -0,0 +1,142 @@
# Canonical Qdrant Collections Configuration
# Version: 1.0
# Last Updated: 2026-01-26
# Default embedding configuration
embedding:
  text:
    model: "cohere-embed-multilingual-v3"
    dim: 1024
    metric: "cosine"
  code:
    model: "openai-text-embedding-3-small"
    dim: 1536
    metric: "cosine"

# Canonical collections
collections:
  text:
    name: "cm_text_1024_v1"
    dim: 1024
    metric: "cosine"
    description: "Main text embeddings collection"
    payload_indexes:
      - field: "tenant_id"
        type: "keyword"
      - field: "team_id"
        type: "keyword"
      - field: "agent_id"
        type: "keyword"
      - field: "scope"
        type: "keyword"
      - field: "visibility"
        type: "keyword"
      - field: "indexed"
        type: "bool"
      - field: "source_id"
        type: "keyword"
      - field: "tags"
        type: "keyword"
      - field: "created_at"
        type: "datetime"

# Tenant configuration
tenants:
  daarion:
    id: "t_daarion"
    default_team: "team_core"

# Team configuration
teams:
  core:
    id: "team_core"
    tenant_id: "t_daarion"

# Agent slug mapping (legacy name -> canonical slug)
agent_slugs:
  helion: "agt_helion"
  Helion: "agt_helion"
  HELION: "agt_helion"
  nutra: "agt_nutra"
  Nutra: "agt_nutra"
  NUTRA: "agt_nutra"
  druid: "agt_druid"
  Druid: "agt_druid"
  DRUID: "agt_druid"
  greenfood: "agt_greenfood"
  Greenfood: "agt_greenfood"
  GREENFOOD: "agt_greenfood"
  agromatrix: "agt_agromatrix"
  AgroMatrix: "agt_agromatrix"
  AGROMATRIX: "agt_agromatrix"
  daarwizz: "agt_daarwizz"
  Daarwizz: "agt_daarwizz"
  DAARWIZZ: "agt_daarwizz"
  alateya: "agt_alateya"
  Alateya: "agt_alateya"
  ALATEYA: "agt_alateya"

# Legacy collection mapping rules
legacy_collection_mapping:
  # Pattern: collection_name_regex -> (agent_slug_group, scope, tags)
  patterns:
    - regex: "^([a-z]+)_docs$"
      agent_group: 1
      scope: "docs"
      tags: []
    - regex: "^([a-z]+)_messages$"
      agent_group: 1
      scope: "messages"
      tags: []
    - regex: "^([a-z]+)_memory_items$"
      agent_group: 1
      scope: "memory"
      tags: []
    - regex: "^([a-z]+)_artifacts$"
      agent_group: 1
      scope: "artifacts"
      tags: []
    - regex: "^druid_legal_kb$"
      agent_group: null
      agent_id: "agt_druid"
      scope: "docs"
      tags: ["legal_kb"]
    - regex: "^nutra_food_knowledge$"
      agent_group: null
      agent_id: "agt_nutra"
      scope: "docs"
      tags: ["food_kb"]
    - regex: "^memories$"
      agent_group: null
      scope: "memory"
      tags: []
    - regex: "^messages$"
      agent_group: null
      scope: "messages"
      tags: []

# Feature flags for migration
feature_flags:
  dual_write_enabled: false
  dual_read_enabled: false
  canonical_write_only: false
  legacy_read_fallback: true

# Defaults for migration
migration_defaults:
  visibility: "confidential"
  owner_kind: "agent"
  indexed: true

@@ -0,0 +1,216 @@
# Co-Memory Payload Schema v1 (cm_payload_v1)
**Version:** 1.0
**Status:** Canonical
**Last Updated:** 2026-01-26
## Overview
This document defines the canonical payload schema for all vectors stored in Qdrant across the DAARION platform. The schema enables:
- **Unlimited agents** without creating new collections
- **Fine-grained access control** via payload filters
- **Multi-tenant isolation** via tenant_id
- **Consistent querying** across all memory types
## Design Principles
1. **One collection = one embedding space** (same dim + metric)
2. **No per-agent collections** - agents identified by `agent_id` field
3. **Access control via payload** - visibility + ACL fields
4. **Stable identifiers** - ULIDs for all entities
---
## Collection Naming Convention
```
cm_<type>_<dim>_v<version>
```
Examples:
- `cm_text_1024_v1` - text embeddings, 1024 dimensions
- `cm_code_768_v1` - code embeddings, 768 dimensions
- `cm_mm_512_v1` - multimodal embeddings, 512 dimensions
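
A small sketch of formatting and parsing names that follow this convention. The helper names and the `CollectionName` tuple are illustrative assumptions; the regular expression simply encodes the `cm_<type>_<dim>_v<version>` pattern shown above, with lowercase letters assumed for `<type>`.

```python
import re
from typing import NamedTuple, Optional

_NAME_RE = re.compile(r"^cm_(?P<kind>[a-z]+)_(?P<dim>[1-9][0-9]*)_v(?P<version>[1-9][0-9]*)$")


class CollectionName(NamedTuple):
    kind: str      # e.g. "text", "code", "mm"
    dim: int       # vector dimension
    version: int   # schema/collection version


def format_collection_name(kind: str, dim: int, version: int = 1) -> str:
    """Build a canonical collection name such as cm_text_1024_v1."""
    return f"cm_{kind}_{dim}_v{version}"


def parse_collection_name(name: str) -> Optional[CollectionName]:
    """Return the parsed parts, or None if the name is not canonical."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    return CollectionName(match["kind"], int(match["dim"]), int(match["version"]))
```

Encoding the dimension in the name makes it harder to write vectors from a different embedding space into an existing collection by accident.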
---
## Payload Schema
### Required Fields (MVP)
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `schema_version` | string | Always `"cm_payload_v1"` | `"cm_payload_v1"` |
| `tenant_id` | string | Tenant identifier | `"t_daarion"` |
| `team_id` | string | Team identifier (nullable) | `"team_core"` |
| `project_id` | string | Project identifier (nullable) | `"proj_helion"` |
| `agent_id` | string | Agent identifier (nullable) | `"agt_helion"` |
| `owner_kind` | enum | Owner type | `"agent"` / `"team"` / `"user"` |
| `owner_id` | string | Owner identifier | `"agt_helion"` |
| `scope` | enum | Content type | `"docs"` / `"messages"` / `"memory"` / `"artifacts"` / `"signals"` |
| `visibility` | enum | Access level | `"public"` / `"confidential"` / `"private"` |
| `indexed` | boolean | Searchable by AI | `true` |
| `source_kind` | enum | Source type | `"document"` / `"wiki"` / `"message"` / `"artifact"` / `"web"` / `"code"` |
| `source_id` | string | Source identifier | `"doc_01HQ..."` |
| `chunk.chunk_id` | string | Chunk identifier | `"chk_01HQ..."` |
| `chunk.chunk_idx` | integer | Chunk index in source | `0` |
| `fingerprint` | string | Content hash (SHA256) | `"sha256:a1b2c3..."` |
| `created_at` | string | ISO 8601 timestamp | `"2026-01-26T12:00:00Z"` |
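
A minimal sketch of assembling a payload with the fields above. `build_payload` and `content_fingerprint` are hypothetical helpers, not part of the repository; the `sha256:` prefix follows the example payloads later in this document, and the `confidential` default is an assumption.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def content_fingerprint(text: str) -> str:
    """SHA-256 of the UTF-8 text, prefixed as in the example payloads."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_payload(
    text: str,
    *,
    tenant_id: str,
    owner_kind: str,
    owner_id: str,
    scope: str,
    source_kind: str,
    source_id: str,
    chunk_id: str,
    chunk_idx: int,
    visibility: str = "confidential",  # assumed default
    team_id: Optional[str] = None,
    project_id: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> dict:
    """Assemble a cm_payload_v1 dict for one chunk of content."""
    return {
        "schema_version": "cm_payload_v1",
        "tenant_id": tenant_id,
        "team_id": team_id,
        "project_id": project_id,
        "agent_id": agent_id,
        "owner_kind": owner_kind,
        "owner_id": owner_id,
        "scope": scope,
        "visibility": visibility,
        "indexed": True,
        "source_kind": source_kind,
        "source_id": source_id,
        "chunk": {"chunk_id": chunk_id, "chunk_idx": chunk_idx},
        "fingerprint": content_fingerprint(text),
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Because the fingerprint depends only on the chunk text, re-ingesting unchanged content yields the same value, which is what makes it usable for deduplication.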
### Optional Fields (Recommended)
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `acl.read_team_ids` | array[string] | Teams with read access | `["team_core"]` |
| `acl.read_agent_ids` | array[string] | Agents with read access | `["agt_nutra"]` |
| `acl.read_role_ids` | array[string] | Roles with read access | `["role_admin"]` |
| `tags` | array[string] | Content tags | `["legal_kb", "contracts"]` |
| `lang` | string | Language code | `"uk"` / `"en"` |
| `importance` | float | Importance score 0-1 | `0.8` |
| `ttl_days` | integer | Auto-delete after N days | `365` |
| `embedding.model` | string | Embedding model ID | `"cohere-embed-multilingual-v3"` |
| `embedding.dim` | integer | Vector dimension | `1024` |
| `embedding.metric` | string | Distance metric | `"cosine"` |
| `updated_at` | string | Last update timestamp | `"2026-01-26T12:00:00Z"` |
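
One way `ttl_days` could be interpreted, sketched under the assumption that the TTL is counted from `created_at` (the schema does not say whether `updated_at` resets it). The function name `is_expired` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(payload: dict, now: Optional[datetime] = None) -> bool:
    """True if the payload's ttl_days has elapsed since created_at."""
    ttl_days = payload.get("ttl_days")
    if ttl_days is None:
        return False  # no TTL set: the point never expires
    # fromisoformat() does not accept a trailing "Z" on older Python versions.
    created = datetime.fromisoformat(payload["created_at"].replace("Z", "+00:00"))
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= created + timedelta(days=ttl_days)
```

A cleanup job could scan with a filter on `created_at` and delete the points for which this returns true.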
---
## Identifier Formats
| Entity | Prefix | Format | Example |
|--------|--------|--------|---------|
| Tenant | `t_` | `t_<slug>` | `t_daarion` |
| Team | `team_` | `team_<slug>` | `team_core` |
| Project | `proj_` | `proj_<slug>` | `proj_helion` |
| Agent | `agt_` | `agt_<slug>` | `agt_helion` |
| Document | `doc_` | `doc_<ulid>` | `doc_01HQXYZ...` |
| Message | `msg_` | `msg_<ulid>` | `msg_01HQXYZ...` |
| Artifact | `art_` | `art_<ulid>` | `art_01HQXYZ...` |
| Chunk | `chk_` | `chk_<ulid>` | `chk_01HQXYZ...` |
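
The prefixes above can be checked with simple regular expressions. This is a sketch: slugs are assumed to be lowercase alphanumerics with underscores and ULID-based IDs are only checked for an alphanumeric suffix, not for a valid ULID. `id_kind` is a hypothetical helper.

```python
import re
from typing import Optional

ID_PATTERNS = {
    "tenant": re.compile(r"^t_[a-z0-9_]+$"),
    "team": re.compile(r"^team_[a-z0-9_]+$"),
    "project": re.compile(r"^proj_[a-z0-9_]+$"),
    "agent": re.compile(r"^agt_[a-z0-9_]+$"),
    "document": re.compile(r"^doc_[A-Za-z0-9]+$"),
    "message": re.compile(r"^msg_[A-Za-z0-9]+$"),
    "artifact": re.compile(r"^art_[A-Za-z0-9]+$"),
    "chunk": re.compile(r"^chk_[A-Za-z0-9]+$"),
}


def id_kind(identifier: str) -> Optional[str]:
    """Return the entity kind for a well-formed identifier, else None."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.match(identifier):
            return kind
    return None
```

Rejecting malformed IDs at write time keeps legacy names such as `Helion` from leaking into canonical payloads.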
---
## Scope Enum
| Value | Description | Typical Sources |
|-------|-------------|-----------------|
| `docs` | Documents, knowledge bases | PDF, Google Docs, Wiki |
| `messages` | Conversations | Telegram, Slack, Email |
| `memory` | Agent memory items | Session notes, learned facts |
| `artifacts` | Generated content | Reports, presentations |
| `signals` | Events, notifications | System events |
---
## Visibility Enum
| Value | Access Rule |
|-------|-------------|
| `public` | Anyone in tenant/team can read |
| `confidential` | Owner + ACL-granted readers |
| `private` | Only owner can read |
---
## Access Control Rules
### Private Content
```python
visibility == "private" and owner_kind == request.owner_kind and owner_id == request.owner_id
```
### Confidential Content
```python
visibility == "confidential" and (
    (owner_kind == request.owner_kind and owner_id == request.owner_id)
    or request.agent_id in acl.read_agent_ids
    or request.team_id in acl.read_team_ids
    or request.role_id in acl.read_role_ids
)
```
### Public Content
```python
visibility == "public" and team_id == request.team_id
```
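The three rules can be combined into a single check, sketched below. `ReadRequest` and `can_read` are hypothetical names; tenant isolation is assumed to be enforced separately (for example by a mandatory `tenant_id` filter), and an unknown visibility value is treated as not readable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadRequest:
    owner_kind: str
    owner_id: str
    agent_id: Optional[str] = None
    team_id: Optional[str] = None
    role_id: Optional[str] = None


def can_read(payload: dict, request: ReadRequest) -> bool:
    """Apply the private / confidential / public rules to one payload."""
    is_owner = (
        payload.get("owner_kind") == request.owner_kind
        and payload.get("owner_id") == request.owner_id
    )
    visibility = payload.get("visibility")
    if visibility == "private":
        return is_owner
    if visibility == "confidential":
        acl = payload.get("acl") or {}
        return (
            is_owner
            or request.agent_id in acl.get("read_agent_ids", [])
            or request.team_id in acl.get("read_team_ids", [])
            or request.role_id in acl.get("read_role_ids", [])
        )
    if visibility == "public":
        return payload.get("team_id") == request.team_id
    return False  # unknown visibility: deny by default
```

In production the same logic would normally be expressed as a Qdrant filter so that unreadable points are never returned; a post-filter like this is mainly useful for tests and audits.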
---
## Migration Mapping (Legacy Collections)
| Old Collection Pattern | New Payload |
|------------------------|-------------|
| `helion_docs` | `agent_id="agt_helion"`, `scope="docs"` |
| `nutra_messages` | `agent_id="agt_nutra"`, `scope="messages"` |
| `druid_legal_kb` | `agent_id="agt_druid"`, `scope="docs"`, `tags=["legal_kb"]` |
| `nutra_food_knowledge` | `agent_id="agt_nutra"`, `scope="docs"`, `tags=["food_kb"]` |
| `*_memory_items` | `scope="memory"` |
| `*_artifacts` | `scope="artifacts"` |
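
A sketch of how the mapping in this table could be applied to a legacy collection name. It is illustrative only: `map_legacy_collection` is a hypothetical helper, and deriving `agent_id` as `agt_` plus the collection prefix is an assumption that a real migration would replace with a lookup in the configured agent slug mapping.

```python
import re
from typing import Optional

# Explicit special cases take precedence over the generic suffix rules.
SPECIAL_COLLECTIONS = {
    "druid_legal_kb": {"agent_id": "agt_druid", "scope": "docs", "tags": ["legal_kb"]},
    "nutra_food_knowledge": {"agent_id": "agt_nutra", "scope": "docs", "tags": ["food_kb"]},
}

SUFFIX_RULES = [
    (re.compile(r"^([a-z]+)_docs$"), "docs"),
    (re.compile(r"^([a-z]+)_messages$"), "messages"),
    (re.compile(r"^([a-z]+)_memory_items$"), "memory"),
    (re.compile(r"^([a-z]+)_artifacts$"), "artifacts"),
]


def map_legacy_collection(name: str) -> Optional[dict]:
    """Return the payload fields implied by a legacy collection name."""
    if name in SPECIAL_COLLECTIONS:
        fields = SPECIAL_COLLECTIONS[name]
        return {**fields, "tags": list(fields["tags"])}  # copy, do not share lists
    for pattern, scope in SUFFIX_RULES:
        match = pattern.match(name)
        if match:
            # Assumption: the prefix is already a lowercase agent slug.
            return {"agent_id": f"agt_{match.group(1)}", "scope": scope, "tags": []}
    return None  # unknown collection: needs a manual decision
```

Returning `None` for unrecognized names forces the migration to stop and ask, rather than guessing an owner for data it cannot classify.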
---
## Example Payloads
### Document Chunk (Helion Knowledge Base)
```json
{
  "schema_version": "cm_payload_v1",
  "tenant_id": "t_daarion",
  "team_id": "team_core",
  "project_id": "proj_helion",
  "agent_id": "agt_helion",
  "owner_kind": "agent",
  "owner_id": "agt_helion",
  "scope": "docs",
  "visibility": "confidential",
  "indexed": true,
  "source_kind": "document",
  "source_id": "doc_01HQ8K9X2NPQR3FGJKLM5678",
  "chunk": {
    "chunk_id": "chk_01HQ8K9X3MPQR3FGJKLM9012",
    "chunk_idx": 0
  },
  "fingerprint": "sha256:a1b2c3d4e5f6...",
  "created_at": "2026-01-26T12:00:00Z",
  "tags": ["product", "features"],
  "lang": "uk",
  "embedding": {
    "model": "cohere-embed-multilingual-v3",
    "dim": 1024,
    "metric": "cosine"
  }
}
```
### Message (Telegram Conversation)
```json
{
  "schema_version": "cm_payload_v1",
  "tenant_id": "t_daarion",
  "team_id": "team_core",
  "agent_id": "agt_helion",
  "owner_kind": "user",
  "owner_id": "user_tg_123456",
  "scope": "messages",
  "visibility": "private",
  "indexed": true,
  "source_kind": "message",
  "source_id": "msg_01HQ8K9X4NPQR3FGJKLM3456",
  "chunk": {
    "chunk_id": "chk_01HQ8K9X5MPQR3FGJKLM7890",
    "chunk_idx": 0
  },
  "fingerprint": "sha256:b2c3d4e5f6a7...",
  "created_at": "2026-01-26T12:05:00Z",
  "channel_id": "tg_chat_789"
}
```
---
## Changelog
- **v1.0** (2026-01-26): Initial canonical schema

@@ -0,0 +1,178 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://daarion.city/schemas/cm_payload_v1.json",
"title": "Co-Memory Payload Schema v1",
"description": "Canonical payload schema for Qdrant vectors in DAARION platform",
"type": "object",
"required": [
"schema_version",
"tenant_id",
"owner_kind",
"owner_id",
"scope",
"visibility",
"indexed",
"source_kind",
"source_id",
"chunk",
"fingerprint",
"created_at"
],
"properties": {
"schema_version": {
"type": "string",
"const": "cm_payload_v1",
"description": "Schema version identifier"
},
"tenant_id": {
"type": "string",
"pattern": "^t_[a-z0-9_]+$",
"description": "Tenant identifier (t_<slug>)"
},
"team_id": {
"type": ["string", "null"],
"pattern": "^team_[a-z0-9_]+$",
"description": "Team identifier (team_<slug>)"
},
"project_id": {
"type": ["string", "null"],
"pattern": "^proj_[a-z0-9_]+$",
"description": "Project identifier (proj_<slug>)"
},
"agent_id": {
"type": ["string", "null"],
"pattern": "^agt_[a-z0-9_]+$",
"description": "Agent identifier (agt_<slug>)"
},
"owner_kind": {
"type": "string",
"enum": ["user", "team", "agent"],
"description": "Type of owner"
},
"owner_id": {
"type": "string",
"minLength": 1,
"description": "Owner identifier"
},
"scope": {
"type": "string",
"enum": ["docs", "messages", "memory", "artifacts", "signals"],
"description": "Content type/scope"
},
"visibility": {
"type": "string",
"enum": ["public", "confidential", "private"],
"description": "Access visibility level"
},
"indexed": {
"type": "boolean",
"description": "Whether content is searchable by AI"
},
"source_kind": {
"type": "string",
"enum": ["document", "wiki", "message", "artifact", "web", "code"],
"description": "Type of source content"
},
"source_id": {
"type": "string",
"pattern": "^(doc|msg|art|web|code)_[A-Za-z0-9]+$",
"description": "Source identifier with type prefix"
},
"chunk": {
"type": "object",
"required": ["chunk_id", "chunk_idx"],
"properties": {
"chunk_id": {
"type": "string",
"pattern": "^chk_[A-Za-z0-9]+$",
"description": "Chunk identifier"
},
"chunk_idx": {
"type": "integer",
"minimum": 0,
"description": "Chunk index within source"
}
}
},
"fingerprint": {
"type": "string",
"minLength": 1,
"description": "Content hash for deduplication"
},
"created_at": {
"type": "string",
"format": "date-time",
"description": "Creation timestamp (ISO 8601)"
},
"updated_at": {
"type": ["string", "null"],
"format": "date-time",
"description": "Last update timestamp (ISO 8601)"
},
"acl": {
"type": "object",
"properties": {
"read_team_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Teams with read access"
},
"read_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Agents with read access"
},
"read_role_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Roles with read access"
}
}
},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Content tags for filtering"
},
"lang": {
"type": ["string", "null"],
"pattern": "^[a-z]{2}(-[A-Z]{2})?$",
"description": "Language code (ISO 639-1)"
},
"importance": {
"type": ["number", "null"],
"minimum": 0,
"maximum": 1,
"description": "Importance score (0-1)"
},
"ttl_days": {
"type": ["integer", "null"],
"minimum": 1,
"description": "Auto-delete after N days"
},
"channel_id": {
"type": ["string", "null"],
"description": "Channel/chat identifier for messages"
},
"embedding": {
"type": "object",
"properties": {
"model": {
"type": "string",
"description": "Embedding model identifier"
},
"dim": {
"type": "integer",
"minimum": 1,
"description": "Vector dimension"
},
"metric": {
"type": "string",
"enum": ["cosine", "dot", "euclidean"],
"description": "Distance metric"
}
}
}
},
"additionalProperties": true
}