TASK_PHASE_NODE1_REPAIR:
- Fix daarion-web SSR: use CITY_API_BASE_URL instead of 127.0.0.1
- Fix auth API routes: use AUTH_API_URL env var
- Add wget to Dockerfiles for healthchecks (stt, ocr, web-search, swapper, vector-db, rag)
- Update healthchecks to use wget instead of curl
- Fix vector-db-service: update torch==2.4.0, sentence-transformers==2.6.1
- Fix rag-service: correct haystack imports for v2.x
- Fix telegram-gateway: remove msg.ack() for non-JetStream NATS
- Add /health endpoint to nginx mvp-routes.conf
- Add room_role, is_public, sort_order columns to city_rooms migration
- Add TASK_PHASE_NODE1_REPAIR.md and DEPLOY_NODE1_REPAIR.md docs

Previous tasks included:
- TASK 039-044: Orchestrator rooms, Matrix chat cleanup, CrewAI integration
TASK_PHASE_NODE1_REPAIR.md
Phase name
NODE1_REPAIR — bring NODE1 to a healthy, MVP-ready state.
Goal
- All core services on NODE1 are `running` and `healthy` in `docker ps`.
- `daarion-web` serves working UI for:
  - `/microdao/daarion` (orchestrator room view),
  - `/nodes/node-1` (NODE1 status),
  - `/agents/...` (agents/crew views).
- Telegram bot(s) can route a message through `telegram-gateway → dagi-router → LLM` and return a response.
- `https://gateway.daarion.city/health` returns HTTP 200.
- DB schema and code are aligned with the MVP product brief and room/orchestrator features (TASK 039–044).
Context (facts — do not "redefine" them in code)
NODE1 (144.76.224.179):
- `docker ps` shows multiple services as `unhealthy` or `Restarting`: `daarion-web`, `dagi-router`, `dagi-stt-service`, `dagi-ocr-service`, `dagi-web-search-service`, `dagi-swapper-service`, `dagi-vector-db-service`, `dagi-rag-service`.
- Git HEAD on server = TASK 038 (no TASK 039–044 applied).
- `daarion-web` (Next.js) fails on SSR with: `connect ECONNREFUSED 127.0.0.1:80`
  - It tries to fetch `http://127.0.0.1:80/...`
- `daarion-city-service` is alive: `curl http://localhost:7001/health` → healthy
  - But DB schema is missing new columns (e.g. `room_role`, `is_public`, `sort_order`) for orchestrator rooms.
- `dagi-router` responds: `curl localhost:9102/health` → ok
  - Docker healthcheck runs `python -c "import requests"`; `requests` is not installed → container marked `unhealthy`.
- STT/OCR/WebSearch/Swapper:
  - Healthchecks run `curl` inside slim images without `curl` installed → false `unhealthy`.
- `dagi-vector-db-service`:
  - Keeps restarting with: `AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node'`
  - Torch version is incompatible with `sentence-transformers`.
- `dagi-rag-service`:
  - Crashes with: `ModuleNotFoundError: No module named 'haystack'`
- `telegram-gateway`:
  - Logs `Temporary failure in name resolution` for `http://router:9102/route`
    - Real service name in Docker is `dagi-router`, not `router`.
  - Logs `NotJSMessageError` when calling `msg.ack()` (ack is used on a non-JetStream subject).
- `https://gateway.daarion.city/health` returns 404 (SSL OK but no health endpoint).
- Because `daarion-web` is `unhealthy`, MVP UI for NODE1 (microDAO, nodes, agents) is effectively offline.
- Product brief requires at least six core flows live for MVP:
  - MicroDAO onboarding,
  - Public channel for guests,
  - MicroDAO chat,
  - Follow-ups,
  - Kanban tasks,
  - Private agent.
Do NOT change these facts; change code/config to fix the system.
Scope
In scope
- Code and config changes in the main repo:
  - Dockerfiles and `docker-compose.yml` (and any overrides).
  - `daarion-web` env/SSR config.
  - `daarion-city-service` migrations and DB schema updates.
  - `dagi-router`, STT/OCR/WebSearch/Swapper healthchecks.
  - `dagi-vector-db-service` dependencies (Torch, sentence-transformers).
  - `dagi-rag-service` dependencies (Haystack).
  - `telegram-gateway` configuration and NATS usage.
  - Gateway `/health` endpoint (backend or nginx, depending on actual stack).
- Local verification (via `docker compose`) + instructions for running on NODE1.
Out of scope
- New product features beyond MVP (no new flows).
- Large refactors of architecture.
- Switching to a different LLM stack or DB vendor.
Prerequisites
Before editing:
- Inspect repo structure to locate:
  - Docker compose files (e.g. `docker-compose.yml`, `docker-compose.prod.yml`).
  - Services: `daarion-web`, `daarion-city-service`, `dagi-router`, `dagi-stt-service`, `dagi-ocr-service`, `dagi-web-search-service`, `dagi-swapper-service`, `dagi-vector-db-service`, `dagi-rag-service`, `telegram-gateway`, `gateway` (or equivalent).
  - Migration tooling for `daarion-city-service` (Alembic / Prisma / Drizzle / etc.).
  - Existing deploy scripts: `scripts/deploy-prod.sh`, `scripts/migrate-prod.sh` (or equivalents).
- Read:
  - `01_product_brief_mvp.md`, especially the sections about microDAO, rooms, orchestrator, onboarding, follow-ups, Kanban, private agent.
  - `docs/DEPLOY_MIGRATIONS.md` or any deployment doc describing DB migrations.
  - microdao — Data Model & Event Catalog (if present in repo/docs) to understand expected DB fields for rooms.
Tasks
1. Bring codebase up to TASK 039–044 (rooms / orchestrator) and align DB schema
1.1. Locate tasks 039–044 (look under docs/cursor/ / docs/tasks/ / similar).
- Identify what changes they describe:
  - new fields for rooms (e.g. `room_role`, `is_public`, `sort_order`),
  - any additional tables/relations required for orchestrator rooms and microDAO UI.
1.2. Implement DB/schema changes:
- Use existing migration framework for `daarion-city-service`.
- Create a new migration that:
  - adds missing columns (e.g. `room_role`, `is_public`, `sort_order`) to relevant tables (e.g. `rooms`),
  - adds any indices or constraints described in the docs,
  - is idempotent and safe to apply on existing prod DB.
- Ensure migration can run in both dev and prod environments.
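The idempotency requirement can be illustrated with a small sketch. It uses Python's built-in `sqlite3` rather than the project's migration framework, and the table name `rooms`, the column types, and the defaults are assumptions made for illustration only. On PostgreSQL the same effect is usually reached with `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`.

```python
import sqlite3

# Hypothetical column definitions; the real types and defaults
# should come from the TASK 039-044 docs.
NEW_COLUMNS = {
    "room_role": "TEXT",
    "is_public": "INTEGER NOT NULL DEFAULT 0",
    "sort_order": "INTEGER NOT NULL DEFAULT 0",
}


def existing_columns(conn, table):
    """Return the set of column names currently present on the table."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows}  # row[1] is the column name


def add_missing_columns(conn, table, columns):
    """Add only the columns that are not there yet; return the names added."""
    present = existing_columns(conn, table)
    added = []
    for name, ddl in columns.items():
        if name not in present:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added


# Demo on an in-memory DB that mimics a pre-migration schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, name TEXT)")
first_run = add_missing_columns(conn, "rooms", NEW_COLUMNS)
second_run = add_missing_columns(conn, "rooms", NEW_COLUMNS)  # no-op
```

Running the function twice changes the schema only on the first call, which is the property a prod-safe migration needs.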
1.3. Update daarion-city-service models/ORM to match the new schema.
- All API endpoints that return rooms/microDAO views must expose these fields (if required by frontend).
1.4. Ensure deploy pipeline uses these migrations:
- Confirm `scripts/migrate-prod.sh` (or equivalent) calls the migration tool.
- If not, update it so that running the script applies the new migration.
1.5. Add/update minimal tests:
- Unit/integration test for room creation / listing that uses the new fields.
- At least one test for the orchestrator room API.
2. Fix daarion-web API base URLs and SSR errors
2.1. Locate daarion-web config:
- `.env` / `.env.production` / `next.config.js` / `app/config.ts`, etc.
2.2. Define correct base URL for city-service:
For server-side calls:
CITY_API_BASE_URL=http://daarion-city-service:7001
For client-side calls (if needed):
NEXT_PUBLIC_CITY_API_BASE_URL=https://gateway.daarion.city/api
# or, for internal-only, http://daarion-city-service:7001
2.3. Update all fetch calls in daarion-web to use these env vars instead of hardcoded http://127.0.0.1:80.
- Search for `127.0.0.1`, `localhost`, and update to use `CITY_API_BASE_URL` / `NEXT_PUBLIC_CITY_API_BASE_URL`.
- Ensure Next.js server components and API routes read values from `process.env`.
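The search step can be scripted. The sketch below is a heuristic helper, not part of the repo: the function names, the file suffixes, and the skipped directories are assumptions. It reports every line in the web sources that still contains a hardcoded loopback URL.

```python
import re
from pathlib import Path

# Matches http(s)://127.0.0.1 or http(s)://localhost, with an optional port.
LOOPBACK_URL = re.compile(r"https?://(?:127\.0\.0\.1|localhost)(?::\d+)?")

# Assumed build/dependency directories that should not be scanned.
SKIP_DIRS = {"node_modules", ".next"}


def find_hardcoded_hosts(text):
    """Return (line_number, matched_url) pairs for loopback URLs in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LOOPBACK_URL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root, suffixes=(".ts", ".tsx", ".js", ".jsx")):
    """Scan source files under root; return {path: hits} for files with hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if SKIP_DIRS.intersection(path.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = find_hardcoded_hosts(text)
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree` on the `daarion-web` source directory should return an empty dict once every fetch reads its base URL from `process.env`.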
2.4. Local smoke test:
docker compose up -d daarion-city-service
docker compose up -d --build daarion-web
- Open `http://localhost:<WEB_PORT>/microdao/daarion` and check there are no SSR 500 errors.
- Check `/nodes/node-1` and one of `/agents/...` pages.
3. Fix healthchecks for dagi-router and STT/OCR/WebSearch/Swapper
3.1. dagi-router healthcheck (Python requests)
3.1.1. Locate dagi-router Dockerfile and docker-compose service.
3.1.2. Replace healthcheck that uses python -c "import requests" with an HTTP healthcheck pointing at the service's /health endpoint.
Example docker-compose.yml snippet:
services:
  dagi-router:
    # ...
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:9102/health || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 5
3.1.3. Ensure the image has wget (or curl):
RUN apt-get update && apt-get install -y --no-install-recommends wget \
&& rm -rf /var/lib/apt/lists/*
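For Python-based images there is also an option that needs neither `wget`, `curl`, nor `requests`: the standard library can do the HTTP check. The sketch below is an alternative to the approach above, not a replacement for it; the function name `is_healthy` is an assumption.

```python
import urllib.error
import urllib.request

# Ignore any proxy settings from the environment: a container
# healthcheck should talk to the local service directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url, timeout=3.0):
    """Return True only if the URL answers with a 2xx status in time."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, DNS failures, timeouts,
        # HTTP 4xx/5xx (HTTPError) and malformed URLs.
        return False
```

In a compose healthcheck the same idea fits in one command, for example `python -c "import urllib.request; urllib.request.urlopen('http://localhost:9102/health', timeout=3)"`, which exits non-zero when the request fails.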
3.2. STT/OCR/WebSearch/Swapper healthchecks (curl)
3.2.1. For each of:
- `dagi-stt-service`,
- `dagi-ocr-service`,
- `dagi-web-search-service`,
- `dagi-swapper-service`,
replace curl-based healthcheck with wget or an equivalent command that is available in the image, or add wget/curl to Dockerfile as above.
Example:
healthcheck:
  test: ["CMD-SHELL", "wget -qO- http://localhost:<PORT>/health || exit 1"]
  interval: 10s
  timeout: 3s
  retries: 5
3.2.2. Rebuild and run locally:
docker compose build dagi-router dagi-stt-service dagi-ocr-service dagi-web-search-service dagi-swapper-service
docker compose up -d dagi-router dagi-stt-service dagi-ocr-service dagi-web-search-service dagi-swapper-service
docker ps
- Verify `STATUS` shows `healthy` after the healthcheck grace period.
4. Fix dagi-vector-db-service dependencies (Torch / sentence-transformers)
4.1. Locate Dockerfile / requirements for dagi-vector-db-service.
4.2. Update Python dependencies to a compatible set, e.g.:
RUN pip install --no-cache-dir "torch==2.4.0" "sentence-transformers==2.6.1"
(or another version pair that is known to work together).
4.3. Rebuild and run:
docker compose build dagi-vector-db-service
docker compose up -d dagi-vector-db-service
docker logs -f dagi-vector-db-service
- Ensure there is no `torch.utils._pytree` error and service reaches "ready" state.
- Add a simple `/health` endpoint test if not present.
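A quick way to confirm the fix inside the rebuilt container is to test for the attribute named in the traceback before starting the service. This is a minimal sketch; the helper name `module_has_attr` is an assumption.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)
```

Inside the `dagi-vector-db-service` image, `module_has_attr("torch.utils._pytree", "register_pytree_node")` should return `True` after the dependency update. A `False` result means the installed Torch still lacks the attribute, and `None` means Torch is not importable at all.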
5. Fix dagi-rag-service dependencies (Haystack)
5.1. Locate Dockerfile / requirements for dagi-rag-service.
5.2. Add Haystack dependency, for example:
RUN pip install --no-cache-dir "farm-haystack[all]==1.26.2"
(or the version used locally).
5.3. Rebuild and run:
docker compose build dagi-rag-service
docker compose up -d dagi-rag-service
docker logs -f dagi-rag-service
- Confirm `ModuleNotFoundError: No module named 'haystack'` is gone.
- Add/verify `/health` endpoint and healthcheck.
6. Fix Telegram gateway configuration and NATS usage
6.1. Router URL (DNS / service name)
6.1.1. Find telegram-gateway service in docker-compose.yml and its env/config.
6.1.2. Set correct router URL:
services:
  telegram-gateway:
    environment:
      # ...
      ROUTER_URL: http://dagi-router:9102
6.1.3. Alternatively, define network alias:
services:
  dagi-router:
    networks:
      default:
        aliases:
          - router
and keep ROUTER_URL=http://router:9102.
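Whichever option is chosen, the hostname in `ROUTER_URL` has to resolve from inside the `telegram-gateway` container. A small standard-library check, sketched here with an assumed helper name `resolves`, can confirm that before testing the full message flow.

```python
import socket


def resolves(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Includes "Temporary failure in name resolution".
        return False
    return True
```

Run inside the container, `resolves("dagi-router")` should be `True` on the default compose network, and `resolves("router")` only once the alias is defined.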
6.2. Avoid NotJSMessageError (msg.ack on non-JetStream)
6.2.1. Locate the code where telegram-gateway subscribes to NATS and calls msg.ack().
6.2.2. If the subject is not part of a JetStream stream, remove msg.ack():
# Before
msg = await sub.__anext__()
# ... process ...
await msg.ack()
# After (simple NATS)
msg = await sub.__anext__()
# ... process ...
# no ack for core NATS
6.2.3. If JetStream is wanted in the future, add TODO comments and open a separate task; for this phase keep it simple and working.
6.2.4. Local smoke test:
- Start NATS, `dagi-router`, and `telegram-gateway`.
- Simulate a message (if test tooling exists) and ensure no `NotJSMessageError` appears.
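If the same handler may later serve both core NATS and JetStream subjects, one option for that future task is to ack conditionally instead of unconditionally. The sketch below is illustrative only: it assumes JetStream deliveries carry a reply subject starting with `$JS.ACK.`, and `FakeMsg` is a stand-in used to exercise the helper without a NATS server. For this phase, simply removing `msg.ack()` remains the simpler fix.

```python
import asyncio

# Assumption: JetStream ack reply subjects start with this prefix.
JS_ACK_PREFIX = "$JS.ACK."


async def ack_if_jetstream(msg):
    """Ack only messages that look like JetStream deliveries; return True if acked."""
    reply = getattr(msg, "reply", None) or ""
    if reply.startswith(JS_ACK_PREFIX):
        await msg.ack()
        return True
    return False  # core NATS message: nothing to acknowledge


class FakeMsg:
    """Hypothetical stand-in for a NATS message object."""

    def __init__(self, reply):
        self.reply = reply
        self.acked = False

    async def ack(self):
        self.acked = True


core_msg = FakeMsg(reply="")
js_msg = FakeMsg(reply="$JS.ACK.STREAM.consumer.1.1.1.0.0")
core_result = asyncio.run(ack_if_jetstream(core_msg))
js_result = asyncio.run(ack_if_jetstream(js_msg))
```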
7. Add /health endpoint for gateway.daarion.city
Depending on implementation:
7.1. If gateway is a backend service (Node/FastAPI/etc.)
7.1.1. Add minimal endpoint:
// Node/Express example
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' });
});
or
# FastAPI example
@app.get("/health")
def health():
    return {"status": "ok"}
7.1.2. Ensure this endpoint is mounted at the top level of the gateway service.
7.2. If gateway.daarion.city is served via nginx
7.2.1. Update nginx config (e.g. /etc/nginx/sites-available/gateway.conf) to include:
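location /health {
    # default_type sets the Content-Type of the body sent by "return";
    # add_header would add a second Content-Type header instead of replacing it.
    default_type text/plain;
    return 200 'OK';
}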
7.2.2. Reload nginx:
nginx -t && nginx -s reload
7.2.3. Local/container test:
- `curl -k https://gateway.daarion.city/health` should return HTTP 200.
8. Deployment flow for NODE1 (instructions)
Agent should prepare / update deployment docs (e.g. docs/DEPLOY_NODE1_REPAIR.md) with:
8.1. Git update on NODE1:
cd /opt/microdao-daarion
git fetch
git checkout main # or production branch
git pull
8.2. Apply migrations:
./scripts/migrate-prod.sh # or documented migrations command
8.3. Rebuild and restart only relevant services:
docker compose build \
daarion-city-service \
daarion-web \
dagi-router \
dagi-stt-service \
dagi-ocr-service \
dagi-web-search-service \
dagi-swapper-service \
dagi-vector-db-service \
dagi-rag-service \
telegram-gateway \
gateway
docker compose up -d \
daarion-city-service \
daarion-web \
dagi-router \
dagi-stt-service \
dagi-ocr-service \
dagi-web-search-service \
dagi-swapper-service \
dagi-vector-db-service \
dagi-rag-service \
telegram-gateway \
gateway
8.4. Quick docker ps check:
- All listed services must be `Up` and `healthy` after grace period.
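The `docker ps` check can be automated by parsing tab-separated output from `docker ps --format '{{.Names}}\t{{.Status}}'`. The sketch below makes assumptions: the function names are illustrative, and container names are taken to match the service names used in this document (compose may add a project prefix).

```python
def parse_docker_ps(output):
    """Map container name to status from tab-separated name/status lines."""
    statuses = {}
    for line in output.splitlines():
        if "\t" not in line:
            continue
        name, status = line.split("\t", 1)
        statuses[name.strip()] = status.strip()
    return statuses


def not_healthy(output, expected):
    """Return {name: status} for expected services that are not Up and healthy."""
    statuses = parse_docker_ps(output)
    problems = {}
    for name in expected:
        status = statuses.get(name, "missing")
        # Match "(healthy)" with the parenthesis so "(unhealthy)" does not pass.
        if not (status.startswith("Up") and "(healthy)" in status):
            problems[name] = status
    return problems
```

An empty result for the full service list means step 8.4 passes. Containers still in `(health: starting)` are reported as problems until the grace period ends.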
Acceptance checklist
Task is done when all of the following are true:
- Services/health
  - On NODE1, `docker ps` shows: `daarion-web`, `daarion-city-service`, `dagi-router`, `dagi-stt-service`, `dagi-ocr-service`, `dagi-web-search-service`, `dagi-swapper-service`, `dagi-vector-db-service`, `dagi-rag-service`, `telegram-gateway`, `gateway` in state `Up` and `healthy`.
  - `curl http://localhost:7001/health` (city-service) → 200.
  - `curl http://localhost:9102/health` (dagi-router) → 200.
  - `curl -k https://gateway.daarion.city/health` → 200.
- DB & API
  - DB schema contains required fields for rooms (e.g. `room_role`, `is_public`, `sort_order`), matching Data Model & product brief.
  - Migration for these changes runs successfully on DEV and PROD.
  - API endpoints that frontend uses for microDAO/rooms/orchestrator return the new fields (where specified in docs).
- daarion-web UI
  - `/microdao/daarion` loads without SSR error and displays orchestrator/microDAO context.
  - `/nodes/node-1` loads and shows NODE1 data.
  - At least one `/agents/...` page loads and shows crew/agents data.
  - No `ECONNREFUSED 127.0.0.1:80` in `daarion-web` logs.
- Telegram routing
  - `telegram-gateway` uses the correct router URL (`http://dagi-router:9102` or via alias `router`).
  - No `Temporary failure in name resolution` or `NotJSMessageError` in `telegram-gateway` logs under normal operation.
  - Sending a message through the Telegram bot results in a valid LLM-based reply via `dagi-router`.
- Docs
  - This task file `TASK_PHASE_NODE1_REPAIR.md` is saved under `docs/tasks/` (or the project's task folder).
  - A short deploy how-to for NODE1 (from "git pull" to "docker compose up") is added/updated (e.g. `docs/DEPLOY_NODE1_REPAIR.md`).
Notes for agents
- Prefer minimal, targeted changes over large refactors.
- Reuse existing patterns from other services (Dockerfiles, healthchecks, migrations).
- When in doubt which version of a library to pin (Torch, Haystack), check:
  - existing working services in this repo,
  - or the versions used in local/dev containers (if recorded in lockfiles).
- Keep logs and errors in comments / commit messages to help future debugging.