Orchestrator unhealthy
Impact: no coordinated writes or retrieval fanout.
Action: inspect logs and restart orchestrator service.
docker compose logs -f contextlattice-orchestrator
docker compose up -d contextlattice-orchestrator
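After the restart it can help to wait until the orchestrator answers requests again before moving on. The following is a minimal sketch, not part of the shipped tooling: retry_until and orchestrator_ready are hypothetical helper names, and using /ops/capabilities as a readiness probe is an assumption based on its appearance elsewhere in this runbook.

```sh
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: retry_until <max_tries> <delay_secs> <command...>
retry_until() {
  local max_tries="$1" delay="$2" i=1
  shift 2
  while :; do
    "$@" && return 0
    # Stop without sleeping once the last attempt has failed.
    [ "$i" -ge "$max_tries" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# ASSUMPTION: /ops/capabilities responds successfully only when the
# orchestrator is serving requests; swap in a dedicated health endpoint
# if your deployment exposes one.
orchestrator_ready() {
  curl -fsS -o /dev/null -H "x-api-key: ${ORCH_KEY}" \
    http://127.0.0.1:8075/ops/capabilities
}
```

With these defined, `docker compose up -d contextlattice-orchestrator && retry_until 30 2 orchestrator_ready` would try the probe up to 30 times, two seconds apart, and return non-zero if the service never answers.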
Fanout backlog rising
Impact: partial sink coverage and delayed consistency.
Action: check fanout telemetry and Letta auto-prune state; trigger a prune pass and tune LETTA_AUTO_PRUNE_* thresholds if needed. If backlog is dominated by root JSON churn, set LETTA_LOW_VALUE_ROOT_JSON_PREFIXES so those files are excluded from Letta fanout.
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
"http://127.0.0.1:8075/telemetry/fanout/letta/auto-prune/run?force=false" | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/ops/capabilities | jq
Disk pressure
Impact: service resets or degraded latency under retention lag.
Action: run retention now, preview the chunked memory-bank low-value cleanup with dry_run=true before running it in safe batches, and verify qdrant/mongo volume paths.
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention/run | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
"http://127.0.0.1:8075/telemetry/memory/cleanup-low-value/chunked?dry_run=true&project_batch=10&per_project_limit=250" | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq
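To decide when the retention run above is worth triggering, a host-side disk check can be scripted. This is a rough sketch under stated assumptions: disk_used_pct and retention_needed are hypothetical helpers, the 85 percent threshold is illustrative, and the path you pass should be whichever host directory backs your qdrant/mongo volumes.

```sh
# Print the used-space percentage (integer, no % sign) for the filesystem
# that holds the given path. `df -P` gives a stable POSIX column layout.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage on the path is at or above the threshold.
# ASSUMPTION: 85 is an illustrative default, not a documented limit.
retention_needed() {
  local path="${1:-/}" threshold="${2:-85}"
  [ "$(disk_used_pct "$path")" -ge "$threshold" ]
}
```

For example, `retention_needed /var/lib/docker 85 && echo "run retention"` prints the reminder only when that filesystem is at least 85 percent full. `docker system df -v` additionally lists per-volume usage, which helps confirm which volume is actually growing.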
MindsDB or Letta drift
Impact: weaker full-mode retrieval coverage.
Action: restart the affected services and trigger your rehydrate workflow once fanout load is stable.
docker compose up -d mindsdb mindsdb-http-proxy letta
Read timeouts on deep retrieval
Impact: caller reports timeout even though staged retrieval and async warm may still be in progress.
Action: keep staged fetch enabled, set caller timeout by mode, and use deep async partial responses (job_id + poll_url/events_url) instead of blocking slow-source reads.
# .env (or caller env)
CONTEXTLATTICE_READ_TIMEOUT_SECS=75
# alias supported:
# CONTEXTLATTICE_READ_TIMEOUT_SECS=75
# retrieval staged fetch should stay enabled
ORCH_RETRIEVAL_ENABLE_STAGED_FETCH=true
# hard split for balanced/fast reads
ORCH_RETRIEVAL_SYNC_ASYNC_SPLIT_ENABLED=true
# prevent sync slow-source blocking in non-deep modes unless caller explicitly sets sources
ORCH_RETRIEVAL_SYNC_SLOW_REQUIRES_EXPLICIT=true
# do not block deep mode on slow sources; return partial + job_id
ORCH_RETRIEVAL_SYNC_ASYNC_DEEP_BLOCKING=false
ORCH_RECALL_DEEP_ASYNC_DEFAULT_FOR_DEEP=true
ORCH_RECALL_DEEP_ASYNC_PARTIAL_ENABLED=true
ORCH_RECALL_DEEP_ASYNC_PARTIAL_MODE=fast
ORCH_RECALL_DEEP_ASYNC_PERSIST_ENABLED=true
ORCH_RECALL_DEEP_ASYNC_STORE_BACKEND=mongo
ORCH_RETRIEVAL_MONGO_RAW_DEEP_SYNC_ONLY_FOR_RAW_INTENT=true
# mode-level recall budgets
ORCH_RECALL_E2E_BUDGET_FAST_SECS=25
ORCH_RECALL_E2E_BUDGET_BALANCED_SECS=60
ORCH_RECALL_E2E_BUDGET_DEEP_SECS=75
# adaptive slow-source circuit and backlog gating
ORCH_RETRIEVAL_SLOW_SOURCE_CIRCUIT_SKIP_ENABLED=true
ORCH_RETRIEVAL_BACKLOG_GATING_ENABLED=true
# let blocked letta sources continue async warm without sync read blocking
ORCH_RETRIEVAL_BACKLOG_GATING_LETTA_ASYNC_WARM_ENABLED=true
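On the caller side, the settings above suggest two habits: pick a read timeout per mode, and poll the async job rather than holding one long request open. The sketch below illustrates both. The timeout values mirror the ORCH_RECALL_E2E_BUDGET_*_SECS lines; everything else is an assumption: the helper names are hypothetical, poll_url is assumed to be an absolute URL, and the "status" field with "completed"/"failed" values is a guess at the payload shape that should be checked against a real response.

```sh
# Map a retrieval mode to a caller-side read timeout in seconds.
# Values mirror the ORCH_RECALL_E2E_BUDGET_*_SECS settings above.
read_timeout_for_mode() {
  case "$1" in
    fast)     echo 25 ;;
    balanced) echo 60 ;;
    deep)     echo 75 ;;
    *)        echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Poll a deep async job until it reports a terminal state.
# ASSUMPTION: the poll response is JSON with a top-level "status" field
# whose terminal values are "completed" and "failed".
poll_deep_job() {
  local poll_url="$1" max_tries="${2:-30}" delay="${3:-2}" i=1 body status
  while [ "$i" -le "$max_tries" ]; do
    body="$(curl -fsS -H "x-api-key: ${ORCH_KEY}" "$poll_url")" || return 1
    status="$(printf '%s' "$body" | jq -r '.status // empty')"
    case "$status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed)    printf '%s\n' "$body" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up after ${max_tries} polls" >&2
  return 2
}
```

A caller could pass `--max-time "$(read_timeout_for_mode deep)"` to curl for the initial request, read poll_url from the partial response, and then hand it to poll_deep_job.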
Saved recall gate failing
Impact: release gates fail because saved cases are stale or unscoped, even though live retrieval is healthy.
Action: refresh saved cases from hot pathways, then rerun the saved gate.
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
-H "content-type: application/json" \
-d '{"max_cases":5,"min_hits":2,"run_evaluation":true}' \
http://127.0.0.1:8075/memory/recall/eval-cases/refresh | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
-H "content-type: application/json" \
-d '{}' \
http://127.0.0.1:8075/memory/recall/evaluate/saved | jq
Messaging bridge not posting
Impact: Telegram/Slack/OpenClaw commands do not write or recall memory.
Action: verify webhook env vars and smoke test the command endpoint directly.
curl -fsS -H 'content-type: application/json' \
-H "x-api-key: ${ORCH_KEY}" \
-d '{"channel":"openclaw","source_id":"chat-1","text":"@ContextLattice status"}' \
http://127.0.0.1:8075/integrations/messaging/command | jq
Optional Qdrant cloud BYO fails
Impact: cloud fallback or cloud-preferred mode cannot connect.
Action: validate endpoint + key, then run the cloud connectivity probe.
gmake qdrant-cloud-check
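To validate the endpoint and key by hand before running the probe, one option is to list collections through Qdrant's REST API, which accepts the key in an api-key header. This is a sketch only: QDRANT_CLOUD_URL and QDRANT_CLOUD_API_KEY are placeholder variable names, not names confirmed by this project, and the helper functions are hypothetical.

```sh
# Build the collections-listing URL from a base endpoint, tolerating a
# single trailing slash on the base.
qdrant_collections_url() {
  printf '%s/collections\n' "${1%/}"
}

# List collections on the cloud cluster. A failure here points at the
# endpoint or key (or the network path) rather than at the orchestrator.
# ASSUMPTION: the two environment variable names are placeholders.
qdrant_cloud_ping() {
  curl -fsS -H "api-key: ${QDRANT_CLOUD_API_KEY}" \
    "$(qdrant_collections_url "${QDRANT_CLOUD_URL}")" | jq
}
```

If qdrant_cloud_ping fails with an authorization error, recheck the key; if it cannot connect at all, recheck the endpoint URL, including the port.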