Context Lattice
By Private Memory Corp
Guide 2

Troubleshooting Guide

Diagnose install and runtime issues quickly, including service-specific recovery steps for Lite and Full modes.

Quick Path

Fastest path to recovery

  1. Run 60-second diagnostics and confirm API key/auth path.
  2. Resolve startup blockers (env wiring, port binds, container crash loops).
  3. Check fanout and retention telemetry before deeper sink-specific actions.
  4. Confirm the agent/tool read timeout matches the retrieval mode (fast 25s, balanced 60s, deep 75s), then poll or stream deep async jobs until they complete.
  5. Use mode-specific restart playbook (Lite or Full).
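Step 1 above depends on pulling the orchestrator key out of .env. A minimal sketch of that extraction as a reusable helper, assuming the key lives in a .env file in the working directory as shown in the diagnostics section:

```shell
# Sketch of step 1: read the orchestrator API key from a .env file.
# Mirrors the awk extraction used in the 60-second diagnostics below.
read_orch_key() {
  awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0, index($0, "=") + 1)}' "$1"
}

# usage: ORCH_KEY="$(read_orch_key .env)"
```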

Fast checks

First 60-second diagnostics

Required
docker compose ps
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq '.lettaAutoPrune.state | {lastRunAt,lastDeleted,lastSkippedReason,lastError}'
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq

A 401 almost always means a missing or incorrect x-api-key header. Re-read the key from .env and restart the caller process if needed.

If /health fails, check the orchestrator logs first: both the write and read fanout paths depend on the orchestrator.

Installation failures

Common startup blockers

Required

Compose env not loaded

Symptom: services boot with missing env vars or wrong profile behavior.

Fix: recreate the symlink and relaunch.

ln -svf ../../.env infra/compose/.env
gmake mem

Port conflicts

Symptom: containers exit immediately with bind errors.

Fix: stop conflicting processes, then restart target services.

docker compose down --remove-orphans
docker compose up -d --build

Memory MCP EPIPE

Symptom: the memory gateway crashes shortly after the initialize call.

Fix: rebuild and restart memorymcp-http image.

docker compose build memorymcp-http
docker compose up -d memorymcp-http

Service troubleshooting

Read/write path recovery

Required

Orchestrator unhealthy

Impact: no coordinated writes or retrieval fanout.

Action: inspect logs and restart orchestrator service.

docker compose logs -f contextlattice-orchestrator
docker compose up -d contextlattice-orchestrator

Fanout backlog rising

Impact: partial sink coverage and delayed consistency.

Action: check fanout telemetry and the Letta auto-prune state; trigger a prune pass and tune the LETTA_AUTO_PRUNE_* thresholds if needed. If the backlog is dominated by root JSON churn, set LETTA_LOW_VALUE_ROOT_JSON_PREFIXES so those files are excluded from Letta fanout.

curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/telemetry/fanout/letta/auto-prune/run?force=false" | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/ops/capabilities | jq
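If root JSON churn is the driver, the exclusion can look like this in .env. This is a hedged sketch: the comma-separated value format and the example prefixes are assumptions, not documented values — substitute your own churn-heavy filenames.

```shell
# .env — illustrative only: keep churn-heavy root JSON files out of Letta fanout.
# Example prefixes are placeholders; replace with your actual noisy files.
LETTA_LOW_VALUE_ROOT_JSON_PREFIXES=package-lock,tsconfig
```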

Disk pressure

Impact: service resets or degraded latency under retention lag.

Action: run retention now, run the chunked memory-bank low-value cleanup in safe batches, and verify the qdrant/mongo volume paths.

curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention/run | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/telemetry/memory/cleanup-low-value/chunked?dry_run=true&project_batch=10&per_project_limit=250" | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq
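Before pruning, it helps to see which data directory is actually consuming space. A small helper like this ranks the largest entries; the volume mount path you pass it is host-specific and not part of this guide's defaults:

```shell
# Hedged helper: rank the largest entries under a data directory (sizes in KiB)
# to see which sink (qdrant, mongo, ...) is filling the disk.
largest_under() {
  du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-5}"
}

# usage: largest_under /var/lib/docker/volumes 10
```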

MindsDB or Letta drift

Impact: weaker full-mode retrieval coverage.

Action: restart service and trigger your rehydrate workflow once fanout load is stable.

docker compose up -d mindsdb mindsdb-http-proxy letta

Read timeouts on deep retrieval

Impact: the caller reports a timeout even though staged retrieval and the async warm pass may still be in progress.

Action: keep staged fetch enabled, set the caller timeout by mode, and consume deep async partial responses (job_id plus poll_url/events_url) instead of blocking on slow-source reads.

# .env (or caller env)
CONTEXTLATTICE_READ_TIMEOUT_SECS=75
# (an alias for this variable is also supported)

# retrieval staged fetch should stay enabled
ORCH_RETRIEVAL_ENABLE_STAGED_FETCH=true
# hard split for balanced/fast reads
ORCH_RETRIEVAL_SYNC_ASYNC_SPLIT_ENABLED=true
# prevent sync slow-source blocking in non-deep modes unless caller explicitly sets sources
ORCH_RETRIEVAL_SYNC_SLOW_REQUIRES_EXPLICIT=true
# do not block deep mode on slow sources; return partial + job_id
ORCH_RETRIEVAL_SYNC_ASYNC_DEEP_BLOCKING=false
ORCH_RECALL_DEEP_ASYNC_DEFAULT_FOR_DEEP=true
ORCH_RECALL_DEEP_ASYNC_PARTIAL_ENABLED=true
ORCH_RECALL_DEEP_ASYNC_PARTIAL_MODE=fast
ORCH_RECALL_DEEP_ASYNC_PERSIST_ENABLED=true
ORCH_RECALL_DEEP_ASYNC_STORE_BACKEND=mongo
ORCH_RETRIEVAL_MONGO_RAW_DEEP_SYNC_ONLY_FOR_RAW_INTENT=true
# mode-level recall budgets
ORCH_RECALL_E2E_BUDGET_FAST_SECS=25
ORCH_RECALL_E2E_BUDGET_BALANCED_SECS=60
ORCH_RECALL_E2E_BUDGET_DEEP_SECS=75
# adaptive slow-source circuit and backlog gating
ORCH_RETRIEVAL_SLOW_SOURCE_CIRCUIT_SKIP_ENABLED=true
ORCH_RETRIEVAL_BACKLOG_GATING_ENABLED=true
# let blocked letta sources continue async warm without sync read blocking
ORCH_RETRIEVAL_BACKLOG_GATING_LETTA_ASYNC_WARM_ENABLED=true
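The poll-or-stream pattern for deep async jobs can be sketched as a generic loop. The fetch command is injected so you can plug in your real poll_url; the "running" status value and the `.status` jq field in the usage line are assumptions inferred from the job_id/poll_url contract above, not confirmed API details.

```shell
# Hedged sketch: poll a deep async job until it leaves the "running" state.
# $1: command that prints the current job status; $2: max tries; $3: delay (s).
poll_job() {
  fetch_cmd="$1"; tries="${2:-30}"; delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    state="$(sh -c "$fetch_cmd")"
    if [ "$state" != "running" ]; then
      echo "$state"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timeout"
  return 1
}

# usage (hypothetical poll_url and status field):
# poll_job 'curl -fsS -H "x-api-key: ${ORCH_KEY}" "$POLL_URL" | jq -r .status'
```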

Saved recall gate failing

Impact: release gates fail even though live retrieval is healthy, because the saved cases are stale or unscoped.

Action: refresh saved cases from hot pathways, then rerun the saved gate.

curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  -H "content-type: application/json" \
  -d '{"max_cases":5,"min_hits":2,"run_evaluation":true}' \
  http://127.0.0.1:8075/memory/recall/eval-cases/refresh | jq

curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  -H "content-type: application/json" \
  -d '{}' \
  http://127.0.0.1:8075/memory/recall/evaluate/saved | jq

Messaging bridge not posting

Impact: Telegram/Slack/OpenClaw commands do not write or recall memory.

Action: verify webhook env vars and smoke test the command endpoint directly.

curl -fsS -H 'content-type: application/json' \
  -H "x-api-key: ${ORCH_KEY}" \
  -d '{"channel":"openclaw","source_id":"chat-1","text":"@ContextLattice status"}' \
  http://127.0.0.1:8075/integrations/messaging/command | jq
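To check the env-var side first, a quick listing helper can confirm the bridge variables are present before restarting anything. The TELEGRAM/SLACK/OPENCLAW name patterns are assumptions; adjust the pattern to match your actual webhook variable names.

```shell
# Hedged check: list messaging-bridge variables in a .env file.
# The variable-name prefixes below are guesses; edit to match your setup.
list_bridge_vars() {
  grep -E '^(TELEGRAM|SLACK|OPENCLAW)[A-Z0-9_]*=' "${1:-.env}" \
    || echo "no messaging bridge vars found"
}

# usage: list_bridge_vars .env
```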

Optional Qdrant cloud BYO fails

Advanced
Optional cloud connectivity probe

Impact: cloud fallback or cloud-preferred mode cannot connect.

Action: validate endpoint + key, then run the cloud connectivity probe.

gmake qdrant-cloud-check

Mode-specific recovery

Lite and Full restart playbooks

Required

Lite mode recovery

Use this when core services are enough and you need fast recovery with lower resource use.

gmake mem-down-lite
gmake mem-up-lite
gmake mem-ps-lite
curl -fsS http://127.0.0.1:8075/health | jq
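After a restart, it can take a moment for /health to answer. A small wait loop avoids resuming writes too early; the URL and retry defaults below come from this guide's examples, not from any mandated configuration.

```shell
# Hedged helper: block until the health endpoint answers, or give up.
# $1: health URL; $2: max tries; $3: delay between tries (seconds).
wait_healthy() {
  url="${1:-http://127.0.0.1:8075/health}"; tries="${2:-30}"; delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# usage: gmake mem-up-lite && wait_healthy && echo "orchestrator is back"
```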

Full mode recovery

Use this when retrieval requires full analytics/RAG services and complete fanout coverage.

gmake mem-mode-full
gmake mem
gmake mem-ps
gmake mem-logs

Readiness gate

Check or cancel the scheduled 04:30 MT long-run gate

Advanced
Open launch-readiness schedule controls
gmake launch-readiness-gate-schedule-status
gmake launch-readiness-gate-schedule-cancel