Troubleshooting Guide | Context Lattice

Fast checks

First 60-second diagnostics

docker compose ps
ORCH_KEY="$(awk -F= '/^MEMMCP_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq

A 401 response almost always means the x-api-key header is missing or incorrect. Re-read the key from .env and restart the calling process if needed.
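When a 401 shows up, it can help to confirm that the key was actually read and to look at the raw status code. The following is a minimal sketch, not part of Context Lattice: the helper names read_env_value and explain_status are illustrative, and it assumes .env holds plain, unquoted KEY=value lines.

```shell
# Hypothetical helper: print the value of KEY from an env file,
# keeping any "=" characters that appear inside the value.
read_env_value() {
  awk -v k="$1" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "${2:-.env}"
}

# Hypothetical helper: turn an HTTP status code into a short hint.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "x-api-key missing or incorrect" ;;
    000) echo "no response: is the orchestrator running?" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Example use against the /status endpoint from this guide
# (curl prints 000 as the status when it cannot connect):
#   ORCH_KEY="$(read_env_value MEMMCP_ORCHESTRATOR_API_KEY)"
#   [ -n "$ORCH_KEY" ] || echo "key not found in .env"
#   explain_status "$(curl -s -o /dev/null -w '%{http_code}' \
#     -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status)"
```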

If /health fails, check the orchestrator logs first, because both write and read fanout depend on the orchestrator.

Installation failures

Common startup blockers

Compose env not loaded

Symptom: services boot with missing env vars or wrong profile behavior.

Fix: recreate the infra/compose/.env symlink and relaunch the stack.

ln -svf ../../.env infra/compose/.env
gmake mem
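Before relaunching, it may be worth confirming that the symlink resolves to a readable file. This is a sketch under assumptions: check_env_link is a hypothetical helper, and the default path is the one used in the ln command above.

```shell
# Hypothetical helper: report whether a path is a symlink that
# resolves to a readable file. Returns 1 on any problem.
check_env_link() {
  link="${1:-infra/compose/.env}"
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link"
    return 1
  fi
  # -r follows the link, so a dangling symlink fails here.
  if [ ! -r "$link" ]; then
    echo "dangling or unreadable: $link -> $(readlink "$link")"
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}

# Example, run from the repository root:
#   check_env_link infra/compose/.env
```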

Port conflicts

Symptom: containers exit immediately with bind errors.

Fix: stop conflicting processes, then restart target services.

docker compose down --remove-orphans
docker compose up -d --build
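To find out which process holds a port before stopping it, something like the following can help. It is a sketch that assumes lsof is installed; who_listens is a hypothetical helper, and 8075 is used only because it is the orchestrator port that appears elsewhere in this guide.

```shell
# Hypothetical helper: show the process listening on a TCP port.
# Processes owned by other users may only be visible with sudo.
who_listens() {
  port="$1"
  if ! command -v lsof >/dev/null 2>&1; then
    echo "lsof not installed"
    return 2
  fi
  # -n and -P skip host and port name lookups; -sTCP:LISTEN keeps listeners only.
  lsof -nP -iTCP:"$port" -sTCP:LISTEN || echo "nothing listening on $port"
}

# Example:
#   who_listens 8075
```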

Memory MCP EPIPE

Symptom: memory gateway crashes after initialize.

Fix: rebuild the memorymcp-http image and restart the service.

docker compose build memorymcp-http
docker compose up -d memorymcp-http

Service troubleshooting

Read/write path recovery

Orchestrator unhealthy

Impact: no coordinated writes or retrieval fanout.

Action: inspect the logs, then restart the orchestrator service.

docker compose logs -f memmcp-orchestrator
docker compose up -d memmcp-orchestrator

Fanout backlog rising

Impact: partial sink coverage and delayed consistency.

Action: check the fanout telemetry and queue settings. Retry workers catch up asynchronously, so the backlog may take some time to drain.

curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
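Because the backlog drains asynchronously, a single reading says little. One option is to sample the endpoint a few times and compare the results. The poll helper below is a hypothetical sketch, and the sample count and interval in the example are arbitrary.

```shell
# Hypothetical helper: run a command COUNT times, DELAY seconds apart,
# printing a UTC timestamp before each run.
poll() {
  count="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    date -u +%H:%M:%SZ
    "$@" || echo "command failed"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$delay"
    fi
  done
}

# Example: five samples, ten seconds apart, using the command above.
#   fanout() {
#     curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
#   }
#   poll 5 10 fanout
```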

Disk pressure

Impact: service resets or degraded latency under retention lag.

Action: trigger a retention run immediately, then verify the Qdrant and Mongo volume paths.

curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention/run | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq
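To see how full the underlying disk is before and after a retention run, a small check like this may be useful. It is a sketch: disk_used_pct is a hypothetical helper, and the path in the example is a placeholder for wherever your Qdrant and Mongo volumes actually live.

```shell
# Hypothetical helper: print the used-space percentage (digits only)
# of the filesystem that holds the given path.
disk_used_pct() {
  # -P forces the POSIX single-line format, so column 5 is the capacity.
  df -P "${1:-.}" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Example (replace the path with your own volume location):
#   echo "volume filesystem is $(disk_used_pct /path/to/volumes)% full"
# Docker can also report per-volume sizes:
#   docker system df -v
```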

MindsDB or Letta drift

Impact: weaker full-mode retrieval coverage.

Action: restart the affected services, then trigger your rehydrate workflow once fanout load is stable.

docker compose up -d mindsdb mindsdb-http-proxy letta

Messaging bridge not posting

Impact: Telegram/Slack/OpenClaw commands do not write or recall memory.

Action: verify webhook env vars and smoke test the command endpoint directly.

curl -fsS -H 'content-type: application/json' \
  -H "x-api-key: ${ORCH_KEY}" \
  -d '{"channel":"openclaw","source_id":"chat-1","text":"@ContextLattice status"}' \
  http://127.0.0.1:8075/integrations/messaging/command | jq

Optional Qdrant cloud BYO fails

Impact: cloud fallback or cloud-preferred mode cannot connect.

Action: validate the endpoint and API key, then run the cloud connectivity probe.

gmake qdrant-cloud-check

Mode-specific recovery

Lite and Full restart playbooks

Lite mode recovery

Use this when core services are enough and you need fast recovery with lower resource use.

gmake mem-down-lite
gmake mem-up-lite
gmake mem-ps-lite
curl -fsS http://127.0.0.1:8075/health | jq
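Services can take a moment to come up after a restart, so a single health check right away may fail even when recovery is going well. A retry loop is one way to handle that. The wait_until helper below is a hypothetical sketch, and the attempt count and delay in the example are arbitrary.

```shell
# Hypothetical helper: retry a command up to TRIES times, DELAY seconds
# apart, and stop as soon as it succeeds. Returns 1 if it never does.
wait_until() {
  tries="$1"
  delay="$2"
  shift 2
  n=1
  while [ "$n" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $n attempt(s)"
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  echo "not ready after $tries attempt(s)"
  return 1
}

# Example: wait up to about a minute for the health endpoint.
#   wait_until 30 2 curl -fsS http://127.0.0.1:8075/health
```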

Full mode recovery

Use this when retrieval requires full analytics/RAG services and complete fanout coverage.

gmake mem-mode-full
gmake mem
gmake mem-ps
gmake mem-logs

Readiness Gate

Check or cancel the scheduled 04:30 MT long-run gate

gmake launch-readiness-gate-schedule-status
gmake launch-readiness-gate-schedule-cancel