Memory quality + cost control
Replace repetitive long-context stuffing with reusable memory retrieval and sink durability.
Context Lattice (memMCP) gives agents reusable memory with HTTP MCP, federated retrieval, durable fanout, and storage guardrails so quality and costs remain stable under load.
One intelligent memory layer for apps and agents.
Ships with Qwen by default and can plug into your preferred stack via Ollama, llama.cpp, LM Studio, and compatible local gateways.
Write path is tuned for ~100 messages/sec with queue backpressure, fanout coalescing, admission control, retry workers, and retention sweeps to keep sinks stable under burst load.
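As a rough illustration of those write-path guardrails, here is a minimal Python sketch of a bounded queue with admission control, per-key coalescing, and a retrying fanout worker. All names, limits, and interfaces (memory_id, sink_write, the retry budget) are illustrative assumptions, not memMCP's actual implementation.

import asyncio
import random

QUEUE_MAX = 1000  # backpressure bound: ~10s of headroom at 100 msg/s (assumed)

queue: asyncio.Queue = asyncio.Queue(maxsize=QUEUE_MAX)
pending: dict[str, dict] = {}  # coalescing map: latest write wins per key

def admit(write: dict) -> bool:
    """Admission control: refuse new work instead of growing without bound."""
    key = write["memory_id"]
    if key in pending:
        pending[key].update(write)  # coalesce a burst into one sink operation
        return True
    if queue.full():
        return False  # caller surfaces a 429-style rejection
    pending[key] = write
    queue.put_nowait(key)
    return True

async def fanout_worker(sink_write) -> None:
    """Drain the queue and deliver each coalesced write with retries."""
    while True:
        key = await queue.get()
        write = pending.pop(key)
        for attempt in range(5):
            try:
                await sink_write(write)  # one fanout branch
                break
            except Exception:
                # exponential backoff with jitter before retrying the sink
                await asyncio.sleep(2 ** attempt + random.random())
        queue.task_done()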
Use gmake quickstart for first-run setup, secure bootstrap, and health verification.
# First-run setup, secure bootstrap, and health verification
gmake quickstart
# Read the generated orchestrator API key out of .env
ORCH_KEY="$(awk -F= '/^MEMMCP_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
# Unauthenticated liveness check
curl -fsS http://127.0.0.1:8075/health | jq
# Authenticated status check: service metadata plus sink health
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
Secure mode is on by default: /health stays reachable without auth, while protected endpoints such as /status, /memory/*, and /telemetry/* require the x-api-key header.
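For programmatic clients, the same split looks like this in Python. This is a minimal sketch assuming the requests library and the ORCH_KEY environment variable from the quickstart; it is not shipped memMCP tooling.

import os
import requests

BASE = "http://127.0.0.1:8075"
KEY = os.environ["ORCH_KEY"]  # exported from .env, as in the quickstart

# /health stays open: no header required
print(requests.get(f"{BASE}/health", timeout=5).json())

# Protected endpoints require the x-api-key header
resp = requests.get(f"{BASE}/status", headers={"x-api-key": KEY}, timeout=5)
resp.raise_for_status()
print(resp.json().get("sinks"))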
The orchestrator maintains a learning schema fed by feedback signals, reranking results to improve retrieval precision over time. This is reinforced by RAG through Letta archival memory, alongside Qdrant, raw Mongo storage, MindsDB, and a memory-bank fallback.
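To make the learning loop concrete, here is a hedged sketch of feedback-weighted rank fusion: each sink contributes a ranked candidate list, fused scores are scaled by per-source weights, and feedback nudges those weights. The reciprocal-rank formula and the multiplicative update rule are illustrative assumptions, not memMCP's published learning schema.

from collections import defaultdict

# Learned per-source weights; all sinks start equal.
weights = {"qdrant": 1.0, "letta": 1.0, "mongo": 1.0, "mindsdb": 1.0, "memory_bank": 1.0}

def fuse(results_by_source: dict[str, list[str]]) -> list[str]:
    """Fuse ranked doc-id lists from each source into one reranked list."""
    fused: defaultdict[str, float] = defaultdict(float)
    for source, hits in results_by_source.items():
        for rank, doc_id in enumerate(hits):
            # reciprocal-rank fusion, scaled by the learned source weight
            fused[doc_id] += weights.get(source, 1.0) * (1.0 / (60 + rank))
    return sorted(fused, key=fused.get, reverse=True)

def feedback(source: str, helpful: bool, lr: float = 0.05) -> None:
    """Nudge a source's weight up or down from an agent/user signal."""
    weights[source] *= (1 + lr) if helpful else (1 - lr)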
Read the detailed rollout notes on the Updates page.
Every write enters through the orchestrator, which durably records the raw data, fans out to specialized stores, and continuously protects queue and storage health. Every search returns through the same orchestrator so results can be fused, reranked, and improved over time from feedback.
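A minimal sketch of that write spine, assuming hypothetical raw_store and sink interfaces: the source-of-truth ledger is written first, then fanout runs in parallel, and a failed branch is queued for retry rather than failing the write.

import asyncio

async def orchestrate_write(record: dict, raw_store, sinks: list) -> None:
    """Durable-first write: land the raw record, then fan out in parallel."""
    await raw_store.append(record)  # source-of-truth ledger is written first
    results = await asyncio.gather(
        *(sink.write(record) for sink in sinks),  # parallel fanout branches
        return_exceptions=True,
    )
    for sink, result in zip(sinks, results):
        if isinstance(result, Exception):
            await sink.enqueue_retry(record)  # retry later; never lose the write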
Single orchestrator spine, explicit method stages, and parallel fanout branches to all write sinks.
Federated sources converge to the orchestrator spine, then reranked results return with learning feedback.
Orchestrator
Benefit: one control plane for writes, retrieval, and policy.
Why: central coordination is what allows multi-source ranking and learning to compound.

Memory-bank
Benefit: canonical project/file context store.
Why: keeps user-facing memory deterministic and compatible with MCP-native clients.

Qdrant
Benefit: high-speed semantic recall.
Why: vector retrieval gives broad relevance quickly before deeper reranking.

Mongo
Benefit: durable source-of-truth write ledger.
Why: protects recoverability and enables repair/rehydrate workflows.

MindsDB
Benefit: SQL-friendly analytics and structured querying.
Why: complements semantic search with tabular and operational insight paths.

Letta
Benefit: long-horizon archival context for agent reasoning.
Why: deep memory context improves difficult recall beyond nearest-neighbor hits.

Fanout queue
Benefit: resilient async delivery with retries, coalescing, and admission control.
Why: prevents sink instability from breaking ingestion reliability.

Storage guardrails
Benefit: bounded storage growth and observable runtime behavior.
Why: operational stability is required for learning retrieval to stay trustworthy (see the retention sketch after this list).
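The guardrails item above points here: a small sketch of a retention sweep that bounds storage growth, assuming a hypothetical store interface with TTL expiry and an oldest-first size cap. memMCP's actual sweep policy may differ.

import time

def retention_sweep(store, ttl_seconds: int, max_bytes: int) -> None:
    """Bound storage growth: expire old records, then trim to a byte budget."""
    cutoff = time.time() - ttl_seconds
    store.delete_where(lambda rec: rec["ts"] < cutoff)  # TTL expiry
    while store.size_bytes() > max_bytes:
        store.delete_oldest()  # size cap enforced oldest-first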
The orchestrator's learning schema can only improve ranking if retrieval sources stay healthy, durable, and synchronized. This architecture makes that possible: Qdrant provides fast candidates, Letta supplies deeper RAG context, Mongo guarantees recovery, MindsDB adds structured recall, and guardrails keep the full loop from collapsing under pressure.
Best for local development and constrained laptops where stable memory services matter more than deep analytics.
Best for high-write workloads and richer retrieval where learning loops use every sink, including RAG through Letta.