ContextLattice
By Private Memory Corp

Private by default. Local-first intelligence by design.

Private memory layer for agents

Fix context drift before it burns your token budget.

ContextLattice gives agents reusable memory with HTTP MCP, federated retrieval, durable fanout, and storage guardrails so quality and costs remain stable under load.

Private-by-default memory & context layer for agents.

Different data backends are fused in one retrieval pipeline to improve recall coverage, precision, and resilience.

Less technical users: DMG (macOS), Linux bundle, or MSI (Windows) installers. Technical/dev users: the repo + ZIP remain the default.

Clone: git clone git@github.com:sheawinkler/ContextLattice.git

What it solves

Memory quality + cost control

Replace repetitive long-context stuffing with reusable memory retrieval and sink durability.

  • HTTP-preferred MCP
  • Federated retrieval
  • 100 msgs/sec writes
  • Durable outbox fanout
  • HTTP/messaging app interfacing (claw-ready)
  • Learning rerank
  • Letta RAG support
  • Qwen default model
How it runs

Qwen-first, any-model runtime

Ships with Qwen by default and can plug into your preferred stack via Ollama, llama.cpp, LM Studio, and compatible local gateways.
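Backend selection can be sketched as environment configuration. Only the `CONTEXTLATTICE_` variable prefix appears in the docs; the specific variable names below are assumptions for illustration, not confirmed settings:

```shell
# Assumed configuration sketch: only the CONTEXTLATTICE_ env prefix is
# documented; these specific variable names are illustrative placeholders.
export CONTEXTLATTICE_MODEL_BACKEND=ollama                # or: llamacpp, lmstudio
export CONTEXTLATTICE_MODEL=qwen2.5                       # Qwen is the shipped default
export CONTEXTLATTICE_MODEL_URL=http://127.0.0.1:11434    # your local gateway's port
```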

Latest platform step

100 msgs/sec write stability

Write path is tuned for ~100 messages/sec with queue backpressure, fanout coalescing, admission control, retry workers, and retention sweeps to keep sinks stable under burst load.
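A minimal burst smoke test can exercise that write lane from the client side. This is a sketch under assumptions: `/memory/write` is a hypothetical endpoint path (the docs only name the `/memory/*` prefix) and the payload shape is illustrative.

```shell
# Hypothetical burst-write smoke test. /memory/write is an assumed endpoint
# path (the docs only name the /memory/* prefix) and the payload shape is
# illustrative; adjust both to your deployment.
BASE=${BASE:-http://127.0.0.1:8075}

send_write() {
  # one authenticated write; relies on ORCH_KEY from the 60-second verify step
  curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
    -H 'Content-Type: application/json' \
    -d "{\"text\":\"burst probe $1\"}" \
    "${BASE}/memory/write"
}

burst_writes() {
  # fire N writes back to back; server-side backpressure and admission
  # control, not client pacing, are what should keep sinks stable
  n=$1; i=0
  while [ "$i" -lt "$n" ]; do
    send_write "$i" >/dev/null
    i=$((i+1))
  done
  echo "sent $n writes"
}

# Usage: burst_writes 100
```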

Quickstart

One command to launch safely

Use gmake quickstart for first-run setup, secure bootstrap, and health verification.

gmake quickstart
60-second verify

Prove service + auth in two calls

ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
Easy Monitoring

One command opens dashboard + live health checks

gmake monitor-open
# CLI-only checks:
gmake monitor-check

Dashboard URL: http://127.0.0.1:3000 (default local).

Auth note

Getting 401 on local requests?

Secure mode is on by default. Keep using /health without auth, and include x-api-key for protected endpoints like /status, /memory/*, and /telemetry/*.
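A quick way to see this behavior is to compare status codes with and without the key. `status_code` is a local helper defined here, not part of ContextLattice:

```shell
BASE=${BASE:-http://127.0.0.1:8075}

# print only the HTTP status code for a request
status_code() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Usage:
#   status_code "${BASE}/health"                              # open, no key needed
#   status_code "${BASE}/status"                              # expect 401 in secure mode
#   status_code -H "x-api-key: ${ORCH_KEY}" "${BASE}/status"  # expect 200
```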

Version Lanes

What is public now vs private next

Public v3.3 (current launch lane)

  • Current public release: v3.3.2
  • Frontdoor: gateway-go on :8075
  • Fallback: Python orchestrator on :18075
  • Memory-bank default: shodh_spike
  • Retrieval behavior: staged fast-return + async slow continuation
  • Personal computer target: HF/Glama 2-4 vCPU / 4-8 GB RAM / 20-50 GB SSD, Lite 2-4 vCPU / 8-12 GB RAM / 25-80 GB SSD, Full 6-8 vCPU / 12-20 GB RAM / 100-180 GB SSD (no spike-lab)
  • Release posture: stable baseline for public operators

Private v4 (tuning lane)

  • Frontdoor: same :8075 Go gateway contract
  • Policy: aggressive adaptive tuning and candidate promotions
  • Memory-bank: shodh_spike with deterministic fallback chain and optional hedge
  • Validation: benchmark + recall parity + soak gate before promotion
  • Personal computer target: start from Full baseline and add headroom, especially SSD (external NVMe recommended)
  • Release posture: private experimentation before any public cutover

Launch flow map

flowchart LR
  A[Agent or App] --> B["Public Lane: v3.3"]
  B --> C["Gateway-Go :8075"]
  C --> D["Fast sources now: topic_rollups + qdrant + postgres_pgvector"]
  C --> E["Slow async continuation: mindsdb + mongo_raw + letta + memory_bank"]
  E --> F["Cache warm + optional deep follow-up"]
  B --> G["Python fallback lane :18075 (rollback only)"]
  H["Private Lane: v4"] --> I["Same ingress contract + stricter tuning gates"]
  I --> J["Candidate promotions only after benchmark + recall + soak pass"]
Learning retrieval

Orchestrator gets better at memory recall over time

The orchestrator uses a learning schema from feedback signals to rerank results and improve retrieval precision over time. Fast staged reads prioritize topic rollups, Qdrant, and postgres-pgvector, while deep continuation incorporates MindsDB, Mongo raw, Letta, and memory-bank.

Read the detailed rollout notes on the Updates page and the execution plan on the V3 Roadmap.

How it all works together

Unified write + retrieval loop through the orchestrator

Every write enters through the orchestrator, which records durable raw data, fans out to specialized stores, and continuously protects queue and storage health. Every search comes back through the same orchestrator so results can be fused, reranked, and improved over time from feedback.

  • Write intake
  • Outbox fanout
  • Federated search
  • Learning rerank
  • Retention + guardrails
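That loop can be sketched as two thin client helpers. `/memory/write` and `/memory/search` are assumed endpoint names (only the `/memory/*` prefix is documented), so adjust them to your deployment:

```shell
# Sketch of the write -> retrieve loop through the single orchestrator
# frontdoor. /memory/write and /memory/search are assumed endpoint names
# (only the /memory/* prefix is documented); adjust to your deployment.
BASE=${BASE:-http://127.0.0.1:8075}

cl_write() {
  # durable intake: the orchestrator validates, ledgers, and fans out
  curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
    -H 'Content-Type: application/json' \
    -d "{\"text\":\"$1\"}" "${BASE}/memory/write"
}

cl_search() {
  # staged retrieval: fast sources answer first, deep lane continues async
  curl -fsS -H "x-api-key: ${ORCH_KEY}" "${BASE}/memory/search?q=$1"
}

# Usage:
#   cl_write "prod deploy uses blue-green on Fridays"
#   cl_search "deploy%20schedule" | jq
```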
Service Map

Data Flow

Receive + Store

Write Flow

Client write request → Orchestrator (validation) → raw write to the Mongo raw ledger → fanout to Memory Bank + topic_rollups, Qdrant + postgres_pgvector, MindsDB, and Letta → async sink updates with telemetry + retries.

Single orchestrator spine, explicit method stages, and parallel fanout branches to all write sinks.

Read + Return

Retrieval Flow

Client search request → Orchestrator → federate across topic rollups, Qdrant, Postgres pgvector, Mongo raw, and deep-lane sources → rerank → ranked response; feedback updates the learning schema, and context is returned with source confidence.

Federated sources converge to the orchestrator spine, then reranked results return with learning feedback.

Orchestrator

Benefit: one control plane for writes, retrieval, and policy.

Why: central coordination is what allows multi-source ranking and learning to compound.

Topic Rollups

Benefit: compact, high-signal summaries for fast staged recall.

Why: reduces deep-read pressure while preserving source grounding for follow-up dives.

Qdrant

Benefit: high-speed semantic recall.

Why: vector retrieval gives broad relevance quickly before deeper reranking.

Postgres + pgvector

Benefit: low-latency vector + lexical retrieval in one lane.

Why: improves fast-read hit rate and stabilizes p95 latency under mixed workloads.

Mongo Raw

Benefit: durable source-of-truth write ledger.

Why: protects recoverability and enables replay/rehydrate workflows.

Deep lane (MindsDB + Letta + memory-bank)

Benefit: richer long-horizon recall when fast lanes need deeper evidence.

Why: async continuation improves completeness without blocking fast user responses.

Fanout Outbox

Benefit: resilient async delivery with retries, coalescing, and admission control.

Why: prevents sink instability from breaking ingestion reliability.

Retention + Telemetry

Benefit: bounded storage growth and observable runtime behavior.

Why: operational stability is required for learning retrieval to stay trustworthy.

Why this boosts learning retrieval impact

Learning is strongest when memory is both rich and reliable

The orchestrator's learning schema can only improve ranking if retrieval sources stay healthy, durable, and synchronized. This architecture makes that possible: topic rollups + Qdrant + postgres-pgvector provide fast candidates, deep continuation adds MindsDB/Letta/memory-bank evidence, Mongo guarantees recovery, and guardrails keep the full loop from collapsing under pressure.

Flexible Launch

Deployment Modes

Hugging Face / Glama lite

Single-container lane focused on compatibility and low footprint.

  • App version lane: Public v3.3.x
  • Includes: gateway + orchestrator container with topic-rollup-first retrieval
  • Compute: 2-4 vCPU recommended
  • Memory: 4-8 GB RAM baseline
  • Storage: 20-50 GB SSD depending on retention settings

Lite mode

Best for local development and constrained laptops where stable memory services matter more than deep analytics.

  • App version lane: Public v3.3.x
  • Includes: Gateway-Go frontdoor, orchestrator core, Memory Bank MCP, Mongo raw, Qdrant, outbox fanout, retention workers
  • Fast staged retrieval: topic_rollups + qdrant + postgres_pgvector
  • Compute: 2-4 vCPU recommended
  • Memory: 8-12 GB RAM baseline
  • Storage: 25-80 GB SSD depending on write volume

Full mode

Best for high-write workloads and richer retrieval where learning loops use every sink, including RAG through Letta.

  • App version lane: Public v3.3.x Full, and baseline for private v4 tuning
  • Includes: Lite mode plus MindsDB analytics, Letta archival memory, observability stack, and full rehydrate tooling
  • Deep continuation lane: async enrichment from mindsdb + mongo_raw + letta + memory_bank
  • Compute: 6-8 vCPU recommended
  • Memory: 12-20 GB RAM baseline (without spike-lab)
  • Storage: 100-180 GB SSD depending on retention policy
  • Spike-lab active: 24-32 GB RAM and 180-300 GB SSD/NVMe
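The Full-mode floor above can be checked mechanically before install. A minimal preflight sketch, with thresholds taken from the baselines listed here (the live-probe commands in the trailing comment assume Linux):

```shell
# Check reported resources against the Full-mode baseline
# (6 vCPU, 12 GB RAM, 100 GB SSD), per the deployment table above.
preflight_full() {
  cpus=$1; ram_gb=$2; disk_gb=$3
  status=OK
  [ "$cpus" -lt 6 ]      && { echo "WARN: need >=6 vCPU, found $cpus"; status=WARN; }
  [ "$ram_gb" -lt 12 ]   && { echo "WARN: need >=12 GB RAM, found $ram_gb"; status=WARN; }
  [ "$disk_gb" -lt 100 ] && { echo "WARN: need >=100 GB free SSD, found $disk_gb"; status=WARN; }
  echo "$status"
}

# Gather live numbers (Linux):
# preflight_full "$(nproc)" \
#   "$(free -g | awk '/^Mem:/{print $2}')" \
#   "$(df -BG --output=avail . | tail -1 | tr -dc '0-9')"
```

Running with spike-lab active, raise the RAM threshold to 24 and the disk threshold to 180 to match the figures above.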