Local-first memory infrastructure for AI agents

Stop giving your agents amnesia and calling it workflow.

ContextLattice gives Codex, Cursor, Claude Desktop, Open WebUI, and custom MCP agents one durable place to remember decisions, evidence, skills, checkpoints, behavior, and project context, then compile that memory into the next sharp model prompt.

CLI + HTTP/MCP + compiled context packets + async recall progress + agent steering


CLI: start, search, compile
Skills: searchable capability index
Learning: feedback-aware retrieval
Scale: graph + deep recall lanes

Download options + technical bundles

Download macOS DMG
Download Linux Bundle
Download Windows MSI App Surface
Download main (ZIP)

Less technical users: DMG/Linux bundle/MSI. Technical/dev users: repo + ZIP remain the default.

Clone: git clone git@github.com:sheawinkler/ContextLattice.git

Agent quickstart

If an agent is reading this page, give it this prompt.

One instruction, one repo, one doctor pass. It tells the agent to install ContextLattice, wire only detected harnesses, and report proof instead of guessing.

Navigate to github.com/sheawinkler/contextlattice, install ContextLattice using the documented quickstart, run the doctor, then integrate only with agent harnesses that are actually installed on this machine. Configure hooks/agent files for the detected agents, verify ContextLattice recall/writeback works, and report exact commands plus any skipped integrations.

Platform Overview (architecture diagram): ingress, context spine, memory fabric, evolution engine, event fabric, and agent coordination in one runtime map.


Two lanes

Simple enough to install. Deep enough to become your agent memory infrastructure.

For builders

Give every agent the same memory contract.

Plug Codex, Cursor, Claude Desktop, Open WebUI, Claude Code, and custom MCP agents into one local-first layer for writes, recall, compiled context packets, prompt-ready session summaries, and repeatable handoff.

CLI-first workflow: contextlattice_agent_start, contextlattice_search, and contextlattice_checkpoint.
Agent templates: copy-ready instructions for Codex, Claude Code, OpenCode, Hermes, OMP, Mercury, Pi, Droid, ChatGPT, and Claude.
Context compiler: turn durable memory, ranked evidence, risks, files, and checks into a sharp reference packet for the next model call.
Skills Index: discover quarantined capabilities without loading every skill into every agent context.

For operators

Keep memory useful when the work gets heavier.

Move beyond "save some chat logs" into durable write truth, staged retrieval, behavior provenance, learning feedback, and graph-aware recall that can grow with the team.

Durable fanout: write once, then route to rollups, vectors, ledgers, and deeper stores.
Learning loops: feedback and eval cases improve ranking instead of freezing recall quality in place.
Behavior provenance: preserve decisions, evidence, checkpoints, and agent/session context for audit-grade handoff.

Full suite

Not just memory. The operating layer around memory.

ContextLattice packages durable memory, retrieval policy, session rollups, prompt-ready context packets, skills discovery, CLI workflows, templates, learning, provenance, and deep-memory lanes behind one local contract.

Plain English

Your tools remember the same work.

Less replaying context. Less copy-pasting transcripts. Less "wait, what were we doing?" Your agents pick up decisions, evidence, and project state from the same shared memory spine.

Technical proof

Local-first, multi-lane, measurable.

Public local lite starts with topic rollups and Qdrant. Full/operator stacks can add pgvector, raw ledger, async continuation, memory-bank lanes, graph maintenance, and stronger reliability controls.

Quickstart

One command to launch safely

Use gmake quickstart for first-run setup, secure bootstrap, and health verification.

gmake quickstart

Agent crawlers and assistants should parse llms.txt first.

60-second verify

Prove service, auth, and agent memory

ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
scripts/agent/agent-runtime-proof-pack --pretty
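
The key-extraction step above can be wrapped in a small reusable helper. This is a minimal sketch, not part of the ContextLattice tooling: the function name cl_env_value is hypothetical, and it assumes .env holds plain KEY=value lines with no quoting or export prefixes.

```shell
# Hypothetical helper (not shipped with ContextLattice): print the value of a
# key from a dotenv-style file. Assumes plain KEY=value lines, no quoting.
cl_env_value() {
  local key="$1" file="${2:-.env}"
  # Split on "=" so $1 is the key, then print everything after the first "="
  # so values that themselves contain "=" survive intact. Stop at first match.
  awk -F= -v k="$key" '$1 == k { print substr($0, index($0, "=") + 1); exit }' "$file"
}

# Usage against a running local stack (endpoint and header as in the commands above):
#   ORCH_KEY="$(cl_env_value CONTEXTLATTICE_ORCHESTRATOR_API_KEY .env)"
#   curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
```

If the key is missing, the helper prints nothing, so an empty ORCH_KEY is a quick signal that bootstrap has not written the key yet.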

Easy Monitoring

One command opens dashboard + live health checks

gmake monitor-open
# CLI-only checks:
gmake monitor-check

Dashboard URL: http://127.0.0.1:3000 (default local).

Auth note

Getting 401 on local requests?

Secure mode is on by default. /health stays open without auth; protected endpoints such as /status, /memory/*, and /telemetry/* require the x-api-key header.
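
The auth rule can be sketched as a small wrapper that adds the header only where it is needed. The function names cl_needs_api_key and cl_curl and the CL_BASE_URL variable are hypothetical, and treating every path other than /health as protected is an assumption, since only /status, /memory/*, and /telemetry/* are named here.

```shell
# Hypothetical helper: succeed (exit 0) when a request path needs x-api-key.
cl_needs_api_key() {
  case "$1" in
    /health) return 1 ;;                          # open endpoint
    /status|/memory/*|/telemetry/*) return 0 ;;   # protected per the auth note
    *) return 0 ;;                                # assumption: protected by default
  esac
}

# Hypothetical wrapper: call a local endpoint, adding the key only when required.
# CL_BASE_URL is an illustrative override; the default matches the local gateway.
cl_curl() {
  local path="$1"; shift
  local base="${CL_BASE_URL:-http://127.0.0.1:8075}"
  if cl_needs_api_key "$path"; then
    curl -fsS -H "x-api-key: ${ORCH_KEY:?set ORCH_KEY first}" "$@" "${base}${path}"
  else
    curl -fsS "$@" "${base}${path}"
  fi
}

# Usage:
#   cl_curl /health | jq
#   cl_curl /status | jq '.service,.sinks'
```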

Hosted default

Production host split for paid launch

Public marketing/docs: https://contextlattice.io
Paid app + dashboard + billing API: https://app.contextlattice.io
Billing infrastructure: managed through deployment-specific configuration.

Version Lanes

What is public now

Public v3.17 (current release train)

Current public release: v3.17.2
Primary interface: CLI workflows for install, context, resume, remember, finish, correct, and doctor; dashboard, HTTP, and MCP are companion surfaces
Frontdoor: gateway-go on :8075
Runtime boundary: the active application path is Go/Rust; legacy Python is archived and not launched.
Lite memory default: topic rollups, Qdrant recall, and memory edges/neighbors
Cognition core: Temporal Claim Graph, advisor-only Adaptive Retrieval Planner, and Proof-Carrying Synthesis v2 with explicit support, opposition, uncertainty, and missing proof
Learning core: calibration-eligible outcomes create one-step shadow/canary policy candidates; repeated verified workflows become independently tested, human-approved, inactive skill exports with explicit, non-destructive retirement
Graph intelligence: identity-first repair reconnects durable memory in bounded batches; explicit neighbor holdouts require positive hydrated graph lift without hiding a direct-recall regression
Portable context: signed, expiring Context Passports can be verified, diffed, replay-planned, imported, and encrypted to explicit project-scoped Context Mesh recipients without giving ContextLattice a delivery channel
Retrieval behavior: compact proof-carrying Agent Packets, intent-aware ranking, epistemic refusal, staged fast-return, monotonic async recall steering, transport-inclusive token economics, and automatic outcome telemetry
Personal computer target: HF/Glama 2-4 vCPU / 4-8 GB RAM / 20-50 GB SSD, Lite 2-4 vCPU / 8-12 GB RAM / 25-80 GB SSD, Full 6-8 vCPU / 12-20 GB RAM / 100-180 GB SSD (no spike-lab)
Release posture: stable agent operating layer for public operators

Optional public adapter lab

Mode: opt-in public local Lite advanced, started with gmake mem-up-lite-advanced
Purpose: evaluate memory-bank adapters without making them quickstart dependencies
Interface: same CLI-first agent contract; dashboard, HTTP, and MCP remain companion surfaces
Boundary: not a paid/private feature gate and not required for normal users
Validation: operator-controlled, evidence-first testing before any adapter is promoted into defaults

Launch flow map

flowchart LR
  A[Agent or App] --> B["Public Lane: v3.17"]
  B --> C["Gateway-Go :8075"]
  C --> D["Public default fast sources: topic_rollups + qdrant"]
  C --> E["Slow async continuation: mindsdb + mongo_raw + letta + memory_bank"]
  E --> F["Cache warm + optional deep follow-up"]
  B --> G["Strict Go/Rust ownership audit"]
  B --> H["Optional public adapter lab"]
  H --> I["Operator-controlled tests only"]
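
The fast and slow lanes in the flow map can be sketched as a simple classification of source names. This is an illustration only: cl_lane_for_source and cl_partition_sources are hypothetical names, the mapping covers only the sources named in the map, and it says nothing about how the gateway actually schedules them.

```shell
# Hypothetical helper: map a source name from the flow map to its lane.
cl_lane_for_source() {
  case "$1" in
    topic_rollups|qdrant) echo fast ;;                 # public default fast sources
    mindsdb|mongo_raw|letta|memory_bank) echo slow ;;  # slow async continuation
    *) echo unknown; return 1 ;;                       # not named in the flow map
  esac
}

# Hypothetical helper: split a list of sources into the two lanes.
# Sources that are not in the flow map are skipped.
cl_partition_sources() {
  local fast="" slow="" s lane
  for s in "$@"; do
    lane="$(cl_lane_for_source "$s")" || lane=unknown
    case "$lane" in
      fast) fast="${fast:+$fast }$s" ;;
      slow) slow="${slow:+$slow }$s" ;;
    esac
  done
  printf 'fast: %s\nslow: %s\n' "$fast" "$slow"
}

# Usage:
#   cl_partition_sources topic_rollups mindsdb qdrant memory_bank
```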

Learning retrieval

Orchestrator gets better at memory recall over time

The orchestrator uses a learning schema derived from feedback signals to rerank results, so retrieval precision improves over time. Public local fast staged reads prioritize topic rollups and Qdrant, while full/operator continuation can also incorporate pgvector, MindsDB, Mongo raw, Letta, and memory-bank.

Read the detailed rollout notes on the Updates page and the execution plan on the V3 Roadmap.

How it all works together

Unified write + retrieval loop through the orchestrator

Every write enters through the orchestrator, which records durable raw data, fans out to specialized stores, and continuously protects queue and storage health. Every search comes back through the same orchestrator so results can be fused, reranked, and improved over time from feedback.

Write intake
Outbox fanout
Federated search
Learning rerank
Retention + guardrails

Service Map

Data Flow

Receive + Store

Write Flow

Single orchestrator spine, explicit method stages, and parallel fanout branches to all write sinks.

Read + Return

Retrieval Flow

Federated sources converge on the orchestrator spine, which returns reranked results along with learning feedback.

Orchestrator

Benefit: one control plane for writes, retrieval, and policy.

Why: central coordination is what allows multi-source ranking and learning to compound.

Topic Rollups

Benefit: compact, high-signal summaries for fast staged recall.

Why: reduces deep-read pressure while preserving source grounding for follow-up dives.

Qdrant

Benefit: first-class local vector engine.

Why: payload-heavy filtering, quantization, snapshots, and distributed vector deployments keep the lite and full vector lanes aligned.

Postgres + pgvector

Benefit: SQL-co-located vector retrieval for full/operator stacks.

Why: joins, relational backups, and Postgres-native operations remain valuable when users already run the SQL lane.

Mongo Raw

Benefit: durable source-of-truth write ledger.

Why: protects recoverability and enables replay/rehydrate workflows.

Deep lane (MindsDB + Letta + memory-bank)

Benefit: richer long-horizon recall when fast lanes need deeper evidence.

Why: async continuation improves completeness without blocking fast user responses.

Fanout Outbox

Benefit: resilient async delivery with retries, coalescing, and admission control.

Why: prevents sink instability from breaking ingestion reliability.

Retention + Telemetry

Benefit: bounded storage growth and observable runtime behavior.

Why: operational stability is required for learning retrieval to stay trustworthy.

Why this boosts learning retrieval impact

Learning is strongest when memory is both rich and reliable

The orchestrator's learning schema can only improve ranking if retrieval sources stay healthy, durable, and synchronized. This architecture supports that: topic rollups and Qdrant provide fast candidates, deep continuation adds MindsDB, Letta, and memory-bank evidence, the Mongo raw ledger protects recoverability, and guardrails keep the full loop from collapsing under pressure.

Flexible Launch

Deployment Modes

Hugging Face / Glama lite

Single-container lane focused on compatibility and low footprint.

App version lane: Public v3.17.x
Includes: gateway + orchestrator container with topic-rollup-first retrieval
Compute: 2-4 vCPU recommended
Memory: 4-8 GB RAM baseline
Storage: 20-50 GB SSD depending on retention settings

Lite mode

Best for local development and constrained laptops where stable memory services matter more than deep analytics.

App version lane: Public v3.17.x
Includes: Gateway-Go frontdoor, orchestrator core, Memory Bank MCP, Mongo raw, Qdrant, outbox fanout, retention workers
Fast staged retrieval: topic_rollups + qdrant by default; pgvector remains first-class for full/operator stacks
Compute: 2-4 vCPU recommended
Memory: 8-12 GB RAM baseline
Storage: 25-80 GB SSD depending on write volume

Full mode

Best for high-write workloads and richer retrieval where learning loops use every sink, including RAG through Letta.

App version lane: Public v3.17.x Full
Includes: Lite mode plus MindsDB analytics, Letta archival memory, observability stack, and full rehydrate tooling
Deep continuation lane: async enrichment from mindsdb + mongo_raw + letta + memory_bank
Compute: 6-8 vCPU recommended
Memory: 12-20 GB RAM baseline (without spike-lab)
Storage: 100-180 GB SSD depending on retention policy
Spike-lab active: 24-32 GB RAM and 180-300 GB SSD/NVMe
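
The RAM baselines above can be turned into a quick preflight check. This is a minimal sketch under stated assumptions: the mode labels (hf, glama, lite, full, full-spike-lab) and the function names are hypothetical, and the check uses only the lower bound of each documented range.

```shell
# Hypothetical helper: lower bound of the documented RAM baseline, in GB.
cl_min_ram_gb() {
  case "$1" in
    hf|glama) echo 4 ;;          # Hugging Face / Glama lite: 4-8 GB
    lite) echo 8 ;;              # Lite mode: 8-12 GB
    full) echo 12 ;;             # Full mode without spike-lab: 12-20 GB
    full-spike-lab) echo 24 ;;   # Full mode with spike-lab active: 24-32 GB
    *) return 1 ;;
  esac
}

# Hypothetical preflight: succeed when the given RAM (GB) meets the baseline.
cl_ram_ok() {
  local mode="$1" have_gb="$2" need_gb
  need_gb="$(cl_min_ram_gb "$mode")" || { echo "unknown mode: $mode" >&2; return 2; }
  [ "$have_gb" -ge "$need_gb" ]
}

# Usage:
#   cl_ram_ok lite 16 && echo "RAM meets the Lite baseline"
```

The RAM figure has to come from the host: on Linux, free -g reports it in GB, and on macOS, sysctl -n hw.memsize reports it in bytes.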