Memory quality + cost control
Replace repetitive long-context stuffing with reusable memory retrieval and sink durability.
ContextLattice gives agents reusable memory with HTTP MCP, federated retrieval, durable fanout, and storage guardrails so quality and costs remain stable under load.
Private-by-default memory & context layer for agents.
Different data backends are fused into one retrieval pipeline to improve recall coverage, precision, and resilience.
Less technical users get native installers (DMG, Linux bundle, MSI); technical/dev users default to the repo + ZIP.
Clone: git clone git@github.com:sheawinkler/ContextLattice.git
Ships with Qwen by default and can plug into your preferred stack via Ollama, llama.cpp, LM Studio, and compatible local gateways.
Write path is tuned for ~100 messages/sec with queue backpressure, fanout coalescing, admission control, retry workers, and retention sweeps to keep sinks stable under burst load.
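The write-path behavior described above can be pictured as a bounded queue with per-sink coalescing and admission control. The sketch below is illustrative only, not ContextLattice's implementation; the capacity, the (sink, key) coalescing rule, and all names are assumptions.

```python
from collections import OrderedDict

class FanoutQueue:
    """Toy model of a fanout write queue with coalescing and admission control.

    Writes for the same (sink, key) pair are coalesced into the newest
    payload; once the queue is full, new writes are rejected (backpressure)
    rather than being allowed to destabilize the sinks.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending: "OrderedDict[tuple, dict]" = OrderedDict()

    def enqueue(self, sink: str, key: str, payload: dict) -> bool:
        slot = (sink, key)
        if slot in self.pending:            # coalesce: keep only latest payload
            self.pending[slot] = payload
            return True
        if len(self.pending) >= self.capacity:
            return False                    # admission control: shed load
        self.pending[slot] = payload
        return True

    def drain(self):
        """A retry worker would call this to deliver pending writes."""
        while self.pending:
            (sink, key), payload = self.pending.popitem(last=False)
            yield sink, key, payload

q = FanoutQueue(capacity=2)
assert q.enqueue("qdrant", "m1", {"v": 1})
assert q.enqueue("qdrant", "m1", {"v": 2})     # coalesced: still one slot
assert q.enqueue("mongo", "m1", {"v": 1})
assert not q.enqueue("letta", "m1", {"v": 1})  # queue full: backpressure
```

The point of the model is the trade: under burst load, coalescing absorbs repeated writes to the same record, and rejection at the boundary keeps the sinks stable instead of letting the queue grow without bound.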
Use gmake quickstart for first-run setup, secure bootstrap, and health verification.
gmake quickstart
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
gmake monitor-open
# CLI-only checks:
gmake monitor-check
Dashboard URL: http://127.0.0.1:3000 (default local).
Secure mode is on by default: /health stays unauthenticated, while protected endpoints such as /status, /memory/*, and /telemetry/* require the x-api-key header.
v3.3.2. gateway-go on :8075 / :18075. shodh_spike retrieval.
Sizing: Spike 2-4 vCPU / 4-8 GB RAM / 20-50 GB SSD; Lite 2-4 vCPU / 8-12 GB RAM / 25-80 GB SSD; Full 6-8 vCPU / 12-20 GB RAM / 100-180 GB SSD (no spike-lab).
Retrieval contract: the Go gateway on :8075 serves shodh_spike with a deterministic fallback chain and an optional hedge.
The orchestrator uses a learning schema fed by feedback signals to rerank results and improve retrieval precision over time. Fast staged reads prioritize topic rollups, Qdrant, and postgres-pgvector, while deep continuation incorporates MindsDB, Mongo raw, Letta, and memory-bank.
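One minimal way to picture a learning rerank loop: keep a per-source trust weight that feedback signals nudge up or down, and score candidates by similarity times source weight. This is an assumed mechanism for illustration, not ContextLattice's actual learning schema; the learning rate and names are invented.

```python
from collections import defaultdict

class FeedbackReranker:
    """Toy rerank loop: per-source weights learned from feedback signals."""

    def __init__(self, lr: float = 0.1):
        self.weights = defaultdict(lambda: 1.0)  # source -> trust weight
        self.lr = lr

    def rerank(self, candidates):
        # candidates: list of (source, doc_id, similarity)
        return sorted(candidates,
                      key=lambda c: c[2] * self.weights[c[0]],
                      reverse=True)

    def feedback(self, source: str, helpful: bool):
        # nudge the source's weight multiplicatively from the signal
        self.weights[source] *= (1 + self.lr) if helpful else (1 - self.lr)

r = FeedbackReranker()
cands = [("qdrant", "a", 0.80), ("topic_rollups", "b", 0.78)]
assert r.rerank(cands)[0][1] == "a"     # raw similarity wins at first
for _ in range(5):                      # repeated positive feedback on rollups
    r.feedback("topic_rollups", True)
assert r.rerank(cands)[0][1] == "b"     # learned trust now outranks
```

The same idea scales to richer feature schemas; the essential loop is that ranking quality compounds only because feedback flows back through one central reranker.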
Read the detailed rollout notes on the Updates page and the execution plan on the V3 Roadmap.
Every write enters through the orchestrator, which records durable raw data, fans out to specialized stores, and continuously protects queue and storage health. Every search comes back through the same orchestrator so results can be fused, reranked, and improved over time from feedback.
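As a concrete stand-in for the fusion step, reciprocal rank fusion (RRF) merges ranked lists from several sources by summed reciprocal rank. ContextLattice does not document RRF as its fusion method; this is a hedged sketch of how multi-source results can be fused before reranking.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge ranked lists from several sources.

    rankings: dict of source -> ordered list of doc ids (best first).
    Returns doc ids ordered by summed 1 / (k + rank) across sources.
    """
    scores = {}
    for source, docs in rankings.items():
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse({
    "qdrant":            ["m7", "m2", "m9"],
    "postgres_pgvector": ["m2", "m7", "m4"],
    "topic_rollups":     ["m2", "m5"],
})
assert fused[0] == "m2"   # ranked highly by all three sources
```

Because every search returns through the same orchestrator, a fusion function like this sees all sources at once, which is what makes cross-source agreement a usable ranking signal.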
Single orchestrator spine, explicit method stages, and parallel fanout branches to all write sinks.
Federated sources converge to the orchestrator spine, then reranked results return with learning feedback.
Orchestrator. Benefit: one control plane for writes, retrieval, and policy. Why: central coordination is what allows multi-source ranking and learning to compound.
Topic rollups. Benefit: compact, high-signal summaries for fast staged recall. Why: reduces deep-read pressure while preserving source grounding for follow-up dives.
Qdrant. Benefit: high-speed semantic recall. Why: vector retrieval gives broad relevance quickly before deeper reranking.
postgres-pgvector. Benefit: low-latency vector + lexical retrieval in one lane. Why: improves fast-read hit rate and stabilizes p95 latency under mixed workloads.
Mongo raw. Benefit: durable source-of-truth write ledger. Why: protects recoverability and enables replay/rehydrate workflows.
Deep continuation (MindsDB, Letta, memory-bank). Benefit: richer long-horizon recall when fast lanes need deeper evidence. Why: async continuation improves completeness without blocking fast user responses.
Fanout workers. Benefit: resilient async delivery with retries, coalescing, and admission control. Why: prevents sink instability from breaking ingestion reliability.
Guardrails. Benefit: bounded storage growth and observable runtime behavior. Why: operational stability is required for learning retrieval to stay trustworthy.
The orchestrator's learning schema can only improve ranking if retrieval sources stay healthy, durable, and synchronized. This architecture makes that possible: topic rollups + Qdrant + postgres-pgvector provide fast candidates, deep continuation adds MindsDB/Letta/memory-bank evidence, Mongo guarantees recovery, and guardrails keep the full loop from collapsing under pressure.
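The fast-lane / deep-continuation split can be sketched as a two-stage read. Assumptions to note: the coverage threshold and the synchronous second stage are simplifications (the text describes deep continuation as async and non-blocking), and the lane names are taken from the architecture above.

```python
FAST_LANES = ["topic_rollups", "qdrant", "postgres_pgvector"]
DEEP_LANES = ["mindsdb", "mongo_raw", "letta", "memory_bank"]

def staged_read(query, fetch, need=3):
    """Two-stage read: answer from fast lanes, continue into deep lanes
    only when the fast stage comes back thin.

    fetch(lane, query) -> list of hits. `need` is the assumed coverage
    threshold that triggers deep continuation. A real system would run
    the deep stage asynchronously instead of blocking on it.
    """
    hits = [h for lane in FAST_LANES for h in fetch(lane, query)]
    if len(hits) >= need:
        return hits, False              # fast stage was enough
    deep = [h for lane in DEEP_LANES for h in fetch(lane, query)]
    return hits + deep, True            # deep continuation ran

store = {"qdrant": ["m1"], "mongo_raw": ["m2", "m3"]}
fetch = lambda lane, q: store.get(lane, [])
hits, went_deep = staged_read("q", fetch, need=3)
assert went_deep and hits == ["m1", "m2", "m3"]
```

The staging is what keeps the trade-off honest: fast lanes bound latency, and the deep lanes are only paid for when the fast candidates are insufficient.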
Spike: v3.3.x. Single-container lane focused on compatibility and low footprint. Best for local development and constrained laptops where stable memory services matter more than deep analytics.
Lite: v3.3.x. Sinks: topic_rollups + qdrant + postgres_pgvector.
Full: v3.3.x, and baseline for private v4 tuning. Adds mindsdb + mongo_raw + letta + memory_bank. Best for high-write workloads and richer retrieval where learning loops use every sink, including RAG through Letta.