2026-03-24 · v3.2.13 · Glama-lite SQLite Acceleration
Lite mode now ships sqlite WAL + FTS5 BM25 acceleration with optional sqlite-vec detection
The public lite lane was upgraded for single-container deployments: topic rollup search now uses a local sqlite
acceleration index with fail-open fallback, while full multi-backend mode remains unchanged.
- Lite architecture: topic_rollups + sqlite outbox + sqlite rollup index (WAL + FTS5 BM25).
- Full architecture: staged fast lane topic_rollups + qdrant + postgres_pgvector, plus deep async continuation mindsdb + mongo_raw + letta + memory_bank.
- Optional vector lane: sqlite-vec capability is auto-detected and remains fail-open when unavailable.
- Correctness guardrail: sqlite rollup search is used only when the sqlite generation marker matches the active in-memory rollup snapshot; otherwise lookups fall back to the in-memory path.
- A/B snapshot: topic rollup lookup average 85.065ms → 50.225ms (1.694x faster) in single-container benchmark artifact bench/results/glama_lite_topic_rollup_sqlite_ab_20260324.json.
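The guardrail and fail-open behavior described above can be sketched roughly as follows. This is a minimal illustration using Python's standard sqlite3 module, not the gateway's actual code: the table names (rollup_fts, meta), the function names, and the substring-scan fallback are all assumptions made for the example.

```python
import sqlite3


def open_rollup_index(path, rollups, generation):
    """Build a hypothetical sqlite rollup index; return None if it cannot be built."""
    conn = sqlite3.connect(path)
    try:
        # WAL lets readers proceed while a writer is active (ignored for :memory:).
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS rollup_fts USING fts5(topic, summary)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)"
        )
        conn.execute("DELETE FROM rollup_fts")
        conn.executemany(
            "INSERT INTO rollup_fts (topic, summary) VALUES (?, ?)", rollups.items()
        )
        # Generation marker: records which in-memory snapshot this index reflects.
        conn.execute(
            "INSERT OR REPLACE INTO meta VALUES ('generation', ?)", (str(generation),)
        )
        conn.commit()
        return conn
    except sqlite3.Error:
        # FTS5 missing or the database is unusable: fail open, no index.
        conn.close()
        return None


def search_rollups(conn, rollups, generation, query, limit=5):
    """Return (topics, lane). The sqlite lane is used only on a generation match."""
    if conn is not None:
        try:
            row = conn.execute(
                "SELECT value FROM meta WHERE key = 'generation'"
            ).fetchone()
            if row is not None and row[0] == str(generation):
                hits = conn.execute(
                    "SELECT topic FROM rollup_fts WHERE rollup_fts MATCH ? "
                    "ORDER BY bm25(rollup_fts) LIMIT ?",
                    (query, limit),
                ).fetchall()
                return [topic for (topic,) in hits], "sqlite"
        except sqlite3.Error:
            pass  # fail open: any sqlite error falls through to the in-memory scan
    needle = query.lower()
    hits = [
        topic
        for topic, summary in rollups.items()
        if needle in topic.lower() or needle in summary.lower()
    ]
    return hits[:limit], "memory"
```

With a matching generation the lookup is served from the FTS5 index ordered by BM25; a stale marker, a missing FTS5 module, or a malformed MATCH query all land on the in-memory path instead of raising an error.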
2026-03-19 · v3.2.3 · Launch Readiness Docs Alignment
Finalized install/deployment mode copy to match staged retrieval runtime lanes
Completed a final launch-readiness docs pass so installation and deployment-mode guidance clearly reflects current v3.2 runtime behavior.
- Installation mode summaries now call out Gateway-Go frontdoor ownership.
- Lite mode now explicitly lists the fast staged lane: topic_rollups + qdrant + postgres_pgvector.
- Full mode now explicitly lists deep async continuation lane sources: mindsdb + mongo_raw + letta + memory_bank.
- Release assets include DMG (macOS), MSI (Windows), and Linux bootstrap tarball.
2026-03-19 · v3.2.2 · Graphics + Architecture Alignment
Synced README + website diagrams to current staged retrieval/runtime ownership
Reviewed architecture graphics and corrected source-lane/runtime ownership drift so GitHub README visuals and website architecture pages now match live v3.2 behavior.
- Updated shared SVG architecture panels used by both README and website.
- Reflected current staged retrieval lanes: fast topic_rollups + qdrant + postgres_pgvector, deep async mindsdb + mongo_raw + letta + memory_bank.
- Updated runtime ownership copy to show Go ingress on :8075, Rust hot path, and Python fallback on :18075.
- Bumped architecture asset cache keys so updated diagrams render immediately.
2026-03-19 · v3.2.1 · Config Canonicalization + Fallback Audit
Unified configuration paths under config/ and verified Python fallback utility
We standardized configuration layout to one canonical root and completed a Python fallback audit to confirm
runtime-critical Python remains justified while Go/Rust stay primary.
- Moved former configs/ content to config/mcp/ and rewired compose/script paths.
- Kept runtime lock authority at config/env/strict_runtime.env.
- Added compatibility file config/mcp/memorybank-gateway.config.json for override workflows.
- Published audit: docs/audits/python_fallback_audit_v3.2.1.md.
- Fallback lane health verified on :18075 for /health, /migration/runtime, and /memory/search.
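A fallback-lane check like the one above could be scripted along these lines. This is only a sketch: the host (127.0.0.1), the use of plain GET requests, and the helper names are assumptions, and /memory/search in particular may expect a different method or a request body.

```python
import urllib.error
import urllib.request

# Paths taken from the audit note above; everything else here is illustrative.
FALLBACK_PATHS = ("/health", "/migration/runtime", "/memory/search")


def fallback_urls(host="127.0.0.1", port=18075):
    """Build the URLs to probe on the Python fallback lane."""
    return [f"http://{host}:{port}{path}" for path in FALLBACK_PATHS]


def probe(url, timeout=2.0):
    """Return the HTTP status code, or None when the lane is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx; still proof the lane is up.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    for url in fallback_urls():
        print(url, probe(url))
```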
2026-03-19 · v3.2.0 · Public V3 Runtime Cutover
Go-first read runtime is now default on public V3 (Python moved to fallback lane)
Public V3 now serves through gateway-go on :8075 by default, with Python retained as a rollback-only fallback lane.
We also promoted full icm_spike memory-bank policy wiring into gateway runtime and moved memory-bank to async slow-source continuation.
- Primary path switch: /memory/search executes in Go staged retrieval by default; Python fallback is exposed on :18075.
- Memory-bank reliability fix: gateway now consumes ORCH_MEMORY_BANK_SEARCH_BACKEND=icm_spike and the configured fallback chain.
- A/B artifact: bench/results/live_runtime_comparison_v3.2.0_cutover.json (20 interleaved runs per lane).
- Measured delta: p50 0.202s → 0.139s (31.1% faster), p95 0.429s → 0.268s (37.5% faster), mean 0.255s → 0.157s (38.5% faster).
- Result parity in benchmark: median result count 4 on both lanes with identical returned-now source mode.
2026-03-13 · Runtime Tuning
Qdrant staged-fetch caps tuned by retrieval mode + compose passthrough hardening
Added mode-specific Qdrant sync timeout caps in gateway-go and exposed tuning envs in
compose so operators can tune fast/balanced/deep behavior without code edits.
- New knobs: ORCH_RETRIEVAL_QDRANT_SYNC_TIMEOUT_CAP_FAST_SECS, ..._BALANCED_..., ..._DEEP_....
- A/B artifact pair: bench/results/qdrant_tuning_20260313T004405Z.json vs bench/results/qdrant_tuning_20260313T213430Z.json.
- Observed p95 delta (same harness): baseline 593.575ms → 84.843ms (~7.00x), deep-tail 995.125ms → 162.870ms (~6.11x), fast-path 686.621ms → 603.521ms (~1.14x).
- Companion matrix artifact: bench/results/perf_shortlist_matrix_20260313T213440Z.json.
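To illustrate what a mode-specific cap means in practice, here is a small sketch that reads the fast-mode knob and clamps a requested sync timeout to it. The real logic lives in gateway-go; this Python version, the helper name, the 2.0s default, and the reading of "cap" as an upper bound on the synchronous wait are all assumptions. Only the fast-mode variable is used because it is the only name spelled out in full above.

```python
import os

# The one knob named in full in the notes above.
FAST_CAP_ENV = "ORCH_RETRIEVAL_QDRANT_SYNC_TIMEOUT_CAP_FAST_SECS"


def capped_timeout(requested_secs, cap_env, default_cap_secs, environ=os.environ):
    """Clamp a requested sync timeout to the cap configured for a retrieval mode.

    default_cap_secs is a hypothetical default applied when the variable is
    unset, unparseable, or non-positive.
    """
    raw = environ.get(cap_env, "")
    try:
        cap = float(raw)
    except ValueError:
        cap = default_cap_secs
    if cap <= 0:
        cap = default_cap_secs
    return min(requested_secs, cap)
```

For example, with the variable set to 0.5 a request asking for 5 seconds waits at most 0.5 seconds on the synchronous Qdrant fetch, while a request asking for 0.2 seconds is left alone.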
2026-03-05 · V3 Planning
Published V3 roadmap for issues #68-#72
We published the V3 execution roadmap to show how performance, deep-read stability,
recall quality, and runner interoperability will be integrated and tested as one program.
- Roadmap page: roadmap.html
- Includes grouped tracks, integration scope, and benchmark + recall + security gates.
- Focus is application efficacy: more correct outcomes per request with lower tail latency.
2026-03-04 · v2.0.0 · Runtime Cutover Benchmark
Rust+Go runtime is now default; Python retained as legacy fallback
We ran a live A/B on the same /memory/search path before and after cutover.
Test profile: bench/phase1_runtime_comparison.py, 8 requests, 20s timeout.
- Cutover ON (Rust+Go): mean 3557ms, p50 2334ms, p95 8494ms, p99 9359ms, errors 0/8.
- Legacy OFF (Python): mean 17565ms, p50 20006ms, p95 20008ms, p99 20008ms, errors 7/8 (timeouts).
- Observed delta: mean 4.94x faster, p50 8.57x faster, p95 2.36x faster.
- Compose defaults now start with Rust+Go enabled; Python is rollback/legacy path only.
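The observed deltas are plain before/after ratios of the figures listed above, which the short check below reproduces. The helper name is ours. Because 7 of the 8 legacy requests hit the 20s timeout, the legacy p50 and p95 are pinned at the timeout value, so the ratios are best read as lower bounds on the real gap.

```python
def speedup(before_ms, after_ms):
    """Ratio of legacy latency to cutover latency, rounded to two decimals."""
    return round(before_ms / after_ms, 2)


# Figures copied from the benchmark bullets above (milliseconds).
legacy = {"mean": 17565, "p50": 20006, "p95": 20008}
cutover = {"mean": 3557, "p50": 2334, "p95": 8494}

ratios = {name: speedup(legacy[name], cutover[name]) for name in legacy}
print(ratios)  # {'mean': 4.94, 'p50': 8.57, 'p95': 2.36}
```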
2026-02-18 · Messaging Surface + Cloud Validation
Channel command bridge live, with optional Qdrant Cloud gRPC validation
- Added orchestrator-native messaging endpoints for OpenClaw/ZeroClaw, Telegram, and Slack command routing.
- Added command parsing for @ContextLattice remember|recall|status with project/topic directives.
- Added a one-shot 04:30 MT launch gate scheduler command and status/cancel controls.
- Added Qdrant Cloud BYO connectivity check script that validates both HTTP and gRPC paths while keeping local-first default behavior.
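As a rough picture of the command parsing mentioned above, the sketch below recognizes the three verbs and pulls out project/topic directives. The key=value directive syntax, the Command type, and the function name are hypothetical; the orchestrator's actual grammar is not documented here and may differ.

```python
import re
from dataclasses import dataclass, field

# Verbs come from the note above; the key=value directive form is an assumption.
COMMAND_RE = re.compile(
    r"^\s*@ContextLattice\s+(remember|recall|status)\b\s*(.*)$",
    re.IGNORECASE | re.DOTALL,
)
DIRECTIVE_RE = re.compile(r"\b(project|topic)=(\S+)")


@dataclass
class Command:
    verb: str
    text: str
    directives: dict = field(default_factory=dict)


def parse_command(message):
    """Return a Command for a recognized message, or None for anything else."""
    match = COMMAND_RE.match(message)
    if match is None:
        return None
    verb, rest = match.group(1).lower(), match.group(2)
    directives = dict(DIRECTIVE_RE.findall(rest))
    # Whatever remains after stripping directives is the free-text payload.
    text = " ".join(DIRECTIVE_RE.sub("", rest).split())
    return Command(verb, text, directives)
```

Under these assumptions, a message such as "@ContextLattice recall project=alpha topic=billing what failed last week?" yields the verb recall, the two directives, and the remaining question as free text, while messages that do not start with the mention are ignored.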
2026-02-18 · Launch Readiness
Public beta hardening and release packaging complete
- Added launch-readiness gate automation for authenticated load, queue drain, backup/restore drill, and production security preflight simulation.
- Published release image lockfile for reproducible deployments with pinned digest references.
- Finalized legal package (terms, privacy, DPA baseline, acceptable use, subprocessors, commercial licensing baseline).
- Published public messaging package and next-track messaging-surface expansion plan.
2026-02-18 · Throughput + Integration
New integrations guide, dark visual refresh, and faster sink fanout tuning
- Added a dedicated Integrations page for ChatGPT app, Claude chat apps (desktop and web), Claude Code, Codex, and OpenClaw/ZeroClaw wiring.
- Moved public overview pages to a dark-first theme and aligned navigation across Home, Architecture, Updates, Installation, and Troubleshooting.
- Expanded compose env passthrough for fanout and Mongo pool tuning to speed queue drain and improve sustained write throughput.
- Documented the default launch path (gmake mem-up) and explicit full/lite commands for local-first deployment.
2026-02-16 · Architecture Clarity
Landing page now explains each service and why it exists
- Added component-by-component breakdown for Orchestrator, Memory Bank, Qdrant, Mongo, MindsDB, Letta, outbox, and retention.
- Added explicit “benefit + why” summaries so operators understand service intent at a glance.
- Connected these components directly to learning retrieval outcomes and ranking quality over time.
2026-02-16 · Pilot Experience
New pilot landing page published
We shipped a new public pilot page with a clearer value narrative and current platform capabilities.
- Headline focus: fix context drift before it burns token budget.
- Pilot structure: week-by-week baseline, optimization, and ROI readout.
- Capability highlights: coalesced fanout, backlog-aware Letta admission, and sink retention.
- Trust framing: private-by-default, BYOK-compatible, local-first operation.
2026-02-16 · Retrieval Intelligence
Orchestrator learning schema + Letta RAG emphasis
The orchestrator becomes more accurate over time by learning from feedback signals and applying
preference-aware reranking during retrieval. This learned ranking is further supported by RAG via
Letta archival memory, alongside Qdrant semantic recall and raw-source fallbacks.
- Learning loop reinforces which sources and patterns are actually useful in your workflows.
- Federated retrieval merges Qdrant, Mongo raw, MindsDB, Letta, and memory-bank lexical fallback.
- Result quality improves iteratively as the learning schema accumulates real operator feedback.
2026-02-16 · Reliability + Scale
Queue and storage safety upgrades
- Fanout coalescer reduces duplicate hot-file writes before they hit outbox depth.
- Letta admission control protects throughput by shedding low-value Letta writes under backlog.
- Low-value retention sweeps for Qdrant and Letta keep storage pressure bounded.
- Shared HTTP client pools and batched fanout improve write-path efficiency.
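The coalescing and admission ideas above can be pictured with a small sketch. It is an illustration only: the write record fields (project, file, content), the backlog limit, and the value threshold are hypothetical and are not taken from the orchestrator.

```python
def coalesce_writes(pending):
    """Collapse repeated writes to the same file, keeping only the latest payload.

    Each write is a dict with hypothetical keys: project, file, content.
    """
    latest = {}
    for write in pending:
        latest[(write["project"], write["file"])] = write
    return list(latest.values())


def admit_letta_write(backlog_depth, value_score, backlog_limit=1000, min_value=0.5):
    """Admit every write while the backlog is healthy; shed low-value ones under pressure."""
    if backlog_depth <= backlog_limit:
        return True
    return value_score >= min_value
```

A burst of edits to one hot file then reaches the outbox as a single write, and once the backlog passes its limit only writes scoring at or above the threshold are still forwarded to Letta.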