
Developers Are Building Their Own Memory Systems for Claude Code. They Shouldn't Have To.


Something interesting is happening in the Claude Code GitHub repository. Developers are filing detailed feature requests, but not for better code generation or smarter debugging. They're asking for memory. And when the feature doesn't arrive fast enough, they're building it themselves.

The issues tell a story. It starts with frustration, moves through increasingly sophisticated workarounds, and ends with developers maintaining custom infrastructure that has nothing to do with the software they're actually trying to build.


"It's a goldfish."

In December 2025, a developer filed issue #14227 with a title that captured the problem perfectly: "Feature Request: Persistent Memory Between Claude Code Sessions."

The description was blunt. Claude Code starts every session with zero context. There is no memory of previous sessions, previous work, or accumulated understanding of the user's projects and preferences. The developer compared it to a goldfish. The related Reddit discussion received over 700 upvotes.

The core argument was simple: the value of an AI assistant compounds over time. Every session that starts from zero wastes that compounding. Users paying $200/month for Claude expect continuity across the product, not amnesia in the CLI.

This wasn't a niche complaint. It was the most upvoted feature request in the Claude Code community.


59 compactions. 1,477 changelog entries. One homegrown system.

Three months later, in March 2026, issue #34556 appeared. The title: "Feature Request: Persistent Memory Across Context Compactions (59 compactions, built our own)."

This wasn't a feature request from someone who used Claude Code once. This was field data from a developer running Claude Code 12-18 hours a day across two machines, managing 6 active projects, 31 intelligence scouts, and a multi-instance AI architecture. Over 26 days, they documented 59 context compactions, watched critical knowledge disappear each time, and decided to solve it.

They built a complete three-tier memory system:

L1: MEMORY.md (always loaded, ~100 lines). Pointers to deeper files. Critical rules that must survive every compaction. An "I Remember..." section for relational cues. The last 5 events for quick orientation.

L2: Topic files (loaded on demand). Project summaries, people profiles, infrastructure notes. Read before working on a specific area.

L3: Changelog (append-only event log). 1,477 entries documenting every session, every discovery, every decision. A complete audit trail of the project's evolution.

The token arithmetic they calculated was revealing. CLAUDE.md plus MEMORY.md consumed approximately 3,100 tokens at every session start. Over 10 compactions, that's 31,000 tokens spent just reloading system context. With their custom compression, they cut it to 10,850 tokens, saving 20,150 tokens per heavy session.
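The arithmetic behind those figures checks out, using the numbers from the issue:

```python
# Per-session-start token costs reported in issue #34556
BASE_TOKENS = 3_100        # CLAUDE.md + MEMORY.md, reloaded after each compaction
COMPACTIONS = 10           # reloads across one heavy session

naive_total = BASE_TOKENS * COMPACTIONS   # cost with no compression
compressed_total = 10_850                 # reported cost with their custom compression
saved = naive_total - compressed_total    # tokens reclaimed per heavy session

print(naive_total, saved)  # 31000 20150
```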

Their asks were specific: structured memory that survives compaction, automatic pre-compaction saves, a cross-session event bus, and user profile persistence. Every one of these is a solved problem in traditional software engineering. None of them existed in Claude Code.


"Memory is the missing layer."

Around the same time, issue #32627 went even further. A developer built an entire AI development ecosystem from scratch: a RAG memory server using Qdrant for vector storage, a bulk knowledge pipeline that ingested 2,237 debugging issues from open-source repositories, and a 10-phase spec-driven development workflow with quality gates.

Their conclusion after building all of it: "Forgetting everything between sessions is the biggest limitation for complex projects. Memory is the missing layer."

The bulk knowledge pipeline is particularly telling. This developer didn't just want their agent to remember what happened in their own project. They wanted it to learn from the ecosystem. They extracted, summarised, and indexed thousands of resolved issues from other repositories so their agent could answer "how did others solve this?" before attempting a fix.
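The shape of that pipeline is easy to sketch. This is a minimal stand-in, with a naive keyword match where the real build used Qdrant vector search; the `ResolvedIssue` type and function names are illustrative, not the developer's actual code:

```python
from dataclasses import dataclass

@dataclass
class ResolvedIssue:
    repo: str
    title: str
    summary: str   # how the issue was diagnosed and fixed

# In-memory index; the real pipeline stored embeddings in Qdrant.
index: list[ResolvedIssue] = []

def ingest(issue: ResolvedIssue) -> None:
    """Extract + summarise step collapses into a single append here."""
    index.append(issue)

def how_did_others_solve(query: str) -> list[ResolvedIssue]:
    """Answer 'how did others solve this?' -- keyword match standing in for semantic search."""
    terms = query.lower().split()
    return [i for i in index
            if any(t in (i.title + " " + i.summary).lower() for t in terms)]
```

Scaled to 2,237 ingested issues, the point is that the agent queries prior fixes before attempting its own.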

That's the ambition level. And it was built entirely by hand because the infrastructure didn't exist.


When the native system fails

If the custom solutions show what developers want, the bug reports show what's actually happening with the tools they have.

Issue #38459, filed March 24, 2026: "Memory files and conversation history lost between sessions." A developer's entire memory directory was wiped clean after a version upgrade from v2.1.76 to v2.1.81. Files that were referenced and read during a prior session no longer existed. The MEMORY.md index file was gone. Hours of accumulated context, vanished.

This is the risk of local file-based memory. A version upgrade, a directory conflict, a filesystem hiccup, and everything is gone. The developer lost all context from a multi-hour session and had to manually re-explain the full project state to a new session.

Meanwhile, a detailed analysis published in January 2026 documented compaction failures across five cross-referenced bug reports. The pattern was consistent: Claude follows project instructions perfectly before compaction, then violates them 100% of the time after compaction. Skills are forgotten. Repository context is lost. The same mistakes are repeated.

One developer's frustration captured the systemic problem: "Every compaction is a moment where I have to wonder: did it remember what I told it? Did it file that discovery? Will the next instance know who I am?"


What they're all building toward

Strip away the implementation details and every one of these developers is building toward the same architecture:

Structured knowledge, not flat text. Rules, decisions, warnings, patterns, events. Each with a type, a priority, and tags. Not a markdown file that grows until it degrades.

Selective loading, not dump-everything. Load critical rules at session start. Load topic-specific context when working on a specific area. Don't waste 3,000 tokens reloading infrastructure notes when the task is CSS refactoring.

Persistence outside the conversation. Knowledge that exists in a store the agent queries, not in the context window where it competes for attention and gets lost during compaction.

Cross-machine, cross-session continuity. Start on one machine, continue on another. The knowledge is in a persistent store, not in a local directory that might not sync, might not survive an upgrade, and definitely doesn't share with team members.

An event timeline. Not just "what does the agent know" but "what happened." Deployments, incidents, discoveries, decisions, in chronological order, queryable by date.

The developer with 59 compactions built L1/L2/L3 tiers. The developer with the RAG pipeline built vector storage with structured summaries. The developer with the ecosystem build created quality gates and knowledge ingestion pipelines. They all arrived at the same conclusion from different directions: AI coding agents need persistent, structured, queryable project memory.


Why they shouldn't have to build it

Every hour a developer spends building memory infrastructure is an hour they're not spending on their actual project. The developer with 59 compactions maintained a filing cabinet across two machines. The ecosystem builder set up Qdrant, Ollama, and a 10-phase pipeline. The RAG developer configured vector storage, embedding models, and semantic search.

These are infrastructure problems. They should be solved once, by infrastructure, not reinvented by every developer who needs their agent to remember what happened yesterday.

The pattern is familiar from other areas of software development. Developers used to build their own logging pipelines. Then services like Datadog and Papertrail solved it. Developers used to build their own error tracking. Then Sentry solved it. Developers used to build their own feature flag systems. Then LaunchDarkly solved it.

The infrastructure pattern is always the same: early adopters build custom solutions, the community converges on a shared architecture, and then a service emerges that does it better than any individual team could maintain.

Agent memory is at that inflection point right now. The question is whether it stays something every developer rebuilds for themselves or becomes shared infrastructure, and the answer determines how much engineering time the ecosystem keeps spending on it.


What the infrastructure looks like

The converged architecture visible across these GitHub issues maps to a specific set of capabilities:

Typed, tagged context entries with priority levels and scope filtering. The L1/L2/L3 tier system becomes type=rule at priority=high (always loaded) and type=warning tagged with "auth" (loaded when working on auth). No manual tiering decisions. The agent queries what it needs.

A bootstrap endpoint that loads everything the agent needs at session start in one call. Identity, high-priority rules, recent events, active procedure state. No reliance on CLAUDE.md instructions that the agent might skip. No cold start race conditions.

Immutable event entries forming an append-only timeline. The changelog that the 59-compaction developer maintained as L3 becomes a queryable event log. "What happened since my last session?" is a single API call.

Hosted persistence that survives version upgrades, works across machines, and shares with team members. Not local files in ~/.claude/projects/ that disappear when the directory structure changes.

Zero-credit reads. Loading context should never cost anything. The agent should query freely, load what's relevant, and not think about whether a memory lookup is worth the expense.

This is what Minolith provides. It's a hosted API with structured context (19 entry types, tag-based filtering, priority levels), a bootstrap endpoint for cold start, immutable event entries, and cross-machine persistence via MCP. Context operations cost zero credits on any plan.

It also goes beyond what these developers built individually. The same MCP connection gives the agent changelog publishing (the 59-compaction developer's L3 event log, but hosted and public-facing), user feedback collection, executable runbooks with persistent progress, agent identity management, and a structured design system API.


The gap is closing, but it's not closed

Claude Code has made progress. Auto Memory (v2.1.59, February 2026) lets Claude write notes to itself. Session Memory recalls past sessions automatically. The /remember command promotes recurring patterns into permanent configuration. These are meaningful improvements.

But they're all conversational memory. They remember what the agent discussed and discovered during sessions. They don't provide the structured, typed, queryable project knowledge that the developers in these GitHub issues are building from scratch.

The developer with 59 compactions didn't need Claude to remember what it said yesterday. They needed it to know the project's rules, who the user is, what state each project is in, and what happened across 1,477 logged events. That's project memory. It's a different problem.

Auto Memory stores notes in markdown files limited to 200 lines. Session Memory stores compressed session summaries. Neither supports typed entries, tag-based filtering, priority levels, cross-machine access, team sharing, or deterministic retrieval by structured query. For a deeper look at these scaling limits, see Why CLAUDE.md Breaks at Scale.

The gap between "Claude remembers what it discussed" and "Claude knows the project" is the gap these developers are filling with custom infrastructure. It's the gap that needs a proper solution.


What you can do today

If you're one of the developers maintaining custom memory infrastructure for your AI coding tools, you have a few options:

Keep building your own. Your system works and you understand it completely. The tradeoff is ongoing maintenance, local-only access, and the risk of data loss from upgrades or filesystem issues.

Use a local MCP memory server. Tools like mcp-memory-service, Claude-Mem, and memsearch provide persistence with semantic search. They work well for conversational recall. The tradeoff is local installation, cold start issues, and the reliance on semantic similarity for retrieval (which misses things when terminology doesn't match).

Use hosted structured memory. Minolith provides typed, tagged context entries queryable via MCP with a bootstrap endpoint for cold start. Hosted, cross-machine, zero-credit reads. The tradeoff is a dependency on an external service and a $5/month subscription after the trial. Our step-by-step setup guide walks through the full process.

The right choice depends on your situation. But the wrong choice is spending weeks building infrastructure that isn't your product. The developers in these GitHub issues are brilliant engineers who built impressive systems. They should be building their actual projects, not maintaining memory pipelines.

The infrastructure layer for AI coding agents is emerging. Use it. Build your product instead.


Built by Minolith. Persistent project memory for AI coding agents.

Ready to get started?

14-day free trial with 500 credits. No card required.