The Context Engineering Problem Nobody's Solving

Context engineering has gone from a niche concern to the defining discipline of AI engineering in under a year. Anthropic published their guide to effective context engineering for agents in September 2025. Manus shared lessons from rebuilding their agent framework four times. A QCon London 2026 talk was titled "Context Engineering: Building the Knowledge Engine AI Agents Need." Academic papers are being published on the topic. It even has its own newsletter.

The money is following. Mem0 raised $24M to build a "memory passport" for AI agents. Letta raised $10M to commercialise stateful MemGPT-style agents. Supermemory raised $2.6M for a universal memory API. Cognee raised $7.5M. Over $31M in venture funding and more than 120,000 GitHub stars are aimed at a single thesis: LLMs need persistent memory to be useful.

They're right about the thesis. But they're solving the wrong problem.


The Problem Everyone Is Solving

The agent memory space in 2026 looks like this: a dozen startups and open-source projects are building systems that store what the AI said and did, then retrieve relevant pieces for future sessions.

Mem0 stores "memories" — extracted facts from conversations — and retrieves them via semantic search. Letta gives agents direct control over their own memory blocks through tool calls. LangMem integrates with LangGraph for memory within that framework. Claude-Mem captures tool usage observations and compresses them into summaries. Zep tracks how facts change over time with temporal knowledge graphs.

These are all useful. They all solve a real problem: AI agents forget what happened in previous conversations. But they all share the same assumption about what "context" means.

They assume context is conversational history. What the user said. What the agent did. What was discussed. The memory systems store it, compress it, and retrieve it so the next session can pick up where the last one left off.

This is the personalisation problem. It's important. But it's not the whole problem — and for coding agents specifically, it's not even the most important part.


The Problem Nobody Is Solving

A coding agent working on a real project doesn't just need to remember what happened last session. It needs to know:

What rules govern this project. Not what it discussed with you — what constraints exist regardless of conversation history. "All CSS must be in external files." "Never use offset pagination." "The API returns 403 for revoked keys, not 401." These aren't memories. They're institutional knowledge. They existed before the agent's first session and they'll exist after it.

What decisions were made and why. Not a compressed summary of a conversation where a decision was mentioned — a structured record with the options considered, the trade-offs evaluated, and the rationale. So the next agent (or the same agent in a new session) doesn't re-litigate settled choices.

What procedures to follow. Deployment has 10 steps. Incident response has a specific sequence. Client onboarding has a checklist. These aren't memories to retrieve by semantic similarity — they're executable procedures with branching, checkpoints, and persistent progress that survives session interruptions.

What users are saying. Bug reports, feature requests, complaints, praise. The feedback that should inform what the agent works on next. This isn't conversational history — it's structured input from people who use the software the agent is building.

What was shipped and when. A timeline of deployments, migrations, and releases. Not hidden in git logs — queryable, structured, and available at session start so the agent knows the current state of the project.

What the design system looks like. Colours, typography, spacing, component definitions with variants and states. Not described in prose — structured data the agent reads and implements directly.

None of this is "memory" in the way the memory startups define it. It's not extracted from conversations. It's not retrieved by semantic similarity. It's not compressed summaries of past interactions. It's structured, typed, operational knowledge that the agent needs to function effectively on a real project.


Why Semantic Search Is the Wrong Retrieval Model

Every major memory system uses semantic search as its primary retrieval mechanism. Store memories as embeddings. When the agent needs context, embed the current query and find the closest matches in vector space.

This works beautifully for conversational recall. "What did the user say about authentication?" retrieves memories where authentication was discussed, even if the exact word wasn't used.

It fails for operational context. Consider these scenarios:

"Give me all high-priority rules for this project." This isn't a similarity query. It's a structured filter: type equals "rule", priority equals "high". Semantic search would return memories that are about rules, not necessarily the rules themselves. And it would rank them by similarity to the query, not by priority — which is the opposite of what you want.

"What are the deployment steps?" You need an ordered procedure, not a ranked list of memories that mention deployment. Semantic search might return five fragments from three different conversations where deployment was discussed. What you need is a single, authoritative, step-by-step runbook.

"What are users reporting about the export feature?" You need structured feedback items filtered by topic, not memories of conversations where exports were mentioned. The feedback came from users via a widget, not from the agent's own conversation history.
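To make the contrast concrete, here is a minimal sketch of structured retrieval: an in-memory store with illustrative type, priority, and tag fields, not any particular product's schema.

```python
from dataclasses import dataclass

# Ordering by declared priority, not by similarity to the query.
PRIORITY_ORDER = {"high": 0, "normal": 1, "low": 2}

@dataclass
class Entry:
    type: str          # "rule", "decision", "procedure", "feedback", ...
    priority: str      # "high" | "normal" | "low"
    tags: list[str]
    body: str

def query(store, *, type=None, priority=None, tag=None):
    """Exact structured filter: no embeddings, no similarity ranking."""
    results = [
        e for e in store
        if (type is None or e.type == type)
        and (priority is None or e.priority == priority)
        and (tag is None or tag in e.tags)
    ]
    return sorted(results, key=lambda e: PRIORITY_ORDER[e.priority])

store = [
    Entry("rule", "high", ["css"], "All CSS must be in external files."),
    Entry("rule", "high", ["api"], "The API returns 403 for revoked keys, not 401."),
    Entry("rule", "normal", ["pagination"], "Never use offset pagination."),
    Entry("decision", "normal", ["db"], "Chose JSONB columns for audit data."),
]

# "Give me all high-priority rules" is a filter, not a similarity search:
high_rules = query(store, type="rule", priority="high")
```

The point is determinism: the same query always returns the same entries, ordered by declared priority, which is exactly what similarity ranking cannot guarantee.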

The Manus team's blog post on context engineering makes this point from a different angle. They describe rebuilding their agent framework four times, each time discovering a better way to shape context. Their key insight: the KV-cache hit rate — how efficiently the model reuses cached context across turns — is the single most important metric for a production agent. Semantic retrieval on every turn is expensive and unpredictable. Structured, cacheable context is fast and reliable.

Anthropic's own context engineering guide echoes this. They describe context as a finite resource requiring careful curation, not a similarity search problem. Their guidance focuses on structured context interfaces — tools, system prompts, and explicit information loading — not on embedding-based retrieval from a memory store.

The pattern emerging from practitioners who've built production agents is clear: structured retrieval by type, tag, scope, and priority outperforms semantic retrieval for operational context. Semantic search is a complement, not a foundation.


The Gap in the Market

Here's what exists today for a developer using Claude Code or Cursor who wants persistent project context:

Flat files (CLAUDE.md, .cursorrules) — free, zero setup, breaks at scale. No filtering, no types, no team sharing. Every instruction competes for attention in the context window.

General-purpose memory (Mem0, Letta, Zep) — sophisticated systems designed for conversational AI memory. Semantic retrieval. Optimised for personalisation. Not designed for structured project knowledge, deployment tracking, feedback loops, or executable procedures.

Local MCP memory servers (mcp-memory-service, Claude-Mem) — open source, run locally, provide basic persistence. Require local installation and maintenance. Don't share across machines or team members. Most use vector search.

The missing category: A hosted, structured, operational context platform designed specifically for coding agents. Where project rules, decisions, warnings, and patterns are stored as typed entries with tag-based filtering. Where changelogs are published via API as features ship. Where user feedback is collected and queryable by the agent. Where deployment procedures are executable runbooks with persistent progress. Where agent identities are defined with specific roles, tools, and context requirements.

Not memory. Infrastructure.


Memory vs Infrastructure

The distinction matters because it determines what you build, how you retrieve it, and who it serves.

Memory answers: "What happened?" It stores observations, extracts facts, and retrieves them when relevant. It's retrospective. The agent looks backward to inform the present.

Infrastructure answers: "What should I do?" It stores rules to follow, procedures to execute, feedback to act on, and identities to assume. It's prescriptive. The agent reads operational context to inform its actions going forward.

Memory is about the agent's experience. Infrastructure is about the project's needs.

A memory system stores "the user prefers early returns over nested conditionals." An infrastructure system stores a rule: type preference, tagged code-style, priority normal, body "prefer early returns over nested conditionals." The difference is subtle in this example. It becomes significant at scale.

When the project has 300 entries, the infrastructure approach lets the agent query "give me all code-style preferences" and get exactly 12 entries. The memory approach requires semantic search across 300 memories and hopes the right ones surface.
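As a toy illustration of that scale argument, with field names invented for the sketch:

```python
# A memory system stores free-text strings; an infrastructure system
# stores typed, taggable entries that support exact filtering.
memories = [f"memory {i}: something that happened" for i in range(300)]

entries = [{"type": "preference", "tags": ["code-style"],
            "body": f"code-style preference {i}"} for i in range(12)]
entries += [{"type": "rule", "tags": ["deploy"],
             "body": f"deployment rule {i}"} for i in range(288)]

# Infrastructure: an exact query over 300 entries returns exactly 12.
prefs = [e for e in entries
         if e["type"] == "preference" and "code-style" in e["tags"]]

# Memory: the best available operation is ranking all 300 strings by
# similarity to the query and hoping the right 12 surface near the top.
```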

When a new developer joins the team, the infrastructure approach gives their agent access to the same structured knowledge base. The memory approach gives them nothing — the memories belong to the previous developer's conversation history.

When the agent needs to follow a deployment procedure, the infrastructure approach provides a runbook with steps, branching, and progress tracking. The memory approach provides fragments of past conversations where deployment was discussed.
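A runbook in this sense is closer to a small state machine than to a memory. Here is a sketch, with invented step names and a progress set standing in for state persisted to disk:

```python
RUNBOOK = {
    "name": "deploy",
    "steps": [
        {"id": "tests",   "run": "npm test"},
        {"id": "build",   "run": "npm run build"},
        {"id": "migrate", "run": "npm run db:migrate",
         "when": "has_pending_migrations"},   # branching condition
        {"id": "release", "run": "npm run deploy:prod"},
    ],
}

def next_step(runbook, progress, flags):
    """Return the first incomplete step whose condition holds.

    `progress` is the set of completed step ids. Because it is reloaded
    from persistent storage, a run survives session interruptions.
    """
    for step in runbook["steps"]:
        if step["id"] in progress:
            continue
        cond = step.get("when")
        if cond and not flags.get(cond, False):
            continue  # branch not taken; skip this step
        return step
    return None  # runbook complete

# Resume mid-run: "tests" and "build" finished in a previous session.
progress = {"tests", "build"}
flags = {"has_pending_migrations": False}
step = next_step(RUNBOOK, progress, flags)   # skips "migrate"
```

No semantic search over conversation fragments can reconstruct this: the ordering, the branch, and the resume point are structural facts, not similarities.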


What the Industry Is Missing

As one analysis of the agent memory space put it: "Over 120K GitHub stars and $31.5M in venture funding aimed at a single thesis: LLMs need persistent memory to be useful. They're right about the thesis. The disagreement is about how to get there."

The disagreement so far has been about retrieval strategies — vector search vs graph traversal vs agent-managed memory blocks. But the deeper disagreement should be about what "context" means for different types of agents.

For a general-purpose conversational AI, memory (personal facts, preferences, conversation history) is the right context. Mem0, Zep, and Letta are well-positioned for this.

For a coding agent working on a real project with a team, memory is necessary but not sufficient. The agent also needs:

  • Structured project knowledge with types, tags, priorities, and scopes
  • Changelog tracking so it records what it ships as it ships it
  • Feedback ingestion so it knows what users need without the developer relaying it
  • Executable procedures with persistent progress for multi-step operations
  • Agent identity management for consistent behaviour across sessions and multi-agent coordination
  • Design system data so it builds UI consistently without burning context on component definitions

Each of these is a service. Together they form an infrastructure layer that sits between the AI model and the coding tool — providing the operational context that neither the model nor the tool provides natively.


Why This Window Matters

The coding agent market is growing fast. Cursor has crossed $500M ARR. Claude Code is the default terminal agent. Windsurf, Cline, and others are growing their user bases. Every one of these tools will need an answer to the context problem — and right now, they all share the same blind spots.

Right now, that answer is flat files. CLAUDE.md, .cursorrules, project settings. These work for small projects. They're already failing for larger ones — Claude Code's own documentation recommends keeping CLAUDE.md under 200 lines because adherence degrades with length.

The question is what comes after flat files. The memory startups are offering one answer: semantic memory extracted from conversations. But coding agents need more than memory. They need infrastructure.

The tools themselves are unlikely to build this infrastructure natively. Cursor's job is to be the best editor. Claude Code's job is to be the best terminal agent. Adding changelog hosting, feedback widgets, runbook execution engines, and design system APIs to each of them would dilute their core products. They'll build basic memory (and they are — Claude Code's Auto Memory and Session Memory are evidence). They won't build the full operational stack.

That leaves a gap. And the gap is exactly where context engineering — real context engineering, not just memory — becomes critical.


What Real Context Engineering Looks Like

The Manus team said it best: "Context engineering is still an emerging science — but for agent systems, it's already essential. Models may be getting stronger, faster, and cheaper, but no amount of raw capability replaces the need for memory, environment, and feedback."

Memory, environment, and feedback. Not just memory.

Real context engineering for coding agents means:

  1. Loading the right context at session start — not everything, not a semantic guess, but structured queries that return exactly what the agent needs for the current task.

  2. Capturing knowledge in the moment — storing rules, decisions, and warnings as they're discovered during development, not after the session ends.

  3. Connecting the agent to its operational environment — user feedback, deployment history, active procedures, design system constraints — through structured, queryable services.

  4. Defining agent identity and coordination — who the agent is, what it can do, what context it should load, and how it works with other agents.

  5. Making context portable — across sessions, across machines, across team members, across coding tools. Not locked in a local file or a single tool's memory system.
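Point 1 can be sketched as a session-start loader that issues a handful of exact queries. Here `fetch` stands in for a hypothetical context-service client, and every resource and filter name is invented for illustration:

```python
def load_session_context(fetch, task_tags):
    """Load structured context for one session: a few exact queries,
    not one semantic guess over everything ever stored."""
    return {
        "rules":      fetch("entries", type="rule", priority="high"),
        "task_rules": fetch("entries", type="rule", tags=task_tags),
        "runbooks":   fetch("runbooks", status="in_progress"),
        "feedback":   fetch("feedback", status="open", tags=task_tags),
        "changelog":  fetch("changelog", limit=5),
    }

# A stub client makes the shape concrete without a real service.
def stub_fetch(resource, **filters):
    return [{"resource": resource, "filters": filters}]

ctx = load_session_context(stub_fetch, task_tags=["auth"])
```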

This is what context engineering looks like when it grows up. Not bigger context windows. Not smarter compression. Not better embeddings. A structured infrastructure layer that gives agents the operational context they need to function as genuine development partners.

Everyone's building memory. Almost nobody is building infrastructure. That's the problem. Meanwhile, developers are hand-rolling their own context systems because nothing off the shelf fits what their agents actually need.


Built by Minolith — micro-services for AI coding agents. Structured context, changelogs, feedback, runbooks, agent orchestration, and design systems. All via MCP.
