Two Kinds of Memory for Your CLI Agent

So you set up a memory layer for your local CLI agent. Now what? How do you actually get that memory in front of the agent so it does something useful?

I’m going to walk through what I did with mem0, but the shape of this applies to pretty much any memory layer. The first thing worth understanding is that CLI agents work with memory in two very different ways, and the difference matters.

The first way is text that’s always loaded. It gets injected into every session’s context automatically. The agent doesn’t have to do anything; it’s just there. This is your guaranteed data, the stuff that shows up at the start of every conversation.

The second way is semantic memory. For me that’s mem0 and the tooling I’ve built around it. This layer is accessed through an MCP server that exposes commands like recall and remember. It’s pull-based: nothing gets auto-injected, so the agent has to decide to call recall. The agent needs to be smart enough to say “I’m not sure about this, let me go look it up.”

Those are the two flavors. Let me break them down.

Layer 1A: The Shared Instructions File

For most CLI agents, this is a single markdown file that the harness auto-loads into every session. Claude Code reads CLAUDE.md. Gemini CLI and Antigravity read GEMINI.md. And AGENTS.md has become the cross-tool convention, read by OpenCode, Antigravity, and Cursor alike. Same idea everywhere, just a different filename.

The one rule here: keep it minimal. Every line in this file is context you’re paying for on every single session. So don’t dump your whole knowledge base into it. The durable conventions, the project-specific facts, the things you only occasionally need? Those belong in your semantic layer, not here. This file is for the handful of rules that need to be loaded 100% of the time.
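To put a rough number on "paying for it every session," here is a back-of-the-envelope sketch. The four-characters-per-token ratio is an assumption (a common rule of thumb, not any model's real tokenizer), and the function names are made up for illustration.

```python
# Assumption: ~4 characters per token is a common rule of thumb for English
# text. It is not any specific model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def tokens_across_sessions(text: str, sessions: int) -> int:
    """Total tokens spent re-loading the same text in every session."""
    return estimate_tokens(text) * sessions


if __name__ == "__main__":
    lean = "Use tabs.\nRun `make test` before committing.\n"
    bloated = lean * 40  # stand-in for a file that grew into a knowledge base
    for name, body in [("lean", lean), ("bloated", bloated)]:
        print(name, estimate_tokens(body), tokens_across_sessions(body, 100))
```

The absolute numbers will be off, but the multiplication is the point: whatever the file costs once, it costs again in every session.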

Layer 1B: Auto-Memory

Claude Code shipped a feature called auto-memory. It lives in ~/.claude/projects/, inside a subfolder that’s basically a slug of your project’s path on disk. In there you get a memory folder with a MEMORY.md file alongside the individual memory files.

MEMORY.md works like an index. It holds short pointers to the durable memories stored next to it, and the whole thing gets loaded every session. It’s still part of layer 1, the always-loaded kind.
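If you want to find that index file for a given project, something like the sketch below should get you close. Two assumptions are baked in: that the slug is the absolute path with slashes swapped for dashes, and that the folder is literally named memory. Neither is a documented contract, so check the result against what is actually in ~/.claude/projects/ on your machine.

```python
from pathlib import Path
from typing import Optional


def project_slug(project_path: str) -> str:
    # Assumption: path separators become dashes. The real rule may also
    # rewrite other characters; compare with your own directory listing.
    return project_path.replace("/", "-")


def memory_index(project_path: str, home: Optional[Path] = None) -> Path:
    """Guess where the MEMORY.md index for a project would live."""
    home = home if home is not None else Path.home()
    # Assumption: the per-project folder is named "memory".
    return home / ".claude" / "projects" / project_slug(project_path) / "memory" / "MEMORY.md"


if __name__ == "__main__":
    index = memory_index(str(Path.cwd()))
    print(index, "(exists)" if index.exists() else "(not found)")
```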

Worth noting: this is a Claude Code thing. OpenCode and Antigravity don’t load or even know about these files. There’s no equivalent. Antigravity does have its own separate memory store that it syncs on its own, but it’s a different mechanism entirely, not a reader of Claude’s auto-memory.

Layer 2: The Semantic Layer

This is where it gets fun. I built a small MCP server in Go, a local binary that forwards requests to another server on my network. That server talks to two databases: Qdrant for the vectors, and Neo4j for the graph. The three functions I lean on most are recall, remember, and add_relation.
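To make those three functions concrete, here is a toy in-memory version. It is not the Go server and it doesn't touch Qdrant or Neo4j: word-count cosine similarity stands in for real embeddings, a dictionary stands in for the graph, and the argument shapes are guesses for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


class ToyMemory:
    """In-memory stand-in for a vector store plus a graph store."""

    def __init__(self):
        self._notes = []                 # list of (text, word-count vector)
        self._edges = defaultdict(set)   # subject -> {(relation, object)}

    @staticmethod
    def _embed(text):
        # Stand-in for a real embedding model: bag-of-words counts.
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
            sum(v * v for v in b.values())
        )
        return dot / norm if norm else 0.0

    def remember(self, text):
        self._notes.append((text, self._embed(text)))

    def recall(self, query, limit=3):
        q = self._embed(query)
        scored = [(self._cosine(q, vec), text) for text, vec in self._notes]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:limit]]

    def add_relation(self, subject, relation, obj):
        self._edges[subject].add((relation, obj))

    def relations(self, subject):
        return sorted(self._edges.get(subject, set()))


if __name__ == "__main__":
    memory = ToyMemory()
    memory.remember("The API server deploys with make release")
    memory.add_relation("api-server", "depends_on", "postgres")
    print(memory.recall("how do we deploy the api"))
    print(memory.relations("api-server"))
```

The split mirrors the two databases: recall answers "what do I know that sounds like this," while relations answer "what is connected to this."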

If MCP is new to you, the short version: it’s an open standard that lets your agent connect to external tools and data over a common protocol. Instead of N bespoke integrations, you run one MCP server per capability and the host discovers it. People call it “a USB-C port for AI,” which is annoying and also pretty accurate.

Wiring it up is just config. For Claude Code, it goes in ~/.claude.json as a memory entry under the top-level mcpServers key. For OpenCode, it’s ~/.config/opencode/opencode.json under mcp.memory, with type: local and a command that runs the binary.
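Here is roughly what those two entries look like, written as a small script that builds them and merges them into whatever config you already have. The binary path is hypothetical, and the exact field shapes (a command string plus args for Claude Code, a command list for OpenCode) are my understanding of each tool's format, so confirm them against the current docs before pasting anything.

```python
import json

BINARY = "/usr/local/bin/memory-mcp"  # hypothetical path to the local MCP binary


def claude_code_entry(binary):
    # Assumed shape for ~/.claude.json: command string plus args list.
    return {"mcpServers": {"memory": {"command": binary, "args": []}}}


def opencode_entry(binary):
    # Assumed shape for opencode.json: type "local", command given as a list.
    return {"mcp": {"memory": {"type": "local", "command": [binary]}}}


def merge_config(existing, fragment):
    """Recursively merge fragment into existing without dropping other keys."""
    merged = dict(existing)
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    current = {"theme": "dark", "mcp": {"other": {"type": "remote"}}}
    print(json.dumps(merge_config(current, opencode_entry(BINARY)), indent=2))
    print(json.dumps(claude_code_entry(BINARY), indent=2))
```

Merging rather than overwriting matters here because both files usually hold other settings you don't want to lose.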

The Part People Forget

Here’s the step that ties it all together. Setting up the MCP server doesn’t do anything on its own. Remember, the semantic layer is pull-based: nothing arrives unless the agent asks for it. The agent won’t call recall unless it knows it should.

So you go back to layer 1, your always-loaded instructions, and you add a few lines telling the agent how and when to use the MCP server. Something like “before answering questions about this project, call recall with a relevant query” and “when the user tells you something worth keeping, call remember.” That instruction is small, it’s cheap, and it’s what turns a dormant memory store into a memory layer the agent actually reaches for.
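If you keep several instruction files in sync (CLAUDE.md, AGENTS.md, and so on), a tiny helper can add those lines without duplicating them on a second run. This is a sketch: the "## Memory" heading and the function name are invented for illustration, and the rule wording is just the two examples from above.

```python
MARKER = "## Memory"  # hypothetical heading used to detect an existing section

MEMORY_RULES = """## Memory
- Before answering questions about this project, call `recall` with a relevant query.
- When the user tells you something worth keeping, call `remember`.
"""


def with_memory_rules(instructions: str) -> str:
    """Return the instructions text with the memory rules appended once."""
    if MARKER in instructions:
        return instructions  # already present, leave the file alone
    base = instructions.rstrip("\n")
    return (base + "\n\n" if base else "") + MEMORY_RULES


if __name__ == "__main__":
    existing = "# Project rules\n- Use tabs.\n"
    print(with_memory_rules(existing))
```

To apply it, read the file, pass the text through the function, and write the result back. Running it twice leaves the file unchanged.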

That’s the whole architecture. Always-loaded text that’s guaranteed but expensive, and a semantic store that’s huge but only as good as the agent’s instinct to go check it. Get both layers talking and your agent stops forgetting who you are every morning.
