Your Context Window Is a Budget — Here's How to Stop Blowing It

If you’re using agentic coding tools like Claude Code, there’s one thing you should know by now: your context window is a budget, and everything you do spends it.

I’ve been thinking about how to manage that budget. As we learn to use sub-agents, MCP servers, and all these powerful capabilities, we haven’t been thinking enough about the cost of using them. Certainly the dollars and cents matter too if you’re using API access, but the raw token budget you burn through in a single session affects everyone regardless. Once it’s gone, compaction kicks in, and it’s a crapshoot whether the new session knows how to pick up where you left off.

Before we talk about what you can do about it, let’s look at where your tokens actually go.

Why Sub-Agents Are Worth It (But Not Free)

Sub-agents are one of the best things to have in agentic coding. The whole idea is that work happens in a separate context window, leaving your primary session clean for orchestration and planning. You stay focused on what needs to change while the sub-agent figures out how.

Sub-agents still burn through your session limits faster than you might expect. There are actually two limits at play here:

  • the context window of your main discussion
  • the session-level caps on how many exchanges you can have in a given time period

Sub-agents hit both. They’re still absolutely worth using and working without them isn’t an option, but you need to be aware of the cost.

The MCP Server Problem

MCP servers are another area where things get interesting. They’re genuinely useful for giving agentic tools quick access to external services and data. But if you’ve loaded up a dozen or two of them? You’re paying a tax at the start of every session just to load their metadata and tool definitions. That’s tokens spent before you’ve even asked your first question.

My suspicion, and I haven’t formally benchmarked this, is that we’re headed toward a world where you swap between groups of MCP servers depending on the task at hand. You load the file system tools when you’re coding, the database tools when you’re migrating, and the deployment tools when you’re shipping. Not all of them, all the time.
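Claude Code can read project-scoped MCP servers from a `.mcp.json` file, so one low-tech way to get task-scoped groups today (a hypothetical workflow, not a built-in feature) is to keep per-task variants, say `mcp.coding.json` and `mcp.db.json`, and copy whichever one the task needs into place before starting a session. A coding-focused variant might look something like this (the path is a placeholder):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

The point is just that a one-server file costs you one server’s worth of startup metadata, not twenty.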

There are likely more subtle problems too. When you have overlapping MCP servers that can accomplish similar things, the agent can get confused about which tool to call. It might head down the wrong path, try something that doesn’t work, backtrack, and try something else. Every one of those steps spends your token budget on nothing productive.

The Usual Suspects

Beyond sub-agents and MCP servers, there are the classic context window killers:

  • Web searches that pull back pages of irrelevant results
  • Log dumps that flood your context with thousands of lines
  • Raw command output that’s 95% noise
  • Large file reads when you only needed a few lines

The pattern is the same every time: you need a small slice of data, but the whole thing gets loaded into your context window. You’re paying full price for information you’ll never use.

And here’s the frustrating part — you don’t know what the relevant data is until after you’ve loaded it. It’s a classic catch-22.

Enter Context Mode

Mert Köseoğlu (mksglu) built a really clever solution to this problem. It’s available as a Claude Code plugin called context-mode. The core idea is simple: keep raw data out of your context window.

Instead of dumping command output, file contents, or web responses directly into your conversation, context-mode runs everything in a sandbox. Only a printed summary enters your actual context. The raw data gets indexed into a SQLite database with full-text search (FTS5), so you can query it later without reloading it.

It gives Claude a handful of new tools that replace the usual chaining of bash and read calls:

  • ctx_execute — Run code in a sandbox. Only your summary enters context.
  • ctx_execute_file — Read and process a file without loading the whole thing.
  • ctx_fetch_and_index — Fetch a URL and index it for searching, instead of pulling everything into context with WebFetch.
  • ctx_search — Search previously indexed content without rerunning commands.
  • ctx_batch_execute — Run multiple commands and search them all in one call.

There are also slash commands to check how much context you’ve saved in a session, run diagnostics, and update the plugin.

The approach is smart. All the data lives in a SQLite FTS5 database that you can index and search, surfacing only the relevant pieces when you need them. If you’ve worked with full-text search in libSQL or Turso, you’ll appreciate how well this maps to the problem. It’s the right tool for the job.
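I don’t know context-mode’s actual schema, but the underlying index-then-search pattern is easy to sketch with Python’s built-in sqlite3 module (the table name `captures` and the fake build log are mine):

```python
import sqlite3

# Sketch of the index-then-search pattern: raw output lands in a SQLite
# FTS5 table, and only search hits surface later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE captures USING fts5(source, content)")

# Pretend this is a huge build log that would otherwise flood the context.
raw_log = "\n".join(f"line {i}: ok" for i in range(1000))
raw_log += "\nline 1000: ERROR disk full"
conn.execute("INSERT INTO captures VALUES (?, ?)", ("build.log", raw_log))

# Later: query for the relevant slice instead of reloading everything.
# snippet() returns just a short fragment around the match.
hits = conn.execute(
    "SELECT source, snippet(captures, 1, '[', ']', '...', 8) "
    "FROM captures WHERE captures MATCH ?",
    ("ERROR",),
).fetchall()
for source, fragment in hits:
    print(source, fragment)
```

A thousand-line log goes in; a one-line snippet comes back out. That asymmetry is where the savings come from.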

The benchmarks are impressive. The author reports overall context savings of around 96%. When you think about how much raw output typically gets dumped into a session, it makes sense. Most of that data was never being used anyway.

What This Means for Your Workflow

I think the broader lesson here is that context management is becoming a first-class concern for anyone doing serious work with agentic tools. It’s not just about having the most powerful model; it’s about using your token budget wisely so you can sustain longer, more complex sessions without hitting the wall.

A few practical takeaways:

  • Be intentional about MCP servers. Load what you need, not everything you have.
  • Use sub-agents for heavy lifting, but recognize they cost session tokens.
  • Avoid dumping raw output into your main context whenever possible.
  • Tools like context-mode can dramatically extend how much real work you get done per session.

We’re still early in figuring out the best practices for working with these tools. But managing your context window? That’s one of the things that separates productive sessions from frustrating ones.

Hopefully something here saves you some tokens.

/ AI / Programming / Developer-tools / Claude