AI-Assisted vs AI-Agentic Coding
There are two ways to work (code) with AI tools right now. I think most people know the other one exists, but they haven’t taken the time to try it. You should know how to do both, and when to use each.
Assisted Mode
Everybody knows this one. You write some code, you get stuck, you ask a question.
How does date parsing work in Python? What’s this function do? Haven’t we built this already? I need some fucking Regex again.
The AI answers. You copy-paste or accept the suggestion. You keep going. You’re driving. The AI is in the passenger seat reading the map.
I mean, this is really useful. I’m not going to pretend it isn’t. It’s also just autocomplete with opinions. Fancy autocomplete. Smart autocomplete.
Great. You’re doing the thinking. You’re deciding what gets built, how to structure it, and what order to do things in. You’re just asking for help filling in some of the blanks. That’s assisted mode.
Agentic Mode
This is different.
You describe what you want. You need to know how to describe what you want.
That is extremely important. Let me say that again. You need to know how to describe what you want.
You need to build an agent that knows how to interpret your description as what you actually want.
Sometimes it’s going to get it correct and sometimes it’s not. It’s going to go in a different direction than you wanted and you’re going to have to correct it. That’s the job now. You’re reviewing the output, the code, and how it’s producing the code. What are the gaps? You have to find the gaps and improve the agent so that it understands you better.
When I Use Which
I wish I had a clean rule for this. I don’t. That’s the vibes part.
Small or specific things can stay in assisted mode. Quick answers. Great. Easy. Move on.
Once I start wanting to touch multiple files, agentic. Major features like commands, parser changes, handler rewrites, recipes, or tests? I’m not writing all that by hand. I can describe what I want way better than I can autocomplete it.
Bug fixes? Depends. If I already know where the bug is, assisted. If I don’t, agentic. Let the agent grep around and figure it out. It’s better at reading a whole codebase quickly than I am. Not better at understanding it. Better at reading it.
New features? Almost always agentic. I describe the feature, point it at similar code in the repo, and let it go.
Again, review is super important. Sometimes you have to send it back or start over or change major portions of it. And if you build a system that learns, it’ll get better along the way.
The Review Problem
Switching to agentic mode, your entire job is code review. All day, all the time, constant. That’s the human’s job. Code review.
Are you good at code review? You should get better at it. You need to get better at it.
This isn’t about whether the tests pass. You need to identify possible issues and then describe tests that can check for those issues.
The nuanced bugs are the worst. And if those make it to production, you’re going to have problems.
Don’t skim the diff.
That should be the new motto. Read the code. Get better at code comprehension. It’s extremely important. You may be writing less code, but you sure as shit need to understand what the code is doing and how it can go wrong.
The Hybrid Reality
It’s totally fine to switch between modes depending on what you’re doing or your work session. Agentic can be way more impactful, but assisted mode is way better at helping you understand what the code is doing because you can select code blocks and easily ask questions about it.
So it’s not a toggle, it’s a spectrum. Now isn’t that funny? I’m on the spectrum of agentic development.
Where are you on the spectrum of agentic development?
So Which Is Better?
Neither. Both. It depends. Whatever, just build stuff.
Is assisted mode safer? Really? Like, does the human actually write better code this way? I don’t know. Agentic mode can be faster, but you need to be super careful that it’s not gaslighting you into thinking it knows what it’s doing.
Build software for you. And when it makes sense, help out with the community stuff. Support open source.
If you’re a developer, I’d appreciate a follow. You can subscribe with your email; the emails go out once a week. Or you can find me on Mastodon.
/ AI / Development / Claude / Agents
-
Claude Opus 4.7 Is Here
Anthropic just announced Claude Opus 4.7 yesterday, and here is my take on the new model after reading the blog post and doing a bit of research on their rollout plans from previous models.
What’s New
The headline is a 13% improvement on a 93-task coding benchmark over Opus 4.6. Rakuten’s SWE-Bench saw 3x more production tasks resolved, which is the kind of real-world metric that actually matters. Benchmarks are one thing, but “can it handle my actual codebase” is another.
The big quality-of-life improvement is that Opus 4.7 is better at verifying its own output before telling you it’s done. If you’ve ever had a model confidently hand you broken code and say “there you go,” you know why this matters. It handles long-running tasks with more precision, and the instruction following is noticeably tighter.
There’s also a major vision upgrade. The new model accepts images up to 2,576 pixels on the long edge, which is more than 3x the resolution of previous Claude models. If you’re working with technical diagrams, architecture charts, or screenshots of code, that’s a real improvement.
When Can You Actually Use It?
For enterprise customers, Anthropic says Opus 4.7 is available through the API and from the major cloud vendors: Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. But most of us aren’t using the API directly.
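If you are calling the API directly, switching models should mostly come down to the model ID. Here’s a minimal sketch that builds a Messages API request for `claude-opus-4-7`, the ID from the announcement, but does not send it. The endpoint and the `x-api-key` and `anthropic-version` headers follow Anthropic’s public Messages API, as does the body shape. The prompt and the `max_tokens` value are arbitrary placeholders, and `buildMessagesRequest` is a made-up helper name.

```javascript
// Model ID from the announcement; the rest follows the public Messages API.
const MODEL_ID = 'claude-opus-4-7';

// Build (but don't send) a Messages API request for the new model.
// buildMessagesRequest is a hypothetical helper, not part of any SDK.
function buildMessagesRequest(prompt, apiKey) {
  return {
    url: 'https://api.anthropic.com/v1/messages',
    init: {
      method: 'POST',
      headers: {
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: MODEL_ID,
        max_tokens: 1024, // arbitrary cap for this example
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage in Node 18+ (global fetch), inside an async function:
//   const { url, init } = buildMessagesRequest('Hello', process.env.ANTHROPIC_API_KEY);
//   const res = await fetch(url, init);
//   const data = await res.json();
//   console.log(data.content[0].text);
```

Keeping the request builder separate from the network call makes it easy to check which model a script is actually pinned to.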
As of right now, Opus 4.7 is not yet available in Claude Code or the desktop app. It’s also not showing up in the model picker on claude.ai for Pro plan users. Anthropic’s announcement says “available today across all Claude products,” but that doesn’t seem to have fully rolled out yet for consumer plans.
Looking at previous releases, Opus 4.6 launched on February 5th and was accessible on claude.ai and the API the same day. Historically, Anthropic hasn’t gated new Opus models behind higher tiers, so there’s no reason to think Pro, Max, Team, and Enterprise won’t all get access. The question is just when. If past patterns hold, it should show up within a few days. Keep checking your model picker.
Claude Code Users
As of today, Claude Code on the stable release is still on Opus 4.6. I’m not sure if it’s available on the bleeding edge builds, but for most people it’s not there yet.
The announcement mentions a few Claude Code features coming with 4.7:
- /ultrareview is a new slash command for dedicated code review sessions. Pro and Max users get three free ultrareviews to try it out.
- Auto mode has been extended to Max plan users, letting Claude make more decisions autonomously.
- The default effort level is being bumped to xhigh (a new level between high and max), which means the model will spend more time reasoning through harder problems.
Once Opus 4.7 does show up in Claude Code, remember to check any custom agents or skills that have a model hardcoded in the frontmatter. If you’ve got claude-opus-4-6 specified in your .claude/commands/ directory or agent configurations, those will keep using the old model until you update them.
Anthropic also notes that Opus 4.7 follows instructions more literally than previous models. Prompts written for earlier models can sometimes produce unexpected results. So if something feels off after switching, it’s worth re-tuning your prompts.
The Tokenizer and Cost Changes
One thing to be aware of: the tokenizer has been updated. The same input text will produce roughly 1.0 to 1.35x as many tokens as before. That means your costs could go up even at the same per-token pricing ($5/million input, $25/million output, unchanged from 4.6). Not a dealbreaker, but worth watching if you’re running high-volume workloads.
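To get a feel for what 1.0 to 1.35x means in dollars, here’s a back-of-the-envelope sketch using the $5 and $25 per-million prices above. The workload of 10M input and 2M output tokens is hypothetical. Applying the multiplier to output tokens as well is an assumption that gives an upper bound, since the announcement figure is about input text and output length depends on what the model writes.

```javascript
// USD per million tokens, as quoted for Opus 4.7.
const PRICE_PER_MILLION = { input: 5, output: 25 };

// Estimate cost for a workload measured with the old tokenizer.
// `multiplier` models the new tokenizer producing 1.0x to 1.35x the tokens.
// Assumption: the same inflation applies to output (upper bound).
function estimateCost(inputTokens, outputTokens, multiplier = 1.0) {
  const input = ((inputTokens * multiplier) / 1e6) * PRICE_PER_MILLION.input;
  const output = ((outputTokens * multiplier) / 1e6) * PRICE_PER_MILLION.output;
  return input + output;
}

// Hypothetical monthly workload: 10M input tokens, 2M output tokens.
const low = estimateCost(10_000_000, 2_000_000, 1.0);
const high = estimateCost(10_000_000, 2_000_000, 1.35);
console.log(`$${low.toFixed(2)} to $${high.toFixed(2)} per month`);
```

For that hypothetical workload, the estimate runs from $100 to about $135, so the same work could cost up to roughly a third more.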
Pricing hasn’t changed, the coding improvements look useful, and it’s important to know that the model ID is claude-opus-4-7. Keep an eye on your model picker over the next few days.
-
Agentic Development Trends: What's Changed in Early 2026
I’ve been following the agentic development space around Claude Code and similar tools and the last couple months have been interesting. Here’s what I’m seeing as we move through March and April 2026.
From Solo Agents to Coordinated Teams
The biggest shift is that more people are moving away from trying to build one agent that does everything. Instead, we’re seeing coordinated teams of specialized agents managed by an orchestrator, often running tasks in parallel. I think this is the more proper use of these systems, and it’s great to see the community arriving here.
If you’re curious about the different levels of working with agentic software development, I created an agentic maturity model on GitHub that goes into more detail on this progression.
Long-Running Autonomous Workflows
Early on, agents handled what were essentially one-shot tasks. Now in 2026, agents can be configured to work for days at a time, requiring only strategic oversight at key decision points. Doesn’t that sound fun? You’re still the bottleneck, but at least now you’re a strategic bottleneck.
Graph-Based Orchestration
Frameworks like LangGraph and AutoGen are converging on graph-based state management to handle the complex logic of multi-agent workflows. I think this makes sense when you consider that the branching and conditional logic of real-world tasks maps naturally onto graphs.
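Here’s a framework-free sketch of the idea. Nodes are functions that return updates to a shared state, and edges are functions that look at that state and pick the next node. This is not the LangGraph or AutoGen API. The node names and the "second attempt passes" rule are made up to show a conditional loop.

```javascript
// Minimal state-graph runner: nodes update shared state, edges pick the next
// node. Illustrative only; this is not the LangGraph or AutoGen API.
function runGraph({ start, nodes, edges }, initialState, maxSteps = 20) {
  let state = { ...initialState };
  let current = start;
  const trace = [];
  for (let step = 0; step < maxSteps && current !== 'END'; step++) {
    trace.push(current);
    state = { ...state, ...nodes[current](state) }; // merge the node's update
    current = edges[current](state); // conditional edge: choose what runs next
  }
  return { state, trace };
}

// Hypothetical workflow: write code, run tests, fix and retest until green.
const workflow = {
  start: 'write',
  nodes: {
    write: (s) => ({ attempts: s.attempts + 1 }),
    test: (s) => ({ passing: s.attempts >= 2 }), // pretend attempt #2 passes
    fix: (s) => ({ attempts: s.attempts + 1 }),
  },
  edges: {
    write: () => 'test',
    test: (s) => (s.passing ? 'END' : 'fix'),
    fix: () => 'test',
  },
};

const { state, trace } = runGraph(workflow, { attempts: 0, passing: false });
console.log(trace.join(' -> '), state);
```

The trace comes out as write, test, fix, test. The `maxSteps` cap matters: conditional edges make cycles easy, and an agent stuck in a loop is just burning tokens.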
MCP Is Everywhere
MCP (Model Context Protocol) has become the industry standard for tool integration. The major vendors support it, and there’s no sign of it slowing down. Every week there are new MCP servers popping up for connecting agents to different services and tools.
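Under the hood, MCP messages are JSON-RPC 2.0. This sketch only builds two messages a client might send: `tools/list` to discover a server’s tools, and `tools/call` to invoke one by name with JSON arguments. The `search_issues` tool is hypothetical and the transport (stdio or HTTP) is left out. A real client would also complete the `initialize` handshake before either call.

```javascript
// MCP messages are JSON-RPC 2.0 objects. This sketch only builds them;
// the transport and the initialize handshake are left out.
let nextId = 1;

function rpcRequest(method, params) {
  const message = { jsonrpc: '2.0', id: nextId++, method };
  if (params !== undefined) message.params = params; // omit params when unused
  return message;
}

// Discover which tools a server offers...
const listTools = rpcRequest('tools/list');

// ...then call one by name with JSON arguments (the tool is hypothetical).
const callTool = rpcRequest('tools/call', {
  name: 'search_issues',
  arguments: { query: 'login bug' },
});

console.log(JSON.stringify(listTools));
console.log(JSON.stringify(callTool));
```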
Unified Agentic Stacks
The developer tooling is becoming more consistent. Cursor is becoming more like Claude Code, and Codex is becoming more like Claude Code. Maybe you see a pattern there… might tell you something about who’s setting the pace.
What’s also notable is that people are experimenting with using different tools for different parts of the workflow. You might use Cursor to build the interface, Claude Code for the reasoning and main logic, and Codex for specific isolated tasks. Mix and match based on strengths.
Scheduled Agents and Routines
Claude Code recently released routines: scheduled or trigger-based automations that can run 24/7 on cloud infrastructure without needing your laptop. Microsoft appears to be working on similar capabilities for GitHub Copilot, and Cursor had something like this a while back too.
Security Gets Serious
Two things are happening here. First, people are getting better at leveraging agents for security reviews and monitoring, which are tasks that previously required highly specialized InfoSec expertise. You no longer need to be a hacker to find vulnerabilities; you can let your AI try to hack you.
However, the same capabilities that harden defenses can also be used for offensive attacks. We’re seeing a major push for security-first architecture as a requirement for all new applications, specifically to defend against the rise of agentic offensive attacks. Red team and blue team are both getting AI-pilled.
FinOps: Watching the Bill
Last on the list is financial operations. Inference costs now account for over half of AI cloud spending according to recent estimates. Organizations are prioritizing frameworks that offer explicit cost monitoring and cost-per-task alerts. Getting granular about how much you’re spending to solve specific problems and optimizing at the task level. I think that’s pretty interesting and something we’ll see a lot more tooling around.
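Cost-per-task alerting can start very simple. Here’s a minimal sketch of a tracker that accumulates spend per task ID and fires a callback the first time a task crosses its budget. The class, the task ID, the $1 budget, and the per-million prices are all illustrative assumptions, not real billing data or any vendor’s API.

```javascript
// Minimal per-task cost tracker with a one-time budget alert.
// Prices, budget, and task IDs are illustrative, not real billing data.
class TaskCostTracker {
  constructor({ inputPricePerM, outputPricePerM, budgetPerTask, onAlert }) {
    this.prices = { input: inputPricePerM, output: outputPricePerM };
    this.budget = budgetPerTask;
    this.onAlert = onAlert;
    this.costs = new Map(); // taskId -> accumulated USD
    this.alerted = new Set(); // tasks that already triggered an alert
  }

  // Record one model call's token usage against a task; returns the task total.
  record(taskId, inputTokens, outputTokens) {
    const cost =
      (inputTokens / 1e6) * this.prices.input +
      (outputTokens / 1e6) * this.prices.output;
    const total = (this.costs.get(taskId) ?? 0) + cost;
    this.costs.set(taskId, total);
    if (total > this.budget && !this.alerted.has(taskId)) {
      this.alerted.add(taskId);
      this.onAlert(taskId, total);
    }
    return total;
  }
}

const tracker = new TaskCostTracker({
  inputPricePerM: 5,
  outputPricePerM: 25,
  budgetPerTask: 1.0,
  onAlert: (id, usd) => console.log(`ALERT: ${id} has cost $${usd.toFixed(2)}`),
});

tracker.record('fix-login-bug', 100_000, 10_000); // about $0.75 so far
tracker.record('fix-login-bug', 50_000, 5_000); // crosses $1.00, alert fires
```

Alerting once per task keeps the signal useful. A real system would probably also log every call so you can see where the spend went.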
The common thread across all of these trends is maturity. We’re past the “wow, an AI wrote code” phase and into “how do we make this reliable, secure, and cost-effective at scale.” That’s a good place to be.
/ DevOps / AI / Development / Claude
-
What Is an AI Agent, Actually?
We need some actual definitions. The word “agent” is getting slapped onto every product and service, and marketers aren’t doing anybody any favors as they SEO-optimize for the new agentic world we live in. There’s a huge range in what these things can actually do. Here is my attempt at clarity.
The Spectrum of AI Capabilities
Chatbot / Assistant — This is a single conversation with no persistent goals and no tool use. You ask it questions, it answers from a knowledge base. Think of the little chat widget on a product page that helps you find pricing info or troubleshoot a common issue. It talks with you, and that’s about it.
LLM with Tool Use — This is what you get when you open “agent mode” in your IDE. Your LLM can read files, run commands, edit code. A lot of IDE vendors call this an agent, but it’s not really one. It’s a language model that can use tools when you ask it to. The key difference: you are still driving. You give it a task, it does that task, you give it the next one.
Agent — Given a goal, it can plan and execute multi-step workflows autonomously. By “workflow” I mean a sequence of actions that depend on each other: read a file, decide what to change, make the edit, run the tests, fix what broke, repeat. It has reasoning, memory, and some degree of autonomy in completing an objective. You don’t hand it step-by-step instructions. You describe what you want done, and it figures out how to get there.
Sub-Agent — An agent that gets dispatched by another agent to handle a specific piece of a larger task. If you’ve used Claude Code or Cursor, you know what I’m talking about. The main agent kicks off a sub-agent to go research something, review code, or run tests in parallel while it keeps working on the bigger picture. The sub-agent has its own context and tools, but it reports back to the parent. It’s not a separate autonomous agent with its own goals. It’s more like delegating a subtask.
Multi-Agent System — Multiple independent agents coordinating together, either directly or through an orchestrator. The key difference from sub-agents: these agents have their own goals and specialties. They negotiate, hand off work, and make decisions independently. Think of a system where one agent monitors your infrastructure, another handles incident response, and a third writes the postmortem. Each agent operates autonomously but is aware of the others.
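The Agent tier is the one worth making concrete, because the loop is the whole point. Here’s a toy sketch. `decide()` stands in for the model’s reasoning, memory is a plain array of past actions and observations, and the tools operate on a fake one-line workspace. Every name here is hypothetical; a real agent would call an LLM where `decide()` is.

```javascript
// Toy agent loop: decide, act, observe, repeat until done or out of steps.
// decide() stands in for the model's reasoning; everything is hypothetical.
function runAgent(goal, tools, decide, maxSteps = 10) {
  const memory = []; // past actions and what they returned
  for (let step = 0; step < maxSteps; step++) {
    const action = decide(goal, memory); // pick the next step from what we know
    if (action.type === 'done') return { ok: true, memory };
    const observation = tools[action.type](action.args); // act in the world
    memory.push({ action, observation }); // remember the outcome
  }
  return { ok: false, memory }; // step budget exhausted
}

// A fake one-line workspace with a bug: tests pass only once it adds.
const workspace = { code: 'return a - b;' };
const tools = {
  readFile: () => workspace.code,
  editFile: ({ code }) => {
    workspace.code = code;
    return 'edited';
  },
  runTests: () => (workspace.code === 'return a + b;' ? 'pass' : 'fail'),
};

// Stand-in for the model: choose the next action from the last observation.
function decide(goal, memory) {
  const last = memory[memory.length - 1];
  if (!last) return { type: 'runTests' }; // start by checking the current state
  if (last.observation === 'pass') return { type: 'done' };
  if (last.observation === 'fail') return { type: 'readFile' };
  if (last.action.type === 'readFile') {
    return { type: 'editFile', args: { code: last.observation.replace('-', '+') } };
  }
  return { type: 'runTests' }; // after an edit, verify it
}

const result = runAgent('make the add() tests pass', tools, decide);
console.log(result.ok, result.memory.map((m) => m.action.type).join(' -> '));
```

Nobody told it "run the tests, read the file, edit, re-run." It was given a goal and a step budget, and the order fell out of what it observed.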
So How Is Something Like OpenClaw Different From a Chatbot?
A chatbot is designed to talk with you, similar to how you’d just talk with an LLM directly. OpenClaw is designed to work for you. It has agency. It can take actions. It’s more than just a conversation.
Obviously, how much it can do depends on what skills and plugins you enable, and what degree of risk you’re comfortable with. But here’s the interesting part: it’s proactive. It has a heartbeat mechanism that keeps it running continuously in the background. It’ll automatically check on things or take action on a schedule you specify, without you having to prompt it.
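Here’s a rough idea of how a heartbeat can work: something calls `beat()` on a timer, and each beat runs whichever checks are due. This is a generic sketch, not OpenClaw’s implementation. The check names and intervals are made up, and time is passed in explicitly so the scheduling logic is easy to test.

```javascript
// Generic heartbeat sketch (not OpenClaw's implementation): each beat runs
// whichever checks are due. Check names and intervals are hypothetical.
function makeHeartbeat(checks) {
  const lastRun = new Map(); // check name -> timestamp of its last run
  return function beat(now = Date.now()) {
    const ran = [];
    for (const check of checks) {
      const prev = lastRun.get(check.name);
      if (prev === undefined || now - prev >= check.everyMs) {
        check.run();
        lastRun.set(check.name, now);
        ran.push(check.name);
      }
    }
    return ran; // names of the checks that ran on this beat
  };
}

const beat = makeHeartbeat([
  { name: 'check-inbox', everyMs: 60_000, run: () => {} },
  { name: 'daily-summary', everyMs: 86_400_000, run: () => {} },
]);

// In a long-running process you would drive it with a timer, e.g.
// setInterval(() => beat(), 30_000);
console.log(beat(0)); // first beat: nothing has run yet, so both are due
console.log(beat(60_000)); // a minute later: only check-inbox is due
```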
A Few Misconceptions Worth Clearing Up
OpenClaw is just one specific framework for building and orchestrating agents, but the misconceptions around it apply broadly.
“Agents have to run locally.” That’s how OpenClaw works, sure. But in reality, enterprise agents are running invisibly in the background all the time. Your agent doesn’t need to live on your laptop.
“Agents need a chat interface.” Because you can talk to an agent, people assume you must have a chat interface for it to be an agent. But by definition, agents don’t require a conversation. They can just run in the background doing things. No chat window needed.
“Sub-agents are just function calls.” This one trips up developers. When your agent spawns a sub-agent, it’s not the same as calling a function. The sub-agent gets its own context window, its own reasoning loop, its own tool access. It can make judgment calls the parent didn’t anticipate. That’s fundamentally different from passing arguments to a function and getting a return value.
Why Write This Down
I mainly wrote this for myself. I keep running into these terms and need a mental model to put them in context as I think about building agentic systems and decide what level of capability I actually need for a given problem. Writing it down makes those decisions somewhat easier.
-
A Concrete Definition of an AI Agent
An AI agent pursues a goal by iteratively taking actions, evaluating progress, and deciding next steps. Useful agents must be reliable, adaptive, and accurate.
/ AI / links / agent / automation
-
The Death of Clever Code
One positive side effect of working with agentic tools is that they rarely suggest clever code. No arcane one-liners, no “look how smart I am” abstractions. And, well, I’m here for it.
Before we continue, it helps to understand a bit about how LLMs work. These models are optimized for pattern recognition. They’ve been trained on massive amounts of code and have learned which patterns appear most frequently.
Clever code, by definition, is bespoke. It’s the unusual pattern, the one-off trick. There just isn’t enough training data for cleverness. The AI gravitates toward the common, readable solution instead.
Let me give you an example.
Show Me the Code
Here’s a nested ternary:
```javascript
const result = a > b ? (c > d ? 'high' : 'mid') : (e > f ? 'low' : 'none');
```
I’d be impressed if you could explain that correctly on your first try. What happens when there’s a bug in one of those conditions? Good luck debugging that.
Now here’s the same logic:
```javascript
let result;
if (a > b) {
  if (c > d) {
    result = 'high';
  } else {
    result = 'mid';
  }
} else {
  if (e > f) {
    result = 'low';
  } else {
    result = 'none';
  }
}
```
A lot easier, right? If it’s easy to read, it’s easy to maintain. The AI tooling doesn’t struggle to read either version, but you might, and when there is a bug, explaining exactly what needs to change becomes the hard part.
Actually wait. It turns out, not all complexity is created equal.
Two Kinds of Complexity
Essential complexity is the complexity of the problem itself. If you’re building a mortgage calculator or doing tax calculations, there’s inherent complexity in understanding the domain. You can’t simplify that away, and you shouldn’t try.
Accidental complexity is the stuff you introduce. The nested ternary instead of the if/else. Five layers of abstraction for the sake of abstraction that only runs in a specific edge case. Generic utility functions where you’ve tried to cover every possible scenario, but realistically you only need two or three cases.
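To make the generic-utility point concrete, here’s a hypothetical example. It puts an options-driven formatter that tries to cover every case next to the two small functions the app actually needs. The option names are invented for illustration.

```javascript
// Accidental complexity: one "does everything" formatter driven by options.
// Hypothetical example; the option names are invented for illustration.
function formatValue(value, opts = {}) {
  const { kind = 'auto', digits, prefix = '', suffix = '', scale = 1, fallback = '' } = opts;
  if (value === null || value === undefined) return fallback;
  const resolved = kind === 'auto' ? (typeof value === 'number' ? 'number' : 'string') : kind;
  if (resolved === 'string') return prefix + String(value) + suffix;
  const scaled = value * scale;
  const text = digits === undefined ? String(scaled) : scaled.toFixed(digits);
  return prefix + text + suffix;
}

// The two cases the app actually needs, written directly.
function formatPrice(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

function formatPercent(ratio) {
  return `${(ratio * 100).toFixed(1)}%`;
}

// Same output, very different amount of "wait, how does this work?"
console.log(formatValue(1999, { scale: 0.01, digits: 2, prefix: '$' }), formatPrice(1999));
console.log(formatValue(0.25, { scale: 100, digits: 1, suffix: '%' }), formatPercent(0.25));
```

Both versions produce the same strings. The difference is how much a reader has to hold in their head to trust the call site.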
Ok but what about abstraction, since abstraction is where accidental complexity loves to hide?
Good Abstraction vs. Bad Abstraction
Abstraction shows up everywhere in programming, but let’s think about it in two flavors.
Good abstraction hides details the caller doesn’t need to care about. The interface clearly communicates what it does. Think array.sort(): you look at it and immediately know what’s happening. Those dang arrays getting some sort of sorted. You know exactly what it does without caring about the implementation.
Bad abstraction hides details you do need to understand in order to use it correctly. Think of a processData() method that’s doing six different things with internal state that’s nearly impossible to test. And splitting it into processData1() through processData6() doesn’t help either. That’s just moving the vegetables around on your plate; it doesn’t mean you’ve actually finished dinner.
AI Signals
So why does any of this matter for working with AI coding tools?
Because if the agents keep getting your code wrong, if they consistently misunderstand what a function does or there are incorrect modifications, that’s a signal.
It’s telling you that your code has some flavor of cleverness that makes it hard to reason about. Not just for the AI, but for your team, and for you six months from now.
The goal is to code where the complexity comes from the problem, not from the solution. The AI struggling with your code is like a canary in the coal mine for maintainability.
/ AI / Programming / Code-quality
-
Your AI Agent Needs a Task Manager
If you’ve spent time working with AI coding tools, you’ve probably hit the compaction wall. Suddenly, your agent knows what it’s currently working on but has completely forgotten the five other things connected to it.
This is the memory problem, and it’s a big one.
The Context Window Isn’t Enough
Your AI agent needs some sort of memory system that lives outside the context window. When you’re working on simple, one-off tasks, the chat-as-workspace approach works fine. You ask a question, you get an answer, you move on. But the moment you’re tackling a complex set of related tasks? It breaks down fast.
I’ve been thinking about this through the lens of a framework I’m calling the Agentic Maturity Model. The short version is that there are distinct levels to how teams and developers use AI agents, and moving between levels isn’t about using “better” tools; it’s a shift in how you approach the work.
Four months ago, there were no real options. The good news? It seems like all the model providers recognize this is the next frontier. Memory and persistence are where I’m looking for the actual progress to happen next.
Claude Code has certainly gotten better in these areas over the last couple of months. They’ve added an auto memory feature in beta. They added a lightweight Tasks system based on a Todo system called Beads built by Steve Yegge. His key idea was that the task state should live outside the context window.
These are meaningful building blocks towards an actual working memory system that persists across sessions and survives compaction.
We’re Almost There
The tooling and harnesses we’ve built on top of LLMs are already changing how software gets built, but where are we headed? Here is what I think:
- auto-improving memory: where the agent learns your patterns, your codebase, your preferences
- persistent task tracking that survives compaction: Tasks, todos, issues, whatever you want to call them. The point is they exist outside the conversation.
When those two pieces come together properly, the workflow for everyone will change again.
Your agent doesn’t just respond to the current prompt. It knows where it is in a larger plan, what’s been done, what’s blocked, and what’s next. That’s the difference between a helpful chatbot and an actual collaborator.
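As a rough illustration of task state living outside the conversation, here’s a hypothetical task store persisted to a JSON file. It is not the Claude Code Tasks format or Beads; the class, file name, and schema are made up. The point is that a brand-new session can reload the file and immediately know what’s done, what’s blocked, and what’s next.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical task store persisted to a JSON file. Because the state lives
// on disk, not in the conversation, it survives compaction and new sessions.
class TaskStore {
  constructor(file) {
    this.file = file;
    this.tasks = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : {};
  }

  save() {
    fs.writeFileSync(this.file, JSON.stringify(this.tasks, null, 2));
  }

  add(id, title, blockedBy = []) {
    this.tasks[id] = { title, status: 'open', blockedBy };
    this.save();
  }

  complete(id) {
    this.tasks[id].status = 'done';
    this.save();
  }

  // Ready: still open, and every dependency is done.
  ready() {
    return Object.entries(this.tasks)
      .filter(([, t]) => t.status === 'open' &&
        t.blockedBy.every((dep) => this.tasks[dep]?.status === 'done'))
      .map(([id]) => id);
  }

  // Blocked: still open, but waiting on something unfinished.
  blocked() {
    const ready = new Set(this.ready());
    return Object.entries(this.tasks)
      .filter(([id, t]) => t.status === 'open' && !ready.has(id))
      .map(([id]) => id);
  }
}

// Demo: one session plans the work...
const file = path.join(os.tmpdir(), `agent-tasks-${process.pid}.json`);
if (fs.existsSync(file)) fs.unlinkSync(file); // start fresh for the demo
const session1 = new TaskStore(file);
session1.add('schema', 'Add users table');
session1.add('api', 'Add /users endpoint', ['schema']);
session1.add('docs', 'Document the endpoint', ['api']);
session1.complete('schema');

// ...and a later session (after compaction, or the next day) picks it up.
const session2 = new TaskStore(file);
console.log('next:', session2.ready(), 'blocked:', session2.blocked());
```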
We are so close I can taste the blood in the water, oh wait, that’s mine. ☠️
-
Using Claude to Think Through a Space Elevator
When I say I wanted to understand the engineering problems behind building a space elevator, I mean I really wanted to dig in. Not just read about it. I wanted to work through the challenges, piece by piece, with actual math backing things up.
So I decided to see what Claude and I could do with this kind of problem.
Setting it Up
I have an Obsidian vault that Claude Code/CoWork has access to, and I started by asking it to help me understand the core challenges of building a space elevator. First things first: clearly state all the problems. What are the engineering hurdles? What makes this so hard?
From there, I started asking questions. Could we use an asteroid as the anchor point and manufacture the cable in space? How would we spool enough cable to reach all the way down to Earth? Would it make more sense to build up from the ground, down from orbit, or meet somewhere in the middle?
I’ll admit I made some mistakes along the way. At one point I confused low Earth orbit with geostationary orbit, but Claude corrected me and explained the difference. That’s part of what makes this approach work. You’re not just passively reading; you’re actively thinking through problems and getting corrected when your mental model is off.
Backing It Up With Math
Here’s where it got really interesting. I told Claude: don’t just describe the problems. Prove them. Back up every challenge with actual math and physics calculations.
I also told it not to try cramming everything into one massive document. Write an overview document first, then create supporting documents for each problem so we could work through them individually.
So Claude started writing Python code to validate all the calculations. I hadn’t planned on that initially, but once it started writing code, I jumped in with my typical guidance. Use a package manager, write tests for all the code.
What we ended up with is a Python module covering about 12 of the hardest engineering challenges for a space elevator. There’s a script that calls into the module, runs all the math, and spits out the results. It’s not a complete formal proof of anything, but it’s a structured way to think through problems where the code can actually catch mistakes in the reasoning.
And it did catch mistakes. That’s the whole point of this approach, you’re using the calculations as a check on the thinking, not just trusting the narrative.
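The original module is Python and isn’t public, so here’s a hypothetical JavaScript sketch of the same approach, using the LEO-versus-geostationary mix-up as the example. Kepler’s third law gives the orbital radius for a given period. The constants are standard published values, and the helper names are made up.

```javascript
// Physical constants (SI units), standard published values.
const GM_EARTH = 3.986004418e14; // m^3/s^2, Earth's gravitational parameter
const R_EARTH_EQ = 6378137; // m, equatorial radius (WGS 84)
const SIDEREAL_DAY = 86164.0905; // s, one rotation relative to the stars

// Orbital radius whose period equals `periodS` (Kepler's third law).
function orbitalRadius(periodS) {
  return Math.cbrt((GM_EARTH * periodS ** 2) / (4 * Math.PI ** 2));
}

// Orbital period for a circular orbit at a given altitude above the equator.
function orbitalPeriod(altitudeM) {
  const r = R_EARTH_EQ + altitudeM;
  return 2 * Math.PI * Math.sqrt(r ** 3 / GM_EARTH);
}

const geoAltitudeKm = (orbitalRadius(SIDEREAL_DAY) - R_EARTH_EQ) / 1000;
const leoPeriodMin = orbitalPeriod(400_000) / 60;
console.log(`GEO altitude: ${geoAltitudeKm.toFixed(0)} km`);
console.log(`400 km LEO period: ${leoPeriodMin.toFixed(1)} min`);
```

This should print a geostationary altitude of roughly 35,786 km and a 400 km orbit period of about 92 to 93 minutes. Something in low Earth orbit laps the planet many times a day, so it can’t stay over one spot on the ground. The tests below turn those expectations into checks, which is the "code as a check on the reasoning" idea in miniature.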
Working Through Problems Together
As we worked through each challenge, I kept asking clarifying questions. What about this edge case? How would we handle that constraint?
It was genuinely collaborative: I brought curiosity and some engineering intuition, and Claude brought the ability to quickly formalize ideas into code and calculations.
The code isn’t public or anything. But the approach is what I think is worth sharing.
The Hard Part Is Still Hard
My main limiting factor is time. The math looks generally fine to me, but if I really wanted to verify everything thoroughly, I’d need to spend a lot more time with it. A mathematician or physicist who’s deeply familiar with these calculations would be much faster at spotting issues and providing guidance like, “no, you shouldn’t use this formula here, that approach is wrong.”
I can do that work. It’s just going to take me significantly longer than someone with that specialized background.
This is what I mean when I talk about working with agentic tools on hard problems. It’s not about asking an AI for the answer. It’s about using it as a thinking partner, one that can write code, run calculations, and help you check your reasoning as you go.
For me, that’s the real power of tools like Claude. Not replacing expertise, but amplifying curiosity.
/ AI / Claude / Space / Engineering
-
Voice-to-Text in 2026: The Tools and Models Worth Knowing About
As natural language becomes a bigger part of how we build software, it’s worth looking at the state of transcription models. What’s the best way to get voice to text right now?
For a lot of people, talking to your computer is faster than typing. You can stream-of-thought your way through an idea, prompt your tools, and get things moving without your fingers being the bottleneck. If you haven’t tried it yet, it will change how you work with your machine. I’m not exaggerating.
The Tools
Here’s what people are actually using for desktop voice-to-text:
- Willow Voice — Popular choice, lots of people swear by it
- SuperWhisper — My current pick
- Wispr Flow — Another well-regarded option
- Voice Ink — Worth a look?
- Aiko — From an Open Source dev, Sindre Sorhus
- MacWhisper — Solid Mac-native option
I’ve tried several of these, and the biggest pain point for most people is going to be that many require monthly subscriptions. I’ve been happy with SuperWhisper, and it’s worth mentioning that they still offer a pay-once (lifetime) option, so you don’t get locked into monthly payments forever. That said, Willow Voice and Wispr Flow both have strong followings.
The Models Behind the Magic
Most of these tools started with OpenAI’s Whisper, the voice model released and open-sourced back in 2022. With Whisper, you could run solid transcription locally on your own hardware.
But we’re a few years past that now, and there are more models to choose from. Here is a summary of the current state of transcription models.
| Model | Company | Released | Runs Locally? | Used in Desktop Tools? | Best For |
|---|---|---|---|---|---|
| Whisper Large-v3 | OpenAI | Nov 2023 | Yes | Yes (the standard) | Multilingual accuracy (99+ languages) |
| Whisper v3 Turbo | OpenAI | Oct 2024 | Yes | Yes (fast settings) | Best speed-to-accuracy ratio for local use |
| Nova-3 | Deepgram | Apr 2025 | Self-host | Limited (API-based) | Real-time agents; handling messy background noise |
| Parakeet TDT 1.1B | NVIDIA | May 2025 | Yes | Developer-focused / CLI | Ultra-low latency; significantly faster than Whisper |
| SenseVoice-Small | Alibaba | July 2024 | Yes | Emerging (fringe) | High-precision Mandarin/English and emotion detection |
| Canary-1B | NVIDIA | Oct 2025 | Yes | Developer-focused | Beating Whisper on technical jargon and punctuation |
| Voxtral Mini V2 | Mistral | Feb 2026 | Yes | Yes (privacy apps) | High-speed local transcription on low-VRAM devices |
| Granite Speech 3.3 | IBM | Jan 2026 | Yes | No (enterprise focus) | Reliable technical ASR with an Apache 2.0 license |
| Scribe v2 | ElevenLabs | Jan 2026 | No | Via API | Extremely lifelike punctuation and speaker labels |

We’re at an interesting inflection point. You can articulate your thoughts faster by speaking than by typing, and it’s becoming a real productivity gain. It’s not just an accessibility aid anymore. People who can type perfectly well are using these tools on a daily basis.
That’s all for now!
/ Productivity / AI / Tools / Voice
-
Your Context Window Is a Budget — Here's How to Stop Blowing It
If you’re using agentic coding tools like Claude Code, there’s one thing you should know by now: your context window is a budget, and everything you do spends it.
I’ve been thinking about how to manage that budget. As we learn how to use sub-agents, MCP servers, and all these powerful capabilities, we haven’t been thinking enough about the cost of using them. The dollars and cents certainly matter too if you’re using API access, but the raw token budget you burn through in a single session affects everyone regardless. Once it’s gone, compaction kicks in, and it’s kind of a crapshoot whether the agent knows how to pick up where you left off in the new session.
Before we talk about what you can do about it, let’s talk about where your tokens actually go.
Why Sub-Agents Are Worth It (But Not Free)
Sub-agents are one of the best things to have in agentic coding. The whole idea is that work happens in a separate context window, leaving your primary session clean for orchestration and planning. You stay focused on what needs to change while the sub-agent figures out how.
Sub-agents still burn through your session limits faster than you might expect. There are actually two limits at play here:
- the context window of your main discussion
- the session-level caps on how many exchanges you can have in a given time period.
Sub-agents hit both. They’re still absolutely worth using and working without them isn’t an option, but you need to be aware of the cost.
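Here’s a toy accounting of both limits. The numbers are hypothetical: eight file reads at 6,000 tokens each, a 1,500-token summary, and a 200,000-token window. The `ContextBudget` class is made up for illustration and doesn’t reflect how any tool actually meters usage.

```javascript
// Hypothetical numbers for illustration; real limits vary by tool, plan, and model.
const CONTEXT_LIMIT = 200_000;

// Tracks what each action spends out of one context window.
class ContextBudget {
  constructor(limit) {
    this.limit = limit;
    this.used = 0;
    this.ledger = [];
  }
  spend(label, tokens) {
    this.used += tokens;
    this.ledger.push({ label, tokens });
    return this.remaining();
  }
  remaining() {
    return this.limit - this.used;
  }
}

// Option A: research inline, so every file read lands in the main context.
const inline = new ContextBudget(CONTEXT_LIMIT);
for (let i = 0; i < 8; i++) inline.spend(`read file ${i}`, 6_000);

// Option B: delegate to a sub-agent. The reads happen in its own window,
// and only its summary comes back to the main session.
const subAgent = new ContextBudget(CONTEXT_LIMIT);
for (let i = 0; i < 8; i++) subAgent.spend(`read file ${i}`, 6_000);
const delegated = new ContextBudget(CONTEXT_LIMIT);
delegated.spend('sub-agent summary', 1_500);

console.log({ inline: inline.used, delegatedMain: delegated.used, subAgent: subAgent.used });
```

With delegation the main context stays far cleaner: 1,500 tokens versus 48,000. But the total work processed went up to 49,500 tokens. That’s the session-cap side of the cost, because the sub-agent still had to read everything.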
The MCP Server Problem
MCP servers are another area where things get interesting. They’re genuinely useful for giving agentic tools quick access to external services and data. But if you’ve loaded up a dozen or two of them? You’re paying a tax at the start of every session just to load their metadata and tool definitions. That’s tokens spent before you’ve even asked your first question.
My suspicion, and I haven’t formally benchmarked this, is that we’re headed toward a world where you swap between groups of MCP servers depending on the task at hand. You load the file system tools when you’re coding, the database tools when you’re migrating, and the deployment tools when you’re shipping. Not all of them, all the time.
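Here’s a rough way to see that up-front tax, and how swapping groups could shrink it. The tool definitions are hypothetical but shaped like MCP tool metadata (`name`, `description`, `inputSchema`). The four-characters-per-token rule is a crude heuristic, not a real tokenizer.

```javascript
// Crude heuristic, not a real tokenizer: assume ~4 characters per token.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Hypothetical tool definitions grouped by task. The fields mirror MCP tool
// metadata (name, description, inputSchema); the tools themselves are made up.
const tool = (name, description, props) => ({
  name,
  description,
  inputSchema: { type: 'object', properties: props },
});

const toolGroups = {
  coding: [
    tool('read_file', 'Read a file from the workspace', { path: { type: 'string' } }),
    tool('search_code', 'Search the codebase for a pattern', { pattern: { type: 'string' } }),
  ],
  database: [
    tool('run_query', 'Run a read-only SQL query', { sql: { type: 'string' } }),
    tool('describe_table', 'Describe a table schema', { table: { type: 'string' } }),
  ],
  deploy: [
    tool('deploy_service', 'Deploy a service to an environment', {
      service: { type: 'string' },
      environment: { type: 'string', enum: ['staging', 'production'] },
    }),
  ],
};

// Tokens spent before the first question, just describing the loaded tools.
function upfrontCost(groupNames) {
  return groupNames
    .flatMap((group) => toolGroups[group])
    .reduce((sum, t) => sum + estimateTokens(JSON.stringify(t)), 0);
}

console.log('all groups loaded:', upfrontCost(Object.keys(toolGroups)));
console.log('coding only:', upfrontCost(['coding']));
```

Five toy tools won’t break the bank. Real servers ship longer descriptions and schemas, though, and a couple dozen of them add up quickly.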
There are likely more subtle problems too. When you have overlapping MCP servers that can accomplish similar things, the agent can get confused about which tool to call. It might head down the wrong path, try something that doesn’t work, backtrack, and try something else. Every one of those steps spends your token budget on nothing productive.
The Usual Suspects
Beyond sub-agents and MCP servers, there are the classic context window killers:
- Web searches that pull back pages of irrelevant results
- Log dumps that flood your context with thousands of lines
- Raw command output that’s 95% noise
- Large file reads when you only needed a few lines
The pattern is the same every time: you need a small slice of data, but the whole thing gets loaded into your context window. You’re paying full price for information you’ll never use.
And here’s the frustrating part — you don’t know what the relevant data is until after you’ve loaded it. It’s a classic catch-22.
Enter Context Mode
Somebody (Mert Köseoğlu - mksglu) built a really clever solution to this problem. It’s available as a Claude Code plugin called context-mode. The core idea is simple: keep raw data out of your context window.
Instead of dumping command output, file contents, or web responses directly into your conversation, context-mode runs everything in a sandbox. Only a printed summary enters your actual context. The raw data gets indexed into a SQLite database with full-text search (FTS5), so you can query it later without reloading it.
It gives Claude a handful of new tools that replace the usual chaining of bash and read calls:
- ctx_execute — Run code in a sandbox. Only your summary enters context.
- ctx_execute_file — Read and process a file without loading the whole thing.
- ctx_fetch_and_index — Fetch a URL and index it for searching, instead of pulling everything into context with WebFetch.
- ctx_search — Search previously indexed content without rerunning commands.
- ctx_batch_execute — Run multiple commands and search them all in one call.
There are also slash commands to check how much context you’ve saved in a session, run diagnostics, and update the plugin.
The approach is smart. All the data lives in a SQLite FTS5 database that you can index and search, surfacing only the relevant pieces when you need them. If you’ve worked with full-text search in libSQL or Turso, you’ll appreciate how well this maps to the problem. It’s the right tool for the job.
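Here is a minimal sketch of the idea using Python’s built-in sqlite3 module. It assumes your SQLite build has FTS5 compiled in. This is not context-mode’s code or API: open_index, index_output, and search are hypothetical names, and summarizing by counting ERROR lines is an arbitrary choice. What it shows is the shape of the approach. Raw output goes into the index, only a one-line summary comes back, and later queries pull out just the matching lines.

```python
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an in-memory FTS5 table for raw tool output (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, body)")
    return conn


def index_output(conn: sqlite3.Connection, source: str, raw: str) -> str:
    """Index every non-empty line of raw output; return only a short summary."""
    lines = [line for line in raw.splitlines() if line.strip()]
    conn.executemany(
        "INSERT INTO chunks (source, body) VALUES (?, ?)",
        [(source, line) for line in lines],
    )
    errors = sum("ERROR" in line for line in lines)
    # This string is the only thing that would reach the conversation.
    return f"{source}: indexed {len(lines)} lines ({errors} with ERROR)"


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[str]:
    """Full-text search over everything indexed so far, best matches first."""
    rows = conn.execute(
        "SELECT body FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [body for (body,) in rows]


conn = open_index()
log = "\n".join([f"INFO request {i} ok" for i in range(1000)]
                + ["ERROR database timeout on shard 7"])
print(index_output(conn, "app.log", log))  # app.log: indexed 1001 lines (1 with ERROR)
print(search(conn, "timeout"))             # ['ERROR database timeout on shard 7']
```

A thousand log lines never enter the conversation. The one line you care about is still a query away.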
The benchmarks are impressive. The author reports overall context savings of around 96%. When you think about how much raw output typically gets dumped into a session, it makes sense. Most of that data was never being used anyway.
What This Means for Your Workflow
I think the broader lesson here is that context management is becoming a first-class concern for anyone doing serious work with agentic tools. It’s not just about having the most powerful model; it’s about using your token budget wisely so you can sustain longer, more complex sessions without hitting the wall.
A few practical takeaways:
- Be intentional about MCP servers. Load what you need, not everything you have.
- Use sub-agents for heavy lifting, but recognize they cost session tokens.
- Avoid dumping raw output into your main context whenever possible.
- Tools like context-mode can dramatically extend how much real work you get done per session.
We’re still early in figuring out the best practices for working with these tools. But managing your context window? That’s one of the things that separates productive sessions from frustrating ones.
Hopefully something here saves you some tokens.
/ AI / Programming / Developer-tools / Claude
-
AI-Powered Process Orchestration Across the Enterprise | Appian
Simplify digital operations with Appian’s agentic automation platform - purpose-built for enterprise growth.
/ AI / links / agent / automation / platform
-
How to Write a Good CLAUDE.md File
Every time you start a new chat session with Claude Code, it’s starting from zero knowledge about your project. It doesn’t know your tech stack, your conventions, or where anything lives. A well-written CLAUDE.md file fixes that by giving Claude the context it needs before it writes a single line of code.
This is context engineering, and your CLAUDE.md file is one of the most important pieces of it.
Why It Matters
Without a context file, Claude has to discover basic information about your project: what language you’re using, how the CLI works, where tests live, what your preferred patterns are. That discovery process burns tokens and time. A good CLAUDE.md front-loads that knowledge so Claude can get to work immediately.
If you haven’t created one yet, you can generate a starter file with the /init command. Claude will analyze your project and produce a reasonable first draft. It’s a solid starting point, but you’ll want to refine it over time.
The File Naming Problem
This gets messy on a team where people use different tools. Cursor has its own context file, OpenAI has theirs, and Google has theirs. You can easily end up with three separate context files that all contain slightly different information about the same project. That’s a maintenance headache.
It would be nice if Anthropic made the filename a configuration setting in settings.json, but as of now they don’t. Some tools like Cursor do let you configure the default context file, so it’s worth checking.
My recommendation? Look at what tools people on your team are actually using and try to standardize on one file, maybe two. I’ve had good success with the symlink approach, where you pick your primary file and symlink the others to it. So if CLAUDE.md is your default, you can symlink AGENTS.md or GEMINI.md to point at the same file.
It’s not perfect, but it beats maintaining three separate files with diverging information.
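On macOS or Linux, the symlink approach takes one command per alias (ln -s CLAUDE.md AGENTS.md). If you’d rather script it for a team, here is a minimal Python sketch. The helper name link_context_files and the default alias list are assumptions, so adjust them to the tools your team actually uses. On Windows, creating symlinks may require Developer Mode or admin rights.

```python
from pathlib import Path


def link_context_files(root: Path, primary: str = "CLAUDE.md",
                       aliases: tuple[str, ...] = ("AGENTS.md", "GEMINI.md")) -> list[Path]:
    """Point each alias at the primary context file; return the links created."""
    if not (root / primary).is_file():
        raise FileNotFoundError(root / primary)
    created = []
    for name in aliases:
        alias = root / name
        if alias.exists() or alias.is_symlink():
            # Never clobber an existing file: merge its contents into the primary by hand.
            continue
        # A relative target keeps the link valid wherever the repo is cloned.
        alias.symlink_to(primary)
        created.append(alias)
    return created
```

Because the links are relative and Git stores symlinks as symlinks, everyone who clones the repo on macOS or Linux gets the same setup.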
Keep It Short
Brevity is crucial. Your context file gets loaded into the context window every single session, so every line costs tokens. Eliminate unnecessary adjectives and adverbs. Cut the fluff.
A general rule of thumb that Anthropic recommends is to keep your CLAUDE.md under 200 lines. If you’re over that, it’s time to trim.
I recently went through this exercise myself. I had a bunch of Python CLI commands documented in my context file, but most of them I rarely needed Claude to know about.
We don’t need to list every single possible command in the context file. That information is better off in a docs/ folder or your project’s documentation. Just add a line in your CLAUDE.md pointing to where that reference lives, so Claude knows where to look when it needs it.
Maintain It Regularly
A context file isn’t something you write once and forget about. Review it periodically. As your project evolves, sections become outdated or irrelevant. Remove them. If a section is only useful for a specific type of task, consider moving it out of the main file entirely.
The goal is to keep only the information that’s frequently relevant. Everything else should live somewhere Claude can find it on demand, not somewhere it has to read every single time.
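A periodic review is easier with a quick line count. Here is a minimal sketch, assuming your context file sits in one of the two project-level locations Claude Code reads: ./CLAUDE.md or ./.claude/CLAUDE.md. The function name audit_context_files is hypothetical. The 200-line threshold is the rule of thumb above, not a hard limit enforced by the tool.

```python
from pathlib import Path

# Rule of thumb from the guidance above, not a limit Claude Code enforces.
MAX_LINES = 200


def audit_context_files(root: Path, max_lines: int = MAX_LINES) -> list[str]:
    """Report the line count of each project-level CLAUDE.md that exists."""
    candidates = [root / "CLAUDE.md", root / ".claude" / "CLAUDE.md"]
    report = []
    for path in candidates:
        if not path.is_file():
            continue
        count = len(path.read_text(encoding="utf-8").splitlines())
        status = "ok" if count <= max_lines else f"over {max_lines}, trim"
        report.append(f"{path.relative_to(root).as_posix()}: {count} lines, {status}")
    return report
```

Running something like this in a pre-commit hook or a monthly reminder keeps the trim step from being forgotten.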
Where to Put It
Something that’s easy to miss: you can put your project-level CLAUDE.md in two places:
- ./CLAUDE.md (project root)
- ./.claude/CLAUDE.md (inside the .claude directory)
A common pattern is to .gitignore the .claude/ folder. So if you don’t want to check in the context file (maybe it contains personal preferences or local paths), putting it in .claude/ is a good option.
Rules Files for Large Projects
If your context file is getting too large and you genuinely can’t cut more, you have another option: rules files. These go in the .claude/rules/ directory and act as supplemental context that gets loaded on demand rather than every session.
You might have one rule file for style guidelines, another for testing conventions, and another for security requirements. This way, Claude gets the detailed context when it’s relevant without bloating the main file.
Auto Memory: The Alternative Approach
Something you might not be aware of is that Claude Code now has auto memory, where it automatically writes and maintains its own memory files. If you’re using Claude Code frequently and don’t want to manually maintain a context file, auto memory can be a good option.
The key thing to know is that you should generally use one approach or the other. If you’re relying on auto memory, delete the CLAUDE.md file, and vice versa.
Auto memory is something I’ll cover in more detail in another post, but it’s worth knowing the feature exists. Just make sure you enable it in your settings.json if you want to try it.
Quick Checklist
If you’re writing or revising your CLAUDE.md right now, here’s what I’d focus on:
- Keep it under 200 lines: move detailed references to docs
- Include your core conventions: package manager, runtime, testing approach
- Document key architecture: how the project is structured, where things live
- Add your preferences: things Claude should always or never do
- Review monthly: cut what’s no longer relevant
- Consider symlinks: if your team uses multiple AI tools
- Use rules files: for detailed, task-specific context
That’s All For Now. 👋
/ AI / Programming / Claude-code / Developer-tools
-
Claude Code Skills vs Plugins: What's the Difference?
If you’ve been building with Claude Code, you’ve probably seen the terms “skill,” “plugin,” and “agent” thrown around. They’re related but distinct concepts, and understanding the difference will help you build better tooling. Let’s focus on skills versus plugins since those two are the most closely related.
Skills: Reusable Slash Commands
Skills are user-invocable slash commands, essentially reusable prompts that run directly in your main conversation. You trigger them with /skill-name and they execute inline. They can be workflows or common tasks that you run frequently.
Skills can live inside your .claude/skills/ folder, or they can live inside a plugin (where they’re called “commands” instead). Same concept, different home.
The frontmatter property you should pay attention to is allowed-tools. It defines which tools the skill can call, and there are three formats you can use:
- Comma-separated names: Bash, Read, Grep
- Comma-separated with filters: Bash(gh pr view:*), Bash(gh pr diff:*)
- JSON array: ["Bash", "Glob", "Grep"]
I don’t think there’s a meaningful speed difference between them. The filtered format might take slightly longer to parse if you have a huge list, but in practice it’s negligible. Pick whichever is most readable for your use case.
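To see why the choice comes down to readability, here is a hypothetical helper that normalizes all three spellings into the same list. It is not Claude Code’s parser and only illustrates the idea. Depending on how you write it, a YAML loader hands you either a string (the comma-separated forms) or a list (a bracketed array). As a precaution, the only subtle part treats commas inside a filter such as Bash(...) as part of the entry, not as separators.

```python
import json


def parse_allowed_tools(value) -> list[str]:
    """Normalize any of the three allowed-tools spellings into a list of entries.

    Hypothetical helper for illustration only; this is not Claude Code's parser.
    """
    if isinstance(value, list):
        # YAML already parsed ["Bash", "Glob", "Grep"] as a flow sequence.
        return [str(item).strip() for item in value]
    text = value.strip()
    if text.startswith("["):
        # A JSON array that arrived as a plain string.
        return [str(item).strip() for item in json.loads(text)]
    entries, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # Only commas outside a filter like Bash(...) separate entries.
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    entries.append("".join(current).strip())
    return [entry for entry in entries if entry]
```

All three formats reduce to the same kind of list, so any parsing cost is a single linear pass over a short string.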
The real power here is that skills can define tool calls and launch subagents. That turns a simple slash command into something that can orchestrate complex workflows.
Plugins: The Full Package
A plugin is a bigger container. It can bundle commands (skills), agents, hooks, and MCP servers together as a single distributable unit. Every plugin needs a .claude-plugin/plugin.json file, which is just a name, description, and author.
Plugins are a good way to bundle agents with skills. If your workflow needs a specialized agent that gets triggered by a slash command, a plugin is a good option for that.
Pushing the Boundaries of Standalone Skills
However, I wanted to experiment with what’s actually possible using standalone skills, so I built upkeep. It turns out that you can bundle actual compiled binaries inside a skill directory and call them from the skill. That opens up a lot of possibilities.
Here’s how I did it:
- The skill has a prerequisite section that checks for a bin/ folder containing the binary
- A workflow calls the binary, passing in the commands to run
- Each step defines what we expect back from the binary
You can see the full implementation in the SKILL.md file. It’s a pattern that lets you distribute real functionality, not just prompts, through the skill.
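The real skill writes these three steps as instructions in SKILL.md, and Claude carries them out. The Python below is only a sketch of the same checks so the logic is easy to see. find_bundled_binary, run_step, and the expected-output strings are hypothetical, not upkeep’s actual code.

```python
import os
import subprocess
from pathlib import Path


def find_bundled_binary(skill_dir: Path, name: str) -> Path:
    """Prerequisite check: the binary must exist in bin/ and be executable."""
    binary = skill_dir / "bin" / name
    if not binary.is_file() or not os.access(binary, os.X_OK):
        raise FileNotFoundError(f"missing prerequisite: {binary}")
    return binary


def run_step(binary: Path, *args: str, expect: str) -> str:
    """Run one workflow step and verify its output contains what the step expects."""
    result = subprocess.run([str(binary), *args], capture_output=True, text=True, check=True)
    if expect not in result.stdout:
        raise RuntimeError(f"step {args!r}: expected {expect!r}, got {result.stdout!r}")
    return result.stdout
```

Declaring the expected output up front gives each step a clear failure condition. That is where the approach starts to look like real functionality rather than just a prompt.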
Quick Summary
- Skills are slash commands. Reusable prompts with tool access that run in your conversation.
- Plugins bundle skills, agents, hooks, and MCP servers together with a plugin.json.
- Skills are more flexible than you might expect: you can call subagents, distribute binaries, and build real workflows.
If you’re just getting started, skills are the easier entry point. When you need to package multiple pieces together or distribute agents alongside commands, that’s when you reach for a plugin.
Have fun building!
/ AI / Development / Claude-code
-
Claude Code Now Has Two Different Security Review Tools
If you’re using Claude Code, you might have noticed that Anthropic has been quietly building out security tooling. There are now two distinct features worth knowing about. They sound similar but do very different things, so let’s break it down.
The /security-review Command
Back in August 2025, Anthropic added a /security-review slash command to Claude Code. This one is focused on reviewing your current changes. Think of it as a security-aware code reviewer for your pull requests. It looks at what you’ve modified and flags potential security issues before you merge.
It’s useful, but it’s scoped to your diff. It’s not going to crawl through your entire codebase looking for problems that have been sitting there for months.
The New Repository-Wide Security Scanner
Near the end of February 2026, Anthropic announced something more ambitious: a web-based tool that scans your entire repository and operates more like a security researcher than a linter. This is the thing that will help you identify and fix security issues across your entire codebase.
First we need to look at what already exists to understand why it matters.
SAST tools — Static Application Security Testing. SAST tools analyze your source code without executing it, looking for known vulnerability patterns. They’re great at catching things like SQL injection, hardcoded credentials, or buffer overflows based on pattern matching rules.
The catch is that they only find what their rules describe. If a vulnerability doesn’t match a known pattern, it slips through. SAST tools also tend to generate a lot of false positives, which means teams start ignoring the results.
What Anthropic built is different. Instead of pattern matching, it uses Claude to actually reason about your code the way a security researcher would. It can understand context, follow data flows across files, and identify logical vulnerabilities that a rule-based scanner would never catch. Think things like:
- Authentication bypass through unexpected code paths
- Authorization logic that works in most cases but fails at edge cases
- Business logic flaws that technically “work” but create security holes
- Race conditions that only appear under specific timing
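Here is a hypothetical example of the second category: authorization logic that works until an edge case. The code contains no dangerous call or known vulnerability signature, so a pattern-based rule has little to match. The User and Document types are invented for illustration, and this is not output from Anthropic’s scanner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: Optional[int]  # None for anonymous visitors
    is_admin: bool = False


@dataclass
class Document:
    owner_id: Optional[int]  # None once the owner's account is deleted


def can_edit_buggy(user: User, doc: Document) -> bool:
    # Reads correctly and passes the obvious tests, but None == None is True,
    # so an anonymous visitor can edit any document whose owner was deleted.
    return user.is_admin or doc.owner_id == user.id


def can_edit(user: User, doc: Document) -> bool:
    # Reject anonymous users before comparing IDs.
    if user.id is None:
        return False
    return user.is_admin or doc.owner_id == user.id
```

Catching this means asking what happens when two separate features (anonymous access and account deletion) interact. That kind of reasoning across context is what pattern matching struggles with.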
These are the kinds of issues that usually take a human security expert to find, or a real attacker.
SAST tools aren’t going away, and you should still use them. They’re fast, they catch the common stuff, and they integrate easily into CI/CD pipelines.
Also, the new repository-wide security scanner isn’t out yet, so stick with what you’ve got until it’s ready.
/ DevOps / AI / Claude-code / security
-
Ever wanted your CLAUDE.md to automatically update from your current session before the next compact? There’s a skill for that and it’s been helpful. In case you missed it, here’s a link to the skill:
/ AI / Claude-code
-
Managing Your Context Window in Claude Code
If you’re using Claude Code, there’s a feature you should know about that gives you visibility into how your context window is being used. The /context command breaks everything down so you can see exactly where your tokens are going.
Here’s what it shows you:
- System prompt – the base instructions Claude Code operates with
- System tools – the built-in tool definitions
- Custom agents – any specialized agents you’ve configured
- Memory files – your CLAUDE.md files and auto-memory
- Skills – any skills loaded into the session
- Messages – your entire conversation history
Messages is where you have the most control, and it’s also what grows the fastest. Every prompt you send, every response you get back, every file read, every tool output: it all shows up in your message history.
Then there’s the free space, which is what’s left for actual work before a compaction occurs. This is the breathing room Claude Code has to think, generate responses, and use tools.
You’ll also see a buffer amount that’s reserved for auto-compaction. You can’t use this space directly; it’s set aside so Claude Code has enough room to summarize the conversation and hand things off cleanly.
Why This Matters
Understanding your context usage helps you work more efficiently. A few ways to keep your context lean:
- Start fresh sessions for new tasks instead of reusing a long-running one
- Be intentional about file reads — only read what you need, not entire directories
- Use sub-agents — when you delegate work to a sub-agent, it runs in its own context window instead of yours. All those file reads, tool calls, and intermediate reasoning happen over there, and you just get the result back. It’s one of the best ways to preserve your primary context for the work that actually needs it.
- Trim your CLAUDE.md — everything in your memory files loads every session, so keep it tight
I’ll dig into sub-agents more in a future post. For now, don’t forget about /context.
/ AI / Claude-code / Developer-tools
-
I published an Agentic Maturity Model on GitHub, a mental framework for thinking about and categorizing AI tools. It’s open to contributions and I’m looking for coauthors.
/ AI / Open-source / Agentic
-
As you’ve probably noticed, something is happening over at Anthropic. They are a spaceship that is beginning to take off.
-
Are We Becoming Architects or Butlers to LLMs?
In a recent viral post, Matt Shumer declares dramatically that we’ve crossed an irreversible threshold. He asserts that the latest AI model…
/ AI / links / automation / reflections
-
Don’t sleep on OpenClaw. There are a ton of people building with it right now who aren’t talking about it yet. The potential is real, and when those projects start surfacing, it’s going to turn heads. Sometimes the most exciting stuff happens quietly before it hits the mainstream.
/ AI / Open-source / Openclaw
-
REPL-Driven Development Is Back (Thanks to AI)
So you’ve heard of TDD. Maybe BDD. But have you heard of RDD?
REPL-driven development. I think most programmers these days don’t work this way. The closest equivalent most people are familiar with is something like Python notebooks—Jupyter or Colab.
But RDD is actually pretty old. Back in the 70s and 80s, Lisp and Smalltalk were basically built around the REPL. You’d write code, run it immediately, see the result, and iterate. The feedback loop was instant.
Then the modern era of software happened. We moved to a file-based workflow, probably stemming from Unix, C, and Java. You write source code in files. There’s often a compilation step. You run the whole thing.
The feedback loop got slower, more disconnected. Some languages we use today like Python, Ruby, JavaScript, PHP include a REPL, but that’s not usually how we develop. We write files, run tests, refresh browsers.
Here’s what’s interesting: AI coding assistants are making these interactive loops relevant again.
The new RDD is natural language as a REPL.
Think about it. The traditional REPL loop was:
- Type code
- System evaluates it
- See the result
- Iterate
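The traditional loop fits in a few lines of Python. This is a toy: it takes lines from any iterable instead of a terminal so it’s easy to test, and the name repl and its show parameter are just for illustration. It uses eval and exec, so only ever feed it your own input.

```python
def repl(lines, namespace=None, show=print):
    """A toy read-eval-print loop over an iterable of source lines."""
    namespace = {} if namespace is None else namespace
    for line in lines:                       # Read
        try:
            value = eval(line, namespace)    # Evaluate an expression...
        except SyntaxError:
            exec(line, namespace)            # ...or execute a statement such as x = 21
            value = None
        if value is not None:                # Print, the way the >>> prompt does
            show(repr(value))
    return namespace                         # Iterate: state carries over between lines


repl(["x = 21", "x * 2", "'repl'.upper()"])  # prints 42, then 'REPL'
```

For an interactive session, pass iter(lambda: input(">>> "), "quit") as lines. The loop runs until you type quit.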
The AI-assisted loop is almost identical:
- Type (or speak) your intent in natural language
- AI interprets and generates code
- AI runs it and shows you the result
- Iterate
You describe what you want. The AI writes the code. It executes. You see what happened. If it’s not right, you clarify, and the loop continues.
This feels fundamentally different from the file-based workflow most of us grew up with. You’re not thinking about which file to open. You’re thinking about what you want to happen, and you’re having a conversation until it does.
Of course, this isn’t a perfect analogy. With a traditional REPL, you have more control. You know exactly what is being evaluated because you wrote it.
>>> while True:
...     history.repeat()
/ AI / Programming / Development
-
I usually brainstorm spec docs using Gemini or Claude, so if you’re like me, this prompt can give you interesting insight into your own software decisions.
Based off our previous chats and the previous documents you've helped me with, provide a detailed summary of all my software decisions and preferences when it comes to building different types of applications.
/ AI / Development
-
Here’s a tip: if you ask Claude (via the API, not Claude Code) to vibe-code a hacker typing game, make sure to tell it not to return valid exploits. I asked Claude to use actual Python code snippets in the game today and… GitHub’s security scanner was not happy with me. Oopsie doopsie. Lesson learned!
-
Knowledge Without a Knower
How do we define knowledge in the age of AI? Can new knowledge even be created if we’re outsourcing our thinking to the models or the systems we built around the models?
Let’s start with what knowledge actually is. Traditionally, to know something, it has to be true, you have to believe it, and you need some justification for that belief. It’s knowledge earned through experience, study, or reasoning.
AI doesn’t work that way. For these tools, knowledge is a probabilistic map of patterns extracted from massive amounts of text. There’s no belief, no understanding in the human sense. It’s knowledge without a knower.
That distinction matters more than we might think.
From Retention to Curation
The way we work with knowledge is shifting. For centuries, the paradigm was retention: memorize facts, write things down, build personal libraries of information.
Now we have tools that can do that for us, often better and faster than we ever could.
So what’s our new role?
Curation.
The skills that matter now are about what we can retrieve, what we can verify, and what we can synthesize.
We don’t need to remember everything, we need to know how to find it, evaluate it, and combine it in useful ways.
The Skills We Actually Need
If we’re not going to be the primary repositories of knowledge anymore, what should we focus on?
Spotting bullshit. This might be the most important skill of the next decade. When the tool outputs something that doesn’t match what we know to be true, can we catch it? AI systems are confident even when they’re wrong. They don’t hedge. They don’t say “I’m not sure about this.” So we need that internal alarm that goes off when something doesn’t add up.
Asking good questions. This has always been important, but it’s now essential. Understanding the problem means knowing where the gaps in your knowledge actually lie. A well-formed question is half the answer. An AI can give you a thousand responses, but only a good question will get you a useful one.
Reasoning about reasoning. How did the system arrive at that answer? What steps did it take? Why does it think that’s the case? We need to be able to trace the logic, not just accept the output. This is meta-cognition applied to our tools.
The Human in the Loop
New knowledge will continue to need humans. Not for the grunt work of data processing or pattern matching; AI can handle that better than we ever could.
Instead, our role is to identify the anomalies. We need to become detectives, finding the errors in the data. Skepticism will be extremely valuable in the times ahead.
Being a critical thinker. We need to be able to evaluate the evidence, weigh the pros and cons, and make informed decisions.
In computing, we see error correction used in the semiconductor industry, and a different technique used in the quantum computing industry. And while reducing the number of errors in a given system will continue to be important, what are we really after here?
Well, the truth, right?
I propose we come up with a new name for truth. I think it should be called “HAT” or a “human accepted truth.”
The aggregate of HATs is what we shall call “knowledge.” Knowledge is the sum of all human accepted truths.
-
The Broken Promise of Reach
AI is changing something we take for granted: the relationship between effort and reach.
For years, the implicit promise of the internet was straightforward.
Put in the work, create something valuable, and you’d find your audience. Maybe not millions, but someone.
The effort you invested had a reasonable correlation to the impact you could achieve. A thoughtful blog post might get shared.
A well-crafted tutorial could help thousands of developers. The work mattered because it reached people who needed it.
That equation is no longer guaranteed.
Now we’re in a world where AI can generate endless content at near-zero cost.
The supply of words, images, and ideas has become functionally infinite.
So, what happens to the value of any individual piece?
- Your carefully researched article drowns in a database of a thousand AI-generated summaries.
- Your authentic ideas are lost in a sea of algorithmic content designed for engagement.
So, why put in the work if the reward isn’t there?
The dream used to be building something sustainable, or something big enough to matter. Enough to support yourself while doing work you care about. And it can still be that: a dream.
We need to return to finding value in the act itself.
Don’t let your self-worth depend on metrics decided by a platform.
Your entire creative output shouldn’t be measured in likes, shares, and subscriber counts. That’s when they win.
We can’t hand over the definition of “success” to the algorithms.
The old promise that platforms will provide is broken.
We’re not going back.
Now we need to build something new.
What we build is up to us.
/ AI / Indieweb / Publishing