Claude
-
AI-Assisted vs AI-Agentic Coding
There are two ways to work (code) with AI tools right now. I think most people know the second one exists, but they haven’t taken the time to try it. You should know how to do both. And when to do both.
Assisted Mode
Everybody knows this one. You write some code, you get stuck, you ask a question.
How does date parsing work in Python? What’s this function do? Haven’t we built this already? I need some fucking Regex again.
The AI answers. You copy-paste or accept the suggestion. You keep going. You’re driving. The AI is in the passenger seat reading the map.
I mean, this is really useful. I’m not going to pretend it isn’t. It’s also just autocomplete with opinions. Fancy autocomplete. Smart autocomplete.
Great. You’re doing the thinking. You’re deciding what gets built and how to structure it and what order to do things in. You’re just asking for help on some of the blanks. That’s assisted mode.
Agentic Mode
This is different.
You describe what you want. You need to know how to describe what you want.
That is extremely important. Let me say that again. You need to know how to describe what you want.
You need to build an agent that understands how to interpret your description as what you want.
Sometimes it’s going to get it correct and sometimes it’s not. It’s going to go in a different direction than you wanted and you’re going to have to correct it. That’s the job now. You’re reviewing the output, the code, and how it’s producing the code. What are the gaps? You have to find the gaps and improve the agent so that it understands you better.
When I Use Which
I wish I had a clean rule for this. I don’t. That’s the vibes part.
Small or specific things can be assisted. Quick answers. Great. Easy. Move on.
Once you start wanting to touch multiple files, agentic. Major features like new commands, parser changes, handler rewrites, recipes, or tests. I’m not writing all that by hand. I can describe what I want way better than I can autocomplete it.
Bug fixes? Depends. If I already know where the bug is, assisted. If I don’t, agentic. Let the agent grep around and figure it out. It’s better at reading a whole codebase quickly than I am. Not better at understanding it. Better at reading it.
New features? Almost always agentic. I describe the feature, point it at similar code in the repo, and let it go.
Again, review is super important. Sometimes you have to send it back or start over or change major portions of it. And if you build a system that learns, it’ll get better along the way.
The Review Problem
When you switch to agentic mode, your entire job becomes code review. All day, all the time, constant. That’s the human’s job. Code review.
Are you good at code review? You should get better at it. You need to get better at it.
This is not just about whether the tests pass. You need to identify possible issues and then describe tests that can check for those issues.
The nuanced bugs are the worst. And if those make it to production, you’re going to have problems.
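Here’s an illustration of what I mean (hypothetical names, Python, and the bug is one I made up for the example): a helper that looks fine and passes happy-path tests, plus the review-driven test that names the suspected gap and checks it.

```python
from datetime import datetime, timezone

def is_expired(expires_at: datetime) -> bool:
    # The kind of helper an agent might generate. It looks fine and
    # passes happy-path tests, but comparing a timezone-aware "now"
    # against a naive timestamp raises TypeError at runtime.
    return datetime.now(timezone.utc) > expires_at

def test_naive_timestamps_fail_loudly():
    # Review-driven test: name the suspected issue, then check it.
    naive = datetime(2020, 1, 1)  # no tzinfo, common in older code
    try:
        is_expired(naive)
        assert False, "expected TypeError for naive datetime"
    except TypeError:
        pass  # the failure mode is at least loud, not silent
```

The point isn’t this specific bug. It’s the habit: spot the assumption the diff is making, then write the test that pokes at it.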
Don’t skim the diff.
That should be the new motto. Read the code. Get better at code comprehension. It’s extremely important. You may be writing less code but you need to sure as shit understand what the code is doing and how it can be bad.
The Hybrid Reality
It’s totally fine to switch between modes depending on what you’re doing or your work session. Agentic can be way more impactful, but assisted mode is way better at helping you understand what the code is doing, because you can select code blocks and easily ask questions about them.
So it’s not a toggle, it’s a spectrum. Now isn’t that funny? I’m on the spectrum of agentic development.
Where are you on the spectrum of agentic development?
So Which Is Better?
Neither. Both. It depends. Whatever, just build stuff.
Is assisted mode safer? Really? Like, does the human actually write better code this way? I don’t know. Agentic mode can be faster and you need to be super careful that it’s not gaslighting you into thinking it knows what it’s doing.
Build software for you. And when it makes sense, help out with the community stuff. Support open source.
If you’re a developer, I’d appreciate a follow. You can subscribe with your email below. The emails go out once a week. Or you can find me on Mastodon at @[email protected].
/ AI / Development / Claude / Agents
-
Claude Opus 4.7 Is Here
Anthropic just announced Claude Opus 4.7 yesterday, and here is my take on the new model after reading the blog post and doing a bit of research on how their previous model rollouts went.
What’s New
The headline is a 13% improvement on a 93-task coding benchmark over Opus 4.6. Rakuten reported 3x more production tasks resolved, which is the kind of real-world metric that actually matters. Benchmarks are one thing, but “can it handle my actual codebase” is another.
The big quality-of-life improvement is that Opus 4.7 is better at verifying its own output before telling you it’s done. If you’ve ever had a model confidently hand you broken code and say “there you go,” you know why this matters. It handles long-running tasks with more precision, and the instruction following is noticeably tighter.
There’s also a major vision upgrade. The new model accepts images up to 2,576 pixels on the long edge, which is more than 3x the resolution of previous Claude models. If you’re working with technical diagrams, architecture charts, or screenshots of code, that’s a real improvement.
When Can You Actually Use It?
For enterprise customers, Anthropic says Opus 4.7 is available via the API and the major cloud platforms: Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. But most of us aren’t using the API directly.
As of right now, Opus 4.7 is not yet available in Claude Code or the desktop app. It’s also not showing up in the model picker on claude.ai for Pro plan users. Anthropic’s announcement says “available today across all Claude products,” but that doesn’t seem to have fully rolled out yet for consumer plans.
Looking at previous releases, Opus 4.6 launched on February 5th and was accessible on claude.ai and the API the same day. Historically, Anthropic hasn’t gated new Opus models behind higher tiers, so there’s no reason to think Pro, Max, Team, and Enterprise won’t all get access. The question is just when. If past patterns hold, it should show up within a few days. Keep checking your model picker.
Claude Code Users
As of today, Claude Code on the stable release is still on Opus 4.6. I’m not sure if it’s available on the bleeding edge builds, but for most people it’s not there yet.
The announcement mentions a few Claude Code features coming with 4.7:
- /ultrareview is a new slash command for dedicated code review sessions. Pro and Max users get three free ultrareviews to try it out.
- Auto mode has been extended to Max plan users, letting Claude make more decisions autonomously.
- The default effort level is being bumped to xhigh (a new level between high and max), which means the model will spend more time reasoning through harder problems.
Once Opus 4.7 does show up in Claude Code, remember to check any custom agents or skills that have a model hardcoded in the frontmatter. If you’ve got claude-opus-4-6 specified in your .claude/commands/ directory or agent configurations, those will keep using the old model until you update them.
Anthropic also notes that Opus 4.7 follows instructions more literally than previous models. Prompts written for earlier models can sometimes produce unexpected results. So if something feels off after switching, it’s worth re-tuning your prompts.
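A quick way to audit for stale model pins, sketched in Python. The directory layout and the exact ID string are assumptions about your setup, not anything Anthropic documents:

```python
from pathlib import Path

# Hypothetical sketch: find files under .claude/ that still pin the
# old model ID. The root path and ID string are assumptions.
OLD_ID = "claude-opus-4-6"

def find_pinned_files(root: str = ".claude") -> list[Path]:
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and OLD_ID in path.read_text(encoding="utf-8", errors="ignore"):
            hits.append(path)
    return hits

if __name__ == "__main__":
    for path in find_pinned_files():
        print(f"still pinned to {OLD_ID}: {path}")
```

Run it from your project root and update whatever it flags once the new model is live for you.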
The Tokenizer and Cost Changes
One thing to be aware of: the tokenizer has been updated. The same input text will produce 1.0 to 1.35x more tokens than before. That means your costs could go up slightly even at the same per-token pricing ($5/million input, $25/million output, unchanged from 4.6). Not a dealbreaker, but worth watching if you’re running high-volume workloads.
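To put rough numbers on that, here’s a back-of-envelope check. The per-million prices and the 1.35x worst case are from above; the workload figures are made up for illustration, and I’m assuming the multiplier applies to input and output alike:

```python
# Back-of-envelope check on what the tokenizer change could cost.
INPUT_PRICE = 5 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25 / 1_000_000  # dollars per output token

def monthly_cost(input_tokens: int, output_tokens: int, multiplier: float = 1.0) -> float:
    # Assumes the token-count multiplier hits input and output alike.
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) * multiplier

base = monthly_cost(50_000_000, 10_000_000)         # old tokenizer
worst = monthly_cost(50_000_000, 10_000_000, 1.35)  # worst-case new tokenizer
print(f"${base:,.0f} -> ${worst:,.0f} ({worst / base - 1:+.0%})")  # $500 -> $675 (+35%)
```

Same pricing, same workload, up to a third more spend. Worth a line item on your dashboard.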
Pricing hasn’t changed, the coding improvements look useful, and it’s worth knowing that the model ID is claude-opus-4-7. Keep an eye on your model picker over the next few days.
-
Agentic Development Trends: What's Changed in Early 2026
I’ve been following the agentic development space around Claude Code and similar tools and the last couple months have been interesting. Here’s what I’m seeing as we move through March and April 2026.
From Solo Agents to Coordinated Teams
The biggest shift is that more people are moving away from trying to build one agent that does everything. Instead, we’re seeing coordinated teams of specialized agents managed by an orchestrator, often running tasks in parallel. I think this is the right way to use these systems, and it’s great to see the community arriving here.
If you’re curious about the different levels of working with agentic software development, I created an agentic maturity model on GitHub that goes into more detail on this progression.
Long-Running Autonomous Workflows
Early on, agents handled what were essentially one-shot tasks. Now in 2026, agents can be configured to work for days at a time, requiring only strategic oversight at key decision points. Doesn’t that sound fun? You’re still the bottleneck, but at least now you’re a strategic bottleneck.
Graph-Based Orchestration
Frameworks like LangGraph and AutoGen are converging on graph-based state management to handle the complex logic of multi-agent workflows. I think this makes sense when you consider that the branching and conditional logic of real-world tasks maps naturally to graphs.
MCP Is Everywhere
MCP (Model Context Protocol) has become the industry standard for tool integration. The major vendors support it, and there’s no sign of that slowing down. Every week there are new MCP servers popping up for connecting agents to different services and tools.
Unified Agentic Stacks
The developer tooling is becoming more consistent. Cursor is becoming more like Claude Code, and Codex is becoming more like Claude Code. Maybe you see a pattern there… might tell you something about who’s setting the pace.
What’s also notable is that people are experimenting with using different tools for different parts of the workflow. You might use Cursor to build the interface, Claude Code for the reasoning and main logic, and Codex for specific isolated tasks. Mix and match based on strengths.
Scheduled Agents and Routines
Claude Code recently released routines: scheduled or trigger-based automations that can run 24/7 on cloud infrastructure without needing your laptop. Microsoft and GitHub Copilot appear to be working on similar capabilities, and Cursor had something like this a while back too.
Security Gets Serious
Two things happening here. First, people are getting better at leveraging agents for security reviews and monitoring, tasks that previously required highly specialized InfoSec expertise. You no longer need to be a hacker to find vulnerabilities; you can let your AI try to hack you.
However, the same capabilities that harden defenses can also be used for offensive attacks. We’re seeing a major push for security-first architecture as a requirement for all new applications, specifically to defend against the rise of agentic offensive attacks. Red team and blue team are both getting AI-pilled.
FinOps: Watching the Bill
Last on the list is financial operations. Inference costs now account for over half of AI cloud spending according to recent estimates. Organizations are prioritizing frameworks that offer explicit cost monitoring and cost-per-task alerts. Getting granular about how much you’re spending to solve specific problems and optimizing at the task level. I think that’s pretty interesting and something we’ll see a lot more tooling around.
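A cost-per-task alert can be simpler than it sounds. This is a sketch of the idea only; the data shape, prices, and budget threshold are all made up, and real tooling would pull these from your usage logs:

```python
# Sketch of a cost-per-task alert. Assumes you already log token
# usage per task; the shape and numbers here are illustrative.
INPUT_PRICE = 5 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25 / 1_000_000  # dollars per output token

def over_budget(tasks: list[dict], budget: float = 1.00) -> list[tuple[str, float]]:
    flagged = []
    for t in tasks:
        cost = t["input_tokens"] * INPUT_PRICE + t["output_tokens"] * OUTPUT_PRICE
        if cost > budget:
            flagged.append((t["name"], round(cost, 2)))
    return flagged

tasks = [
    {"name": "triage-issue", "input_tokens": 40_000, "output_tokens": 2_000},
    {"name": "migrate-db", "input_tokens": 900_000, "output_tokens": 120_000},
]
print(over_budget(tasks))  # flags only the expensive migration task
```

Once you have per-task costs, you can start asking the interesting question: was that task worth it?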
The common thread across all of these trends is maturity. We’re past the “wow, an AI wrote code” phase and into “how do we make this reliable, secure, and cost-effective at scale.” That’s a good place to be.
/ DevOps / AI / Development / Claude
-
Using Claude to Think Through a Space Elevator
When I say I wanted to understand the engineering problems behind building a space elevator, I mean I really wanted to dig in. Not just read about it. I wanted to work through the challenges, piece by piece, with actual math backing things up.
So I decided to see what Claude and I could do with this kind of problem.
Setting it Up
I have an Obsidian vault that Claude Code/CoWork has access to, and I started by asking it to help me understand the core challenges of building a space elevator. First things first: clearly state all the problems. What are the engineering hurdles? What makes this so hard?
From there, I started asking questions. Could we use an asteroid as the anchor point and manufacture the cable in space? How would we spool enough cable to reach all the way down to Earth? Would it make more sense to build up from the ground, down from orbit, or meet somewhere in the middle?
I’ll admit I made some mistakes along the way. I confused low Earth orbit with geostationary orbit at one point but Claude corrected me and explained the difference. That’s part of what makes this approach work. You’re not just passively reading; you’re actively thinking through problems and getting corrected when your mental model is off.
Backing It Up With Math
Here’s where it got really interesting. I told Claude: don’t just describe the problems. Prove them. Back up every challenge with actual math and physics calculations.
I also told it not to try cramming everything into one massive document. Write an overview document first, then create supporting documents for each problem so we could work through them individually.
So Claude started writing Python code to validate all the calculations. I hadn’t planned on that initially, but once it started writing code, I jumped in with my typical guidance. Use a package manager, write tests for all the code.
What we ended up with is a Python module covering about 12 of the hardest engineering challenges for a space elevator. There’s a script that calls into the module, runs all the math, and spits out the results. It’s not a complete formal proof of anything, but it’s a structured way to think through problems where the code can actually catch mistakes in the reasoning.
And it did catch mistakes. That’s the whole point of this approach, you’re using the calculations as a check on the thinking, not just trusting the narrative.
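To give a flavor of what those validation scripts look like, here’s a minimal example in the same spirit. This is not the actual project code, just the simplest check from that kind of exercise: where the counterweight has to sit, from Kepler’s third law.

```python
import math

# Standard gravitational parameter of Earth and the sidereal day.
GM_EARTH = 3.986004418e14  # m^3 / s^2
SIDEREAL_DAY = 86_164.1    # seconds

def geostationary_radius() -> float:
    # Kepler's third law solved for r: the orbital radius whose
    # period matches Earth's rotation. The cable's center of mass
    # has to sit at or beyond this point for the system to stay taut.
    return (GM_EARTH * SIDEREAL_DAY**2 / (4 * math.pi**2)) ** (1 / 3)

print(f"{geostationary_radius() / 1000:,.0f} km from Earth's center")  # ~42,164 km
```

Trivial on its own, but a dozen of these chained together is what lets the code catch a bad assumption in the narrative.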
Working Through Problems Together
As we worked through each challenge, I kept asking clarifying questions. What about this edge case? How would we handle that constraint?
It was genuinely collaborative, me bringing curiosity and some engineering intuition, Claude bringing the ability to quickly formalize ideas into code and calculations.
The code isn’t public or anything. But the approach is what I think is worth sharing.
The Hard Part Is Still Hard
My main limiting factor is time. The math looks generally fine to me, but if I really wanted to verify everything thoroughly, I’d need to spend a lot more time with it. A mathematician or physicist who’s deeply familiar with these calculations would be much faster at spotting issues. Providing guidance like, “no, you shouldn’t use this formula here, that approach is wrong.”
I can do that work. It’s just going to take me significantly longer than someone with that specialized background.
This is what I mean when I talk about working with agentic tools on hard problems. It’s not about asking an AI for the answer. It’s about using it as a thinking partner; one that can write code, run calculations, and help you check your reasoning as you go.
For me, that’s the real power of tools like Claude. Not replacing expertise, but amplifying curiosity.
/ AI / Claude / Space / Engineering
-
How Smart People Are Using Claude Code Skills to Automate Anything
Build agentic systems that run your business: skool.com/scrapes Don’t miss the next build - www.youtube.com/@simonscr…
/ Programming / Claude / links / code
-
Your Context Window Is a Budget: Here's How to Stop Blowing It
If you’re using agentic coding tools like Claude Code, there’s one thing you should know by now: your context window is a budget, and everything you do spends it.
I’ve been thinking about how to manage that budget. As we learn to use sub-agents, MCP servers, and all these powerful capabilities, we haven’t been thinking enough about the cost of using them. The dollars and cents matter too if you’re using API access, but the raw token budget you burn through in a single session affects all of us regardless. Once it’s gone, compaction kicks in, and it’s a crapshoot whether the new session knows how to pick up where we left off.
Before we talk about what you can do about it, let’s talk about where your tokens primarily go.
Why Sub-Agents Are Worth It (But Not Free)
Sub-agents are one of the best things to have in agentic coding. The whole idea is that work happens in a separate context window, leaving your primary session clean for orchestration and planning. You stay focused on what needs to change while the sub-agent figures out how.
Sub-agents still burn through your session limits faster than you might expect. There are actually two limits at play here:
- the context window of your main discussion
- the session-level caps on how many exchanges you can have in a given time period.
Sub-agents hit both. They’re still absolutely worth using and working without them isn’t an option, but you need to be aware of the cost.
The MCP Server Problem
MCP servers are another area where things get interesting. They’re genuinely useful for giving agentic tools quick access to external services and data. But if you’ve loaded up a dozen or two of them? You’re paying a tax at the start of every session just to load their metadata and tool definitions. That’s tokens spent before you’ve even asked your first question.
My suspicion, and I haven’t formally benchmarked this, is that we’re headed toward a world where you swap between groups of MCP servers depending on the task at hand. You load the file system tools when you’re coding, the database tools when you’re migrating, and the deployment tools when you’re shipping. Not all of them, all the time.
There are likely more subtle problems too. When you have overlapping MCP servers that can accomplish similar things, the agent can get confused about which tool to call. It might head down the wrong path, try something that doesn’t work, backtrack, and try something else. Every one of those steps spends your token budget on nothing productive.
The Usual Suspects
Beyond sub-agents and MCP servers, there are the classic context window killers:
- Web searches that pull back pages of irrelevant results
- Log dumps that flood your context with thousands of lines
- Raw command output that’s 95% noise
- Large file reads when you only needed a few lines
The pattern is the same every time: you need a small slice of data, but the whole thing gets loaded into your context window. You’re paying full price for information you’ll never use.
And here’s the frustrating part: you don’t know what the relevant data is until after you’ve loaded it. It’s a classic catch-22.
Enter Context Mode
Somebody (Mert Köseoğlu - mksglu) built a really clever solution to this problem. It’s available as a Claude Code plugin called context-mode. The core idea is simple: keep raw data out of your context window.
Instead of dumping command output, file contents, or web responses directly into your conversation, context-mode runs everything in a sandbox. Only a printed summary enters your actual context. The raw data gets indexed into a SQLite database with full-text search (FTS5), so you can query it later without reloading it.
It gives Claude a handful of new tools that replace the usual chaining of bash and read calls:
- ctx_execute: Run code in a sandbox. Only your summary enters context.
- ctx_execute_file: Read and process a file without loading the whole thing.
- ctx_fetch_and_index: Fetch a URL and index it for searching, instead of pulling everything into context with WebFetch.
- ctx_search: Search previously indexed content without rerunning commands.
- ctx_batch_execute: Run multiple commands and search them all in one call.
There are also slash commands to check how much context you’ve saved in a session, run diagnostics, and update the plugin.
The approach is smart. All the data lives in a SQLite FTS5 database that you can index and search, surfacing only the relevant pieces when you need them. If you’ve worked with full-text search in libSQL or Turso, you’ll appreciate how well this maps to the problem. It’s the right tool for the job.
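Here’s my reconstruction of the pattern, not context-mode’s actual code: a minimal Python sketch where raw command output goes straight into an FTS5 table, and only a one-line summary would ever enter the agent’s context.

```python
import sqlite3
import subprocess

# In-memory DB for the sketch; a real plugin would persist per session.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE output USING fts5(source, body)")

def run_indexed(cmd: list[str]) -> str:
    # Run the command, but keep the raw output out of the conversation.
    raw = subprocess.run(cmd, capture_output=True, text=True).stdout
    db.execute("INSERT INTO output VALUES (?, ?)", (" ".join(cmd), raw))
    # Only this summary line would enter the context window.
    return f"{len(raw.splitlines())} lines captured from {cmd[0]!r}"

def search(query: str) -> list[str]:
    # Full-text search over everything indexed so far, no re-runs.
    rows = db.execute("SELECT source FROM output WHERE output MATCH ?", (query,))
    return [source for (source,) in rows]
```

The agent gets the summary immediately and can call search later to pull back just the relevant slice, which is exactly the catch-22 fix: load once, pay once, query cheaply forever.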
The benchmarks are impressive. The author reports overall context savings of around 96%. When you think about how much raw output typically gets dumped into a session, it makes sense. Most of that data was never being used anyway.
What This Means for Your Workflow
I think the broader lesson here is that context management is becoming a first-class concern for anyone doing serious work with agentic tools. It’s not just about having the most powerful model, it’s about using your token budget wisely so you can sustain longer, more complex sessions without hitting the wall.
A few practical takeaways:
- Be intentional about MCP servers. Load what you need, not everything you have.
- Use sub-agents for heavy lifting, but recognize they cost session tokens.
- Avoid dumping raw output into your main context whenever possible.
- Tools like context-mode can dramatically extend how much real work you get done per session.
We’re still early in figuring out the best practices for working with these tools. But managing your context window? That’s one of the things that separates productive sessions from frustrating ones.
Hopefully something here saves you some tokens.
/ AI / Programming / Developer-tools / Claude
-
Claudine β A kanban board for Claude Code
Manage all your Claude Code conversations with a visual kanban board. Auto-status detection, full-text search, drag-and-drop, and more.
/ Tools / Claude / links / digital organization
-
As you’ve probably noticed, something is happening over at Anthropic. They are a spaceship that is beginning to take off.
-
Here’s a tip: if you ask Claude (via an API, not Code) to vibe-code a typing hacker game, make sure to tell it not to return valid exploits. I asked Claude to use actual Python code snippets in the game today and… GitHub’s security scanner was not happy with me. Oopsie doopsie. Lesson learned!
-
Design engineering for Claude Code. Craft, memory, and enforcement for consistent UI. - Dammyjay93/interface-design
-
Claude Cowork: First Impressions (From the Sidelines)
Claude Cowork released this week, and the concept seems genuinely useful. I think a lot of people are going to love it once they get their hands on it.
Unfortunately, I haven’t been able to get it working yet. Something’s off with my local environment, and I’m not entirely sure what. Claude Desktop sometimes throws up a warning asking if I want to download Node and I usually say no, but this time I said yes. Whether that’s related to my issues, I honestly don’t know. I did submit a bug report though, so hopefully that helps.
Here’s the thing that really impresses me: Anthropic noticed a trend and shipped a major beta feature in about 10 days.
That’s remarkable turnaround for something this substantial. Even if it’s not working perfectly for everyone yet (hi, that’s me), seeing that kind of responsiveness from a company is genuinely exciting.
I’m confident they’ll get it sorted before it leaves beta. These things take time, and beta means beta.
I’ve explored using the CLI agents outside of pure coding workflows, so I think there’s a lot more flexibility there than you might expect.
For now, I’m watching from the sidelines, waiting for my environment issues to sort themselves out.
-
After two days with Beads, my agent-based workflows feel supercharged. I’ve been experimenting with agents for about a month now, and something clicked. Could be the new Claude Code 2.1 upgrade helping too, but the combination is excellent.
-
How AI Tools Help Founders Code Again: My Experience with Claude Code
From intimidation to empowerment: how AI tools made modern web development accessible again for a founder who hadn’t coded in 15 years
/ Development / Claude / links / code / coding lessons