How Do You Actually Review Code an Agent Wrote?
We’ve all had the magic moment by now. You fire up an agent in Claude Code, hand it a bunch of tasks, then let it do its thing. It feels like the future.
Right up until you have to hit git commit.
That’s when the magic curdles into a very specific kind of anxiety: how do I actually review this? I didn’t write it. I barely watched it happen. And now I’m supposed to vouch for it.
The AI Reviewer Trap
The obvious move is to fight fire with fire. An AI wrote it, so pipe the diff into an AI PR review tool and let the machines sort it out, right?
I wrote about this back in May: AI code reviewers won’t save you. Having one LLM grade another LLM’s homework can potentially lead to disaster. They share the same blind spots. They’re trained on the same patterns, so they tend to nod along at the same plausible-looking mistakes, and they’re notoriously bad at catching the subtle, systemic logic flaws that span an entire architecture. The bug that matters is rarely on one line. It’s the interaction between four files, and that’s exactly the kind of thing a second LLM waves through.
So you swing the other way and read it yourself, line by line. Every variable assignment, every branch. And that’s exhausting. Worse, it defeats the entire point of using an agent. If I have to mentally re-type every line the agent produced, I might as well have typed it for real.
So we’re stuck between a reviewer we can’t trust and a review process that erases the speedup. Neither one is the answer.
The Bottleneck Just Moved
Agentic development didn’t make software engineering easier. It moved the hard part.
Writing the implementation used to be the bottleneck. That’s the part the agent is genuinely good at now. What it can’t do for you is tell you the behavior is correct. Verifying the behavior is the whole job now, and that’s a different skill than writing code.
This is why migrating my test suite to Vitest earlier this year has paid off more than I expected at the time. When you’re driving autonomous agents, automated testing stops being a chore you do to keep a coverage badge green. It becomes the only safety net you actually have.
Trust the Spec, Not the Code
In an agentic workflow, my job as the human isn’t to write the function anymore. It’s to write the tests, or at least to rigorously verify the ones the agent proposes.
Think about what that buys you. If I have a comprehensive, fast test suite, I don’t ALWAYS need to read all 400 lines Claude Code just generated. I need to watch the runner light up green. If the tests pass, and the tests are good, the implementation details matter a lot less than they used to. The tests are the contract. The code is just one way to satisfy it.
That second condition is doing a lot of work, though, so I want to be honest about it. “If the tests are good” is the entire game. A passing suite that doesn’t cover the edge cases is worse than no suite, because it hands you false confidence at the exact moment you’ve stopped reading the code. So the scrutiny doesn’t disappear. It relocates. Instead of reviewing the implementation, you review the spec. Are the right behaviors tested? Are the failure modes tested? Did the agent quietly write a test that asserts its own bug?
That’s a much smaller surface to review than 400 lines of implementation, and it’s a far more durable thing to spend your attention on. The tests outlive any single refactor.
TDD Didn’t Die, It Got Promoted
For years people treated test-driven development as a discipline you adopted if you were virtuous and skipped if you were busy. Agentic coding flipped that. It made testing the load-bearing skill, because tests are now the interface between what you want and what the machine builds.
So the answer to “how do I review code an agent wrote” turns out to be: mostly, you don’t. You review what it’s supposed to do, you encode that in tests you trust, and you let the green checkmark tell you whether the agent got there.
I’m curious whether this matches your experience. I’ve found I write more tests now than I did before the agents showed up, not fewer. The implementation got cheap. Being sure it works did not.
I’d appreciate a follow. You can subscribe with your email below. The emails go out once a week, or you can find me on Mastodon at @[email protected].