Security and Reliability in AI-Assisted Development

You may not realize it, but AI code generation is fundamentally non-deterministic. It’s probabilistic at its core: it predicts code rather than computing it.

And while there’s a lot of orchestration happening between the raw model output and what actually lands in your editor, you can still get wildly different results depending on how you use the tools.

This matters more than most people realize.

Garbage In, Garbage Out (Still True)

The old programming adage applies here with renewed importance. You need to be explicit with these tools. Building predictability into how you work is crucial.

Some interesting patterns:

  • Specialized agents set up for specific tasks
  • Skills and templates for common operations
  • Orchestrator conversations that plan but don’t implement directly
  • Multiple conversation threads working on the same codebase via Git worktrees

The more structure you provide, the more consistent your output becomes.
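
One lightweight way to add that structure is a fixed template for a recurring task. Here's a minimal sketch in Python; every name in it (the function, the task, the constraint list) is illustrative rather than taken from any particular tool:

```python
# A sketch of a deterministic prompt template for a recurring task.
# The scaffold (context, constraints, output format) stays fixed;
# only the inputs vary, so results become more repeatable.

def build_migration_prompt(table: str, change: str) -> str:
    """Assemble the same scaffold every time, varying only the inputs."""
    return "\n".join([
        "Task: write a database migration.",
        f"Table: {table}",
        f"Change: {change}",
        "Constraints:",
        "- Must be reversible (include a down migration).",
        "- No destructive operations without an explicit backup step.",
        "Output: a single SQL file, nothing else.",
    ])

prompt = build_migration_prompt("users", "add a nullable last_login column")
print(prompt.splitlines()[0])  # → Task: write a database migration.
```

The point isn't this particular template; it's that the constraints travel with every request instead of being retyped (and forgotten) each time.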

The Security Problem

This topic doesn’t get talked about enough. All of our common bugs have snuck into the training data. SQL injection patterns, XSS vulnerabilities, insecure defaults… they’re all in there.
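
SQL injection makes this concrete: string-built queries are all over the training data, so generated code may look like the first query below when it should look like the second. A small sketch using Python's standard sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

name = "alice' OR '1'='1"  # attacker-controlled input

# Vulnerable pattern, common in training data: the input is spliced
# directly into the SQL text, so it can rewrite the query.
unsafe = conn.execute(
    f"SELECT count(*) FROM users WHERE name = '{name}'"
).fetchone()[0]

# Parameterized query: the driver passes the value separately from
# the SQL text, so the payload is treated as a literal string.
safe = conn.execute(
    "SELECT count(*) FROM users WHERE name = ?", (name,)
).fetchone()[0]

print(unsafe, safe)  # → 1 0 — the vulnerable query matches a row it shouldn't
```

Both versions "work" on the happy path, which is exactly why the vulnerable one keeps getting generated.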

The model can’t always be relied upon to build things securely the first time. Then there’s the question of trust.

Do you trust your LLM provider?

Is their primary focus on quality and reliable, consistent output? What guardrails exist before the code reaches you? Is the model specialized for coding, or is it a general-purpose model that happens to write code?

These are important engineering questions.

Deterministic Wrappers Around Probabilistic Cores

The more we can put deterministic wrappers around these probabilistic cores, the more consistent the output will be.

So, what does this look like in practice?

Testing is no longer optional. We used to joke that we’d get to testing when we had time. That’s not how it works anymore. Testing is required because it provides feedback to the models. It’s your mechanism for catching problems before they compound.

Testing is your last line of defense against garbage sneaking into the system.
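
In practice that means executable checks the generated code must pass before it lands. Even something this small closes the loop; the slugify function here is a hypothetical stand-in for whatever the model produced:

```python
import re

def slugify(title: str) -> str:
    """Hypothetical generated code under test."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Each assertion is a guardrail: if a regeneration breaks one,
# the failure is caught here rather than in production.
assert slugify("Hello, World!") == "hello-world"
assert slugify("  spaces  ") == "spaces"
assert slugify("Already-Good") == "already-good"
print("all checks passed")
```

The tests are the deterministic part of the system. The model can regenerate the function a dozen different ways; the assertions don't move.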

AI-assisted review is essential. The amount of code you can now create has increased dramatically. You need better tools to help you understand all that code. The review step, typically done during a pull request, is now crucial for product development. Not optional. Crucial.

The model needs to review itself, or you need a separate review process that catches what the generation step missed.
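
That separate pass doesn't have to be another model. Even a deterministic pre-filter over generated code helps. Here's a hypothetical sketch that flags a few well-known risky patterns before a human (or reviewing model) sees the change; real linters and SAST scanners cover far more:

```python
import re

# Illustrative patterns a pre-review gate might flag in generated code.
RISKY_PATTERNS = [
    (re.compile(r"""execute\(\s*f['"]"""), "f-string built SQL"),
    (re.compile(r"verify\s*=\s*False"), "TLS verification disabled"),
    (re.compile(r"\beval\("), "eval on dynamic input"),
]

def flag_risks(source: str) -> list[str]:
    """Return a human-readable note for each risky line found."""
    notes = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, reason in RISKY_PATTERNS:
            if pattern.search(line):
                notes.append(f"line {lineno}: {reason}")
    return notes

snippet = 'cur.execute(f"SELECT * FROM users WHERE id = {uid}")'
print(flag_risks(snippet))  # → ['line 1: f-string built SQL']
```

Run it in CI and the review step stops being a matter of whoever happens to be paying attention that day.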

The Takeaway

We’re at an interesting point in time. These tools can dramatically increase your output, but you should only trust the result if you build the right guardrails around them.

Structure your prompts. Test everything. Review systematically. Trust but verify.

The developers who figure out how to add predictability to unpredictable processes are the ones who’ll be shipping features instead of shitting out code.

/ DevOps / AI / Programming