
AI coding assistant: when to delegate and when to collaborate

22 June 2026 - 7 min read

I have a project where I haven’t opened the code once since AI rewrote it. And another where I review every single line AI generates. Same AI tools, same developer, radically different approaches.

The first is a swimming relay optimizer I built years ago: a command-line tool that processed Excel files to generate optimal team combinations. It had been sitting untouched for years, abandoned for lack of time rather than lack of interest, and AI was what made me think it was worth reviving. I fed the entire codebase to an AI model and asked it to modernize it, add a web UI, make it configurable, publish it as a web app instead of a clunky executable. I tested the final result, verified the features worked, and shipped it. I’ve never looked at the code since, and I genuinely couldn’t tell you what frontend framework it picked.

The second is a legacy C# system I work on professionally: 15-to-20-year-old machine control software that has been touched by dozens of developers, has minimal documentation and zero unit tests, and is full of client-specific customizations. When AI generates code here, I scrutinize every line, checking that it does exactly what I asked, that no edge cases are missing, and that it follows our existing patterns. I also spend significantly more time thinking before I even write the prompt.

Same tools. Wildly different contexts.

Two parallel worlds

The swimming project is my personal playground: the goal is speed and exploration, and the stakes are low enough that if something breaks, I just restart. The codebase is clean, the constraints are clear, and I’m the only one affected by anything that goes wrong. In this environment, I let AI operate as an autopilot: I describe what I want, it generates the code, I verify it works, and I move on. The code itself is a black box I never feel the need to open.

The professional project is a different beast. It’s a minefield of legacy patterns, undocumented assumptions, and business logic that evolved over two decades. The goal isn’t speed, it’s stability and maintainability, and the cost of getting it wrong is real: bugs affect production systems and real customers, and there are no unit tests to catch regressions automatically, no documentation worth mentioning. In this environment, AI becomes a copilot: I stay in control, iterate constantly, and treat it as a thinking partner rather than a generator I accept without question.

The key difference isn’t complexity, because I use AI for complex problems in both contexts: architecture decisions, business logic implementation, debugging. AI writes the code in both cases. What changes is how I use it, and above all how much I trust that code before it ships.

The real problem: validation, not trust

Most discussions about AI coding assistants focus on the wrong question: “Is AI reliable?” or “Can I trust AI?”

The better question is:

“How do I validate what AI produces?”

The problem isn’t that AI is unreliable, it’s that in many contexts its output is simply hard to validate. The gap between output you can check and output you can’t is enormous: easy validation makes AI incredibly powerful, while hard validation makes it genuinely risky.

When validation is easy

In my personal projects, validation is straightforward. I run the code, and if something’s wrong I see it immediately. There are no hidden dependencies, no undocumented edge cases, because the codebase is clean and I built it from scratch. When you also have good test coverage, it gets even easier: AI generates code, you run the test suite, and when the tests go green you’re done; when they don’t, you know exactly where to look. It’s about as reliable a feedback loop as you can ask for.

Legacy code without tests and without documentation is a different story entirely.

When validation becomes difficult

Here’s a real scenario from the C# project: I asked AI to analyze an old section of code, understand what it did, and port the functionality to the new architecture. The code it produced looked complete, compiled cleanly, and ran without errors, so at first I assumed the job was done. But when I carefully compared the new implementation against the original, I noticed pieces were missing: not major features, but subtle edge case handling that wasn’t obvious from a quick read, the kind of thing that only surfaces when you’re actively looking for it. The AI confirmed the gaps only after I pointed them out, and then implemented what was missing.

This wasn’t the AI being unreliable, it was me not having an easy way to validate completeness. Without tests that covered those edge cases, without documentation that listed all the requirements, how would I know something was missing? Manual testing can only cover so much, and code review helps only if you deeply understand the context, which in legacy systems is often impossible.
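One common way to narrow that gap is a characterization test: run the old and the new implementation on the same inputs and flag every place where they disagree. The sketch below is only an illustration in Python, not code from the C# project. The two `clamp_speed` functions, the 3000 limit, and the sample inputs are all hypothetical stand-ins, and a real legacy module with hardware or database dependencies would need far more setup than this.

```python
from typing import Any, Callable, Iterable


def _observe(fn: Callable[[Any], Any], value: Any) -> tuple:
    """Record what a function does with one input: its result or the exception type."""
    try:
        return ("ok", fn(value))
    except Exception as exc:  # an exception is observable behavior too
        return ("raised", type(exc).__name__)


def find_behavior_gaps(
    legacy: Callable[[Any], Any],
    ported: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> list:
    """Return (input, legacy_outcome, ported_outcome) for every input where they differ."""
    gaps = []
    for value in inputs:
        expected = _observe(legacy, value)
        actual = _observe(ported, value)
        if expected != actual:
            gaps.append((value, expected, actual))
    return gaps


# Hypothetical legacy routine: quietly handles missing and negative readings.
def legacy_clamp_speed(rpm):
    if rpm is None or rpm < 0:
        return 0
    return min(rpm, 3000)


# Hypothetical port: looks complete and works on typical inputs,
# but the two edge cases above were dropped.
def ported_clamp_speed(rpm):
    return min(rpm, 3000)


# Typical values plus the awkward ones a quick manual test tends to skip.
sample_inputs = [100, 5000, -5, None]
for value, expected, actual in find_behavior_gaps(
    legacy_clamp_speed, ported_clamp_speed, sample_inputs
):
    print(f"input={value!r}: legacy={expected}, ported={actual}")
```

A comparison like this only covers the inputs you think to feed it, so it shrinks the validation problem rather than solving it. Its value is that the missing edge cases show up as concrete differences instead of depending on someone spotting them in a careful read.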

The hidden cost of plausible code

The AI also did something that surprised me: when I pointed out mistakes, it sometimes patched around the problem rather than fixing it properly. Instead of removing incorrect code, it would add layers on top to make things work. This wasn’t malicious, it was just the path of least resistance, but without careful review those patches accumulate silently and you end up with code that works for the wrong reasons. That’s a kind of debt that’s particularly hard to pay back later, because it doesn’t look broken.

Why risk multiplies

These factors don’t just add up, they multiply each other. In a legacy codebase with no tests and no documentation, you have almost no way to verify whether what AI produced is actually complete and correct, and when the underlying logic is also business-critical, a silent error can cost real money. That’s what makes accepting AI output without scrutiny genuinely risky in that context: not because AI is bad at its job, but because you have no reliable way to check that it did the job completely.

The practical rule I’ve developed: the more critical the code, and the harder it is to validate, the less I delegate to AI.

From autopilot to copilot

The interaction pattern changes completely between these two contexts.

In personal projects, it looks something like: “Modernize this codebase and add a web UI”. I accept the output, test that the features work, and ship it. If AI generates something unexpected, I either accept it or ask for a revision, but I’m not reading the implementation either way.

At work, the same task turns into a conversation: I ask AI to explain how the legacy module works, then review its analysis against the actual code. I ask what edge cases the code handles, then verify the answer myself. I ask it to implement the new version preserving specific behaviors, then read through the result line by line and push back on anything that doesn’t look right.

None of this means I’m writing the code myself: I’m still asking AI to generate everything, but I stay in the loop at every step, checking the work before we move forward rather than accepting it wholesale.

The shift is from “do this for me” to “help me understand this, then let’s look at what you built before we move on”.

I use AI to analyze legacy code I’m trying to understand, to explain cryptic error logs during debugging, to explore multiple architectural approaches before committing to one, and to generate code that I then review and push back on when something looks off. I stay in the driver’s seat while AI acts as a navigator with a good map that doesn’t know the local road conditions: the client-specific quirks, the undocumented assumptions, the historical reasons things were done a certain way.

What I’ve learned

Context matters more than capability: the AI doesn’t change between my projects, but my approach to using it does.

The critical factor isn’t whether the task is complex, it’s whether I can validate the output easily. Before prompting, I now ask myself: do I have test coverage? Is this standard boilerplate or business-critical logic? Am I working with clean code or a legacy system with no documentation? The answers determine how much I delegate, and more importantly, how carefully I review what comes back.
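Those three questions can be written down as a small checklist. The following Python sketch is one possible encoding, not a formula from this article: the `TaskContext` fields, the simple count of risk factors, the thresholds, and the middle "spot-check" tier are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskContext:
    """Answers to the three pre-prompt questions (field names are hypothetical)."""
    has_test_coverage: bool
    business_critical: bool
    legacy_undocumented: bool


def review_mode(ctx: TaskContext) -> str:
    """Map a context to a review style by counting risk factors.

    The thresholds are illustrative assumptions, not measured values.
    """
    risk_factors = sum([
        not ctx.has_test_coverage,
        ctx.business_critical,
        ctx.legacy_undocumented,
    ])
    if risk_factors <= 1:
        return "autopilot"   # verify the behavior and ship
    if risk_factors == 2:
        return "spot-check"  # assumed middle tier: read the risky parts
    return "copilot"         # review every generated line


# A clean personal project without tests versus an untested legacy system.
personal = TaskContext(has_test_coverage=False, business_critical=False, legacy_undocumented=False)
work = TaskContext(has_test_coverage=False, business_critical=True, legacy_undocumented=True)
print(review_mode(personal), review_mode(work))
```

The exact numbers matter less than the habit of answering the questions before prompting rather than after the code comes back.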

I use AI coding assistants constantly, for simple boilerplate and complex business logic alike, and they’re genuinely powerful. But that power only works in your favor when you know what you can safely accept and what needs a second look: sometimes that means trusting the output and shipping; sometimes it means treating every generated line as a draft that needs review before it’s actually code.

I still don’t know what frontend framework the swimming app uses, and I don’t need to. At work, I can tell you exactly why every function is structured the way it is. Same tool, same developer. Knowing which situation you’re in, and adjusting accordingly, is most of what this is about.
