Conscious AI coding – Quality, security and productivity
In recent years, artificial intelligence applied to software development has become a core component of the production cycle in many organizations.
GitHub Copilot, Cursor and Claude Code are just some of the many tools that countless developers now interact with every day.
The narratives surrounding this adoption have become familiar: AI increases productivity, democratizes access to software development, eliminates repetitive work and gives technical professionals more time to focus on higher-value activities.
These claims are, of course, partly true. The acceleration and quality gains are undeniable in many contexts, such as rapid prototyping, handling boilerplate code and writing unit tests for well-defined functions.
However, all of this is often framed through the lens of hype around a technological revolution. “Local” benefits are frequently extrapolated into global advantages, which the data currently available does not seem to support with the same absolute certainty.
Another aspect that remains somewhat in the background of the mainstream debate — which tends to be dominated by output metrics that are easy to measure and hard to challenge in the short term — is the possible long-term implications of these tools: on code quality, system security, engineering skills and the cognitive well-being of teams.
In this short two-article series, I aim to examine these risks: not to argue that AI should be abandoned, but to show that the adoption of AI tools needs to be placed within a broader framework of governance, metrics and training, in order to avoid the emergence of systemic fragilities that may only become visible over time.
I will analyze seven dimensions of risk, the first four of which are covered in this article:
- code quality and maintainability;
- security and vulnerabilities;
- workflow dynamics;
- the productivity narrative;
- cognitive load and team well-being;
- the erosion of engineering skills;
- the deeper implications for developers’ epistemic capacity, meaning their ability to acquire, process, justify and manage knowledge.
In the following sections, we will look at the main operational and architectural risks introduced by the ungoverned adoption of AI-assisted development tools, highlighting how speed and automation can turn into new systemic fragilities.
In my second article, I will explore the cognitive and organizational consequences of the widespread adoption of AI assistants: from the mental load on teams to the erosion of technical skills, all the way to the epistemic implications of progressively delegating reasoning to generative systems.
Quality, maintainability and trust in AI-generated code
The illusion of correctness
LLMs (Large Language Models) are very good at doing what they were trained to do: producing plausible text.
They generate syntactically correct, well-formatted and commented code. This professional appearance lowers the critical threshold of those reviewing it, a phenomenon known as automation bias [1], amplified by the pace of Agile development: a pull request that appears to work tends to go through with less scrutiny than one written from scratch.
The crucial distinction is between plausible code and correct code: a language model generates statistically likely sequences of tokens, not causal reasoning about runtime behavior. It may handle the standard case correctly but fail on edge cases not represented in the prompt, or extrapolate an architectural pattern that was correct in its original context and apply it inappropriately elsewhere.
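As a minimal, hypothetical illustration of this gap (the pagination scenario and both function names are assumptions made for this sketch, not taken from any cited incident), the two functions below agree on the "typical" case but diverge on edge cases that a hurried review could easily miss:

```python
def page_count_plausible(total_items: int, page_size: int) -> int:
    """Looks reasonable and is right for 25 items at 10 per page."""
    # Wrong when total_items is an exact multiple of page_size, or zero.
    return total_items // page_size + 1


def page_count_correct(total_items: int, page_size: int) -> int:
    """Ceiling division with explicit input validation."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    return (total_items + page_size - 1) // page_size


if __name__ == "__main__":
    for total in (25, 20, 0):
        print(total, page_count_plausible(total, 10), page_count_correct(total, 10))
```

With 25 items both versions return 3 pages; with 20 items or 0 items the plausible version reports one page too many. Nothing in the code's appearance signals the difference.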
Maintainability degradation
When every interaction with AI is contextually local — the model sees the prompt and a few context files — there is a risk of a “Frankenstein effect”: a codebase made of pieces that may be individually correct, but lack stylistic and architectural coherence. You cannot expect an LLM to implicitly acquire an understanding of the overall architecture, design decisions or abstractions built by the team.
The resulting technical debt does not emerge in individual commits and is difficult to catch during code review, unless the reviewer has deep knowledge of the entire system. According to Lightrun’s 2026 “State of AI-Powered Engineering” report, 43% of AI-generated code required manual debugging in production, with an increase in redeployment cycles and a reduction in overall observability [2].
Code ownership and invisible technical debt
Writing a function from scratch involves a cognitive process that produces an understanding of both what the code does and why it is structured that way — a resource you can draw on to explain, debug or modify it.
When that same function is generated by an AI assistant and accepted with only minor changes, this understanding does not develop: you are the author of the commit, but not in the cognitive sense of the word.
Code reviews risk drifting in two equally problematic directions:
- accepting AI-generated code almost passively, assuming its reliability without truly understanding it;
- spending much more time verifying it than it took to generate it.
The Amazon case [3] is emblematic: a service outage, partially attributed to changes introduced by an AI assistant, led the company to introduce stricter controls and deliberately slow down deployment.
The significant element is not the incident itself, but the decision to forcibly reintroduce “controlled friction” into a process accelerated by AI. The suggested lesson is that high code-generation speed without adequate control mechanisms can contribute to systemic instability.
Invisible technical debt is the hardest risk to manage, because it produces no immediate signals. Unlike a bug that causes a visible crash, a degraded architecture creates latent fragility that only emerges under stress — traffic spikes, feature changes, scaling — when it has already spread across the entire codebase.
Security, vulnerabilities and the “CVE surge” of AI-generated code
Insecure patterns and attack surface
LLMs are trained on large amounts of publicly available code, which inevitably include vulnerable code and deprecated patterns. These can potentially be replicated when explicit security constraints are not included in the prompt.
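One well-known example of such a pattern is SQL built through string interpolation, which is widespread in older public code. The sketch below uses Python's standard sqlite3 module; the table, the data and the function names are hypothetical and only serve to contrast the insecure pattern with a parameterized query.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Insecure pattern: user input is spliced directly into the SQL text.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the value is passed as data, never parsed as SQL.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # every row is returned
    print(len(find_user_safe(conn, payload)))    # no row matches the literal string
```

Both functions behave identically for ordinary usernames, which is exactly why the unsafe variant can pass a superficial review.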
In April 2026, the Cloud Security Alliance published an empirical analysis of AI-assisted commits, finding a vulnerability density around 10 times higher than in traditional commits, with a significant increase in CVEs (Common Vulnerabilities and Exposures) attributable to AI-generated code [4].
The issue is partly mathematical: if AI makes it possible to generate code 2, 5 or 10 times faster, but security review capacity does not scale proportionally, the net result is an increase in the attack surface. The report introduces the concept of AI-accelerated security debt: while traditional technical debt accumulates slowly through suboptimal architectural choices, AI-assisted security debt can grow at the same pace as code production.
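The arithmetic behind this argument can be sketched with a deliberately simple model. All figures below (baseline output, review capacity, time horizon) are hypothetical assumptions chosen for illustration, not values from the report:

```python
def unreviewed_backlog(
    baseline_loc_per_week: int,
    speedup: int,
    review_capacity_loc_per_week: int,
    weeks: int,
) -> int:
    """Lines of code produced but not security-reviewed after `weeks`.

    Assumes constant output and constant review capacity, which is a
    simplification: real teams adapt both over time.
    """
    produced_per_week = baseline_loc_per_week * speedup
    unreviewed_per_week = max(0, produced_per_week - review_capacity_loc_per_week)
    return unreviewed_per_week * weeks


if __name__ == "__main__":
    # Hypothetical team: 1,000 LOC/week without AI, reviewers cover 1,200 LOC/week.
    for speedup in (1, 2, 5, 10):
        print(speedup, unreviewed_backlog(1000, speedup, 1200, weeks=12))
```

Under these assumptions the backlog is zero without AI, 9,600 lines after twelve weeks at a 2x speedup and 105,600 lines at 10x: the unreviewed surface grows linearly with generation speed as long as review capacity stays fixed.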
Responsibility, attribution and the Moonwell case
In many organizations, incident response systems are based on the assumption that someone knows the code involved. When that code has been generated automatically and approved distractedly, this assumption breaks down, and the question of responsibility in the event of a vulnerability remains open.
In February 2026, the DeFi protocol Moonwell lost around $1.78 million [5] in an exploit linked to a bug in the pricing definition of the cbETH token [6]. According to security auditor Pashov, the post-mortem analysis revealed that the vulnerable code [7] had been written with the assistance of Claude Opus, and that the bug concerned an edge case that a developer with full ownership of the code would probably have examined more carefully.
The case is illustrative not because AI “caused” the exploit in a direct sense, but because it shows how the path from “AI-assisted code” to “production vulnerability” is possible even in real systems, with concrete financial consequences.
Workflow and the risks of “vibe coding”
Cycle compression and the collapse of design thinking
One of the most profound effects of AI assistants is the compression of the time between idea and artifact. Vibe coding — a term coined by Andrej Karpathy in February 2025 to describe an approach where the developer interacts with AI through high-level descriptions, accepting the generated code with minimal changes — tends to collapse the traditionally separate phases of problem understanding, design, prototyping, implementation and testing into a single prompt-driven flow.
Forbes, in an analysis of the phenomenon published in April 2026 [8], suggests that this compression bypasses established mechanisms for quality control and engineering judgment. The design phase is normally the moment when teams reason about trade-offs, identify problem constraints, consider edge cases and define system invariants. When it is replaced by a prompt, no one actually performs these steps. The result is not necessarily code that is immediately wrong, but rather new parts added without an understanding of the surrounding constraints — fragility that only emerges as the system evolves.
Local optimization vs. global degradation
Vibe coding can produce local optimization at the cost of global degradation: speed increases visibly within individual sprints, but over longer time horizons the risk is ending up with a system that no one truly understands in its entirety. The deliberate reintroduction of friction into Amazon’s processes [3] is a concrete empirical response to this failure — not Luddism, but an evidence-based correction after observing the harmful consequences of an overly compressed development cycle.
The productivity narrative: metrics and hidden costs
Output vs. outcome
“AI increases developer productivity” is one of the most common claims in the debate around AI adoption. The question is: what exactly does this productivity measure?
Most studies measure lines of code per unit of time, PRs approved per sprint, or the speed of completion of isolated tasks — metrics with two critical limitations: they measure output, not outcome, and they do not account for downstream costs. IBM emphasizes that productivity should be measured end to end, including cost, speed, reasoning and value for users, not just local throughput [9].
A team that generates 40% more code per sprint but accumulates twice as many bugs is not more productive: it is simply shifting costs over time and making them less visible. Data from Uplevel, published in September 2024 [10], showed a provocative result: virtually no net productivity gain despite an increase in pull request volume, accompanied by longer review times and more merge conflicts.
Although that research refers to technologies that are now two years old, and despite other studies showing positive results in specific contexts, it seems fair to say that “more generated code” does not automatically mean “more value produced”.
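The difference between output and outcome in the "40% more code, twice as many bugs" scenario can be made concrete with a rough model. The story-point figures and the rework cost per bug are hypothetical assumptions; the point of the sketch is that the verdict depends on a downstream cost that throughput metrics never record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sprint:
    story_points: int  # output: work shipped during the sprint
    bugs: int          # defects later traced back to that work


def net_points(sprint: Sprint, rework_points_per_bug: int) -> int:
    """Outcome proxy: shipped work minus the rework it causes later."""
    return sprint.story_points - sprint.bugs * rework_points_per_bug


# Hypothetical figures: +40% output and twice as many bugs with AI assistance.
baseline = Sprint(story_points=50, bugs=5)
ai_assisted = Sprint(story_points=70, bugs=10)

if __name__ == "__main__":
    for cost in (2, 4, 6):
        print(cost, net_points(baseline, cost), net_points(ai_assisted, cost))
```

In this toy model the AI-assisted team is ahead when a bug costs 2 points of rework, level at 4 and behind at 6. Sprint velocity alone would report a 40% gain in all three cases.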
The “review tax” as hidden cost
Cost and time shift from the developer who produces the code to the reviewer who examines it. In many cases, reviewing AI-generated code is more demanding than reviewing code written by an experienced developer, both because of the amount of code and because it requires checking aspects — semantic correctness, architectural consistency, edge case handling — that in traditional code were implicitly guaranteed by the developer’s understanding. The Lightrun report [2] quantifies this in terms of increased redeployment cycles and debugging time in production: real costs that do not appear in sprint velocity metrics.
Speed
An interesting data point comes from METR (Model Evaluation & Threat Research), which conducted a study on experienced open-source developers in early 2025.
Contrary to expectations, AI-assisted developers were not faster on tasks of real-world complexity: on average, they took about 19% longer when AI tools were allowed [11]. The productivity advantage of AI may therefore be concentrated in simple, repetitive tasks, exactly the ones that matter least for creating engineering value. The sample, made up of experienced open-source developers, may not be representative, but the finding is enough to justify skepticism toward broad generalizations about productivity.
For completeness, it should be noted that another study by the same organization [12], published a few months later, reports more optimistic estimates following the spread and evolution of AI coding tools. At the same time, it also highlights the increased difficulty of obtaining meaningful data due to a “selection effect”: the developers involved chose not to include in the experiment the tasks they would no longer want to complete without AI. I find this aspect quite significant, and I discuss it in the second article, “Conscious AI Coding – Skills, cognitive load and epistemic fragility”.
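The way such a selection effect can distort a measurement is easy to show with invented numbers. The task durations and the exclusion threshold below are purely hypothetical and are not METR data; the sketch only shows the mechanism by which excluding the tasks where AI helps most pulls the measured speedup down.

```python
# Hypothetical tasks: (hours_without_ai, hours_with_ai)
TASKS = [
    (4.0, 1.0),
    (6.0, 2.0),
    (5.0, 5.0),
    (8.0, 9.0),
]


def measured_speedup(tasks: list) -> float:
    """Total time without AI divided by total time with AI (>1 means faster)."""
    total_without = sum(without for without, _ in tasks)
    total_with = sum(with_ai for _, with_ai in tasks)
    return total_without / total_with


def drop_ai_dependent(tasks: list, ratio: float) -> list:
    """Remove tasks where AI cuts the time to `ratio` of the original or less.

    Stands in for developers declining to do those tasks without AI.
    """
    return [(w, a) for w, a in tasks if a > ratio * w]


if __name__ == "__main__":
    print(round(measured_speedup(TASKS), 2))
    print(round(measured_speedup(drop_ai_dependent(TASKS, 0.5)), 2))
```

On the full task set the speedup is above 1, while on the filtered set it falls below 1: the same tool appears to slow developers down simply because the tasks that benefit most never enter the experiment.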
Conclusions
The adoption of AI coding tools is profoundly changing the way software is designed, written and maintained. The benefits in terms of speed and automation are real, but they risk leading to dangerous fragilities when they are assessed solely through throughput metrics or apparent productivity.
Code quality, application security, architectural consistency and workflow sustainability cannot be considered automatic consequences of the acceleration introduced by LLMs. On the contrary, the analysis shows that the faster code generation becomes, the more central governance practices, critical review and conscious design become.
AI does not remove the need to reason about trade-offs. In fact, it risks masking them and making them less visible, driven by the promise of short-term speed. For this reason, introducing “controlled friction” into development processes is not a step backward, but a way to protect technical quality and system stability.
In my second article, I will analyze the less immediate but potentially deeper impacts of AI-assisted development: the cognitive load on teams, the erosion of engineering skills and the epistemic implications of progressively delegating reasoning to AI systems.
References
[1] Forbes, “Automation Bias: What It Is And How To Overcome It” (Mar. 2024) https://www.forbes.com/sites/brycehoffman/2024/03/10/automation-bias-what-it-is-and-how-to-overcome-it/
[2] Lightrun, “State of AI-Powered Engineering 2026” (Apr. 2026) https://www.linkedin.com/posts/lightruntech_weve-just-released-our-2026-state-of-ai-powered-activity-7449817705118887936-ebJj
[3] Business Insider, “Amazon Tightens Code Guardrails After Outages Rock Retail Business” (Mar. 2026) https://www.businessinsider.com/amazon-tightens-code-controls-after-outages-including-one-ai-2026-3
[4] Cloud Security Alliance, “Vibe Coding’s Security Debt: The AI-Generated CVE Surge” (Apr. 2026) https://labs.cloudsecurityalliance.org/research/csa-research-note-ai-generated-code-vulnerability-surge-2026/
[5] Grafa, “Moonwell loses $1.78M in oracle mishap” (Feb. 2026) https://grafa.com/en/news/crypto/moonwell-1-78m-exploit-ai-vibe-coding
[6] Moonwell Forum, “MIP-X43 cbETH Oracle Incident Summary – Announcements & Updates” (Feb. 2026) https://forum.moonwell.fi/t/mip-x43-cbeth-oracle-incident-summary/2068
[7] Pashov, “Claude Opus 4.6 wrote vulnerable code” (Feb. 2026) https://x.com/pashov/status/2023872510077616223
[8] Forbes, “Vibe coding will break your company” (Apr. 2026) https://www.forbes.com/sites/jasonwingard/2026/04/23/vibe-coding-will-break-your-company/
[9] IBM, “Top 5 tips for measuring the productivity of gen AI in an enterprise” https://www.ibm.com/think/insights/top-5-tips-measuring-productivity-gen-AI-enterprise
[10] Uplevel, “Does GenAI Improve Software Developer Productivity?” (Sep. 2024) https://uplevelteam.com/blog/genai-developers
[11] METR, “Measuring Early-2025 AI on Experienced OSS Developers Productivity” (Jul. 2025) https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
[12] METR, “Wider adoption of AI has made it more difficult to measure task-level productivity” (Feb. 2026) https://metr.org/blog/2026-02-24-uplift-update/