An AI agent destroyed a programmer’s entire database — and he’s not the only one with a horror story
Artificial intelligence in software development is nothing new at this point, but the accidents it can cause when used without proper care are showing up at a frequency that’s honestly alarming.
And we’re not talking about simple bugs or easy-to-fix errors.
We’re talking about production databases wiped clean, critical systems brought down, and years of work put at risk in a matter of minutes — all because AI agents were given too much autonomy without the right guardrails in place.
Engineer Alexey Grigorev’s case has become a symbol of this problem. While using Claude Code, Anthropic’s popular tool that helps developers write and execute code, to update a website, he watched the agent start tearing through the production environment: networking, services, and most critically, the database containing years of course data. A missing configuration on a new laptop was all it took for the agent to confuse what was real with what could be deleted. The data was ultimately recovered with help from AWS support, but the lesson stuck — and it applies to a lot more people than just him. 🚨
Because the problem is far from isolated. From Amazon to startups, across companies of every size, a combination of pressure for productivity, overconfidence in tools, and a lack of human review is creating an increasingly risky landscape in AI-assisted development. And the numbers coming out are hard to ignore.
What actually happened to Grigorev’s database
To understand the full scope of the problem, it’s worth going beyond the summary. Grigorev was performing a seemingly straightforward task: updating website configurations using Claude Code, an artificial intelligence agent capable of executing commands directly in the terminal, accessing files, and interacting with cloud services autonomously. This type of tool represents a massive leap in development process automation, because it allows repetitive and technical tasks to be delegated to AI with far less human intervention than before. The problem is precisely that autonomy: when it isn’t surrounded by well-defined boundaries, a single misread context can turn into real damage very quickly.
What triggered the incident was a missing configuration on the new laptop Grigorev was using. Without the correct environment variables pointing to test or staging environments, the agent interpreted the production environment as the legitimate target for its operations. And it went right ahead. It deleted networking records, took down services, and at the most critical moment, began destroying the database that stored years of course information — a digital asset of immeasurable value for any educational platform. All of this happened in minutes, with no clear warning, no mandatory confirmation step, and no automatic rollback mechanism activated before execution.
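The failure mode described above, a missing setting silently resolving to production, can be guarded against by failing closed. The following is a minimal sketch, not a reconstruction of Grigorev’s setup: the variable name `DEPLOY_ENV`, the allowed set, and the function name are all hypothetical, chosen only for illustration.

```python
from typing import Mapping

# Hypothetical names: DEPLOY_ENV and this allowed set are illustrative only.
ALLOWED_AUTOMATED_ENVS = {"dev", "staging"}


class UnsafeTargetError(RuntimeError):
    """Raised when an automated run has no safe, explicit target."""


def resolve_target_env(env: Mapping[str, str]) -> str:
    """Return the target environment, failing closed on anything ambiguous."""
    target = env.get("DEPLOY_ENV", "").strip().lower()
    if not target:
        # An unconfigured machine stops here instead of defaulting to production.
        raise UnsafeTargetError("DEPLOY_ENV is not set; refusing to guess a target")
    if target not in ALLOWED_AUTOMATED_ENVS:
        # Production (or any unknown name) is never reachable from an automated run.
        raise UnsafeTargetError(f"automated runs may not target {target!r}")
    return target
```

In practice a wrapper script would call `resolve_target_env(os.environ)` before handing control to any agent or automation, so a missing variable aborts the run rather than choosing a target implicitly.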
Grigorev himself acknowledged that he had leaned too heavily on the AI agent and that by allowing it to make and execute all changes end to end, he had effectively removed the safety checks that would have prevented the data from being deleted.
Data recovery was only possible thanks to AWS support, which managed to restore previous database snapshots. But that happy ending isn’t guaranteed in every case. Grigorev told Fortune that AI assistants are great and save a lot of time, but that he hopes people learn from his mistakes and build safeguards into their workflows. 💡
It’s worth noting that Claude Code has settings that let users control when and how often the agent needs to ask for authorization before executing actions. It’s possible to specify that certain operations require explicit permission. But many developers prefer to let the agent make decisions more autonomously, partly because it saves time. As of the original report’s publication, Anthropic had not responded to a request for comment on the incident.
Amazon also ran into problems with AI-generated code
If Grigorev’s case were an isolated incident, maybe it could be written off as a curiosity. But the reality is that even the world’s largest tech companies are dealing with similar situations. The week before Fortune’s original report was published, Amazon called a deep-dive review meeting after a series of outages hit its website and app. According to reports from outlets including the Financial Times and CNBC, at least one of the system failures involved AI-assisted changes.
An Amazon spokesperson told Fortune that the meeting was part of the company’s regular weekly operations. The company also publicly stated that only one of the incidents involved AI, and that the actual root cause wasn’t related to artificial intelligence itself — the issue was that Amazon’s systems allowed a human engineering error to have a broader impact than it should have.
However, internal Amazon documents viewed by both CNBC and the Financial Times originally cited generative AI-assisted changes as a factor in a trend of incidents. The reference to AI’s role in the outages was later removed from the document before the meeting, according to CNBC. According to the Financial Times, an outage in Amazon Web Services in December occurred after engineers allowed Kiro, Amazon’s own AI coding tool, to make changes — something the company later classified as user error.
This kind of situation is particularly telling because it shows that even organizations with virtually unlimited engineering and infrastructure resources aren’t immune to the risks of giving AI agents too much autonomy in production environments. If Amazon faces this kind of problem, imagine what can happen at smaller companies with fewer layers of review and less ability to recover after a serious incident.
Over-reliance on AI tools is changing software engineering
Across the industry, engineers report that dependence on AI assistants to write and deploy code is rapidly changing the nature of software development work — and introducing new risks that few were prepared to face.
An Amazon engineer, who asked not to be identified, told Fortune that people are becoming so dependent on AI that they essentially stop reviewing code altogether. According to the engineer, technically qualified professionals are nominally shifting from active coding into more of a review role, with AI handling much of the actual implementation. While these tools allow faster feature delivery, they also create what some call production noise: code that ships quickly but isn’t always necessary or fully tested. In some cases, that code can even affect critical systems.
David Loker, VP of AI at CodeRabbit, explained that the consequences aren’t always as visible as a service outage. He described one occasion when an AI assistant generated code that looked perfectly valid but was built on incorrect assumptions about the underlying system. The code could have passed a quick review, but it would have taken down the production database if it had been deployed.
There’s another worrying side effect, too. Because AI-assisted coding lowers the technical knowledge needed to perform certain development tasks, engineers report that companies are outsourcing work normally done by senior professionals to juniors or less technical employees — only to discover that the low-quality output generates more work than savings.
A London-based engineer who works at an enterprise software company and asked for anonymity described the situation this way: a lot of what was built was poor quality, broke frequently, and ended up being more of a burden than a benefit. The time saved by putting less experienced people on the code was wiped out by the need to pay someone much more expensive — a senior or principal engineer — to go in and fix things when everything broke.
The fix-it tax falls on the most experienced
Broader data suggests that the burden of reviewing and fixing AI-assisted work is falling disproportionately on the most experienced engineers. While senior professionals have the skills to spot logic errors or security flaws that a junior might miss — which lets them ship faster — they’re also paying a growing fix-it tax.
A July 2025 survey from Fastly found that senior engineers ship nearly 2.5 times more AI-generated code than juniors, precisely because they’re better at catching errors before they pile up. But nearly 30% of seniors said that fixing AI output consumed most of the time they had saved, compared to 17% of junior developers. Juniors often feel they’ve gained bigger productivity boosts because they can’t yet see the full extent of the technical debt or latent vulnerabilities their AI-assisted changes are quietly adding to the system. 🔍
The productivity paradox and C-suite FOMO
Part of the problem comes from the top. Engineers at leading AI labs have been publicizing productivity bursts that would have seemed implausible just a few years ago — and larger organizations across industries want to replicate those gains at any cost.
For example, Boris Cherny, head of Claude Code at Anthropic, has said he hasn’t written a single line of code in months, relying entirely on the company’s AI model to generate everything. Across Anthropic as a whole, the company told Fortune that between 70% and 90% of all its code is now AI-generated. At Spotify, co-CEO Gustav Söderström revealed that the company’s best developers hadn’t written a single line of code since December and that over 50 new features were shipped in 2025 using AI-assisted workflows.
But as Amazon’s recent issues demonstrate, the most visible productivity gains at AI labs and nimble startups can be much harder to replicate at large companies with legacy systems and complex codebases. Where smaller teams can move fast and absorb mistakes, companies like Amazon operate infrastructure where a single bad deployment can affect millions of customers.
A September report from Bain & Company concluded that while programming was one of the first areas to adopt generative AI, the actual savings were modest and the results didn’t match the hype. Meanwhile, a survey from security firm Apiiro showed that developers using AI introduced roughly ten times more security issues than those who didn’t.
AI models make subtle mistakes that amplify at scale
As AI researcher Andrej Karpathy has observed, AI models can make subtle conceptual errors, over-complicate code, and leave unused code behind — problems that are manageable in a controlled environment but much harder to spot and fix at scale. A December report from CodeRabbit, which analyzed 470 open-source pull requests on GitHub, found that AI-written code contained roughly 1.7 times more issues overall than human-written code.
Larger organizations tend to have more stakeholders, more review layers, and more dependencies — an environment where AI-generated code has a greater chance of introducing unexpected failures.
As Loker explained, it’s going to take longer for large organizations like AWS or Nvidia to implement this safely because they have so much legacy code. There’s less documentation, less searchability for the AI to orient itself, and so it’s harder to find the right context. The inevitable result is the introduction of problems.
Success metrics are only telling half the story
Another key point raised in the original report is how companies themselves measure the success of AI coding. According to Loker, it’s very easy to measure the raw productivity increase. What’s not easy to measure is the causality of what happens afterward. The metrics traditionally used to evaluate developer productivity — features shipped, code committed — look strong when AI is involved, but they don’t capture downstream consequences like bugs, rollbacks, or time spent cleaning up the mess.
There’s also the question of benchmarks used to measure AI coding ability. A recent study by METR, an AI evaluation organization, found that half of AI coding solutions that received a passing grade on a prominent industry test — which is itself evaluated by an AI model — would have been rejected by human reviewers for inadequate quality.
Toby Ord, senior research fellow at the Oxford Martin AI Governance Initiative, said that current estimates of AI coding capability are overstated, perhaps by a significant factor.
Technical debt is piling up at an unprecedented pace
Companies adopting AI at scale also risk accumulating what engineers call technical debt: code that works in the short term but becomes increasingly expensive to maintain over time. Loker put it bluntly: technical debt is being produced with AI at a rate he can’t precisely quantify, though his rough estimate is that it’s three to four times higher than it used to be.
This is perhaps the quietest and most dangerous risk of all. While a service outage is visible, immediate, and demands urgent action, technical debt accumulates slowly, without alarms, until the system becomes so fragile that any small change can trigger a cascade of failures. For large companies, this kind of silent degradation can represent a cost that far exceeds the initial savings gained from adopting AI tools.
Data security isn’t optional when AI enters the picture
One of the most important points the Grigorev case puts on the table is the issue of data security in environments where AI agents have real operational access. Traditionally, security in software development is framed in terms of protection against external attacks: breaches, leaks, and exploitation of vulnerabilities by bad actors. But an AI agent with broad permissions represents a completely different risk vector: it doesn’t need to be hacked to cause damage. It can cause damage while working exactly as designed, simply by executing the tasks it was instructed to perform in a context it didn’t interpret correctly.
Basic security practices that should be non-negotiable include:
- Strict separation between development, staging, and production environments, with explicit environment variables verified before any automated execution
- Applying the principle of least privilege to any AI agent with access to critical resources
- Automatic and frequent backups, with regular restoration testing
- Mandatory human confirmation steps before destructive actions like data deletion or infrastructure modification
- Clear, accessible documentation of what each agent is authorized to do — and what it is not
These aren’t new measures — they’re good engineering practices that have existed for decades but need to be reaffirmed and adapted for the context of new autonomous agents. Data security in an AI environment isn’t an add-on to the development strategy: it’s a fundamental part of it. 🔐
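One way to picture the mandatory confirmation step from the list above is a thin wrapper that sits between an agent and the shell. This is a rough sketch under stated assumptions: the function names and the pattern list are hypothetical, and it does not represent how Claude Code or any other tool implements its own permission prompts.

```python
import re
from typing import Callable

# Illustrative patterns only; a real list would be tailored to the stack in use.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
    re.compile(r"\bterraform\s+destroy\b", re.IGNORECASE),
    re.compile(r"\brm\s+-rf\b", re.IGNORECASE),
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any known destructive pattern."""
    return any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS)


def run_guarded(
    command: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run `command` through `execute`; destructive commands first need
    `confirm` to return True. Returns whether the command was executed."""
    if is_destructive(command) and not confirm(command):
        return False
    execute(command)
    return True
```

Here `confirm` would typically prompt a human, and `execute` would run the command. Pattern matching like this is easy to bypass, so it complements rather than replaces least-privilege credentials: an agent that holds no permission to delete a production database cannot delete it, whatever command it generates.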
What today’s mistakes are teaching us about tomorrow
As concerning as incidents like Grigorev’s and the Amazon outages are, they also play an important role: they’re helping to build, in real time, a set of lessons the industry needed to learn one way or another. The discussion that erupted after these cases went public sparked a rich debate about the responsibilities of those who build AI tools, those who adopt them, and those who set usage policies within organizations.
Among the key lessons emerging from these situations, a few stand out clearly:
- Autonomy and accountability need to go hand in hand — the more independent an AI agent is, the more robust the supervision and rollback system around it needs to be
- Environment configuration is just as critical as the code itself — a missing environment variable can be just as destructive as a serious bug in the system
- Communication within teams about what AI agents are authorized to do needs to be explicit, documented, and reviewed regularly
- Productivity metrics need to include downstream quality indicators like bug rates, rollbacks, and time spent on fixes
- Senior engineers can’t be reduced to reviewers of AI-generated code — their experience needs to inform the governance of how these tools are used, not just the correction of their mistakes
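The point about downstream quality indicators can be made concrete with a small record that tracks cleanup costs next to raw output. This is only a sketch: the field names are hypothetical, and any figures plugged into it would come from a team’s own tracking, not from the surveys cited in this article.

```python
from dataclasses import dataclass


@dataclass
class DeliveryStats:
    """Per-period delivery figures for a team (all fields are illustrative)."""

    changes_shipped: int
    rollbacks: int
    bugs_traced_to_changes: int
    hours_saved_writing: float
    hours_spent_fixing: float

    @property
    def rollback_rate(self) -> float:
        """Fraction of shipped changes that had to be rolled back."""
        return self.rollbacks / self.changes_shipped if self.changes_shipped else 0.0

    @property
    def bugs_per_change(self) -> float:
        """Average number of bugs later traced back to each shipped change."""
        if not self.changes_shipped:
            return 0.0
        return self.bugs_traced_to_changes / self.changes_shipped

    @property
    def net_hours_saved(self) -> float:
        """Time saved writing code minus time spent fixing it afterwards."""
        return self.hours_saved_writing - self.hours_spent_fixing
```

A dashboard that reports only `changes_shipped` will look strong whenever AI is involved; reporting `rollback_rate` and `net_hours_saved` alongside it shows how much of that gain survives the cleanup.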
Looking ahead, what the market will demand — and what the best teams are already building — is a mature approach to using artificial intelligence in software development. An approach that celebrates the real productivity gains these tools offer while treating the risks with the seriousness they deserve. That means investing in training, processes, testing, and a culture where admitting something went wrong is the starting point for improvement — not a reason for shame.
AI will keep evolving, tools will get more powerful, and the temptation to give them more autonomy will only grow. The question isn’t whether more incidents will happen, but whether teams will be prepared to contain them before the damage becomes irreversible. 🚀
