Amazon faces wave of Gen-AI-related failures and mobilizes engineering in mandatory meeting
Amazon is going through a tough moment behind the scenes of its tech operations. A string of serious operational failures in recent months has set off an internal alarm that few expected to see at a company of this scale. According to an internal memo obtained by the Financial Times, several of these incidents share one concerning common thread: code changes made with the help of Gen-AI tools. The document describes what the company itself classified as a trend of high-blast-radius incidents, tied to the use of generative artificial intelligence in engineering workflows without fully established practices and safeguards.
The most critical episode happened in March 2026, when both the Amazon website and app went completely down for nearly six hours. During that window, millions of customers around the world were unable to complete purchases, check prices, or even access basic account information. The company itself acknowledged that the cause involved a faulty software code deployment. For an operation that moves billions of dollars daily, six hours of downtime translates into massive financial losses and a direct hit to consumer trust. And this was not an isolated event — it was the most visible point in a series of outages that had already been happening at a concerning frequency.
The severity of the situation led Amazon to make an unusual move within the company culture. Dave Treadwell, a senior vice president at the company and a former Microsoft engineering executive, called a broad engineering meeting with mandatory attendance for all teams involved. This detail matters because, historically, the weekly gathering known internally as This Week in Stores Tech, or TWiST, had always been optional. Making attendance mandatory signals that the technical leadership understood the problem is not isolated — it is systemic and needs immediate attention from the entire organization.
In an email sent to employees and seen by the Financial Times, Treadwell was blunt: the availability of the site and related infrastructure has not been good recently. The tone of the message made it clear that the meeting would not just be informational, but rather a deep dive into the problems that brought the company to this point, along with a discussion of short-term initiatives to limit future failures.
The role of generative artificial intelligence in the incidents
Adopting AI tools in software development is nothing new. Tech companies around the globe have been incorporating generative AI-based code assistants to speed up deliveries, automate repetitive tasks, and boost engineer productivity. The promise is real: developers can produce more code in less time, receive contextual suggestions while they work, and focus on more complex problems while Gen-AI handles the more mechanical parts. The problem is that speed without proper governance can quickly turn into operational risk, and that is exactly what the incidents at Amazon are demonstrating in a very concrete way.
The internal memo does not blame generative artificial intelligence itself, and this is an important point to highlight. What the document points out is that these tools were being used without a mature framework for reviewing, validating, and approving AI-generated code. Among the contributing factors listed in the briefing, one appears explicitly: the use of GenAI for which best practices and safeguards are not yet fully established. In other words, engineers were using language model suggestions to implement changes in critical systems without enough layers of human verification before those changes reached the production environment.
When you are dealing with the infrastructure of one of the largest e-commerce platforms on the planet, any error that slips through can propagate at scale and cause disruptions affecting millions of people simultaneously. And that is precisely what happened.
Another aspect that makes this situation particularly challenging is the nature of errors generated by Gen-AI tools. Unlike traditional bugs that often follow recognizable patterns, code produced by language models can contain subtle flaws that pass through surface-level reviews without raising any red flags. The code can look syntactically correct, work fine in limited test scenarios, but fail in unexpected ways when exposed to real-world load and complexity. This demands a level of attention and expertise in the review process that is not always available when the priority is shipping fast.
The AWS case and the incident with the Kiro tool
The problems were not limited to Amazon’s e-commerce arm. Amazon Web Services, the group’s massive cloud computing division, also faced at least two incidents directly linked to the use of AI-based code assistants. The most emblematic episode happened in mid-December, when AWS engineers allowed an AI coding tool called Kiro to perform certain changes in an internal environment.
The result was alarming: the AI tool chose to delete and recreate the entire environment, causing a 13-hour outage on a cost calculator used by customers. Amazon said at the time that the incident was an extremely limited event, affecting only a single service in parts of mainland China. Regarding the second AWS incident, the company stated that no customer-facing service was impacted.
Even though each incident individually might seem contained, the pattern that forms when they are all analyzed together tells a different story. It is the repetition, the frequency, and the shared origin of these events that raise concern — and that led leadership to take more drastic measures.
New rules and Amazon’s path to contain the problem
Facing this scenario, Amazon started implementing a set of new internal rules that significantly changes the workflow for engineering teams. The most notable measure announced by Treadwell is the requirement that junior and mid-level engineers obtain explicit approval from senior professionals before applying any code changes made with AI tools. In practice, this creates an additional layer of qualified human review that acts as a filter before AI-assisted changes reach production systems.
It is an approach that balances the use of technology with the experience and critical judgment of people who deeply understand the architecture of the systems. More experienced engineers tend to identify risks that may not be obvious to someone early in their career or someone who trusts the output of a language model a little too much.
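A merge gate along the lines Treadwell described can be sketched in a few lines of code. Everything below is an illustrative assumption — the path list, reviewer names, and policy details are hypothetical, not Amazon's actual tooling:

```python
# Hypothetical sketch of a pre-merge gate: AI-assisted changes from
# non-senior authors, and any change touching critical paths, must be
# approved by a senior engineer before merging.

CRITICAL_PATHS = ("payments/", "checkout/", "infra/")   # illustrative
SENIOR_REVIEWERS = {"alice", "bob"}                      # illustrative

def requires_senior_approval(files_changed, ai_assisted, author_level):
    """Return True if this change needs a senior engineer's sign-off."""
    touches_critical = any(
        f.startswith(CRITICAL_PATHS) for f in files_changed
    )
    # AI-assisted changes from junior or mid-level authors always need
    # a senior gate; critical-path changes need one regardless of how
    # the code was produced.
    if ai_assisted and author_level != "senior":
        return True
    return touches_critical

def gate_passes(files_changed, ai_assisted, author_level, approvers):
    """Allow the merge only if any required senior approval is present."""
    if requires_senior_approval(files_changed, ai_assisted, author_level):
        return bool(SENIOR_REVIEWERS & set(approvers))
    return True
```

In a real pipeline, `ai_assisted` could come from a commit trailer or a pull-request label, and the check would run as a required status before merge — the point is simply that the policy is enforced by the tooling, not left to convention.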
This decision also reflects a maturation in how large companies are thinking about integrating Gen-AI into their development processes. It is not about abandoning the tools — that would be impractical and counterproductive given the real productivity gains they deliver. The central issue is establishing robust governance that keeps pace with the rate of technology adoption. Many organizations rushed to integrate AI-based code assistants over the past two years, driven by competitive pressure and the promise of doing more with less. The Amazon case serves as a very tangible reminder that the speed of adoption needs to be matched by maturity in quality control and risk management processes.
Amazon, for its part, tried to downplay the sense of crisis. The company said the site availability review is part of the normal course of business and that it pursues continuous improvements. Regarding the TWiST meeting, it described it as the regular weekly operational meeting with a specific group of retail technology leaders and teams, where the store’s operational performance is reviewed.
Staffing cuts also factor into the equation
There is another factor that cannot be ignored in this analysis. Amazon went through multiple rounds of layoffs in recent years, the most recent being the elimination of 16,000 corporate positions in January. According to engineers who spoke with the Financial Times, several business units started dealing with a higher number of incidents classified as Sev2 — occurrences that demand a rapid response to prevent product disruptions — on a daily basis, as a direct consequence of the staff cuts.
The company pushed back on the claim that headcount reductions were responsible for the increase in recent failures. But it is hard not to connect the dots: fewer engineers to review code, growing pressure for productivity, accelerated adoption of AI tools to compensate for smaller teams, and at the same time, fewer layers of human oversight to make sure everything works as expected. Each of these factors in isolation might not cause significant problems. Together, they create a fertile environment for failures to multiply.
This scenario also raises a broader reflection on the real role of generative AI in these organizations. When Gen-AI tools are adopted as a way to maintain productivity after staff cuts, the expectations placed on them increase considerably. They stop being a complement to human work and become, in many cases, a partial replacement. And when the technology is put in that position without the right controls in place, the risks amplify proportionally.
A discussion that goes well beyond Amazon
What is happening inside Amazon raises a question that the entire tech industry will need to face seriously in the coming months and years. The question is not whether companies should use Gen-AI in software development — that is already an established reality. The real question is how to ensure that the adoption of these tools happens in a safe, controlled manner with safeguards proportional to the risk involved. The failures Amazon experienced show that, without these safeguards, productivity gains can be quickly wiped out by operational, financial, and reputational losses.
For startups and smaller companies, which often operate with lean teams and fewer layers of review, this warning is even more relevant. If a company with the resources and technical sophistication of Amazon can be caught off guard by failures linked to AI-generated code, organizations with less quality control infrastructure are potentially even more exposed.
Several key points stand out from this discussion that any engineering team can consider:
- Define clear review policies for all code generated or assisted by generative artificial intelligence tools
- Require senior engineer approval for changes to critical systems, regardless of how the code was produced
- Invest in training so teams understand the limitations and pitfalls of language models applied to development
- Continuously monitor incident metrics to quickly identify whether adopting new tools correlates with an increase in failures
- Maintain adequate testing layers before any change reaches the production environment, especially when it involves AI-generated code
Building clear processes, defining who can approve critical changes, and investing in team readiness to use these tools with full awareness of their limitations are steps that should be on the priority list for any company integrating generative artificial intelligence into its engineering workflows.
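On the monitoring point specifically, even a minimal check can surface whether the adoption of a new tool correlates with more failures. A sketch, with hypothetical thresholds and data:

```python
# Illustrative sketch: compare weekly incident counts before and after
# a tooling change to flag a correlation worth investigating. The 1.5x
# threshold is an arbitrary assumption, not an industry standard.

def incident_rate_increased(before_counts, after_counts, threshold=1.5):
    """Flag if mean weekly incidents rose by more than `threshold` times."""
    before_avg = sum(before_counts) / len(before_counts)
    after_avg = sum(after_counts) / len(after_counts)
    return after_avg > before_avg * threshold
```

A flag from a check like this proves nothing by itself, but it tells a team where to start asking questions — which is exactly the kind of early signal the memo suggests was missing.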
The balance between innovation and operational safety
At the end of the day, generative AI technology is a powerful tool that is transforming the way software is built. But like any powerful tool, it needs to be used responsibly and within a context that minimizes risk. Amazon is learning this lesson in a very public and costly way, and the entire market has the opportunity to absorb these learnings before running into the same problems.
The balance between innovation and operational safety is not a brake on progress. On the contrary, it is exactly what allows innovation to be sustainable over the long term. Companies that understand this early and build solid governance frameworks around the use of Gen-AI in software development will be in a much more comfortable position than those that have to learn the hard way — with their systems down and their customers unable to complete a purchase.
