Amazon shuts down internal AI leaderboard after employees gamed scores and drove up costs

Amazon just pulled the plug on an internal experiment that made perfect sense in theory but went completely off the rails in practice. The saga involved a gamified leaderboard, employees competing for points, and a cloud computing bill that kept climbing with no productive return. The way the story ended carries valuable lessons for any company trying to encourage artificial intelligence adoption among its teams.

Kirorank was a leaderboard created by employees at the company to measure the use of artificial intelligence tools within the Kiro platform, which is geared toward developers. The idea was simple: the more you used the AI, the more points you earned and the higher you climbed on the leaderboard. Sounds like a fun way to drive technology adoption, right?

The problem is that people found a shortcut, and that shortcut started costing the company real money. 💸 Instead of using AI to solve actual problems, some employees began triggering autonomous agents to execute unnecessary tasks just to inflate their token consumption and climb the rankings. The practice even earned its own internal nickname: tokenmaxxing. The result was a direct financial hit and a lesson that extends far beyond Amazon headquarters: poorly designed metrics can become traps inside any organization. 🎯

What was Kiro and how did the leaderboard work

Kiro is an AI-assisted development platform launched by Amazon with the goal of making programmers more efficient and productive. The tool integrates AI agents that can autonomously execute complex tasks like reviewing code, suggesting improvements, generating documentation, and even interacting with other systems, all without the developer needing to step in at every stage of the process. At its core, it is a smart copilot for developers' day-to-day work.

Kirorank emerged as an internal initiative created by employees themselves to make using the tool more engaging. The logic was to gamify AI adoption: each interaction with the agents generated points, and those points determined each person’s position on the leaderboard. Whoever sat at the top of the rankings demonstrated, at least in theory, that they were making the most of the platform’s potential. It was a proposal that blended healthy competition with incentives for innovation — two ingredients that, when combined, usually work well in corporate tech environments.

According to Amazon itself, the beta dashboard was not an official or formally approved tool. It was built by a group of employees who wanted to boost awareness about how AI can accelerate work. Despite the good intentions, the outcome veered far from the plan.

What nobody anticipated is that gamification also creates a parallel incentive: winning at all costs. And when winning means generating more tokens, regardless of how that happens, the system starts to corrode from the inside. Some employees realized they did not need to use AI to solve real problems — they just had to trigger agents repeatedly, on artificial or purposeless tasks, to rack up points and climb the rankings. The game turned into a farce, and Amazon only caught on when the costs started showing up on the bills. 😬

How Amazon leadership responded

Dave Treadwell, senior vice president at Amazon, told employees earlier this week that the leaderboard had been built with good intentions. However, he made it clear that the outcome was the opposite of what was intended: additional costs driven by employees who were artificially inflating their AI token consumption.

Treadwell’s message was straightforward and blunt. He explicitly asked employees not to use artificial intelligence just for the sake of using it. The takeaway was clear: AI needs to serve a real purpose, not function as a tool for gaming internal metrics. This directive reflects a significant shift in the corporate conversation around AI adoption — moving from unbridled enthusiasm to a more pragmatic, results-oriented approach.

Treadwell also instructed teams not to focus on token consumption as a measure of success. Instead, he directed employees to concentrate on building better products. This distinction between quantitative use and qualitative use of AI is critical and shows that company leadership is recalibrating its expectations around how to measure the real impact of technology.

Tokens cost money — and a lot of it

To understand how big this problem really was, it helps to know how artificial intelligence models work under the hood. Every time an AI agent processes a request — whether reading text, generating a response, or executing a task — it consumes tokens. Tokens are, put simply, chunks of text that the model reads and produces. The more complex the task, the more tokens are used. And each token carries a financial cost, especially when we are talking about advanced models running on cloud infrastructure at scale.

The situation gets even trickier when you consider that Amazon makes extensive use of AI models from Anthropic. AI labs like Anthropic have recently been shifting toward consumption-based pricing models, moving away from flat monthly fees. This change has significantly increased costs for some clients. That means every token wasted by an employee gaming the Kirorank leaderboard represented a real and growing cost for the company.

In Amazon’s corporate context, where hundreds or even thousands of employees have access to the Kiro platform, that per-token cost multiplies at an alarming rate. When people started using the agents artificially, triggering unnecessary tasks over and over just to climb the leaderboard, token consumption skyrocketed with no real value in return. There was no improved code, no useful documentation, no actual problems solved. It was pure wasted processing, and it translated directly into financial losses for the company.
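To make that multiplication concrete, here is a minimal Python sketch of how padded agent runs add up. Every number in it is a hypothetical assumption. That includes the per-million-token prices, the tokens per run, the headcount, and the number of pointless runs; none are Amazon's or Anthropic's actual figures. The cost of one run is modeled as input tokens times an input price plus output tokens times an output price, because providers typically price the two separately.

```python
# Hypothetical prices in US dollars per million tokens; illustrative only,
# not actual Amazon or Anthropic rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one agent run under the hypothetical prices above."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


def padded_monthly_cost(employees: int, padded_runs_per_day: int,
                        workdays: int, input_tokens: int,
                        output_tokens: int) -> float:
    """Total cost of agent runs fired only to climb a leaderboard."""
    runs = employees * padded_runs_per_day * workdays
    return runs * run_cost(input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical scenario: 1,000 people each fire 20 pointless runs a day
    # for 22 workdays; each run reads 200k tokens and writes 20k tokens.
    per_run = run_cost(200_000, 20_000)
    total = padded_monthly_cost(1_000, 20, 22, 200_000, 20_000)
    print(f"per run: ${per_run:.2f}")    # per run: $0.90
    print(f"per month: ${total:,.0f}")   # per month: $396,000
```

Because the total is a plain product of headcount, runs, days, and per-run cost, doubling any one factor doubles the bill. A less than a dollar per run can still turn into a six-figure monthly line item once a leaderboard rewards volume.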

It is worth noting that Amazon has an estimated capital expenditure budget of about 200 billion dollars for this year, the vast majority of it earmarked for AI and data center infrastructure. At the same time, the cloud giant has been carrying out large-scale layoffs specifically to cut costs and help fund these massive investments in artificial intelligence. Wasting computational resources on artificial token usage runs directly counter to that strategy. 📉

A problem that is not unique to Amazon

The Kirorank saga is not an isolated incident. Employees at Meta were also caught trying to boost their positions on internal leaderboards by artificially inflating token consumption. This suggests the problem is systemic across major tech companies that are pushing their teams to adopt AI quickly.

In Amazon’s case, the pressure was explicit. The company had set a target of more than 80 percent of developers using AI on a weekly basis. With such an aggressive goal, it is hardly surprising that some employees looked for shortcuts to show they were on board, even if that meant generating artificial activity with no productive value.

Beyond Kiro, the Financial Times report revealed that Amazon employees were also using MeshClaw, an internal version of the popular tool OpenClaw, which allows users to run AI agents on their own hardware. Some employees used this software to generate additional AI activity specifically to increase token consumption and demonstrate technology adoption. The behavior was deliberate and calculated.

The episode exposes a vulnerability that goes beyond Amazon and that every company integrating AI into its workflows needs to take seriously. Metrics based on usage volume, with no assessment of the quality or real impact of that usage, create a perverse incentive. The real performance of an AI tool is measured not by the number of tokens consumed but by the value it generates. That seemingly obvious distinction was the blind spot that brought down Kirorank.

The new metric: normalized deployments

With Kirorank shut down, Amazon has already started adopting a different approach to measuring the success of its AI tools. The company has moved to a metric called normalized deployments, which evaluates evidence of engineers using AI regularly to create useful, functional code. Instead of simply counting tokens, this new metric aims to capture the real impact of technology on the workflow.

This shift is significant because it represents an evolution in how major tech companies think about AI adoption. It is not enough to measure whether a tool is being used — you need to measure whether it is generating value. Normalized deployments assess whether code produced with AI assistance is actually being deployed to production, which is a far more reliable indicator that the technology is working as intended.

The transition from consumption metrics to outcome metrics is an important step, but it is also harder to implement. Measuring tokens is simple — just add up numbers. Measuring value is complex and requires contextual analysis. Still, it is the right path to ensure that AI adoption does not turn into corporate theater.
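The gap between counting consumption and counting outcomes can be shown with a small, hypothetical Python sketch. It does not reproduce Amazon's normalized-deployments metric, whose exact definition is not described here. As a stand-in, it scores each engineer by AI-assisted changes that reached production per active week. The `AgentRun` record, both ranking functions, and the sample data are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRun:
    engineer: str
    week: int
    tokens: int
    deployed: bool  # did the AI-assisted change reach production?


def token_leaderboard(runs: list[AgentRun]) -> list[str]:
    """Volume metric: rank engineers by total tokens consumed."""
    totals: dict[str, int] = defaultdict(int)
    for r in runs:
        totals[r.engineer] += r.tokens
    return sorted(totals, key=lambda e: totals[e], reverse=True)


def deployment_leaderboard(runs: list[AgentRun]) -> list[str]:
    """Outcome stand-in: deployed AI-assisted changes per active week."""
    deployed: dict[str, int] = defaultdict(int)
    weeks: dict[str, set[int]] = defaultdict(set)
    for r in runs:
        weeks[r.engineer].add(r.week)
        if r.deployed:
            deployed[r.engineer] += 1
    score = {e: deployed[e] / len(weeks[e]) for e in weeks}
    return sorted(score, key=lambda e: score[e], reverse=True)


# Hypothetical data: one engineer fires many large runs that never ship,
# another uses far fewer tokens but gets changes into production.
runs = [AgentRun("token_padder", w, 500_000, False)
        for w in (1, 2) for _ in range(10)]
runs += [
    AgentRun("shipper", 1, 40_000, True),
    AgentRun("shipper", 1, 30_000, True),
    AgentRun("shipper", 2, 50_000, True),
]

print(token_leaderboard(runs))       # ['token_padder', 'shipper']
print(deployment_leaderboard(runs))  # ['shipper', 'token_padder']
```

The same activity produces opposite rankings. The volume metric rewards whoever burns the most tokens, while the outcome metric rewards shipped work. Shipped work is harder, though not impossible, to inflate with pointless runs.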

What this case reveals about metrics and AI performance

The Kirorank story is a textbook example of what management experts call Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. This happens because people naturally adjust their behavior to hit the number, not necessarily the objective behind it. In Kirorank’s case, the goal was to increase productive use of artificial intelligence. The metric chosen was token consumption. That is exactly where the plan fell apart, because the two were not equivalent.

Performance in AI systems is notoriously difficult to measure fairly and accurately. It is not enough to count how many times an agent was triggered or how many tokens were consumed. You need to evaluate whether the end result was useful, whether it saved time, whether it improved work quality, or whether it solved a real problem. That kind of assessment is far more complex to automate and turn into a scoreboard, but it is the only one that truly captures the value technology delivers. Tools like Kiro have enormous potential, but that potential is only realized when usage is driven by genuine need, not empty competition.

For Amazon, shutting down Kirorank is also an opportunity to rethink how it will incentivize AI adoption internally going forward. The company is at the forefront of artificial intelligence development, with billions in investments in models, infrastructure, and tools like Kiro itself. But leading in technology also requires leading in how that technology is managed and evaluated in-house. The episode serves as a reminder that even the most advanced companies in the world sometimes need to learn — occasionally in the most expensive way possible — that innovation and poorly aligned gamification can produce results very different from what was expected. 🤔

The race for AI adoption and the risks of losing your way

Amazon’s case comes at a time when virtually every major tech company is in a frantic race to integrate artificial intelligence into every aspect of its operations. This pressure comes from the top (boards, investors, the market) and cascades down to operational teams, who need to demonstrate they are using the tools available to them.

The risk with this kind of pressure is creating a culture of performative adoption, where what matters is looking like you are using AI, not necessarily using it intelligently. When targets like having more than 80 percent of developers using AI weekly are set without clear qualitative criteria, the incentive to game metrics naturally emerges. This is not about bad faith from employees — it is a predictable consequence of a poorly designed incentive system.

Companies on this journey of digital transformation and AI adoption need to balance urgency with wisdom. That means creating evaluation frameworks that prioritize impact over volume, quality over quantity, and real results over the appearance of progress. Amazon’s experience with Kirorank, and Meta’s with similar situations, shows that ignoring this balance can get very expensive, both financially and in terms of organizational culture.

At the end of the day, what Kirorank leaves behind is not just a lesson about wasted tokens or a manipulated leaderboard. It is a deeper reflection on how we measure the value of artificial intelligence in the workplace. Real performance does not show up on a scoreboard — it shows up in the concrete results that technology helps build. And when the chase for points replaces the pursuit of solutions, everybody loses. Especially whoever is footing the bill. 💡

Rafael

Operations

I transform internal processes into delivery machines — ensuring that every Viral Method client receives premium service and real results.
