16/04/2026 · 10-minute read · By Rafael

Your Failed Company’s Data Is the New Fuel for Artificial Intelligence

Artificial intelligence is hungry for data, and it just found a pretty unusual source to feed on.

Imagine your company shuts its doors after years of operation. The emails exchanged in a rush before an important meeting, the Slack messages full of inside jokes, the project tickets that documented every win and every frustration your team went through — all of that disappears, right?

Wrong.

That digital trail now has an entirely new destination: it becomes fuel for training the next generation of AI. That is exactly what happened with cielo24, a transcription and captioning company that recently shut down. While working with SimpleClosure, a startup that helps companies wind down their operations, CEO Shanna Johnson discovered that 13 years of internal communications were worth hundreds of thousands of dollars to AI labs.

SimpleClosure handled all the usual shutdown paperwork: payroll, taxes, investor consents, and documentation with the IRS. But then came the part no entrepreneurship playbook teaches you: selling cielo24’s entire digital footprint — every Slack joke, every Jira ticket, emails documenting wins and frustrations stored in multi-terabyte drives — as training data for the next generation of AI.

Johnson told Forbes that the money from the sale took her from a scenario where she didn’t know how to pay the final bills to a situation where she was able to tie everything up neatly and move on. In her words, it is exciting to think that the company’s data can continue being useful and helping other people, even after closing.

This case isn’t an isolated episode — it’s a signal of an entirely new market emerging from the ashes of defunct companies. 🚀 And it raises some pretty important questions about privacy, anonymization, and what really happens to everything you produce during years of work.

Why the Public Internet Is No Longer Enough to Train AIs

To understand why internal company data has become so valuable, you need to look at what happened with traditional training sources. AI labs started training their models on content available on the public internet — Reddit threads, Wikipedia articles, digitized books. But that material simply ran out. According to former OpenAI chief scientist Ilya Sutskever, that entire public trove was exhausted by the end of 2024.

And there’s more: even when that type of data was abundant, it wasn’t exactly ideal for building what the industry calls agentic AI — models capable of actually executing tasks in the real world, not just answering questions. Public texts are edited, revised, and crafted for external audiences. What happens inside a company is raw, direct, and much closer to how people actually think and express themselves in their daily professional lives.

Ali Ansari, whose company micro1 sells AI labs a product called Roots (essentially a fictional company where AI agents can practice work in areas like financial services and complex scheduling), summed up the situation well: companies developing models are realizing that the noise of real work environments is necessary to test those models accurately.

In other words, if you want an AI to know how to work in an office, you need to show it how work actually happens — with all the imperfections, interruptions, and ambiguous contexts that are part of corporate routine. And that kind of data simply doesn’t exist on the open web.

The Hidden Value in Digital Ruins

When a company shuts down, what remains isn’t just debt, office furniture, or expired contracts. There is an invisible asset accumulated over years: conversations, decisions, documented mistakes, creative solutions, and all the human dynamics that happen inside a functioning organization. This type of data is exactly what artificial intelligence labs need most to make their models smarter, more natural, and closer to the way humans actually communicate in professional settings.

cielo24 accumulated more than a decade of internal interactions before closing. Emails, project threads, process documents, and conversations in collaboration tools together form a layer of real, contextualized, and diverse language that public sources like the open internet simply don't contain. For AI training, that difference is massive.

What surprised cielo24’s CEO was the speed at which the market reacted once word got out that this workplace data was available. AI labs reached out quickly, well aware that this type of dataset is rare and valuable. The offer that landed on the table — in the hundreds of thousands of dollars — transformed a situation of financial desperation into a dignified and organized closure.

The Corporate Data Gold Rush

SimpleClosure’s CEO, Dori Yona, described the level of interest his company receives from AI companies as insane. According to him, there is a real gold rush feeling among these companies trying to get their hands on real-world data.

To meet this growing demand, SimpleClosure is launching Asset Hub, a platform where companies in the process of shutting down can sell their inventory of code, Slack archives, emails, and other digital assets. According to Yona, parts of Asset Hub are still in beta because SimpleClosure strips all personally identifiable information from companies' internal data, a sensitive and technically difficult process that the company wants to make absolutely solid before scaling up.
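
To make the idea of stripping personally identifiable information more concrete, here is a minimal Python sketch of rule-based pseudonymization. It is not SimpleClosure's pipeline, which the company has not published; the regular expressions, the `pseudonymize` function, and the list of known employee names are all hypothetical. Swapping each name for a stable token keeps a thread readable (who answered whom) while hiding identities.

```python
import re

# Hypothetical patterns for two common PII types. Real pipelines combine many
# more detectors (named-entity models, custom dictionaries) with human review.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pseudonymize(text: str, known_names: list[str]) -> str:
    """Replace emails, phone numbers, and known employee names with placeholders.

    Each distinct name maps to a stable token (PERSON_1, PERSON_2, ...) so the
    conversational structure survives redaction.
    """
    text = EMAIL_RE.sub("[EMAIL]", text)  # emails first, so names inside them vanish too
    text = PHONE_RE.sub("[PHONE]", text)
    for i, name in enumerate(known_names, start=1):
        # \b word boundaries keep "Bob" from matching inside "Bobby"
        text = re.sub(rf"\b{re.escape(name)}\b", f"PERSON_{i}", text)
    return text


msg = "Bob, ping Alice at alice@example.com or +1 555 010 9999 before standup."
print(pseudonymize(msg, ["Alice", "Bob"]))
# prints: PERSON_2, ping PERSON_1 at [EMAIL] or [PHONE] before standup.
```

Patterns this naive both over- and under-match: the phone pattern would also swallow a date like 2026-04-16, and a nickname missing from the name list slips through untouched. That gap is one reason serious anonymization leans on trained entity-recognition models plus human review.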

Over the past year, SimpleClosure processed around 100 deals on behalf of shuttered companies, recovering more than 1 million dollars for founders. Payouts typically range between 10 thousand and 100 thousand dollars per company.

A competitor, Sunset, also buys data from defunct companies at similar prices. Its CEO, Brendan Mahony, explained to Forbes that pricing depends on the size of the company, its age, and something called data richness — a measure of internal traceability and connections between platforms within the dataset. A Jira ticket linked to a specific code commit, for example, is worth far more than a standalone document. Industries like healthcare and finance also command premium prices due to the complexity and specificity of the data they generate.
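
Sunset has not disclosed how it scores data richness. As a rough illustration only, one could approximate "internal traceability" as the share of artifacts that reference something on a different platform. In the Python sketch below, the `Artifact` schema and the `cross_platform_link_ratio` metric are hypothetical, not Sunset's method.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One item in a company data dump (hypothetical schema)."""
    id: str
    platform: str  # e.g. "jira", "github", "slack", "drive"
    links: list[str] = field(default_factory=list)  # ids of artifacts it references


def cross_platform_link_ratio(artifacts: list[Artifact]) -> float:
    """Share of artifacts that reference at least one artifact on another platform.

    A toy proxy for "data richness": a Jira ticket pointing to a code commit
    counts, while a standalone document does not.
    """
    if not artifacts:
        return 0.0
    by_id = {a.id: a for a in artifacts}
    linked = 0
    for a in artifacts:
        # Ignore links to artifacts that are not part of this dump.
        targets = (by_id[t] for t in a.links if t in by_id)
        if any(t.platform != a.platform for t in targets):
            linked += 1
    return linked / len(artifacts)


dump = [
    Artifact("JIRA-42", "jira", links=["c0ffee1"]),  # ticket linked to a commit
    Artifact("c0ffee1", "github"),                    # the commit itself
    Artifact("spec.docx", "drive"),                   # standalone document
    Artifact("msg-7", "slack", links=["JIRA-42"]),    # chat message citing the ticket
]
print(cross_platform_link_ratio(dump))  # 0.5
```

Under this toy metric the ticket linked to a commit raises the score while the standalone document does not, which mirrors the example Mahony gives; a real valuation would also weigh size, age, and industry.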

Anonymization: The Line Between Useful and Problematic

Naturally, the first question that comes to mind is: what about the employees who produced this data? Do they know their messages and emails might be used to train an AI?

Marc Rotenberg, founder of the Center for AI and Digital Policy, is blunt about it. According to him, even if employees signed clauses assigning intellectual property rights over work materials, that doesn’t settle the question of whether employers can sell internal communications to third parties — especially when employees would never have expected their Slack messages to be repurposed in this way.

Rotenberg considers the privacy concerns substantial. Employee privacy is central, he pointed out, because people have become heavily dependent on internal communication tools like Slack. In his view, this isn't generic data; it belongs to identifiable people.

Rotenberg’s organization sent a letter to the U.S. Senate Commerce Committee asking the FTC to closely examine new business practices involving AI, citing concerns about safeguards for personal data protection.

The anonymization process itself is neither simple nor cheap. It requires specialized technology, human review in many cases, and a clear methodology to ensure no sensitive information slips through. Bobby Samuels, whose company Protege specializes in navigating the regulatory and legal landscape of real-world data, warns that if anonymization isn’t done correctly, there are risks that companies with access to the data could see the activities of specific organizations and individuals. And if not handled with care, that data can leak into model outputs.

Beyond anonymization, there is the risk that a person’s conversations could be literally regurgitated by AI models. A 2020 study by researchers from institutions including OpenAI and Google demonstrated that large language models can memorize sequences from their training datasets verbatim — and those sequences can be extracted with the right prompts. This adds an extra layer of concern over the sale of corporate communications for AI training. 😬
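
One simple way to check for the kind of verbatim regurgitation described above is to compare a model's output against the original training text and look for long shared word sequences. The Python sketch below does that with word n-grams. It is a post-hoc overlap check, not the prompt-based extraction attack from the study, and the 8-word threshold and both sample messages are hypothetical.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercase, drop punctuation, and return every run of n consecutive words."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(training_text: str, output: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the n-word sequences that appear word-for-word in both texts."""
    return word_ngrams(training_text, n) & word_ngrams(output, n)


# Hypothetical private Slack message and hypothetical model completion.
slack_msg = "heads up: the Q3 launch slips because Dana forgot to renew the signing certificate again"
completion = "Sure! Internal note: the Q3 launch slips because Dana forgot to renew the signing certificate again."

leaks = verbatim_overlap(slack_msg, completion)
print(len(leaks))  # 6
```

Any non-empty result means the output repeats at least eight consecutive words from the private message, which in an audit of a model trained on workplace data would be a red flag.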

The New Training Gyms for AI Agents

The demand for real corporate data has given rise to an entirely new industry: so-called RL gyms, or reinforcement learning gyms. These are simulated environments built from defunct company data, where AI agents can practice navigating real workplaces.

And we’re talking serious money here. According to The Information, Anthropic is considering spending up to 1 billion dollars on RL gyms this year. There are already around 50 emerging startups in this space, plus data labeling companies like Mercor and micro1 — which traditionally make money paying humans to generate training data — jumping into the game.

Some of these startups are already reaching impressive valuations. Prime Intellect surpassed 1 billion dollars in valuation, according to a source familiar with the matter. Fleet is in talks to raise funding at a 750 million dollar valuation, also according to The Information.

A company called AfterQuery sells a series of ready-to-use worlds for AI labs, with names like Big Tech World, Finance World, and Tax World. In these environments, an AI agent practices navigating a digital office, interacting with simulated user agents, and learning to solve real-world problems.

One example task feels like the most tedious middle-management routine: the agent is instructed to plan a birthday party for a coworker named Bob. But without the agent knowing, another coworker is already planning the same party. To make matters worse, the agent has forgotten when Bob’s birthday is. To succeed, it needs to message other employees, do some detective work, and then decide whether to join forces with the other organizer or scrap the original plan. 🎂
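
The birthday task maps naturally onto the reset/step loop used by reinforcement learning environments. The Python sketch below is a toy, text-only version loosely modeled on that convention; it does not depend on Gymnasium and returns a simplified three-item tuple. AfterQuery has not published its environments, so the coworkers, their replies, and the reward rule (success only after finding both the date and the other organizer) are hypothetical.

```python
class BirthdayPartyEnv:
    """Toy text environment for the Bob's-birthday task (all details hypothetical)."""

    def __init__(self, max_steps: int = 6) -> None:
        # Hypothetical coworkers: one knows the date, one is secretly organizing.
        self.coworkers = {
            "Carol": "Bob's birthday is on the 14th.",
            "Dave": "I'm already planning Bob's party, want to help?",
            "Erin": "No idea, sorry.",
        }
        self.max_steps = max_steps
        self.reset()

    def reset(self) -> str:
        """Start a new episode and return the initial instruction."""
        self.facts: set[str] = set()
        self.steps = 0
        return "Plan a birthday party for Bob. You don't remember the date."

    def step(self, action: tuple[str, str]) -> tuple[str, float, bool]:
        """Apply ("message", name) or ("decide", "join" | "scrap").

        Returns (observation, reward, done).
        """
        self.steps += 1
        kind, arg = action
        out_of_time = self.steps >= self.max_steps
        if kind == "message" and arg in self.coworkers:
            reply = self.coworkers[arg]
            if "birthday is" in reply:
                self.facts.add("date")
            if "already planning" in reply:
                self.facts.add("co_organizer")
            return f"{arg}: {reply}", 0.0, out_of_time
        if kind == "decide" and arg in ("join", "scrap"):
            # Either choice is accepted, but only once both hidden facts are known.
            ok = {"date", "co_organizer"} <= self.facts
            return ("Party sorted." if ok else "Plan failed."), float(ok), True
        # Malformed action: small penalty; the episode continues until time runs out.
        return "Unknown action.", -0.1, out_of_time


env = BirthdayPartyEnv()
print(env.reset())
for action in [("message", "Erin"), ("message", "Carol"),
               ("message", "Dave"), ("decide", "join")]:
    obs, reward, done = env.step(action)
    print(obs, reward)
    if done:
        break
```

In a real gym, a learning agent would replace the hard-coded action list. Here, deciding before messaging anyone ends the episode with zero reward, which is the "detective work" pressure the task is meant to create.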

A New Market Rising From the Ashes

The cielo24 case isn’t unique, and that’s the most telling part of this whole story. Today there is an emerging ecosystem of companies that specialize in identifying organizations that are winding down, negotiating the acquisition of their workplace data, and preparing those datasets for sale to artificial intelligence labs. It is a business model born directly from the explosion in demand for high-quality training data — a demand the public internet can no longer meet on its own.

For defunct companies, or rather for their founders, investors, and creditors, this market represents an unexpected opportunity to recover some value from an asset that, until recently, would have simply been discarded. A startup that didn’t survive market pressures can still leave a financial legacy through the sale of its communication and internal operations history. This changes how entrepreneurs and investors think about shutting down a company — the data accumulated over the years stops being a storage cost and becomes part of the asset inventory to be liquidated.

For the models themselves, the impact is equally significant. Real corporate data, with all its complexity, abbreviations, industry-specific jargon, and natural variation in tone and context, enriches models in a way that web-scraped text simply can't replicate. The difference between a model trained only on public content and one that has also absorbed years of internal communications from real companies can be felt in tasks like drafting professional emails, AI-assisted project management, and any application where corporate context matters. 🤖

What This Means for Anyone Who Works With Data

If you work in tech or closely follow the development of artificial intelligence, this movement brings reflections that go beyond curiosity about a single case. The central question is: what do we do with the digital trail we leave behind in professional settings? For years, the default answer was that this data stayed on company servers, accessible only internally, and vanished when the organization ceased to exist. That answer is no longer adequate.

The emergence of this market for workplace data from defunct companies highlights the need for clearer policies on ownership and the fate of data generated by employees. Who owns an email sent by an employee using the company’s corporate account? The legal answer varies by jurisdiction, but the practical answer — at least for now — seems to be leaning toward the organizations and whoever acquires or liquidates them. That could change as regulators pay more attention to this phenomenon, and there are good reasons to believe that attention is coming.

For anyone who develops or uses solutions based on artificial intelligence, understanding where the training data for the models you use comes from is increasingly relevant. Data provenance affects the quality, ethics, and legal compliance of any AI system.

And thinking about it from another angle, maybe those hours you thought you were wasting on Slack could end up being the most lasting work you ever did. Unless, of course, the AI model — having memorized your data a little too well — accidentally reveals to the next generation of office coworkers that it was you who forgot Bob’s birthday. 💡

Rafael

Operations

I transform internal processes into delivery machines, ensuring that every Método Viral client receives premium service and real results.