
How OpenAI built a data agent with just two engineers that now serves thousands of employees

Imagine needing to find a specific piece of information in a sea of 70,000 datasets, writing SQL queries by hand, validating table schemas, and then putting together polished charts to present the results. Until recently, that was everyday life for analysts at OpenAI. A simple revenue comparison across geographic regions and customer segments could eat up hours of heavy lifting. Today, that same analyst opens Slack, types a question in plain English, and gets a finished chart back in minutes. The tool behind this transformation was built by just two engineers in three months, had 70% of its code written by artificial intelligence, and is now used daily by thousands of the company’s employees.

In an exclusive interview with VentureBeat, Emma Tang, OpenAI’s data infrastructure lead, pulled back the curtain on the system and explained how it works, where it falls short, and what it signals about the future of enterprise data. The conversation, along with the company’s official blog post about the tool, reveals something every organization will need to face soon: the bottleneck to building smarter companies isn’t better models. It’s better data.

A natural language interface for 600 petabytes of corporate data

To understand why OpenAI decided to build this system, you need to grasp the sheer scale of the problem. The company’s data platform spans more than 600 petabytes spread across 70,000 different datasets. Just finding the right table to answer a question could burn hours of an experienced data scientist’s time. Emma Tang’s Data Platform team, which sits within the company’s infrastructure org and manages big data systems, streaming, and the data tooling layer, supports an impressive internal user base.

OpenAI currently has around 5,000 employees. More than 4,000 of them use data tools provided by Tang’s team. That means over 80% of the company depends directly on this infrastructure on a daily basis.

The agent was built on GPT-5.2 and is available where employees already work: Slack, a web interface, IDEs, the Codex CLI, and the internal ChatGPT app. A user types a question in natural language and gets back charts, interactive dashboards, and full analytical reports. The team estimates each query saves between two and four hours of work. But Tang made a point of highlighting that the hardest gain to measure is also the most important one: the agent gives people access to analyses they simply couldn’t have done before, no matter how much time they had.

Engineers, growth teams, product teams, and even non-technical groups who don’t know the ins and outs of data systems and table schemas can now pull sophisticated insights on their own.

From revenue comparisons to latency debugging, one agent does it all

Tang walked through several concrete use cases that show just how versatile the system is. OpenAI’s finance team uses the agent to compare revenue across geographic regions and customer segments. In her words, you just send the question as plain text and the agent responds with charts, dashboards, and all kinds of visualizations.

But the real power shows up in multi-step strategic analyses. Tang described a recent case where a user spotted discrepancies between two dashboards tracking Plus subscriber growth. The data agent was able to generate a chart showing, line item by line item, exactly what the differences were. It turned out to be five distinct factors causing the divergence. For a human, that investigation would have taken hours or even days. The agent resolved it in a few minutes.

Product managers use the tool to understand feature adoption. Engineers use it to diagnose performance regressions — asking, for example, whether a specific ChatGPT component actually got slower compared to the day before, and if so, which latency components explain the change. The agent can break it all down and compare against prior periods from a single prompt.

What makes this system especially different is that it operates across organizational boundaries. Most enterprise AI agents out there today live in silos within specific departments — a bot for finance here, another for HR there. OpenAI’s agent cuts horizontally across the entire company. Tang explained that the rollout was done department by department, with curated memory and context for each group, but at a certain point everything converges on the same database. A senior leader can combine sales data with engineering metrics and product analytics in a single query.


How Codex solved the hardest problem in enterprise data

Finding the right table among 70,000 datasets is, according to Tang herself, the biggest technical challenge her team faces. And that’s exactly where Codex — OpenAI’s coding agent — plays its most clever role.

Codex serves three functions in the system. First, users access the data agent through Codex via MCP. Second, the team used Codex to generate over 70% of the agent’s own code, enabling two engineers to ship the project in three months. But the third function is the most fascinating from a technical standpoint: a daily, asynchronous process where Codex examines important data tables, analyzes the code of the pipelines feeding those tables, and determines upstream and downstream dependencies, ownership, granularity, join keys, and similar tables for each one.

Tang explained the process in straightforward terms: the team provides a prompt, Codex examines the code and responds with the necessary information, which is then persisted in the database. When a user asks about revenue, the agent searches a vector database to find which tables Codex has already mapped to that concept.
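The lookup Tang describes can be sketched with a toy retrieval step. The table names, descriptions, and bag-of-words "embedding" below are all invented for illustration; a real system would use a proper embedding model and vector database, but the mechanism is the same: Codex-written descriptions are embedded once, and the user's question is matched against them.

```python
import math
from collections import Counter

# Toy stand-in for the enrichment store: table name -> Codex-written description.
# Names and descriptions are illustrative, not OpenAI's real schema.
TABLE_DESCRIPTIONS = {
    "finance.revenue_by_region": "monthly revenue broken down by geographic region and customer segment",
    "growth.plus_subscribers": "daily count of Plus subscriber signups and churn",
    "infra.chatgpt_latency": "p50 and p99 latency per ChatGPT component per day",
}

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding'; a real system would use a vector model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_tables(question: str, top_k: int = 2) -> list[str]:
    """Return the tables whose enriched descriptions best match the question."""
    q = embed(question)
    scored = sorted(
        TABLE_DESCRIPTIONS,
        key=lambda t: cosine(q, embed(TABLE_DESCRIPTIONS[t])),
        reverse=True,
    )
    return scored[:top_k]

print(find_tables("compare revenue across region and customer segment"))
```

The key design point is that the expensive step (Codex reading pipeline code and writing descriptions) happens offline and asynchronously; query time only pays for a cheap similarity search.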

This process, called Codex Enrichment, is one of six context layers the agent uses. The layers range from basic schema metadata and expert-curated descriptions to institutional knowledge extracted from Slack, Google Docs, and Notion, plus a learning memory that stores corrections from previous conversations. When no prior information exists, the agent falls back to live queries against the data warehouse.
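The layered fallback can be sketched as an ordered lookup. The layer names below paraphrase the article's list and the lookup logic is our invention; the point is simply that curated sources are consulted in priority order before the agent resorts to a live warehouse query.

```python
# Hypothetical priority ordering of the context layers the article describes;
# the final fallback is a live query against the warehouse.
CONTEXT_LAYERS = [
    "schema_metadata",
    "curated_descriptions",
    "codex_enrichment",
    "institutional_knowledge",   # Slack, Google Docs, Notion
    "learning_memory",           # corrections from past conversations
]

def resolve_context(concept: str, stores: dict[str, dict[str, str]]) -> str:
    """Walk the layers in priority order; fall back to a live warehouse query."""
    for layer in CONTEXT_LAYERS:
        hit = stores.get(layer, {}).get(concept)
        if hit:
            return f"{layer}: {hit}"
    return f"live_query: no cached context for {concept!r}"

stores = {"codex_enrichment": {"revenue": "finance.revenue_by_region, join key region_id"}}
print(resolve_context("revenue", stores))   # served from the enrichment layer
print(resolve_context("latency", stores))   # nothing cached: falls through to live query
```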

The team also ranks historical query patterns by relevance. Tang explained that raw query history is full of simple exploratory commands like SELECT * LIMIT 10, which aren’t really useful. Canonical dashboards and executive reports — where analysts invested significant effort to determine the correct representation of the data — are flagged as the source of truth. Everything else gets deprioritized.
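A minimal version of that ranking might look like the following. The scoring tiers and the regex for spotting throwaway exploratory queries are our assumptions, not OpenAI's actual heuristics.

```python
import re

def score_query(sql: str, is_canonical: bool) -> int:
    """Rank historical queries: canonical dashboards outrank everything,
    trivial exploratory queries sink to the bottom."""
    if is_canonical:   # flagged as source of truth by analysts
        return 2
    # e.g. "SELECT * FROM t LIMIT 10": exploration noise, not a pattern worth learning
    if re.search(r"select\s+\*", sql, re.I) and re.search(r"limit\s+\d+", sql, re.I):
        return 0
    return 1

history = [
    ("SELECT * FROM events LIMIT 10", False),
    ("SELECT region, SUM(amount) FROM revenue GROUP BY region", False),
    ("SELECT region, revenue FROM weekly_exec_dashboard", True),
]
ranked = sorted(history, key=lambda q: score_query(*q), reverse=True)
print([q for q, _ in ranked])   # canonical dashboard first, "SELECT *" probe last
```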

The prompt that forces AI to slow down and think before acting

Even with six sophisticated context layers, Tang was remarkably honest about the agent’s biggest behavioral flaw: overconfidence. It’s a problem anyone who has worked with large language models will recognize immediately.

Tang described the scenario like this: what the model tends to do is feel too confident. It says it found the right table and immediately starts generating analyses. And that rushed approach is exactly the wrong path.

The solution came through prompt engineering that forces the agent to spend more time in a discovery phase. Tang explained that the longer the agent spends exploring possible scenarios and comparing which table to use — just investing more time in the discovery phase — the better the results. The prompt works almost like coaching a junior analyst: before rushing off with an analysis, the system is instructed to run more validations on whether that’s really the right table and to check more sources before producing any concrete data.
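The "coach a junior analyst" instruction can be made concrete as a discovery gate. Both the prompt wording and the thresholds below are invented paraphrases of what Tang describes, not OpenAI's actual prompt.

```python
# Invented paraphrase of a discovery-phase instruction; OpenAI's real prompt
# is not public.
DISCOVERY_PROMPT = (
    "Before writing any analysis: list at least three candidate tables, "
    "compare their schemas and freshness, validate join keys against a "
    "second source, and only then pick one table and explain why."
)

def ready_to_analyze(candidates_compared: int, sources_checked: int) -> bool:
    """Refuse to start the analysis until the discovery phase is thorough enough."""
    return candidates_compared >= 3 and sources_checked >= 2

print(ready_to_analyze(1, 1))   # rushed discovery: blocked
print(ready_to_analyze(3, 2))   # thorough discovery: allowed
```

The same idea can be enforced outside the prompt, as shown here: the orchestration code simply refuses to move to the analysis step until the model has produced enough discovery output.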

The team also discovered through rigorous evaluations that less context can actually produce better results. Tang explained that it’s very tempting to dump everything you have and hope the model performs better. But in their evaluations, the team found the opposite. The less information you provide, as long as it’s curated and accurate, the better the agent performs.

To build trust with users, the agent streams its intermediate reasoning in real time, shows which tables it selected and why, and provides direct links to query results. Users can interrupt the agent mid-analysis to redirect it. The system also creates progress checkpoints, allowing work to resume after failures. And at the end of each task, the model evaluates its own performance. The team asks the model how it thinks it did, whether the result was good or bad. And according to Tang, the model is surprisingly good at assessing its own performance.
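The checkpointing behavior is simple to sketch. The file format and location below are invented; the idea is only that each completed step persists enough state for the agent to resume after a crash instead of restarting the whole analysis.

```python
import json
import os
import tempfile

# Illustrative checkpoint format; OpenAI's actual persistence layer is not public.
def save_checkpoint(path: str, step: int, state: dict) -> None:
    """Persist progress after each completed step of the analysis."""
    with open(path, "w") as f:
        json.dump({"step": step, "state": state}, f)

def resume(path: str) -> tuple[int, dict]:
    """Pick up from the last checkpoint, or start from scratch if none exists."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]

path = os.path.join(tempfile.gettempdir(), "agent_ckpt.json")
save_checkpoint(path, 2, {"tables": ["finance.revenue_by_region"]})
print(resume(path))   # resumes at step 2 with the selected tables intact
```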

Simple guardrails that work surprisingly well

When it comes to safety, Tang took a pragmatic approach that might surprise companies expecting sophisticated AI alignment techniques.

Her philosophy is straightforward: sometimes you need guardrails that are even kind of simple. The system has strict access controls. It always uses the employee’s personal token, so each person only has access to exactly what their permissions already allowed before. The agent functions purely as an interface layer, inheriting the same permissions that already govern OpenAI’s data.

The agent never shows up in public channels — only in private channels or the user’s individual interface. Write access is restricted to a temporary test schema that gets periodically wiped and can’t be shared. Tang also emphasized that the system doesn’t have permission to write randomly to other systems.
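Those three guardrails are simple enough to express directly. The permission data, scratch-schema name, and channel model here are invented for illustration; the structure mirrors what Tang describes: the agent is a pure interface layer that inherits existing permissions.

```python
# Illustrative guardrail checks; tokens, tables, and schema names are invented.
PERMISSIONS = {"alice-token": {"finance.revenue_by_region"}}

def can_read(user_token: str, table: str) -> bool:
    """The agent only ever queries with the employee's own token,
    so it inherits exactly the permissions that already exist."""
    return table in PERMISSIONS.get(user_token, set())

def can_write(schema: str) -> bool:
    """Writes are confined to a temporary scratch schema that gets wiped."""
    return schema == "tmp_agent_scratch"

def can_post(channel_is_private: bool) -> bool:
    """The agent never responds in public channels."""
    return channel_is_private

print(can_read("alice-token", "finance.revenue_by_region"))  # True
print(can_read("alice-token", "hr.salaries"))                # False: not in the user's ACL
print(can_write("prod"))                                     # False: scratch schema only
```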

User feedback closes the loop. Employees flag incorrect results directly, and the team investigates. The model’s self-evaluation adds another layer of verification. Long-term, Tang said the plan is to migrate to a multi-agent architecture, where specialized agents monitor and assist each other. But even in its current state, the system has already come a long way.

Why OpenAI won’t sell this tool, but wants you to build your own

Despite the obvious commercial potential, OpenAI confirmed to VentureBeat that it has no plans to turn its internal data agent into a product. The strategy is to provide the building blocks and let companies create their own solutions. Tang made it clear that everything her team used to build the system is already available externally.

In her words: the team uses the same publicly available APIs — the Responses API, the Evals API. There’s no fine-tuned model. They simply use GPT-5.2. So yes, any company can build something similar.
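As a rough sketch of what "the same publicly available APIs" means in practice, here is the shape of a request a company might send to the Responses API with a function tool attached. The model name is taken from the article; the prompt text, the run_sql tool, and its schema are our invention, and the request is only built here, not sent.

```python
import json

# Sketch of a Responses API request with one function tool. The "run_sql"
# tool, its schema, and the prompt are hypothetical; the model name is the
# one the article reports.
request = {
    "model": "gpt-5.2",
    "input": "Compare this quarter's revenue across regions and plot the result.",
    "tools": [
        {
            "type": "function",
            "name": "run_sql",
            "description": "Execute a read-only SQL query against the warehouse.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ],
}

# With the openai Python SDK, sending this would be roughly:
#   client.responses.create(**request)
print(json.dumps(request, indent=2)[:120])
```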

That message aligns with OpenAI’s broader enterprise strategy. The company launched OpenAI Frontier in early February, a full platform for businesses to build and manage AI agents. Since then, it has partnered with McKinsey, Boston Consulting Group, Accenture, and Capgemini to help sell and implement the platform. AWS and OpenAI are jointly developing a Stateful Runtime Environment for Amazon Bedrock that replicates some of the persistent context capabilities OpenAI built into its data agent. And Apple recently integrated Codex directly into Xcode.

According to data shared with VentureBeat, Codex is used by 95% of OpenAI’s engineers and reviews every pull request before it gets merged. The global weekly active user base has tripled since the beginning of the year, surpassing one million. Overall usage has grown more than five times.

Tang described a shift in how employees use Codex that goes well beyond coding. According to her, Codex is no longer just a coding tool. Non-technical teams use it to organize thoughts, create presentations, and generate daily summaries. One of her engineering managers set up Codex to review her notes every morning, identify the most important tasks, pull Slack messages and DMs, and draft responses. The system is literally operating on her behalf in multiple ways.

The unglamorous prerequisite that will determine who wins the AI agent race

When asked what other companies should learn from OpenAI’s experience, Tang didn’t point to model capabilities or sophisticated prompt engineering. She pointed to something far more mundane.


Her words were direct: data governance is extremely important for data agents to work well. The data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to navigate.

The underlying infrastructure — storage, compute, orchestration, and business intelligence layers — wasn’t replaced by the agent. It still needs all of those tools to do its job. But it functions as a fundamentally new entry point for data intelligence, more autonomous and accessible than anything that existed before.

Tang closed the interview with a warning for companies that are hesitating. According to her, organizations that adopt this kind of solution will reap benefits very quickly. And those that don’t will fall behind. The gap between the two groups will widen. Companies that use this technology will move very, very fast.

When asked whether that acceleration worried her own colleagues — especially after a recent wave of layoffs at companies like Block — Tang paused and responded: how much OpenAI can accomplish as a company has accelerated significantly, but that still doesn’t come close to matching the company’s ambitions.

Practical lessons for anyone looking to replicate this model

OpenAI’s case brings valuable takeaways for any organization thinking about building internal data agents. The first and most important is that data quality matters more than model sophistication. There’s no point in having access to the most advanced model on the market if your datasets are disorganized, poorly documented, and lacking a clear source of truth.

The second takeaway is that small teams can deliver massive results when supported by the right AI tools. Two engineers in three months, with 70% of the code generated by artificial intelligence, produced a system that serves thousands of people. That completely redefines what’s possible in terms of development speed.

The third point is about the importance of forcing the agent to slow down. The natural tendency of LLMs is to respond quickly and confidently, even when they should be investigating further. Investing in prompts that encourage a longer, more careful discovery phase can be the difference between useful answers and dangerously wrong ones.

Finally, guardrails don’t need to be complex to be effective. Access control based on personal tokens, write restrictions, operation only in private channels, and continuous user feedback form a simple security layer that actually works in practice. Sophistication can — and should — come over time, but it doesn’t need to be a prerequisite to get started.

We’re entering an era where the role of the data analyst changes dramatically: less time writing SQL and more time formulating the right questions, interpreting results, and making sure the data infrastructure is healthy. The agent doesn’t replace people, but it profoundly transforms what’s expected of them.



