A billionaire’s chief of staff: how an AI pioneer puts a dozen agents to work
AI agents are already part of the daily work routine for some of the brightest minds in tech. And when one of the creators of the architecture behind ChatGPT is using a dozen of them in his day-to-day, it’s worth stopping to pay attention to what he has to say.
Illia Polosukhin isn’t just any name in the world of artificial intelligence. He co-authored the paper Attention Is All You Need, published in 2017, which introduced the Transformer architecture, the technical foundation behind the large language models we know today. In other words, the T in ChatGPT stands for Transformer, and it traces back to his work.
But here’s the most interesting part of all this: even though he’s one of the top experts on the planet when it comes to this stuff, Polosukhin doesn’t let his agents run unsupervised. And the reason is simpler than you might think. 👇
A dozen agents and a very straightforward prompt
On any given day, Polosukhin works with 12 agents tackling different missions for him. One of those missions, for example, is helping him become a better CEO. In practice, the agent summarizes all his meeting notes, Google Drive documents, and Slack messages, then delivers an executive summary with coaching insights about what happened, what he might be overlooking, and where decisions are stalled. This entire workflow runs automatically every week.
And the description he uses for these agents is pretty revealing. Polosukhin calls them his “billionaire chief-of-staff-level support.” According to him, that phrase is literally in the prompt: “You are the chief of staff for a billionaire.” It’s a clear, direct instruction that sets the level of responsibility and context the agent should assume when carrying out its tasks.
This approach offers a glimpse into the future Polosukhin envisions — not just for individual workers or CEOs, but for the entire global economy. A world where agents can negotiate deals, coordinate supply chains, and broker transactions on behalf of people and large corporations. And in his view, we are completely unprepared for it.
What Polosukhin does differently with AI agents
Instead of setting up his AI agents to run fully autonomously, Polosukhin maintains what he describes as checkpoints throughout his workflows. At strategic moments, the agent stops, presents what it has done so far, and waits for human confirmation before moving forward. This isn’t a technological limitation — it’s a deliberate and well-reasoned choice. For anyone using agents at work, this approach completely changes the trust dynamic between human and machine, because you know exactly what’s being done and at which stage the process currently sits.
This behavior doesn’t come from blind distrust of the technology. Polosukhin understands better than almost anyone how these systems work under the hood, precisely because he helped build the foundation they run on. The point here is that human oversight isn’t a sign that AI has failed. It’s a safety layer that ensures the final result actually matches what you intended from the start. When an agent makes chained decisions without any pause for review, small interpretation errors early on can snowball into major problems down the line.
As he put it in an interview with Business Insider: “If I just let it run and do things, I come back and find something that makes no sense at all. So you need to keep an eye on it with your own judgment.”
Beyond that, there’s a very practical aspect to this discussion worth highlighting. Agents that operate autonomously on complex tasks typically interact with external systems — APIs, databases, and various tools. Each of those interactions represents a point where things can go off the rails, whether from a misinterpreted instruction, an unexpected response from an external service, or simply a decision that made sense to the model but didn’t make sense in the real business context. Having a human in the loop, especially at those critical junctures, significantly reduces that risk.
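The checkpoint idea described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Polosukhin’s actual setup: the names Step, RunResult, run_workflow, and console_approver are hypothetical, and the approval callback stands in for whatever confirmation channel a real workflow would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], str]      # performs the work and returns a short summary
    checkpoint: bool = False    # if True, wait for human confirmation afterwards


@dataclass
class RunResult:
    completed: List[str] = field(default_factory=list)
    stopped_at: Optional[str] = None   # checkpoint where a human said no


def run_workflow(steps: List[Step], approve: Callable[[str, str], bool]) -> RunResult:
    """Run steps in order, pausing at each checkpoint for a human decision."""
    result = RunResult()
    for step in steps:
        summary = step.run()
        result.completed.append(step.name)
        # At a checkpoint the agent presents its work and waits for approval.
        if step.checkpoint and not approve(step.name, summary):
            result.stopped_at = step.name
            break
    return result


def console_approver(step_name: str, summary: str) -> bool:
    """Hypothetical approver that asks for confirmation on the terminal."""
    answer = input(f"[{step_name}] {summary}\nContinue? [y/N] ")
    return answer.strip().lower() == "y"
```

Calling run_workflow(steps, console_approver) would stop at the first rejected checkpoint, so later steps such as sending an email never run without a human having seen the intermediate result.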
Society isn’t ready for AGI, according to Polosukhin
Beyond the operational challenges of using agents day to day, Polosukhin raises a much broader concern. According to him, the biggest problem is that we have fundamentally not prepared “the system” for the arrival of artificial general intelligence (AGI). And when he says system, he’s referring to society, the internet, and government institutions.
That’s a bold statement, especially coming from someone who isn’t an outside commentator but rather one of the architects of the technology at the center of this revolution. Polosukhin says he’s been warning for years that the models are going to start breaking everything. He describes the situation as a cat-and-mouse game, where each new model iteration manages to break what the previous iteration had fixed.
A concrete example that reinforces this view came from Anthropic, which announced that its latest model in preview, Mythos, is so capable of finding and exploiting vulnerabilities that the lab decided to restrict access to it. For Polosukhin, this isn’t a surprise. It’s exactly the kind of scenario he’s been describing for years.
The problem with blind trust in a single company
In a world where people manage their health and corporations manage logistics using AI agents, Polosukhin identifies an urgent need for a trust and security layer on the backend. And that’s exactly what he’s working on at NEAR.
His project with NEAR is building infrastructure to reduce the dependence of AI agents on a single company — like a frontier AI lab — to control and oversee every step of a task. In practice, this means an AI agent that handles your login credentials, books your travel, and moves money to pay for a plane ticket wouldn’t require the user to blindly trust a single gatekeeper.
Polosukhin’s concern here is well-founded. As he explained: “This is going to have all your information. Literally, your life is going to be there. So you don’t want any individual company to have control or access to all of that.”
And we’re not just talking about data leaks. Another risk Polosukhin wants to combat is manipulation. More and more people use AI to get information, from news summaries to investment suggestions. An AI lab — or a bad actor inside one — could quietly shape those responses without the user ever noticing.
A real-world case that illustrates this danger happened with Grok from xAI, when the chatbot started repeatedly mentioning an extremely sensitive topic in responses that had absolutely nothing to do with the subject. The company attributed the problem to an unauthorized modification in the system’s backend. This type of incident shows how a lack of transparency can have serious consequences.
Polosukhin’s proposal with NEAR is to develop an open-source, auditable platform that gives users greater visibility into how an AI system operates, rather than treating it like a black box.
Why backend infrastructure matters in this equation
When you start putting AI agents to work for real, the conversation quickly moves beyond prompts and goes straight to the backend infrastructure that supports everything. It doesn’t matter how well-configured and smart your agent is if the structure behind it can’t log what it did, doesn’t retain context between sessions, or offers zero visibility into the actions it took.
The backend infrastructure of an agent-based system needs to offer, at minimum, traceability. That means being able to answer questions like:
- What did this agent do in the last two hours?
- Which tools did it use?
- Which decisions did it make autonomously and which did it escalate to a human?
Without those answers readily available, human oversight is compromised because you don’t even have the information you need to oversee anything. It’s like trying to review a document without being able to see the edit history.
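As a rough illustration of what traceability could look like, here is a minimal in-memory audit log that can answer the three questions above. Everything in it is an assumption made for the example: the AgentEvent fields, the AuditLog class, and its method names are hypothetical, and a production system would persist events in durable storage rather than a Python list.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class AgentEvent:
    timestamp: float   # seconds since the epoch
    agent: str
    tool: str          # which tool or external system was used
    action: str        # short description of what was done
    decided_by: str    # "agent" (autonomous) or "human" (escalated)


class AuditLog:
    def __init__(self) -> None:
        self._events: List[AgentEvent] = []

    def record(self, agent: str, tool: str, action: str,
               decided_by: str = "agent",
               timestamp: Optional[float] = None) -> AgentEvent:
        if decided_by not in ("agent", "human"):
            raise ValueError("decided_by must be 'agent' or 'human'")
        ts = time.time() if timestamp is None else timestamp
        event = AgentEvent(ts, agent, tool, action, decided_by)
        self._events.append(event)
        return event

    def since(self, agent: str, seconds: float,
              now: Optional[float] = None) -> List[AgentEvent]:
        """What did this agent do in the last `seconds` seconds?"""
        now = time.time() if now is None else now
        return [e for e in self._events
                if e.agent == agent and now - e.timestamp <= seconds]

    def tools_used(self, agent: str) -> List[str]:
        """Which tools did it use?"""
        return sorted({e.tool for e in self._events if e.agent == agent})

    def by_decider(self, agent: str) -> Dict[str, List[str]]:
        """Which actions were autonomous and which were escalated?"""
        out: Dict[str, List[str]] = {"agent": [], "human": []}
        for e in self._events:
            if e.agent == agent:
                out[e.decided_by].append(e.action)
        return out

    def to_jsonl(self) -> str:
        """Export one JSON object per line for external review."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)
```

With a log like this, log.since("coach", 2 * 60 * 60) would list what a hypothetical "coach" agent did in the last two hours.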
Another point worth paying attention to is the latency and reliability of the systems that agents access. An agent that depends on multiple external services to complete a task is exposed to cascading failures if any of those services slows down or becomes unstable. That’s why architecting your infrastructure with resilience in mind, with well-defined fallbacks and proper timeouts, is an essential part of any serious AI agent deployment in production. This isn’t a minor technical detail. It’s what separates a lab experiment from a solution that actually works.
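One way to picture timeouts and fallbacks is the small helper below, which uses Python’s standard concurrent.futures module. It is a sketch under assumptions: call_with_fallback and its parameters are hypothetical names, and the fallback is assumed to be something cheap and safe, such as returning a cached value.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(primary: Callable[[], T],
                       fallback: Callable[[], T],
                       timeout_s: float,
                       retries: int = 1,
                       backoff_s: float = 0.0) -> T:
    """Try `primary` with a time limit; use `fallback` if every attempt fails."""
    for attempt in range(retries + 1):
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = executor.submit(primary)
        try:
            # Raises a timeout error if the call takes longer than timeout_s.
            return future.result(timeout=timeout_s)
        except Exception:
            # Covers both timeouts and errors raised by the primary call.
            if attempt < retries and backoff_s > 0:
                time.sleep(backoff_s)
        finally:
            # Do not block on a slow call. Note that a timed-out call keeps
            # running in its background thread until it finishes on its own.
            executor.shutdown(wait=False)
    return fallback()
```

The design choice here is that a slow or failing dependency degrades the result instead of stalling the whole agent run, which is what keeps one unstable service from cascading into the rest of the task.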
From vibe coding to agents: the vision that started in 2017
Very little about AI’s trajectory surprises Polosukhin. The same year the Transformer architecture paper was published, he co-founded NEAR AI around the idea that machines could eventually generate software. His thesis was simple: humans would talk to computers in natural language, like English, and the machines would write the code.
In 2017, that idea sounded pretty absurd, as he himself admitted. Today, it has a name: vibe coding. It’s one of the hottest trends right now, with developers using language models to generate code from text descriptions. What sounded like science fiction less than a decade ago has become an everyday work tool.
This ability to see where technology is heading years before it arrives is what makes Polosukhin’s opinions particularly valuable. When someone with that track record says human oversight is still essential, it’s worth taking seriously.
The role of the Transformer in this new generation of agents
Understanding what makes today’s AI agents so capable necessarily involves understanding the Transformer architecture. When Polosukhin and his colleagues published the paper in 2017, the focus was on machine translation tasks. But the attention mechanism they described proved to be incredibly versatile, and in the years that followed it became the backbone of virtually every major language model — including the GPT, Claude, Gemini, and Llama families. It’s this architecture that allows a model to maintain context throughout a long conversation, understand language nuances, and connect seemingly distant pieces of information within a text.
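For readers who want to see the attention mechanism itself, the core operation from the 2017 paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The pure-Python sketch below implements that formula for tiny inputs. It is a teaching aid only: real models use multiple heads, learned projections, and optimized tensor libraries, and the function names here are chosen for this example.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    output: Matrix = []
    for q in queries:
        # Score this query against every key, near or far in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # The result is a weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, values))
                       for j in range(len(values[0]))])
    return output
```

Because every query is scored against every key, a position can draw on information from anywhere else in the input, which is what lets the model connect distant pieces of a text.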
But the Transformer architecture, as powerful as it is, has well-known limits. The model works with probability, not certainty. It builds its response piece by piece, choosing a likely continuation given the context it received, which means that in ambiguous situations or ones outside its training distribution, it can produce plausible-sounding responses that are simply wrong. When you place this type of model at the core of an agent that takes real-world actions, such as sending emails, executing code, or modifying files, that margin of error stops being just an inconvenience and becomes a concrete risk.
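The “probability, not certainty” point can be made concrete with a toy next-token sampler. The scores in the usage note are invented for illustration, and token_probabilities and sample_token are hypothetical helper names. Real models do the same kind of thing over vocabularies of many thousands of tokens.

```python
import math
import random
from typing import Dict, Optional


def token_probabilities(logits: Dict[str, float],
                        temperature: float = 1.0) -> Dict[str, float]:
    """Convert raw model scores (logits) into a probability distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())                 # subtract max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits: Dict[str, float],
                 temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> str:
    """Draw one token at random, weighted by its probability."""
    rng = rng or random.Random()
    probs = token_probabilities(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

With made-up scores such as {"Paris": 3.0, "Lyon": 1.0, "Rome": 0.5}, the top answer is favored, but the other options keep a nonzero probability, so a wrong but plausible token can still be chosen some of the time.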
This is exactly where human oversight closes the loop. The Transformer excels at processing information, identifying patterns, generating text, and reasoning through complex problems. Humans excel at validating whether what was generated makes sense in the real-world context, whether it aligns with business goals, and whether it could cause unintended side effects. When the two work together, with the agent doing the heavy lifting and the human handling the strategic reviews, the outcome tends to be far better than either could achieve alone.
What this changes about how we think about AI automation
The popular narrative around AI agents tends to emphasize total autonomy as the ultimate goal. The idea of having an agent that handles everything while you do something else is tempting, and it’s not entirely unrealistic for very well-defined, low-risk tasks. But when the scope increases and tasks involve judgment, creativity, organizational context, or impact on other people, autonomy without oversight starts creating more problems than it solves. Polosukhin, with all his technical background, reached this conclusion through practice, not theory.
What changes in practice is how you design your workflows with agents. Instead of asking “How do I get the agent to do everything on its own?”, the more productive question tends to be: at which moments does it make sense to bring a human into the loop, and at which moments can the agent safely move forward? This distinction requires that you understand both the capabilities and the limitations of the model you’re using, and also that you deeply understand the process you’re trying to automate. There are no shortcuts here.
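One possible way to turn that question into something explicit is a small policy function that decides, per proposed action, whether a human needs to look first. The criteria below (reversibility, impact on other people, and a spending limit) are assumptions chosen for this example, not rules taken from Polosukhin, and the names ProposedAction and needs_human_review are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    description: str
    reversible: bool             # can the action be undone cheaply?
    affects_other_people: bool   # does anyone besides the user see the result?
    spend_usd: float = 0.0       # money the action would commit


def needs_human_review(action: ProposedAction,
                       spend_limit_usd: float = 50.0) -> bool:
    """Return True when the agent should pause and ask a human first."""
    if not action.reversible:
        return True              # mistakes here cannot be rolled back
    if action.affects_other_people:
        return True              # judgment and organizational context matter
    if action.spend_usd > spend_limit_usd:
        return True              # above the illustrative spending threshold
    return False                 # well-defined, low-risk: safe to proceed
```

Writing the policy down forces the exercise the paragraph above describes: deciding in advance which parts of the process are low-risk enough for the agent to handle alone.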
The researcher also showed Business Insider how one of his agents can aggregate news on geopolitical topics, like the U.S.-Iran ceasefire, and provide market reads based on that information. Others are development agents that write code, and there’s also a growth agent that proposes steps to increase a specific metric within his company. These are practical and varied applications, but they all share the same principle: the human remains at the center of the decision.
The most valuable lesson from someone who built the foundation of generative AI
What Polosukhin’s experience leaves as its most valuable takeaway is that using artificial intelligence with maturity doesn’t mean using it at maximum autonomy all the time. It means using it strategically — knowing where it shines, where it needs help, and how to build systems that get the best of both worlds.
In Polosukhin’s view, AI still struggles with common sense, even though online conversations about the topic often overstate current progress. That’s an observation coming from someone who isn’t on the outside commenting, but on the inside, building and using the technology every single day.
Experts who’ve reached this conclusion on their own, after years of working with the technology in practice, tend to be the best guides for anyone just starting this journey. And Polosukhin’s core message is clear: put your agents to work, but keep your eyes open and your judgment turned on. 🤖
