Artificial Intelligence in the daily lives of Netflix, Meta, and IBM: agents, context, and the so-called preparation tax
Generative AI already feels like magic in a lot of demos, but in practice it is still very far from saying "Alexa, build me a complete e-commerce site" and walking away to grab a coffee. Adding a dramatic request like "DO NOT HALLUCINATE" is no use either, because it does not fix the underlying problem. What is becoming clearer by the day, especially for people working at companies like Netflix, Meta, and IBM, is that the more you want AI agents to do for you, the more backstage work needs to be done before, during, and after every interaction.
On stage at recent AI events, executives and engineers from these companies keep repeating the same point: the technology does multiply our capacity, but it only delivers reliable results when there is a solid level of preparation, review, and orchestration. This lines up directly with something well known in economics, the so-called Jevons Paradox: when a technology dramatically increases the efficiency of using a resource, we tend to use even more of that resource. Something similar is happening with AI. Instead of simply replacing human tasks, it opens new fronts of work, creates roles that did not exist, and pushes teams into a routine where there is always one more automation to build, one more context tweak to try, one more experiment to run.
Today, anyone using models as assistants for coding, writing, research, or data analysis can indeed become a 10x developer in terms of output volume. The catch is that you gain 10x speed, but you also end up having to review 10x more results, trim 10x more rough edges, and decide 10x more often what stays and what goes into the trash. If we ever get to a sci-fi scenario where a superintelligence runs the show, the road to that point will go through a long phase where humans still need to push a lot of data, clean up context, and review agent output all day long.
Netflix and adversarial review: many agents, lots of context, endless conversations
A very concrete example of how this dynamic is showing up in the real world comes from inside Netflix. Ben Ilegbodu, a UI architect at the company, shared at a conference how his work changed with the arrival of AI agents. Instead of just writing code and reviewing other devs' PRs, he now builds full pipelines of specialized agents for specific tasks, mainly code review and automation of parts of the development process.
When he creates an agent to automate a task, such as implementing a feature in a large codebase, he does not stop there. Next, he sets up a second agent with a focused mission to evaluate the work done by the first one, critically inspecting bugs, style, security, and adherence to internal standards. He calls this layered approach adversarial code review, because one agent acts as a kind of counterweight to the other, trying to find flaws, inconsistencies, and red flags.
In practice, this means that the flow stops being human → code → review and becomes something like human → implementing agent → reviewing agent → human who closes the loop. And it gets even more interesting: quite often, Ilegbodu breaks the review itself into multiple agents, each focused on a different slice of the problem. One might handle performance, another accessibility, another component standardization. And yes, yet another agent steps in to orchestrate the conversation between all the others, consolidate results, and produce an actionable summary for the human to make the final call.
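The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ilegbodu's actual setup: the agents are plain functions standing in for model calls, and names such as run_pipeline, Finding, and the stub agents are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
Agent = Callable[[str], str]


@dataclass
class Finding:
    reviewer: str
    comment: str


def run_pipeline(task: str, implementer: Agent, reviewers: Dict[str, Agent]) -> dict:
    """Implementing agent -> specialized reviewing agents -> summary for a human."""
    patch = implementer(task)
    findings: List[Finding] = []
    for name, review in reviewers.items():
        comment = review(patch).strip()
        if comment:  # in this sketch, an empty reply means "no objections"
            findings.append(Finding(name, comment))
    # The orchestration step only consolidates; the final call stays with a person.
    return {"patch": patch, "findings": findings, "needs_human_decision": True}


# Stub agents so the sketch runs without any model behind it.
def stub_implementer(task: str) -> str:
    return f"def handler():\n    pass  # TODO: {task}"


def stub_performance_reviewer(patch: str) -> str:
    return ""


def stub_accessibility_reviewer(patch: str) -> str:
    return "Button has no accessible label."


result = run_pipeline(
    "add checkout button",
    stub_implementer,
    {
        "performance": stub_performance_reviewer,
        "accessibility": stub_accessibility_reviewer,
    },
)
for finding in result["findings"]:
    print(f"{finding.reviewer}: {finding.comment}")
```

Each reviewer sees only the patch, not the implementer's reasoning, which is what lets it act as a counterweight rather than an echo.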
While one of these agents is working on a part of the codebase, he already spins up another one to prepare the ground for the next task in parallel. In his words, it is like parallelizing himself, keeping work moving on multiple fronts at the same time. This working style makes the Jevons Paradox very obvious in practice: the easier it gets to create agents for specific tasks, the more agents appear, the more reviews get done, and the more coordination seeps into the day to day.
One curious detail is how this affects the work experience itself. With so many agents running, Ilegbodu started coding comfortably in languages he had not mastered before, like Python, Bash, and Groovy, because he can ask the AI for code snippets, examples, and reviews on top of well-defined contexts. At the same time, he admits the mental impact is heavy: at the end of the day, the feeling of exhaustion does not come from having written every line of code, but from spending hours talking, instructing, correcting, and refining with AI agents. Instead of less back and forth, the job became a constant dialogue with systems that need context, boundaries, and direction all the time.
Meta and the intern that never gets tired: too much context, not enough focus
At Meta, the favorite metaphor to describe AI in development is that of a super enthusiastic intern. The comparison fits: the model is fast, tireless, can handle an absurd volume of information, and will happily take on almost any task. Unlike a human intern, the AI does not burn out from fatigue, but it suffers from a different problem: what Justin Jeffress, a Developer Advocate at the company, calls context rot.
It works like this: you start a project with an AI agent, share documentation, explanations, examples, coding standards, known bugs, and so on. Over time, the conversation grows, the history piles up, and every new response needs to be computed on top of an ever longer trail of messages, instructions, and details. The more things compete for the model’s attention, the higher the chance it will latch onto an irrelevant piece of the conversation or some old example that no longer applies, returning a result far from what you expected.
This gradual loss of focus is the context slowly rotting over time. To fight this, Jeffress argues that context engineering should sit at the center of any serious work with agents: instead of just dumping stuff into the chat history, you carefully define what goes in, what gets thrown out, and what is pulled in at each step. In simple terms, it is like building a set of rules, tools, and skills that the agent can call on demand, instead of relying only on the free-form text in the conversation.
One technique he strongly recommends is prompt chaining. Instead of sending one huge, vague, wishful request, you break the task into specific steps and guide the agent through a sequence: first understand the problem, then gather requirements, then propose solutions, only then generate code, and so on. Does it take effort to prep this? Absolutely. But in practice, that upfront work significantly lowers the chances of drift later and reduces the need to redo everything because the AI decided to follow some weird shortcut.
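A minimal sketch of prompt chaining, assuming a generic call_model function that takes a prompt and returns text. The step wording, run_chain, and the echo_model stub are all hypothetical and only illustrate the shape of the technique: each step receives the previous step's output instead of the whole conversation.

```python
from typing import Callable, List

# Hypothetical step templates. {task} is the original request,
# {previous} is the output of the step before.
STEPS = [
    "Restate the problem in your own words: {task}",
    "List the requirements implied by this summary: {previous}",
    "Propose a solution that meets these requirements: {previous}",
    "Write the code for this solution: {previous}",
]


def run_chain(
    task: str, call_model: Callable[[str], str], steps: List[str] = STEPS
) -> List[str]:
    """Run the steps in order, feeding each one the previous step's output."""
    outputs: List[str] = []
    previous = ""
    for template in steps:
        prompt = template.format(task=task, previous=previous)
        previous = call_model(prompt)
        outputs.append(previous)
    return outputs


# Stub model that records what it was asked, so the sketch runs offline.
seen: List[str] = []


def echo_model(prompt: str) -> str:
    seen.append(prompt)
    return f"[reply to: {prompt[:30]}]"


outputs = run_chain("build a login form", echo_model)
print(len(outputs), "steps completed")
```

Because every intermediate output is kept, a human can inspect the chain at any step and restart from there instead of redoing everything.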
Jeffress also suggests something very practical for everyday use: keep a markdown file or another type of document as the single source of truth for what is in progress, what has already been decided, and what the rules of the game are. The agent can consult this file instead of relying only on scattered messages, which helps prevent it from forgetting objectives along the way. Working like this, AI usually gets you to around 80 percent of a task, leaving the last 20 percent for the human. But then comes an interesting twist: when he tried to automate parts of that remaining 20 percent as well, he found that another 80 percent slice of that layer could be done by bots. And the cycle repeats, almost like a fractal version of the 80/20 rule applied to a seemingly endless process of cleanup and refinement.
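The arithmetic behind that fractal 80/20 pattern is easy to check. The sketch below assumes, purely for illustration, that every pass automates exactly 80 percent of whatever is left. Real projects will not be this regular, and the function name human_share is hypothetical.

```python
from fractions import Fraction


def human_share(rounds: int, automated: Fraction = Fraction(4, 5)) -> Fraction:
    """Share of the original task still left to the human after `rounds` passes,
    if each pass automates `automated` of whatever remained."""
    return (1 - automated) ** rounds


for n in range(1, 4):
    print(n, f"{float(human_share(n)):.1%}")
# 1 20.0%
# 2 4.0%
# 3 0.8%
```

The remaining share shrinks quickly, but the formula ignores the preparation each new pass requires, which is exactly the cost the rest of this article is about.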
IBM, decomposition, and mellea.ai: less magic, more engineering
While Netflix and Meta bring hands-on stories about using agents, IBM pushes the conversation in a more structural direction. Luis Lastras, director of language and multimodal technologies at the company, pokes at a common illusion: believing that the problem is always the AI refusing to obey. According to him, in most cases the real issue is how we describe the work, or better yet, how we fail to break the problem down into pieces small enough for the machine to handle safely.
He calls it "illusory prompting" when people try to solve everything with drama, as in over-the-top messages like "my career depends on this, do not hallucinate, do it perfectly". At the end of the day, that is almost like trying to cast a spell and hoping a giant model will do magic with vague instructions. Instead, Lastras emphasizes something that sounds basic but is easy to forget in the middle of AI hype: decomposition is Engineering 101. It is the art of taking a complex system, identifying critical parts, modularizing, designing each piece, and, if needed, assigning different specialists to handle each module.
In the world of agents, this means dropping the idea of throwing a huge wall of text at a generic LLM and hoping for the best. The approach he defends is building well-defined functions that help the agent execute specific tasks: validating formats, enforcing policies, checking consistency, detecting potentially harmful output, controlling response styles, structuring results into predictable schemas, and so on. To support this type of flow, IBM released the open source library Mellea (mellea.ai), which brings ready-made Python patterns to structure language model calls in a more disciplined way.
With these patterns, you can attach requirements to LLM calls, intercept and inspect responses before returning them to the user, enforce strict output formats, or include automated checks on certain types of content. In parallel, IBM is researching a concept Lastras describes as brain swapping: agents that can switch between different models, choosing the most suitable one for each subtask. Instead of always relying on a huge, generic, and expensive model, you can lean on smaller, specialized ones, as long as you give them enough inference time and keep context well aligned. In several tested scenarios, this combo of solid engineering patterns with smaller, focused models outperformed, in quality, heavier and more generalist options.
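The general idea of attaching requirements to a call and inspecting the response before it reaches the user can be sketched in plain Python. This is not the Mellea API. It is a hand-rolled illustration, and generate_checked, the requirement checks, and the flaky_model stub are all hypothetical names.

```python
import json
from typing import Callable, Iterator, List, Tuple

# A requirement is a human-readable description plus a check on the reply.
Requirement = Tuple[str, Callable[[str], bool]]


def is_json_object(text: str) -> bool:
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def has_no_html(text: str) -> bool:
    return "<" not in text and ">" not in text


def generate_checked(
    prompt: str,
    call_model: Callable[[str], str],
    requirements: List[Requirement],
    max_attempts: int = 3,
) -> str:
    """Return the first reply that passes every requirement, retrying with feedback."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        reply = call_model(attempt_prompt)
        failed = [desc for desc, check in requirements if not check(reply)]
        if not failed:
            return reply
        # Tell the model which requirements the last reply missed.
        attempt_prompt = prompt + "\nYour previous answer failed: " + "; ".join(failed)
    raise ValueError(f"no valid reply after {max_attempts} attempts")


# Stub model: a bad reply first, then a good one.
replies: Iterator[str] = iter(["<b>sure!</b>", '{"status": "ok"}'])


def flaky_model(prompt: str) -> str:
    return next(replies)


result = generate_checked(
    "Return the status as JSON.",
    flaky_model,
    [("must be a JSON object", is_json_object), ("must not contain HTML", has_no_html)],
)
print(result)
```

The checks are ordinary functions, so they can be tested on their own and reused across calls, whichever model sits behind call_model.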
The preparation tax: why assuming the machine knows is technical debt
To wrap up this mosaic, there is a practical view from Justin Chau, a senior engineer at Intuit. He sums up a sensitive point with a blunt line: implicit assumptions are technical debt. What feels obvious to you does not exist for the AI until it is explicitly stated. If you do not make clear what is acceptable, what is off-limits, and where the boundaries of the task are, you are basically planting time bombs of unexpected behavior down the road.
One piece of advice he highlights is flipping how we usually write instructions. Instead of focusing only on what the AI should do, it is worth emphasizing the limits of what it can do. Well-defined constraints work like hard nos that are harder for the model to ignore. If you say that under no circumstances may an agent use HTML in a given output, for example, that rule tends to be followed much more strictly than a simple "please return plain text".
Even stronger than explicit constraints is the absence of permission. If the agent cannot access a repository, an API, or a codebase, it simply cannot touch it, no matter how good the underlying language model is. For Chau, this is a practical way to reduce risk: instead of trusting only textual instructions, you design the environment so that certain actions are impossible, protecting critical systems without having to manually review every response.
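One way to picture "absence of permission" is a tool registry that only ever holds what the agent was granted. The sketch below is a simplified illustration, not Chau's or Intuit's implementation. ToolBox and the two example tools are hypothetical.

```python
from typing import Callable, Dict, FrozenSet


class ToolBox:
    """Exposes to the agent only the tools it was explicitly granted."""

    def __init__(self, tools: Dict[str, Callable[..., str]], granted: FrozenSet[str]):
        # Tools that were not granted are dropped here, so no prompt can reach them.
        self._tools = {name: fn for name, fn in tools.items() if name in granted}

    def call(self, name: str, *args: str) -> str:
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is not available to this agent")
        return self._tools[name](*args)


# Hypothetical tools for illustration.
def read_docs(path: str) -> str:
    return f"contents of {path}"


def deploy_to_prod() -> str:
    return "deployed"


toolbox = ToolBox(
    {"read_docs": read_docs, "deploy_to_prod": deploy_to_prod},
    granted=frozenset({"read_docs"}),
)

print(toolbox.call("read_docs", "README"))
try:
    toolbox.call("deploy_to_prod")
except PermissionError as err:
    print("blocked:", err)
```

The restriction lives in the environment rather than in the instructions, so it holds even if the model ignores or misreads the prompt.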
This work of preparing context, designing constraints, picking sources, and breaking problems into smaller parts is what many people are already calling the preparation tax. There is no way around it. Just like in The Hitchhiker’s Guide to the Galaxy, where a supercomputer spends ages calculating the answer to the ultimate question of life, the universe, and everything and returns a dry 42, we are finding out that having powerful tech without asking the right questions, without decomposing problems properly, and without taking care of context only leads to empty answers. Instead of a future where AI does everything on its own, what we see today is a landscape where it pushes us into a constant cycle of preparation, orchestration, and review.
In the end, the message coming from Netflix, Meta, IBM, Intuit, and others is neither apocalyptic nor utopian. It is a very pragmatic warning: AI really can turbocharge your workday, create room to experiment more, test more hypotheses, and speed up projects. But the price for that acceleration is learning to think like a context engineer, accepting the preparation tax, and seeing agents not as infallible wizards, but as powerful tools that need clear boundaries, good sources, and a human carefully conducting the orchestra.
