The bottleneck nobody talks about with AI agents in code
Anyone following the software development world already knows that AI agents capable of writing code at insane speed stopped being a promise a long time ago. What few people openly discuss is what happens after that code gets generated. At most companies, the output simply doesn’t survive contact with reality: it breaks internal engineering standards, fails compliance checks, ignores architecture conventions, and at the end of the day creates more rework than actual savings. It’s like having an extremely fast intern who delivers everything off-spec — you spend more time reviewing than you would have spent doing it from scratch.
Stephen Newman, global engineering technology leader for clients at EY, summed up the problem well. According to him, you can generate tons of code, but that doesn’t mean much if the output isn’t integrable, isn’t in compliance, and ends up creating more work on the back end just because the generation process was sped up on the front end. This is the kind of trap that catches many companies off guard: the illusion of speed that, in practice, just pushes the bottleneck to another stage of the workflow.
EY, one of the largest consulting firms on the planet, decided to tackle this problem head-on. And what Newman’s team managed to deliver deserves the attention of any organization working with technology at scale. By integrating code agents directly into the company’s existing repositories, compliance frameworks, and engineering standards, the teams responsible for building audit, tax, and finance platforms achieved productivity gains between 4x and 5x. We’re not talking about lab metrics or isolated proofs of concept — these are results measured across real teams, working on real products, delivering for real clients.
But it’s important to be clear right away: none of this happened overnight. The road to those numbers involved 18 to 24 months of building that was as much cultural as it was technical. It included organic adoption of assistive tools, careful platform selection, and perhaps most importantly, a genuine transformation in the role of developers within the organization.
From assistance to orchestration: how EY built the path
EY’s first step wasn’t rolling out autonomous AI agents across every project. In fact, it was pretty much the opposite. The company started with tools like GitHub Copilot, letting engineers get comfortable with prompt engineering and using assistive AI in their daily workflow. This initial phase was critical for building familiarity with the technology and, most importantly, for making adoption happen organically.
Newman was emphatic on this point: the most valuable lesson was that introducing AI capabilities through bottom-up adoption works far better than imposing tools from the top down. When developers feel like they’re choosing to use the technology, rather than being forced into it, engagement and quality of use increase dramatically.
During this period, the engineering team mapped out which activities consumed the most time, which types of errors were most recurring, and where human intervention actually added value versus where it was just operational red tape. This practical diagnosis, built from real day-to-day data, is what allowed them to make the next leap with confidence. Developers wanted to go beyond simple code generation — they wanted to move into building, deployment, and operationalization. But productivity gains hit a plateau without deeper integration.
The turning point came when Newman realized that agents needed access to what he calls the universe of context: the company’s code repositories, engineering standards, and internal resource catalogs. Without that context, agents produced generic outputs that required extensive rework. The integration wasn’t superficial. Agents began automatically consulting approved internal libraries, respecting naming conventions, applying the security rules required by the industry, and even factoring in jurisdiction-specific regulatory requirements. In practice, the generated code was already up to standard from the start, eliminating that endless cycle of review, correction, and re-review that used to eat up entire weeks of work.
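To make the idea of a "universe of context" more concrete, here is a minimal Python sketch of what wiring an agent to that context might look like. EY has not published its actual integration, so everything here is a hypothetical assumption: the `EngineeringContext` fields, the library names, and the rule identifiers are illustrative, not a description of their system.

```python
"""Hypothetical sketch of a "universe of context" that a code agent consults
before generating anything. All names below are illustrative assumptions."""

from dataclasses import dataclass, field


@dataclass
class EngineeringContext:
    """Organizational context the agent must respect."""
    approved_libraries: set[str] = field(default_factory=set)   # internal resource catalog
    naming_convention: str = "snake_case"                        # assumed org-wide rule
    security_rules: set[str] = field(default_factory=set)        # industry-required controls
    jurisdictions: set[str] = field(default_factory=set)         # regulatory scopes in play


def build_prompt_context(task: str, ctx: EngineeringContext) -> str:
    """Prepend the organizational rules to the task so the agent's output
    is compliant from the start instead of being fixed during review."""
    return "\n".join([
        f"Task: {task}",
        f"Use only approved libraries: {', '.join(sorted(ctx.approved_libraries))}",
        f"Follow naming convention: {ctx.naming_convention}",
        f"Apply security rules: {', '.join(sorted(ctx.security_rules))}",
        f"Respect regulatory requirements for: {', '.join(sorted(ctx.jurisdictions))}",
    ])


if __name__ == "__main__":
    ctx = EngineeringContext(
        approved_libraries={"internal-audit-sdk", "internal-logging"},   # hypothetical names
        security_rules={"no-plaintext-secrets", "pii-masking"},
        jurisdictions={"US-SOX", "EU-GDPR"},
    )
    print(build_prompt_context("Add a report export endpoint", ctx))
```

The point of the sketch is the ordering: the standards are assembled before the agent ever sees the task, so compliance becomes an input to generation rather than a review step afterwards.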
Choosing the platform: no mandates, just data
The platform choice wasn’t trivial either. EY evaluated multiple agent options: Lovable, Replit, and Factory’s IDE-based Droids. Instead of having the leadership team pick a tool and push the decision down, Newman and his team measured adoption, usage, and productivity across all three platforms simultaneously.
Newman explained that he didn’t want to be overly prescriptive as leadership, picking a tool and oversimplifying the decision. The goal was to let the developers themselves signal where they were finding real value. And that’s exactly what happened. Engineers naturally gravitated toward Factory, which became the clear signal that the platform was delivering concrete results.
When Factory was elevated from evaluation to pilot, adoption, in Newman’s words, spread like wildfire. EY actually had to throttle traffic to Factory and the Droids, restricting which repositories could be connected before obtaining full compliance and security approval. That kind of runaway enthusiasm, which would normally be seen as a positive, actually raised an important red flag: the company needed discipline around which workloads to delegate to the agents.
The workload classification framework
With developers excited and adoption growing fast, it became clear that EY needed well-defined criteria for what agents could handle on their own and what still required human oversight. Newman’s team created a framework that separated tasks into two distinct categories:
High-autonomy tasks — where agents shine
- Code review — agents run fast, consistent scans, catching standard violations and potential bugs
- Documentation — generating and updating technical documentation from existing code
- Bug fixing — identifying and resolving known defects with a high success rate
- Greenfield features — developing new capabilities that don’t depend on complex legacy code
Complex tasks — where humans are still essential
- Large-scale refactoring — structural changes that impact multiple systems simultaneously
- Architecture decisions — design choices that define long-term technical direction
- Cross-system integrations — connections involving multiple APIs, databases, and cross-cutting dependencies
This classification might seem simple at first glance, but in practice it solves a problem many companies face when adopting AI agents: the temptation to delegate everything and deal with the consequences later. With clear categories, every team member knows exactly when to trust the agent and when to take the wheel. This prevents both overuse and underuse of the technology.
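A minimal sketch of how such a classification could be encoded follows, assuming a simple mapping from task type to autonomy level. The enum names, the task labels, and the conservative default are illustrative assumptions, not EY's actual tooling.

```python
"""Illustrative encoding of the two-tier workload classification described above."""

from enum import Enum, auto


class Autonomy(Enum):
    AGENT = auto()       # high-autonomy: agent runs with light human review
    HUMAN_LED = auto()   # complex: humans own the work, the agent only assists


# High-autonomy tasks, mirroring the list above
HIGH_AUTONOMY = {"code_review", "documentation", "bug_fixing", "greenfield_feature"}

# Complex tasks that stay human-led, also mirroring the list above
HUMAN_LED = {"large_scale_refactoring", "architecture_decision", "cross_system_integration"}


def route_task(task_type: str) -> Autonomy:
    """Delegate only explicitly whitelisted task types to the agent;
    anything unknown or complex defaults to human oversight."""
    return Autonomy.AGENT if task_type in HIGH_AUTONOMY else Autonomy.HUMAN_LED


if __name__ == "__main__":
    for task in ("bug_fixing", "architecture_decision", "unclassified_migration"):
        print(task, "->", route_task(task).name)
```

The design choice worth noting is the default: anything not explicitly classified falls back to human oversight, which matches the article's point about preventing overuse of the technology.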
The new role of the developer: from executor to orchestrator
Perhaps the most profound change EY drove wasn’t technological at all — it was a mindset shift. The company also redefined developer roles. Instead of writing all the code themselves, engineers started acting as orchestrators, directing agents to the right databases and repositories for each task.
This transition required training, mentoring, and a lot of organizational patience. The engineers who adapted fastest were the ones who already had a more systemic view of software development — professionals who understood the why behind each engineering standard, not just the how. For these professionals, agents became incredible force multipliers, enabling a single developer to orchestrate work equivalent to four or five people.
Newman described this moment as a leap to what he calls the horizon development model. In this model, the company operates with semi-autonomous agent execution at scale, a team of orchestrators instead of executors, and full integrations with the universe of context. It’s a fundamental change in how engineering teams organize themselves and deliver value.
The concrete results and what they actually mean
With security guardrails in place and code repository integration complete, EY measured efficiency gains ranging from 15% to 60% across different professional profiles during the initial adoption phase. The bigger gains, 4x to 5x, came as the process matured over the following months.
Newman was careful to acknowledge that it’s hard to attribute those productivity gains exclusively to code agents. The improvements came from a combination of trial and error alongside cultural and behavioral changes across the development teams. That honesty matters because it avoids the oversimplified narrative that you just plug in an AI tool and productivity magically multiplies. In reality, the tool is just one piece of the puzzle.
The productivity results are impressive precisely because they don’t depend on artificial conditions. They were observed in real production environments, with all the quality, security, and compliance requirements that audit and finance operations demand. When the agent already knows the rules of the game before writing a single line of code, the output comes out usable on the first try most of the time. This eliminates rework, speeds up delivery cycles, and frees up senior engineers to focus on architecture decisions and innovation instead of getting stuck in routine code review.
Lessons for anyone looking to follow the same path
EY’s experience leaves some clear lessons for organizations thinking about adopting AI agents in software development at scale:
First, invest in standards before investing in tools. Agents without context are just generic code generators. Having well-organized repositories, up-to-date standards documentation, and clearly defined compliance rules is a prerequisite for any AI agent to generate code that’s actually usable. Without that, you’re just automating the production of rework.
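As a rough illustration of what "standards as a prerequisite" can mean operationally, here is a hedged sketch of a repository readiness check. The artifact names are assumptions about what a well-organized repository might contain, not a checklist taken from EY.

```python
"""Illustrative pre-flight check for the "standards before tools" lesson.
File names are hypothetical; adapt them to whatever your organization uses."""

from pathlib import Path

# Hypothetical artifacts an agent would need as context before being
# allowed to touch a repository.
REQUIRED_ARTIFACTS = [
    "ENGINEERING_STANDARDS.md",    # naming, layout, and review conventions
    "COMPLIANCE.md",               # regulatory and security rules for this codebase
    "docs/approved-libraries.md",  # internal resource catalog
]


def repo_ready_for_agents(repo_root: str) -> tuple[bool, list[str]]:
    """Return whether the repo has the minimum context an agent needs,
    plus the list of missing artifacts."""
    root = Path(repo_root)
    missing = [p for p in REQUIRED_ARTIFACTS if not (root / p).is_file()]
    return (not missing, missing)


if __name__ == "__main__":
    ok, missing = repo_ready_for_agents(".")
    if ok:
        print("Repo has the baseline context; agent access can be piloted.")
    else:
        print("Hold the agent rollout, missing:", ", ".join(missing))
```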
Second, treat adoption as a cultural journey. The 18 to 24 months EY invested before reaching the most impressive results weren’t wasted time — they were the foundation that made everything possible. Starting with assistive tools, letting developers get comfortable at their own pace, and using adoption data to guide decisions are practices that reduce resistance and improve the quality of usage.
Third, classify your workloads. Not every development task is the same, and not every task should be delegated to an agent. Having a clear framework that defines where autonomy works and where human oversight is needed prevents disasters and builds gradual trust in the technology.
Fourth, measure adoption with data, not opinions. EY’s decision to evaluate three platforms simultaneously and let developers themselves signal their preference is a great example of how tooling decisions can be made pragmatically instead of politically.
The future of software development at enterprise scale isn’t about replacing developers with AI. It’s about creating an environment where humans and AI agents work together intelligently, each doing what they do best. EY showed that this is possible — but they also showed that the path requires patience, strategy, and a whole lot more than just flipping a switch on a tool. 🚀
