Stripe’s Minions and the new era of autonomous coding agents
The Minions have arrived, and no, we’re not talking about those yellow movie characters 😄
Stripe, one of the largest payment companies in the world, has built autonomous coding agents that are transforming the way software is developed internally.
The number that grabs your attention right away is this: over 1,300 pull requests generated per week, fully automated, without a single line of code written by a human.
But hold on, that doesn’t mean engineers got laid off.
Every piece of code produced by these agents goes through human review before it hits production.
What actually changed is something else entirely: engineers stopped writing repetitive code and shifted their focus to what truly matters — reviewing, validating, and making strategic decisions.
It sounds like the future, but it’s already happening right now, inside an infrastructure that processes over 1 trillion dollars in payments per year.
So the question is: how exactly does this system work, and what does it mean for software development going forward?
That’s exactly what we’re going to dig into here. 🚀
What are Stripe’s Minions?
The Minions are, in practice, autonomous agents developed in-house by Stripe to execute coding tasks independently. The name is playful, but the concept behind it is pretty sophisticated. These agents are powered by large language models (LLMs) and are configured to understand the context of Stripe’s code repository, identify tasks that can be automated, and generate working solutions for them.
This isn’t just a fancy autocomplete or a tool that suggests lines of code while you type. The Minions operate end-to-end on certain tasks, from reading the problem all the way to opening a complete pull request, ready to be reviewed by a human engineer.
What sets this approach apart from other AI tools on the market is precisely the level of autonomy involved. While tools like GitHub Copilot work as assistants that react to what the developer is doing, and AI-based code editors like Cursor still rely on constant human supervision, the Minions act proactively. They receive a task, plan the steps needed to solve the problem, navigate the existing codebase, make changes that are consistent with the code standards, and deliver a structured result. This type of execution is called one-shot, because the agent receives a single instruction and delivers the complete result without any intermediate interventions.
Cameron Bernhardt, Engineering Manager at Stripe, shared in a LinkedIn post that the Minions evolved from a concept to generating over a thousand pull requests per week, noting that all code is reviewed by humans, but that the agents are producing end-to-end changes with increasing autonomy.
It’s also important to understand that the Minions weren’t created to replace engineers, but to absorb the more mechanical and repetitive parts of their work. Inside a company the size of Stripe, with a massive codebase and hundreds of engineers working in parallel, there’s an enormous number of tasks that, while necessary, eat up time and cognitive energy without really requiring creativity or human judgment. These are exactly the tasks the agents took over, freeing up engineers to think about more complex problems, architecture, security, and user experience.
Where the Minions came from: the origin in the Goose project
The Minions didn’t come out of nowhere. The system evolved from an internal fork of Goose, one of the first widely used coding agents, developed by Block. Stripe’s engineering team took that foundation, adapted it to the company’s internal LLM infrastructure, and refined the system to meet the specific requirements of the Minions.
Meanwhile, interactive tools like Cursor and Claude Code remain in use inside Stripe for workflows that require direct human oversight. In other words, the Minions didn’t replace those tools; they filled a complementary space in the company’s software development ecosystem, handling tasks that can run fully autonomously.
The decision to build on an existing foundation instead of starting from scratch shows an interesting maturity from the team. Rather than reinventing the wheel, they leveraged what already worked and invested their energy into what truly differentiated the solution: deep integration with Stripe’s internal environment, including CI/CD systems, proprietary repositories, and company-specific code standards.
How the autonomous agents generate pull requests in practice
The process starts with identifying a task. And here’s one of the coolest details about the system: that task can come from multiple sources. An engineer can trigger a Minion directly through a message on Slack, or the task can originate from a bug report, a feature request, or any other source that describes what needs to be done. This flexibility in how tasks are fed into the system makes it much more accessible and integrated into everyday workflows.
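To make the idea of multiple entry points concrete, here is a minimal sketch of how requests from different sources could be normalized into one task shape before an agent picks them up. Everything in it is an assumption for illustration: the Task class, the field names, and the payload layouts are hypothetical and not Stripe’s actual intake format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    source: str        # where the request came from, e.g. "slack" or "bug_report"
    requester: str     # who asked for the work
    description: str   # plain-language statement of what needs to be done


def task_from_slack(message: dict) -> Task:
    """Build a Task from a hypothetical Slack-style message payload."""
    return Task(
        source="slack",
        requester=message["user"],
        description=message["text"].strip(),
    )


def task_from_bug_report(report: dict) -> Task:
    """Build a Task from a hypothetical bug tracker entry."""
    return Task(
        source="bug_report",
        requester=report["reporter"],
        description=f"{report['title']}: {report['body']}".strip(),
    )


slack_task = task_from_slack(
    {"user": "alice", "text": "  Bump the retry limit in the billing client to 5  "}
)
bug_task = task_from_bug_report(
    {"reporter": "bob", "title": "Flaky test", "body": "test_refund times out on CI"}
)
```

The point of a shared shape is that whatever runs next only has to understand one kind of input, no matter where the request started.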
From there, the autonomous agent accesses the repository, analyzes the context around the problem, and starts planning the solution. This planning involves understanding which files need to be modified, what dependencies exist, what code patterns are used in the existing codebase, and how the proposed change will fit in without breaking anything. This level of contextual reasoning is what makes modern large language models so powerful for this kind of application.
Once the plan is set, the agent executes the changes. It writes the code, creates or updates automated tests when needed, adjusts documentation if applicable, and organizes everything into a well-structured pull request. That PR doesn’t just land on the team in a messy state — it comes with a clear description of what was done, why it was done, and what decisions were made along the way. This makes the reviewing engineer’s job much easier because they don’t have to guess the reasoning behind the changes.
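As a rough illustration of a pull request that explains itself, the sketch below assembles a description from the three things mentioned above: what was done, why, and which decisions were made. The PullRequestDraft class and its section titles are hypothetical; the actual layout of a Minion-generated PR hasn’t been published.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequestDraft:
    title: str
    what: str                                            # summary of the change
    why: str                                             # motivation, traced back to the task
    decisions: list[str] = field(default_factory=list)   # notable choices made along the way

    def render_description(self) -> str:
        """Render a plain-text description with one section per question."""
        lines = ["What changed", self.what, "", "Why", self.why]
        if self.decisions:
            lines += ["", "Decisions made"]
            lines += [f"- {decision}" for decision in self.decisions]
        return "\n".join(lines)


pr = PullRequestDraft(
    title="Raise billing client retry limit to 5",
    what="Changed RETRY_LIMIT from 3 to 5 in billing/client.py.",
    why="Requested in a Slack message to reduce transient failures.",
    decisions=[
        "Left the backoff schedule unchanged.",
        "Updated the existing unit test instead of adding a new one.",
    ],
)
description = pr.render_description()
```

Keeping the decisions as their own list makes it harder for an agent to skip the reasoning a reviewer most needs to see.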
The concept of blueprints: the recipe behind the autonomy
One of the most important elements in the Minions architecture is the concept of blueprints. In the context of the Minions, they work as workflows defined in code that specify how tasks are broken down into subtasks.
Stripe’s engineers describe blueprints as a collection of agent skills interwoven with code, ensuring efficiency while maintaining adaptability. In practice, each blueprint combines deterministic routines — fixed and predictable steps — with flexible agent loops, where the LLM makes decisions based on context. This blend is what allows the Minions to handle both standardized tasks and situations that require some degree of adaptation.
Think of blueprints like recipes in which some steps are fixed, like preheating the oven, and others depend on the chef’s judgment, like adjusting the seasoning. This balance between rigidity and flexibility is essential for the agents to operate autonomously without compromising the quality of the final result.
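The mix of fixed steps and agent loops can be sketched in a few lines. This is only an illustration of the pattern, not Stripe’s blueprint format: the step names, the context dictionary, and fake_model (a stand-in for a real LLM call) are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def checkout_branch(ctx: Context) -> Context:
    # Deterministic step: same input, same output, no model involved.
    return {**ctx, "branch": f"minion/{ctx['task_id']}"}


def run_linter(ctx: Context) -> Context:
    # Deterministic stand-in for a linter: flag edits that still contain "TODO".
    return {**ctx, "lint_ok": all("TODO" not in edit for edit in ctx["edits"])}


def make_agent_loop(decide: Callable[[Context], str], max_turns: int) -> Step:
    """Wrap a decision function (a stand-in for an LLM) in a bounded loop."""
    def step(ctx: Context) -> Context:
        edits = list(ctx.get("edits", []))
        for _ in range(max_turns):
            action = decide({**ctx, "edits": edits})
            if action == "done":
                break
            edits.append(action)
        return {**ctx, "edits": edits}
    return step


def fake_model(ctx: Context) -> str:
    # Hypothetical policy: propose one edit, then stop.
    return "done" if ctx["edits"] else "rename config key timeout_s to timeout_seconds"


def run_blueprint(steps: List[Step], ctx: Context) -> Context:
    # A blueprint here is just an ordered list of steps applied to a shared context.
    for step in steps:
        ctx = step(ctx)
    return ctx


blueprint = [checkout_branch, make_agent_loop(fake_model, max_turns=3), run_linter]
result = run_blueprint(blueprint, {"task_id": "1234"})
```

Note the max_turns cap on the flexible step: the model decides what to do inside the loop, but the surrounding code decides how long it is allowed to keep going.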
Reliability backed by CI/CD and automated tests
A system that generates over a thousand pull requests per week at a company that processes over a trillion dollars a year can’t afford failures. That’s why the Minions’ reliability is reinforced by CI/CD pipelines, automated tests, and static code analysis. Every change generated by an agent goes through these filters before it even reaches an engineer for review.
This means that when a human sits down to review a pull request generated by a Minion, they already know the code compiled, the tests passed, and there are no obvious standard violations. This pre-screening significantly reduces review time and increases confidence in the process as a whole.
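Here is a minimal sketch of that pre-screening idea: a change only becomes eligible for human review once every automated check passes. The two checks are deliberately simple stand-ins (Python’s built-in compile() for a build step, a text scan for static analysis), and all names are assumptions for the example rather than anything from Stripe’s pipeline.

```python
from typing import Callable, Dict, List, Tuple

Change = Dict[str, str]                       # file path -> proposed new contents
Check = Tuple[str, Callable[[Change], bool]]  # (check name, check function)


def python_files_parse(change: Change) -> bool:
    """Stand-in for a build step: every .py file must at least parse."""
    for path, source in change.items():
        if path.endswith(".py"):
            try:
                compile(source, path, "exec")
            except SyntaxError:
                return False
    return True


def no_leftover_markers(change: Change) -> bool:
    """Stand-in for static analysis: reject leftover FIXME markers."""
    return all("FIXME" not in source for source in change.values())


CHECKS: List[Check] = [
    ("parse", python_files_parse),
    ("static-analysis", no_leftover_markers),
]


def failed_checks(change: Change, checks: List[Check] = CHECKS) -> List[str]:
    """Return the names of the checks that did not pass, in order."""
    return [name for name, check in checks if not check(change)]


def ready_for_review(change: Change) -> bool:
    # Only changes with no failed checks are handed to a human reviewer.
    return not failed_checks(change)


good = {"billing/client.py": "RETRY_LIMIT = 5\n"}
bad = {"billing/client.py": "def charge(:\n    pass  # FIXME\n"}
```

Returning the list of failed check names, rather than a bare yes or no, also gives the agent something specific to react to on a retry.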
Stripe’s engineers also noted that the Minions perform best on well-defined tasks, such as configuration adjustments, dependency updates, and targeted refactors. This makes sense because these are exactly the tasks where success criteria are clearest and most measurable, making both execution by the agent and validation by the human easier.
The role of large language models in this equation
Large language models are the heart of this entire system. Without them, the Minions would just be traditional automation scripts, limited to very narrowly defined tasks and unable to handle any variation or ambiguity. What LLMs bring to the table is the ability to understand context in both natural language and code simultaneously, make inferences about what needs to be done based on vague or incomplete descriptions, and adapt the solution to the style and conventions of a particular codebase.
Stripe hasn’t publicly revealed which specific model or models power the Minions, but the described behavior is consistent with what the most advanced language models available today are capable of. Techniques like retrieval-augmented generation, where the agent pulls relevant information from the repository before acting, and chain-of-thought reasoning, where the model breaks the problem into smaller steps before solving each one, are fundamental to how agents of this kind deliver coherent results in a codebase as large and complex as Stripe’s.
Another relevant point is the continuous learning capability that can be built into these systems. As engineers review the pull requests generated by the agents and make corrections or suggestions, that feedback can be used to adjust the models’ behavior over time, not necessarily in real time, but in periodic update cycles that make the agents progressively more aligned with the team’s expectations. This creates a continuous improvement loop that, over the long term, tends to increase both the quality and the autonomy of the agents within the software development workflow.
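To show the retrieval step in miniature, the sketch below ranks files by how many words they share with the task and places the top matches in the prompt. It swaps in plain keyword overlap where a production system would more likely use code search or embeddings, and the in-memory repo, the function names, and the prompt layout are all hypothetical. Nothing here reflects how Stripe actually retrieves context.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens; underscores and slashes act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(task: str, repo: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k file paths ranked by token overlap with the task."""
    query = tokenize(task)
    scored = [
        (len(query & tokenize(path + " " + body)), path)
        for path, body in repo.items()
    ]
    # Drop files with no overlap; break ties alphabetically by path.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [path for _, path in ranked[:k]]


def build_prompt(task: str, repo: Dict[str, str], k: int = 2) -> str:
    """Place the retrieved files after the task so the model sees them first-hand."""
    context = "\n\n".join(f"# {path}\n{repo[path]}" for path in retrieve(task, repo, k))
    return f"Task: {task}\n\nRelevant files:\n{context}"


repo = {
    "billing/client.py": "RETRY_LIMIT = 3\ndef charge(amount): ...",
    "billing/refunds.py": "def refund(charge_id): ...",
    "docs/README.md": "Internal notes on deployment",
}
task = "Raise the retry limit in the billing client"
paths = retrieve(task, repo)
prompt = build_prompt(task, repo)
```

With this toy repository, the billing client file ranks first because it shares the most words with the task, and the README is left out because it shares none.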
What this means for software engineers
The narrative that AI will replace developers keeps coming up, but what Stripe is showing in practice points to a different direction. What’s happening there is a redistribution of responsibilities, not an elimination of roles. Engineers are still central to the process, but the type of work they do has changed. Instead of spending hours writing code for routine tasks like maintenance, dependency updates, standardized refactoring, or simple bug fixes, they now dedicate that time to reviewing what the autonomous agents produced and making decisions about what does or doesn’t go to production.
This work model requires a slightly different skill set from what was valued before. Being able to write code fast still matters, but it becomes almost secondary compared to the ability to read code critically, identify logic flaws, assess security risks, and understand the systemic impact of a change. In other words, review and judgment skills take center stage. Engineers who develop a strong ability to work in partnership with agents — knowing how to guide them, evaluate them, and correct them — tend to become far more productive than those who resist this new workflow.
On top of that, there’s a clear impact on project delivery speed. With agents absorbing the volume of repetitive tasks, teams can move forward on multiple fronts simultaneously without needing to scale headcount proportionally. For companies operating at a global scale that need to keep critical systems running with extremely high availability, like Stripe, this ability to scale software development without inflating the team is a concrete, measurable competitive advantage.
Reliability in a high-stakes environment
A detail that can’t be overlooked is the context in which the Minions operate. The code managed by these agents supports over 1 trillion dollars in annual payment volume and runs in an ecosystem with complex dependencies involving financial institutions, regulatory frameworks, and compliance obligations. This isn’t a side project or a lab experiment. This is real production, in one of the most critical payment infrastructures in the world.
This context raises the bar on the quality of code generated by the agents. Any mistake, no matter how small, can have real financial consequences. That’s why the combination of mandatory human review with automated validation through CI/CD pipelines and tests creates a robust safety net that allows the Minions to operate at speed without compromising system integrity.
The reliability and correctness of the generated code remain at the core of the entire autonomous agent deployment strategy at this scale, and that’s a point Stripe’s team consistently emphasizes in their communications about the project.
A trend that goes well beyond Stripe
The Minions system reflects a broader trend in agent-driven software development, where LLM-based agents are deeply integrated into development environments, version control systems, and CI/CD pipelines to produce production-quality code with minimal human intervention before review.
Stripe isn’t the only company exploring this direction, but it’s among those doing it at the largest scale and with the most transparency about results. The volume of over 1,300 weekly pull requests generated by agents is a data point the entire industry is watching closely, because it shows this approach can work in production, inside a company that can’t afford instability in its system. Stripe’s experience suggests that autonomous coding agents can significantly boost developer productivity while maintaining rigorous quality controls.
For smaller teams and startups, the most valuable takeaway might not be replicating exactly what Stripe did, but understanding the principle behind the strategy: identify which tasks within the software development workflow are repetitive, well-defined, and have clear success criteria, and start automating them with the help of large language models. It doesn’t have to be a sophisticated system right from the start. Even automating simple tasks, like generating unit tests or updating documentation, already frees up valuable time for the team to focus on what truly differentiates the product.
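A small team could start with something as simple as the sketch below: list the recurring tasks, mark each one against the three criteria, and automate only the ones that meet all three. The Candidate class and the example tasks are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    repetitive: bool               # does the team do this over and over?
    well_defined: bool             # can the task be stated without ambiguity?
    has_clear_success_check: bool  # is there a test or check that says "done"?


def automation_candidates(tasks: List[Candidate]) -> List[str]:
    """Keep only the tasks that satisfy all three criteria."""
    return [
        task.name
        for task in tasks
        if task.repetitive and task.well_defined and task.has_clear_success_check
    ]


backlog = [
    Candidate("Update dependencies", True, True, True),
    Candidate("Generate unit tests for new helpers", True, True, True),
    Candidate("Redesign the onboarding flow", False, False, False),
    Candidate("Investigate intermittent latency spike", False, False, True),
]
shortlist = automation_candidates(backlog)
```

Requiring all three criteria at once is intentionally strict: a task that is repetitive but has no clear success check is hard for an agent to finish and hard for a human to validate.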
The future is collaboration between humans and agents
What becomes clear looking at what Stripe built with the Minions is that the future of software development isn’t human or machine. It’s human and machine, working in complementary layers, where each does what it does best. The agents handle the volume, the consistency, and the speed. The engineers handle the judgment, the creativity, and the accountability.
The fact that these agent-generated pull requests contain zero lines of human-written code, yet every one of them goes through human review, perfectly illustrates this balance. It’s not total, unchecked automation. It’s automation with governance, where the machine executes and the human validates. This partnership, when properly calibrated, has the potential to profoundly transform how software is built in the years ahead. 🤖
