Anthropic built a test marketplace where AI agents negotiate with each other using real money
AI agents just proved they can close deals better than most people expected.
Anthropic ran an experiment that very few knew existed: an internal marketplace where artificial intelligence agents acted as real buyers and sellers, with actual products and real money on the line. The initiative was called Project Deal, and the results surprised even the people running it.
The setup was simple, but the impact was huge.
A total of 69 Anthropic employees took part in the experiment, each with a 100-dollar budget, paid in gift cards, to spend buying items from their coworkers. Every negotiation was mediated by AI agents representing both the buyer and the seller.
The company itself acknowledged that the test was only a pilot experiment with a self-selected group of participants, but it was still impressed by the overall performance. And it is not hard to see why 👀
Throughout the entire test, 186 transactions were recorded, moving more than 4,000 dollars in negotiated value.
But the most revealing number is not in the dollar amounts themselves — it is in what happened behind the scenes of those negotiations.
Four different marketplaces to figure out what works
One of the most interesting parts of Project Deal is that Anthropic did not limit itself to creating a single marketplace. The company actually ran four separate marketplaces, each using different configurations and models. Only one of them was considered the real marketplace: all participants were represented by the company’s most advanced model, and the deals struck there would actually be honored after the experiment ended. The other three served as study environments to compare behaviors and outcomes under varying conditions.
This multi-layered approach was key for Anthropic to extract deeper insights into the dynamics between AI agents in commercial scenarios. By isolating variables like model capability and the initial instructions given to agents, the team was able to identify patterns that would have been invisible in a single, uniform test. It was precisely this separation that revealed one of the most impactful findings of the entire experiment.
When users were represented by more advanced models, they achieved objectively better results in negotiations. In other words, the more capable agent managed to extract more value for the person it represented, whether by negotiating lower prices when buying or securing more favorable conditions when selling. That alone would already be a significant finding, but what came next added an extra layer of complexity to the picture.
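To make the idea of "extracting more value" concrete, here is a minimal sketch of how such a gap could be measured on the buying side. It is not Anthropic's method: the records, the tier labels, and the `average_discount` helper are all hypothetical, and the prices are invented purely for illustration.

```python
from statistics import mean

# Hypothetical records, invented purely for illustration. None of these
# numbers come from Project Deal. Each tuple is:
# (buyer_agent_tier, listing_price_usd, final_price_usd)
deals = [
    ("stronger", 40.0, 30.0),
    ("stronger", 20.0, 16.0),
    ("weaker", 40.0, 36.0),
    ("weaker", 20.0, 19.0),
]


def average_discount(records, tier):
    """Mean fractional discount off the listing price for one buyer tier."""
    discounts = [
        (listing - final) / listing
        for record_tier, listing, final in records
        if record_tier == tier
    ]
    return mean(discounts)


# A positive gap means buyers with the stronger agent paid less, relative
# to the listing price, than buyers with the weaker agent.
gap = average_discount(deals, "stronger") - average_discount(deals, "weaker")
print(f"discount gap: {gap:.1%}")
```

The same comparison could be run from the seller's side by measuring how close the final price stays to the listing price.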
The invisible problem of disparity between agents
This is where things get really thought-provoking. Anthropic found that despite the clear performance gap between more and less advanced models, the users themselves did not notice the disparity. People being represented by a less capable agent had no idea they were coming out at a disadvantage in negotiations. The company pointed out that this raises a real possibility of quality gaps emerging between agents, where people on the losing end may simply not realize they are at a disadvantage.
This discovery has massive implications for the future of AI agents in commercial environments. If a consumer hires an AI agent service to negotiate on their behalf and that agent is inferior to the one on the other side of the table, the negotiation is already tilted from the start. Worst of all, the person at a disadvantage may never notice. This creates a scenario where the quality of the AI model you use could become a determining factor in your negotiating power, almost like a silent competitive advantage.
For anyone following the artificial intelligence market, this finding puts an important discussion on the table about fairness and transparency in AI-mediated negotiations. If the trend of using AI agents in commercial transactions keeps growing — and everything suggests it will — there will need to be mechanisms that make these capability differences visible, so users can make informed decisions about which agent to use.
Initial instructions made little difference
Another curious piece of data that came out of Project Deal has to do with the initial instructions given to the agents. Anthropic revealed that the guidance provided at the beginning of negotiations did not appear to significantly affect the likelihood of a sale being completed or the final negotiated prices. In other words, regardless of how the user configured their agent’s initial behavior, the results tended to converge toward a similar pattern.
This is particularly interesting because it goes against a common intuition around using language models. A lot of people believe that the way you instruct an AI agent — the famous prompt — is the decisive factor in getting good results. In the context of Project Deal, however, what actually made the difference was the intrinsic capability of the model, not the direction it received before starting to negotiate.
This finding has direct implications for companies and developers building solutions based on AI agents. It suggests that investing in the quality and sophistication of the underlying model may be more effective than spending time refining prompts and behavior scripts, at least in the context of autonomous negotiations. Of course, this conclusion needs to be tested at larger scales and in more diverse scenarios, but as a starting point, it is valuable information.
What was actually being traded in this marketplace?
Unlike a purely simulated environment, the Project Deal marketplace operated with everyday items that the employees themselves put up for sale, such as used electronics, clothing, books, accessories, and other personal belongings. Each seller set the initial price for their product, and the AI agents stepped in to represent both those looking to buy and those looking to sell, handling negotiations end to end without humans needing to intervene at every step. It was exactly this level of autonomy that Anthropic wanted to put to the test.
What stood out from the very beginning was how naturally the agents handled situations that typically require human judgment. They had to assess whether a price was fair, whether there was room for a discount, how to present a counteroffer without pushing the other side away, and how to close the deal at the right moment. This kind of reasoning involves much more than following a fixed script, and the agents demonstrated an adaptive ability that surprised the team behind the project.
On top of that, the fact that real money was involved completely changed the dynamic of the experiment. When there are concrete consequences, participants take the process seriously, and Anthropic employees were no different. They had 100 dollars to spend and wanted to make good purchases. This created genuine pressure on the AI agents, which needed to perform well enough to earn user trust and successfully complete transactions.
How did the AI agents perform in negotiations?
Agent performance throughout Project Deal was the heart of the whole thing. With 186 transactions recorded and over 4,000 dollars in negotiated value, the numbers speak for themselves, but what became even more evident was the quality of the interactions. The agents did not just complete negotiations; they conducted strategic conversations, identified when to concede on one point and hold firm on another, and calibrated their tone depending on the context of each negotiation. This level of sophistication was not expected at this scale, even by the organizers themselves.
One of the most interesting aspects was how the AI agents handled deadlocks. In situations where buyer and seller were far apart on price, the agents found creative ways to break the impasse — whether by suggesting alternative conditions, highlighting specific product features, or simply adjusting their approach to make the proposal more appealing. This shows that agent reasoning goes well beyond simple transactional logic and is starting to resemble something that, until recently, would have been considered uniquely human.
Overall performance also stood out for its consistency. These were not just a few isolated success stories. The high volume of completed transactions, combined with the satisfaction reported by participating employees, indicated that the agents managed to maintain a high standard throughout the entire testing period. For Anthropic, this represented an important validation that its AI models are ready to operate in more complex scenarios with a greater degree of responsibility.
What Project Deal reveals about the future of AI agents
Project Deal was not just a fun internal experiment. It served as a real-world thermometer for understanding how far AI agents can go when placed in environments with genuine human variables. And the result pointed in a clear direction: these agents are evolving fast, and the ability to operate in marketplaces with real autonomy, handling transactions from start to finish, is no longer a future promise. It is a tested and documented reality.
For anyone following the tech and artificial intelligence market, this experiment raises very relevant questions about how AI agents can be integrated into commercial platforms. Imagine an e-commerce marketplace where agents negotiate shipping terms, volume discounts, or delivery timelines directly with suppliers, without the human team needing to step in on every detail. Or service platforms where agents help customers find the best package within their available budget, guiding the conversation in a natural and personalized way. Project Deal showed that the foundation for this kind of infrastructure already works.
Beyond that, the finding about model disparity adds an important layer of reflection. If in the future AI agents are widely used in commercial negotiations, the choice of model behind the agent could become just as relevant as choosing a good lawyer or broker. The difference is that while we know how to evaluate the competence of a human professional, we still lack clear benchmarks for measuring an AI agent’s capability in real negotiation contexts. Project Deal started filling that gap.
The numbers that stick
When you step back and look at what Project Deal produced in concrete terms, it is impossible not to recognize the significance of what was achieved. An internal marketplace that moved over 4,000 dollars in negotiated value, with 186 completed transactions among 69 participants, using AI agents as real-time intermediaries — that is a result that goes well beyond what most experiments in the space have managed to document so far. We are not talking about simulations or hypothetical scenarios. Every negotiation happened with real money and real people on the other side.
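For a sense of scale, the headline figures can be turned into simple averages. The sketch below uses only the three numbers reported above. Because the total is given only as "over 4,000 dollars", the dollar averages are lower bounds; the variable names are illustrative.

```python
# Figures reported in the article. The dollar total is stated only as
# "more than 4,000", so every dollar-based average below is a lower bound.
participants = 69
transactions = 186
min_total_value_usd = 4000

avg_deal_value = min_total_value_usd / transactions
deals_per_participant = transactions / participants
value_per_participant = min_total_value_usd / participants

print(f"average deal value: at least ${avg_deal_value:.2f}")
print(f"deals per participant: {deals_per_participant:.2f}")
print(f"negotiated value per participant: at least ${value_per_participant:.2f}")
```

That works out to a typical deal of a little over 21 dollars and roughly 2.7 deals per participant. Since each deal has both a buyer and a seller, the average person was on one side or the other of about twice that many.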
These numbers also shed interesting light on the discussion around performance in economic environments. The transaction completion rate and the financial volume involved suggest that the AI agents created enough perceived value for buyers to feel comfortable finalizing their purchases. In other words, the agents passed the hardest test of all: the trust test. And when an AI agent can earn a human’s trust in a negotiation with real money on the line, that says a lot about how mature this technology has become.
Project Deal also opens the door to a broader conversation about how the next generation of marketplaces will be designed. If AI agents can operate at this level of autonomy and efficiency in a controlled environment, the implications for large-scale commercial platforms are enormous. Anthropic’s experience serves as a reference model, showing that it is possible to build ecosystems where AI agents actively participate in the economic process — not just as support tools, but as the main players in negotiations 🚀
