Leanstral: Mistral AI’s open-source foundation for trustworthy vibe-coding with formal proofs

Leanstral just dropped, and it’s changing a conversation that the tech community has been putting off for a while now. This time, we’re not talking about yet another generic model that writes pretty code nobody can actually verify.

AI code generation has evolved dramatically over the past few years. AI agents have already proven themselves as highly capable tools when it comes to creating code. But one problem kept nagging like a pebble in your shoe: who guarantees that what the AI produced is actually correct? In critical areas like mission-essential software, cutting-edge mathematical research, or systems where a failure can be incredibly costly, human review was still the biggest bottleneck in the process. The time and specialized expertise needed to manually verify AI-generated code became the main obstacle to engineering velocity. Experts had to spend hours checking every line, and that slowed everything down. Projects fell behind, bugs slipped through, and trust in AI-generated code remained questionable for decision-makers.

Mistral AI’s vision for solving this deadlock is ambitious: a new generation of coding agents that don’t just execute their tasks, but also formally prove that their implementations are correct against rigorous specifications. Instead of humans debugging machine-generated logic, they simply state what they want, and the agent takes care of building it and proving it built it right. The concrete result of this first major step is Leanstral, the first open-source code agent specifically designed to work with Lean 4. 🎯

What Lean 4 is and why it matters so much in this story

To understand the impact of Leanstral, it helps to contextualize what Lean 4 is and why Mistral chose this language as its foundation. Lean 4 is a proof assistant capable of expressing extremely complex mathematical objects, like perfectoid spaces, as well as software specifications, such as properties of Rust fragments. It’s not just a proof language; it’s also a full-fledged functional programming language, which means the same code you write to implement a function can come with formal proofs about that function’s behavior, in the same file and the same project.

Unlike traditional testing, where you check whether a program works in a few specific cases, a formal proof uses mathematics to demonstrate that the code satisfies its specification for every possible input. Within what the specification covers, there is no edge case that slips through and no bug hiding in some hypothetical scenario. The proof either checks or it doesn’t, and when it does, you have a guarantee that no conventional test suite can offer.
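
To make the contrast concrete, here is a minimal Lean 4 sketch. The function double and the theorem name double_eq_two_mul are hypothetical, chosen only for illustration; they do not come from Mistral's material. The first check behaves like a unit test on a single input, while the theorem is checked by Lean for every natural number.

```lean
-- Hypothetical example function (not from the Leanstral announcement).
def double (n : Nat) : Nat := n + n

-- A "test": checks one specific input by evaluation.
example : double 3 = 6 := rfl

-- A proof: covers every natural number, not just sampled cases.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double   -- goal becomes: n + n = 2 * n
  omega           -- linear arithmetic over Nat closes the goal
```

If the proof were wrong, Lean would reject the file, which is what lets Lean act as the verifier in the workflow this article describes.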

This approach is especially valuable in contexts where failures carry serious consequences: embedded systems in aircraft, code managing financial infrastructure, critical algorithms in medical devices, or any software where a mistake can mean enormous losses or real risk. Historically, creating these proofs was extremely manual, slow work that required professionals with highly specialized training in mathematical logic. That’s why, despite being a technique known for decades, formal proofs never made it into mainstream software development. This is exactly the gap where Leanstral fits in surgically.

What makes Leanstral different from other systems

Mistral AI is quick to point out that Leanstral is not just another wrapper on top of a large generalist model, nor a system focused solely on solving isolated mathematical problems. It was designed to be highly efficient, with only 6 billion active parameters thanks to a sparse architecture, and trained to operate on realistic formal repositories. This distinction is critical, because working with real-world Lean repositories is far more complex than solving a standalone theorem. It involves understanding dependencies between files, navigating imported libraries, dealing with different language versions, and respecting the project’s context as a whole.

Leanstral rests on three main pillars:

  • Open and accessible: The model weights are released under an Apache 2.0 license. On top of that, it’s accessible in agent mode within Mistral Vibe and through a free API endpoint. Mistral also plans to release a detailed technical report covering the training approach and a new evaluation suite called FLTEval, designed to push evaluations beyond the usual focus on competition mathematics.
  • Efficient and powerful: Using a highly sparse architecture optimized for proof engineering tasks, Leanstral leverages parallel inference with Lean acting as a perfect verifier. This makes it both performant and cost-efficient compared to much larger closed-source competitors.
  • Extensible via MCP: Leanstral supports arbitrary MCP (Model Context Protocol) servers through Vibe and was specifically trained to achieve peak performance with the commonly used lean-lsp-mcp.

Evaluation: numbers that speak volumes

One of the most impressive aspects of the Leanstral launch is its benchmark results. Rather than using traditional evaluations based on isolated math problems, Mistral created FLTEval, which reflects realistic proof engineering scenarios. The benchmark evaluates the ability to complete all formal proofs and correctly define new mathematical concepts in each pull request of the FLT project (the ongoing Lean formalization of Fermat's Last Theorem), which is significantly more challenging and more representative of real-world work.

Leanstral vs. open-source models

The results against other open-source models are quite striking. Leanstral-120B-A6B demonstrates a significant efficiency advantage over its open-source peers, which are much larger in size. Models like GLM5-744B-A40B and Kimi-K2.5-1T-32B struggle to scale, with their FLTEval scores plateauing at approximately 16.6 and 20.1, respectively. Leanstral surpasses both with just a single pass.

Even Qwen3.5-397B-A17B, the strongest open-source competitor in testing, needs 4 passes to reach a score of 25.4. In contrast, Leanstral hits a superior score of 26.3 with only 2 passes (half as many) and keeps scaling, reaching 29.3 at the same 4-pass budget. Considering that Leanstral runs on just 6B active parameters versus tens of billions for the competition, these numbers are remarkable. 📊

Leanstral vs. the Claude family

The comparison with Anthropic’s models is where Leanstral’s cost-efficiency argument gets really impressive. Leanstral at 2 passes hits a score of 26.3 on FLTEval, beating Claude Sonnet 4.6 by 2.6 points, while costing only $36 to run, compared to $549 for Sonnet. That’s more than 15 times cheaper for a better result.

At 16 passes, Leanstral reaches a score of 31.9, comfortably beating Sonnet by 8 points. Claude Opus 4.6 still leads in raw quality with a score of 39.6, but that comes at a staggering cost of $1,650, roughly 46 times the $36 cost of Leanstral at 2 passes (the 92x multiple quoted for this comparison appears to assume a cheaper single-pass run). Claude Haiku 4.5, which costs $184, scores 23.0, falling below Leanstral pass@2 at just $36.
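
Fittingly, the arithmetic behind these comparisons can itself be checked in Lean. The sketch below uses only figures quoted in this article, with two labeled assumptions: Sonnet's score of 23.7 is inferred from the quoted 2.6-point gap, and the $18 baseline is a hypothetical single-pass cost (half of the $36 two-pass figure) that the article does not state.

```lean
-- Costs are in USD; FLTEval scores are scaled by 10 so they fit in Nat.

-- Sonnet 4.6 ($549) vs. Leanstral at 2 passes ($36): a bit more than 15x.
example : 15 * 36 < 549 ∧ 549 < 16 * 36 := by decide

-- Opus 4.6 ($1,650) vs. Leanstral at 2 passes ($36): between 45x and 46x.
example : 45 * 36 < 1650 ∧ 1650 < 46 * 36 := by decide

-- ASSUMPTION: an $18 single-pass baseline (not stated in the article)
-- is what a multiple of about 92x would correspond to.
example : 91 * 18 < 1650 ∧ 1650 < 92 * 18 := by decide

-- Score gaps: 26.3 - 23.7 = 2.6 and 31.9 - 23.7 = 8.2.
-- ASSUMPTION: 23.7 for Sonnet is inferred, not quoted directly.
example : 263 - 237 = 26 ∧ 319 - 237 = 82 := by decide
```

Each example compiles only if the stated inequality or equality actually holds, so a miscalculated ratio would show up as a build error rather than a silent typo.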

It is worth noting that in these benchmarks the Mistral team used Mistral Vibe as the scaffold, without any evaluation-specific modifications, which makes the results more representative of the performance anyone can expect when using the tool day to day. 💰

Real-world use cases

Benchmark numbers are great, but what really convinces people is seeing the tool solve actual problems. Mistral shared two case studies showing Leanstral in action outside the controlled evaluation environment.

Solving migration issues between Lean versions

When breaking changes show up in a new Lean version, migrating code can be a monumental headache. The team fed Leanstral a real question from the Proof Assistants Stack Exchange about a script that mysteriously stopped compiling on Lean 4.29.0-rc6, a version so recent that the model wasn’t even trained on it.

The problem involved a rewrite tactic (rw) that suddenly failed to pattern-match through a simple type alias, originally written as def T2 := List Bool. Instead of guessing at a generic solution, Leanstral rolled up its sleeves: it built test code to recreate the failing environment, diagnosed the underlying issue with definitional equality, and correctly identified the cause. Because def creates a rigid definition that requires explicit unfolding, it was blocking the rw tactic from seeing the structure it needed to match.

The proposed fix was simple and elegant: swap def for abbrev. Since abbrev creates a reducible alias that Lean unfolds automatically, the rw tactic could once again see through T2 to List Bool, and the proof went back to working. Leanstral completed the task and even explained the logic behind the solution clearly to the user.
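
The difference between def and abbrev can be seen in a small Lean 4 sketch. This is not the script from the Stack Exchange question, which the article does not reproduce. T2' is a hypothetical second alias added for comparison, and typeclass instance search is used here as a stand-in for any elaboration step that only unfolds reducible definitions.

```lean
def T2 := List Bool        -- ordinary definition: not unfolded automatically
abbrev T2' := List Bool    -- hypothetical twin: `abbrev` marks it reducible

-- Both aliases are definitionally equal to `List Bool` once unfolded.
example : T2 = List Bool := rfl
example : T2' = List Bool := rfl

-- Instance search only unfolds reducible definitions, so it sees
-- through `T2'` and finds the instance for `List Bool`.
example : Inhabited T2' := inferInstance

-- The same line with `T2` is expected to fail, because `T2` stays opaque:
-- example : Inhabited T2 := inferInstance

-- With `def`, the underlying type has to be spelled out explicitly.
example : Inhabited T2 := inferInstanceAs (Inhabited (List Bool))
```

Tactics such as rw match patterns under similarly restricted unfolding, which is why an alias declared with def can hide the structure a rewrite needs.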

Reasoning about programs and translating between proof languages

In the second case, the team copied definitions written in Rocq (formerly known as Coq) from a Princeton University course material and asked Leanstral to convert them to Lean. The agent handled the conversion successfully, including implementing custom notation, which is a non-trivial task requiring deep understanding of both languages. Even more impressive: when given only property statements in Rocq without the proofs, Leanstral managed to translate them into Lean and prove those properties from scratch. This kind of cross-language reasoning between different proof languages is something very few systems in the world can do autonomously. 🧠
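
To give a sense of what such a translation involves on the Lean side, here is a minimal sketch of a tiny expression language with custom infix notation and a property proved about it. Everything in it (AExp, aeval, the ⊞ symbol, and the theorem) is hypothetical and is not taken from the Princeton material or from Leanstral's output.

```lean
-- Hypothetical arithmetic expression type, similar in spirit to what a
-- programming-languages course might define in Rocq.
inductive AExp where
  | num  : Nat → AExp
  | plus : AExp → AExp → AExp

-- Custom infix notation, playing the role of a Rocq `Notation` command.
infixl:65 " ⊞ " => AExp.plus

-- Evaluator defined by structural recursion.
def aeval : AExp → Nat
  | AExp.num n => n
  | a ⊞ b      => aeval a + aeval b

-- A concrete check on one expression.
example : aeval (AExp.num 1 ⊞ AExp.num 2) = 3 := rfl

-- A property statement proved from scratch: evaluation is commutative in `⊞`.
theorem aeval_plus_comm (a b : AExp) : aeval (a ⊞ b) = aeval (b ⊞ a) := by
  simp [aeval, Nat.add_comm]
```

A real port would also have to reconcile differences in libraries, tactic languages, and notation precedence between the two systems, which is where most of the difficulty lies.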

How to start using Leanstral right now

Leanstral is available today for anyone to use, and Mistral has provided multiple access paths to serve different user profiles:

  • Zero setup on Mistral Vibe: Leanstral has been integrated directly into Mistral Vibe for immediate vibe-coding and proofs, with no configuration needed. To activate it, just use /leanstral. Then press Shift+Tab until the model shows up as Leanstral, or use vibe --agent lean.
  • Labs API: The model can be accessed via a free or near-free API endpoint, using the identifier labs-leanstral-2603. Mistral is keeping this endpoint highly accessible for a limited time to collect realistic feedback and observability data that will feed into the next generation of verified code models.
  • Download the weights: The Apache 2.0-licensed model can be downloaded and run on your own infrastructure, giving you full control over how and where you run Leanstral.

The impact of open-source for the AI and development community

The decision to release Leanstral as an open-source solution under the Apache 2.0 license is no minor detail; it’s a philosophical and strategic choice with massive practical implications. When an AI tool focused on formal verification is closed and proprietary, it stays locked inside a controlled ecosystem, with access limited by costs or usage restrictions. Independent researchers can’t study how it works, can’t identify limitations, and can’t adapt the tool for use cases the original company didn’t anticipate. Open-source tears down all those barriers at once.

For development teams working on projects that demand high reliability, this opens up concrete possibilities. A startup building financial software can integrate Leanstral into its pipeline without paying for an expensive service. A university teaching formal methods can use the agent as an educational tool and contribute improvements to the codebase. A company that needs to adapt the tool for a specific domain can do so without depending on a closed API or a commercial contract.

Beyond that, opening the model creates a trust dynamic that’s particularly important when the subject is software correctness verification. If you’re going to use a tool to help guarantee your code is mathematically correct, it makes perfect sense to want to inspect how that tool works under the hood. Leanstral’s transparency isn’t just a bonus; it’s a fundamental part of the tool’s value proposition. A verification system you can’t audit has inherently limited credibility. With the weights available, the community can verify, critique, improve, and trust with far more confidence. 🔍

What Leanstral means for the future of vibe-coding

The term vibe-coding, which has been gaining traction over the past few months, describes a way of programming where the developer focuses on the intent of what they want to build and lets the AI handle the implementation. It’s an exciting way to work, but until now it carried an implicit risk: if you trust the AI to write the code, how can you be sure it’s doing what it should? Leanstral answers that question by adding a formal verification layer to the process. It’s no longer vibe-coding in the dark; it’s vibe-coding with mathematical proof that things are correct.

This concept of trustworthy vibe-coding is what Mistral is positioning as the next natural evolution of AI code generation. Instead of choosing between speed and trust, the pitch is to have both. The developer describes what they want, the agent implements and proves, and Lean’s verifier checks the proof. If the proof passes, the implementation is mathematically correct with respect to the stated specification. No debate, no guesswork, no hours of code review trying to hunt down subtle bugs.

Leanstral is still a young tool, and it’s natural for it to evolve significantly over the coming months as more people use, test, and contribute. But what Mistral AI has put on the table with this launch is already enough to shift the conversation about the role of AI agents in software verification. The combination of open-source, competitive performance at a dramatically lower cost than competitors, native integration with Lean 4, and a well-designed agent architecture creates a solid foundation for formal proofs to finally start leaving the lab and entering everyday development. And that, let’s be honest, was something long overdue. ✅

Rafael

Operations