Share:

The revolution of code agents has reached an important new chapter with the launch of Leanstral by Mistral AI. These agents have already proven they can generate extremely high-level code, but when it comes to safety-critical software or cutting-edge math, the same bottleneck always shows up: human review.

In areas like advanced math research, verification of language properties like Rust, or the development of systems that simply cannot fail, the cost is not just in writing code, but especially in proving that it is correct. And today, that burden still falls heavily on experts manually reviewing every detail.

Leanstral’s proposal is to tackle exactly that point: instead of only generating code, it acts as an agent prepared to interact with Lean 4 and work inside real formal repositories, helping to prove that the code meets strict specifications. Less time hunting down subtle bugs, more time defining what the system actually needs to do.

Leanstral: open-source agent focused on Lean 4

Leanstral is the first open-source code agent designed specifically for Lean 4, a proof assistant used to describe complex mathematical objects and formal software specifications. With it, you can handle everything from concepts like perfectoid spaces to properties of Rust code snippets, using frameworks that are already well known in the community.

Unlike systems that simply wrap gigantic general-purpose models or only go after isolated math competition problems, Leanstral was built to operate in realistic proof engineering scenarios, inside large formal projects such as entire repositories.

Some key points of the proposal:

  • Open and accessible: Leanstral’s model weights are released under the Apache 2.0 license, which allows commercial use, study, and modification. It is also available as an agent inside Mistral Vibe and through a free API endpoint, designed for broad experimentation.
  • Efficiency with sparse architecture: with about 6 billion active parameters, Leanstral uses a highly sparse architecture, optimized for proof engineering tasks. Instead of betting on massive models, Mistral focuses on targeted performance.
  • Trained for the real world: the model was trained to operate on real formal repositories, not just isolated problems. That makes a big difference in practical usefulness when you need to handle an entire PR, not just a competition exercise.
  • MCP integration: Leanstral supports MCPs via Mistral Vibe and was trained to perform at its best with lean-lsp-mcp, which is widely used to integrate Lean with modern tooling.

On top of that, Mistral announced a new evaluation suite, FLTEval, which aims to move away from an exclusive focus on competitive math problems and instead simulate scenarios closer to the day-to-day work of people dealing with formal proofs in live projects.

How Leanstral is evaluated in practice

Instead of using only benchmarks made up of isolated math questions, Leanstral’s performance was measured in a much more demanding context: completing all formal proofs and correctly defining new mathematical concepts in each pull request of the FLT project.

Receive the best innovation content in your email.

All the news, tips, trends, and resources you're looking for, delivered to your inbox.

By subscribing to the newsletter, you agree to receive communications from Método Viral. We are committed to always protecting and respecting your privacy.

In this scenario, it was compared against:

  • Leading commercial code agents such as Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5;
  • Large open-source models such as Qwen3.5 397B-A17B, Kimi-K2.5 1T-A32B, and GLM5 744B-A40B.

Comparison with gigantic open-source models

In the Leanstral-120B-A6B version, the model shows a very clear efficiency advantage over much larger peers. On FLTEval, models like GLM5-744B-A40B and Kimi-K2.5-1T-32B hit a ceiling around 16.6 and 20.1 points, respectively.

Leanstral, however, can surpass those numbers with just one inference pass. And when compared to Qwen3.5-397B-A17B — the strongest open-source competitor in the test — the efficiency contrast becomes even clearer:

  • Qwen needs 4 passes to reach a score of 25.4;
  • Leanstral hits 26.3 with only 2 passes and continues to scale almost linearly, reaching 29.3 at the same total cost.

In other words, it does more with less, which is crucial when we talk about cost, latency, and the feasibility of running the model in controlled or on-premise environments.

Leanstral vs the Claude family

When placed side by side with the Claude family, Leanstral appears as a very high cost-benefit option for formal proof and code engineering scenarios.

In tests using Mistral Vibe as the support structure (with no special tuning just for the benchmark), the results were as follows:

Model Cost (US$) FLTEval score
Haiku 184 23.0
Sonnet 549 23.7
Opus 1,650 39.6
Leanstral 18 21.9
Leanstral pass@2 36 26.3
Leanstral pass@4 72 29.3
Leanstral pass@8 145 31.0
Leanstral pass@16 290 31.9

Some highlights from this table:

  • With pass@2, Leanstral reaches 26.3 points, beating Sonnet by 2.6 points, at a cost of 36 dollars, versus 549 dollars for Sonnet on the same benchmark.
  • At pass@16, Leanstral gets to 31.9, ending up 8 points above Sonnet.
  • Claude Opus 4.6 still leads in quality with 39.6, but its price shoots up: around 1,650 dollars, which is 92 times more expensive than running Leanstral in an equivalent scenario.

For teams that need a lot of formal proof but cannot just burn budget on inference, that difference is a big deal.

Real-world use cases with Lean 4

Answering questions about changes in new Lean versions

One of the practical tests run with Leanstral was based on a real problem reported on Proof Assistants Stack Exchange. The question described code that worked on previous versions of Lean but stopped compiling on version 4.29.0-rc6. That version is recent enough that it would not have been included in the model’s training data, which makes the test even more interesting.

The error involved a rewrite tactic, rw, which was no longer able to match patterns with a simple type alias defined like this: def T2 := List Bool. Instead of just guessing a solution, Leanstral set up a test code snippet that reproduced the failure environment, analyzed how the definition behaved, and traced the issue down to definitional equality.

It correctly identified that using def creates a definition that needs to be explicitly unfolded, which was getting in the way of rw seeing the right structure for the pattern. The suggested fix was to replace def with abbrev, which creates a transparent alias, immediately equal to the original type from the checker’s point of view.

With that change, the rw tactic once again matches expressions like (L2 n).length in the proof. Leanstral not only reaches the fix, it also explains the reason in a clear way, acting like a technical assistant that understands the subtleties of Lean’s logical core.

Reasoning about programs and translating proofs

Another interesting example came from an experiment with definitions in Rocq (Coq), taken from a classic Princeton Semantics text. These definitions describe a simple imperative language and its properties.

In the test, those definitions were copied over and Leanstral was instructed to convert them to Lean 4. It managed to perform that translation successfully, including implementing custom notations equivalent to the ones used in the original environment.

Beyond that, the model was able to take just the property statements in Rocq (without the proofs) and write Lean proofs for those same properties of the language. In other words, it understood not only the syntax but also the semantics of the statements and was able to reconstruct proofs in this new formal environment.

Tools we use daily

This kind of capability opens doors for gradual migration of formal codebases from one tool to another, and also helps in teaching formal verification, where students can explore different languages without having to rewrite everything from scratch.

Usage modes and access to Leanstral

Leanstral was released in a way that covers several usage profiles, from people who just want to try quick proving sessions to teams that want to run the model on their own infrastructure.

  • Integration in Mistral Vibe: the model is available as an agent mode inside Vibe, ready to use, with no heavy setup. The idea is to enable vibe coding and proving sessions in Lean through direct commands.
  • Labs API: Mistral offers a free or near-free API endpoint with the identifier labs-leanstral-2603. This channel was designed to gather real usage feedback and observability data, helping guide the next generations of models focused on verified code.
  • Weights under Apache 2.0: for those who need maximum control, the model weights can be downloaded and run on your own hardware, whether in a private cloud or on-premise. That is essential for organizations working with sensitive data or strict compliance requirements.

Mistral also announced a technical report detailing the training approach, along with the aforementioned FLTEval, aimed at evaluation in scenarios closer to professional practice in formal proof.

Why this matters for the future of software engineering

Leanstral points to a new phase in how AI is used in software development. Instead of just speeding up typing or generating drafts that humans have to rewrite, it comes closer to an assistant that can operate inside complex formal environments, respect strict specifications, and deliver verifiable proofs.

For those working with critical code, this means being able to shift human effort away from repetitive review tasks toward the stages of formal requirements definition, architecture, and system design. Verification becomes more and more an automated process, with engineers focusing on stating what needs to be proved, not on manually writing every detail of the proof.

The fact that all of this comes in an open-source package, with broad access, transparent evaluation, and integration with modern tooling via MCP, reinforces an important trend: specialized, efficient, and verifiable AI models are gaining ground over huge generic solutions, especially in contexts where trust and predictability matter more than just having the biggest raw parameter count.

At the end of the day, Leanstral does not single-handedly solve all the challenges of AI-assisted formal proof, but it is a significant step toward a generation of code agents that not only write code, but also mathematically stand behind what they write. And for anyone living the day-to-day reality of tech, AI, and software engineering, that is a pretty concrete game-changer. 💻🔥

Picture of Rafael

Rafael

Operations

I transform internal processes into delivery machines — ensuring that every Viral Method client receives premium service and real results.

Fill out the form and our team will contact you within 24 hours.

Related publications

Performance and Growth: Nvidia, AI Agents, and Data Centers

Nvidia accelerates revenue with data centers, GB300 NVL72, and Rubin; efficiency and AI Agents demand drive record growth and profit.

AI and Copyright: Supreme Court Denies Copyright Protection for Artistic Creation

Supreme Court rejected the AI-generated art case; in the US only humans can hold authorship — a direct impact on

AI Reveals the Identity of Anonymous Social Media Users

Vulnerable anonymity: how modern AI unmasks social media profiles and why this threatens your online privacy.

Receba o melhor conteúdo de inovação em seu e-mail

Todas as notícias, dicas, tendências e recursos que você procura entregues na sua caixa de entrada.

Ao assinar a newsletter, você concorda em receber comunicações da Método Viral. A gente se compromete a sempre proteger e respeitar sua privacidade.

Rafael

Online

Atendimento

Calculadora Preço de Sites

Descubra quanto custa o site ideal para seu negócio

Páginas do Site

Quantas páginas você precisa?

4

Arraste para selecionar de 1 a 20 páginas

📄

⚡ Em apenas 2 minutos, descubra automaticamente quanto custa um site em 2026 sob medida para o seu negócio

👥 Mais de 0+ empresas já calcularam seu orçamento

Fale com um consultor

Preencha o formulário e nossa equipe entrará em contato.