31/03/2026 · 13-minute read · By Rafael


When artificial intelligence decides to do the heavy lifting in biomedical science

Artificial intelligence and medicine have always had a complicated relationship.

On one side, there is an absurd amount of biomedical data being generated every second — genomic sequencing, pathology images, clinical records, omics data. On the other, researchers who need weeks, sometimes months, to extract useful insights from all of it.

And caught in the middle of that equation, a massive barrier: most scientists who need these analyses the most do not have the technical background in programming or bioinformatics to run them on their own.

This gap has been around for a long time, and folks in the field are pretty tired of it. 😅

But then a publication in Nature Biomedical Engineering comes along and shakes things up considerably. BioMedAgent is a multi-agent framework based on LLMs that not only runs complex biomedical analyses autonomously but also learns and improves its own tools as it works. That is not a figure of speech. The system literally creates, tests, and refines its own analytical resources over time — a process the researchers call self-evolution.

And there is more: all the code is available on GitHub and the benchmarks have been published on open platforms like Hugging Face and Zenodo, putting this technology within reach of any research team in the world. In the next sections, we are going to break down how this works in practice, what the results show, and why this could be a game-changer for fields like oncology, genomics, and drug discovery. 🔬

What BioMedAgent is and how the multi-agent system works

BioMedAgent was developed by an international team led by researchers from the Chinese Academy of Sciences, the Macau University of Science and Technology, and the Guangzhou National Laboratory, among other institutions. The work, published on March 30, 2026, introduces a multi-agent architecture in which different specialized artificial intelligence agents collaborate to solve complex biomedical analysis tasks. Think of it this way: instead of a single model trying to handle everything on its own, there is a team of agents, each with a specific responsibility within the pipeline.

In practice, the framework includes agents dedicated to distinct functions. There is a Planner agent, responsible for interpreting the user request and mapping out the necessary steps. There is a Programmer agent, which generates the code for each step. And there is an Executor agent, which runs that code and verifies the results. This division of responsibilities, documented in the original paper with detailed workflow diagrams, is what allows the system to scale efficiently for tasks that would be impossible to solve with a single model.
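The hand-off between those three roles can be sketched in a few lines. This is a toy illustration of the Planner → Programmer → Executor pattern, assuming stubbed-out agents; the class names, methods, and two-step plan are illustrative stand-ins, not the authors' actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of the Planner -> Programmer -> Executor hand-off.
# In BioMedAgent each role is an LLM call; here each is a simple stub.

@dataclass
class Step:
    description: str
    code: str = ""
    result: str = ""

class Planner:
    def plan(self, request: str) -> list[Step]:
        # Stand-in for the LLM mapping a request onto concrete steps.
        return [Step(f"{request}: load data"), Step(f"{request}: run analysis")]

class Programmer:
    def write_code(self, step: Step) -> Step:
        step.code = f"# code generated for: {step.description}"
        return step

class Executor:
    def run(self, step: Step) -> Step:
        # A real executor would run the code and validate outputs,
        # sending failures back to the Programmer for revision.
        step.result = "ok"
        return step

def pipeline(request: str) -> list[Step]:
    planner, programmer, executor = Planner(), Programmer(), Executor()
    return [executor.run(programmer.write_code(s)) for s in planner.plan(request)]

steps = pipeline("differential expression on GSE dataset")
print([s.result for s in steps])  # ['ok', 'ok']
```

The point of the split is that each role can be retried or corrected independently, which is exactly what enables the internal verification loop described next.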

What makes this design especially relevant is that the agents communicate with each other dynamically, adjusting the workflow as partial results come in. This means that if one agent detects an error in the generated code or an inconsistent result in the data, it can call on another agent to review that step before moving forward — without any human intervention. This internal verification loop is a significant technical advantage over previous approaches, where the researcher had to manually check every step of the process. The level of operational autonomy this provides is something the scientific community has been after for quite a while, especially in high-complexity biomedical analysis contexts.

Another important point is that the system was designed to use natural language as its primary interface. The researcher does not need to know how to write code in Python, R, or any other programming language. They just describe what they need in plain text, the way they would ask a colleague, and the framework takes care of the rest. This approach to technical democratization is directly aimed at the profile of biomedical researchers, who mostly have backgrounds in life sciences, not computer science. With this, BioMedAgent removes one of the main historical barriers between applied artificial intelligence and the labs that would benefit from it the most.

The BioMed-AQA benchmark and how performance was measured

No AI tool can be taken seriously without a robust benchmark to validate its capabilities. The BioMedAgent researchers understood this very well and created BioMed-AQA, a reference set with 327 open-ended questions covering different types of biomedical analysis tasks. These questions are classified into five categories: omics tasks (O), pathology (P), multimodal analyses (M), data simulations (S), and visualizations (V).

In addition to the open-ended questions, there is a complementary subset called BioMed-AQA-MCQ, with 172 multiple-choice questions, designed to allow automated and objective evaluations. The multiple-choice questions include both single-answer (73.26%) and multiple-answer (26.74%) items.
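Those percentages pin down the exact composition of the MCQ subset; a quick arithmetic check confirms they account for all 172 items:

```python
# Sanity check of the BioMed-AQA-MCQ composition reported in the paper:
# 172 multiple-choice questions, 73.26% single-answer, 26.74% multiple-answer.
total_mcq = 172
single = round(total_mcq * 0.7326)    # 126 single-answer items
multiple = round(total_mcq * 0.2674)  # 46 multiple-answer items
print(single, multiple, single + multiple)  # 126 46 172
```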

One of the innovations of the study is the use of an autoscoring agent, which automatically compares BioMedAgent results against reference answers. This agent achieved an AUC of 0.926 on the ROC curve, demonstrating high agreement with manual evaluations performed by human experts. This level of reliability in automated evaluation is essential for the system to consistently measure its own evolution across multiple rounds of learning.
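To make the AUC figure concrete: an AUC of 0.926 means that if you pick one answer the human experts marked correct and one they marked incorrect, the autoscorer ranks the correct one higher about 92.6% of the time. The sketch below computes AUC from that pairwise definition on made-up toy data (the scores and labels are illustrative, not the paper's):

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive example
    is scored above a randomly chosen negative one (ties count 0.5).
    Mathematically equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy agreement data: 1 = human expert marked the answer correct,
# scores = autoscorer confidence. Values are illustrative only.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.5, 0.6, 0.4, 0.95, 0.2, 0.3]
print(round(roc_auc(scores, labels), 3))  # 0.938
```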

All of these benchmarks, including questions, reference steps, and evaluation milestones, are publicly available on Hugging Face and Zenodo, allowing full replication by any research group.

Self-evolution: when AI improves itself

The concept of self-evolution is, without a doubt, the most intriguing and technically sophisticated aspect of the entire BioMedAgent project. What this means in practice is that the system does not just run analyses — it also learns from each execution to improve the tools it uses for the next ones. When the system faces a new task or encounters an unexpected result, it generates new analytical tools, tests those tools against available data, and if they perform well, incorporates them into its own repertoire for future use. It is a continuous cycle of creation, validation, and refinement that happens autonomously, without anyone needing to manually program each new capability.
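The create → test → incorporate cycle can be sketched as a tool registry that only admits validated candidates. Everything here is an assumption for illustration: the function names, the lambda standing in for LLM code generation, and the validation criterion are not the authors' implementation, just the shape of the loop:

```python
# Illustrative sketch of the create/test/incorporate cycle behind
# self-evolution; all names and the validation rule are assumptions.

def generate_tool(task):
    # Stand-in for an LLM generating a new analysis function for `task`.
    return lambda data: sum(data) / len(data)

def validate(tool, test_cases):
    # Keep a tool only if it reproduces reference answers on held-out cases.
    return all(abs(tool(inp) - expected) < 1e-9 for inp, expected in test_cases)

registry = {}  # the agent's evolving tool repertoire

def solve(task, data, test_cases):
    if task not in registry:                 # no suitable tool yet
        candidate = generate_tool(task)      # create
        if validate(candidate, test_cases):  # test
            registry[task] = candidate       # incorporate for future use
    return registry[task](data)

print(solve("mean_expression", [2.0, 4.0, 6.0],
            test_cases=[([1.0, 3.0], 2.0)]))  # 4.0
```

On later calls the tool is already in the registry and is reused directly, which is where the accumulated-experience payoff comes from.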

This mechanism is supported by two main components described in the original paper:

  • LTU (Long-Term Tool Update): allows the system to continuously update and expand its repertoire of analytical tools based on accumulated experience.
  • CTC (Cross-Task Communication): enables knowledge gained from one task to be transferred to different tasks, increasing the overall efficiency of the system.

Additionally, the paper describes two memory update mechanisms — CMA (Cumulative Memory Addition) and IMF (Iterative Memory Fusion) — that determine how past experiences are integrated into the system across successive rounds of learning. In the CMA approach, new memories are simply added to the existing collection. In IMF, new memories are iteratively merged with previous ones, resulting in more consolidated and less redundant knowledge.
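The contrast between the two memory strategies is easy to see on toy data. The dict-based "memory entry" representation below is an assumption for illustration only; the paper does not specify this data structure:

```python
# Toy contrast between the two memory-update strategies: CMA appends
# every new experience, IMF merges overlapping ones. The dict-based
# memory representation is an illustrative assumption.

def cma_update(memory, new):
    """Cumulative Memory Addition: just append the new experience."""
    return memory + [new]

def imf_update(memory, new):
    """Iterative Memory Fusion: merge with an existing entry for the same
    task, keeping the higher-scoring experience, instead of duplicating."""
    merged, fused = [], False
    for m in memory:
        if m["task"] == new["task"]:
            merged.append(new if new["score"] > m["score"] else m)
            fused = True
        else:
            merged.append(m)
    return merged if fused else merged + [new]

memory = [{"task": "deg", "score": 0.7}]
print(len(cma_update(memory, {"task": "deg", "score": 0.9})))  # 2 entries
print(len(imf_update(memory, {"task": "deg", "score": 0.9})))  # 1 consolidated entry
```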

In the technical literature, this behavior comes close to the concept of self-improving systems, but with a relevant distinction: here, the improvement is not about the model itself but about the set of tools and scripts the model uses to execute domain-specific tasks. This is important because it means the system becomes increasingly capable of handling the types of analysis it encounters most frequently, creating a kind of progressive specialization driven by the actual data from the lab or institution using it. The more BioMedAgent is used, the more efficient and accurate it tends to become for that specific context, which is a huge advantage in longitudinal research or long-term projects.

The benchmarks published by the researchers show that this self-evolution capability results in measurable performance gains over time. Extended Data shows that the use of LTU generated statistically significant improvements in multimodal tasks (p = 1.477e-03) and that the overall success rate of the system increased significantly when LTU and CTC were used together. In tasks involving single-cell RNA sequencing data analysis and pathology image interpretation, the system showed progressive improvements in accuracy and efficiency metrics as more executions were performed across three rounds of learning.

This is not trivial — it is empirical evidence that the continuous learning mechanism is working as expected. And the fact that these benchmarks are available as open data on platforms like Hugging Face and Zenodo allows any research group to replicate the experiments and validate the results independently.

The IE algorithm and comparative results with other agents

Another fundamental component of BioMedAgent is the IE (Iterative Experience) algorithm, which operates during the planning and coding phases, coordinating the Planner, Programmer, and Executor agents. IE allows the system to iteratively refine both execution plans and generated code, using feedback from previous runs to avoid recurring errors.

The comparative results from the original paper are quite revealing. When tested on the BioMed-AQA benchmark with 327 questions, BioMedAgent with IE enabled showed significant gains over the mode without IE across virtually all task categories. The p-values obtained through two-tailed paired t-tests were extremely low for total tasks (p = 1.091e-22), omics tasks (p = 4.236e-09), simulations (p = 5.118e-07), and visualizations (p = 5.530e-09), indicating that the improvements are statistically robust and not the result of random variation.
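For readers less familiar with paired t-tests: the test compares matched scores, the same question solved with and without IE, so each question serves as its own control. The sketch below computes the t statistic only, on made-up scores (the p-values in the paper come from the t distribution with n-1 degrees of freedom):

```python
import math

# Two-tailed paired t-test: compare matched per-question scores with and
# without IE. This computes the t statistic only; the scores are made-up
# illustrations, not the paper's data.

def paired_t_statistic(a, b):
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

with_ie    = [0.9, 0.9, 0.9, 0.9]
without_ie = [0.7, 0.5, 0.3, 0.1]  # per-question differences: 0.2, 0.4, 0.6, 0.8
print(round(paired_t_statistic(with_ie, without_ie), 3))  # 3.873
```

A t statistic this large on only four pairs already suggests a consistent effect; the paper's far larger sample (327 questions) is what drives its p-values down to the e-22 range.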

The system was also compared with other LLM-based agents, including variations using GPT Function Call, demonstrating that the multi-agent architecture with self-evolution consistently outperforms simpler approaches in terms of analyzable scope and success rate.

Practical applications already being demonstrated

The original paper does not stop at presenting theoretical results. The researchers documented concrete practical applications of BioMedAgent in real biomedical research scenarios:

  • Identification of differentially expressed genes (DEGs): the system was compared with the official GEO2R online tool and demonstrated the ability to produce equivalent results autonomously.
  • Cell segmentation with resolution enhancement: BioMedAgent automatically built a complete workflow for pathology image processing, including model selection and result evaluation.
  • Single-cell transcriptomics data analysis: using tools like SCANPY and integration with libraries like Seurat, the system performed end-to-end analyses of single-cell RNA-seq data.
  • Functional gene enrichment: integration with tools like KOBAS-i for functional enrichment analysis and exploratory visualization of biological functions.
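To ground the first application: differential expression boils down to ranking genes by how much their abundance changes between conditions. The sketch below shows only the core log2 fold-change arithmetic on invented expression values; real pipelines (GEO2R, SCANPY's `rank_genes_groups`) also model variance and correct for multiple testing:

```python
import math

# Minimal illustration of the differential-expression idea behind the
# GEO2R comparison: rank genes by log2 fold change between two groups.
# Gene names and expression values are invented for illustration.

def log2_fold_changes(case_means, control_means):
    return {g: math.log2(case_means[g] / control_means[g])
            for g in case_means}

case    = {"TP53": 40.0, "GAPDH": 100.0, "MYC": 25.0}
control = {"TP53": 10.0, "GAPDH": 100.0, "MYC": 100.0}

lfc = log2_fold_changes(case, control)
degs = {g: v for g, v in lfc.items() if abs(v) >= 1.0}  # common |log2FC| >= 1 cutoff
print(degs)  # {'TP53': 2.0, 'MYC': -2.0} -> TP53 up, MYC down; GAPDH unchanged
```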

Each of these applications is documented with interactive chat details available on a dedicated web platform, where you can follow the entire process of planning, execution, and summarization for each benchmark question. 🧬

Open data and the real impact for the scientific community

Talking about open data in the context of biomedical science means talking about a deep cultural shift that has been underway for years but still faces resistance at many research centers. The decision by BioMedAgent developers to publish all code on GitHub and make benchmarks available on open platforms is not just a gesture of transparency — it is a statement of intent about how this kind of technology should expand.

When an artificial intelligence tool of this magnitude is accessible to a lab at a public university in Brazil just as much as to a well-funded research institute in the United States, the playing field truly changes. Teams with fewer resources gain access to analytical capabilities that were previously the privilege of those with the budget to hire data engineers and specialized computational scientists.

Beyond that, opening the benchmark data serves a direct scientific purpose: it allows the community to identify limitations of the system, propose improvements, and contribute new use cases that the original creators may not have anticipated. This collaborative model is exactly what accelerated progress in other areas of computing, such as the development of open-source language models and widely used libraries in the machine learning ecosystem. Applying that logic to the biomedical domain could generate a virtuous cycle where the more researchers use and contribute to BioMedAgent, the more robust and versatile the system becomes for the entire community — supercharging the self-evolution mechanism with a much greater diversity of real-world data and scenarios.

The practical impact of this is already starting to show up in areas like computational oncology, where analyzing large volumes of patient genomic data is a constant need, and in pharmacology, where drug discovery and repurposing depend on cross-referencing information from multiple heterogeneous sources. The study cites relevant work on identifying serum biomarkers for breast cancer through proteomics and bioinformatics, on biologically informed deep neural networks for prostate cancer discovery, and on computational approaches that accelerate drug discovery — all areas that could directly benefit from a system like BioMedAgent.

With this tool, tasks that would require weeks of work from a bioinformatics specialist can be completed in hours, with results that are documented, reproducible, and auditable. This is not just a matter of speed — it is a matter of making projects viable that simply would not happen otherwise due to a lack of technical resources. And that is exactly why this publication in Nature Biomedical Engineering is drawing so much attention.

The team behind the project and its funding

BioMedAgent was developed by a team of 22 researchers spread across several institutions in China and Macau. The three first authors with equal contribution are Dechao Bu, Jingbo Sun, and Kun Li. The project was jointly supervised by Kang Zhang, Runsheng Chen, and Yi Zhao.

Funding came from multiple sources, including the National Key R&D Program of China, the National Natural Science Foundation of China, the Ningbo Medical Research Program, the Beijing Natural Science Foundation, and the Macau Science and Technology Development Fund, among others. The researchers make a point of noting that the funders had no role in the study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.

This diversity of funding and the transparency about editorial independence are important indicators of the seriousness of the work and the absence of declared conflicts of interest.

Why this matters beyond the labs

It is tempting to look at BioMedAgent as a purely technical tool, confined to the world of researchers with access to servers and biomedical datasets. But the implications go further. The combination of multi-agent architecture, self-evolution, and open data represents a model that can be replicated in other domains where there is a massive volume of data and a shortage of specialists capable of analyzing it efficiently. Precision agriculture, environmental monitoring, materials analysis for engineering — these are all areas facing structurally similar challenges to biomedicine and that could benefit from a framework built on this logic.

From the perspective of artificial intelligence development, BioMedAgent also raises interesting questions about the future of large language models applied to specialized domains. Instead of trying to build a single generalist model that knows everything about bioinformatics, genomics, and pathology, the multi-agent approach makes it possible to combine the general reasoning ability of LLMs with specialized tools that evolve through use. This balance between generalism and specialization is one of the central challenges in the field, and the publication offers concrete evidence that this architecture can work well in technically demanding contexts.

The study also connects directly with other recent work in the area, such as CellAgent for automated single-cell data analysis, the BioInformatics Agent (BIA), BioMaster, and CASSIA for cell annotation. What sets BioMedAgent apart from these initiatives is precisely the combination of multi-agent capability, tool self-evolution, and a comprehensive, publicly available benchmark for validation. This combination makes the framework more complete and easier to evaluate independently than the existing alternatives.

Finally, the timing of this publication is also worth noting. Global interest in artificial intelligence applications for healthcare has grown exponentially in recent years, and there is increasing pressure for solutions that are not only accurate but also accessible, transparent, and auditable. BioMedAgent delivers exactly that package: high technical capability, open-source code, public benchmarks, and an architecture that explains how it arrived at its results instead of simply delivering them as a black box. In a landscape where trust in AI systems applied to healthcare still needs to be built step by step, that makes all the difference. 🔬
