Can you really trust health advice from an AI chatbot?
Artificial intelligence is becoming more and more present in everyday life, and one of the areas where this growth draws the most attention is healthcare. Seeing a doctor is not always straightforward: long lines, wait times, and limited access are part of the reality for many people around the world. In the UK, for example, getting an appointment with a general practitioner can feel like an impossible mission for many patients.
That is where AI chatbots enter the picture, ready to answer questions at any hour of the day or night. And we are not talking about obscure tools here: ChatGPT, Gemini, Grok, and other widely known models are already being consulted by millions of people when it comes to symptoms, diagnoses, or medical guidance.
On top of that, artificial intelligence has already passed some medical exams with flying colors, which only fuels the idea that these tools know what they are talking about. But is acing a test the same as guiding an actual patient? That is exactly the question we are going to explore here, looking at real cases, recent research, and opinions from people who know the field well. 👇
Abi’s story: when AI gets it right and when it gets it very wrong
Abi, a young woman from Manchester in the UK, is a real-world example of how this patient-chatbot relationship works in practice. For over a year, she has been using ChatGPT to get answers about her own health. Abi deals with health anxiety and says the chatbot provides more targeted advice than a traditional internet search, which often sent her straight to the scariest possible scenarios.
For her, talking with the AI works like a sort of collaborative problem-solving, something she describes as similar to talking with her own doctor. But Abi’s experience is a mix of moments where the tool genuinely helped and situations where the advice was, at the very least, concerning.
On the positive side, when Abi suspected she had a urinary tract infection, ChatGPT analyzed her symptoms and recommended she visit a pharmacy. After an in-person consultation, she was prescribed an antibiotic. Abi says the chatbot helped her get the care she needed without feeling like she was taking up time in the British public health system, the NHS. For someone who struggles to know when she truly needs to see a doctor, that kind of guidance made a real difference.
But in January, the story took a very different turn. Abi slipped during a hike and slammed her back hard against a rock. The pain was intense and spread from her back to her stomach. She did what had already become a habit: she pulled up the AI on her phone.
ChatGPT told her she had punctured an organ and needed to go to the emergency room immediately. After three hours waiting in the ER, the pain started fading and Abi realized she was not in critical condition. She went home. The AI had clearly gotten it wrong and triggered a completely disproportionate alarm for what was actually happening.
Abi still uses AI chatbots, but she recommends that anyone take everything with a healthy dose of caution and never blindly trust that a response is absolutely correct.
What AI chatbots actually do when you describe your symptoms
When someone types their symptoms into a chatbot like ChatGPT, Gemini, or Grok, what happens under the hood is very different from what happens in a doctor’s office. These systems are trained on massive volumes of text, including scientific papers, health forums, medical encyclopedias, and much more. The result is a tool capable of generating responses that seem well-founded and, in many cases, actually are.
The problem starts when the person on the other side of the screen treats that response as if it were a clinical report, rather than general information that still needs professional validation.
The difference between a good answer and a reliable diagnosis lies in something no language model can do on its own: examine the patient, order specific tests, consider family history, observe physical signs, and cross-reference all of that information with years of clinical practice. The chatbot does not see you. It reads what you type, and any detail you leave out can send the response down a completely different path from what would actually be appropriate for your real situation.
This is not a programming bug. It is simply a structural limitation of the technology.
The warning from England’s top doctor
The quality of health advice given by artificial intelligence is already on the radar of medical authorities. Professor Sir Chris Whitty, the Chief Medical Officer for England and essentially the country’s top doctor, told the Medical Journalists’ Association that we are at a particularly delicate moment.
According to Whitty, people are already using chatbots for health questions, but the answers are not good enough yet. He went further and described the AI’s behavior as frequently being confident and wrong at the same time. In other words, the chatbot responds with a level of assurance that conveys credibility, even when the information is incorrect. And that is one of the most dangerous aspects of this whole story.
What the research says about chatbot accuracy in healthcare
Researchers around the world are working hard to understand where chatbots get it right and where they get it wrong when it comes to health. One of the most revealing studies was conducted by the Reasoning with Machines Laboratory at the University of Oxford.
The team brought together doctors to create detailed and realistic clinical scenarios, covering everything from minor health issues that could be treated at home to situations that would require calling an ambulance. When the chatbots received the complete clinical picture, accuracy reached 95%. Professor Adam Mahdi, one of the researchers, described the performance as incredible, nearly perfect.
But then came the second part of the experiment, and everything changed. When 1,300 people were asked to talk to the chatbots, describing the same scenarios in their own words, accuracy plummeted to just 35%. That means that in roughly two out of three cases (65%), people received the wrong diagnosis or inappropriate care guidance.
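For readers who want to check the arithmetic, here is a minimal sketch that takes the two accuracy figures quoted above at face value. The constant and function names are illustrative only and do not come from the Oxford study.

```python
from fractions import Fraction

# Figures quoted in the article (percent of cases handled correctly).
ACCURACY_FULL_PICTURE = 95   # chatbot given the complete clinical scenario
ACCURACY_CONVERSATION = 35   # real people describing it in their own words


def error_share(accuracy_percent: int) -> Fraction:
    """Share of cases that went wrong, as an exact fraction."""
    return Fraction(100 - accuracy_percent, 100)


# How far accuracy fell between the two conditions, in percentage points.
drop_in_points = ACCURACY_FULL_PICTURE - ACCURACY_CONVERSATION

# Share of conversational cases that ended with a wrong answer.
wrong_in_conversation = error_share(ACCURACY_CONVERSATION)

# How close that share is to the "two out of three" shorthand.
gap_to_two_thirds = Fraction(2, 3) - wrong_in_conversation

print(f"Drop: {drop_in_points} percentage points")
print(f"Wrong in conversation: {wrong_in_conversation} "
      f"(about {float(wrong_in_conversation):.0%})")
print(f"Gap to two-thirds: {float(gap_to_two_thirds):.3f}")
```

The error share works out to 13 in 20, which is slightly under two in three, so "two out of three" is a fair rounding rather than an exact figure.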
Mahdi explained the reason: when people have a conversation, they share information gradually, leave things out, and get sidetracked. This human interaction with the AI is the point where things fall apart.
The case of a brain hemorrhage mistaken for a common headache
One of the most alarming scenarios in the Oxford study involved the symptoms of a stroke caused by a subarachnoid hemorrhage, a medical emergency that requires urgent hospital treatment. Subtle differences in how participants described the same symptoms to ChatGPT led to completely opposite responses.
One participant reported a terrible headache, stiff neck, and sensitivity to light. The chatbot suggested it could be a migraine or tension headache and recommended rest, hydration, and over-the-counter painkillers. Another participant, describing virtually the same situation but using words like sudden and extremely severe headache, was told to seek immediate medical attention for possible meningitis or brain hemorrhage.
The takeaway is clear: a serious brain hemorrhage should never be treated with rest and over-the-counter painkillers. In a real emergency, the difference between life and death could come down to the user’s choice of words.
Mahdi also noted that study participants who did a traditional internet search ended up, most of the time, on the website of the NHS, the British public health service, and were better informed than those who used chatbots.
The difference between chatting with a chatbot and doing an internet search
Dr. Margaret McCartney, a general practitioner in Glasgow, Scotland, makes an important observation about how people relate to chatbots compared to traditional internet searches.
According to her, there is a fundamental difference between a chatbot that summarizes information for you and the process of searching for and evaluating that information on your own. When you run a Google search and land on a website, there are several indicators that help you assess whether that source is more or less trustworthy: the domain, the authorship, the references cited.
With a chatbot, the feeling is that you are having a personal relationship with the tool. It feels like the advice was tailor-made for you, and that changes how we interpret the information we are receiving. This sense of personalization creates a trust that is not always justified, and that is precisely where the danger lies. 🤔
Chatbots also spread health misinformation
As if the accuracy issue were not enough, a separate analysis published this week by The Lundquist Institute for Biomedical Innovation in California showed that AI chatbots can also spread medical misinformation.
The researchers used a deliberately challenging approach, asking questions designed to invite misinformation-laden responses, in order to test the robustness of the models. They evaluated Gemini, DeepSeek, Meta AI, ChatGPT, and Grok on topics including cancer, vaccines, stem cells, nutrition, and athletic performance.
More than half of the responses were classified as problematic in some way.
A telling example: when asked which alternative clinics can successfully treat cancer, instead of responding that no alternative clinic replaces conventional evidence-based treatment, one of the chatbots answered by citing naturopathy, describing it as medicine focused on natural therapies such as herbal remedies, nutrition, and homeopathy to treat diseases.
Lead researcher Dr. Nicholas Tiller explains that these models are designed to give very confident and very authoritative answers, which conveys a sense of credibility. The user simply assumes the tool knows what it is talking about.
The fundamental problem with the technology behind chatbots
A recurring criticism of all these studies is that the technology evolves quickly. The software powering chatbots today will have already changed by the time the research is published. That is true, but it does not eliminate the core problem.
As Dr. Tiller points out, there is a fundamental issue with the technology itself. Language models are designed to predict text based on linguistic patterns. They were not originally built to provide medical diagnoses, but they are now being used by the public for exactly that purpose.
In Tiller’s view, chatbots should be avoided for health advice unless the person has the technical knowledge needed to identify when the AI is getting it wrong.
He makes a simple and effective analogy: if you asked a random stranger on the street a question and they answered with a lot of confidence, would you just believe them? Probably not. You would, at the very least, verify the information on your own.
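To make the idea of "predicting text based on linguistic patterns" concrete, here is a deliberately tiny sketch. It is a toy word-pair counter, not how ChatGPT or any real chatbot is built: real models are large neural networks trained on vastly more data. The corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

# Tiny made-up corpus, purely illustrative (not real training data).
CORPUS = (
    "a severe headache can be a migraine . "
    "a severe headache can be an emergency . "
    "a mild headache can be a migraine ."
)


def build_bigram_counts(text: str) -> Dict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = build_bigram_counts(CORPUS)

# The toy model simply echoes the most common pattern it has seen.
print(predict_next(counts, "headache"))
print(predict_next(counts, "be"))
print(predict_next(counts, "fever"))
```

The toy model picks whatever continuation appeared most often in its text. Nothing in it checks whether that continuation is medically right for the person asking, which is the gap Tiller is pointing to, even though real systems are far more sophisticated than this sketch.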
What OpenAI says about using ChatGPT for health
OpenAI, the company behind ChatGPT, the chatbot Abi uses, addressed the topic in an official statement. The company said it knows people turn to ChatGPT for health information and that it takes seriously the need to make responses as reliable and safe as possible.
According to OpenAI, the company works with doctors and healthcare professionals to test and improve its models, which now perform well in real-world health evaluations.
However, even with these improvements, the company made it clear that ChatGPT should be used for information and education, not as a substitute for professional medical advice. This statement matters because it establishes the boundaries that the tool’s own creator acknowledges, even though many users may not be aware of this caveat.
When AI helps and when it can get in the way
There is a very fine line between using artificial intelligence as a support tool and relying on it as if it were an on-call doctor. For many people, especially in areas with limited access to healthcare services, the chatbot has become the first — and sometimes the only — source of guidance available.
This creates a double-edged scenario: on one hand, it is undeniable that these tools democratize access to health information in a way that did not exist before. On the other, the lack of clear regulation and excessive user trust in AI responses can create serious risks.
From a practical standpoint, the smartest use of health chatbots seems to be as a complement, not a replacement. Using AI to better understand a diagnosis your doctor already gave you, to research side effects of a newly prescribed medication, or to help decide whether a particular symptom warrants a trip to the emergency room are all use cases where the tool can add value, as long as the answer is treated as a starting point rather than a verdict.
The problem shows up when the chatbot conversation completely replaces a visit with a professional, especially in cases of persistent symptoms, intense pain, or anything that falls outside the pattern of everyday life. 🚨
The future of chatbots in healthcare: promise or concern?
Artificial intelligence in healthcare is not going away — quite the opposite. Companies like Google, Microsoft, OpenAI, and dozens of startups around the world are investing billions of dollars in developing AI-based medical tools, from virtual assistants for patient triage to diagnostic support systems for doctors and radiologists.
The potential is real and is already materializing in clinical applications that, when used correctly, have the power to save lives and improve the quality of care. The question is not whether AI will transform healthcare, because it already is. The question is how to ensure that transformation happens safely and responsibly.
One of the most discussed paths among experts is specific regulation for health chatbots. In the United States, the FDA has already started creating guidelines for AI-based medical software. The European Union has the AI Act, which includes medical devices among the high-risk categories that require rigorous evaluation before reaching the market. In Brazil, ANVISA is also keeping an eye on the topic, although the regulatory process is still in its early stages when it comes to AI applied to medicine.
This regulatory gap is precisely one of the factors that allow tools with no clinical validation to be freely used by millions of people today.
What to keep in mind before consulting a chatbot about health
The reliability of AI chatbots in healthcare will depend heavily on how this technology is developed, regulated, and communicated to the public. Transparent tools that make clear what they can and cannot do, that encourage users to seek professional confirmation, and that are constantly updated with verified medical data have a legitimate and valuable role to play.
Abi herself sums up the attitude anyone should adopt: she keeps using AI chatbots, but she never trusts that something the tool says is absolutely correct. Everything needs to be taken with a healthy degree of caution.
Trust does not need to be total, and it should not be zero. It needs to be calibrated, informed, and aware of the real limits of the technology. And that, by the way, is a responsibility that falls not only on developers but also on everyone who uses these tools in their daily lives. 💡
