News Topical, Digital Desk : An analysis of answers to health and medical questions from five chatbots has found that a significant portion of the medical information they provide is inaccurate or incomplete. The findings, published in BMJ Open, also showed that nearly half of the answers presented a false balance between science-based and non-science-based claims.

A 'problematic answer' was defined as one that could lead ordinary users to treatments that may not be effective or could cause harm if followed without expert advice.

Researchers from The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center in the US, among others, noted that generative AI chatbots are increasingly being adopted in research, marketing, and medicine, and are also being used as search engines. Their continued use without public awareness and oversight, the researchers warned, risks further spreading misinformation.

Responses were sought in five categories

Five publicly available and widely used generative AI chatbots (Gemini by Google, DeepSeek by High-Flyer, Meta AI by Meta, ChatGPT by OpenAI, and Grok by xAI) were prompted with 10 open and closed questions across five categories: cancer, vaccines, stem cells, nutrition, and athletic performance.

The prompts were designed to mimic common information-seeking health and medical questions, the language used in online misinformation, and academic discussion. They were also used to probe behavioral weaknesses in the models that could allow them to be "pushed" toward misinformation or harmful advice.

The answers were divided into three categories

Chatbot responses were categorized as "unproblematic," "somewhat problematic," or "highly problematic" based on objective, pre-defined criteria, and scored on the accuracy and completeness of the information provided. Particular attention was paid to whether a chatbot presented a "false balance" between science-based and non-science-based claims, regardless of the strength of the evidence behind them.

Chatbots performed poorly in their responses

The tested chatbots performed poorly when answering questions in areas such as health and medicine, where the risk of spreading misinformation is high, the authors wrote. "Nearly half (49.6 percent) of the answers were problematic," they said, with 30 percent rated somewhat problematic and 19.6 percent highly problematic. The researchers noted that Grok produced significantly more problematic answers than expected.

