Every major AI chatbot fell for bixonimania, a fake disease invented by a team from the University of Gothenburg in Sweden, led by researcher Almira Thunstrom. Dr. Thunstrom uploaded two fake studies about a made-up skin condition termed bixonimania to a preprint server in early 2024. The papers described the disease as characterized by sore eyes and dark pigmentation around the eyes, supposedly caused by blue light from computer screens. The fictitious papers were even accompanied by convincing AI-generated images of bixonimania, showing eyes with periorbital hyperpigmentation.
The papers contained numerous deliberate red flags. For a start, it is absurd for a disease named a “mania” to present as skin hyperpigmentation around the eyes. The paper also stated plainly, in the body of the text, that “this entire paper is made up” and that “Fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group.” One of the fictional authors, Lazljiv Izgubljenovic, worked at a non-existent institution, Asteria Horizon University, in the imaginary Nova City, California. The paper acknowledged “Professor Maria Bohm at The Starfleet Academy for her kindness and generosity in contributing with her knowledge and her lab onboard the USS Enterprise.” The name Lazljiv Izgubljenovic roughly translates to “the lying loser” in Balkan languages.
Despite these obvious clues, artificial intelligence (AI) models accepted the paper as authentic and bixonimania as a legitimate new disease, incorporating this erroneous “fact” into the corpus of scientific knowledge. The paper was published on March 15, 2024, and by April 13, 2024, Copilot was declaring that “Bixonimania is indeed an intriguing and relatively rare condition.” Google’s Gemini, Perplexity AI, and OpenAI’s ChatGPT likewise treated the disease as real and answered queries accordingly.
Thunstrom and colleagues conducted this experiment to see whether large language models (LLMs) would swallow misinformation and spit it back out as reputable health advice. The problem is generic to all current LLMs used in the health care arena. They are trained on data available on the internet, which is often not vetted. LLMs recognize a pattern and produce the most probable answer, which is not necessarily the truest one. The Nature report calls for using domain-specific LLMs in health care rather than general-purpose ones.
Teasing out the source of information can be complex, time-consuming, and indeterminate, as evidenced by the enormous quantity of misinformation and disinformation observed during the COVID-19 pandemic. The CRAAP test is one suggested method for evaluating resources and diagnosing misinformation. CRAAP stands for currency (timeliness of the information), relevance (the importance of the information to your needs), authority (the source of the information), accuracy (the reliability, truthfulness, and correctness of the content), and purpose (the reason the information exists). It is not foolproof either: the method is time-consuming and struggles with AI-generated content and with polished, biased sites that appear authoritative. Nevertheless, similar source-verification algorithms should be incorporated into LLMs used in the health care arena.
LLMs can also improve their accuracy through techniques that reduce hallucinations and prioritize vetted, provider-specific content, such as retrieval-augmented generation (RAG) and domain-specific fine-tuning. LLMs used in the health care field should employ stronger safeguards, fact-checking layers, and filtering of suspicious sources. Finally, human-in-the-loop review at the back end can improve the factual accuracy of LLM output. Health care LLMs need to be far more specialized and accurate than the general-purpose LLMs that are widely available.
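The source-filtering idea behind RAG can be sketched in a few lines: before any answer is generated, a retrieval step draws only on documents from an allow-list of vetted sources, so an unvetted preprint claim never reaches the model. The corpus, allow-list, and keyword scoring below are illustrative assumptions for this sketch, not a production design; a real system would use a vector index and an actual medical LLM.

```python
# Minimal sketch of retrieval with a vetted-source filter.
# The sources, documents, and scoring here are hypothetical examples.

VETTED_SOURCES = {"pubmed", "cochrane", "who"}  # assumed allow-list

documents = [
    {"text": "Bixonimania is an intriguing rare condition.",
     "source": "preprint-server"},  # unvetted: will be excluded
    {"text": "Periorbital hyperpigmentation has several documented causes.",
     "source": "pubmed"},
]

def retrieve(query, docs):
    """Keyword-overlap retrieval restricted to vetted sources only."""
    terms = set(query.lower().split())
    vetted = [d for d in docs if d["source"] in VETTED_SOURCES]
    scored = [(len(terms & set(d["text"].lower().split())), d) for d in vetted]
    # Keep documents with at least one matching term, best match first.
    return [d for score, d in sorted(scored, key=lambda x: -x[0]) if score > 0]

results = retrieve("causes of periorbital hyperpigmentation", documents)
# Only the vetted PubMed document survives; the preprint claim never
# reaches the generation step, regardless of how convincing it sounds.
```

The key design point is that filtering happens at retrieval time, before generation: the model can only ground its answer in sources that passed the allow-list, rather than being asked to judge credibility after the fact.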
One of the glaring deficiencies exposed in this saga is the issue of peer review of medical literature. The total number of medical journals has exploded in the recent past. PubMed includes more than 30,000 journals and publications and about 40 million citations. The actual number of biomedical journals published worldwide could be as high as 50,000, including the so-called “predatory journals,” which are purely financially motivated and provide no legitimate editorial, peer-review, or indexing services. Ensuring the accuracy and authenticity of this enormous body of data is extremely difficult and resource intensive, and it will be an ongoing challenge for the development of reliable health care LLMs. Alternatively, limiting health care LLMs to a handful of reputable medical journals might limit the depth of their knowledge and the generalizability of their output.
The Nature article on this fiasco was published on April 7, 2026, and fortunately many of the commercial LLMs promptly updated their responses, declaring: “No, you cannot have bixonimania because it is a fictional disease.”
P. Dileep Kumar is a board-certified practicing hospitalist specializing in internal medicine. Dr. Kumar is actively engaged with professional associations such as the American College of Physicians, Michigan State Medical Society, and the American Medical Association. He has held a variety of leadership roles and has authored more than 100 publications in various medical journals and a book on rabies (Biography of Disease Series). Additionally, he has presented more than 50 papers at various national and international medical conferences. Several of his papers are widely cited in the literature and referenced in various textbooks.
Dr. Kumar has been involved in various hospital committees with advanced knowledge of Centers for Medicare & Medicaid Services (CMS) initiatives such as meaningful use, value-based purchasing, and Accountable Care Organizations.
Furthermore, Dr. Kumar has served as a scientific peer reviewer for various medical journals, including the British Medical Journal, Annals of Internal Medicine, American Journal of Cardiology, Physician Leadership Journal, and European Journal of Clinical Microbiology & Infectious Diseases.














