Medical statistics errors: How bad data hurts clinicians

Gerald Kuo
Conditions
November 27, 2025

Last winter, a senior nurse in our psychiatric unit told me, “The dashboard says we’re low-risk. But during night shifts, I don’t even feel safe walking to the bathroom.”

The monthly quality report on her desk said the same thing it had said for nearly a year: “Violence incidents: no significant difference among the three wards (p > .05).”

On paper, her ward looked normal. At the bedside, it was anything but.

Her unit cared for more high-acuity patients, had much higher turnover, and used restraints more frequently. Staff were not the problem. Patients were not the problem. The statistics were.

The mistake: treating event counts as if they were average scores

The reassuring report was based on a very common statistical error. The analyst used ANOVA, a method designed to compare averages, to compare counts of violent incidents.

In hospitals, there are two very different kinds of numbers:

  • Counts: How many times something happened (20 violent incidents, 7 falls, 6 code blues).
  • Means: How large something is on average (average documentation hours, average pain scores, average blood pressure).

Counts answer “how many.” Means answer “how much.” They are not interchangeable.

In our hospital, the three wards reported:

  • Ward A (psychiatric): 20 violence incidents
  • Ward B (medical): 7 incidents
  • Ward C (surgical): 6 incidents

To any clinician, the difference is obvious. But ANOVA does not see “20 vs. 7 vs. 6” the way we see it. It transforms them into averages per patient. If each ward cared for about 100 patients, the numbers become:

  • 0.20 incidents per patient
  • 0.07 incidents per patient
  • 0.06 incidents per patient

Once converted, the dramatic difference collapses into three small decimals. Because the events are rare, and because ANOVA assumes continuous, roughly normal outcomes rather than rare yes-or-no events per patient, it can easily conclude that the difference is random noise. The official report then states: no significant difference.

It is like using a ruler to decide how many cats you have. The wrong tool makes very different groups appear the same. A chi-square test, which is designed for categorical counts, would almost certainly have flagged Ward A as truly higher risk.

But using the wrong method produced the wrong message: All wards are the same.
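To see the difference concretely, here is a minimal sketch in Python using scipy. The incident counts are the ones above; the figure of about 100 patients per ward is the illustration's assumption, not real data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Incidents vs. incident-free patients per ward, assuming about
# 100 patients per ward (the patient totals are illustrative).
table = np.array([
    [20, 80],  # Ward A (psychiatric)
    [7, 93],   # Ward B (medical)
    [6, 94],   # Ward C (surgical)
])

# Chi-square test of independence: are incident rates the same
# across the three wards?
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.4f}")
# With these numbers, p comes out around 0.002: Ward A's excess
# is very unlikely to be random noise.
```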

The human consequences of “no significant difference”

Once the report was distributed, the consequences were immediate and painful.

  • Requests for additional staff from the psychiatric unit were denied. Leadership believed the ward’s risk was not statistically higher.
  • Concerns from frontline nurses were reframed as emotional rather than evidence-based.
  • Administrators felt confident in the p-value, thinking they were being fair.

Meanwhile, the gap between data and reality grew wider.

Nurses learned a frustrating lesson: The numbers on the slide deck do not describe the world they work in. Some left. Those who stayed carried the workload and the emotional weight.

Then the AI system arrived, trained on the same flawed numbers

Three months later, the hospital introduced an AI tool to predict agitation and violence. The idea was simple: train the model on past incidents, then flag high-risk patients.

But the AI learned from the same statistical misunderstanding that claimed all three wards had the same risk. To the algorithm, every ward looked similar.

The psychiatric ward soon became flooded with alerts. Medium-risk patients were labeled high-risk, while genuinely unstable patients were occasionally missed. A junior nurse told me, “When everyone is high-risk, no one is high-risk.”

Alert fatigue set in. A tool designed to increase safety was now undermining trust.

When AI overrules clinical instincts

During one busy evening, our 62-year-old attending physician checked the AI overlay on a newly admitted patient. The display showed a calm green label: low risk of agitation.

The charge nurse disagreed. She noticed the patient’s pacing, facial tension, and escalating voice. “I have a bad feeling about this,” she said.

Pressed for time and seeing the AI’s confident label, the attending sided with the model. Ten minutes later, the patient punched a resident in the face.

Afterward, the attending said quietly, “Maybe I’m getting old. Maybe the AI sees things I don’t.”

But the AI was not seeing more. It was repeating the wrong statistics it had been trained on. The harm was not only the physical injury. It was the self-doubt planted in a clinician with decades of experience.

A second problem: stopping at ANOVA and skipping post-hoc tests

Another mistake came from a different type of analysis.

When the hospital compared average documentation time across three departments, ANOVA was correctly used. The p-value was less than 0.01, showing a real difference somewhere among the three departments. But an omnibus ANOVA only says that at least one group differs; the analysis stopped there. No one asked the next question: Exactly which departments differ from one another?

Post-hoc tests, such as Tukey’s test, answer that question. They can reveal findings such as:

  • Department Z documents significantly more than Departments X and Y.
  • Departments X and Y are not significantly different from each other.
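In practice, that follow-up is a single extra step. Here is a minimal sketch using statsmodels with simulated documentation hours; the department labels and values are invented for illustration, not the hospital’s actual data.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated per-clinician documentation hours; the means are
# invented so that Department Z is the outlier, as described above.
rng = np.random.default_rng(42)
hours = np.concatenate([
    rng.normal(3.0, 0.5, 30),  # Department X
    rng.normal(3.1, 0.5, 30),  # Department Y
    rng.normal(4.2, 0.5, 30),  # Department Z
])
dept = np.repeat(["X", "Y", "Z"], 30)

# Tukey's HSD compares every pair of departments while controlling
# the family-wise error rate: it answers "which groups differ?"
print(pairwise_tukeyhsd(hours, dept, alpha=0.05))
# Typical output: Z differs from X and from Y (reject = True),
# while X vs. Y is not significant (reject = False).
```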

Without that step, leadership responded with a blanket policy: “Everyone must reduce documentation time by 20 minutes.”

The department drowning in paperwork received no targeted help. The other two were forced to cut time they did not have, just to meet a number.

When results like this feed AI models that attempt to identify “inefficient” units, the algorithm quietly learns the same vague message: Everyone is part of the problem.

How these statistical choices affect clinicians

These mistakes do not stay inside spreadsheets. They show up as:

  • False reassurance
  • False alarms
  • Automation bias
  • Erosion of clinical judgment
  • Loss of trust in data and AI
  • Frontline fatigue

This is how bad statistics hurt good clinicians.

The solution is basic, not high-tech

Protecting clinicians in the age of AI starts long before the algorithm. It begins with the data.

  • Use chi-square for event counts.
  • Use ANOVA for averages.
  • Follow ANOVA with post-hoc tests when appropriate.
  • Pair p-values with simple counts and percentages (see the sketch after this list).
  • Recognize that “not significant” does not always mean “no difference.”
  • Teach clinicians just enough statistics to ask, “What exactly are we comparing?”
  • Make sure AI systems learn from correctly analyzed data.
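As a small illustration of that pairing, a report can print raw counts and percentages next to the test result, so the p-value never travels alone. This is a hypothetical sketch reusing the ward table from earlier.

```python
from scipy.stats import chi2_contingency

# Hypothetical ward data: (name, incidents, patients)
wards = [("A", 20, 100), ("B", 7, 100), ("C", 6, 100)]

# Show magnitudes alongside the test result, not just the p-value.
table = [[inc, n - inc] for _, inc, n in wards]
_, p, _, _ = chi2_contingency(table)
for name, inc, n in wards:
    print(f"Ward {name}: {inc}/{n} patients ({100 * inc / n:.0f}%)")
print(f"chi-square p-value: {p:.4f}")
```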

This is not about turning clinicians into statisticians. It is about giving them trustworthy numbers.

AI does not erode clinical judgment; bad data does

When our statistics are wrong, our AI will be wrong. When AI is wrong, clinicians doubt themselves.

AI did not tell the psychiatric nurse her ward was safe. The misused ANOVA did. AI did not weaken the attending’s instincts. A long chain of statistical shortcuts did.

Protecting clinical judgment in the age of AI does not start with the algorithm. It starts with the numbers we feed into it, and with listening to the clinicians who knew something was wrong long before the p-value did.

Gerald Kuo, a doctoral student in the Graduate Institute of Business Administration at Fu Jen Catholic University in Taiwan, specializes in health care management, long-term care systems, AI governance in clinical and social care settings, and elder care policy. He is affiliated with the Home Health Care Charity Association and maintains a professional presence on Facebook, where he shares updates on research and community work. Kuo helps operate a day-care center for older adults, working closely with families, nurses, and community physicians. His research and practical efforts focus on reducing administrative strain on clinicians, strengthening continuity and quality of elder care, and developing sustainable service models through data, technology, and cross-disciplinary collaboration. He is particularly interested in how emerging AI tools can support aging clinical workforces, enhance care delivery, and build greater trust between health systems and the public.
