Medical statistics errors: How bad data hurts clinicians

Gerald Kuo
Conditions
November 27, 2025

Last winter, a senior nurse in our psychiatric unit told me, “The dashboard says we’re low-risk. But during night shifts, I don’t even feel safe walking to the bathroom.”

The monthly quality report on her desk said the same thing it had said for nearly a year: “Violence incidents: no significant difference among the three wards (p > .05).”

On paper, her ward looked normal. At the bedside, it was anything but.

Her unit cared for more high-acuity patients, had much higher turnover, and used restraints more frequently. Staff were not the problem. Patients were not the problem. The statistics were.

The mistake: treating event counts as if they were average scores

The reassuring report was based on a very common statistical error. The analyst used ANOVA, a method designed to compare averages, to compare counts of violent incidents.

In hospitals, there are two very different kinds of numbers:

  • Counts: How many times something happened (20 violent incidents, 7 falls, 6 code blues).
  • Means: How large something is on average (average documentation hours, average pain scores, average blood pressure).

Counts answer “how many.” Means answer “how much.” They are not interchangeable.

In our hospital, the three wards reported:

  • Ward A (psychiatric): 20 violence incidents
  • Ward B (medical): 7 incidents
  • Ward C (surgical): 6 incidents

To any clinician, the difference is obvious. But ANOVA does not see “20 vs. 7 vs. 6” the way we see it. It transforms them into averages per patient. If each ward cared for about 100 patients, the numbers become:

  • 0.20 incidents per patient
  • 0.07 incidents per patient
  • 0.06 incidents per patient

Once converted, the dramatic difference collapses into three small decimals. Because the events are rare and because ANOVA was designed for continuous measurements rather than yes-or-no outcomes, it can easily conclude that the difference might be random. The official report then states: no significant difference.

It is like using a ruler to decide how many cats you have. The wrong tool makes very different groups appear the same. A chi-square test, which is designed for categorical counts, would almost certainly have flagged Ward A as truly higher risk.

But using the wrong method produced the wrong message: All wards are the same.
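
To make the contrast concrete, here is a minimal sketch in Python of the test the article calls for, using SciPy's chi2_contingency. The roughly 100-patients-per-ward denominators come from the hypothetical conversion above, and treating each incident as a distinct patient is a further simplification for illustration; none of these figures are the hospital's actual census.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical denominators: about 100 patients per ward (illustrative only).
    # Rows: wards A, B, C. Columns: patients with vs. without a violence incident.
    table = np.array([
        [20, 80],   # Ward A (psychiatric)
        [7, 93],    # Ward B (medical)
        [6, 94],    # Ward C (surgical)
    ])

    # Chi-square test of independence: the tool built for categorical counts.
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
    # With these assumed denominators, chi-square is roughly 12.5 and p is
    # roughly .002, flagging Ward A as genuinely different rather than
    # "not significantly" so.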

The human consequences of “no significant difference”

Once the report was distributed, the consequences were immediate and painful.

  • Requests for additional staff from the psychiatric unit were denied. Leadership believed the ward’s risk was not statistically higher.
  • Concerns from frontline nurses were reframed as emotional rather than evidence-based.
  • Administrators felt confident in the p-value, thinking they were being fair.

Meanwhile, the gap between data and reality grew wider.

Nurses learned a frustrating lesson: The numbers on the slide deck do not describe the world they work in. Some left. Those who stayed carried the workload and the emotional weight.

Then the AI system arrived, trained on the same flawed numbers

Three months later, the hospital introduced an AI tool to predict agitation and violence. The idea was simple: train the model on past incidents, then flag high-risk patients.

But the AI learned from the same statistical misunderstanding that claimed all three wards had the same risk. To the algorithm, every ward looked similar.

The psychiatric ward soon became flooded with alerts. Medium-risk patients were labeled high-risk, while genuinely unstable patients were occasionally missed. A junior nurse told me, “When everyone is high-risk, no one is high-risk.”

Alert fatigue set in. A tool designed to increase safety was now undermining trust.

When AI overrules clinical instincts

One busy evening, our 62-year-old attending physician checked the AI overlay on a newly admitted patient. The display showed a calm green label: low risk of agitation.

The charge nurse disagreed. She noticed the patient’s pacing, facial tension, and escalating voice. “I have a bad feeling about this,” she said.

Pressed for time and seeing the AI’s confident label, the attending sided with the model. Ten minutes later, the patient punched a resident in the face.

Afterward, the attending said quietly, “Maybe I’m getting old. Maybe the AI sees things I don’t.”

But the AI was not seeing more. It was repeating the wrong statistics it had been trained on. The harm was not only the physical injury. It was the self-doubt planted in a clinician with decades of experience.

A second problem: stopping at ANOVA and skipping post-hoc tests

Another mistake came from a different type of analysis.

When the hospital compared average documentation time across three departments, ANOVA was correctly used. The p-value was less than 0.01, indicating that at least one department truly differed. But the analysis stopped there. No one asked the next question: Exactly which departments differ from one another?

Post-hoc tests, such as Tukey’s test, answer that question (a brief sketch follows this list). They can reveal findings such as:

  • Department Z documents significantly more than Departments X and Y.
  • Departments X and Y are not significantly different from each other.
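
Here is a minimal sketch of that two-step analysis in Python. Because the article does not report the raw documentation data, the department labels, sample sizes, and times below are invented for illustration.

    import numpy as np
    from scipy.stats import f_oneway, tukey_hsd

    rng = np.random.default_rng(0)

    # Invented documentation times (minutes per shift), 30 clinicians per department.
    # Departments X and Y are drawn from the same distribution; Z is drawn higher.
    dept_x = rng.normal(loc=95, scale=15, size=30)
    dept_y = rng.normal(loc=95, scale=15, size=30)
    dept_z = rng.normal(loc=120, scale=15, size=30)

    # Step 1: the overall ANOVA only says that at least one department differs.
    f_stat, p_overall = f_oneway(dept_x, dept_y, dept_z)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_overall:.4f}")

    # Step 2: Tukey's HSD (scipy.stats.tukey_hsd, SciPy 1.8+) compares each pair.
    # With this simulated data it should show Z differing from X and from Y,
    # while X vs. Y is not significant: a targeted finding, not a blanket one.
    print(tukey_hsd(dept_x, dept_y, dept_z))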

Without that step, leadership responded with a blanket policy: “Everyone must reduce documentation time by 20 minutes.”

The department drowning in paperwork received no targeted help. The other two were forced to cut time they did not have, just to meet a number.

When results like this feed AI models that attempt to identify “inefficient” units, the algorithm quietly learns the same vague message: Everyone is part of the problem.

How these statistical choices affect clinicians

These mistakes do not stay inside spreadsheets. They show up as:

  • False reassurance
  • False alarms
  • Automation bias
  • Erosion of clinical judgment
  • Loss of trust in data and AI
  • Frontline fatigue

This is how bad statistics hurt good clinicians.

The solution is basic, not high-tech

Protecting clinicians in the age of AI starts long before the algorithm. It begins with the data.

  • Use chi-square for event counts.
  • Use ANOVA for averages.
  • Follow ANOVA with post-hoc tests when appropriate.
  • Pair p-values with simple counts and percentages.
  • Recognize that “not significant” does not always mean “no difference.”
  • Teach clinicians just enough statistics to ask, “What exactly are we comparing?”
  • Make sure AI systems learn from correctly analyzed data.

This is not about turning clinicians into statisticians. It is about giving them trustworthy numbers.

AI does not erode clinical judgment; bad data does

When our statistics are wrong, our AI will be wrong. When AI is wrong, clinicians doubt themselves.

AI did not tell the psychiatric nurse her ward was safe. The misused ANOVA did. AI did not weaken the attending’s instincts. A long chain of statistical shortcuts did.

Protecting clinical judgment in the age of AI does not start with the algorithm. It starts with the numbers we feed into it, and with listening to the clinicians who knew something was wrong long before the p-value did.

Gerald Kuo, a doctoral student in the Graduate Institute of Business Administration at Fu Jen Catholic University in Taiwan, specializes in health care management, long-term care systems, AI governance in clinical and social care settings, and elder care policy. He is affiliated with the Home Health Care Charity Association and maintains a professional presence on Facebook, where he shares updates on research and community work. Kuo helps operate a day-care center for older adults, working closely with families, nurses, and community physicians. His research and practical efforts focus on reducing administrative strain on clinicians, strengthening continuity and quality of elder care, and developing sustainable service models through data, technology, and cross-disciplinary collaboration. He is particularly interested in how emerging AI tools can support aging clinical workforces, enhance care delivery, and build greater trust between health systems and the public.
