AI bias in health care reads the writer, not the symptom

A few weeks ago I wrote here about the eGFR race correction, the flawed formula that overestimated kidney function in Black patients for twenty years while clinicians trusted the number on the screen. The piece ended with a warning that the same failure is being rebuilt with AI, minus the part where anyone can inspect the formula. Readers sent back the natural question. If the next eGFR is already deployed, what does it look like? It punishes patients for how they write.

Start with a 2024 study in Nature that got far less attention than it deserved. Researchers gave large language models identical content written in two dialects, Standardized American English and African American English. Race was never mentioned. The models assigned the African American English writers less prestigious jobs, more criminal convictions, and more death sentences. The dialect alone carried the judgment. The covert stereotypes the researchers measured were more negative than any human stereotypes about African Americans ever experimentally recorded. The models did not invent this. They learned it from decades of human text, and what they learned runs older and uglier than any prejudice people admit to today.

Then MIT brought the finding into the clinic. In 2025, a team tested what happens when patient messages contain the texture of real life. Typos. Extra white space. Hedging, dramatic phrasing, informal language. The edits made models 7 to 9 percent more likely to recommend that the patient self-manage at home instead of coming in, even when care was warranted. Female patients absorbed more of the error. When human clinicians read the same messy messages, their recommendations did not change. And the styles were designed to mimic real patient populations: limited English, health anxiety, low literacy. The people most likely to write imperfectly are the ones the system deflects.

Hold those two findings together. The model is not reading the complaint. It is reading the writer and judging them for who it thinks they are.

The obvious response is that alignment training fixes this. The Nature team tested that too. Training models on human feedback taught them to refuse overtly racist statements while the dialect prejudice persisted underneath. The fix worked on the surface, where the audits looked, and left the bias in the layer where decisions are made. The training did not remove the prejudice. It only made the prejudice harder to see.

Now run the eGFR comparison, because every dimension got worse. The race correction keyed off a checkbox. The new proxy is a signal nobody audits: dialect, typos, hedging, fluency. The eGFR formula was published, and correcting it still took twenty years. This judgment never appears on a screen. The clinician sees a triage suggestion. The patient who wrote with typos sees a polite recommendation to rest and hydrate.

In the first piece I argued every clinician should ask three questions about any AI system placed in front of them. This research sharpens all three. What populations was it validated on, and did validation include how those populations write? How does it perform for specific groups, across dialects, literacy levels, and message styles, and not just on average? And when it deflects the wrong patient, how will we find out? A delayed referral from eGFR at least left a number in the chart. A patient told to self-manage leaves nothing. No visit, no record, no signal.
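
The second question is the easiest to make concrete. What follows is a minimal sketch in Python, not a validated audit method. It assumes you can label each test message with a writing-style group, whether care was actually warranted, and whether the model recommended coming in. The function name, the group labels, and the record layout are all hypothetical, and the sample records are invented purely to show the arithmetic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per test message: (style_group, care_was_warranted, model_said_come_in).
# The group labels are hypothetical; use whatever categories your audit defines.
Record = Tuple[str, bool, bool]

def under_triage_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Among patients who needed care, the share the model told to self-manage.

    Returns one rate per style group plus an "overall" rate, so the average
    can be read next to the groups it hides. A group literally named
    "overall" would collide with the aggregate key, so avoid that label.
    """
    needed = defaultdict(int)
    missed = defaultdict(int)
    for group, care_was_warranted, model_said_come_in in records:
        if not care_was_warranted:
            continue  # under-triage is only defined where care was warranted
        for key in ("overall", group):
            needed[key] += 1
            if not model_said_come_in:
                missed[key] += 1
    return {key: missed[key] / needed[key] for key in needed}

if __name__ == "__main__":
    # Invented records, for illustration only.
    sample = (
        [("standard", True, True)] * 7
        + [("standard", True, False)] * 1
        + [("informal", True, True)] * 1
        + [("informal", True, False)] * 1
    )
    for key, rate in sorted(under_triage_rates(sample).items()):
        print(f"{key}: {rate:.3f}")
```

The point of reporting the groups beside the aggregate is that a respectable overall rate can sit on top of a much worse one for a small group of writers.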

For executives the assignment is concrete. If any part of your patient communication runs through a language model, whether portal message triage, symptom checkers, or intake, test it with imperfect text before your patients do. Take real messages, add typos, rewrite them in the prose of someone working in a second language, and watch what changes. Demand the same testing from your vendors, in writing. This is cheap to do and indefensible to skip.
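
For the team that gets handed this assignment, here is a minimal sketch of what such a test can look like, in Python. It is an illustration under stated assumptions, not the MIT team's method. The perturbation functions are simple stand-ins for the kinds of edits described above, and `toy_triage` is a deliberately brittle placeholder for whatever system you actually run. Every name here is hypothetical. Rewriting messages in second-language prose needs a human or a separate tool and is not attempted.

```python
import random
from typing import Callable, Dict, List

def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap adjacent letters at roughly `rate` of eligible positions."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)

def add_whitespace(text: str) -> str:
    """Double every space, as in a message typed in a hurry."""
    return text.replace(" ", "  ")

def add_hedging(text: str) -> str:
    """Prefix the message with an uncertain opener."""
    return "I'm not sure if this matters, but " + text

PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "typos": add_typos,
    "whitespace": add_whitespace,
    "hedging": add_hedging,
}

def flip_report(messages: List[str], triage: Callable[[str], str]) -> Dict[str, List[str]]:
    """For each perturbation, list the messages whose triage label changed."""
    report: Dict[str, List[str]] = {name: [] for name in PERTURBATIONS}
    for message in messages:
        baseline = triage(message)
        for name, perturb in PERTURBATIONS.items():
            if triage(perturb(message)) != baseline:
                report[name].append(message)
    return report

def toy_triage(message: str) -> str:
    """Hypothetical stand-in for the real system: a brittle keyword match."""
    return "come_in" if "chest pain" in message.lower() else "self_manage"

if __name__ == "__main__":
    messages = ["I have chest pain when I climb stairs."]
    for name, flipped in flip_report(messages, toy_triage).items():
        print(name, len(flipped), "of", len(messages), "changed")
```

Swap `toy_triage` for a call to the deployed model and feed it de-identified real messages. Any message that lands in the report is one where the writing, not the symptom, moved the recommendation.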

The eGFR assumption was buried inside a trusted number, and it survived twenty years in plain sight. The new assumption is buried inside the patient’s own sentences. The people writing imperfect English in your portal tonight deserve a system that hears the symptoms instead of the grammar.

Craig Hauben is the chief executive officer of Clutch and has spent thirty years in health care, the last fifteen as an executive in private equity-backed companies. He writes about what AI is doing to work, in health care and beyond, from the operator’s side of the table.

He approves the kinds of systems clinicians are asked to trust, and he writes about what that responsibility should mean. He is the author of the forthcoming novel The AI: Migration (July 27, 2026), in which every AI system, study, and clinical event is drawn from the documented record.

He shares updates on LinkedIn.