In 2008, I gave the keynote address at the first “Diagnostic Errors in Medicine” conference, sponsored by the Agency for Healthcare Research and Quality (AHRQ). The meeting was filled with people from a wide variety of disciplines, including clinical medicine, education, risk management, cognitive science, and informatics, all passionate about making diagnosis safer. The atmosphere was electric. My lecture was entitled “Why diagnostic errors don’t get any respect” (I wrote up the speech in my blog and a Health Affairs article).
My talk was, admittedly, a downer. Highlighting the fact that diagnostic errors are arguably the most important patient safety hazard (they accounted for 17% of the adverse events in the famous Harvard Medical Practice Study and are usually the number one cause of harm in malpractice cases), I pointed out that from the very start of the patient safety field, relatively little attention had been paid to them. One tangible manifestation: the term “medication errors” is mentioned 70 times in the Institute of Medicine’s seminal “To Err is Human” report, while the term “diagnostic errors” comes up only twice.
I was pleased to be invited back to give a keynote at this year’s sixth annual conference, which took place in Chicago in mid-September. The landscape has changed significantly since that first meeting. Leaders have emerged: Mark Graber, Hardeep Singh, Pat Croskerry, Gordy Schiff, Eta Berner, David Newman-Toker, and others. Graber and colleagues launched the Society to Improve Diagnosis in Medicine (SIDM), which will soon premiere a new journal, Diagnosis. There have been many publications in both the medical and lay literature related to topics like heuristics and metacognition, subjects that were previously deemed wonky and arcane. I knew we were making progress when, about two years ago, in our UCSF Department of Medicine M&M conference, one of the residents began discussing a complex patient admitted through the emergency department and said, “I’d be worried about this being pulmonary embolism, but I’d also be concerned that I’d be falling into the trap of an anchoring error.” I nearly applauded.
And there’s more progress to celebrate. Several promising papers have described innovations such as using diagnostic trigger tools and patient-reported outcomes to measure the frequency of diagnostic errors. AHRQ has encouraged research in this area, and a search on AHRQ PSNet shows that 471 studies have addressed the question of diagnostic errors, a marked uptick from the early days of the patient safety field. New computer tools, such as IBM’s Watson for Health and Isabel, are getting better; it is no longer a pipe dream to believe that computers will help doctors be better diagnosticians in the next couple of years, and may even replace doctors as diagnosticians, at least in straightforward cases, within a decade. Studies by Singh and others have reminded us that while a Watson may be the sexiest use of IT to improve diagnosis, computers are already helping improve diagnostic accuracy in more mundane ways: by making key information, such as laboratory, x-ray, or pathology results, available to the clinician who needs them at the diagnostic moment of truth.
In other words, the issue of diagnostic errors is beginning to get the attention it deserves. And yet, with all of this progress, I can’t honestly report that my talk was much more optimistic than the one I delivered six years earlier. Yes, diagnostic errors have climbed onto the patient safety radar screen, but they’re out in the periphery, blinking a pale glow compared to the more centrally located shining stars (like checklists and CPOE) that capture everyone’s attention.
In my talk, I traced the timeline of the patient safety field since the IOM report’s publication in 2000, highlighting some of the key policy advances such as residency duty hours limits, the CLABSI and surgical checklist movements, the National Quality Forum’s “never events” list, and Medicare’s public reporting of safety-related processes and outcomes and recent launch of value-based purchasing. I pointed out that virtually none of these policy initiatives — which have finally created a business case for safety, at least in hospitals — have focused on diagnostic errors. For example, none of the 29 serious preventable events on the NQF list — events that must now be reported in a majority of U.S. states — relate to diagnostic errors. Similarly, none of the publicly reported measures on Medicare’s Hospital Compare website, nor any of the components of value-based purchasing, relate to diagnostic accuracy. Here’s how I ended my 2010 Health Affairs article:
As one vivid example of how far we need to go, a hospital today could meet the standards of a high-quality organization and be rewarded through public reporting and pay-for-performance initiatives for giving all of its patients diagnosed with heart failure, pneumonia, and heart attack the correct, evidence-based, and prompt care – even if every one of the diagnoses was wrong.
Sadly, this statement remains true today.
This might not make me feel so bad if I were a proceduralist. But as a general internist and hospitalist, most of what I do for a living is to try to diagnose patients correctly. The healthcare world has only so much time, money, and attention. To the degree that the safety and quality fields turn their back on diagnostic accuracy, so too will healthcare system leaders, deans and program directors, and practicing physicians.
Of course, one of the main problems remains the absence of a feasible, credible measure of diagnostic accuracy — something that could go toe to toe with measures such as rates of readmissions, central line infections, hand hygiene, or pressure ulcers. During an early morning brainstorming session in Chicago with many of the field’s leaders, I sensed a passionate, nearly frenetic, interest in finding even a single plausible measure of diagnostic expertise that could be pitched to the National Quality Forum for endorsement and to Medicare for public reporting and payment policy. Among the ideas floated: documenting whether a differential diagnosis was recorded, tracking whether patients’ admitting and discharge diagnoses differed, and asking patients on post-hospital or post-clinic surveys whether they had been victims of a diagnostic error.
While I understand the desperation, I counseled, both during that morning session and in my keynote speech, that placing a bad diagnosis measure in the public reporting and pay-for-performance worlds would be worse than having no measure at all. While the desire to be on the Centers for Medicare & Medicaid Services’ (CMS) radar screen is understandable, until diagnostic errors have a credible structure, process, or outcome measure, I believe that Medicare should not be the first place to look — it should be the last.
There may well come a day when a tool such as Isabel has been proven sufficiently beneficial that having it as a structural proxy for diagnostic accuracy (or at least for the commitment to improve diagnosis) would be a good idea. Similarly, we may ultimately find that certain triggers (perhaps a change from admission to discharge diagnosis, or from preoperative to postoperative diagnosis) are useful measures of diagnostic accuracy. Or that other triggers, such as readmissions or deaths in patients with low predicted mortality, can lead to chart reviews that reveal diagnostic errors.
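To make the trigger idea concrete, here is a minimal sketch, in Python, of how such rules might flag charts for manual review. Everything in it is an assumption for illustration: the “Encounter” fields, the 5% mortality threshold, and the naive string comparison of diagnoses are inventions, not a validated trigger tool (real tools would compare coded diagnoses and tune thresholds against chart review).

```python
from dataclasses import dataclass

@dataclass
class Encounter:
    """Hypothetical encounter record; field names are illustrative, not a real schema."""
    patient_id: str
    admit_dx: str               # admitting diagnosis (free text here for simplicity)
    discharge_dx: str           # discharge diagnosis
    died: bool                  # in-hospital death
    predicted_mortality: float  # from a severity-of-illness model, 0.0 to 1.0

def trigger_flags(enc: Encounter, mortality_threshold: float = 0.05) -> list:
    """Return the trigger rules this encounter fires; flagged charts go to manual review."""
    flags = []
    # Trigger 1: admitting and discharge diagnoses differ, a possible
    # (though far from certain) marker of an initial misdiagnosis.
    if enc.admit_dx.strip().lower() != enc.discharge_dx.strip().lower():
        flags.append("admission/discharge diagnosis changed")
    # Trigger 2: death in a patient whose predicted mortality was low;
    # unexpected deaths are reasonable candidates for diagnostic-error review.
    if enc.died and enc.predicted_mortality < mortality_threshold:
        flags.append("death despite low predicted mortality")
    return flags

# Example: this encounter fires both triggers and would be queued for chart review.
enc = Encounter("pt-001", "pneumonia", "pulmonary embolism",
                died=True, predicted_mortality=0.02)
print(trigger_flags(enc))
```

The point of such triggers is not to measure error rates directly; they merely select the charts whose human review may reveal a diagnostic error, which is why they remain screening tools rather than accountability measures.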
But until that day arrives, I would be looking to other organizations to promote the diagnosis agenda. Obviously I’m biased here (as last year’s ABIM chair), but it seems to me that the soon-to-be-launched program of continuous Maintenance of Certification (MOC) holds great promise as a way of measuring whether physicians are keeping up with the literature and capable of the analytic work needed to diagnose patients correctly; MOC also has the advantage of being specialty-specific. In addition, accrediting organizations such as the Accreditation Council for Graduate Medical Education (ACGME) or the Joint Commission (TJC) could build into their assessments a review of whether hospitals and training programs are putting sufficient energy into the problem of diagnostic errors. For example, what if TJC required hospitals and ACGME required residency programs to prove that clinicians receive feedback on patients of theirs who were later found to have different diagnoses? (Very few programs have systematized this, so clinicians who misdiagnose patients who end up returning to the hospital or ED often never hear about it.) Or that physicians participate in discussions of diagnostic errors at M&Ms and other appropriate forums. Or that healthcare organizations demonstrate that their information technology systems include modalities to support diagnosis (perhaps electronic textbooks like UpToDate or AccessMedicine, or decision support tools like Isabel). Of course, in the absence of a hard endpoint, there is a possibility that such measures will be applied arbitrarily by accreditors or be “gamed” by clinicians or leaders. But I think that risk is outweighed by the benefit of pushing institutions to focus on diagnosis and innovate on both measures and solutions.
Of course, we need better measures of diagnostic accuracy, and evidence-based interventions proven to help us reach the right diagnoses. To have any hope of cracking these nuts, we need far more research, and a more secure research funding stream. But until we have these things, let’s focus on the leverage we do have — through hospital and training program accreditors, and the MOC process.
Three years ago, I wrote a New England Journal of Medicine article with TJC president Mark Chassin and other TJC staffers in which we called for a high bar for “accountability measures,” the ones used in public reporting and pay-for-performance. Let’s not let our passion for promoting accurate diagnosis, or our impatience, cause us to lower that bar. Doing so would be a short-term win but a long-term loss.
Bob Wachter is professor of medicine, University of California, San Francisco. He coined the term “hospitalist” and is one of the nation’s leading experts in health care quality and patient safety. He is author of Understanding Patient Safety, Second Edition, and blogs at Wachter’s World, where this post originally appeared.