The hidden risks and rewards of AI scribes in medicine

For decades, physicians have dreamed of a simple technological solution to one of modern medicine’s most corrosive burdens: documentation. The electronic health record promised efficiency but delivered something quite different: extra burden. Physicians now spend hours each evening completing notes, clicking boxes, and reconciling fields, a ritual that has become known as “pajama time.” The keyboard has often replaced eye contact. The screen has displaced the patient. Ambient AI scribes arrived with the promise of reversing this trajectory, though, as I discussed in a prior essay, the gains may be more limited in specialties such as oncology and psychiatry, where nuanced conversations and complex clinical reasoning resist easy transcription. In principle, the concept of employing AI scribes is elegant. Software listens to the conversation between doctor and patient, transcribes it, and generates a structured clinical note that the physician edits and signs. The physician can remain focused on the patient rather than the computer.

Early data suggest the benefits may be real. One large implementation across more than 2.5 million encounters reported dramatic reductions in documentation time, saving physicians an estimated 15,700 hours of note-writing in a single year while improving physician satisfaction and patient interaction. For many clinicians, the technology feels like a long-awaited reprieve.

Yet, as with nearly every technological advance in medicine, the early narrative is more complicated than the promotional headlines suggest. A growing body of research suggests that while AI scribes may alleviate clerical burden, they also introduce new categories of risk, particularly in the subtle terrain where documentation and clinical reasoning intersect. One recent analysis of real-world clinician feedback identified multiple types of patient safety concerns arising from AI-generated notes. These included incorrect medication names and dosages, fabricated or inaccurate medical history, misattribution of statements between patient and clinician, and omissions of key discussion points about diagnoses or treatment decisions. Even more concerning, some reports described what clinicians increasingly recognize as a familiar phenomenon in generative AI: hallucinations. The scribe occasionally inserted diagnoses, exam findings, or clinical details that had never been discussed during the encounter. Such errors are not necessarily common, but their presence is noteworthy (pun intended).

A transcription error in a grocery list is inconsequential. A transcription error in a clinical record can alter treatment decisions, billing codes, or medicolegal documentation. The technology’s promise therefore sits alongside a fundamental clinical reality: Physicians remain responsible for every word in the chart.

Interestingly, physician experience with AI scribes reflects a pattern that historians of technology would recognize instantly. Early adopters report both enthusiasm and unease. Qualitative interviews of physicians using ambient AI scribes reveal overwhelmingly positive reactions regarding workload and patient engagement. Many clinicians report improved eye contact, more natural conversations, and reduced cognitive strain during visits. Yet those same interviews reveal persistent dissatisfaction with accuracy, note construction, and editing requirements. Physicians frequently report transcription mistakes, misheard words, or stylistic issues that require substantial revision before a note can be finalized. The result is an irony: The technology designed to eliminate documentation burden sometimes shifts that burden into a different form, editing rather than writing.

This pattern is not unique to AI scribes. It reflects a broader truth about early technological adoption in medicine. The first generation of any new tool tends to produce two reactions simultaneously: optimism and overconfidence. Optimism is understandable. Physicians who have struggled for years with inefficient documentation systems are eager for anything that restores time and attention to patient care. The possibility of reclaiming the human side of medicine is deeply appealing. Overconfidence, however, can be more subtle. When a technology performs well most of the time, users may begin to trust it more than they should. Automation bias, the human tendency to defer to machine-generated outputs, has been documented across multiple domains of clinical decision support. The same risk applies to documentation systems. If an AI-generated note appears polished and complete, the temptation is to skim rather than scrutinize. That temptation may be particularly strong at the end of a long clinic day.

But a chart is not merely a record of a conversation. It is a clinical artifact that influences downstream care: Consultants read it, nurses rely on it, pharmacists verify medications through it, and lawyers examine it when outcomes are questioned. In other words, the note is not just a summary. It is infrastructure.

There is also a deeper concern that goes beyond accuracy. Documentation has historically been part of the thinking process itself. As one recent JAMA commentary observed, dictation once forced physicians to organize their thoughts aloud, to transform scattered observations into a coherent narrative of clinical reasoning. In that sense, the consult note was not merely documentation but a cognitive pause in which understanding took shape. As AI-generated documentation becomes more automated, some of that reflective work may no longer occur.

This is where the conversation about AI scribes must mature. The question is not whether the technology is useful. By most accounts, it clearly is. Reducing physician documentation burden is a worthy goal, and restoring face-to-face attention between doctors and patients may be one of the most meaningful improvements digital health can deliver. The more important question is how we integrate these tools safely. That integration requires humility about the limits of current systems. It requires rigorous monitoring for errors that may emerge only after widespread deployment. It requires thoughtful workflow design so that physicians remain active editors rather than passive signers of machine-generated notes. Most importantly, it requires preserving a core principle of medical professionalism: Responsibility cannot be automated. Technology can assist. It can augment. It can streamline. But the final act of clinical documentation, the moment when the note becomes part of the patient’s permanent medical record, remains a human responsibility.

Arthur Lazarus is a former Doximity Fellow, a member of the editorial board of the American Association for Physician Leadership, and an adjunct professor of psychiatry at the Lewis Katz School of Medicine at Temple University in Philadelphia. He is the author of several books on narrative medicine and the fictional series Real Medicine, Unreal Stories. His latest book, a novel, is JAILBREAK: When Artificial Intelligence Breaks Medicine.