In today’s digitized, on-demand world, patients frequently use portals to send their physicians questions and requests. Physicians now receive 57 percent more patient messages than before the pandemic. They spend the highest proportion of their inbox time on these messages, often responding after hours.
While messaging is an essential care access point, high volume strains the thinly stretched health care workforce and may contribute to burnout. Furthermore, when misused, messaging can jeopardize patient safety.
Some health systems have responded by charging patients for messages. Yet charging generates minimal revenue and only reduces volume marginally. As volume continually increases, provider organizations must find ways to manage messages more effectively, efficiently, and sustainably.
Large language models (LLMs), machine learning algorithms that recognize and generate human language, are a form of generative artificial intelligence that could be part of the solution. In late 2022, OpenAI released ChatGPT, an LLM consumer product with an easy-to-use conversational interface. It quickly captured the public’s imagination, becoming the fastest-growing consumer application in history and pushing many businesses to consider incorporating similar technology to boost productivity and improve their services.
Here, we draw on our medical, operational, computer science, and business backgrounds to consider how health care provider organizations may apply LLMs to better manage patient messaging.
How LLMs can add value to patient messaging
Microsoft and Google are incorporating LLMs into their email applications to “read” and summarize messages, then draft responses in particular styles, including the user’s own “voice.” We believe health care providers may harness similar technologies to improve patient messaging, just as some are starting to do for patient result messages, hospital discharge summaries, and insurance letters.
LLMs could add value at each step of the typical messaging workflow.
Step one: The patient composes and sends the message. Often these messages are incomplete (lacking enough detail for staff or clinicians to respond fully), inappropriate (urgent or complex issues that clinical teams cannot manage asynchronously), or unnecessary (the information is already easily accessible online).
LLMs can help by “reading” messages before patients send them and then providing appropriate self-service options (e.g., links to activities or information) and instructions (e.g., directing those who report alarming symptoms to seek immediate care). LLMs may also ask patients to clarify elements of the message (e.g., asking those reporting a rash to define its qualities and upload a photo), thereby reducing multiple back-and-forth messages.
Step two: The message routes to an individual or group inbox. One challenge is routing messages to the right team member. Another is that individuals must open each message individually to determine whether they or someone else should handle it.
LLMs can help by filtering out messages that do not need a human response (e.g., messages such as “Thank you, doc!”). For other messages, LLMs may add priority (e.g., urgent vs. routine) and request type (e.g., clinical vs. non-clinical) labels to help users quickly identify which messages they should – and should not – manage, and when.
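To make this concrete, the filtering and labeling step might look roughly like the sketch below. It assumes the OpenAI Python SDK; the label names come from the workflow described above, while the model name, prompt wording, and the triage_message helper are illustrative assumptions rather than any vendor’s actual implementation.

```python
# A minimal inbox-triage sketch, assuming the OpenAI Python SDK (>=1.0).
# Label names mirror the workflow above; everything else is illustrative.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TRIAGE_PROMPT = (
    "You label patient portal messages. Return JSON with three fields: "
    '"needs_reply" (true or false), "priority" ("urgent" or "routine"), '
    'and "request_type" ("clinical" or "non-clinical").'
)

def triage_message(message_text: str) -> dict:
    """Ask the model to label one inbox message; humans still review the labels."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # placeholder model name
        messages=[
            {"role": "system", "content": TRIAGE_PROMPT},
            {"role": "user", "content": message_text},
        ],
        response_format={"type": "json_object"},  # ask for machine-readable labels
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

print(triage_message("Thank you, doc!"))
# e.g. {"needs_reply": false, "priority": "routine", "request_type": "non-clinical"}
```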
Step three: Health care workers review the message. Often this requires switching between the inbox message and other electronic health record windows to review medications, results, and prior clinical notes.
Here, LLMs can empower humans by summarizing the message, highlighting essential items to address, and displaying applicable contextual information (e.g., relevant test results, active medications, and sections of clinic notes) within the message window.
Step four: Health care workers respond.
LLMs can draft a response written at the patient’s appropriate reading level. These responses can link to sources within the patient’s medical record and from the published medical literature. When indicated, LLMs can also add information to support clinical decisions and pend potential message-related orders, such as prescriptions, referrals, and tests. Human health care workers would review and edit the draft and confirm, delete, or edit any pending orders.
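A drafting step could follow the same pattern. The sketch below shows one way to ask for a patient-friendly draft plus suggested orders that stay pended until a clinician signs off; the prompt wording, model name, and simple JSON format are assumptions for illustration only.

```python
# A drafting sketch: the model returns a patient-friendly draft plus any suggested
# orders. Nothing is sent or ordered automatically; a clinician reviews everything.
import json
from openai import OpenAI

client = OpenAI()

DRAFT_PROMPT = (
    "Draft a reply to this patient portal message at roughly a sixth-grade reading level. "
    'Return JSON with "draft_reply" (a string) and "suggested_orders" '
    "(a possibly empty list of strings such as prescriptions, referrals, or tests)."
)

def draft_reply(message_text: str, chart_context: str) -> dict:
    """Produce a draft and pended-order suggestions for clinician review."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": DRAFT_PROMPT},
            {
                "role": "user",
                "content": f"Chart context:\n{chart_context}\n\nPatient message:\n{message_text}",
            },
        ],
        response_format={"type": "json_object"},
        temperature=0.3,
    )
    return json.loads(response.choices[0].message.content)
```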
In sum, LLMs can make messaging more efficient, while also improving message quality and content. In a recent study comparing physician and ChatGPT-generated responses to patient questions, human evaluators rated the chatbot-generated responses as higher quality and more empathetic.
Integrating LLMs into patient messaging workflows
To apply LLM technology to patient messaging, health care provider organizations and their technology partners must develop, validate, and integrate clinical LLMs into electronic health record (EHR)-based clinical workflows.
To start, they can fine-tune existing LLMs (such as GPT-4 from OpenAI) for clinical use by inputting hundreds of thousands of historical patient messages and associated responses, then instructing the LLM to find pertinent patient information and provide properly formatted responses.
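The exact mechanics depend on the vendor. As a rough illustration, assuming a chat-style fine-tuning API such as OpenAI’s and already de-identified data, the pipeline might look like the following; the example content, file names, and choice of base model are invented, and which base models can be fine-tuned varies by provider.

```python
# A sketch of fine-tuning on historical (de-identified) message/response pairs,
# assuming OpenAI's chat fine-tuning API. Content and file names are invented.
import json
from openai import OpenAI

client = OpenAI()

# Each historical patient message and its clinician response becomes one JSONL
# training example in the chat format the fine-tuning endpoint expects.
example = {
    "messages": [
        {"role": "system", "content": "You draft replies to patient portal messages for clinician review."},
        {"role": "user", "content": "My home blood pressure readings have been around 150/95 this week."},
        {"role": "assistant", "content": "Thank you for sharing your readings. They are higher than we would like..."},
    ]
}
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # in practice: hundreds of thousands of rows

training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)
```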
Next, they would validate the fine-tuned LLM to ensure it achieves sufficient performance. While there are currently no agreed-upon validation methods, options include retrospective performance on a test set of previously unseen patient messages and responses (i.e., none included in the fine-tuning set), as well as prospective performance on a set of new incoming messages.
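A retrospective check might start as simply as the sketch below, which scores only the routing labels against clinician-assigned labels on held-out messages; draft quality would still require human reviewers. The triage_message helper is the one sketched earlier, and the accuracy threshold is purely illustrative.

```python
# A minimal retrospective check on held-out messages (none used in fine-tuning).
# Only routing labels are scored automatically here; draft quality needs humans.
held_out = [
    {"text": "I need a refill of my lisinopril.", "priority": "routine", "request_type": "clinical"},
    {"text": "Please update my mailing address.", "priority": "routine", "request_type": "non-clinical"},
    # ... a much larger, representative sample in practice
]

correct = 0
for case in held_out:
    labels = triage_message(case["text"])  # model under test, sketched above
    if labels["priority"] == case["priority"] and labels["request_type"] == case["request_type"]:
        correct += 1

accuracy = correct / len(held_out)
print(f"Label accuracy on held-out messages: {accuracy:.1%}")
assert accuracy >= 0.95, "Below the (illustrative) bar for moving on to prospective testing"
```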
Once validated, the fine-tuned LLM would be integrated into the EHR using application programming interfaces (APIs) and, through iterative testing and feedback, designed into end users’ messaging workflows.
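One possible integration shape is sketched below using the standard FHIR Communication resource: read the inbound message from the EHR, run the LLM helpers, and write the draft back for a human to review, edit, and send. The base URL, authorization details, and resource identifiers are hypothetical; real EHR vendor APIs will differ.

```python
# An integration sketch over a hypothetical FHIR API. Endpoint, auth, and IDs are
# illustrative; production integrations (e.g., SMART on FHIR) will look different.
import requests

FHIR_BASE = "https://ehr.example.org/fhir"       # hypothetical endpoint
headers = {"Authorization": "Bearer <token>"}    # real auth flows vary

# 1. Read an inbound patient message (FHIR Communication resource).
msg = requests.get(f"{FHIR_BASE}/Communication/12345", headers=headers).json()
text = msg["payload"][0]["contentString"]

# 2. Run the LLM helpers sketched earlier.
labels = triage_message(text)
draft = draft_reply(text, chart_context="...")   # chart context fetched separately

# 3. Store the draft as a new Communication for clinician review before sending.
requests.post(
    f"{FHIR_BASE}/Communication",
    headers=headers,
    json={
        "resourceType": "Communication",
        "status": "preparation",                 # not yet sent to the patient
        "inResponseTo": [{"reference": "Communication/12345"}],
        "payload": [{"contentString": draft["draft_reply"]}],
    },
)
```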
What would have seemed unrealistic just a few months ago is quickly becoming feasible. Through an Epic and Microsoft partnership, several U.S. academic health systems are working to apply LLMs to patient messaging.
Challenges and opportunities
Patients and clinicians may not be ready to accept LLM-assisted patient messaging. Most Americans feel uncomfortable about their health care providers relying on AI. Similarly, most clinicians rate their EHRs – their primary technology tool – unfavorably and may feel skeptical that AI will help them do their jobs better.
Health care organizations may use human-centered design methods to ensure their messaging solutions benefit patients and clinicians. They must routinely measure what matters – including message turnaround time, response quality, workforce effort, patient satisfaction, and clinician experience – and use the results to improve continuously.
LLMs are imperfect and can omit or misrepresent information. Clinicians will remain responsible for providing care that meets or exceeds accepted clinical standards. They must therefore review, verify, and, when indicated, edit LLM-generated messages.
Our regulatory systems must also quickly evolve to enable safe, beneficial innovation. Though these models augment clinicians rather than automate care, the FDA may still evaluate these models as medical devices, requiring developers to validate each software component. This may be impossible for LLMs built on closed-source models (e.g., GPT-4) that do not disclose how they were developed, trained, or maintained.
Technological innovations routinely bring benefits with unanticipated side effects. Patient portal messaging increases care access but often overwhelms clinical teams. As message volume continuously grows, LLMs may be the best way to alleviate the workforce burden and enhance service quality. Health care provider organizations must proceed deliberately to develop safe, reliable, trustworthy solutions that improve messaging while minimizing new side effects of their own.
Spencer D. Dorn and Justin Norden are physician executives.