An artificial intelligence program created explanations of heart test results that were in most cases accurate, relevant, and easy for patients to understand, a new study finds.
The study addressed the echocardiogram (echo), which uses sound waves to create pictures of blood flowing through the heart’s chambers and valves. Echo reports include machine-generated numerical measures of function, as well as comments from the interpreting cardiologist on the heart’s size, the pressure in its vessels, and tissue thickness, which can signal the presence of disease. In the form typically generated by doctors, the reports are difficult for patients to understand, often resulting in unnecessary worry, say the study authors.
To address the issue, NYU Langone Health has been testing the capabilities of a form of artificial intelligence (AI) that generates likely options for the next word in any sentence based on how people use words in context on the internet. A result of this next-word prediction is that such generative AI “chatbots” can reply to questions in simple language. However, AI programs — which work based on probabilities instead of “thinking” and may produce inaccurate summaries — are meant to assist, not replace, human providers.
In March 2023, NYU Langone requested from OpenAI, the company that created the ChatGPT chatbot, access to the company’s latest generative AI tool, GPT4. NYU Langone Health licensed one of the first “private instances” of the tool, which freed clinicians to experiment with AI using real patient data while adhering to privacy rules.
Coming out of that effort and publishing online July 31 in the Journal of the American College of Cardiology (JACC) Cardiovascular Imaging, the current study analyzed one hundred doctor-written reports on a common type of echo test to see whether GPT4 could efficiently generate human-friendly explanations of test results. Five board-certified echocardiographers evaluated AI-generated echo explanations on five-point scales for accuracy, relevance, and understandability, and either agreed or strongly agreed that 73% were suitable to send to patients without any changes.
All AI explanations were rated either “all true” (84%) or mostly correct (16%). In terms of relevance, 76% of explanations were judged to contain “all of the important information,” 15% “most of it,” 7% “about half,” and 2% “less than half.” None of the explanations with missing information were rated as “potentially dangerous,” the authors say.
“Our study, the first to evaluate GPT4 in this way, shows that generative AI models can be effective in helping clinicians to explain echocardiogram results to patients,” said corresponding author Lior Jankelson, MD, PhD, associate professor of medicine at the NYU Grossman School of Medicine and Artificial Intelligence Leader for Cardiology at NYU Langone. “Fast, accurate explanations may lessen patient worry and reduce the sometimes overwhelming volume of patient messages to clinicians.”
The federal mandate for the immediate release of test results to patients through the 21st Century Cures Act in 2016 has been linked to dramatic increases in the number of inquiries to clinicians, say the study authors. Patients receive raw test results, do not understand them, and grow anxious while they wait for clinicians to reach them with explanations, the researchers say.
Ideally, clinicians would advise patients about their echocardiogram results the instant they are released, but that is delayed as providers struggle to manually enter large amounts of related information into the electronic health record.
“If dependable enough, AI tools could help clinicians explain results at the moment they are released,” said first study author Jacob Martin, MD, a cardiology fellow at NYU Langone. “Our plan moving forward is to measure the impact of explanations drafted by AI and refined by clinicians on patient anxiety, satisfaction, and clinician workload.”
The new study also found that 16% of the AI explanations contained inaccurate information. In one error, the AI echocardiogram report stated that “a small amount of fluid, known as a pleural effusion, is present in the space surrounding your right lung.” The tool had mistakenly concluded that the effusion was small, an error known in the industry as an AI “hallucination.” The researchers emphasized that human oversight is important to refine drafts from AI, including correcting any inaccuracies before they reach patients.
The research team also surveyed participants without clinical backgrounds who were recruited to get the perspective of lay people on the clarity of AI explanations. In short, they were well received, said the authors. Non-clinical participants found 97% of AI-generated rewrites more understandable than the original reports, which reduced worry in many cases.
“This added analysis underscores the potential of AI to improve patient understanding and ease anxiety,” Martin added. “Our next step will be to integrate these refined tools into clinical practice to enhance patient care and reduce clinician workload.”
Along with Martin and Jankelson, NYU Langone study authors were Muhamed Saric, Alan Vainrib, Daniel Bamira, Samuel Bernard, Richard Ro, Theodore Hill, and Larry Chinitz in the Leon H. Charney Division of Cardiology; Jonathan Austrian and Yindalon Aphinyanaphongs in the Medical Center Information Technology (MCIT); Hao Zhang and Vidya Koesmahargyo in the Center for Healthcare Innovation & Delivery Science in the Department of Population Health; and Mathew Williams in the Department of Cardiothoracic Surgery.