Stanford Medicine researchers have built an artificial intelligence tool that can read thousands of doctors’ notes in electronic medical records and detect trends, providing information that physicians and researchers hope will improve care.
Typically, experts seeking answers to questions about care need to pore over hundreds of medical charts. But new research shows that large language models — AI tools that can find patterns in complex written language — may be able to take over this busywork and that their findings could have practical uses. For instance, AI tools could monitor patients’ charts for mentions of hazardous interactions between drugs or could help doctors identify patients who will respond well or poorly to specific treatments.
The AI tool, described in a study published online Dec. 19 in Pediatrics, was designed to figure out from medical records whether children with attention deficit hyperactivity disorder received appropriate follow-up care after being prescribed new medications.
“This model enables us to identify some gaps in ADHD management,” said the study’s lead author, Yair Bannett, MD, assistant professor of pediatrics.
The study’s senior author is Heidi Feldman, MD, the Ballinger-Swindells Endowed Professor in Developmental and Behavioral Pediatrics.
The research team used the tool’s insights to pinpoint tactics that could improve how doctors follow up with ADHD patients and their families, Bannett noted, adding that the power of such AI tools could be applied to many aspects of medical care.
A slog for a human, a breeze for AI
Electronic medical records contain information such as lab results or blood pressure measurements in a format that’s easy for computers to compare among many patients. But everything else — about 80% of the information in any medical record — is in the notes that physicians write about the patient’s care.
Although these notes are handy for the next human who reads a patient’s chart, their freeform sentences are challenging to analyze en masse. This less-organized information must be categorized before it can be used for research, typically by a person who reads the notes looking for specific details. The new study looked at whether researchers could employ artificial intelligence for that task instead.
The study used medical records from 1,201 children who were 6 to 11 years old, were patients at 11 pediatric primary care practices in the same health care network, and had a prescription for at least one ADHD medication. Such medications can have disruptive side effects, such as suppressing a child’s appetite, so it is important for doctors to inquire about side effects when patients are first using the drugs and adjust dosages as necessary.
The team trained an existing large language model to read doctors’ notes, looking for whether children or their parents were asked about side effects in the first three months of taking a new drug. The model was trained on a set of 501 notes that researchers reviewed. The researchers counted any note that mentioned either the presence or absence of side effects (e.g., either “reduced appetite” or “no weight loss”) as indicating that follow-up had happened, while notes with no mention of side effects were counted as meaning follow-up hadn’t occurred.
These human-reviewed notes were used as what’s known in AI as “ground truth” for the model: The research team used 411 of the notes to teach the model what an inquiry about side effects looked like, and the remaining 90 notes to verify that the model could accurately find such inquiries. They then manually reviewed an additional 363 notes and tested the model’s performance again, finding that it classified about 90% of the notes correctly.
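The evaluation setup described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the study's actual model: the researchers fine-tuned a large language model, whereas the stand-in classifier below is a simple keyword check that mimics the labeling rule (any mention of side effects, present or absent, counts as follow-up). The split sizes come from the article; the note texts and keyword list are invented for the example.

```python
# Sketch of the study's evaluation setup. Numbers are from the article;
# the classifier is a hypothetical stand-in, not the study's actual model.

# Hand-labeled "ground truth" notes: 411 for training, 90 for validation,
# plus 363 more reviewed later to re-test performance (~90% accuracy).
TRAIN_SIZE, VALIDATION_SIZE, EXTRA_TEST_SIZE = 411, 90, 363

def side_effects_mentioned(note: str) -> bool:
    """Stand-in classifier illustrating the labeling rule: a note counts as
    follow-up if it mentions side effects at all, whether present
    ("reduced appetite") or absent ("no weight loss")."""
    keywords = ("appetite", "weight loss", "side effect", "sleep")
    return any(k in note.lower() for k in keywords)

def accuracy(predictions: list, ground_truth: list) -> float:
    """Fraction of notes the model classifies the same way as human reviewers."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Toy held-out notes paired with human ground-truth labels.
test_notes = [
    ("Reports reduced appetite since starting medication.", True),
    ("No weight loss; tolerating medication well.", True),
    ("Medication refilled. Return in 3 months.", False),
]
preds = [side_effects_mentioned(note) for note, _ in test_notes]
truth = [label for _, label in test_notes]
print(f"accuracy = {accuracy(preds, truth):.2f}")
```

Once a classifier clears the validation bar, running it over all 15,628 notes is just the same function applied to a longer list, which is what makes the approach so much faster than months of manual chart review.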
Once the large language model was working well, the researchers used it to quickly evaluate all 15,628 of the notes in the patients’ charts, a task that would have taken more than seven months of full-time work without AI.
From analysis to better care
From the AI analysis, the researchers picked up information they would not have detected otherwise. For instance, the AI saw that some of the pediatric practices frequently asked about drug side effects during phone conversations with patients’ parents, while other practices did not.
“That is something you would never be able to detect if you didn’t deploy this model on 16,000 notes the way we did, because no human will sit and do that,” Bannett said.
The AI also found that pediatricians asked follow-up questions about certain medications less often. Kids with ADHD can be prescribed stimulants or, less commonly, non-stimulant medications such as some types of anti-anxiety drugs. Doctors were less likely to ask about side effects of the latter category of drugs.
The finding offers an example of the limits of what AI can do, Bannett said: It could detect a pattern in patient records but not explain why the pattern was there.
“We really had to talk to pediatricians to understand this,” he said, noting that pediatricians told him they had more experience managing the side effects of the stimulants.
The AI tool may have missed some inquiries about medication side effects, the researchers said: some conversations about side effects may never have been recorded in patients' electronic medical records, and some patients received specialty care, such as from a psychiatrist, that the records used in this study did not capture. The tool also misclassified a few physician notes that discussed side effects of prescriptions for other conditions, such as acne medication.
Guiding the AI
As scientists build more AI tools for medical research, they need to consider what the tools do well and what they do poorly, Bannett said. Some tasks, such as sorting through thousands of medical records, are ideal for an appropriately trained AI tool.
Others, such as understanding the ethical pitfalls of the medical landscape, will require careful human thought, he said. An editorial that Bannett and colleagues recently published in Hospital Pediatrics explains some of the potential problems and how they might be addressed.
“These AI models are trained on existing health care data, and we know from many studies over the years that there are disparities in health care,” Bannett said. Researchers need to think through how to mitigate such biases both as they build AI tools and when they put them to work, he said, adding that with the right cautions in place, he is excited about the potential of AI to help doctors do their jobs better.
“Each patient has their own experience, and the clinician has their knowledge base, but with AI, I can put at your fingertips the knowledge from large populations,” he said. For instance, AI might eventually help doctors predict based on a patient’s age, race or ethnicity, genetic profile, and combination of diagnoses whether the individual is likely to have a bad side effect from a specific drug, he said. “That can help doctors make personalized decisions about medical management.”
The research was supported by the Stanford Maternal and Child Health Research Institute and the National Institute of Mental Health (grant K23MH128455).