Keeping a human in the loop: Managing the ethics of AI in medicine


Artificial intelligence (AI) — of ChatGPT fame — is increasingly used in medicine to improve diagnosis and treatment of diseases, and to avoid unnecessary screening for patients. But AI medical devices could also harm patients and worsen health inequities if they are not designed, tested, and used with care, according to an international task force that included a University of Rochester Medical Center bioethicist.

Jonathan Herington, PhD, was a member of the AI Task Force of the Society of Nuclear Medicine and Molecular Imaging, which laid out recommendations on how to ethically develop and use AI medical devices in two papers published in the Journal of Nuclear Medicine. In short, the task force called for increased transparency about the accuracy and limits of AI and outlined ways to ensure all people have access to AI medical devices that work for them, regardless of their race, ethnicity, gender, or wealth.

While the burden of proper design and testing falls to AI developers, health care providers are ultimately responsible for properly using AI and shouldn’t rely too heavily on AI predictions when making patient care decisions.

“There should always be a human in the loop,” said Herington, who is assistant professor of Health Humanities and Bioethics at URMC and was one of three bioethicists added to the task force in 2021. “Clinicians should use AI as an input into their own decision making, rather than replacing their decision making.”

This requires that doctors truly understand how a given AI medical device is intended to be used, how well it performs at that task, and any limitations — and they must pass that knowledge on to their patients. Doctors must weigh the relative risks of false positives versus false negatives for a given situation, all while taking structural inequities into account.

When using an AI system to identify probable tumors in PET scans, for example, health care providers must know how well the system performs at identifying this specific type of tumor in patients of the same sex, race, ethnicity, etc., as the patient in question.
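As a rough illustration of what performance "for patients like this one" could mean in practice, the sketch below breaks sensitivity and specificity down by patient subgroup. It is a minimal sketch, not the task force's method: the function name `subgroup_performance`, the record layout, and the group labels are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def subgroup_performance(records):
    """Summarize detection performance per patient subgroup.

    `records` is an iterable of (group, has_tumor, flagged) tuples:
    a hypothetical layout where `has_tumor` is the confirmed diagnosis
    and `flagged` is whether the AI system marked the scan.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, has_tumor, flagged in records:
        if has_tumor:
            key = "tp" if flagged else "fn"
        else:
            key = "fp" if flagged else "tn"
        counts[group][key] += 1

    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        report[group] = {
            "n": positives + negatives,
            # Share of real tumors the system caught in this group.
            "sensitivity": c["tp"] / positives if positives else None,
            # Share of tumor-free scans the system correctly left unflagged.
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return report


# Made-up records for two hypothetical subgroups, "A" and "B".
example = [
    ("A", True, True),
    ("A", True, False),
    ("A", False, False),
    ("B", True, True),
    ("B", False, True),
]
for group, stats in subgroup_performance(example).items():
    print(group, stats)
```

A report like this only helps if each subgroup contains enough patients; a sensitivity computed from a handful of scans says little about how the system will perform for the next patient in that group.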

“What that means for the developers of these systems is that they need to be very transparent,” said Herington.

According to the task force, it’s up to the AI developers to make accurate information about their medical device’s intended use, clinical performance, and limitations readily available to users. One way they recommend doing that is to build alerts right into the device or system that inform users about the degree of uncertainty of the AI’s predictions. That might look like heat maps on cancer scans that show whether areas are more or less likely to be cancerous.

To minimize that uncertainty, developers must carefully define the data they use to train and test their AI models, and should use clinically relevant criteria to evaluate the model’s performance. It’s not enough to simply validate algorithms used by a device or system. AI medical devices should be tested in so-called “silent trials,” meaning their performance would be evaluated by researchers on real patients in real time, but their predictions would not be available to the health care provider or applied to clinical decision making.

Developers should also design AI models to be useful and accurate in all contexts in which they will be deployed.

“A concern is that these high-tech, expensive systems would be deployed in really high-resource hospitals, and improve outcomes for relatively well-advantaged patients, while patients in under-resourced or rural hospitals wouldn’t have access to them — or would have access to systems that make their care worse because they weren’t designed for them,” said Herington.

Currently, AI medical devices are being trained on datasets in which Latino and Black patients are underrepresented, meaning the devices are less likely to make accurate predictions for patients from these groups. In order to avoid deepening health inequities, developers must ensure their AI models are calibrated for all racial and gender groups by training them with datasets that represent all of the populations the medical device or system will ultimately serve.
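One simple way to picture what "calibrated for all groups" means is to compare, group by group, the average risk a model predicts with the rate at which the condition actually occurs. The sketch below assumes a hypothetical record layout of (group, predicted probability, observed outcome); the function name `calibration_gap_by_group` is invented for this example, and a real evaluation would use larger samples and finer-grained calibration measures.

```python
def calibration_gap_by_group(records):
    """Return mean predicted probability minus observed event rate per group.

    `records` is an iterable of (group, predicted_probability, outcome)
    tuples, where `outcome` is 1 if the condition was confirmed and 0
    otherwise. A gap near 0 suggests the model's average prediction
    matches reality for that group; a large positive gap means it
    overestimates risk, a large negative gap means it underestimates it.
    """
    totals = {}
    for group, probability, outcome in records:
        t = totals.setdefault(group, {"prob": 0.0, "events": 0, "n": 0})
        t["prob"] += probability
        t["events"] += outcome
        t["n"] += 1

    return {
        group: t["prob"] / t["n"] - t["events"] / t["n"]
        for group, t in totals.items()
    }


# Made-up predictions for two hypothetical groups, "A" and "B".
example = [
    ("A", 0.5, 1),
    ("A", 0.5, 0),
    ("B", 0.9, 0),
    ("B", 0.7, 1),
]
for group, gap in calibration_gap_by_group(example).items():
    print(group, round(gap, 2))
```

If the gap is small for well-represented groups but large for underrepresented ones, that is one sign the training data did not reflect the population the device is meant to serve.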

Though these recommendations were developed with a focus on nuclear medicine and medical imaging, Herington believes they can and should be applied to AI medical devices broadly.

“The systems are becoming ever more powerful all the time and the landscape is shifting really quickly,” said Herington. “We have a rapidly closing window to solidify our ethical and regulatory framework around these things.”
