Some artificial intelligence tools for health care may get confused by the ways people of different genders and races talk, according to a new study led by CU Boulder computer scientist Theodora Chaspari.
The study hinges on a perhaps unspoken reality of human society: Not everyone talks the same. Women, for example, tend to speak at a higher pitch than men, while similar differences can pop up between, say, white and Black speakers.
Now, researchers have found that those natural variations could confound algorithms that screen humans for mental health concerns like anxiety or depression. The results add to a growing body of research showing that AI, just like people, can make assumptions based on race or gender.
“If AI isn’t trained well, or doesn’t include enough representative data, it can propagate these human or societal biases,” said Chaspari, associate professor in the Department of Computer Science.
She and her colleagues published their findings July 24 in the journal Frontiers in Digital Health.
Chaspari noted that AI could be a promising technology in the health care world. Finely tuned algorithms can sift through recordings of people speaking, searching for subtle changes in the way they talk that could indicate underlying mental health concerns.
But those tools have to perform consistently for patients from many demographic groups, the computer scientist said. To find out if AI is up to the task, the researchers fed audio samples of real humans into a common set of machine learning algorithms. The results raised a few red flags: The AI tools, for example, seemed to miss depression risk more often in women than in men, an outcome that, in the real world, could keep people from getting the care they need.
“With artificial intelligence, we can identify these fine-grained patterns that humans can’t always perceive,” said Chaspari, who conducted the work as a faculty member at Texas A&M University. “However, while there is this opportunity, there is also a lot of risk.”
Speech and emotions
She added that the way humans talk can be a powerful window into their underlying emotions and well-being, something that poets and playwrights have long known.
Research suggests that people diagnosed with clinical depression often speak more softly and in more of a monotone than others. People with anxiety disorders, meanwhile, tend to talk with a higher pitch and with more "jitter," a measure of small, cycle-to-cycle irregularities in the pitch of the voice.
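To make "jitter" a little more concrete, here is a minimal sketch of one common definition, often called local jitter: the average absolute difference between the lengths of consecutive vocal-fold vibration cycles (pitch periods), divided by the average period. This is an illustrative assumption, not necessarily the feature the study's algorithms used, and the function name `local_jitter` is hypothetical. It assumes the pitch periods, in seconds, have already been extracted from a recording.

```python
def local_jitter(periods):
    """Hypothetical helper: local jitter of a voice sample.

    periods: consecutive pitch-period durations in seconds (assumed already
    extracted from the audio). Returns the mean absolute difference between
    neighboring periods divided by the mean period (a unitless ratio).
    """
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    # Absolute change in cycle length from one vibration to the next
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


steady = [0.005] * 5                       # perfectly regular voice (200 Hz)
shaky = [0.0050, 0.0052, 0.0049, 0.0051]   # slightly irregular cycles
print(local_jitter(steady))  # 0.0
print(local_jitter(shaky))   # roughly 0.046, i.e. about 4.6% jitter
```

A perfectly steady voice scores zero; the more the cycle lengths wobble relative to their average, the higher the value.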
“We know that speech is very much influenced by one’s anatomy,” Chaspari said. “For depression, there have been some studies showing changes in the way vibrations in the vocal folds happen, or even in how the voice is modulated by the vocal tract.”
Over the years, scientists have developed AI tools to look for just those kinds of changes.
Chaspari and her colleagues decided to put the algorithms under the microscope. To do that, the team drew on recordings of humans talking in a range of scenarios: In one, people had to give a 10- to 15-minute talk to a group of strangers. In another, men and women talked for a longer time in a setting similar to a doctor's visit. In both cases, the speakers separately filled out questionnaires about their mental health. The research team included Michael Yang and Abd-Allah El-Attar, undergraduate students at Texas A&M.
Fixing biases
The results seemed to be all over the place.
In the public speaking recordings, for example, the Latino participants reported that they felt a lot more nervous on average than the white or Black speakers. The AI, however, failed to detect that heightened anxiety. In the second experiment, the algorithms also flagged equal numbers of men and women as being at risk of depression. In reality, the female speakers had experienced symptoms of depression at much higher rates.
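One simple way to check for the kind of gap described above is to compare, group by group, what share of people who were actually at risk the model flagged (the detection rate, or recall). The sketch below is an illustrative assumption, not the study's evaluation code; the function name `recall_by_group` is hypothetical and the records are made-up toy values, not data from the study.

```python
from collections import defaultdict


def recall_by_group(records):
    """Hypothetical helper: per-group detection rate (recall).

    records: iterable of (group, truly_at_risk, flagged_by_model) tuples.
    Returns {group: flagged at-risk people / all at-risk people}.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for group, at_risk, flagged in records:
        if at_risk:
            positives[group] += 1
            if flagged:
                hits[group] += 1
    return {g: hits[g] / positives[g] for g in positives}


# Toy, made-up records: (group, at risk per questionnaire?, flagged by model?)
records = [
    ("A", True, True), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", False, True),
]
print(recall_by_group(records))  # A: about 0.67, B: about 0.33
```

In this toy example the model catches two of three at-risk people in group A but only one of three in group B, so it underdiagnoses group B even though it flags the same number of people (two) in each group.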
Chaspari noted that the team’s results are just a first step. The researchers will need to analyze recordings of a lot more people from a wide range of demographic groups before they can understand why the AI fumbled in certain cases — and how to fix those biases.
But, she said, the study is a sign that AI developers should proceed with caution before bringing AI tools into the medical world:
“If we think that an algorithm actually underestimates depression for a specific group, this is something we need to inform clinicians about.”