With help from an artificial language network, MIT neuroscientists have discovered what kind of sentences are most likely to fire up the brain’s key language processing centers.
The new study reveals that sentences that are more complex, either because of unusual grammar or unexpected meaning, generate stronger responses in these language processing centers. Sentences that are very straightforward barely engage these regions, and nonsensical sequences of words don’t do much for them either.
For example, the researchers found this brain network was most active when reading unusual sentences such as “Buy sell signals remains a particular,” taken from a publicly available language dataset called C4. However, it went quiet when reading something very straightforward, such as “We were sitting on the couch.”
“The input has to be language-like enough to engage the system,” says Evelina Fedorenko, Associate Professor of Neuroscience at MIT and a member of MIT’s McGovern Institute for Brain Research. “And then within that space, if things are really easy to process, then you don’t have much of a response. But if things get difficult, or surprising, if there’s an unusual construction or an unusual set of words that you’re maybe not very familiar with, then the network has to work harder.”
Fedorenko is the senior author of the study, which appears today in Nature Human Behavior. MIT graduate student Greta Tuckute is the lead author of the paper.
Processing language
In this study, the researchers focused on language-processing regions found in the left hemisphere of the brain, which includes Broca’s area as well as other parts of the left frontal and temporal lobes of the brain.
“This language network is highly selective to language, but it’s been harder to actually figure out what is going on in these language regions,” Tuckute says. “We wanted to discover what kinds of sentences, what kinds of linguistic input, drive the left hemisphere language network.”
The researchers began by compiling a set of 1,000 sentences taken from a wide variety of sources — fiction, transcriptions of spoken words, web text, and scientific articles, among many others.
Five human participants read each of the sentences while the researchers measured their language network activity using functional magnetic resonance imaging (fMRI). The researchers then fed those same 1,000 sentences into a large language model — a model similar to ChatGPT, which learns to generate and understand language from predicting the next word in huge amounts of text — and measured the activation patterns of the model in response to each sentence.
Once they had all of those data, the researchers trained a mapping model, known as an “encoding model,” which relates the activation patterns seen in the human brain with those observed in the artificial language model. Once trained, the model could predict how the human language network would respond to any new sentence based on how the artificial language network responded to these 1,000 sentences.
The researchers then used the encoding model to identify 500 new sentences that would generate maximal activity in the human brain (the “drive” sentences), as well as sentences that would elicit minimal activity in the brain’s language network (the “suppress” sentences).
In a group of three new human participants, the researchers found these new sentences did indeed drive and suppress brain activity as predicted.
“This ‘closed-loop’ modulation of brain activity during language processing is novel,” Tuckute says. “Our study shows that the model we’re using (that maps between language-model activations and brain responses) is accurate enough to do this. This is the first demonstration of this approach in brain areas implicated in higher-level cognition, such as the language network.”
Linguistic complexity
To figure out what made certain sentences drive activity more than others, the researchers analyzed the sentences based on 11 different linguistic properties, including grammaticality, plausibility, emotional valence (positive or negative), and how easy it is to visualize the sentence content.
For each of those properties, the researchers asked participants from crowd-sourcing platforms to rate the sentences. They also used a computational technique to quantify each sentence’s “surprisal,” or how uncommon it is compared to other sentences.
This analysis revealed that sentences with higher surprisal generate higher responses in the brain. This is consistent with previous studies showing people have more difficulty processing sentences with higher surprisal, the researchers say.
Another linguistic property that correlated with the language network’s responses was linguistic complexity, which is measured by how much a sentence adheres to the rules of English grammar and how plausible it is, meaning how much sense the content makes, apart from the grammar.
Sentences at either end of the spectrum — either extremely simple, or so complex that they make no sense at all — evoked very little activation in the language network. The largest responses came from sentences that make some sense but require work to figure them out, such as “Jiffy Lube of — of therapies, yes,” which came from the Corpus of Contemporary American English dataset.
“We found that the sentences that elicit the highest brain response have a weird grammatical thing and/or a weird meaning,” Fedorenko says. “There’s something slightly unusual about these sentences.”
The researchers now plan to see if they can extend these findings in speakers of languages other than English. They also hope to explore what type of stimuli may activate language processing regions in the brain’s right hemisphere.