Microbial communities are thought to contain keystone species, which can have a disproportionate effect on community stability even when present only at low abundance. Identifying these keystone species can be challenging, especially in the human gut, because it is not feasible to find them by systematically removing species one at a time and observing how the community responds.
A team led by researchers at Brigham and Women’s Hospital, a founding member of the Mass General Brigham healthcare system, has designed a new data-driven keystone species identification (DKI) framework that uses machine learning to address this challenge.
Using a deep-learning model trained on real human gut microbiome data from a curated metagenomic database, the investigators were able to simulate the removal of any species from any gut microbiome sample and predict how the rest of the community would respond. This “thought experiment” enabled them to calculate the “keystoneness,” or relative essentiality, of each species in each community.
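The removal thought experiment can be illustrated with a minimal sketch. It assumes a trained model is available as a function that maps the set of species present in a sample to a predicted relative-abundance profile; here a hand-written toy function (the hypothetical `toy_predict`) stands in for the deep-learning model. The scoring rule is also an assumption for illustration: keystoneness is taken as the Bray-Curtis dissimilarity between the model's prediction after removal and a "null" composition in which the remaining species simply keep their relative proportions, weighted by (1 minus the removed species' abundance) so that low-abundance species with large effects score highly. None of the function names below come from the DKI authors.

```python
from typing import Callable, List, Optional

Composition = List[float]          # relative abundances summing to 1
Predictor = Callable[[List[bool]], Composition]  # species present -> predicted composition


def bray_curtis(p: Composition, q: Composition) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(a + b for a, b in zip(p, q))
    return num / den if den > 0 else 0.0


def null_composition(p: Composition, i: int) -> Composition:
    """Composition expected if removing species i had no effect on the others:
    drop species i and renormalize the remaining abundances."""
    rest = [0.0 if j == i else x for j, x in enumerate(p)]
    total = sum(rest)
    return [x / total for x in rest] if total > 0 else rest


def keystoneness(p: Composition, predict: Predictor, i: int) -> float:
    """Hypothetical keystoneness of species i in sample p (assumed scoring rule)."""
    # Thought experiment: same species collection, minus species i.
    present = [x > 0 and j != i for j, x in enumerate(p)]
    predicted = predict(present)
    # Assumed weighting: rare species that cause large shifts score higher.
    return bray_curtis(predicted, null_composition(p, i)) * (1.0 - p[i])


def keystoneness_profile(p: Composition, predict: Predictor) -> List[Optional[float]]:
    """Score every species present in the sample; absent species get None."""
    return [keystoneness(p, predict, i) if p[i] > 0 else None for i in range(len(p))]


# Toy stand-in for the trained model (hypothetical): species 1 can only
# persist when species 0 is present, so species 0 acts as a keystone.
BASE = [0.1, 0.5, 0.4]


def toy_predict(present: List[bool]) -> Composition:
    w = [BASE[j] if present[j] else 0.0 for j in range(len(BASE))]
    if not present[0]:
        w[1] = 0.0  # species 1 depends on species 0
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


if __name__ == "__main__":
    sample = toy_predict([True, True, True])
    print(keystoneness_profile(sample, toy_predict))
```

In this toy community, removing the low-abundance species 0 also eliminates species 1, so species 0 receives a high score (about 0.5), while removing either of the other species leaves the rest of the community proportionally unchanged and scores 0. Passing the predictor in as a function keeps the scoring logic independent of whichever model is used to generate predictions.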
The scientists found that the predicted keystone species varied across communities. Some species had low median keystoneness across all samples and were unlikely to be essential to any community. By contrast, species with high median scores were likely to act as keystone species in some communities but not in others. Similar patterns were observed in human oral microbiomes and in environmental microbiomes. These results suggest that whether a microbial species acts as a keystone is community-specific, or context-dependent.
Many human gut microbial species are known to perform essential functions, such as breaking down complex starches or maintaining a healthy intestinal environment. The authors used their DKI framework to identify potential keystone species involved in such functions, including one that aids digestion in formula-fed infants and in adults.
“Our DKI framework demonstrates the power of machine learning in tackling a fundamental problem in community ecology,” said Yang-Yu Liu, PhD, of the Channing Division of Network Medicine at Brigham and Women’s Hospital. “Our DKI framework can be adapted to facilitate future data-driven work on complex microbial communities.”