Scientists led by Helmholtz Munich have developed an accessible software solution designed specifically for analyzing complex health data. The open-source software, called ‘ehrapy’, enables researchers to structure and systematically examine large, heterogeneous datasets. It is available to the global scientific community to use and develop further.
Ehrapy is intended to fill a critical gap in the analysis of health data, says Lukas Heumos, one of the main developers and a scientist at the Institute of Computational Biology at Helmholtz Munich and the Technical University of Munich (TUM): “Until now, there have been no standardized tools for systematically and efficiently analyzing diverse and complex medical data. We’ve changed that with ehrapy.” The team behind ehrapy comes from biomedical research and has extensive experience in analyzing complex scientific datasets. “The healthcare sector faces similar challenges in data analysis as those working in laboratories,” noted Heumos at the start of the ehrapy project.
Exploratory Approach: Hypothesis-Free Analysis
Together with many other contributors, Heumos has used his expertise in scientific software development to create a solution for analyzing patient data: “Ehrapy can uncover new patterns and generate insights without needing to analyze the data based on a specific assumption or hypothesis.” This exploratory approach, says Heumos, is a unique feature of ehrapy.
Ehrapy allows researchers to sort, group, and analyze large, heterogeneous, and complex datasets without any pre-existing hypotheses. Patterns that emerge in this way can then be investigated in targeted follow-up analyses. Heumos explains: “The exploratory approach brings fresh perspectives to health data analysis. Due to their complexity and heterogeneity, these data are often not analyzed as effectively as they could be.” Ehrapy thus opens new avenues for making health data more useful for medical research and practice.
The Long-Term Goal: Routine Use in Clinical Practice
Ehrapy was designed as open-source software from the beginning. “It was important to us to make the software available to the scientific community from day one,” emphasizes Heumos. The software is available as a Python package on GitHub, an online platform for software development, and can be used and further developed by researchers worldwide.
Currently, ehrapy focuses on the fast and efficient analysis of research datasets, such as those held at large health research centers. “Routine use in clinical practice is a long-term goal, but for now, we are concentrating on providing the research community with a powerful tool,” says Heumos.
In the future, the team plans to provide standardized databases for electronic health records (EHRs). These databases will enable better integration and analysis of large volumes of medical data. They will also facilitate the development of EHR atlases that can serve as reference datasets for contextualizing and annotating new datasets.
A Long Journey
“Ehrapy enables comprehensive data analysis across systems, which can be a key step for future AI systems in medicine. I therefore hope for a relatively quick adoption at various sites,” says Prof. Fabian Theis, Director of the Institute of Computational Biology at Helmholtz Munich and professor at TUM. He adds a note of caution: “Establishing such technologies in medicine is a lengthy process that can take decades. Our goal is to bridge the gap between biomedical research and practical application in medicine.” Theis further explains that the development team is focusing on holistic exploratory data analysis methods that make hidden connections easier to reveal. “We are also trying to support academic and commercial players in the healthcare sector.”
Ehrapy on GitHub: https://github.com/theislab/ehrapy