Genome variation data on more than 7,000 malaria parasites from 28 endemic countries is released today in Wellcome Open Research. It has been produced by MalariaGEN, a data-sharing network of groups around the world who are working together to build high-quality data resources for malaria research and disease control.
This open data release represents the world’s largest resource of genomic data on malaria parasite evolution and drug resistance. It provides benchmark data on parasite genome variation that is needed in the search for new drugs and vaccines, and in the development of surveillance tools for malaria control and elimination.
Malaria is a major global health problem causing an estimated 409,000 deaths in 2019, with 67 per cent of deaths occurring in children under five years of age. This data resource focuses on Plasmodium falciparum, the species of malaria parasite that is responsible for the most common and deadliest form of the disease.
The Malaria Genomic Epidemiology Network (MalariaGEN) provides researchers and control programmes in malaria-endemic countries with access to DNA sequencing technologies and tools for genomic analysis. Founded in 2005, MalariaGEN now has partners in 39 countries, each leading their own studies into different aspects of malaria biology and epidemiology, with the common goal of finding ways to improve malaria control.
This latest publication represents the work of 49 partner studies at 73 locations in Africa, Asia, South America and Oceania, which together contributed 7,113 samples of P. falciparum for genome sequencing. At the Wellcome Sanger Institute, each sample was analysed for over 3 million genetic variants, and the data were carefully curated before being returned to partners for use in their own research. This paper brings together the data from all the partner studies to provide an open data resource for the wider scientific community.
Dr. Richard Pearson, co-author from the Wellcome Sanger Institute, said: “We have created a data resource that is ‘analysis ready’ for anyone to use, including those without specialist genetics training. Each sample in the dataset is annotated with key features that are relevant to malaria control, such as resistance to six major antimalarial drugs, and whether it carries particular structural changes that cause diagnostic malaria tests to fail. Just as the Human Genome Project was a resource for the analysis of human genome sequence data, we hope this will be one of the main resources for malaria research.”
One of MalariaGEN’s core principles is to provide clear attribution and recognition of all the groups that have contributed to a data resource. In this dataset, each sample is listed against the partner study that it belongs to, with a description of the scientific aims of the study and the local investigators that led the work.
Professor Dominic Kwiatkowski, co-author from the Wellcome Sanger Institute and the Big Data Institute at the University of Oxford, said: “It has been a huge privilege to collaborate with our MalariaGEN partners around the world to build this data resource. We are proud to see these genomic data being used in publications by our colleagues in malaria-endemic countries and others in the malaria research community. We hope that the new features in this data release will make it accessible to an even wider audience, and our team is now hard at work to produce the next version.”
Professor Abdoulaye Djimde, co-author from the University of Science, Techniques and Technologies of Bamako, Mali, said: “A quantitative assessment of how malaria parasites respond to public health interventions is key for a successful and sustainable elimination campaign. Over time, this openly available resource will facilitate research into the malaria parasite’s evolutionary processes, which will ultimately inform effective and sustainable malaria control and elimination strategies that will be key in ending this devastating disease.”
Wellcome Trust Sanger Institute