Tracking the firings of individual neurons is like trying to discern who is saying what in a football stadium full of screaming fans. Until recently, neuroscientists had to track each neuron tediously by hand.
“People spent more time analyzing their data to extract activity traces than actually collecting it,” says Dmitri Chklovskii, who leads the neuroscience group at the Center for Computational Biology (CCB) at the Flatiron Institute in New York City.
A breakthrough software tool called CaImAn automates this arduous process using a combination of standard computational methods and machine-learning techniques. In a paper published in the journal eLife in January, the software’s creators demonstrate that CaImAn achieves near-human accuracy in detecting the locations of active neurons based on calcium imaging data.
CaImAn (an abbreviation of calcium imaging analysis) has been freely available for a few years and has already proved invaluable to the calcium imaging community, with more than 100 labs using the software. The latest iteration of CaImAn can run on a standard laptop and analyze data in real time, so scientists can see results while an experiment is still running. “My lab is excited about being able to use a tool like this,” says Duke University neuroscientist John Pearson, who was not involved in the software’s development.
CaImAn is the product of an effort initiated by Chklovskii within his group at CCB. He brought on Eftychios Pnevmatikakis and later Andrea Giovannucci to spearhead the project. Their aim was to help tackle the enormous datasets produced by a method called calcium imaging.
That technique involves adding a special dye to brain tissue or to neurons in a dish. The dye binds to the calcium ions responsible for activating neurons. Under ultraviolet light, the dye lights up, but only when it is bound to a calcium ion, allowing researchers to visually track a neuron’s activity.
Analyzing the data gathered via calcium imaging poses a significant challenge. The process generates a flood of data — up to 1 terabyte an hour of flickering movies — that rapidly becomes overwhelming. “One experimenter can fill up the largest commercially available hard drive in one day,” says Michael Häusser, a neuroscientist at University College London whose team tested CaImAn.
The data are also noisy. Much like mingling voices, fluorescent signals from different neurons often overlap, making it difficult to pick out individual neurons. Moreover, brain tissue jiggles, adding to the challenge of tracking the same neuron over time.
Pnevmatikakis, now a research scientist at the Flatiron Institute’s Center for Computational Mathematics, first began developing the basic algorithm underlying CaImAn as a postdoc in Liam Paninski’s lab at Columbia University.
“It was elegant mathematically and did a decent job, but we realized it didn’t generalize well to different datasets,” Pnevmatikakis says. “We wanted to transform it into a software suite that the community can use.” That was partly why he was drawn to the neuroscience group at Flatiron, which develops new tools for analyzing large datasets.
Pnevmatikakis later began working with Giovannucci, then a postdoc at Princeton University, on applying the algorithm to tracking the activity of cerebellar granule cells, a densely packed, rapid-firing group of neurons. “Existing analysis tools were not powerful enough to disentangle the activity of this population of neurons and implied that they were all doing the same thing,” says Giovannucci, who joined the CCB neuroscience group for three years to help develop the software for broader use. “The algorithm subtracts the background voices and focuses on a few,” he says. That approach revealed that individual granule cells do indeed have distinct activity patterns.
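To make the "background voices" idea concrete, here is a minimal sketch in Python of one way overlapping signals can be pulled apart. It is not CaImAn's actual code. It assumes, purely for illustration, that the spatial footprint of each neuron and of the background is already known, and then solves for each neuron's activity over time by ordinary least squares. All names and numbers (`footprints`, `background`, the pixel ranges, the noise level) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_frames = 50, 200

# Two hypothetical neurons whose footprints overlap on pixels 20-29.
footprints = np.zeros((n_pixels, 2))
footprints[10:30, 0] = 1.0
footprints[20:40, 1] = 1.0

# Assumed background: a uniform glow covering every pixel.
background = np.ones((n_pixels, 1))

# Ground-truth activity: each neuron fires in a different time window.
traces = np.zeros((2, n_frames))
traces[0, 50:60] = 1.0
traces[1, 120:130] = 1.0
background_trace = 0.5 + 0.1 * np.sin(np.linspace(0, 6 * np.pi, n_frames))[None, :]

# Simulated movie (pixels x frames) with additive noise.
movie = footprints @ traces + background @ background_trace
movie += 0.05 * rng.standard_normal(movie.shape)

# Naive approach: average all pixels in neuron 0's region.
# This picks up neuron 1's activity and the background as well.
roi_average = movie[10:30].mean(axis=0)

# Demixing: solve movie ~= [footprints, background] @ [traces; background_trace]
# for the unknown time courses, one least-squares problem per frame.
design = np.hstack([footprints, background])
solution, *_ = np.linalg.lstsq(design, movie, rcond=None)
estimated_traces = solution[:2]       # one row per neuron
estimated_background = solution[2:]   # background time course
```

In this toy setup, `roi_average` rises when the second neuron fires even though the first is silent, while `estimated_traces[0]` stays near zero during that window. Real recordings are harder, since the footprints themselves are unknown and must be estimated along with the activity.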
Further work at the Flatiron Institute honed CaImAn’s abilities and made the software easier for researchers to use for a variety of experiments without extensive customization.
The researchers recently tested CaImAn’s accuracy by comparing its results with a human-generated dataset. The comparison showed that the software is nearly as accurate as humans in identifying active neurons but much more efficient. Its speed allows researchers to adapt their experiments on the fly, improving studies of how specific bundles of neurons contribute to different behaviors. The human dataset also revealed high variability from person to person, highlighting the benefit of having a standardized tool for analyzing imaging data.
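A comparison of this kind can be scored in several ways. The sketch below, in Python, shows one common option and is not necessarily the procedure the researchers used: pair each software-detected neuron with the nearest human-annotated one within a distance limit, then report precision, recall and their harmonic mean (the F1 score). The function name `match_f1`, the 5-pixel limit and the coordinates are all hypothetical.

```python
import math

def match_f1(detected, annotated, max_distance=5.0):
    """Greedily pair detected and annotated centroids, then score the match."""
    # Every candidate pair within max_distance, closest pairs first.
    pairs = sorted(
        (math.dist(d, a), i, j)
        for i, d in enumerate(detected)
        for j, a in enumerate(annotated)
        if math.dist(d, a) <= max_distance
    )
    used_detected, used_annotated = set(), set()
    for _, i, j in pairs:
        # Each neuron may be matched at most once.
        if i not in used_detected and j not in used_annotated:
            used_detected.add(i)
            used_annotated.add(j)

    true_positives = len(used_detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up centroids (x, y) in pixels, for illustration only.
annotated = [(10, 10), (30, 12), (50, 40), (70, 80)]
detected = [(11, 9), (29, 13), (52, 41), (90, 90), (5, 60)]
precision, recall, f1 = match_f1(detected, annotated)
```

Here three of the five detections land near an annotated neuron, so precision is 0.6, and three of the four annotated neurons are found, so recall is 0.75. The same function could score one human annotator against another, which is one way to quantify person-to-person variability.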
In addition to benchmarking accuracy, the researchers used the human-annotated results as a training dataset, developing machine-learning-based tools to enhance the CaImAn package. They have since made this dataset public, so that the community can use it to further extend CaImAn or to create new tools.