Science has the potential to tackle some of the world’s biggest problems, from curbing new pandemics and climate change to understanding why brain circuits go awry. But science is not without its thorns. In the fierce competition for new discoveries, prestigious publications, and awards, research teams can work for years in secrecy, collecting large amounts of data and conducting complex analyses that are extremely difficult to verify. Without access to the complete research environment and data, “it’s almost impossible for one group to replicate another group’s work,” says Loren Frank, a Howard Hughes Medical Institute (HHMI) Investigator at the University of California, San Francisco (UCSF).
Additionally, the fact that science is a highly competitive and fractured profession “is one of the things that slows it down,” says Kristen Ratan, founder of Strategies for Open Science (Stratos), which works with HHMI on data sharing strategies. This is especially problematic as researchers tackle increasingly difficult problems. Many of today’s complex problems “can no longer be solved by one graduate student working in isolation for four or five years,” says Bodo Stern, director of strategic initiatives at HHMI.
Also, the process of publishing those results can take years, and the basic format hasn’t changed much over the centuries. “The way we publish science is outdated,” Stern says. “The format of today’s article looks very much like a Nature article from 1869.”
As such, there is now a growing movement to change the very culture of science, replacing some of that cut-throat competition with friendly cooperation and encouraging the widespread sharing of datasets and data analyses prior to publication. Changing the culture of science is notoriously difficult, but staunch open science advocates are making real progress. Stratos, which Ratan founded in 2019, has been working with HHMI researchers like Frank to explore the idea of a “collaboration hub,” an approach first developed at NASA to help interpret the vast amounts of environmental and space data coming from satellites, telescopes, and other sensors.
The U.S. government and federal agencies have also expressed support for open science and collaboration. In fact, the Biden administration’s Office of Science and Technology Policy (OSTP) declared 2023 the Year of Open Science, and in a landmark 2022 memo, OSTP required agencies to “make publications and supporting data from federally funded research available to the public.” Meanwhile, the National Institutes of Health has introduced even more stringent data sharing requirements. “The U.S. policy landscape has changed extraordinarily,” Ratan says.
Making data more accessible
However, policy changes are only part of the story. Scientists also need to focus not only on publishing their data, but also on devising ways to make those data, and how they are analyzed, more accessible and understandable to potential collaborators. Fortunately, ambitious new examples of such data sharing tools now exist. On January 26, 2024, after five years of core software engineering work, Frank’s lab released a preprint describing a new “reproducible and shareable data analysis framework for neuroscience research” that the team has dubbed Spyglass.
The Spyglass framework brings together all the data Frank’s lab collects from electrode arrays inserted into rat brain regions involved in behavior, learning, and imagination, along with detailed information about each animal’s second-by-second behavior, and saves it in a standardized file format called Neurodata Without Borders (NWB). Spyglass then provides software code, written in the open source language Python, that allows both the sharing and the analysis not only of the raw data but also of the results from every step of a typically highly complex analysis. As explained in the preprint, “Spyglass also provides ready-to-use pipelines for analyzing behavioral and electrophysiological data, as well as extensive documentation and tutorials to train new users.”
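To make the standardization concrete, here is a minimal sketch of what reading an NWB file can look like in Python, using the open source pynwb library rather than Spyglass itself; the file name and the fields printed below are illustrative assumptions, not the Frank lab’s actual data.

from pynwb import NWBHDF5IO  # open source reader for Neurodata Without Borders files

# "session.nwb" is a hypothetical file name standing in for one recording session
with NWBHDF5IO("session.nwb", mode="r") as io:
    nwbfile = io.read()

    # Every NWB file carries the same standardized session metadata...
    print(nwbfile.session_description)
    print(nwbfile.subject)

    # ...and named containers for the acquired data, such as electrode recordings
    for name, obj in nwbfile.acquisition.items():
        print(name, type(obj).__name__)

Because every lab writing NWB files exposes this same structure, code like this can open a file from any group without custom parsing, which is part of what lets downstream analysis pipelines like Spyglass be shared and rerun by others.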
Through a cloud-based data sharing hub commissioned by HHMI, Spyglass is available to anyone without having to understand NWB or download the software. This is a “real leap forward in data sharing” that will benefit not only the neuroscience community but also Frank’s lab and his direct collaborators, Ratan says. A link to the hub is provided in the preprint, allowing anyone with sufficient computing power to perform their own analyses, modify parameters, and evaluate the results. Using a new, standardized approach to data collection and analysis “allows us to work two to three times faster than before,” Frank explains. “This was a huge upfront investment, and we’re now starting to see the results,” Stern adds.
The story of Spyglass begins in the late 1990s, when Frank was a graduate student trying to understand how animal behavior relates to patterns of neural activity in the brain. At the time, every member of the lab developed their own data processing methods, which could vary from experiment to experiment. Frank thought there had to be a better way. “I tried to come up with something that could be used across multiple subjects,” he recalls.
The first code he wrote was “awkward” but “not terrible,” he says. Over the following years, he and his graduate students rewrote the software to make it more general, organizing the data into chunks that could show which brain areas were being studied and how neurons fired in response to the animal’s behavior. “That system served us well for many years,” Frank recalls, but it had significant limitations. It could organize the initial data, but it could not incorporate or collate data obtained at different times or from different researchers. And the many steps in the analysis, from raw data to final results, were not properly tracked. By 2016, “it became clear that that wasn’t enough,” Frank says.
Around the same time, the broader field of neuroscience was realizing that it had a major problem with reproducibility. Experiments generated vast amounts of data, and analyses typically involved so many hidden steps that they could take years to understand. These challenges made it difficult or impossible for one laboratory to build on the work of another, creating too much overlap and hindering progress in the field. “We’re spending millions of dollars for someone to collect very complex data, but the next researcher still has to generate their own data,” Stern says.
So in 2019, Frank told his lab members that he was going to build a new system, and that once it was ready, nothing that did not use the NWB format and the new system would be published. Frank’s declaration did not go over well at first. “People weren’t excited,” he recalls. Graduate students compete to accumulate enough data and results to earn a Ph.D., and postdocs must produce a large number of papers to advance their careers. So why, Frank says, take the big risk of spending precious time writing complex software?
Then the COVID-19 pandemic hit. With the lab temporarily closed, Frank was able to spend time at home writing software code and planning the basic structure of what would later become the Spyglass framework. He also found an enthusiastic partner in Kyu Hyun Lee, who joined the Frank lab at the end of 2020. In Lee’s own doctoral program, he recalls, data was organized in an “idiosyncratic” way, with each of the many steps of analysis relying on different software tools. “It was a little confusing,” he says.
Frank’s vision of a clear, unified framework was therefore particularly appealing to Lee. With the coronavirus lockdown keeping him out of the lab, Lee could not yet start his own experiments, which examine the hippocampus and visual cortex of rats to explore how the animals can imagine being in different places. And the idea of data sharing seemed both challenging and rewarding. “One of the reasons I spent so much time on this was that I really enjoyed it,” Lee says.
Although Lee is the lead author on the Spyglass paper, he hastens to add that he received great support from his colleagues, including co-lead author Eric Denovellis and Frank himself. “I’ve never met a principal investigator who still writes code,” Lee says. Additionally, with strong support from HHMI, Frank was able to hire two software engineers and dedicate valuable research time to developing the new tools. “We are fortunate to receive funding from HHMI,” Frank explains. “It gives us the ability to do more difficult things.”
A game changer
The new framework is now used by all researchers in Frank’s lab and “has changed the game in terms of what can be done,” he says. “We can take these complex data streams, numbers that estimate what the animal is thinking, and relate them to the animal’s behavior,” he says. “Then you can run standard analyses in a day that used to take weeks.” Additionally, Lee says, “We are getting serious about having our data and code in a shareable state, and we have more confidence in our data and results.”
For the broader research community, the Spyglass framework is available to anyone in a cloud-based data sharing environment. The data hub is “the first step in opening new doors in science,” says Ratan, who is working with 2i2c, a technology nonprofit that builds such hubs, to test how this kind of data sharing can be done more broadly across many different sectors. Such systems essentially create a virtual representation of a lab’s entire work, allowing others to compare or combine it with their own data, or even mine the data for new discoveries, she explains.
Ratan and the 2i2c team envision the creation of a “data lake” or “data bazaar” that pools the work of many researchers, increasing the potential for new scientific breakthroughs. “We believe this type of collaborative environment will be critical to future scientific advances,” says Frank.
Of course, just because data sharing is technically possible does not mean it will happen. “Data sharing and cultural change in science are difficult,” Frank says. Researchers understandably worry that competitors will use their hard-earned data to scoop them. They may also want to keep the data at hand so they can squeeze out the maximum number of papers. And people may be embarrassed to share code that is not in good shape or well documented, he suggests. “It’s like showing your dirty laundry.”
But Stern, Ratan, Frank, and many others are hopeful that a combination of carrots and sticks can steer scientific culture toward a new era of cooperation. The sticks are the new policies that require data sharing. The carrots are the potential rewards from collaboration itself, or from creating data sharing tools like Spyglass. These could include specifically targeted grants and scientific assessments, earlier publication, and even new prizes that reward particularly productive collaborations that accelerate the pace of scientific discovery. Ultimately, “collaboration is expected to become the new competitive strategy,” Stern says.
And for HHMI, the Spyglass framework provides a data collection and sharing model that other HHMI teams can adopt. Equally important, Stern adds, is the example Frank has set by taking a big risk to transform the way his lab operates. HHMI’s communications efforts typically focus on promoting exciting new results from HHMI researchers, but Stern sees the Spyglass preprint as an important opportunity to highlight and encourage innovation in the way science itself is done. “We want to send a signal to scientists that we welcome exploring new approaches to data sharing,” he says.