Despite being nearly six years old, the National Institutes of Health’s BioData Catalyst cloud platform is only just beginning to gain traction.
It already holds nearly 4 petabytes of data and is preparing for a major expansion later this year as part of the NIH’s goal of democratizing health research information.
BioData Catalyst already provides access to clinical and genomic data, and NIH plans to add imaging data and other data types in the coming months, said Sweta Ladhwa, director of NIH’s Office of Scientific Solutions Delivery.


“We want to provide free access to our research community so that we can truly advance scientific results, treatments and diagnostics that benefit public health and outcomes for Americans, and indeed people around the world. We really want to provide these resources,” Ladhwa said during a recent panel discussion hosted by AFCEA Bethesda, an excerpt of which was published on Ask the CIO. “To do this, it takes different skills, expertise and different actors working together to make this resource available to the community. It’s also part of the larger NIH data ecosystem, and we collaborate with other NIH institutes and centers that provide cloud resources.”
Ladhwa said adding new datasets to the BioData Catalyst platform also means NIH can provide new tools to help researchers mine the information.
“For example, with image data, we want to be able to leverage or incorporate tools related to machine learning, because what imaging researchers are primarily looking to do is process these images to gain insights. So the tools around machine learning, for example, are things we want to be part of the ecosystem, and we’re actively working on incorporating them,” she said. “Many tools are tied to data types, but they can also be workflows, pipelines or applications that actually help researchers address their use cases. And with so much data out there, those use cases are everywhere, because there are so many things you can do.”
At NIH, users in the research and academic communities drive both the datasets and the related tools, which Ladhwa said the agency is trying to make more accessible to the community.
NIH makes cloud storage easy
That’s why cloud services have played, and will continue to play, an important role in this and other big data platforms.
“NIH’s Office of Data Science Strategy negotiates rates with cloud vendors to make this cloud storage free to the community and discounted for research institutions, so even institutions that do pay can actually take advantage of and benefit from the discounts NIH has negotiated with these cloud vendors,” she said. “We’re really excited to be working with multiple cloud vendors to put some of that cost savings back into cutting-edge science. We want to expand our capabilities with new technologies and really bring those resources to the community to advance science.”
Like the NIH, the Centers for Medicare and Medicaid Services spends a lot of time thinking about data and how to make it more useful to customers.
But for CMS, the data is about the federal health care marketplace, and the tools are about improving the knowledge of the public and of agency employees.


Kate Wetherby, acting director of CMS’ Marketplace Innovation and Technology Group, said CMS is reviewing all of its data sources and streams to better understand their content and to improve all of its websites and user experiences.
“We use this for performance analysis, to make sure the system is up and accessible while we are doing open enrollment and providing coverage to people,” she said. “The other thing is that we use Google Analytics and different types of test fields to make sure the way we ask questions and the way we get information from people makes sense. We spend a lot of time on that.”
Wetherby said her office works closely with both the enterprise and policy departments to integrate data and ensure its value.
“The real question is, if you don’t really understand the data when you get it, in 10 years you’re going to be asking, ‘Why do we have this data?’ You have to look at it, and then take the time from year to year to see whether it’s still worth holding,” she said.
As CMS moves toward AI, such as generative AI, chatbots and other tools, understanding the business, policy and technical aspects of its data becomes even more important.
CMS to create a data lake
Wetherby said CMS must first understand the data before applying these tools.
“We need to understand why we ask these questions, what the relationship is between all of this data, and how it can be improved. For data that is a little outdated, how long are we going to keep it? We need to look at that and think about whether it actually fits our use cases and the direction we want to go in our future work,” she said. “As a whole at CMS, we have been thinking hard about data, how we manage it and how we know what it will be used for. We want it to be really clear and really usable, because we all know data can be manipulated. When we talk about generative AI, when we talk about chatbots, when we talk about predictive analytics, if the data isn’t right or the question isn’t right, it’s very easy for a computer to give you something other than the result you’re really looking for.”
Wetherby added that another important part of getting the data right is the user experience and how CMS can share that data across government.
As it ramps up its use of GenAI and other tools, CMS is creating a data lake to capture information from various centers and offices across the agency.
Wetherby said this allows the agency to establish proper governance and security around its data, since the lake includes multiple types of data, such as clinical and billing information.