With advances in automation and new drug modalities, modern laboratories generate more data than ever before, but turning that data into intelligence is a different story. Lab leaders want to make their data actionable, to realize data intelligence from their data pools. They know that lab data can help improve their business; the challenge is taking advantage of it.
AI doesn’t automatically give you the right intelligence
Laboratory data is fertile ground for artificial intelligence (AI). Data-driven quality control can alert labs to instrument trends and deviations. Analyzing data can improve resource allocation and budgeting, and identify emerging patterns of poor data and process integrity. Actionable data can generate powerful insights, shape decisions, and improve business outcomes. And with AI, you can extract better insights from your data than ever before. For example, AI pattern recognition can help monitor and optimize processes. But there are also risks: if an AI model is trained on biased or incorrectly contextualized data, it can produce biased results.
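To make data-driven quality control concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the readings, the simulated drift, and the three-sigma threshold are invented, and a simple rolling baseline stands in for the pattern recognition a production AI tool might apply.

```python
import numpy as np
import pandas as pd

# Hypothetical instrument readings: daily QC check values for one assay.
rng = np.random.default_rng(seed=7)
readings = pd.Series(
    100 + rng.normal(0, 1.5, 60),
    index=pd.date_range("2024-01-01", periods=60, freq="D"),
    name="qc_value",
)
readings.iloc[45:] += 4.0  # simulate an instrument drift starting at day 46

# Flag points more than 3 sigma from a trailing 30-day baseline,
# a simple stand-in for an AI model watching for trends and deviations.
baseline_mean = readings.rolling(30).mean().shift(1)
baseline_std = readings.rolling(30).std().shift(1)
z = (readings - baseline_mean) / baseline_std
alerts = readings[z.abs() > 3]
print(alerts)
```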
To reach digital maturity, labs must put the right intelligence in to get the right intelligence out. You may have heard the phrase “garbage in, garbage out”: bad data leads to bad results. “Intelligence in” means using high-quality data, but it also means leveraging human intelligence. Success with AI and machine learning (ML) requires both the right data and the right people asking the right questions.
Start with the right data
“Garbage in” includes data that has transcription errors or has been stripped of context, such as data that records only the method and results of an experiment while ignoring the conditions under which it was run. In the context of AI/ML, however, “garbage in” also includes insufficient data. Typically, when an experiment performed in a laboratory does not yield the desired results, the data is archived but rarely retrieved for analytical review. Yet for ML models, data from failed experiments can provide useful information about how parameters actually interact. Models become more accurate when trained on large amounts of data covering both runs that achieved the desired outcome and runs that did not. “Intelligence in” must therefore include data from both successful and failed experimental runs and assays.
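As a hedged illustration of why failed runs matter, the following Python sketch trains a classifier on invented process parameters and pass/fail outcomes. With only the successful runs, there would be a single class and nothing for the model to learn about where the viable parameter window lies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical experiment log: process parameters plus a pass/fail outcome.
rng = np.random.default_rng(seed=1)
n = 400
temperature = rng.uniform(20, 40, n)
ph = rng.uniform(6.0, 8.0, n)
# Invented ground truth: runs succeed only in a narrow parameter window.
passed = ((temperature > 30) & (ph > 6.8) & (ph < 7.4)).astype(int)

X = np.column_stack([temperature, ph])

# The failed runs define the boundary of the window; without them the
# model has no negative examples and cannot locate it.
model = RandomForestClassifier(random_state=0)
scores = cross_val_score(model, X, passed, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

The specific model is beside the point; what matters is that the failure records carry the information about how temperature and pH interact.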
In addition to being accurate, high-quality data must also be complete, comprehensive, current, and unique. Complete data has no missing entries and includes relevant metadata and related records. Comprehensive data covers the questions the lab is trying to ask: attempting to characterize a golden batch from a dataset containing only Laboratory Information Management System (LIMS) data, for example, can generate inaccurate and biased answers, because a LIMS may hold only partial data and a fuller picture requires data from other sources throughout the lab. Data must also be current, since training an algorithm on old data can produce outdated answers. Finally, data must be unique: accidentally duplicated values further skew the results.
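The checks below sketch how some of those properties might be screened in Python with pandas. The table, column names, and cutoff date are all invented for illustration.

```python
import pandas as pd

# Hypothetical export of assay results; column names are invented.
df = pd.DataFrame({
    "sample_id": ["S1", "S2", "S2", "S3"],
    "result": [0.91, 0.88, 0.88, None],
    "recorded_at": pd.to_datetime(
        ["2024-06-01", "2024-06-02", "2024-06-02", "2019-03-15"]
    ),
})

# Completeness: flag missing entries.
missing = df[df["result"].isna()]

# Uniqueness: accidentally duplicated rows skew downstream statistics.
duplicates = df[df.duplicated(keep="first")]

# Currency: records older than a chosen cutoff may produce outdated answers.
stale = df[df["recorded_at"] < pd.Timestamp("2023-01-01")]

print(f"{len(missing)} missing, {len(duplicates)} duplicated, {len(stale)} stale")
```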
Deliver the right data to the right people
For good data to be useful, it must also be available and understandable to both humans and machines. Data is often stored in disparate silos and formats, so even high-quality data can be difficult to access.
Many companies are starting to aggregate data from all their systems into a single data lake. This collection of structured and unstructured data can provide a single source for data-consuming algorithms. However, this approach is resource intensive, and it is no longer the only option: new tools are designed to provide access to data regardless of its location, essentially “unsiloing” system architectures without IT intervention.
No matter where your data is stored, a well-designed data backbone adds a layer on top of your data that maintains its integrity and context across sources. These architectures are often built on FAIR data principles, which ensure data is findable, accessible, interoperable, and reusable. Previously, trained IT professionals had to work with subject matter experts to construct the complex queries needed to obtain the desired answers. New tools are reaching a stage where anyone can learn to construct meaningful queries without knowing how to program. By putting low-code and no-code tools in the hands of lab workers, you can accelerate process development and experimentation.
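Under the hood, many such tools rely on engines that query data where it lives rather than copying it into a central store. The Python sketch below illustrates that pattern; the files and columns are hypothetical, and DuckDB stands in for whatever engine a given tool actually uses.

```python
import duckdb
import pandas as pd

# Hypothetical silos: a LIMS export and an instrument log, left where they are.
pd.DataFrame({"sample_id": ["S1", "S2"], "assay": ["potency", "purity"]}) \
    .to_csv("lims_export.csv", index=False)
pd.DataFrame({"sample_id": ["S1", "S2"], "reading": [0.93, 0.71]}) \
    .to_csv("instrument_log.csv", index=False)

# Query both files in place with SQL; no centralized copy is created.
result = duckdb.sql("""
    SELECT l.sample_id, l.assay, i.reading
    FROM 'lims_export.csv' AS l
    JOIN 'instrument_log.csv' AS i USING (sample_id)
""").df()
print(result)
```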
Today, AI/ML has become integral to powering low-code and no-code platforms, making it easier for non-technical users to perform advanced data analysis. The synergy between AI/ML and low-code/no-code tools ensures that high-quality data is accessible and actionable, allowing users with varying levels of expertise to contribute to data-driven decision-making.
“Intelligence in, intelligence out” means that results are shaped not only by the quality of the data being analyzed, but also by the people seeking the answers. This is true after the data backbone is established and optimized, and it applies even earlier: when designing a data backbone, human intelligence is key to ensuring that data is optimally captured, contextualized, stored, and accessed.
Ask the right people the right questions
Having the right people on a big data project often means filling a wide range of roles. Diverse perspectives help ensure your company asks the right questions: Which data is important? Given the lab’s goals, how should that data be organized? The answers can differ from lab to lab.
Bench scientists and technologists need to be involved from day one of any new data strategy. They are often in the best position to understand the problem space and to ensure that the right questions are being asked in the first place.
Business leaders and data experts are also critical to ensuring that the architecture captures data in a way that can be queried to answer business questions and achieve desired business outcomes.
The most successful labs understand the business needs of science and process development and often partner with industry experts who have data science skills and capabilities. In many cases, these external partners also serve as helpful training resources.
The industry is maturing digitally, moving from wet-lab experiments to in silico techniques, but knowledge gaps and communication barriers can be obstacles. A shared foundation of digital literacy about how AI and ML models work is essential for all team members, and it must include a shared commitment to managing high-quality data. A shared vocabulary helps stakeholders communicate with each other and with technology partners about data architecture and feasibility.
While AI tools are certainly democratizing access to insights, true data intelligence starts at the beginning, with high-quality, well-organized data supported by knowledgeable, thoughtful humans in every part of the business, and an intelligent approach is required all the way to the end.