How Building a Clinical Data Pipeline Will Dramatically Improve Data Quality

Contributed Commentary By Sheila Rocchio

June 9, 2020 | Many clinical trials are using new digital data streams, sources, and data types, including eCOA, wearables, genomics, biomarkers, images, and videos. With these richer datasets comes the opportunity for new, multi-faceted insights into patient experiences with a new therapy, provided the data can be assembled, organized in near real-time, and published to key stakeholders. Today, most trials using multiple sources instead experience a lack of data visibility, access, and control because these data are often unavailable until the very end of a trial, when the focus is on reconciliation. This in turn can delay cycle times: according to a 2019 Tufts-eClinical Solutions Data Strategies Evolution Study*, large pharma has seen a 40% increase in time from LPLV to database lock since 2017. To reduce the impact of delays associated with managing external data sources, life sciences companies can assemble data pipelines that automate the acquisition, transformation, and standardization of all their disparate data sources, allowing researchers to benefit from faster insights garnered from holistic views of patients.

What is a Data Pipeline?

A data pipeline works much like an energy pipeline: it brings data to the places it needs to go without manual intervention. A data pipeline collects, organizes, transforms, and standardizes data before it is stored. Each new source used in clinical trials brings new formats to contend with before its information can be accurately stored and analyzed. With an effective data pipeline, data can be organized into a single system that checks it for accuracy, quality, and conformance to standards upon ingestion. Under the old ETL (extract, transform, and load) paradigm, data had to be transformed before it could be loaded; with modern technologies this is no longer the case. Data can be ingested in any form and transformed at later points, when it makes sense to combine and normalize it with other sources. And unlike older systems that processed data in batches, modern data pipelines can gather and clean data continuously, reducing backlog and improving efficiency.
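As a rough illustration of this ingest-first, transform-later approach, the Python sketch below stages raw records exactly as two hypothetical vendors might deliver them (the source names, field names, and schema are illustrative assumptions, not any specific vendor's format) and applies standardization only at the point where the sources are combined.

import json
from datetime import datetime, timezone

# Hypothetical raw feeds, exactly as two different vendors might deliver them.
ECOA_RAW = '[{"subj": "001-004", "painScore": "7", "ts": "2020-05-01T08:30:00Z"}]'
WEARABLE_RAW = '[{"subject_id": "001-004", "steps": 5412, "date": "01-MAY-2020"}]'


def ingest(raw_json, source):
    """Stage records exactly as received; no transformation happens yet."""
    return [{"source": source, "raw": record} for record in json.loads(raw_json)]


def standardize(staged_record):
    """Map vendor-specific fields onto a common schema when sources are combined."""
    raw, source = staged_record["raw"], staged_record["source"]
    if source == "ecoa":
        return {"subject_id": raw["subj"], "measure": "pain_score",
                "value": float(raw["painScore"]), "timestamp": raw["ts"]}
    if source == "wearable":
        ts = datetime.strptime(raw["date"], "%d-%b-%Y").replace(tzinfo=timezone.utc)
        return {"subject_id": raw["subject_id"], "measure": "daily_steps",
                "value": float(raw["steps"]), "timestamp": ts.isoformat()}
    raise ValueError(f"unknown source: {source}")


# Ingest first, transform later, then combine into one standardized view.
staged = ingest(ECOA_RAW, "ecoa") + ingest(WEARABLE_RAW, "wearable")
combined = [standardize(record) for record in staged]
print(combined)

The point of the design is that nothing about a new feed has to be known before ingestion begins; mapping rules can be added or revised later without re-collecting the data.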

Data Pipelines & Data Quality

Data pipelines are important for many types of enterprises, but they are imperative in clinical research because of the large volumes of data that must be collected from multiple sources. Wearables and omics data streams are just some of the new sources becoming more commonplace in clinical trials. However, because these sources are still relatively new, they are trusted to varying degrees: in a recent report, 22% of clinical trial providers were worried about the quality of data from wearables. Data pipelines can reduce human error by using automation to ensure that all data is valid, has a clear chain of custody, and is ready for real-time analysis and consumption.
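As a minimal sketch of what such automation might look like (the field names and rules here are hypothetical), the check below validates each record the moment it arrives and stamps it with a simple chain of custody, so that questionable records are quarantined rather than silently passed downstream.

from datetime import datetime, timezone

# Hypothetical common schema that every incoming record must satisfy.
REQUIRED_FIELDS = {"subject_id", "measure", "value", "timestamp"}


def validate_on_ingestion(record, source):
    """Run automated checks as data arrives and stamp a simple chain of custody."""
    issues = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        issues.append(f"missing fields: {sorted(missing)}")
    if "value" in record and not isinstance(record["value"], (int, float)):
        issues.append("value is not numeric")
    return {
        **record,
        "source": source,                                       # where the data came from
        "received_at": datetime.now(timezone.utc).isoformat(),  # when custody was taken
        "status": "valid" if not issues else "quarantined",
        "issues": issues,
    }


# One clean record and one problematic record, checked the moment they arrive.
good = validate_on_ingestion(
    {"subject_id": "001-004", "measure": "daily_steps", "value": 5412,
     "timestamp": "2020-05-01"}, source="wearable_vendor_a")
bad = validate_on_ingestion(
    {"subject_id": "001-007", "measure": "daily_steps", "value": "n/a",
     "timestamp": "2020-05-01"}, source="wearable_vendor_a")
print(good["status"], bad["status"], bad["issues"])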

Data quality and data pipelines are a little like the chicken and the egg: to achieve high-quality data, especially as sources grow exponentially, organizations need to have a solid data pipeline in place. However, to develop an effective data pipeline, high-quality data must be readily available. This interdependency exists because scalable, effective data pipelines must also be dynamic and adaptable. Without streams of high-quality data, it is difficult to define the pipeline's objectives and what it should look like in the future. Future-proofing the data pipeline will be essential to ensuring that data meets the highest standards as enterprises and trials scale.

Life sciences organizations recognize that they need quality, usable data, but many do not have the time or assets available to build out and manage an effective pipeline. In a survey conducted by Tufts, 87% of companies said that time and resources were the top barriers to data strategy implementation, followed by technology modernization and regulatory concerns. As one of the building blocks of a strong data management plan, a well-defined pipeline can bring life sciences organizations closer to meeting their data quality goals. With a data strategy in place, data definitions and collection guidelines are clearly established, and data checks and systems are set up to ensure data is free of errors and in a usable format. This, in turn, prepares the data to be leveraged for more powerful insights and analytics that can lead to more efficient clinical trials and accelerate the speed to market for much-needed drugs and therapies.

Sheila Rocchio is the Chief Marketing Officer of eClinical Solutions and oversees the company’s marketing and product management functions. She enjoys finding creative ways to tell customer stories and building products and services that help clinicians, data scientists and technologists do the challenging and important work of bringing new therapies to market. She can be reached at srocchio@eclinicalsol.com.