Curing Data Collection Woes Could Take a Village
By Deborah Borfitz
October 8, 2024 | The average clinical trial generates three times as much data as it did a decade ago, and yet there are no standards in place for how that information should be collected. That’s a big problem for many reasons, not the least of which is that drug developers may be missing key insights about the way interventions are working in individuals and across populations, according to Erin Erginer, senior director of the Pinnacle 21 product suite at Certara.
“The issue is that the science is growing a lot faster than organizations can keep up with,” says Erginer, who hopped the fence from the pharma industry to software three years ago. That was just before Certara’s acquisition of Pinnacle 21, whose founder and CEO, Max Kanevsky, now serves as Certara’s chief technology officer.
Pharma companies have been using electronic data capture (EDC) for decades to capture patient information, such as demographics and vital signs, and are therefore well versed in using such systems, she says. But the inflow today also includes genetic sequencing and histochemistry data, biomarkers and imaging data, and digital data from electronic clinical outcome assessments and a variety of ingestible, implantable, and wearable devices.
Since this bolus of data is coming in from so many disparate sources and systems that are each “speaking a different language,” things can quickly get very complicated, says Erginer. The word choices used by clinicians in a patient-facing role are understandably quite different from those of study team members ingesting data from biospecimens and pathology reports, or biostatisticians using mathematical algorithms to make decisions about patient recruitment, treatment allocation, or outcome assessment.
All these different teams need a way to interpret, analyze, and report data in a way that’s meaningful to them, Erginer says. They also need to collaborate and have a “single source of truth so they understand the impacts of what they’re doing to the data on other teams that are around them.”
This is where Pinnacle 21 comes in. It provides a shared space for collaborating across the entire R&D lifecycle as well as building out data collection standards for both EDC and non-EDC data so study sponsors can establish data-gathering requirements ahead of time, she shares.
By building those study-specific requirements, companies can ensure that any data coming in abides by those rules, notes Erginer. Data can be validated against the standards, so any violations are immediately identified, and issues get resolved sooner and therefore more easily.
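To make the idea concrete, here is a minimal Python sketch of rule-based validation of incoming records. It is an illustration only, not Pinnacle 21's actual interface: the `Rule` class, the `validate` function, and the field names and limits (`subject_id`, `sex`, `systolic_bp`) are all hypothetical stand-ins for whatever requirements a sponsor might define ahead of time.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One hypothetical study-specific requirement for a single field."""
    field: str
    check: Callable[[Any], bool]
    message: str


# Assumed example rules; a real study would define its own.
RULES = [
    Rule("subject_id", lambda v: isinstance(v, str) and v != "",
         "must be a non-empty string"),
    Rule("sex", lambda v: v in {"M", "F", "U"},
         "must be one of M, F, U"),
    Rule("systolic_bp", lambda v: isinstance(v, (int, float)) and 50 <= v <= 300,
         "must be between 50 and 300 mmHg"),
]


def validate(record: dict) -> list[str]:
    """Return a list of violations for one record (empty if it passes)."""
    violations = []
    for rule in RULES:
        if rule.field not in record:
            violations.append(f"{rule.field}: missing")
        elif not rule.check(record[rule.field]):
            violations.append(f"{rule.field}: {rule.message}")
    return violations


if __name__ == "__main__":
    incoming = [
        {"subject_id": "S-001", "sex": "F", "systolic_bp": 120},
        {"subject_id": "S-002", "sex": "X"},  # bad code, missing vital sign
    ]
    for rec in incoming:
        problems = validate(rec)
        status = "OK" if not problems else "; ".join(problems)
        print(rec.get("subject_id", "<unknown>"), "->", status)
```

Checking each record as it arrives, rather than at the end of a study, is what lets violations surface while they are still cheap to fix.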
Certara, a public company since 2020, has a hand in more than 90% of all novel drugs that have been approved by the U.S. Food and Drug Administration (FDA) over the past decade, says Erginer. Most of the larger pharma organizations are using multiple Certara software products for drug development purposes to speed up the long journey from compound discovery to regulatory submission.
“I think one of our biggest advantages in this space is that... we like to play well with others,” Erginer says. “We know that pharma organizations are going to be using multiple different vendors and so we make sure [those technologies can be used alongside] whatever software we are building.”
Most of Certara’s competitors are in fact singular products, she says. “I am not familiar with another company that has the sort of breadth of portfolio that we have.” In addition to its many products and services, the company employs about 1,500 scientists and subject matter experts globally, most of them with a Ph.D. after their name.
Data Deluge
Erginer has worked in clinical spaces over the past 20 years, the last half of that time focused on multiple aspects of the R&D process, including data acquisition, management, and standardization. The common thread has been dealing with all the complicated clinical trial data being captured outside of the EDC, she says.
The quest continues at Certara, where she is helping build out solutions and technology for managing the increasing volume and complexity of data flowing into the clinical trial workspace at the same time everyone is learning the new science that it represents. The FDA expects pharma companies to demonstrate that data collected throughout the clinical trial lifecycle has not been transformed in a way that compromises its integrity, says Erginer, but the agency is silent on how that data should be collected in the first place.
“It is really up to individual pharma companies to determine what they are going to dictate as their process to get that data into their clinical trials... [so as] not to transform or change it,” she continues, and then to find ways to efficiently get data into the right structures for decision-making purposes. This is in stark contrast to FDA submission requirements that have prompted nonprofit groups (e.g., CDISC) to help organizations collaborate on the development of robust standards.
The best sponsor companies can do for now is adopt technology and software that lets them flexibly manage the typical 3.6 million data points generated by a clinical trial in a way that reduces the burden on study teams, and that lets those teams work collaboratively in a centralized location, Erginer advises. They also need technology enabling collaboration with biospecimen technology vendors to better understand that data stream and establish some basic data collection norms.
It won’t be easy, since the FDA is not using its regulatory prowess to spur alignment and standardization across vendors, she adds. But collection standards would allow for faster data handoffs industrywide. Her hope is that pharma companies will see the value in working collaboratively on standards to speed up the overall pace of drug development and ensure medicines are delivered to patients faster, “which is really what everyone’s aim should be.”
This will of course take a collective commitment of resources and volunteers to make happen, Erginer says. TransCelerate BioPharma has been approached about taking on the development of data collection standards as an initiative, but it has yet to happen.
Multifaceted Tool
Beyond data quality, compliance, and collaboration, Pinnacle 21 is designed to aid data flow, automation, and decision-making, says Erginer. For example, the technology presents data to different teams based on their needs and in a way that is interpretable for the intended purpose.
Data changes in terms of its structure, format, and terminology based on where it is collected because different teams—e.g., clinical, data management, data standards, clinical programming, and biostatistical—are all trying to understand the data in a different way, she adds. It is therefore important to be crystal clear about how the handoff of data from one structure to the next happened, and to ensure that no data are lost along the way and everyone understands the information at each point in the study lifecycle. The expertise of therapeutic area specialists is required to help “map that information ontology one source to the next.”
On the automation front, Pinnacle 21 most notably helps by “taking work that is typically done via emails and spreadsheets and putting [those communications] into the cloud, so there is a clear, transparent, single source of truth,” says Erginer. “All the manual steps you’d normally be doing in handing off data one team to the next are all automatically done by the system... to ensure data integrity.”
Emails are prone to getting blocked or misdirected and can disappear entirely on studies going on for a decade, she continues. “If you bring those communications into the platform, everyone can see all the questions that were ever asked... in one place, so everyone can be part of that same conversation.” In lieu of long email chains, information gets organized into topics relevant to different roles and can be readily queried.
Decision-making improves when teams work together in one environment to build data collection requirements, Erginer says. “It means you are going to have much better control over that entire end-to-end flow... [and] you can find those issues before they actually happen in a clinical trial.” Importantly, the critical go or no-go decisions in clinical trials can be made a lot sooner because the folks at the end of the workflow can start doing their calculations earlier and, potentially, identify ways early on to trim the overall timeline and better guarantee study outcomes.
Practical AI Use Cases
In early 2023, Certara acquired Vyasa Analytics (now Certara AI) to start layering artificial intelligence into the Pinnacle 21 product portfolio. Erginer says she is currently exploring the use of natural language processing to read unstructured documents (e.g., study protocols) and extract information that can automatically populate different parts of the software to reduce the burden of manual copying and pasting.
Pharma companies are intensely interested in “practical applications” of AI for clinical data management purposes, she says. Another likely first use case is generative AI for extracting information from tabular data in study protocols and turning it into medical writing documents required for regulatory submissions.
Additionally, Erginer says she is working on building capabilities within Pinnacle 21 to knit together information from disparate systems and present different views of the whole to key decision-makers. These are all projects in the early development stages, she notes, and areas Certara is intending to invest in as the company works to build a data automation platform from protocol to submission.