Dealing with Data Over-Collection in Clinical Trials
By Allison Proffitt
December 10, 2024 | Can you have too much of a good thing? Or perhaps, can you gather too much of a good thing? When it comes to clinical trials, it looks like the answer is yes.
“The industry has seen an exponential increase in the volume of data collected in clinical trials,” said Emily Botto, a senior research analyst at the Tufts Center for the Study of Drug Development (Tufts CSDD), during October’s SCOPE Europe event in Barcelona. When one considers all the talk of AI and what might be possible with more and more quality data, this growth in data gathering seems to make sense. Maybe there are discoveries just waiting to be found if we can only put more data into our systems.
But Botto reports an altogether different finding. Clinical trials are a system of people: investigators, participants, study coordinators, and site staff. And for all of these individuals, more is not more.
“We’ve found that this increase in data volume, combined with the increase in clinical complexity, has been tied to impacts on data quality, site burden, trial duration, and recruitment and retention,” Botto said. “Additionally, Tufts CSDD found that around 26% of the data collected in Phase II trials and 23% of data collected in Phase III trials did not support core endpoints, which are defined as primary, key, secondary, and safety endpoints, and were not required.”
Last December, the Tufts Center for the Study of Drug Development published a study in Therapeutic Innovation & Regulatory Science exploring the rationale for and relevance of non-core protocol data (DOI: 10.1007/s43441-023-00595-1). The researchers had both sponsors and FDA reviewers examine protocol data and rate various procedures as core or non-core. They found that FDA reviewers viewed far more datapoints as “non-core” than sponsors did, even though sponsors indicated that many of these non-core procedures were administered due to perceived regulatory requirement and expectation.
Why the misalignment? Why are sponsors collecting so much more data than FDA thinks they need? And what are the impacts on the study community of participants and sites?
Tufts CSDD is planning an initiative with TransCelerate to explore data optimization, Botto said, but during a plenary panel at SCOPE Europe, she asked Paul Duffy, who leads MSD’s Clinical Site Partnership team, and Joachim Lovin, a DCT Specialist at Novo Nordisk, for their own experiences.
“We’ve seen that each clinical trial team is building the amount of data that they want. They’re all hungry for their data. They want to understand more and more what they’re seeing inside of the core principles of the protocols,” Duffy recounted.
“We’ve had to step in and say, ‘Whoa, slow down, guys. The reality here is that the data that you’re asking for is really affecting the sites, how they’re able to deliver and how they’re able to gather the information. Also, it’s got a huge impact on patients.’”
Duffy described the process as a runaway car. “We’re trying to run after it, slow it down, put it in first gear again, and work more slowly.”
Lovin agreed. “It’s a flood of data,” he said, particularly sensor data, and his relationship with clinical teams is changing. “I’m going from the guy who wanted to help them collect all the data to the guy that’s now going, ‘Hang on; let’s figure this out… Can [these data] actually support the endpoints that you’re looking for?’”
Both MSD and Novo Nordisk are actively addressing the runaway data volumes.
MSD is looking to clarify which data are core to protocols. “We’ve seen quite a large response to this—shall I say, quite a personal response to this—from our teams,” said Duffy, and the effort has launched some difficult conversations. “They can feel safe in running these trials with less data, with less requirements,” he assures them. At the team level, this seems to resonate, but individually, “it’s been a more emotional journey,” he said.
The initial response from sites and patients, however, has been very good, Duffy said. “The sites are much happier,” he reported.
Novo Nordisk is focusing their approach on building systems to identify data the company already has from a different trial or a different project. “We’ve been around for 100 years. We’ve probably collected a lot of this data before. Can we reuse that?” Lovin challenged. He also continues to push for the reasoning behind each data point. “Can it actually support the endpoints or what you’re looking for?”
Botto is sympathetic to the grand visions of what data could be used for, but she challenged the panelists to determine whether those goals were ever achieved.
“I have not actually seen that data being used in a positive way,” Duffy confessed. “I’m generally hearing about huge amounts of data being collected, and then it goes into a void.” Data rarely leads to new study areas, he said, and the collection is not done systematically. Lovin had to agree, though he did concede that exploratory data collection does lead to an uptick in publications.
Industry’s Take
Tufts CSDD is not the only group that has noticed this trend. Phesi, a software vendor that offers patient-centric data analytics, recently conducted its own assessment of trial complexity using proprietary data from its AI-driven Trial Accelerator platform.
The Phesi dataset included 146 protocols designed to capture 1,821 primary and secondary outcome measures. The study found that, on average, over a third (35%) of outcome measures were not reported.
“While collecting a good amount of data is important for clinical trials, it has to be the right data at the right time,” wrote Glen Li, Phesi’s founder and CEO, in the study. “Many of the clinical trials analyzed were found to have redundant outcome measures.”
Those redundant outcome measures—non-core data, as Tufts CSDD would call it—are burdensome for both study participants and sites.
“The analysis also showed a correlation between a lower number of outcome measures and better site enrollment, demonstrating that recruitment is improved when the trial protocol presents a smaller burden to the patients and investigator sites,” Li added. Phesi’s Trial Accelerator platform can help, he said, by assigning complexity scores to protocols.
“Investigators need to make sure they are only collecting the data they truly need. Being more precise with outcome measures makes it easier to select trial sites, recruit more patients in a shorter period of time, and collect better quality data from those patients,” Li wrote in his concluding remarks. “All these compounded benefits lead to a better return on investment, an increasingly pressing issue in the current economic environment for the pharmaceutical industry.”