EMD Serono Using Machine Learning To Optimize Enrollment

By Deborah Borfitz

October 15, 2020 | Merck’s Sylvia Marecki and Omesan Nair teamed up for a presentation on harnessing machine learning to identify causal drivers of enrollment success in clinical trials at the recent Bio-IT World Conference & Expo Virtual. The Operational Design Center (ODC) at EMD Serono, a Merck KGaA affiliate, uses analytics to address study-related business questions such as the anticipated duration of enrollment and how to speed up the process.

“Drug development is inherently risky, expensive, and typically spans a wide time horizon,” says Marecki, and the clinical trial phase takes longer today than it did a decade ago.

The ODC employs a cloud-based analytics platform, the Operational Design and Study Accelerator (ODeSA), with the goal of applying data science to the design and planning of studies and to scenario modeling, says Marecki, a design analyst in the ODC. “If we rely only on correlations, it’s not possible to know if we’re relying on the most impactful dimensions of the study design or the operational plan. There are no off-the-shelf solutions that exist today that we can leverage to answer this question.”

Bayesian networks (also known as belief or causal networks) are graphical models for representing multivariate probability distributions, and they are ideally suited for the job, she continues. ODeSA embraces the approach via the Reverse Engineering, Forward Simulation (REFS) platform from artificial intelligence company GNS Healthcare to identify which factors matter, and under what conditions, for study enrollment success.
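To make the idea of a Bayesian network concrete: the joint distribution over all variables factorizes into one conditional probability per node, given that node’s parents in the graph. The sketch below is a hypothetical three-node toy, not REFS and not EMD Serono’s model; the variable names (`many_sites`, `strong_inv`, `fast`) and every probability are invented for illustration.

```python
from itertools import product

# Hypothetical three-node network (all probabilities invented for illustration):
#   many_sites -> fast_enrollment <- strong_investigators
p_many_sites = {True: 0.4, False: 0.6}
p_strong_inv = {True: 0.3, False: 0.7}

# Conditional probability table: P(fast_enrollment = True | many_sites, strong_inv)
p_fast_given = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}


def joint(many_sites: bool, strong_inv: bool, fast: bool) -> float:
    """Joint probability as the product of each node's probability given its parents."""
    p_true = p_fast_given[(many_sites, strong_inv)]
    return (p_many_sites[many_sites]
            * p_strong_inv[strong_inv]
            * (p_true if fast else 1.0 - p_true))


# Marginal P(fast = True): sum the joint over every combination of parent values.
p_fast_marginal = sum(joint(s, i, True) for s, i in product([True, False], repeat=2))

# Conditional P(fast = True | many_sites = True) = P(fast, many_sites) / P(many_sites)
p_fast_given_many = sum(joint(True, i, True) for i in (True, False)) / p_many_sites[True]

print(round(p_fast_marginal, 3), round(p_fast_given_many, 3))  # 0.35 0.59
```

Only three small tables are stored, yet any joint, marginal, or conditional probability over the three variables can be recovered from them; that compactness is what makes the graphical form practical when there are many variables.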

The causal drivers are “not readily apparent,” Marecki continues. ODeSA explores both intrinsic factors (e.g., study design) and extrinsic factors (e.g., competitive landscape) that affect study enrollment rate and duration. All told, 1,000 intrinsic and extrinsic variables were examined across more than 4,000 phase 1 through phase 3 clinical trials in the EMD Serono portfolio for which dates and durations related to the outcomes of interest were available for inclusion in a data frame. About 3,500 of the trials were used for training and 900 for the test set.

A consensus network was first used to make sense of the “hairball” of interconnections among the variables, tease out 113 potential causal drivers across four outcomes, and identify predictive drivers, Marecki explains. Among the potentially actionable drivers identified were geography, number of sites, investigator history of participation, and investigator performance. Also identified were drivers that are not directly actionable, including trial phase, drug mechanism of action, disease prevalence, publication “buzz,” and competing trials, whose links to actionable drivers remain under investigation.
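The article does not say how the consensus network was built. One common approach, assumed here and not confirmed as the REFS method, is to fit an ensemble of candidate networks and keep only the directed edges that recur in a sufficient fraction of them. In the sketch below, `consensus_edges`, the `min_fraction` threshold, and the variable names are all hypothetical.

```python
from collections import Counter


def consensus_edges(ensemble, min_fraction=0.5):
    """Keep directed edges that appear in at least `min_fraction` of the networks.

    `ensemble` is a list of sets of (parent, child) tuples, one set per fitted network.
    Returns a dict mapping each retained edge to the fraction of networks containing it.
    """
    counts = Counter(edge for network in ensemble for edge in network)
    n = len(ensemble)
    return {edge: count / n for edge, count in counts.items() if count / n >= min_fraction}


# Hypothetical ensemble of three fitted networks over invented variable names.
ensemble = [
    {("n_sites", "duration"), ("inv_performance", "duration"), ("phase", "n_sites")},
    {("n_sites", "duration"), ("inv_performance", "duration")},
    {("n_sites", "duration"), ("buzz", "duration")},
]
consensus = consensus_edges(ensemble, min_fraction=0.6)

# Candidate drivers of the outcome are its parents in the consensus graph.
drivers = sorted(parent for parent, child in consensus if child == "duration")
print(drivers)  # ['inv_performance', 'n_sites']
```

Raising `min_fraction` prunes the “hairball” more aggressively at the cost of dropping weaker but possibly real links.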

Nair, an ODC team member involved in operations analyses and the automation of analytics processes supporting R&D operations, covered the model assessments used during the test phase, when observed outcomes were compared with predicted ones. The data framework explained 40% to 77% of the variance across the outcomes and showed both good predictive performance and generalizability.
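“Variance explained” on a test set is usually reported as the coefficient of determination, R². The article does not name the exact metric, so treating the 40% to 77% figures as R² is an assumption. The sketch computes R² from held-out observed and predicted values; the durations are invented.

```python
def r_squared(observed, predicted):
    """Fraction of the variance in observed outcomes explained by the predictions (R^2)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual error
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total variance
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Invented enrollment durations (months) for five held-out trials.
observed = [6.0, 9.0, 12.0, 15.0, 18.0]
predicted = [7.0, 8.0, 13.0, 14.0, 18.0]
print(round(r_squared(observed, predicted), 3))  # 0.956
```

Computing the score only on trials the model never saw during training is what supports a claim of generalizability rather than just good fit.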

The validation use case was a phase 2/3 oncology study with known enrollment data spanning 7.5 months. Data was available for only 90 of the variables, he says, so a baseline was needed to compare interventions properly. Trial data fed into the REFS engine created a “background” against which interventions could be modeled in real-world settings for studies that are planned or ongoing.
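The baseline-versus-intervention idea can be illustrated with a toy structural causal model. Everything the planner does not control is drawn once as a shared “background,” and each scenario is evaluated on those same draws, so any difference in outcome comes from the intervention alone. This is not the REFS engine; `duration_months`, its coefficients, and the noise level are invented assumptions.

```python
import random
import statistics


def sample_background(n_draws, seed=7):
    """Draw exogenous noise once; it stands in for everything the model does not control."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.1) for _ in range(n_draws)]


def duration_months(n_sites, strong_inv_share, noise, target_patients=300):
    """Toy structural equation (invented coefficients): months to reach the enrollment target."""
    per_site_rate = max(0.05, 0.5 + 0.7 * strong_inv_share + noise)  # patients/site/month
    return target_patients / (n_sites * per_site_rate)


def compare(background, baseline, intervention):
    """Mean duration for two settings, both evaluated on the same background draws."""
    before = statistics.mean(duration_months(**baseline, noise=u) for u in background)
    after = statistics.mean(duration_months(**intervention, noise=u) for u in background)
    return before, after


background = sample_background(5000)
before, after = compare(background,
                        baseline={"n_sites": 60, "strong_inv_share": 0.4},
                        intervention={"n_sites": 80, "strong_inv_share": 0.4})
print(f"baseline {before:.2f} months, intervention {after:.2f} months")
```

Reusing the same background draws for both arms removes simulation noise from the comparison, which is the practical reason for fixing a baseline before testing interventions.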

Running simulations across all 3,500 trials took 4.6 hours on a single processor, but narrowing the list to trials more representative by phase and indication gave a more realistic reflection of the remaining variables, Nair points out. A simulation using five matched trials took only 48 seconds, leaving more time to ask questions of the predictive model.
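Narrowing the background to comparable trials amounts to filtering the historical portfolio on the planned study’s characteristics. The sketch assumes exact matching on phase and indication and a cap of five trials; a production matching step could weigh many more attributes. `Trial`, `matched_trials`, and the trial IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    trial_id: str
    phase: str
    indication: str


def matched_trials(history, phase, indication, limit=5):
    """Return up to `limit` historical trials with the same phase and indication.

    Exact matching on two fields is a simplifying assumption for illustration.
    """
    matches = [t for t in history if t.phase == phase and t.indication == indication]
    return matches[:limit]


# Invented portfolio: seven phase 2/3 oncology trials plus two non-matching ones.
history = [Trial(f"ONC-{n}", "2/3", "oncology") for n in range(7)]
history += [Trial("IMM-1", "2/3", "immunology"), Trial("ONC-P1", "1", "oncology")]

matched = matched_trials(history, phase="2/3", indication="oncology")
print([t.trial_id for t in matched])  # ['ONC-0', 'ONC-1', 'ONC-2', 'ONC-3', 'ONC-4']
```

The smaller matched set is then what the simulation runs against in place of the full portfolio.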

Trial planners will be able to use the ODeSA platform to optimize a study’s geographic footprint, number of sites, investigators, comparator, and anticipated enrollment duration, says Nair, and they can probe differentiated and multivariate scenarios. For example, they might ask whether reducing the number of countries and sites in a study would mitigate the negative effects of a potentially higher proportion of poorer-performing investigators.
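Probing multivariate scenarios can be pictured as evaluating a model over every combination of the levers in question and ranking the results. The model below is a deliberately simple invented stand-in (country start-up overhead plus site-level enrollment time, with assumed per-investigator rates), not ODeSA’s predictive model; `toy_duration` and all parameter values are hypothetical.

```python
from itertools import product


def toy_duration(n_countries, n_sites, poor_inv_share, target_patients=300):
    """Invented model: start-up overhead that grows with countries, plus site-level enrollment time."""
    startup_months = 2.0 + 0.25 * n_countries
    # Assumed rates: strong investigators enroll 1.0 patient/month, poorer performers 0.3.
    per_site_rate = 1.0 * (1 - poor_inv_share) + 0.3 * poor_inv_share
    return startup_months + target_patients / (n_sites * per_site_rate)


# Every combination of footprint and investigator mix (values are illustrative).
scenarios = [
    {"n_countries": c, "n_sites": s, "poor_inv_share": p}
    for c, s, p in product([10, 15], [40, 60], [0.2, 0.4])
]
ranked = sorted(scenarios, key=lambda sc: toy_duration(**sc))

for sc in ranked:
    print(sc, round(toy_duration(**sc), 1))
```

Pairs of rows in the ranked output can then be compared directly, for instance a smaller footprint with a weaker investigator mix against a larger footprint with a stronger one, which is the kind of trade-off question described above.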

Work continues on addressing the unexplained variance to further enhance the model, as well as on operationalizing ODeSA, he says.