Right-Sizing Site Selection: Predicting Enrollment
Second installment in a four-part special report
By Deborah Borfitz
June 25, 2019 | Janssen is one of a growing number of life science companies actively building its data science capabilities to improve clinical trial operations. The centerpiece of these efforts is a feasibility intelligence platform that uses real-world evidence (RWE) and data analytics to inform site selection strategies for studies, according to Michelle Everill, senior director, head of global feasibility.
Data feeding the analytics platform will be housed in a centralized library intended to provide source material for an assortment of business-critical projects, says Sky Cheung, a senior scientist in Janssen's R&D data sciences group. Importantly, data scientists are including contextual information about the data—including where it came from, and how and by whom it was produced—to establish what can be done with the data and how best to interpret values.
The fledgling platform was put to the test last year for a phase 3 lupus trial when predictive models were developed for quantities Janssen believed could differentiate high-performing from low-performing sites, says Cheung. Among those factors were site enrollment, speed of recruitment, and probability of enrolling one or no patients. "For each of those quantities we found historical site performance, study design and complexity, and the geographic locations of sites were all influencers [of enrollment performance] at a variety of levels."
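The quantities Cheung describes can be illustrated with a toy calculation over historical site records. This is a sketch only, with invented field names and numbers; Janssen's actual models and data are not public.

```python
# Illustrative sketch: estimating three site-level quantities from
# historical trial records -- expected enrollment, recruitment speed,
# and the probability of enrolling one or no patients.
# The records below are invented for demonstration.
from statistics import mean

history = [
    # (site_id, patients_enrolled, months_recruiting)
    ("site_a", 12, 10),
    ("site_a", 8, 8),
    ("site_b", 1, 12),
    ("site_b", 0, 9),
    ("site_b", 5, 10),
]

def site_profile(records, site_id):
    """Summarize a site's historical performance across past trials."""
    rows = [r for r in records if r[0] == site_id]
    enrolled = [r[1] for r in rows]
    rates = [r[1] / r[2] for r in rows]  # patients per month
    return {
        "expected_enrollment": mean(enrolled),
        "patients_per_month": mean(rates),
        # Fraction of past trials where the site enrolled one or zero patients.
        "p_one_or_none": sum(e <= 1 for e in enrolled) / len(enrolled),
    }
```

In a real feasibility model these empirical summaries would be inputs to a predictive model alongside study design, complexity, and geography, per Cheung's description.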
A variety of data sources were tapped, including Citeline, DrugDev, ClinicalTrials.gov, and Truven, says Cheung. The analysis identified 9,330 site-investigator pairs and 1,068 trials, 1,034 of which were external to Janssen.
All the predictive information was put into an enrollment simulation engine to determine which sites to select for the lupus study, based in part on how long it would take to finish enrollment and how many patients a site would enroll in the first year, says Cheung. "Cumulative enrollment figures… essentially told us how many patients we could expect to enroll collectively across all sites at any point of time after the start of the trial."
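A cumulative enrollment curve of the kind Cheung describes can be sketched with a simple Monte Carlo simulation. The version below assumes each site enrolls at a constant monthly rate, which is a simplification; Janssen's engine and its inputs are not public, and the rates here are invented.

```python
# Minimal Monte Carlo sketch of an enrollment simulation engine.
# Assumption: each site enrolls patients as a Poisson process with a
# fixed monthly rate. Output is the mean cumulative enrollment across
# all sites at each month-end.
import random

def simulate_cumulative_enrollment(site_rates, months, n_sims=500, seed=7):
    rng = random.Random(seed)
    totals = [0.0] * months
    for _ in range(n_sims):
        cumulative = 0
        for m in range(months):
            for rate in site_rates:
                # Count Poisson arrivals within one month by summing
                # exponential interarrival times.
                t = rng.expovariate(rate)
                while t <= 1.0:
                    cumulative += 1
                    t += rng.expovariate(rate)
            totals[m] += cumulative
    return [total / n_sims for total in totals]
```

Reading the simulated curve at any month gives the kind of answer Cheung quotes: how many patients to expect collectively across all sites at that point after trial start.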
One curious observation in the data was that many of the sites involved with lupus trials sponsored by Roche tended not to be the one-or-none enrollers, says Cheung. "We haven't fully investigated the reasons why."
Counterintuitively, CRO-recommended sites tended to be among those predicted to perform poorly. The same trend has been seen in other models built in-house, says Everill, but it likely speaks more to the data sources and processes used to select sites. It could well be that the datasets being utilized are incomplete, or that investigator sites are being inappropriately prioritized.
To deal with variability in the quality of datasets, Janssen plans to add a variety of nontraditional data resources to its feasibility intelligence platform, says Everill. These include RWE and commercial data that companies haven't historically accessed—e.g., census, site preference and physician sentiment data—that it can then build on with other information such as the conferences investigators attend and articles they publish. "We're trying to take this much broader approach and overlay as much data as possible to understand our investigators and sites better." No database, technology or system currently exists that paints a holistic picture, she says.
Natural language processing is being used to extract meaningful terms and phrases from study protocols related to the primary outcomes and endpoints, including inclusion/exclusion criteria, says Cheung. "That's something we're folding into our algorithm development."
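As a generic illustration of this kind of protocol mining (Janssen's actual NLP pipeline is not described in detail), one common first step is isolating the inclusion/exclusion sections of a protocol before extracting terms. The protocol text and section format below are invented.

```python
# Illustrative sketch: pulling inclusion/exclusion criteria out of
# free-text protocol language with regular expressions, as a first
# step before term and phrase extraction. Text is invented.
import re

protocol = """
Inclusion Criteria:
- Age 18 to 75 years
- SLEDAI-2K score >= 6
Exclusion Criteria:
- Active severe lupus nephritis
- Prior B-cell depleting therapy
"""

def extract_criteria(text):
    """Return inclusion and exclusion criteria as lists of strings."""
    sections = {}
    for name in ("Inclusion", "Exclusion"):
        match = re.search(rf"{name} Criteria:\n((?:- .*\n?)+)", text)
        if match:
            sections[name.lower()] = [
                line[2:].strip()
                for line in match.group(1).strip().splitlines()
            ]
    return sections
```

The extracted criteria could then feed downstream algorithms, for example to compare a new protocol's constraints against those of historical trials.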
Culture Shift
Ultimately, Janssen will scale its data-driven site selection approach to every study where enough data is available, says Everill. Where data is in short supply, as would likely be the case with rare diseases and uncommon forms of cancer, it might instead create surrogates to get something "close to predictive." One such surrogate might be a biomarker with some overlap with the patients of interest.
The approach has so far been demonstrated to work in a handful of non-overlapping disease areas beyond lupus where there has been sufficient data and historical trials are fairly well aligned with the protocol constraints of the upcoming trial, says Cheung. "So long as those conditions were met our site selection methodology was able to provide valuable information to trial teams."
The site selection modeling technique was recently validated for the lupus trial. Based on real-time enrollment data for some of the sites actively recruiting patients, site-level enrollment totals are consistent with the predictions made, says Cheung.
Outputs of the analytics platform will be easy for trial managers and project leads to digest because all relevant performance data and predictions are viewable in a series of interactive dashboards that bring the information to life with a few clicks, says Cheung. For the lupus trial, a dashboard was created to help ensure the patient pool had appropriate racial diversity, since African Americans are disproportionately affected by the disease.
Overlaying US census data with RWE showing where lupus patients are geographically concentrated, and where sites are located, Janssen was able to come up with a pictorial representation of the proximity of sites to areas with higher proportions of lupus patients and African Americans, explains Cheung. The study has been enrolling since last fall and, as of now, more than 26% of the participants in the US are African American, a higher proportion than in the other participating countries.
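The core of the overlay Cheung describes is a join between region-level patient prevalence and site locations. The toy example below uses invented prevalence figures and site names purely to show the shape of that calculation.

```python
# Toy illustration of overlaying patient prevalence with site locations
# to rank sites by proximity to concentrations of the target population.
# All regions, sites, and prevalence figures are invented.
prevalence = {  # hypothetical lupus patients per 100k residents, by region
    "southeast_us": 95,
    "northeast_us": 70,
    "midwest_us": 55,
}

sites = [
    ("site_a", "southeast_us"),
    ("site_b", "midwest_us"),
    ("site_c", "southeast_us"),
]

def rank_sites_by_prevalence(sites, prevalence):
    """Order candidate sites by the patient prevalence of their region."""
    return sorted(sites, key=lambda s: prevalence[s[1]], reverse=True)
```

In practice this would be drawn as a map rather than a ranked list, but the underlying join of demographic data to site geography is the same.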
Everill says that she was pleasantly surprised that Janssen's "people-oriented culture" has not stopped people from enthusiastically embracing data science. "Showing them the strength of the output and predictions, and how it can help create leaner, faster trials and get drugs to patients quicker was really impactful."