AI Platform Aims to Translate Trial Results at the Point of Care
By Deborah Borfitz
March 18, 2025 | For medical oncologists, the clinical quandary in treating patients is whether a newly marketed agent or drug cocktail will work as well as it did for the average cancer patient in a landmark clinical trial. “To be completely honest, I am usually just referring to the results of the trial to make my decision and sort of hoping” it is the optimal choice, says Ravi B. Parikh, medical director of the data and technology applications shared resource at Winship Cancer Institute of Emory University.
As he well knows, the newly available drug isn’t going to work for a substantial proportion of his patients. But, like his counterparts everywhere, he has a difficult time identifying who the nonresponders will be.
Genetic and molecular biomarkers have been discovered that suggest when a treatment might be more, or less, effective. “We still have this problem of generalizing the results of a clinical trial to the patient sitting in front of us,” says Parikh. “There is a lot of heterogeneity in response, even within a clinical trial,” in addition to the fact that only about 5% of patients with cancer are represented in a typical study.
In hopes of addressing the situation, Parikh and his colleagues used artificial intelligence (AI) to phenotype patients into risk categories and then emulate the eligibility criteria of 11 landmark randomized clinical trials (RCTs) across four of the most prevalent advanced solid malignancies (lung, breast, prostate, and colorectal) in the U.S. This was in turn used to simulate a clinical trial using a nationwide database of electronic health records (EHRs) from Flatiron Health, inclusive of clinical, demographic, laboratory, and genomic data.
Their platform, known as TrialTranslator, is designed to “translate” clinical trial results to real-world populations. The machine-generated phenotypes are effectively AI-driven biomarkers that represent an amalgamation of different factors used to guide clinical decision-making, Parikh says. TrialTranslator could be grouped into the broad category of “digital twins,” he adds, since it creates computational phenotypes of patients that can be used to study the effectiveness of a given drug.
The platform was used to generate survival results for the machine learning-derived subgroups, revealing that only patients in the generally healthier low- and medium-risk phenotypes had survival times and treatment-associated survival benefits similar to those seen in the RCTs, as reported recently in Nature Medicine (DOI: 10.1038/s41591-024-03352-5). Results of the RCTs didn’t hold up well in the real world for the high-risk phenotypes, who had more adverse clinical risk factors and tended not to enjoy the same survival benefits, says Parikh, highlighting the “heterogeneity effect” even among people who are theoretically eligible for the trial.
More Than an Eligibility Problem
One big takeaway here is the importance of expanding trial eligibility criteria to include patients who are likely to participate in the real world. Patients are currently excluded by virtue of their age, ability to perform everyday activities, or organ function, and not always for good reason, Parikh says. Among these are individuals with borderline kidney or liver dysfunction—abnormalities that affect many real-world patients.
The American Society of Clinical Oncology and Friends of Cancer Research both support broadening eligibility criteria for clinical trials to make them more inclusive, he notes. The U.S. Food and Drug Administration also issued draft guidance last spring recommending the expansion of eligibility criteria to include cancer patients with a wider range of performance status. The tricky part is that industry sponsors largely determine eligibility criteria and tend to control the characteristics of enrolled participants to optimize the interpretation of results.
“Just because we expand eligibility criteria doesn’t mean we make it easier for the average patient to go into a clinical trial,” adds Parikh. In addition to transportation and awareness barriers, clinical trials are open at some privileged centers and not others. Both eligibility and enrollment issues need to be addressed to produce higher quality evidence that can be generalized to patients being seen in the clinic.
In the meantime, being able to predict whether patients are likely to respond to a treatment as suggested by clinical trial results could help prevent overtreatment or ineffective treatment, he says, as well as needless exposure to agents that come with side effects. Parikh says he is personally most excited about the potential of TrialTranslator to identify high-risk subgroups of individuals who don’t respond well to a clinical trial agent and could become targets of forthcoming studies taking treatment in different directions.
Three-Step Process
The core team behind the development of TrialTranslator includes two clinicians, two biostatistics faculty members with expertise in machine learning, and a student (Xavier Orcutt, now an M.D.) with both clinical and machine learning expertise, says Parikh. It has been a four-year endeavor requiring a “melding of the minds” to understand which clinical trials were able to be replicated and worthy of the effort, and how to interpret the clinical phenotypes that were generated.
Computational know-how ensured the right methods were used to learn from the clinical and genomic data and that outputs were presented appropriately, he says. Three sensitivity analyses helped assess the robustness of TrialTranslator results: one examined the impact of different strategies for estimating and filling in missing data points, and two others, in the trial emulation phase, tested whether the computed treatment effects differed when strict eligibility criteria were applied or when patient inclusion depended on receiving standard doses of guideline-recommended chemotherapeutic agents.
The TrialTranslator framework encompasses three broad steps, Parikh explains. The first is to clean the inherently “messy” EHR data so that it is all in a consistent format that can be processed by a machine learning algorithm. One simple example is that weight might be variably recorded in pounds or kilograms but needs to be standardized to one or the other. The cleanup work fell to Orcutt, and getting the data in neat columns and rows represented at least half of the work of the published analysis, he says.
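The weight example above can be sketched in a few lines of Python. This is a hypothetical illustration of the kind of unit standardization involved, not the authors’ code; the field names and records are invented.

```python
# Hypothetical sketch of the EHR cleanup step described above:
# standardizing weight entries recorded in mixed units to kilograms.
# Record fields and values are illustrative, not from the paper.

LB_TO_KG = 0.45359237  # exact pound-to-kilogram conversion factor

def standardize_weight(value, unit):
    """Convert a weight reading to kilograms."""
    unit = unit.strip().lower()
    if unit in ("kg", "kilograms"):
        return value
    if unit in ("lb", "lbs", "pounds"):
        return value * LB_TO_KG
    raise ValueError(f"Unrecognized weight unit: {unit!r}")

records = [
    {"patient_id": "A1", "weight": 154.0, "unit": "lb"},
    {"patient_id": "B2", "weight": 70.0, "unit": "kg"},
]

# Emit a uniform table: every row in kilograms, ready for an ML pipeline.
cleaned = [
    {"patient_id": r["patient_id"],
     "weight_kg": round(standardize_weight(r["weight"], r["unit"]), 1)}
    for r in records
]
print(cleaned)
```

Real EHR cleanup repeats this pattern across hundreds of fields (lab units, date formats, coded diagnoses), which is why it consumed half the work of the analysis.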
Once cleaned, the data were used by machine learning algorithms to phenotype patients into high-, medium-, or low-risk groups and, within those subgroups, to emulate the clinical trials by reproducing much of the studies’ eligibility criteria. In doing so, the researchers identified people who received one study-associated treatment versus another, running a simulated trial to learn how the drugs fared against one another.
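The emulation step can be sketched as follows: filter observational records by trial-like eligibility criteria, split the remaining patients by the treatment they actually received, and compare survival between arms. This is an assumed minimal illustration, not the authors’ pipeline; the eligibility rules, fields, and numbers are invented.

```python
# Minimal illustration (assumed, not the authors' pipeline) of emulating a
# trial from observational records: apply eligibility filters, split patients
# by the treatment they actually received, and compare survival between arms.

patients = [
    {"id": 1, "age": 62, "ecog": 1, "treatment": "A", "survival_months": 18.0},
    {"id": 2, "age": 71, "ecog": 0, "treatment": "B", "survival_months": 11.0},
    {"id": 3, "age": 55, "ecog": 2, "treatment": "A", "survival_months": 6.0},
    {"id": 4, "age": 80, "ecog": 3, "treatment": "B", "survival_months": 3.0},
]

def eligible(p):
    # Simplified stand-ins for trial eligibility criteria
    # (age limit, ECOG performance status).
    return p["age"] < 80 and p["ecog"] <= 2

cohort = [p for p in patients if eligible(p)]  # patient 4 is excluded

def median(xs):
    xs = sorted(xs)
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

# Group survival times by the treatment arm each patient actually received.
by_arm = {}
for p in cohort:
    by_arm.setdefault(p["treatment"], []).append(p["survival_months"])

medians = {arm: median(vals) for arm, vals in sorted(by_arm.items())}
print(medians)
```

A real emulation would use proper survival methods (censoring, Kaplan-Meier estimates, hazard ratios) rather than raw medians, but the shape of the computation is the same.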
The number of subgroups is flexible, he adds, since no fixed clinical rationale dictates how the risk levels are divided. “If we had other criteria that we wanted to subgroup these patients under, for example to identify more groups or... separate people who are likely to survive 10 months versus not survive 10 months, then we could do that. We can set the cutoff point for phenotyping however we want and still do the trial emulation.”
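The flexible-cutoff idea can be shown in a short sketch. The scores and thresholds here are hypothetical; the point is only that the same predicted-survival scores can be binned into any number of risk phenotypes, including the binary 10-month split from the quote above.

```python
# Illustrative sketch (not the authors' code) of flexible risk phenotyping:
# each patient gets a model-predicted survival score, and the cutoffs that
# divide patients into risk phenotypes can be set however the analysis needs.

def phenotype(score, cutoffs):
    """Assign a risk group from a predicted survival score (months).
    `cutoffs` is a sorted list of thresholds; lower scores = higher risk."""
    for group, cutoff in enumerate(cutoffs):
        if score < cutoff:
            return group  # 0 = highest risk
    return len(cutoffs)

# Hypothetical predicted median survival (months) for five patients.
scores = [4.0, 8.5, 14.0, 22.0, 31.0]

# Three phenotypes (high/medium/low risk) via two cutoffs...
three_way = [phenotype(s, [10, 20]) for s in scores]

# ...or a binary split at 10 months, as in the quoted example.
binary = [phenotype(s, [10]) for s in scores]

print(three_way)
print(binary)
```

The trial emulation downstream is unchanged either way: whichever grouping is chosen, treatments are compared within each phenotype.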
Utility Today and Beyond
Parikh says he believes TrialTranslator could ultimately be a clinically useful tool across study types and treatment settings to phenotype real-world patients when treatment decisions are being made. This would of course be after its validation with other real-world data sets, potentially from clinical trials, and the platform undergoes regulatory review, he adds.
One big need to move forward is more and better real-world data sources, says Parikh. Information about the types of treatment patients receive, and their cost, is collected quite well already, but not the granular, cancer-specific information needed to understand how treatments work in the real world without having to wait five or 10 years for the next generation of clinical trials to report out. “That can be done very easily just by having some centralized recording of already collected electronic health records data” and making it available for ingestion by machine learning algorithms.
For research purposes, TrialTranslator can be used today to simulate a trial in a hypothetical patient or to apply the underlying code to other real-world data sets in advance of launching a trial with real patients, he says. It could also “transform” post-marketing surveillance by basing it on real-world data rather than claims data, a broad swath of EHR data, or much lower quality evidence such as patient self-reporting to learn who is and who is not benefitting from new-to-the-scene drugs.
Given other high-quality data sets, TrialTranslator could be used to translate the results of other trials to real-world populations, including studies conducted within the four malignancies just examined or trials in other cancers, says Parikh. Another promising possibility, which he and his team are already pursuing, is to use the platform to interrogate negative clinical trials to see if they can be resuscitated—that is, potentially expand access to therapies previously deemed ineffective by mining large amounts of real-world data.
Over the next few years, their focus will also be on two other big areas, he reports. These are expanding the number of trials that they’re able to emulate and moving beyond the study of overall or progression-free survival since it is not the only outcome that matters to patients. Adverse events and quality of life are also top concerns.
The problem is these more granular types of data aren’t necessarily collected well at present. But the situation is improving rapidly, meaning Parikh and his colleagues may be able to use TrialTranslator to emulate not only a trial but also an approximation of the non-survival outcomes being reported. The platform might even help expand the number of adverse events that are being reported so that the next version of the tool reports on the likelihood of an adverse event instead of just the odds of survival.
Greater private and public sector investment in the collection of high-quality real-world data for all cancers across all sites of care in and outside the U.S. is a must, says Parikh, as are more expansive clinical trials, so EHR data isn’t required to generate clinical insights. He foresees a future where AI is both collecting and processing regulatory-grade real-world data across all cancers with TrialTranslator operating at the speed of an office visit.