Building Trust In Real-World Evidence
By Deborah Borfitz
July 21, 2021 | Lessons learned to date about the growing use of real-world evidence (RWE), including emerging approaches to improve study design and the measurement of treatment effects, were highlighted during a presentation by regulatory and pharmacoepidemiology (PE) experts at the recent DIA 2021 Global Annual Meeting. The wide-ranging conversation touched on everything from the regulatory context in which RWE is being used and whether clinical questions can be reliably addressed to the quality of real-world data (RWD) sources and more rigorous methodological approaches that might be adopted to help ensure confidence in study findings.
Methods need to be fit for purpose, according to session moderator Nancy Dreyer, chief scientific officer and senior vice president at IQVIA. “Early in the pandemic, high-profile retraction of studies using real-world evidence fueled mistrust.”
Dreyer was referencing moves last year by The Lancet and the New England Journal of Medicine, which retracted separate studies relying on the same international database that included electronic health records (EHR) from 169 hospitals. The studies hit speed records by making it from last patient visit to publication in under six months, she says, but had troubling issues with source data validation.
RWE has been applied successfully, even in COVID-related studies, and the opportunities are expanding, notes Mark McClellan, director and professor of business, medicine, and policy at the Duke-Margolis Center for Health Policy. The growing spectrum of RWE uses includes informing the development of clinical trials, supporting regulatory reviews (e.g., to modify drug labels), improving care delivery, and helping de-risk drug development for different stakeholder groups. Big data, artificial intelligence (AI), and precision medicine are all fueling its use.
The Duke-Margolis Center for Health Policy Real-World Evidence Collaborative provides a framework for the development and use of RWE, McClellan says, including whether it is likely to be accurate and useful. The variables include the clinical and regulatory context and whether the questions raised are clinically actionable.
Much work is now underway to improve the accuracy and quality of the underlying data, he adds. Choice of statistical method is important in eliminating bias to ensure study results are actionable.
Marking Progress
As McClellan and his colleagues recently reported in Clinical Pharmacology & Therapeutics (DOI: 10.1002/cpt.2272), RWE was used to support regulatory decisions on medical product effectiveness 34 times between 1954 and 2020. Most instances were for oncology (26%) and rare conditions (50%), and included RWE on the label (61%).
Many uses were for external controls in single-arm trials, continues McClellan. Stated reasons for not including RWE on the label included lack of pre-specification of study design and data issues, including whether the data were relevant to the questions at hand.
The pandemic has provided many opportunities to advance the use of RWE in support of regulatory decision-making, says McClellan. He specifically cited an effort by the U.S. Food and Drug Administration (FDA) to align observational studies through the Reagan-Udall Foundation’s COVID-19 Evidence Accelerator model. These studies reflect high adoption of digital health technologies such as remote patient monitoring and telemedicine, he notes.
McClellan also reports on a study, published last fall in JAMA Internal Medicine (DOI: 10.1001/jamainternmed.2020.6252), that follows the framework developed by the RWE Collaborative. The study examines the association between early, off-label use of the monoclonal antibody tocilizumab and mortality among critically ill patients with COVID-19.
The 4,000-patient study used methods to emulate a hypothetical target trial, including analytic approaches to adjust for confounding by indication and to prevent immortal time bias by accounting for the time delay between admission to the ICU and initiation of treatment. The study found a mortality ratio of 0.7 (patients treated with tocilizumab had a lower risk of death than those who were not), and the association between treatment with tocilizumab and death was stronger among patients admitted to the ICU within three days of symptom onset.
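To see why the delay between ICU admission and treatment start matters, consider a minimal Python sketch. It is not the study's actual analysis: the `Patient` record, both functions, and the four-patient cohort are hypothetical, and splitting person-time is only one of several ways to handle immortal time. The naive version counts every follow-up day of an eventually treated patient as treated, even the days spent waiting for treatment, during which that patient by definition could not have died in the treated group.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    followup_days: int        # days from ICU admission to death or censoring
    died: bool                # True if the patient died during follow-up
    treat_day: Optional[int]  # day treatment started; None if never treated


def naive_rates(patients: List[Patient]) -> Tuple[float, float]:
    """Deaths per person-day, counting ALL follow-up of treated patients as treated."""
    t_days = u_days = t_deaths = u_deaths = 0
    for p in patients:
        if p.treat_day is not None:
            t_days += p.followup_days
            t_deaths += int(p.died)
        else:
            u_days += p.followup_days
            u_deaths += int(p.died)
    return t_deaths / t_days, u_deaths / u_days


def time_split_rates(patients: List[Patient]) -> Tuple[float, float]:
    """Deaths per person-day, counting days before treatment start as untreated.

    Assumes treated patients start treatment before their follow-up ends,
    so any death among them occurs on treated time.
    """
    t_days = u_days = t_deaths = u_deaths = 0
    for p in patients:
        if p.treat_day is not None:
            u_days += p.treat_day                    # waiting period is untreated time
            t_days += p.followup_days - p.treat_day  # only post-initiation days are treated
            t_deaths += int(p.died)
        else:
            u_days += p.followup_days
            u_deaths += int(p.died)
    return t_deaths / t_days, u_deaths / u_days


# Hypothetical toy cohort, for illustration only.
cohort = [
    Patient(followup_days=10, died=True, treat_day=5),
    Patient(followup_days=20, died=False, treat_day=10),
    Patient(followup_days=3, died=True, treat_day=None),
    Patient(followup_days=15, died=False, treat_day=None),
]

naive_treated, naive_untreated = naive_rates(cohort)
split_treated, split_untreated = time_split_rates(cohort)
print("naive rate ratio:", naive_treated / naive_untreated)
print("time-split rate ratio:", split_treated / split_untreated)
```

In this toy cohort the naive rate ratio falls below 1, making treatment look protective, while the time-split ratio rises above 1. The numbers are invented; the point is only that misclassified waiting time can shift an estimate in the treatment's favor.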
It was not an easy study to get published, McClellan says, but results mirrored those of randomized controlled trials (RCTs).
The 2021 agenda for the Duke-Margolis Center for Health Policy includes efforts to duplicate RCT results using RWE, he says. The RWE Collaborative will be focusing on more efficient collection of data at the point of care as well as exploring opportunities with EHR-embedded pragmatic clinical trials and fit-for-purpose RWE for multiple stakeholders.
Focusing on outcomes that matter to patients in RWE is an imperative, McClellan adds, pointing to the FDA’s Project Patient Voice and Patient-Focused Drug Development initiative.
Regulatory Submissions
An RWE use case for regulatory decision-making was presented by Winona Rei Bolislis, regulatory science and policy manager at Sanofi. She discussed findings of a study published early last year in Clinical Therapeutics (DOI: 10.1016/j.clinthera.2020.03.006) on how RWD had been used in regulatory approvals over a recent 20-year period. The Clarivate platform was used to look at new drug applications and line extensions submitted to the FDA, the European Medicines Agency (EMA), Health Canada, and Japan's Pharmaceuticals and Medical Devices Agency (PMDA).
The study identified 17 cases in which RWD were used for new drug applications (between 1998 and 2019) and 10 for line extensions (between 2012 and 2019), only some of which were approved by regulators, she says.
RWD has been applied broadly across therapeutic areas and populations, notably for rare diseases with a limited number of affected patients, Bolislis says. The data sources have predominantly been EHRs (16 cases) and patient registries (eight cases).
In most instances, RWD were used either as primary data, when noncomparative data were available to demonstrate tolerability and efficacy, or as supportive data when validating findings, she continues. Review timelines were mostly short (less than a year). RWD has been used across companies of all sizes, although more so by large companies.
Moving forward, use of RWD will gain the most ground where RCTs are challenging to conduct and regulators are offering accelerated review of products for patients in need, she says. The reliability and relevance of data underlying the RWE used in regulatory submissions are key, as covered in the FDA’s published framework.
Currently, Bolislis concludes, RWD is proving particularly helpful in monitoring the safety and effectiveness of therapies for COVID-19, especially those in post-marketing use.
Harmonization Needs
The sheer variety of data in real-world settings, range of data formats, ways of extracting and cleaning data, new types of data entry and transfer and analysis methods (e.g., AI), and potential study designs mean it will be a “long journey” to obtain value from RWE, according to Yoshiaki Uyama, Ph.D., director of the Office of Medical Informatics and Epidemiology for the PMDA.
The agency began regularly issuing guidance on proper utilization of health and medical information in real-world databases in 2014 and, like the U.S., has experience with the use of external comparator patients in new drug applications. The two essential elements in the use of RWE in the regulatory setting are data reliability and appropriate analysis to arrive at an interpretable result, he says.
Global sharing of RWE comes with a similarly long list of challenges related to variability in approaches and regulations, Uyama adds, although efforts have been made toward international harmonization and will be increasing. Among the positive signs are renovation of the International Council for Harmonization (ICH) E6 Good Clinical Practice Guideline and establishment of the Pharmacoepidemiology (PE) Discussion Group whose goal is to harmonize the technical scientific requirements related to PE studies submitted to regulatory agencies.
PMDA also has a new RWD working group, launched earlier this year, whose topics of discussion have included reliability issues with real-world data, Uyama says. The group is enabling knowledge- and experience-sharing across different offices and functions at the agency.
Comparability Assessments
Brian Bradbury, vice president and head of the Center for Observational Research at Amgen, focused on the “methods considerations” of fit-for-purpose RWE. The goal here, he says, is an “unbiased estimate” of treatment effect.
Threats to validity include confounding, selection bias, and misclassification, Bradbury says. In PE studies using RWD, confounding at treatment initiation is most concerning because medicines are allocated to patients based on aspects of their clinical presentation that are themselves predictive of untoward outcomes.
Patients in two treatment arms differ systematically in terms of “prognostic factors” that need to be addressed through bias control, he continues. In some instances, “the forces driving selection of the treatment far outweigh potential benefits that may be observed by the therapy.”
The key guiding principle when using RWD to generate RWE is “design, design, design,” says Bradbury, adding that new user study design is among the best options. Importantly, researchers need to select an appropriate treatment comparator arm, use a causal diagram to guide selection of candidate confounders, and ensure accurate capture of exposure, outcomes, and covariates.
“Propensity scoring” and “inverse probability of treatment weighting” are among the suitable statistical adjustment methodologies, in addition to “comparability assessment,” Bradbury says. Robustness of findings should also be tested through sensitivity analysis and principled quantitative bias analysis.
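As a rough illustration of how inverse probability of treatment weighting works, the Python sketch below builds a hypothetical cohort in which sicker patients are more likely to be treated and the treatment has no true effect. All names and counts are invented for the example. The propensity score is estimated as the simple proportion treated within each severity stratum, a deliberate simplification; real studies typically model it from many covariates.

```python
from collections import defaultdict


def make_cohort():
    """Hypothetical cohort of (severe, treated, event) rows with no true treatment effect."""
    # Each cell: (severe, treated, n_patients, n_events).
    # Event risk is 50% in severe patients and 10% in mild patients, in both arms.
    cells = [(1, 1, 80, 40), (1, 0, 20, 10), (0, 1, 20, 2), (0, 0, 80, 8)]
    rows = []
    for severe, treated, n, events in cells:
        rows += [(severe, treated, 1)] * events
        rows += [(severe, treated, 0)] * (n - events)
    return rows


def propensity_by_stratum(rows):
    """Estimate P(treated | severity) as the proportion treated in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for severe, treated, _ in rows:
        counts[severe][0] += treated
        counts[severe][1] += 1
    return {s: n_treated / n_total for s, (n_treated, n_total) in counts.items()}


def crude_risk_difference(rows):
    """Unadjusted risk in treated minus risk in untreated."""
    events = {1: 0, 0: 0}
    totals = {1: 0, 0: 0}
    for _, treated, event in rows:
        events[treated] += event
        totals[treated] += 1
    return events[1] / totals[1] - events[0] / totals[0]


def iptw_risk_difference(rows):
    """Weighted risk difference: treated get weight 1/ps, untreated 1/(1 - ps)."""
    ps = propensity_by_stratum(rows)
    num = {1: 0.0, 0: 0.0}
    den = {1: 0.0, 0: 0.0}
    for severe, treated, event in rows:
        p = ps[severe]
        weight = 1 / p if treated else 1 / (1 - p)
        num[treated] += weight * event
        den[treated] += weight
    return num[1] / den[1] - num[0] / den[0]


cohort = make_cohort()
print("crude risk difference:", crude_risk_difference(cohort))
print("IPTW risk difference:", iptw_risk_difference(cohort))
```

In this constructed example the crude comparison suggests treated patients fare worse, purely because they were sicker to begin with, while the weighted comparison shows essentially no difference. That only holds because severity is the sole confounder and is fully measured, which is rarely guaranteed with real-world data.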
New user design is amenable to high-quality PE research because it attempts to “mimic the idea of random allocation,” says Bradbury. In the real world, new users are selected for a therapy by a physician who has determined that the patients needed medical intervention and decided to treat them with one medicine or another. “For that same indication, we can select new users for one [medicine] versus another as an example of pseudo-randomization,” he explains.
Physicians might also be switching patients to a second-line therapy because the first medicine was not having the intended benefits. In both instances, “moving to a therapy” can be a mechanism for contrasting two treatments, Bradbury says.
Study design and analysis can be built in such a way that they adjust for the inevitable “residual imbalances” between the two treatment groups, so they are exchangeable, he says. “But we also have to recognize that a lot of medicines under development today are… add-ons to the standard of care or they may be second- or third-line therapies,” which complicates selection of the comparator arm.
The idea of “trial emulation” can help researchers get to an unbiased estimate of treatment effect, he says, but researchers may want to test that they have succeeded in addressing the biases.
Leading thinkers in the PE community have proposed the idea of using “negative control outcomes” to detect potential uncontrolled confounding, Bradbury shares. “For a negative control outcome to be a valid test, it has to be causally unrelated to the treatments under study…. We are trying to find something that should not be related to the outcome but looks like it is because of confounding and after we’ve adjusted for it, we should not see that relationship anymore.”
Earlier-than-expected benefit from a medicine and major traumatic events (e.g., blood transfusion) indicative of greater disease severity are two potential negative control outcomes, says Bradbury.
The “gating approach” to comparative analysis that is now gaining traction might have researchers test to see if they have successfully controlled for confounding as a third step after study design and bias adjustment, he says. Any meaningful differences observed between two treatment groups after employing negative control outcomes might suggest the comparative effectiveness or comparative safety study should not move forward.
To illustrate the concept, Bradbury references work done by his group looking at patients taking different types of osteoporosis medications. When comparing two injectable medications to one another after adjusting for confounding across several potential negative control outcomes—including incidence of transfusion, major accidents, influenza vaccinations, and other screening efforts—the treatments were found to be “fairly well balanced.”
But in a similar comparison of an injectable medication and an oral medication, the researchers observed “a number of residual imbalances,” he says. The two therapies did not appear to meet the “exchangeability assumption” and so cannot be fairly compared “because we probably haven’t addressed all of the bias that’s occurring at the treatment decision.”
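The gating idea described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the counts, and especially the tolerance of 1.25, which is not a threshold proposed by Bradbury or by any guideline. The sketch takes adjusted event counts for each negative control outcome in two treatment groups, computes a risk ratio, and lets the comparative analysis proceed only if every ratio stays close to 1.

```python
def risk_ratio(events_a, n_a, events_b, n_b):
    """Risk in group A divided by risk in group B (assumes nonzero events in B)."""
    return (events_a / n_a) / (events_b / n_b)


def gate(negative_controls, tolerance=1.25):
    """Check negative control outcomes for residual imbalance.

    negative_controls maps an outcome name to (events_a, n_a, events_b, n_b),
    taken after design and bias adjustment. The tolerance is a hypothetical
    cut-off: a ratio above it, or below its reciprocal, is flagged.
    Returns (proceed, flagged), where flagged maps outcome name to risk ratio.
    """
    flagged = {}
    for name, (events_a, n_a, events_b, n_b) in negative_controls.items():
        rr = risk_ratio(events_a, n_a, events_b, n_b)
        if rr > tolerance or rr < 1 / tolerance:
            flagged[name] = rr
    return (not flagged), flagged


# Hypothetical counts, invented for illustration only.
well_balanced = {
    "transfusion": (20, 1000, 19, 1000),
    "major_accident": (30, 1000, 31, 1000),
    "influenza_vaccination": (400, 1000, 410, 1000),
}
imbalanced = {
    "transfusion": (30, 1000, 18, 1000),
    "major_accident": (30, 1000, 29, 1000),
    "influenza_vaccination": (520, 1000, 400, 1000),
}

for label, controls in [("well balanced", well_balanced), ("imbalanced", imbalanced)]:
    proceed, flagged = gate(controls)
    print(label, "-> proceed:", proceed, "| flagged:", sorted(flagged))
```

A single fixed cut-off is the simplest possible rule. A real assessment would also have to account for statistical uncertainty and for how many negative controls are tested at once.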
The use of negative control outcomes is “not necessarily easy,” says Bradbury, in part because selecting them requires expert clinical judgment and considerable thought about how to effectively conduct the comparative analysis. “We may have to bring in many different types of negative controls to capture the different types of bias that can creep into pharmacoepidemiologic studies” and, preferably, they all have the same confounding structure.
An added complexity is that one or two comparisons may reveal an imbalance, but others do not, he notes. The “right thresholds for determining what constitutes meaningful differences” still need to be developed.
Despite the challenges, Bradbury says, he believes integrating negative control outcomes into the comparative analysis process is a “great opportunity” to help inform and guide thinking on when an unbiased estimate of treatment effect has been achieved. “A [diagnostic] comparability assessment can potentially help increase study validity and ideally these types of steps will help build trust in real-world evidence.”