Real-World Data Suggests Statins Would Effectively Treat Ulcerative Colitis
By Deborah Borfitz
September 16, 2021 | Researchers at Stanford have published close to 100 papers demonstrating that publicly available data can be tapped to accelerate clinical translation. They have now integrated real-world molecular and clinical data to show, with astonishing clarity, that statins, the commonly prescribed cholesterol-lowering drugs, would be an effective, if unexpected, treatment for ulcerative colitis (UC).
Purvesh Khatri, Ph.D., associate professor of medicine and of biomedical data science at Stanford University and lead researcher on the study published today in the Journal of the American Medical Informatics Association (DOI: 10.1093/jamia/ocab165), says it is the first of many drug repurposing discoveries he expects to come out of his computational lab. A similar multi-cohort analysis is underway to identify marketed drugs that can be repurposed for lupus and systemic sclerosis, two other autoimmune diseases, potentially short-circuiting the usual path to clinical translation.
More than 100 autoimmune diseases are known to exist, and most of them have no drug treatment, he says, making this therapeutic area “rife with potential” for the drug repurposing approach. In the case of lupus, only one medicine has been approved by the Food and Drug Administration (FDA) in the past 60 years.
The study on statins may not be entirely good news for the many big-pharma companies investing enormous sums in the development of novel drugs for UC, which afflicts nearly 1 million people in the U.S. alone. Statins (including atorvastatin, the generic version of Pfizer’s Lipitor) are inexpensive, widely available, and “generally safe enough that some doctors joke they should be put in the water,” says Khatri.
UC is currently treated with anti-inflammatory drugs and, when they don’t work, partial or total surgical removal of the colon (colectomy), he points out. About 30% of patients eventually get a colectomy. “It’s the last resort.”
Among patients whose UC was being treated medically, the Stanford team found that those who were also taking statins for at least six months had a 50% lower risk of colectomy over the next three to five years, Khatri says. They were also less likely to be hospitalized or prescribed other anti-inflammatory medications.
“The longer individuals took statins, the greater the benefits,” he adds. Importantly, the results held regardless of patient age, sex, cardiovascular conditions, and use of any of the comparator medications. The observation period also began at the first recorded prescription to mitigate “healthy user” bias.
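To make “lower risk of colectomy over the next three to five years” concrete, here is a minimal Kaplan-Meier sketch comparing two groups on time to an outcome, with time zero set at a common start point such as a first recorded prescription. The article does not say which survival method the study used; the Kaplan-Meier estimator is simply a standard way to compare time-to-event outcomes, and the toy follow-up data and function names below are entirely hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times:  follow-up time per patient, measured from a common time zero
            (hypothetically, the first recorded prescription).
    events: 1 if the outcome (e.g. colectomy) occurred at that time,
            0 if the patient was censored without the outcome.
    Returns a list of (time, survival_probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_counts = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(event_counts):
        # Patients still under observation at t (those censored at t count as at risk).
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve

def survival_at(curve, t):
    """Step-function lookup: estimated survival probability at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

# Entirely hypothetical toy cohorts (follow-up in years).
statin_times, statin_events = [1, 2, 3, 4, 5, 5, 5, 5], [0, 1, 0, 0, 0, 0, 0, 0]
other_times, other_events = [1, 2, 2, 3, 4, 5, 5, 5], [1, 1, 0, 1, 0, 0, 0, 0]

risk_statin = 1 - survival_at(kaplan_meier(statin_times, statin_events), 5)
risk_other = 1 - survival_at(kaplan_meier(other_times, other_events), 5)
print(f"5-year risk: statin {risk_statin:.2f}, comparator {risk_other:.2f}")
```

Censored patients stay in the risk set until they leave observation, which is why this estimator is preferred over a simple count of outcomes when follow-up lengths differ between patients.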
While it would be premature for the researchers to recommend physicians start prescribing statins to their UC patients—if they aren’t coincidentally on one already—doing so would be a low-risk proposition relative to the potential benefits. Off-label prescribing is “definitely” going to happen, says Khatri.
Meanwhile, the team plans to run a randomized controlled trial in which UC patients are treated with statins for a year or two and have their outcomes formally tracked, he says. In the computational drug repurposing study just published, colectomy was the primary outcome examined, but first hospitalizations and new steroid prescriptions (to suppress gut inflammation) were also measured, together spanning the full range of disease severity.
The Process
To begin, Stanford researchers looked across various datasets of patients with UC to identify the “gene signature” of the disease in colon biopsies: the genes over-expressed or under-expressed relative to healthy controls, Khatri explains. The next step was to find all the drugs that would reverse that gene signature.
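The article does not describe how the signature was computed. One common approach, assumed here, is to calculate a per-gene effect size such as Hedges' g (a bias-corrected standardized mean difference) between disease and control biopsies in each dataset, then keep the genes whose effect exceeds a cutoff. The cutoff, function names, and gene names below are hypothetical and for illustration only.

```python
import math
from statistics import mean, stdev

def hedges_g(cases, controls):
    """Hedges' g (bias-corrected standardized mean difference) and its approximate variance."""
    n1, n2 = len(cases), len(controls)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two samples per group")
    s1, s2 = stdev(cases), stdev(controls)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("no variation in expression values")
    d = (mean(cases) - mean(controls)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction, 1 - 3/(4*df - 1)
    g = j * d
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g

def gene_signature(case_expr, control_expr, threshold=0.8):
    """Split genes into over- and under-expressed lists by effect size.

    case_expr / control_expr: dict mapping gene -> list of expression values.
    threshold: hypothetical |g| cutoff chosen for illustration.
    """
    up, down = [], []
    for gene in sorted(case_expr):
        g, _ = hedges_g(case_expr[gene], control_expr[gene])
        if g >= threshold:
            up.append(gene)
        elif g <= -threshold:
            down.append(gene)
    return up, down

# Toy biopsy data with hypothetical gene names.
cases = {"GENE_A": [5, 6, 7], "GENE_B": [1, 2, 3], "GENE_C": [2, 3, 4]}
controls = {"GENE_A": [1, 2, 3], "GENE_B": [5, 6, 7], "GENE_C": [2, 3, 4]}
print(gene_signature(cases, controls))
```

A standardized effect size, rather than a raw fold change, puts datasets measured on different platforms on a comparable scale, which matters once several cohorts are combined.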
They then cross-compared the gene expression signatures from the UC patients with the transcriptome profiles of small-molecule drugs contained in another public dataset (NIH LINCS). Only small-molecule drugs approved by the FDA, and therefore known to be safe for human use and to have some efficacy, were included in the sensitivity analysis, he adds.
Two of the three top-ranked molecules were chemotherapy drugs and were dismissed as options because of their side-effect profiles, Khatri says. The third was atorvastatin, which has known anti-inflammatory effects and was also recently associated with a reduced risk of severe COVID-19.
A search of the Stanford Research Repository (containing electronic health records of 1.8 million patients seen at Stanford University Medical Center between 2008 and 2015) and the Optum Clinformatics Data Mart (a national insurance claims database covering 63 million U.S. residents between 2004 and 2016) turned up about 3,000 UC patients taking statins in addition to the usual medicines for their disease, plus about 5,000 others for the comparator group.
Findability Issues
In biomedical and translational research, the motto has long been to “reduce heterogeneity at all costs,” says Khatri. To that end, clinical trials always start with a single cohort with inclusion/exclusion criteria designed to ensure patients are comparable in terms of their age, disease stage, and treatment regimen. The result is that study findings often do not translate into clinical practice, where patient populations are highly heterogeneous.
The same issue has come up with artificial intelligence and machine learning methods, Khatri continues. The models have “hidden biases” because the data used in creating them are themselves biased.
The FDA has certainly warmed to the idea of using real-world evidence in support of regulatory decision-making and has issued guidance documents on how to proceed. Scientists also have many incentives to make their data public. But there have been bottlenecks in accessing data, even when it is labeled “publicly available,” as well as in the study review process (e.g., nitpicking about potential confounders), says Khatri.
Publicly available data is often not findable and, even when found, is typically not in a format that is easy to analyze, Khatri explains. He estimates that only 10% to 20% of the data in public repositories on any given disease ends up being used, based on his experience in areas that include organ transplants, pulmonary fibrosis, vaccines, infectious and autoimmune diseases, and cancer.
Rewarded Efforts
Despite the obstacles, the Stanford team has successfully inverted the standard research paradigm by starting with a heterogeneous cohort. The public functional genomics data repository NCBI GEO, housing 4.6 million transcriptome profiles across 160,000 independent experiments, has provided researchers with more than enough data for the dozens of published studies they have conducted over the years, says Khatri. It has also been critical to the launch of his host-response diagnostics company, Inflammatix.
In 2017, Khatri and his colleagues published a statistical paper describing how much data is needed for multi-cohort gene expression analysis to generalize to heterogeneous, real-world populations. In it they show that with four to five datasets, and a total of 200 to 300 samples (including healthy controls) from three to five different centers, “you are good to go,” he says.
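The rule of thumb quoted above can be expressed as a simple checklist. This sketch assumes the lower end of each quoted range as the threshold (at least four datasets, 200 total samples including controls, and three distinct centers); the `Dataset` fields, accession labels, and center names are hypothetical, and the 2017 paper's actual criteria may be more nuanced than a pass/fail check.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """One public gene expression dataset (fields are hypothetical)."""
    accession: str
    n_samples: int  # cases plus healthy controls
    center: str

def check_rule_of_thumb(datasets, min_datasets=4, min_samples=200, min_centers=3):
    """Compare a candidate collection against the lower ends of the quoted ranges.

    Returns {criterion: (observed_value, meets_threshold)}.
    """
    total_samples = sum(d.n_samples for d in datasets)
    n_centers = len({d.center for d in datasets})
    return {
        "datasets": (len(datasets), len(datasets) >= min_datasets),
        "samples": (total_samples, total_samples >= min_samples),
        "centers": (n_centers, n_centers >= min_centers),
    }

candidate = [
    Dataset("DS1", 70, "center_1"),
    Dataset("DS2", 55, "center_2"),
    Dataset("DS3", 60, "center_2"),
    Dataset("DS4", 40, "center_3"),
]
report = check_rule_of_thumb(candidate)
print(report)
print("good to go" if all(ok for _, ok in report.values()) else "collect more data")
```

Counting distinct centers separately from datasets matters because two datasets from the same center add samples but less of the independent heterogeneity the rule is meant to capture.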
For the UC study, researchers used 11 whole transcriptome datasets containing 272 colon biopsies from patients with UC. Collectively, these datasets included patients from eight countries (biological heterogeneity) with a wide range of disease severity (clinical heterogeneity), and were profiled using different microarray platforms (technical heterogeneity).
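Combining many heterogeneous datasets requires a way to pool each gene's effect across cohorts. The article does not describe the pooling step; a common choice for multi-cohort gene expression meta-analysis, assumed here, is a DerSimonian-Laird random-effects model, which widens the uncertainty when datasets disagree (between-study variance, tau squared). The per-dataset effect sizes below are hypothetical.

```python
import math

def random_effects_pool(effects):
    """DerSimonian-Laird random-effects pooling for one gene.

    effects: list of (g, var_g) pairs, one per dataset.
    Returns (pooled_g, standard_error, tau_squared, z_score).
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two datasets to pool")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for _, v in effects]
    g_fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sum(w)
    # Cochran's Q measures disagreement between datasets.
    q = sum(wi * (g - g_fixed) ** 2 for wi, (g, _) in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    # Random-effects weights add tau^2 to each dataset's own variance.
    w_star = [1.0 / (v + tau2) for _, v in effects]
    pooled = sum(wi * g for wi, (g, _) in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2, pooled / se

# Hypothetical per-dataset effect sizes (g, variance) for a single gene.
gene_effects = [(1.2, 0.10), (0.8, 0.15), (1.5, 0.20), (0.9, 0.12)]
pooled, se, tau2, z = random_effects_pool(gene_effects)
print(f"pooled g={pooled:.2f}, SE={se:.2f}, tau^2={tau2:.3f}, z={z:.2f}")
```

Genes that stay significant after pooling are those whose direction of change holds across platforms, countries, and disease severities, which is the point of deliberately starting from a heterogeneous collection.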
It was not effortless: “every paper on every dataset” had to be painstakingly read to understand the underlying patient population, the question being asked, and whether the disease was correctly diagnosed, Khatri says. Just because a paper is peer-reviewed doesn’t mean all the needed information is there, or available without negotiating with the author or reminding a journal of its own stated data-sharing policies.
Real-world data might be used more routinely in clinical research if not for all the holdups, he says. But, as he tells his students, an extra month or two of work on the front end invariably gets rewarded with “the largest amount of data in the world for [a given] disease, bar none.”