Applying AI To Clinical Data: Pfizer’s Approach to Identifying Use Cases
By Deborah Borfitz
April 9, 2020 | The machine learning (ML) capabilities of four companies were put to the test in a first-of-its-kind hackathon organized by Pfizer late last year, where the singular goal was to see whether artificial intelligence (AI) could predict and identify data discrepancies in datasets from 30 completed clinical trials. In addition to showing that applying AI to clinical data is feasible, the competition demonstrated that the performance of potential partners could be appraised quickly, according to Demetris Zambas, head of data monitoring and management for biometrics and data management in Global Product Development at Pfizer.
“We were pleasantly surprised by the results from the company we ultimately chose,” says Zambas, a presenter at the 11th Annual Summit for Clinical Ops Executives (SCOPE) in Orlando.
The problem with more traditional hackathons, he adds, is that “they’re not looking for specific indicators, they’re looking to find… fairy dust.” They produce little, if anything, tangible.
Pfizer began by telling contestants how many errors it had found manually in the database so they could “teach their tools,” says Zambas. It then posed the challenge: “What can your tool find?” Although not the only criterion for doing business with Pfizer, the accuracy measure was a first-pass test.
By measuring the outputs of the various ML solutions, and their accuracy, Pfizer upended the usual configuration of a hackathon where the participants are all looking for the proverbial needle in a haystack, he notes.
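One way to make that first-pass accuracy test concrete is to compare the records a tool flags against the discrepancies already found by manual review. The sketch below is illustrative only: the article does not say which metric Pfizer used, so precision and recall are assumptions here, as are the function name `score_tool` and the record IDs.

```python
def score_tool(flagged, known):
    """Compare a tool's flagged records with manually confirmed discrepancies.

    flagged: iterable of record IDs the tool reported (hypothetical IDs).
    known:   iterable of record IDs found by manual review (ground truth).
    Returns a dict with precision and recall, both between 0 and 1.
    """
    flagged, known = set(flagged), set(known)
    true_positives = len(flagged & known)
    # Precision: share of the tool's flags that were real discrepancies.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of the known discrepancies the tool managed to find.
    recall = true_positives / len(known) if known else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical example: four known discrepancies, and the tool flags four records.
known_ids = {"AE-001", "AE-002", "CM-010", "LB-205"}
tool_ids = {"AE-001", "CM-010", "LB-205", "VS-300"}
result = score_tool(tool_ids, known_ids)
```

Reporting both numbers separates a tool that flags everything (high recall, low precision) from one that flags very little but is usually right.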
One of the beauties of the competitive approach was speed—the entire hackathon lasted six weeks and one-third of that was for “prep,” Zambas says. Individual ML tools had only four weeks to train on the study data before companies participating in the hackathon had to present their findings.
It was understood that a particular tool’s performance would only improve once Pfizer began supervised learning to throw out any of the bad predictions, he continues.
Moving forward, Pfizer will likely hold similar but broader hackathons with a registration and participation process, Zambas says. This initial hackathon was invitation-only and included “technical and data analytics-type organizations” both large and small.
Different Hammers
Content in the hackathon included data queries, such as correlations between adverse events and concomitant medications, which couldn’t be optimized by simple robotic process automation (RPA), says Zambas.
More than 32 RPA projects were tackled by Pfizer in 2019 following a comprehensive process improvement exercise involving business analysts, business support groups, clinical development teams, the data monitoring and management group and a partner group in programming, says Zambas. During a two-day workshop with each clinical operations function, they took turns attaching Post-it Notes on a room-size diagram of various processes to indicate manual steps that were repeatable or potentially error-prone.
More than 120 automation opportunities were initially identified, he says, which were then prioritized based on feasibility, data availability, and potential risk and mitigation considerations. In effect, a project’s complexity was subtracted from its benefits to come up with a score and a subset of those were deemed suitable for ML, he says.
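The "benefits minus complexity" scoring described above can be sketched in a few lines. This is a simplified illustration, not Pfizer's actual method: the 1-to-10 rating scale, the `Opportunity` class, and the example ratings are all hypothetical, and the feasibility, data availability, and risk considerations are folded into a single complexity number.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    benefit: int     # hypothetical 1-10 rating of expected benefit
    complexity: int  # hypothetical 1-10 rating of effort and risk

    @property
    def score(self) -> int:
        # Complexity is subtracted from benefit, as the article describes.
        return self.benefit - self.complexity


def prioritize(opportunities):
    """Return opportunities ordered from highest to lowest score."""
    return sorted(opportunities, key=lambda o: o.score, reverse=True)


# Hypothetical ratings for three of the project types mentioned in the article.
candidates = [
    Opportunity("graph placement in study reports", benefit=6, complexity=5),
    Opportunity("document completeness check", benefit=7, complexity=2),
    Opportunity("serious adverse event reconciliation", benefit=9, complexity=7),
]
ranked = prioritize(candidates)
```

With these made-up ratings, the simple document check ranks first because its low complexity outweighs the higher raw benefit of the reconciliation project.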
The prevailing philosophy is: “take advantage of what you can automate with basic process automation” in weeks versus months, says Zambas, which will generally resolve 70% to 80% of process bottlenecks. “You can still do the exceptions manually and look forward to the day when you can perhaps incorporate those into an ML solution.”
The problem is, “when you have a hammer everything starts to look like a nail,” he continues. “You need different hammers for different jobs and when ground truth is not there, the data to support the ground truth is not there either… Don’t waste time on [AI] use cases that are doomed to fail.”
Among the simplest RPA projects in the initial batch of 32 at Pfizer was one that checks for submission of required clinical trial documents and sends out notifications of any omissions, Zambas says. The more complex ones involve disparate data sources where the information is in different formats, including serious adverse event reconciliation and automating the placement and formatting of graphs in study reports.
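The simplest project mentioned above, checking for required documents and notifying on omissions, amounts to a set comparison. The following is a minimal sketch under stated assumptions: the document names, site number, and function names are hypothetical, and a real RPA workflow would read from a document management system rather than in-memory lists.

```python
def find_missing_documents(required, submitted):
    """Return the required document types not yet submitted, sorted by name."""
    return sorted(set(required) - set(submitted))


def build_notification(site, missing):
    """Build a notification message, or None when nothing is missing."""
    if not missing:
        return None
    return f"Site {site}: missing required documents: {', '.join(missing)}"


# Hypothetical checklist and submissions for one study site.
required_docs = ["protocol signature page", "investigator CV", "ethics approval letter"]
submitted_docs = ["investigator CV"]
missing = find_missing_documents(required_docs, submitted_docs)
message = build_notification("0042", missing)
```

Because the rule is fixed and the inputs are structured, no model training is needed, which is what makes this kind of task a fit for basic process automation rather than ML.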
The industry view is that opportunities for RPA, with or without AI, lie in one of three buckets, he says:
Process efficiency – data query management, study builder, mining/monitoring visit report, document verification, document translation, voice-assisted lab safety/documentation, classify pathology images, reduce trial dropout rates, risk-based monitoring and submission quality control
Drug/trial design – optimize multi-targeted drug-like molecules using pharmacokinetics, predict how a drug behaves in a body, therapy evidence for cost/outcome, predict/discover novel protein features and characteristics, drug repurposing, trial planning, recruitment prediction, patient-to-trial and trial-to-patient matching, enrollment optimization and trial simulation
Target/biomarker discovery – literature search/curation and competitive research for compounds, predict target/biomarker with omics, design potential new drug combinations for activating proteins, predict cancer progression using tumor DNA in blood samples, insights to mechanism of disease/molecule (using gene, protein and cell-level view)
Benefits For All
A 2019 white paper Zambas co-authored discusses in detail the evolution of clinical data management to clinical data science, which requires a more sophisticated skillset but provides a potentially more rewarding career. Those in the field are “now getting to give their input on study design and the data collection and analytics methodology itself, instead of just getting the data and scrubbing it.”
Clinical data managers are also spending less time measuring “units,” instead focusing on “more significant, outcome-level” deliverables such as whether the study collected required data with as little imposition on patients as possible or successfully delivered data on schedule without any unnecessary rework, he adds. Focusing on units instead of outcomes is like calling surgery a success, even if the patient dies on the table, simply because the doctor followed a prescribed set of steps.
When employing ML, data managers need to ensure the models are fed the right ground truth and to archive that training record—along with the feedback of humans in the loop, says Zambas. The tools, like members of the study team, need to be “qualified to be doing what they’re doing.”
An audit trail is but one type of metadata used to teach ML algorithms, and sponsors may have different levels of it, Zambas says. “All the metadata may not be accessible to the sponsor if studies are outsourced.”
The goals of the data and automation workstream at Pfizer are to eliminate errors and drive speed, says Zambas. The objective is to use data and automation tools to drive faster drug development to benefit patients and society.
Patients as well as Pfizer are better off if the company is making more decisions in real time, Zambas says. Refreshing data every six hours and coupling human and artificial intelligence to identify potential discrepancies, so the database can “really” be locked once the last patient visit and lab analyses are done, can facilitate bringing needed therapies to patients faster.
“Every study you apply this to doesn’t automatically accelerate a submission, but the pivotal ones do,” he adds. Importantly, Pfizer has established internal goals that a certain percentage of clinical studies will leverage more automation and electronic data capture capabilities to accelerate drug development.
Editor’s Note: Pfizer says that “Demetris Zambas is an employee of Pfizer and his views expressed in this article may not represent the views of Pfizer.”