How Pfizer Used Intelligent Machines to Take on Clinical Data
By Allison Proffitt
October 12, 2021 | Data determines AI feasibility, explained Prasanna Rao, Head of AI and Data Science at Pfizer. Speaking at last month’s Bio-IT World Conference & Expo, Rao, formerly of IBM Watson, outlined Pfizer’s recent experience implementing AI tools to aid clinical data management.
“If you have a lot of data points the problems tend to be solved. Machines are very good at looking at patterns in large datasets and making those inferences and predictions,” Rao told the hybrid audience at the event. If you have fewer data points, those are still research challenges.
As impact increases, feasibility decreases, he observed. AI in target discovery, for example, would be highly impactful, but is less feasible. Clinical trial optimization is lower impact, but highly feasible.
The use cases for AI in clinical trial optimization are many, Rao said, walking through the top 25 AI use cases in clinical development as Pfizer sees them. For clinical trial optimization, he listed content generation, risk-based monitoring, document verification automation, automatic QC for submissions, end-to-end metadata lineage, SDTM auto-mappings, and more.
“When it comes to Pfizer’s journey, we started small, and we proved that AI is very effective in the clinical trial optimization use case,” he explained. He highlighted how Pfizer built and trained an AI solution for smart data query.
Smarter Queries, Deeper Data
Traditionally, sites and clinical trial participants have submitted data via EDC, ePatient diaries, and other formats. “The data manager’s job is to review the data through listings and reports, and issue a query to a site: ‘A certain data point is not logically consistent, so please explain.’”
Pfizer wanted to automate that process: predict possible data discrepancies and the reasons for them, then auto-generate query text, while keeping data managers and clinicians in the loop to review the predictions and score them with a thumbs up or thumbs down.
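Rao did not walk through implementation details, but the workflow he describes, in which a model proposes a discrepancy with draft query text and a data manager approves or rejects it, can be sketched as a simple review loop. The names below (SuggestedQuery, ask_data_manager, the confidence cutoff) are illustrative assumptions, not Pfizer’s system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SuggestedQuery:
    """A model-proposed discrepancy with auto-generated query text (illustrative)."""
    subject_id: str
    domain: str                               # e.g. "AE", "LB", "CM"
    discrepancy: str                          # predicted reason the data look inconsistent
    query_text: str                           # draft text to send to the site
    confidence: float
    reviewer_verdict: Optional[bool] = None   # True = thumbs up, False = thumbs down

def ask_data_manager(q: SuggestedQuery) -> bool:
    # Placeholder for the real review UI; auto-approve on confidence here only
    # to keep the sketch self-contained and runnable.
    return q.confidence >= 0.9

def review(queue: list) -> list:
    """Keep humans in the loop: every suggestion is scored before it reaches a site,
    and the verdicts become feedback for the next training cycle."""
    approved = []
    for q in queue:
        q.reviewer_verdict = ask_data_manager(q)
        if q.reviewer_verdict:
            approved.append(q)
    return approved

queries = [
    SuggestedQuery("1001", "AE", "AE term overlaps a known side effect of a concomitant med",
                   "Please confirm whether the reported nausea is related to the concomitant medication.", 0.93),
    SuggestedQuery("1002", "LB", "Lab value inconsistent with dosing record",
                   "Please verify the sample collection date for visit 3.", 0.62),
]
print([q.subject_id for q in review(queries)])   # only the high-confidence suggestion passes
```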
The team focused on queries in six of the most critical domains: adverse events, labs, dosing, concomitant medications, medical history, and eDiaries. Previously, it was up to a human data manager to reconcile findings across these domains. Is something an adverse event, or an expected side effect of a concomitant medication? Does an eDiary entry refer to a new symptom or something already noted in the medical history?
“Cross-panel reconciliation is a challenge, and none of the programmatic techniques are solving this problem,” Rao said. There is far too much data. “We need intelligent machines to look at this, make some sense out of it, and start inferring certain things, which today are done by trained clinicians or data managers.”
Pfizer used BioBERT for named entity recognition to extract concomitant medications and associated possible diagnoses, along with Bio-ELECTRA, which was trained on 30 million PubMed articles, more than 100,000 adverse event terms from Pfizer queries, and 30 historical studies.
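The models themselves were trained on Pfizer’s own queries and studies, but the general pattern, running a BioBERT-style model as a named entity recognizer over free-text clinical records, can be sketched with the Hugging Face transformers pipeline. The checkpoint name and the example sentence below are placeholders, not Pfizer’s model or data.

```python
# Minimal NER sketch using the Hugging Face transformers pipeline. The checkpoint
# name is a hypothetical stand-in for a BioBERT-style model fine-tuned for
# biomedical token classification; it is not the model Pfizer used.
from transformers import pipeline

MODEL_ID = "your-org/biobert-clinical-ner"  # hypothetical fine-tuned checkpoint

ner = pipeline("token-classification", model=MODEL_ID, aggregation_strategy="simple")

text = ("Subject reported nausea and dizziness two days after starting "
        "metformin 500 mg; history of type 2 diabetes.")

for entity in ner(text):
    # Each aggregated entity carries a label (e.g. DRUG, DIAGNOSIS), the matched
    # span, and a confidence score for downstream cross-panel reconciliation.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```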
Through “teach cycles”, Pfizer trained the model with historical data, validated it at a baseline accuracy of 70%, and went through a series of iterations to improve it. Accuracy reached 85-95%, Rao said. The team prioritized keeping the false negative rate (cases where a human would have issued a query but the model made no prediction) extremely low, at less than 1%. That choice kept overall accuracy somewhat lower by allowing more false positives. The team still manually reviews false negatives periodically.
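In practice, that tradeoff comes down to where the decision threshold sits: lowering the confidence required to raise a query catches nearly everything a human would have flagged, at the cost of more false positives and a lower headline accuracy. The figures in the sketch below are synthetic, chosen only to show the direction of the tradeoff, not Pfizer’s results.

```python
# Synthetic illustration of the accuracy vs. false-negative tradeoff; the scores
# and labels are made up for the example and are not Pfizer's data.

def rates(scores, labels, threshold):
    """scores: model confidences; labels: 1 if a human would have issued a query."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    fp = sum(p and (not y) for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    accuracy = (tp + tn) / len(labels)
    false_negative_rate = fn / max(1, tp + fn)
    return accuracy, false_negative_rate

scores = [0.95, 0.90, 0.85, 0.45, 0.30, 0.25, 0.50, 0.55, 0.40, 0.15]
labels = [1,    1,    1,    1,    0,    0,    0,    0,    0,    0]

for threshold in (0.8, 0.4):
    acc, fnr = rates(scores, labels, threshold)
    # The strict threshold scores higher accuracy but misses a real query;
    # the looser one misses nothing at the price of extra false positives.
    print(f"threshold={threshold}: accuracy={acc:.0%}, false-negative rate={fnr:.0%}")
```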
“When we apply these types of techniques to our use case, we’re finding an extreme order of success for operationalizing some of these very tricky problems in clinical data.” He reported that Pfizer cut the time from data capture to query generation by almost 50% without compromising data integrity.
Keys to Success
Implementing a robust AI solution will include an assessment phase, a design and development phase, and an operational phase, but Rao recommends starting with ROI. “First define what is your end-stage goal. Are you trying to improve your operational efficiency? Are you trying to reduce your cycle time? Or both?”
Once that is clear, Rao advised, then you plan out your data discovery and machine training.
Let the users define their own user experience, he said. It will only improve adoption. “Ultimately embedding AI in operations is extremely useful for scaling, compliance, as well as gaining confidence and trust with the users.”
Starting with a small, well-defined use case is key to delivering incremental value to users, Rao said. His team began with Smart Data Query, but has since expanded to AI offerings in medical coding, protocol deviation, and other use cases.
He warned against making solutions study- or indication-specific. His team began this project for vaccine development but has been able to transfer it to oncology and other indications.
Finally, he encouraged the audience to educate stakeholders on accuracy versus value. “Every stakeholder wants 100% accuracy. That’s where the conversation needs to develop about the value you will get even if your system is not 100% accurate. Why is that? You’re trying to balance the human effort in correcting the predictions and providing feedback to the machine so you don’t have any false negatives. It’s OK for accuracy to be 85% or 97%. You can still get a lot of value even if the system isn’t 100% accurate.”