Insilico Medicine’s AI-Driven Platform Pushes The Envelope Of Drug Discovery

By Deborah Borfitz 

December 15, 2022 | During a recent virtual unveiling party livestreamed from around the world, Insilico Medicine introduced the latest updates to Pharma.AI, its end-to-end drug discovery platform using generative artificial intelligence to discover new targets, design new molecules, and predict the outcome of clinical trials. Among the highlights were the addition of a knowledge graph to target discovery product PandaOmics and numerous new panels to generative chemistry engine Chemistry42, as well as specifics on the predictive capabilities of the platform’s newest inClinico tool.  

Up first, from the company’s headquarters in Hong Kong, was CEO Alex Zhavoronkov, Ph.D., pointing out Insilico Medicine’s global reach with sites in Taipei, Shanghai, Abu Dhabi, Montreal, New York, and San Francisco. The most important feature of Pharma.AI, he says, is that it is validated. To date, 31 internally developed therapeutic programs have been derived from the platform—in addition to a “very large number” of others being run externally by partnering pharmaceutical companies and academic institutions. 

Notably, Insilico Medicine is one of a handful of AI-based companies anywhere to make its platform commercially available as a service delivered over the internet or by installing the software on premises, Zhavoronkov says. The company was effectively born at NVIDIA GTC, a global AI conference, in 2014 with a focus on deep learning for drug discovery. 

In 2021, after 11 years at GSK and another three with contract research organization Medicilon, Feng Ren, Ph.D., joined Insilico Medicine as chief scientific officer and head of research and development and quickly transitioned into his role as the company's co-CEO. Together, Zhavoronkov says, they began “performing miracles” with the Pharma.AI platform.  

NVIDIA founder Jensen Huang made a brief appearance at the event, pointing to the significant role of the microprocessor in accelerating computing and the application of artificial intelligence. “Nowhere is the impact of AI going to be more profound than digital biology and healthcare,” he says, where its ability to learn complex patterns and relationships has given researchers a way to “decipher the meaning of DNA, proteins and chemicals, and explore the vast universe of drug discovery at lightning speed.” 

Large language models that have learned the language of chemistry and biology are at the heart of this revolution, says Huang, and Insilico Medicine is in the lead position. “Insilico identified a potentially new cellular target and treatment for idiopathic pulmonary fibrosis, a chronic lung disease which results in irreversible decline in lung function. They conjured up a novel drug in less than 18 months for pennies on the dollar compared to traditional drug discovery methods.” 

Alex Aliper, Ph.D., co-founder and head of AI platforms, is one of the few people at Insilico Medicine to stay on after the company’s decision in 2015 to hedge its bets on generative chemistry, says Zhavoronkov. He has also played a pivotal role in the development of inClinico, “the oldest and most important project” of the company. The project initially tried to predict phase 2 to phase 3 transitions in the pipelines of big pharmaceutical companies but is now finding an audience with hedge funds and banks supporting the development efforts of small and medium-sized biotechs.

Development and continued refinement of the Pharma.AI platform is being driven by the expertise and guidance of a global team of biologists, chemists, developers, and AI experts, according to Aliper. Since the launch of PandaOmics just over two years ago, the company has made multiple advancements to the platform. Among the many applications of Pharma.AI was a recent breakthrough study (Frontiers in Aging Neuroscience, DOI: 10.3389/fnagi.2022.914017) showcasing how PandaOmics identified eight novel targets against amyotrophic lateral sclerosis (ALS).

PandaOmics 3.0 Fleming 

Insilico Medicine believes rigorous testing and feedback in real-world settings using real-world data provide the strongest possible validation of any major piece of software, according to Petrina Kamya, Ph.D., head of AI platforms, in explaining the company’s rationale for making Pharma.AI available to all. “We are democratizing AI for drug discovery.”

PandaOmics, which incorporates an interactive artificial intelligence encyclopedia, is designed for building action-driven hypotheses regarding which targets and corresponding pathways are most implicated in a disease, she says. The leveraged data sources include multiomics data, clinical trial data, and publications.

The tool has been used to identify targets for some of the most challenging diseases, including cancer as well as fibrosis and ALS, says Daniil Polykovskiy, Ph.D., director of IT. The newly launched PandaOmics 3.0 Fleming introduces a transformer-based knowledge graph along with cross-dataset comparison and data harmonization features, he announces.

The knowledge graph—a network linking genes, diseases, chemical compounds, and biological processes automatically generated by a deep learning model that reads scientific publications—is among the most appealing features of the release, Polykovskiy says. The relationship between each pair of connecting nodes is indicated in a legend by arrows or line segments. 
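For readers who think in code, the structure described above can be pictured as a list of typed edges, some directed (arrows) and some undirected (line segments). The following is only a minimal sketch under that assumption. The node names, relation labels, and helper functions are hypothetical placeholders, and nothing here reflects how PandaOmics actually stores or builds its graph.

```python
from collections import defaultdict

# Hypothetical edges: (source, relation, target, directed).
# Names and relations are illustrative placeholders, not PandaOmics output.
EDGES = [
    ("GENE_A", "associated_with", "DISEASE_X", False),  # line segment
    ("COMPOUND_1", "inhibits", "GENE_A", True),         # arrow
    ("GENE_A", "participates_in", "PROCESS_P", True),   # arrow
]


def build_graph(edges):
    """Index edges by node; undirected edges are stored in both directions."""
    graph = defaultdict(list)
    for source, relation, target, directed in edges:
        graph[source].append((relation, target))
        if not directed:
            graph[target].append((relation, source))
    return graph


def neighbors(graph, node):
    """Return the (relation, node) pairs reachable from `node`, sorted."""
    return sorted(graph.get(node, []))


graph = build_graph(EDGES)
print(neighbors(graph, "GENE_A"))
print(neighbors(graph, "DISEASE_X"))
```

Storing undirected relations in both directions keeps lookups simple, while directed relations such as "inhibits" remain visible only from their source node.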

With the cross-dataset comparison feature, PandaOmics can now simultaneously analyze several datasets to enhance the statistical power of the platform, adds Kamya. The data harmonization function allows users to merge multiple datasets, and the group list of different datasets can be expanded so users can select their sample groups of interest and assign case or control groups accordingly.
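The presentation did not say which statistical method sits behind the cross-dataset comparison. As a rough illustration of why pooling evidence from several datasets can raise statistical power, the sketch below uses Fisher's method, a classical way to combine p-values from independent studies. It is swapped in here purely for illustration and is not claimed to be what PandaOmics does. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    For even degrees of freedom the survival function has a closed form:
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    term = 1.0   # running value of (X/2)^i / i!
    total = 1.0  # partial sum, starting with the i = 0 term
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two datasets that are each only borderline on their own
print(fisher_combined_p([0.05, 0.05]))
```

Two independent datasets that each give p = 0.05 for the same gene yield a combined p-value below either individual one, which is the intuition behind analyzing datasets together rather than one at a time.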

More than 60 scoring philosophies have been integrated into the PandaOmics engine, says Zhavoronkov, including ones tied to genetic evidence, genomic expression profiles, metabolomics, protein-protein interaction networks, and specified tissues of interest. “Basically, we made PandaOmics suitable to tackle any problem ... [and] identify a trackable hypothesis.” 

Therapeutic area leads initially need to know the scoring mechanism is tied to the biological evidence they’re used to relying on, to feel comfortable with the validity of the targets coming out of the platform, explains Polykovskiy.  Over time, they tend to become more willing to experiment with additional AI scores “to increase the level of novelty while still retaining this [traditional] link.” 

Sometimes, commercial tractability of a new drug is a key consideration, Polykovskiy says. “We try to make PandaOmics into an MMA [mixed martial arts] fighter for any situation.” Soon, PandaOmics could start doing automatic target selection in real time, based on the disease profile of a sample, he adds.

Chemistry42 2.0 LeCun 

The latest release of Chemistry42, 2.0 LeCun, is packed with many features requested by drug hunters to overcome more challenging real-world problems when designing therapeutics, says Polykovskiy. It also has improved ease of use.

Chemistry42 is designed for exploring uncharted chemical space to enable the rapid discovery of novel, lead-like structures, he says. It includes annotation tools to guide the selection of molecular structures, as well as a module for predicting how easily they can be synthesized.

The product’s pocket-ligand interaction score, based on information about different types of interactions, now incorporates fine-tuned parameters for different pocket types, Kamya says. Chemistry42 also includes a predictive model for seven ADME (absorption, distribution, metabolism, and excretion) properties that are important in later stages of drug discovery, and the list will grow with each upcoming release. 

Another new feature with 2.0 LeCun, Polykovskiy adds, is a sketcher tool allowing users to modify generated structures that can then be scored and included in the dataset. The latest release also introduces a Golden Cubes module for ligand-based scoring of the kinase selectivity of small molecules to minimize unintended, off-target effects that can lead to severe side effects. “This update is the largest update of the platform so far, and we have an aggressive program [of planned improvements] up ahead.” 

As Zhavoronkov reports during the post-presentation Q&A, Chemistry42’s capabilities are not limited to tracking protein-protein interactions. The software can also simulate some of those interactions, which play a key role in predicting the protein function of targets and druggability of molecules.

Chemistry42 can be used for novel short peptides as well as small molecule therapeutics, Kamya later adds. 

Predicting Trial Success 

The core utility of inClinico 1.0, the latest addition to the Pharma.AI suite, is to predict the probability that a clinical trial will succeed as it transitions from one stage to the next, says Kamya. Perhaps more importantly, the framework pins down the properties behind a prediction of failure or success using data on investigator and pharma trials.

Many data sources are used to make predictions, says Polykovskiy. These include information from study sites as well as analyses of a drug’s mechanism of action based on proprietary and nonproprietary biomedical knowledge, a drug’s ADME tox profile, the classification and signaling pathway of the disease, and study eligibility criteria. Deep graph neural networks are among the machine learning models employed. 

Key features of inClinico, Kamya says, are “a comprehensive clinical trial database to scout for clinical catalysts that drive growth in pharmaceutical companies ... [and] the opportunity to research clinical trials under a microscope.” A proprietary biomedical knowledge graph was utilized to build in a model for estimating the relevance of a drug’s mechanism of action for a given disease. 

Importantly, inClinico also models the critical part of patients’ eligibility criteria relating to a trial’s success, says Polykovskiy. Users can know the impact of adding or removing different eligibility criteria on clinical trial outcome scores. “This is just the first step in our journey into clinical trials and we can’t wait to see how users will uncover the knowledge and data and reports that we provide.” 

Insilico Medicine started quietly putting its date-stamped trial outcome predictions on preprint server medRxiv in 2016, Zhavoronkov says, which should ultimately provide support for the effectiveness of the inClinico tool. Predictions have been made on trials in the Novartis and Roche pipelines as well as those of a few smaller, unrelated biotechs. The company has worked extensively on retrospective validation as well as “quasi-prospective” validation using data that mimics the prospective, time-stamped data.

‘Virtual Portfolios’ 

Additionally, Insilico Medicine has some “virtual portfolios” where it is doing simulated trading for small and medium biotechs involving fixed predictions, he adds. “Our actual first customers for inClinico are hedge funds and banks who are looking to take certain positions in those companies, and we can now intelligently talk to them about the returns they can get ... we have made those predictions and demonstrated prospectively that the algorithm works ... for a certain percentage of the clinical trials.” 

To be predictable, a trial needs to be on a single, small molecule agent that is a targeted therapeutic, he elaborates. “If it is not targeted, we won’t be able to significantly augment the probability of success by making this prediction.”  

The prospective validation work is happening in partnership with hedge funds as well as big pharma companies, adds Aliper, where inClinico has demonstrated 80% predictive accuracy. “In most therapeutic areas we are outperforming ... the baseline benchmarks available for these sorts of predictions.” 

For the purely prospective validations, between 100 and 300 clinical trials are typically being used, he says. The overall database contains roughly 180,000 trials of targeted small molecules.

As Aliper points out during the Q&A, inClinico incorporates profile data on different clinical sites and statistically evaluates how productive or efficient they are in running trials in specific therapeutic areas. The tool also indirectly evaluates historical data points coming from sites in different geographies, to account for differing placebo rates seen with treatments for specific central nervous system disorders.  

For clinical oncology, inClinico can use predictive values based on the preclinical toxicology and ADME of a small molecule and, if disclosed by the pharma company, the molecule’s structure, says Aliper. The final prediction can include an evaluation of ongoing phase 2 trials and programs in the broader clinical portfolio that may have a higher potential of covering more indications. 

Role For Deepfakes 

Startup investor Michael Antonov, one of the initial financers of Insilico Medicine, spoke briefly about a few of the technologies being integrated into the workings of the Pharma.AI platform. These include the Nanome platform allowing researchers to work together on molecule development in the same virtual reality space. 

“Insilico generates molecules in much the same way that we can get today with deepfakes using AI,” Antonov says. But instead of using the technology to create artwork or lifelike images of virtual human beings—or fake news and hoaxes—it is being deployed for the design of new molecules that could potentially treat disease. 

Polykovskiy and Steve McCloskey, CEO of Nanome, next provided a first-hand look at the company’s platform by putting on their virtual reality headsets to observe and manipulate molecules designed using Chemistry42 and AlphaFold, an AI program for predicting protein structures.

Any protein structure prediction tool could be used with the Pharma.AI platform, says Zhavoronkov, but “ideally they should be using the experimentally derived crystal.” As reported in a study on medRxiv, AlphaFold has been used to generate molecules based on a target picked by the PandaOmics engine and “it might be the first AlphaFold-derived molecule that went to drug-like status in a matter of a few weeks.” 

Documenting The Science 

Few people anywhere in the world understand drug discovery and development at even a rudimentary level, points out Zhavoronkov. He has therefore created the first-ever “docuthon” inviting filmmakers to submit a documentary on the costly, complex target-to-clinic journey that has a notoriously low probability of success.   

Since the launch of its first antifibrotic program, Insilico Medicine has generated over 130 hours of high-quality film footage shot in locations around the world with different key opinion leaders and drug discovery experts, he says. That footage is being offered to university students working in filmmaking as well as professional documentarians to transform into long- and short-form stories and compete for prizes. 

Results will be determined by a panel of experts from the science and documentary worlds and announced in May 2023. Zhavoronkov says he is hopeful pharmaceutical companies will come up with prizes and perhaps contribute some content.

“One of the main challenges in drug discovery is the fundamental complexity of human biology,” says Zhavoronkov. It is still not understood how most drugs currently on the market work and why they cause certain side effects.  

“I think the integration of AI-powered robotics with personalized medicine and drug discovery ... is the future,” Zhavoronkov says. “We need to get as close to the clinic as possible ... [and] work with real patients.” 

Technology in general is a “huge market of waste,” he says. “We very often think about recycling plastic ... but we are not thinking about recycling biomedical assets ... [and consequently] we are going to see a lot of companies fail and go out of business.” The solution, he believes, is to have more players in the biology and chemistry field working collaboratively on answering complex questions and solving diseases.