AI-Generated Treatment Planning Algorithm Meets Real-World Medicine

By Deborah Borfitz

June 24, 2021 | One of the world’s largest cancer centers is believed to be the first to deploy artificial intelligence (AI) in a therapeutic capacity on real-world patients—specifically, for curative-intent radiation therapy treatment planning for prostate cancer. The methodology involves training a machine learning algorithm on previously treated patients to generate radiation treatments for new patients, and “would apply to any type of cancer provided there is [sufficient] data,” according to Tom Purdie, a medical physicist at Princess Margaret Cancer Centre and associate professor in the department of radiation oncology at the University of Toronto.

For close to a decade now, Purdie has been collaborating with Princess Margaret machine learning specialist Chris McIntosh to build the technology from the ground up and methodically steer it toward clinical adoption. This recently culminated with publication of a study in Nature Medicine (DOI: 10.1038/s41591-021-01359-w) detailing what happened when AI-generated radiation treatment plans were put in the hands of physicians making medical decisions.

In a direct comparison with conventional radiation treatments generated by humans, radiation treatments generated by the machine learning algorithm were deemed clinically acceptable in nine out of 10 cases, and in 72% of the cases AI treatments were judged to be the better course of action, says Purdie. Treatment planning using AI was also 60% faster than the human-driven process.

One phase of the study was a simulation exercise based on 50 patients who had already received treatment, he continues. When given the same choice between AI- and human-generated treatments for a second group of 50 patients with similar demographic and disease characteristics who were about to begin radiation therapy, physicians chose the plans created by the machine learning algorithm significantly less often (61% versus 83%).

Up until now, Purdie notes, most of the published work on machine learning methods intended for use in clinical practice has only been tested in a retrospective setting, often under idealized conditions in which the results represent an upper bound of what machine learning can achieve. The reality of the situation, as revealed by the study, is that humans choose results produced by machine learning less often in a prospective than in a retrospective setting “where nothing is on the line.”

 

Adoption Barriers

Overall, computer-recommended prostate cancer treatments were different but of equivalent quality to those of their human counterparts, says Purdie. “It seems like when the AI treatment and human treatment were of comparable quality, [physicians] reverted to choosing the treatments that were much more similar to the ones [they had used] in the past.”

Algorithms tested under “perfect conditions” suggest only the upper bounds of what those models can do, “so we should be expecting a falloff when we actually throw [them] to people to use,” says Purdie. It is a truism that applies to machine learning deployment in general, not just to models used to generate radiation treatment plans.

Acceptance in clinical settings depends on integrating AI into familiar work processes rather than generating friction and being disruptive. “Even if AI produces better results, it might not be adopted if it is complicated to get access to those results.”

Major implementation barriers in the past have included running machine learning models on research software that only a computer science expert knows how to code, or building interfaces to existing systems that are not available to everyone at the hospital. At Princess Margaret, AI-generated radiation treatment plans have been fully integrated into the RaySearch Laboratories treatment planning system that physicians were already accustomed to using, he says.

Utilization of AI-produced treatment plans is the rule rather than the exception because doctors do not have to change the way they practice, Purdie continues. “We didn’t manufacture a new metric for physicians to evaluate—we’re giving them the results they are used to seeing.”

Only in a few instances has AI ever been used to produce treatment plans for patients prior to treatment, let alone deployed in a real-world therapeutic setting, he points out. “[Our pragmatic study] was the most comprehensive evaluation of machine learning for prospective use that we are aware of.”

In addition to Purdie, other key players on the research team are McIntosh, who wrote the code for the AI algorithm, as well as radiation oncologist Alejandro Berlin, M.D., and medical physicist Leigh Conroy, both of whom helped advocate for making it standard treatment practice at the facility. While it took about six months to build the novel software, clinical validation and getting everyone on board and comfortable with the process took over two years.

 

Standardizing Care

For the comparison study, oncologists were not aware of which radiation treatments were designed by a human and which by machine, Purdie says. The algorithm was trained on a high-quality, peer-reviewed database of radiation therapy plans from 99 patients previously treated for prostate cancer at Princess Margaret. For each new patient, the model automatically identified the most similar patients in the database, based on thousands of features extracted from CT images (including the typical delineation of targets and surrounding healthy organs), to infer the best treatment.
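To make that general approach concrete, here is a minimal sketch in Python of similarity-based plan inference: prior patients are summarized as image-derived feature vectors, the closest matches to a new patient are retrieved, and their stored plan parameters are blended into a starting template. The feature dimensions, the nearest-neighbor retrieval, and the weighted averaging are illustrative assumptions, not the published Princess Margaret pipeline.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative only: each prior patient is summarized by a feature vector
# derived from CT contours (target and organ-at-risk geometry); the real
# model uses thousands of learned image features.
prior_features = np.random.rand(99, 256)       # 99 previously treated patients
prior_plan_params = np.random.rand(99, 32)     # stored plan parameters per patient

# Build a nearest-neighbor index over the prior-patient feature space.
index = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(prior_features)

def infer_plan(new_patient_features: np.ndarray) -> np.ndarray:
    """Retrieve the most similar prior patients and blend their plan
    parameters into a template for the new patient (a stand-in for the
    study's actual inference step)."""
    distances, neighbor_ids = index.kneighbors(new_patient_features.reshape(1, -1))
    weights = 1.0 / (distances[0] + 1e-6)      # closer patients count more
    weights /= weights.sum()
    return weights @ prior_plan_params[neighbor_ids[0]]

# Example: generate a template plan for an incoming patient.
template = infer_plan(np.random.rand(256))
print(template.shape)  # (32,)
```

In practice, the retrieved template would still be reviewed and refined within the treatment planning system before any clinical use.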

The algorithm is specific to the most common type of prostate cancer, where treatment is expected to eradicate the tumor, says Purdie, although the methodology is broadly applicable to other treatment sites, including the brain, breast, lungs, and rectum, and to goals such as lowering patients’ risk of developing cardiotoxicity. None of these other uses has yet reached the clinical deployment stage, but the enabling technology has been licensed by fast-growing RaySearch Laboratories to multiple research groups worldwide, primarily in Europe.

Purdie says he expects the fully validated prostate cancer model, in use at the Princess Margaret Cancer Centre for over a year now, to enjoy widespread adoption over the next couple of years. Berlin, the medical doctor on the research team, was “blown away” by the algorithm’s performance.

The ubiquitous 80/20 rule regarding how patients get treated, where roughly 80% of cases fit nicely into predefined guidelines and protocols, is likely to still apply, he adds. But the fact that the algorithm does a great job with “zero effort on the part of the human” makes the process more efficient in terms of time and effort while reducing variability of care, which is important for improved patient safety and outcomes as well as lower healthcare costs.

Getting to 80% with machine learning is not a slam dunk under the best of circumstances, Purdie says. Physician resistance might be encountered due to a host of factors such as how the information is presented, the size and culture of the practice site, and patient risk levels.

Adoption can be streamlined by choosing an algorithm physicians can understand, trust, and correlate to more familiar scoring methodologies, says Purdie. Just as crucial as robust evidence that the algorithm works is a convenient, hassle-free way to access its output in settings where seconds matter.