The Importance of Early Phase Trials in Precision Oncology
Contributed Commentary by Elisabeth Coart, PhD, and Everardo Saad, MD
March 25, 2021 | Drug development in oncology is now largely centered on the basic premise of precision medicine, according to which targeted agents are used based on the expression of predictive biomarkers. Even though late-phase trials in precision oncology are undergoing transformations, we believe that the greatest impact of precision oncology has been on the design of early-phase trials. By this we mean not only the basket, umbrella, and platform designs used in phase 2, but also phase 1 trials, especially those with expansion cohorts. One evident trend is a focus on treatment efficacy earlier in development than in the past. In addition, increasing competition and demands for return on investment often put smaller companies under extreme pressure to generate evidence of their products’ efficacy early in clinical development. Our goal when assisting these companies is to balance speed and reliability in the design and implementation of early-phase trials.
Setting the Stage
A situation typically faced by smaller companies developing new products is the need to decide between conducting a non-randomized, and thus probably smaller, trial or trading the efficiency of that design for the reliability of a randomized trial, which is usually larger and more costly. When thinking about randomized trials, it is common to expect a large increase in sample size. This is true when the goal of randomization is to formally compare results between the new drug and the control. However, if the control group is used only for “calibration”, as explained below, the increase in sample size may be more easily affordable. We therefore tend to favor randomized designs, assuming an appropriate control arm can be chosen. Even when the therapeutic landscape is changing quickly in a given indication, a compromise can be found by using “treatment of physician’s choice” as the control or by amending the protocol as the standard of care evolves.
Randomization in Expansion Cohorts
Several authors consider randomization between different dose levels a desirable feature of expansion cohorts, a view also expressed in the FDA guidance on this subject. Although we agree, especially when the dose-escalation phase has not identified the recommended phase 2 dose, we go one step further and advocate randomization against a true control treatment. Only with a control group, whose results can be contrasted with those from the literature, is it possible to assess the amount of selection bias that plagues the results of single-arm trials; this is what we mean by “calibration”. A concurrent control group is much better than a historical comparison, because historical data are often unreliable and may lead to wrong conclusions.
We acknowledge that phase 1 trials in which patients are randomized between a promising new drug and standard of care may meet with resistance, both on ethical grounds and on the basis of feasibility and cost. We believe randomization can seldom be considered unethical if one acknowledges the uncertainty inherent in using a new agent and the large individual variability in response and treatment benefit in oncology. The ethical issue can also be addressed, at least in part, by using a design that allows cross-over to the experimental treatment if it proves active. Arguments related to feasibility and cost are also often put forward to justify non-randomized early-phase trials. Feasibility must be taken seriously, because poor accrual may undermine the whole enterprise; but when it is not an issue, we believe the potential increase in cost is compensated by the increased reliability of the results.
A Practical Example
We often advocate unequal randomization in early trials (for example, a 2:1 ratio favoring the new drug). This maximizes the number of patients treated with the new drug, about which more information is needed. We surmise that, in the extreme case in which a sponsor must decide between a single-arm and a randomized trial of the same size, the randomized trial is superior. Heuristically, we would say it is better to have a trial with 20 patients randomized to the new drug and 10 to standard of care than a single-arm trial with 30 patients treated with the new drug. In fairness, the decision is seldom of this type, but rather, for example, between around 25 patients in a single-arm trial and 35 to 40 in a randomized, non-comparative trial. The gain in precision from the single-arm trial does not warrant the loss in reliability.
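To put a rough number on that precision trade-off, consider the sketch below (a hypothetical illustration using SciPy; the observed response counts are assumptions for the example, not data from any trial). It computes exact Clopper-Pearson 80% confidence intervals for an observed ORR of about 50% with 20 versus 30 patients treated with the new drug:

from scipy.stats import beta

def clopper_pearson(k, n, level=0.80):
    # Exact (Clopper-Pearson) two-sided confidence interval for a proportion
    alpha = 1 - level
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# 80% CI around an observed ORR of roughly 50% with 20 vs 30 patients
for n in (20, 30):
    k = n // 2
    lo, hi = clopper_pearson(k, n)
    print(f"n={n}: ORR {k}/{n}, 80% CI {lo:.1%} to {hi:.1%} (width {hi - lo:.1%})")

Under these assumptions, moving from 20 to 30 patients narrows the 80% interval by only about six percentage points, a modest gain to set against the loss of a concurrent control.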
Before illustrating our contention, a statistical explanation is needed. Single-arm trials are designed with the null hypothesis that the efficacy of the drug (say, its objective response rate, ORR) is at or below a certain historical value implying that the new drug does not warrant further investigation in that setting. The alternative hypothesis is that the efficacy is at or above a chosen threshold that warrants further investigation. Therefore, and this is a crucial point, rejection of the null hypothesis means that the efficacy is above the historical value, but not necessarily at or above the threshold specified by the alternative hypothesis. Since the oncology literature is characterized by great variability in results from different studies in the same population, relying on historical data for development decisions can be extremely dangerous.
Suppose that a sponsor wants to conduct an expansion cohort in an indication for which the historical ORR is 40%, considering 60% as an ORR worthy of further studies. A common solution is to use a two-stage design assuming a one-sided type-I error of 10% and power of 80%, which leads to sample sizes of 12 evaluable patients in the first stage and 16 in the second (28 in total). If the interim analysis shows fewer than five responses, the study can be stopped for futility; otherwise, it continues. If the final analysis shows fewer than 15 responses in the 28 patients, the treatment is declared inactive. If 15 or more responses are observed, the treatment is considered active. However, there is large uncertainty about the true ORR for the treatment. If, for example, exactly 15 responses are seen, the lower limit of the two-sided 80% confidence interval for the observed ORR is 40.1%, which would not conform to the alternative hypothesis used in the design.
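These operating characteristics are easy to verify numerically. The sketch below (a hypothetical illustration using SciPy, with the boundaries described above) computes the probability of declaring the treatment active under the null and alternative ORRs, as well as the exact lower confidence limit for 15 responses in 28 patients:

from scipy.stats import binom, beta

p0, p1 = 0.40, 0.60            # historical ORR and ORR worthy of further study
n1, n_total = 12, 28           # stage-1 and total sample sizes
r1, r = 4, 14                  # continue if stage-1 responses > r1; active if total > r

def prob_declare_active(p):
    # Probability of passing the futility look and then exceeding r responses overall
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # binom.sf(k, n, p) = P(X > k), so this term is P(stage-2 responses >= r + 1 - x1)
        total += binom.pmf(x1, n1, p) * binom.sf(r - x1, n_total - n1, p)
    return total

print(f"Type I error at ORR {p0:.0%}: {prob_declare_active(p0):.3f}")  # close to 0.10
print(f"Power at ORR {p1:.0%}: {prob_declare_active(p1):.3f}")         # close to 0.80

# Exact (Clopper-Pearson) lower limit of the two-sided 80% CI for 15/28 responses
print(f"Lower 80% CI limit for 15/28 responses: {beta.ppf(0.10, 15, 14):.1%}")  # ~40.1%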
A conventional alternative to a single-arm design would be a randomized design aiming to compare the two treatments. Suppose now that the sponsor wants to conduct a randomized trial in which a control treatment with a historical ORR of 40% will be compared with the new treatment, and that the ORR of interest is again 60%. Assuming the same statistical parameters, the required sample size would be 116 patients. If an interim analysis is planned to allow for stopping for futility, an even larger sample size would be required. Few companies can afford this sample size as part of the early development of their compounds.
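For readers who wish to reproduce the order of magnitude, the sketch below (a back-of-the-envelope calculation using statsmodels, not the exact method behind the figure above) runs a standard two-proportion power calculation with these parameters. The normal approximation used here gives a slightly smaller total than 116; exact binomial methods and continuity corrections add a few patients:

from statsmodels.stats.power import zt_ind_solve_power
from statsmodels.stats.proportion import proportion_effectsize

# Standardized effect size (Cohen's h) for detecting 60% vs 40% ORR
h = proportion_effectsize(0.60, 0.40)

# Solve for the per-arm sample size of a one-sided two-sample test
n_per_arm = zt_ind_solve_power(effect_size=h, alpha=0.10, power=0.80,
                               alternative='larger')
print(f"~{n_per_arm:.0f} patients per arm, ~{2 * n_per_arm:.0f} in total")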
In a situation like this, we often propose an intermediate solution between a single-arm and a comparative design: a non-comparative, randomized design. Suppose again that patients will be randomized between a control treatment with a historical ORR of 40% and the new treatment, with an ORR of interest of 60%. Remember, there is no plan to formally compare the two arms; the control is used simply for “calibration”. As a result, the sample size for the trial is determined not by the expected difference between arms, but simply by the expected results for the new drug. Such a trial can be designed, using the same statistical parameters as before, with a total sample size of 42 patients (28 in the experimental arm and 14 in the control arm). This allows for an interim analysis as in the single-arm trial, with the same futility stopping rule. There is thus a 50% increase in sample size compared with the single-arm design, but the trade-off is a gain in reliability, because the control arm can inform further decisions about the development program.
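As an illustration of what “calibration” can look like in practice (a hypothetical sketch: the control-arm response counts below are invented for the example), one can check whether the confidence interval for the observed control-arm ORR is compatible with the historical 40%:

from scipy.stats import beta

# Two hypothetical read-outs of the 14-patient control arm. If the exact 80% CI
# for the observed control ORR excludes the historical 40%, the historical
# benchmark, and hence any single-arm-style interpretation, becomes suspect.
n = 14
for k in (5, 9):                          # 5/14 (~36%) and 9/14 (~64%) responses
    lo = beta.ppf(0.10, k, n - k + 1)     # Clopper-Pearson lower limit
    hi = beta.ppf(0.90, k + 1, n - k)     # Clopper-Pearson upper limit
    verdict = "consistent" if lo <= 0.40 <= hi else "discordant"
    print(f"Control ORR {k}/{n}: 80% CI {lo:.0%} to {hi:.0%}, {verdict} with historical 40%")

If the control arm performs markedly better than history, as in the second scenario, the trial has likely enrolled a favorably selected population, and promising results in the experimental arm should be discounted accordingly.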
The usually small sample sizes of single-arm expansion cohorts come at the cost of large uncertainty, which can be mitigated by a concurrent control arm used to calibrate historical results. Randomized, non-comparative trials are an interesting intermediate solution between single-arm expansion cohorts and comparative studies. We believe that, in the long run, randomized designs are more likely than single-arm designs to prove efficient.
Elisabeth (Els) Coart, PhD, is Director of Consulting Services at IDDI. She is IDDI’s expert in analytical and clinical validation of IVDs. She has a strong background in assay development, combined with 10 years’ experience as a statistician for the biotech and diagnostic industries. She has a longstanding interest in Alzheimer’s disease (AD) biomarkers and presents IDDI’s work in this field at AD symposia (AAIC, AD/PD).
Everardo Saad, MD, is the medical director of IDDI and a member of its team of consultants. He is a medical oncologist who trained at the University of Texas M.D. Anderson Cancer Center and developed a special interest in clinical-trial methodology. After several years of clinical practice, he shifted his career toward consultancy in clinical research. He has over 15 years of experience in the design and analysis of clinical trials for pharmaceutical/biotech companies and academic groups. He has published most extensively in the area of efficacy endpoints in oncology. He can be reached at everardo.saad@iddi.com.