When Placebo Response Isn't Placebo Response: The Perils of Human Raters

Contributed Commentary by Jack Modell 

December 5, 2014 | During my 20 years in clinical practice as a psychiatrist and medical researcher, I noticed early on that my patients with major mental illnesses, particularly severe depression and psychotic disorders, did not generally get better simply because I was nice to them, treated them with ineffective medications, saw them weekly for office visits, or provided other so-called supportive interventions that did not directly address the underlying illness. This is not to say that kindness and supportive therapy are not critical to the patient-physician relationship, but rather that kindness and support alone rarely make a biologically based illness substantially improve or disappear.

With this background, I vividly recall my surprise when, upon joining the pharmaceutical industry, I was asked to figure out how to decrease the nearly 50 percent placebo-response rate seen in antidepressant trials for major depressive disorder. I had a hard time believing that 50 percent of patients in a true major depressive episode get better on placebo, or just by seeing the doctor every couple of weeks, but voluminous data supported the figure.

A review of the data and published explanations for high “placebo response rates” in psychiatric studies points to three particularly important factors, each of which can be readily mitigated by proper trial design and training.

1) Subjects admitted into clinical trials often have psychiatric symptoms, but do not meet criteria for the disorder being studied. Examples include subjects with personality disorders whose symptoms partly mimic the disorder being studied, subjects with symptoms in response to a particular stressor (not of sufficient severity or duration to meet formal criteria for a major psychiatric disorder and likely to abate over time), and subjects who may feign or exaggerate symptoms to gain attention or access to a clinical trial. Unlike the very sick patients I encountered in clinical practice, subjects with these presentations often improve with only supportive interventions.

Recruitment of severely ill subjects is difficult because of the widespread availability of effective medications. Patients in a major depressive episode, for example, are reluctant to commit to the effort, uncertainties, and treatment delays involved in a clinical trial when a prescription for an effective antidepressant can be filled quickly.

2) Investigators often inflate or deflate ratings to enable subjects to enroll in, or remain enrolled in, a clinical trial. This can be done by coaching subjects on their answers or, when subject responses fall between scale severity ratings, by rounding to a rating that is more likely to qualify the subject for the trial.

At the first follow-up visit, when there is less incentive to influence rating scores, scores across the enrolled population return toward their true values. Moreover, subjects and investigators, expecting that the onset of treatment should coincide with some clinical improvement, may bias rating scores to reflect this expectation even though the symptoms of the illness may not show much true change.

While this early “improvement” in rating scores for subjects in clinical trials may appear to be a placebo effect, it is largely driven by artificially inflated scale scores regressing back to their true distribution. The introduction of unqualified subjects into the study, and rater bias, will continue to hamper detection of actual drug-placebo differences throughout the study.
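To see how this arithmetic plays out, consider a minimal simulation sketch, in Python, of the screening scenario described above. The numbers (an entry cutoff of 20 on a depression rating scale, raters rounding up scores that fall within three points of it) are hypothetical choices of mine for illustration, not data from any actual trial:

```python
import random

random.seed(42)

CUTOFF = 20   # hypothetical entry threshold on a depression rating scale
N = 1000

# True symptom severity in the screened population.
true_scores = [random.gauss(18, 3) for _ in range(N)]

# Screening visit: raters "round up" borderline scores so subjects qualify.
screening = [CUTOFF if CUTOFF - 3 < s < CUTOFF else s for s in true_scores]

# Only subjects whose screening score meets the cutoff are enrolled.
enrolled = [(s, t) for s, t in zip(screening, true_scores) if s >= CUTOFF]
baseline = [s for s, _ in enrolled]
followup = [t for _, t in enrolled]   # an honest rating tracks true severity

mean = lambda xs: sum(xs) / len(xs)
print(f"mean score at baseline (inflated): {mean(baseline):.1f}")
print(f"mean score at follow-up (honest):  {mean(followup):.1f}")
```

The enrolled group’s mean score drops between visits even though no subject’s true severity changed, and this artifactual drop occurs identically in the drug and placebo arms.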

3) Investigators do not always understand the objective of the clinical trial, which is not “to show treatment efficacy,” but rather to test the null hypothesis of no treatment difference (or to estimate the likely treatment effect) and to record all adverse events objectively. Many also do not fully understand the importance of objectivity and consistency in performing clinical ratings, the intent and importance of the inclusion and exclusion criteria, and the destructive effect that enrolling subjects who are not fully qualified can have on the outcome and scientific integrity of the trial.

Each of these factors can skew drug and placebo trial populations and results, making subjects appear to improve well beyond what strict adherence to protocol requirements and objective assessment of study entry and outcome measures would have shown.
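The statistical cost is easy to quantify with the standard normal-approximation sample-size formula for comparing two means; the effect sizes below are hypothetical values chosen only to illustrate the trend:

```python
# Per-arm sample size to detect a standardized effect size d with
# two-sided alpha = 0.05 and 80 percent power: n = 2 * ((z_a + z_b) / d)^2.
Z_ALPHA = 1.96
Z_BETA = 0.84

def n_per_arm(d):
    """Approximate subjects needed per arm for a two-arm mean comparison."""
    return 2 * ((Z_ALPHA + Z_BETA) / d) ** 2

# As bias dilutes the apparent drug-placebo separation, required N balloons.
for d in (0.5, 0.35, 0.2):
    print(f"effect size d = {d:.2f} -> about {n_per_arm(d):.0f} subjects per arm")
```

Because n scales with 1/d^2, halving the apparent drug-placebo effect size roughly quadruples the sample size needed to detect it, which is how an otherwise adequately powered trial can fail.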

What can be done to prevent these problems? Thorough and meticulous investigator and rater education and training are essential. Too often, investigator meetings offer only perfunctory explanations of the protocol and clinical assessment tools, and rater training takes the form of brief demonstrations of the rating scales without testing raters to ensure they understand how the scales are to be administered, scored, and interpreted. Rater training must include testing for adequate understanding of, and proficiency with, each of the study measures.

Beyond review of the protocol design and study requirements, training of research staff must include detailed explanations about why the trial is designed exactly as it is; the importance of adherence to inclusion and exclusion criteria; and the need for honesty, objectivity, and consistency. Detailed training on the disease under study also ensures that site staff have a complete understanding of the intended clinical population and its characteristics.

No matter how well informed or well intentioned investigators might be, humans cannot match computers in objectivity and consistency. Unless programmed to do so, a computer cannot coach a subject on how to respond, nor inflate (or deflate) ratings based on feelings, expectations, or desired outcomes. Following precise algorithms, a computer asks questions, interprets answers, and records responses the same way every time. Several studies have shown that computerized assessments of entry criteria and outcome measures in clinical trials – in particular, interactive voice/web response systems (IVRS/IWRS) – provide data whose quality and signal-detection ability meet or exceed those obtained by human raters. Strong consideration should be given to using IVRS/IWRS for assessing study entry criteria and endpoints that allow such use.
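As a minimal sketch of why scripted assessments behave this way (a hypothetical script of my own, not the interface of any actual IVRS/IWRS product), consider how a fixed questionnaire scores responses: identical answers always produce identical totals, and an off-script answer is rejected outright rather than rounded toward a qualifying value:

```python
# Hypothetical two-item scripted assessment; not an actual IVRS/IWRS product.
QUESTIONS = [
    ("Over the past week, how often did you feel sad or down?",
     {"never": 0, "sometimes": 1, "often": 2, "nearly always": 3}),
    ("Over the past week, how often did you have trouble sleeping?",
     {"never": 0, "sometimes": 1, "often": 2, "nearly always": 3}),
]

def administer(answers):
    """Score a fixed script of items; reject anything off-script rather
    than interpret it, as a human rater tempted to round or coach might."""
    total = 0
    for (prompt, scoring), answer in zip(QUESTIONS, answers):
        if answer not in scoring:
            raise ValueError(f"Unrecognized response to {prompt!r}: {answer!r}")
        total += scoring[answer]
    return total

# Identical responses yield identical scores on every administration.
assert administer(["often", "sometimes"]) == administer(["often", "sometimes"])
```

The point is not the toy questionnaire but the property it demonstrates: the scoring rules are fixed in advance, applied uniformly, and leave no room for expectation or incentive to nudge a score across an entry threshold.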

The author gratefully acknowledges John H. Greist, MD, for his outstanding research and input.

Jack Modell, M.D., is the senior medical officer at Rho. A board-certified psychiatrist with more than 30 years of experience in clinical research, teaching, and patient care, he has also spent nearly 15 years working in clinical drug development. Modell’s specialties include neuroscience, pharmacology, drug development, clinical research, medical governance, and clinical diagnosis and treatment. He is known for leading the first successful development of preventative pharmacotherapy for the depressive episodes of seasonal affective disorder.

Selected References 

Greist JH, Mundt JC, Kobak K. Factors contributing to failed trials of new agents: can technology prevent some problems? J Clin Psychiatry 2002;63(suppl 2):8-13.

Mundt JC, Greist JH, Jefferson JW, Katzelnick DJ, DeBrota DJ, Chappell PB, Modell JG. Is it easier to find what you are looking for if you think you know what it looks like? J Clin Psychopharmacol 2007;27:121-125.

http://www.healthtechsys.com/publications/ivrpubs2.html