The Problem with Phase III Clinical Trials
Phase III clinical trials are the final phase of research needed before a drug receives U.S. Food and Drug Administration (FDA) approval. Two fairly large-scale studies are needed, and both must show that the drug is safe and effective in the subjects tested.
There’s a long-standing problem with such clinical studies, however, one that the FDA has long been aware of but has been powerless to fix. The trials are purposely designed with stringent inclusion and exclusion criteria that may exclude a substantial portion of the population. As a result, the people the drugs are studied on are not representative of the people who will actually receive the drugs once they are approved.
In other words, Phase III clinical studies are stacked in favor of finding positive results for the medication under study.
A new study by Wisniewski and colleagues (2009), published in the latest issue of The American Journal of Psychiatry, put this hypothesis to the test by examining the wealth of data generated by the government-backed STAR*D project. “STAR*D was designed with broad inclusion and minimal exclusion criteria to ensure recruitment of a representative sample of treatment-seeking depressed outpatients who receive treatment in typical clinical settings,” the researchers noted.
The researchers divided the STAR*D subjects into two groups — those who would’ve qualified for a Phase III clinical trial (the “efficacy sample”), and those who would not have:
STAR*D enrolled a total of 4,041 participants, 2,876 of whom made up an analyzable sample (having at least one postbaseline visit and a score of 14 or higher on the HAM-D). Of these, 2,855 could be classified into the efficacy sample (N=635, 22.2%) or the nonefficacy sample (N=2,220, 77.8%)
You can already see an interesting phenomenon in the researchers’ classification. Only 22.2 percent of the STAR*D subjects would have qualified for a Phase III clinical trial. The vast majority would not have qualified, which immediately calls into question the generalizability and usefulness of data that would apply to only 22.2 percent of the population. (Previous research has suggested this number may be as low as 9 percent.)
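As a quick check on the arithmetic in the quoted passage, here is a minimal Python sketch that recomputes the percentages from the counts the researchers report. The variable names and the pct helper are my own, purely for illustration; only the four counts come from the paper as quoted above.

```python
# Counts as quoted from Wisniewski et al. (2009) above.
enrolled = 4041      # total STAR*D participants
analyzable = 2876    # analyzable sample
efficacy = 635       # would have qualified for a Phase III trial
nonefficacy = 2220   # would not have qualified

# Participants who could be placed in one of the two groups,
# and those in the analyzable sample who could not.
classified = efficacy + nonefficacy
unclassified = analyzable - classified


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


efficacy_pct = pct(efficacy, classified)
nonefficacy_pct = pct(nonefficacy, classified)

print(f"classified: {classified} (unclassified: {unclassified})")
print(f"efficacy sample: {efficacy_pct}%")
print(f"nonefficacy sample: {nonefficacy_pct}%")
```

Note that the percentages are taken over the 2,855 participants who could be classified, not over everyone enrolled.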
They also found that the efficacy sample, compared with the nonefficacy sample of depressed people, had:
- Shorter duration of depression
- Lower rates of prior suicide attempts
- Lower rates of family history of substance abuse
- Lower rates of anxiety and other non-depressive symptoms
- A greater likelihood of being seen in a psychiatric specialty care setting
- A lower likelihood of severe side effects
- A lower likelihood of a serious adverse event (either psychiatric or due to the medication)
All of this may readily explain the observation by most clinicians that medications rarely meet the expectations set by published, peer-reviewed research (the so-called “gold standard”):
[A]ll measures of outcome showed significant but modest differences between the groups, with the efficacy sample having, on average, better outcomes. These differences were consistent in the direction and magnitude of effect when examined separately in primary and psychiatric care settings.
Given these between-group differences, the smaller efficacy sample is clearly not representative of the more inclusive, treatment-seeking population. By inference, a patient sample that meets the inclusion criteria for a phase III clinical trial is not representative of depressed patients seen in typical clinical practice, and phase III trial outcomes may be more optimistic than results obtained in practice.[…]
To our knowledge, the current study is the first to examine the differences in treatment outcome. Notably, response and remission rates were poorer and the times to response and remission were longer in patients ineligible for efficacy trials. Thus, current efficacy trials suggest a more optimistic outcome than is likely in practice, and the duration of adequate treatment suggested by data from efficacy trials may be too short.
There is an obvious trade-off in opening up Phase III clinical trials to a broader and more representative sample of patients: medications may not meet the FDA’s threshold for efficacy, and therefore may not be approved. Unless the FDA changes its Phase III requirements, this situation is not likely to change on its own any time soon, despite data such as these showing that the research is fundamentally flawed.
In research, how you choose your sample is a fundamental way to shape your results. Researchers know this, of course, and will often pick inclusion or exclusion criteria that give them the greatest likelihood of finding significance in their data. Once you know what to look for in sampling (e.g., Is it a randomized or a convenience sample? Are the inclusion/exclusion criteria overly strict? Is it representative of the population and demographics?), you can tell a lot about the actual usefulness and generalizability of the study’s findings.
The latest research continues a long line of similar studies that give us insight into why medications rarely seem to work as well (or with as few side effects) as their clinical trials indicated.
So if you’re feeling frustrated about your antidepressant or psychiatric medication not working as well as advertised, this may be one of the reasons why — it’s not as effective in the general population as it is on the cherry-picked sample studied.
Wisniewski, S. R., Rush, A. J., Nierenberg, A. A., Gaynes, B. N., Warden, D., Luther, J. F., McGrath, P. J., Lavori, P. W., Thase, M. E., Fava, M., & Trivedi, M. H. (2009). Can Phase III Trial Results of Antidepressant Medications Be Generalized to Clinical Practice? A STAR*D Report. Am J Psychiatry, 166(5), 599-607.
Grohol, J. (2018). The Problem with Phase III Clinical Trials. Psych Central. Retrieved on April 7, 2020, from https://psychcentral.com/blog/the-problem-with-phase-iii-clinical-trials/