Last week, the results of the world’s largest effort to reproduce results found in psychology research came in. Brian Nosek’s Reproducibility Project examined the results of 100 psychology experiments published in 2008 in three major psychology journals, and attempted to reproduce each study to see whether the original results would hold.
In an ideal world, one might think that something on the order of 75 or even 80 percent of the studies should have reproduced similar results, right? After all, the new studies were simply re-conducted on a different population by researchers who carefully followed the original researchers’ methods. In most cases, the replicating teams also had direct contact with, and cooperation from, the original researchers.
But in a finding spun a dozen different ways since it was published in last week’s issue of Science, the Project didn’t come anywhere close to 75 percent. Only 36 percent of the replications produced significant results — compared to 97 percent of the original 100 studies.
What does this mean for psychology?
Despite some trying to spin this finding as not unexpected or “not as bad as it could’ve been,” this doesn’t bode well for psychological science. Each month, hundreds of new psychology studies are published. What this finding means, in a nutshell, is that most of those studies’ findings are not to be trusted. They are, in effect, false.
From The Atlantic’s coverage:
“The success rate is lower than I would have thought,” says John Ioannidis from Stanford University, whose classic theoretical paper Why Most Published Research Findings Are False has been a lightning rod for the reproducibility movement.
“I feel bad to see that some of my predictions have been validated. I wish they’d been proven wrong.”
Another discouraging finding from the new research is that the measured effect sizes were typically about 50 percent smaller than what the original researchers found. That means that even when the results were reproduced by the new researchers, the impact of the variables being studied wasn’t nearly as large as originally thought.
Reasons for Poor Reproducibility in Psychology Research
There are a dozen different reasons for this poor showing by psychological research. But before we review some of them, consider this a cold splash of reality for anyone who takes the results of a single study and generalizes from them. Or even worse, believes something is true when it has yet to be shown true by more than a single study.
If a study isn’t double-blinded — as most of these were not — the researchers’ own biases may subtly influence how the data is collected or analyzed. If a researcher has just spent 8 or 18 months collecting data only to find no significant results, they may go on a data-fishing expedition to find some other relationship in the data they can publish.[1] Researchers then change their original hypotheses to fit what the data actually found (since most researchers still do not pre-register their research with a tracking service — although that’s slowly changing).
Others have suggested that perhaps “surprisingness” is another explanation — that journals nowadays focus on publishing surprising findings, since they are more popular and interesting to readers. When you add in the possibility of regression to the mean — that variables may be most extreme when first measured, but less extreme when measured a second or third time — the suggestion is that these two factors combine to encourage publication of studies that are intrinsically hard to reproduce.
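Regression to the mean is easy to see in a small simulation. The sketch below is purely illustrative and uses hypothetical numbers (a true average effect of 0.3 with study-to-study and measurement noise I chose for the example, not values from the Reproducibility Project): when we select only the studies whose first measurement looked most extreme, a second measurement of those same studies falls back toward the average on its own, with no questionable research practices required.

```python
import random

random.seed(42)

# Hypothetical setup: each "study" measures a true effect plus sampling noise.
# Assumed (illustrative) values: true effects ~ N(0.3, 0.2), noise ~ N(0, 0.3).
n = 10_000
true_effects = [random.gauss(0.3, 0.2) for _ in range(n)]
first = [t + random.gauss(0, 0.3) for t in true_effects]
second = [t + random.gauss(0, 0.3) for t in true_effects]

# "Publish" only the studies whose first measurement looked most extreme.
published = [i for i in range(n) if first[i] > 0.8]

mean_first = sum(first[i] for i in published) / len(published)
mean_second = sum(second[i] for i in published) / len(published)

print(f"first measurement (selected studies): {mean_first:.2f}")
print(f"second measurement (same studies):    {mean_second:.2f}")
# The second measurement falls back toward the true average (0.3 here),
# because selecting on an extreme first score also selects for lucky noise.
```

If journals preferentially publish the most surprising (i.e., most extreme) first measurements, this shrinkage alone would predict weaker effects on replication.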
What Does it Mean for Psychological Science?
Human nature is infinitely complex. Psychological science attempts to deconstruct human behavior and emotions into small pieces to better understand the whole. However, if researchers cannot reproduce the science behind these studies, it suggests that much of what the field publishes every year also cannot be trusted.[2]
However, we also don’t know the reproducibility statistics of most science, since nothing like the Reproducibility Project has ever been attempted in other fields. It could be that this is a flaw suffered by most science, or a flaw that impacts the social sciences more than others.
But in the short run, this emphasizes something I’ve always said — a psychological science finding isn’t something you can hang your hat on until it’s been reproduced by another researcher. Findings that can be reproduced are said to be “robust,” and can therefore be trusted.
Look for this kind of information when evaluating or reading news articles based upon a new study. While not as sexy as new, “surprising” findings, research that verifies or calls into question what we already think we know is just as important.
For further information…
The Atlantic: How reliable are psychology studies?
Mindhacks: Don’t call it a comeback
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Reproducibility Project: Psychology – the raw data
1. Otherwise all of that time, money and effort were wasted, because few researchers can or want to publish null results.
2. And it really calls into question authors’ broad generalizations made about the applicability of their findings, found in nearly every Discussion section of recent psychology studies.