A few weeks ago, Jonah Lehrer wrote a somewhat dumbed-down and sensationalistic article for The New Yorker entitled, The Truth Wears Off: Is there something wrong with the scientific method? In it, Lehrer cites anecdotal evidence (and a little data) to support the proposition that perhaps the scientific method — how we scientifically validate our hypotheses with data and statistics — has gone horribly awry.
But what Lehrer failed to note is that most researchers already know about the flaws he describes, and diligently work toward minimizing the impact of those issues.
The scientific method isn’t broken. What Lehrer is describing is simply science at work — and working.
The best response to this essay comes from ScienceBlogs writer PZ Myers, in Science is not dead. In this rebuttal, Myers points out the main reasons science sometimes can’t replicate prior findings:
1. Regression to the mean: As the number of data points increases, we expect the average values to regress to the true mean…and since often the initial work is done on the basis of promising early results, we expect more data to even out a fortuitously significant early outcome.
2. The file drawer effect: Results that are not significant are hard to publish, and end up stashed away in a cabinet. However, as a result becomes established, contrary results become more interesting and publishable.
3. Investigator bias: It’s difficult to maintain scientific dispassion. We’d all love to see our hypotheses validated, so we tend to consciously or unconsciously select results that favor our views.
4. Commercial bias: Drug companies want to make money. They can make money off a placebo if there is some statistical support for it; there is certainly a bias towards exploiting statistical outliers for profit.
5. Population variance: Success in a well-defined subset of the population may lead to a bit of creep: if the drug helps this group with well-defined symptoms, maybe we should try it on this other group with marginal symptoms. And it doesn’t… but those numbers will still be used in estimating its overall efficacy.
6. Simple chance: This is a hard one to get across to people, I’ve found. But if something is significant at the p=0.05 level, that still means that 1 in 20 experiments with a completely useless drug will still exhibit a significant effect.
7. Statistical fishing: I hate this one, and I see it all the time. The planned experiment revealed no significant results, so the data is pored over and any significant correlation is seized upon and published as if it was intended. See previous explanation. If the data set is complex enough, you’ll always find a correlation somewhere, purely by chance.
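The arithmetic behind simple chance and statistical fishing is easy to check for yourself. Here is a minimal Python sketch, not anything taken from Myers or Lehrer. It assumes every test is independent and that the drug truly does nothing, in which case each p-value is uniformly distributed between 0 and 1. The function names and the number of simulated studies are made up for illustration.

```python
import random

ALPHA = 0.05  # the conventional significance threshold


def familywise_error_rate(num_tests: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of num_tests independent tests of a
    useless treatment comes out "significant" purely by luck."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_fishing(num_tests: int, num_studies: int = 20000,
                     alpha: float = ALPHA, seed: int = 0) -> float:
    """Simulate studies of a useless treatment in which the researcher
    runs num_tests tests and reports a finding if any one is significant.

    Assumption: with no real effect, each p-value is a uniform random
    number on [0, 1], so rng.random() stands in for a real test.
    """
    rng = random.Random(seed)
    lucky_studies = 0
    for _ in range(num_studies):
        if any(rng.random() < alpha for _ in range(num_tests)):
            lucky_studies += 1
    return lucky_studies / num_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(familywise_error_rate(k), 3), round(simulate_fishing(k), 3))
```

With a single planned test, about 1 study in 20 gets lucky. Run 20 unplanned tests on the same useless drug and, under these assumptions, the chance of at least one "significant" result rises to roughly 64 percent.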
Number 1 explains a lot of the problems we find in science today, especially psychological science. You know most of those experiments you read about in Psychological Science, the flagship publication of the Association for Psychological Science? They’re crap. They are N = 20 experiments conducted on small, homogeneous samples of mostly Caucasian college students at midwestern universities. Most of them are never replicated, and fewer still are replicated on sample sizes that would likely demonstrate that the original results were nothing more than a statistical fluke.
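To see how small samples plus a preference for significant results can produce a shrinking effect on replication, here is a rough Python simulation. Everything in it is an assumption chosen for illustration: a small true effect of 0.2 standard deviations, 20 subjects per study, a simple z-test with a known standard deviation of 1, and a rule that only significant positive results get noticed. It is a sketch of regression to the mean, not a model of any real study.

```python
import math
import random


def sample_mean(rng: random.Random, true_effect: float, n: int) -> float:
    """Average outcome of n subjects drawn from Normal(true_effect, 1)."""
    return sum(rng.gauss(true_effect, 1.0) for _ in range(n)) / n


def simulate_decline(true_effect: float = 0.2, n: int = 20,
                     num_studies: int = 20000, seed: int = 1):
    """Run many small studies, keep only the significant ones, and
    replicate each of those once with a fresh sample of the same size.

    Returns (mean effect in significant originals, mean effect in replications).
    """
    rng = random.Random(seed)
    # With a known SD of 1, a sample mean above 1.96 / sqrt(n) clears the
    # usual two-sided 5% cutoff in the positive direction.
    threshold = 1.96 / math.sqrt(n)
    originals, replications = [], []
    for _ in range(num_studies):
        first = sample_mean(rng, true_effect, n)
        if first > threshold:  # only the "exciting" result gets attention
            originals.append(first)
            replications.append(sample_mean(rng, true_effect, n))
    return (sum(originals) / len(originals),
            sum(replications) / len(replications))


if __name__ == "__main__":
    original, replication = simulate_decline()
    print("mean effect in significant originals:", round(original, 2))
    print("mean effect in their replications:   ", round(replication, 2))
```

Under these assumptions, the originals that cleared the bar report an effect well above the true value, while their replications land back near 0.2. Nothing about the effect changed; the early results were simply selected for being lucky.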
Researchers know this already, but live by a very different rulebook than you or I. Their livelihood depends on continuing to produce good, publishable research. If they stop doing this research (or can’t get it published in a peer-reviewed journal), they’re at greater risk of losing their jobs. It’s known as “publish or perish” in academia, and it’s a very real motivation for publishing any research, even if you know the results are unlikely to be replicable. See Number 3 above.
Finally, I see so much of Number 7 in research studies I review, it’s almost embarrassing. The scientific method only works well and reliably when you formulate hypotheses beforehand, run your subjects to collect your data, and then analyze that data according to the hypotheses you started with. If you decide to start changing the hypothesis to fit the data, or run statistical tests you hadn’t planned on, you’re tainting your findings. You start on a fishing expedition of the kind every researcher has gone on at some point. But just because everyone’s done it doesn’t mean it’s a good or ethical behavior to engage in.
The problem is that research is time-consuming and often expensive. If you just ran 100 subjects through a trial and found nothing of significance (according to your hypotheses), not only are you unlikely to get that study published, but you just wasted months (or even years) of your professional life and $X from your always-limited research budget.
If you can’t see how this might result in less-than-optimum research findings being published, then you may be a bit blind to basic human psychology and motivation. Because researchers are not super-people — they have the same faults, biases, and motivations as anyone else. The scientific method — when followed rigorously — is supposed to account for that. The problem is, nobody is really watching over researchers to ensure they do follow it, and there’s no inherent incentive to do so.
I’ll end with this observation, again from PZ Myers:
That’s all this fuss is really saying [– s]ometimes hypotheses are shown to be wrong, and sometimes if the support for the hypothesis is built on weak evidence or a highly derived interpretation of a complex data set, it may take a long time for the correct answer to emerge. So? This is not a failure of science, unless you’re somehow expecting instant gratification on everything, or confirmation of every cherished idea.
Others’ Opinions on Lehrer’s Essay
Science is not dead – PZ Myers
In praise of scientific error – George Musser
Are humans the problem with the scientific method? – Charlie Petit
The Mysterious Decline Effect – Jonah Lehrer