The recent news that Dana Carney, the lead author of the original “power pose” study, no longer believes in this effect has grabbed headlines. And in January 2016, Slate published an article whose headline trumpeted the claim that the original study was “the latest example of scientific overreach.”
Many people are surprised, and maybe a bit angry, that scientists were wrong. Maybe the millions of people who watched co-author Amy Cuddy’s TED Talk feel a bit foolish because they unnecessarily struck silly poses in the mirror before going on a job interview!
The sky is falling, Chicken Little! How can we trust anything that researchers say?!
There’s just one problem with such a reaction: this saga is a pristine example of science working the way it’s supposed to!
How Do We Know Anything?
I’ll back up a bit. There are many ways of gathering knowledge. One method is to appeal to authority. If an expert says something is true, then it must be true! After all, an expert gets to be an expert by studying hard and learning everything there is to know on a topic. That must mean that they’re usually right…right?!
Another method is to engage in philosophical inquiry, such as reasoning. If something makes sense, and follows established rules of logic, then it’s surely true!
A third method is through non-systematic observation. If you see something happen a few times, then it must always work that way!
As useful as these methods can be in everyday life, each has its limitations. Appeal to authority is especially useful in cases of great ambiguity (such as deciding whether some action is morally correct). But authorities aren’t always right. Government leaders such as Eisenhower and Johnson feared that a communist victory in Vietnam would cause neighboring countries to fall to communism one after another, like dominoes; this “domino theory” spurred American military involvement in Vietnam. Yet Thailand, Malaysia, Indonesia, and the Philippines were never taken over by communist forces, and the eventual U.S. withdrawal from Vietnam did not set off the region-wide chain reaction those leaders had predicted. In case you’re curious, Vietnam remains a communist country in name, though not in economics.
Neighboring Laos also remains communist, though its communist movement, the Pathet Lao, was already fighting for control of the country years before U.S. combat troops arrived in Vietnam.
Philosophical inquiry has long yielded interesting and important insights. But times change, and so do people’s interpretations. Descartes’ idea that we can’t necessarily trust that anything exists outside of our own being (“I think, therefore I am”) was revolutionary when it was first published in 1637. But by 2016, 17 years after the movie The Matrix dramatized and popularized this very notion, it no longer seems like such a profound insight. Other philosophical ideas, such as Plato’s “ideal forms,” have largely fallen out of favor, persisting mainly in classes on the history of philosophy (and on Wikipedia).
Non-systematic observations can also be very helpful, especially when we are children. By watching what goes on around us, we learn about the world. But this method can be flawed as well. To use a contemporary example: why does one person think someone is reaching for a gun, while another person who witnesses the same event believes that the person is surrendering? Is one person lying? If so, which one? And if each witness honestly believes that he or she is telling the truth, who is right?
Even combining two or more of these approaches can yield a less-than-satisfactory conclusion. When I was young (around 5 or 6 years old), I believed that trees caused the wind to blow. After all, in my backyard, I’d often see the trees start to move just before I felt the wind on my skin. This happened again and again, so it seemed entirely reasonable to conclude that, because the trees moved before I felt the wind, they must be the source of the wind. My youthful conclusion was logical, and it was supported by repeated observations.
It was also wrong. Wind actually comes from the movement of air out of an area of high pressure, and into a lower-pressure area (just in case you didn’t already know that).
Another example comes from doctors, hundreds of years ago, who believed that people got sick because the “four humours” of the body (blood, phlegm, black bile, and yellow bile) were out of balance. Therefore, they would frequently attach leeches to suck some blood out of the sick patient, in order to restore the balance of humours in the body.
When the afflicted individual got better, this was taken as support for their idea: the patient got better because the balance of humours had been restored. When the afflicted individual didn’t get better, this was interpreted as an instance of the individual being so sick that nothing could cure them!
Thankfully, modern medicine knows better. (Leeches! *shudder*)
Refresher: The Scientific Method
The scientific method was devised as a way to systematically observe and document the natural world, in order to determine what is repeatable and what is not. Appeals to authority are therefore not needed, as anybody (if properly trained and equipped) can observe the phenomena that others have described. Properly applied, the scientific method helps prevent blunders like the ones I described above. The best scientific tests involve changing some variable (called an “independent variable”) to determine whether that change has an effect on something else you’re observing (called a “dependent variable”).
A simple example of how the scientific method is supposed to work: you notice that every time you drop your hammer, it takes about the same amount of time to hit your foot (ouch!). So you decide to run a test to see just how fast your hammer falls. You drop the hammer from different heights: 1 foot, 2 feet, 3 feet, 4 feet, and 5 feet. “Height” is therefore your independent variable. You then measure how long it takes the hammer to hit the ground from each of these heights. “Amount of time” is your dependent variable.
If you’ve measured both height and time precisely, you will be able to build a model of how fast your hammer falls. You can then test this model by using it to predict how long the hammer takes to fall from 10 feet, and then actually dropping it from 10 feet to see whether your model was accurate! Rinse and repeat until your model accurately predicts how fast your hammer falls from a wide variety of heights.
Congratulations! You’ve just conducted a line of scientific inquiry!
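For readers who like to tinker, here is a minimal Python sketch of that predict-then-test loop. It uses no real data: the “stopwatch readings” are simulated from ideal free fall plus a hypothetical 0.02-second timing error, and it assumes in advance that the model has the form t = k * sqrt(h), the shape that constant acceleration produces. The helper names `simulate_drop_time` and `fit_k` are made up for this illustration.

```python
import math
import random

G = 9.8           # m/s^2; used here only to simulate "measurements"
FT_TO_M = 0.3048  # 1 foot = 0.3048 meters (exact)


def simulate_drop_time(height_ft, rng, timing_noise_s=0.02):
    """Hypothetical stopwatch reading: ideal free-fall time plus random timing error."""
    height_m = height_ft * FT_TO_M
    return math.sqrt(2 * height_m / G) + rng.gauss(0.0, timing_noise_s)


def fit_k(heights_ft, times_s):
    """Least-squares fit of the one-parameter model t = k * sqrt(h) (a line through the origin)."""
    xs = [math.sqrt(h) for h in heights_ft]
    return sum(x * t for x, t in zip(xs, times_s)) / sum(x * x for x in xs)


rng = random.Random(42)
heights = [1, 2, 3, 4, 5]                              # independent variable (feet)
times = [simulate_drop_time(h, rng) for h in heights]  # dependent variable (seconds)

k = fit_k(heights, times)
predicted_10ft = k * math.sqrt(10)           # the model's prediction
observed_10ft = simulate_drop_time(10, rng)  # the "actual" 10-foot drop
print(f"model: t = {k:.4f} * sqrt(h)")
print(f"10 ft: predicted {predicted_10ft:.3f} s, observed {observed_10ft:.3f} s")
```

The fit uses only the 1-to-5-foot drops; the 10-foot drop is held back as a genuine test of the model, which is the whole point of the exercise.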
This is easy to say, but harder to do. There will be variation in the amount of time it takes you to press the start/stop button on your stopwatch. Even if you build a machine to measure the time for you, the hammer won’t take exactly the same amount of time to hit the ground on every drop, even from exactly the same height. Other factors, like air resistance, will have small effects on how long your hammer takes to fall.
So if you aren’t aware of these factors, or if you can’t adjust your experiment to eliminate them, there will be small variations between different observations of the exact same phenomenon. This variation will make your mathematical model imperfect. And if it’s imperfect, other people may decide to ignore all your hard work!
If you’re a careful scientist, you will drop the hammer from the same height multiple times. Averaging these trials will help even out inconsistencies, such as differences in how long you take to hit the start/stop button on your stopwatch. Now you can find the average time it takes the hammer to fall from 1 foot, from 2 feet, from 3 feet, and so on.
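Why does repeating the drop help? A small Python simulation can show it. This sketch assumes your reaction-time error is random, unbiased, and normally distributed with a hypothetical standard deviation of 0.05 seconds. Under those assumptions, the average of n drops wobbles by roughly 0.05 / sqrt(n) seconds, so 25 drops cut the error to about a fifth.

```python
import math
import random
import statistics

TRUE_TIME_S = math.sqrt(2 * 0.3048 / 9.8)  # ideal fall time from 1 ft, about 0.25 s
REACTION_SD_S = 0.05                        # hypothetical stopwatch reaction-time error


def one_timing(rng):
    """One simulated stopwatch reading for a 1-foot drop."""
    return TRUE_TIME_S + rng.gauss(0.0, REACTION_SD_S)


def spread_of_average(n_drops, rng, n_experiments=2000):
    """Repeat an n_drops-trial experiment many times and measure how much
    its average varies from one experiment to the next (standard deviation)."""
    averages = [
        statistics.fmean(one_timing(rng) for _ in range(n_drops))
        for _ in range(n_experiments)
    ]
    return statistics.stdev(averages)


rng = random.Random(1)
for n in (1, 5, 25):
    print(f"{n:>2} drops per height: average varies by ~{spread_of_average(n, rng):.4f} s "
          f"(theory: {REACTION_SD_S / math.sqrt(n):.4f} s)")
```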
Next, you might wonder whether other things fall as fast as your hammer. You can test this with a variety of other objects. You can also use more sophisticated methods, such as dropping things in a vacuum chamber (to eliminate any effects of air resistance on different objects). Eventually, you’ll arrive at a model that can accurately predict how fast any object will fall when air resistance is removed.
Just in case you’re not a physicist, we already know the answer: near Earth’s surface, and ignoring air resistance, falling objects accelerate toward the ground at about 9.8 meters per second squared. Now that you know this, you can calculate just how fast your phone was going that time you dropped it and shattered the screen…
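As a worked example, ignoring air resistance (which has little effect over such a short drop): an object falling from rest for time t reaches speed v = g * t, and combining that with h = (1/2) * g * t^2 gives v = sqrt(2 * g * h). The 1.5-meter drop height below is a hypothetical stand-in for “slipped out of my hand at chest height.”

```python
import math

G = 9.8                   # m/s^2, gravitational acceleration near Earth's surface
MPH_PER_MS = 1 / 0.44704  # 1 mph = 0.44704 m/s exactly


def fall_time(height_m, g=G):
    """Seconds to fall from rest, ignoring air resistance: t = sqrt(2h / g)."""
    return math.sqrt(2 * height_m / g)


def impact_speed(height_m, g=G):
    """Speed just before impact, ignoring air resistance: v = g * t = sqrt(2gh)."""
    return math.sqrt(2 * g * height_m)


drop_height_m = 1.5  # hypothetical: the phone slipped from about chest height
v = impact_speed(drop_height_m)
print(f"fell for {fall_time(drop_height_m):.2f} s and hit the floor at "
      f"{v:.1f} m/s (about {v * MPH_PER_MS:.0f} mph)")
```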
Like the hammer, science is a tool — nothing more. The scientific method is only useful for determining what is true in the natural world. If a variable is measurable and quantifiable, it is subject to scientific scrutiny. But, just as the hammer cannot tell us what to build, science cannot tell us what to study — or how to communicate our results.
And that’s where the problem can come in.
Jumping the Gun
It’s natural to be excited about the results of a study. When your research gets published in a prestigious journal, you want people to read your work, and cite it in their own publications. But you don’t just want to tell your peers about what you found: you want to tell everyone!
Not so fast. Remember a few paragraphs ago, when I mentioned that you’re going to find some inconsistencies in your timing when you use your stopwatch? A similar problem afflicts many kinds of research.
In psychological research, this issue can rear its ugly head as measurement error or as sampling error. Because people can be so different from one another, you may just happen, by chance, to get a bunch of participants who behave in some atypical way.
As a researcher, you usually can’t tell from a single study that you got an unusual sample! So you may arrive at an erroneous conclusion about people in general…with no way of knowing that the conclusion is erroneous! When that erroneous conclusion is that an effect exists when it actually doesn’t (a false positive), it’s called a Type I error.
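Here is a minimal Python simulation of how often pure chance produces a “finding.” It assumes a hypothetical study comparing two groups of 30 people drawn from the same normally distributed population (so there is truly no effect), and for simplicity it uses a z-test with a known standard deviation of 1 rather than the t-test a real study would more likely use. With the conventional 0.05 significance cutoff, roughly 1 in 20 such studies should look like a discovery anyway.

```python
import math
import random
import statistics

Z_CRIT = 1.96  # two-sided cutoff for alpha = 0.05 (known-variance z-test)


def null_study(n_per_group, rng):
    """One hypothetical study where the true effect is exactly zero:
    both groups are drawn from the same standard normal population."""
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = statistics.fmean(a) - statistics.fmean(b)
    z = diff / math.sqrt(2 / n_per_group)  # SE of the difference when sigma = 1
    return abs(z) > Z_CRIT                 # True means "significant": a false positive


def false_positive_rate(n_studies=5000, n_per_group=30, seed=0):
    rng = random.Random(seed)
    hits = sum(null_study(n_per_group, rng) for _ in range(n_studies))
    return hits / n_studies


print(f"share of no-effect studies that look 'significant': {false_positive_rate():.3f}")
```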
This is precisely why, according to proper scientific methodology, researchers should replicate (or re-do) their experiments. Replication helps ensure that one weird sample doesn’t mislead the psychological community into a false conclusion about how the mind works. Science is therefore supposed to be a self-correcting process, and often it is.
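A quick simulation shows why replication is such an effective filter. Under some simplifying assumptions (two groups of 30, a known standard deviation of 1, a z-test at the 0.05 level), this Python sketch draws each study’s test statistic directly from its known sampling distribution instead of simulating individual participants. It then asks: of the original studies that came out significant, how many independent replications also do? The effect sizes (0 for “no effect,” 1.0 standard deviation for a large real effect) are hypothetical.

```python
import math
import random

Z_CRIT = 1.96  # two-sided cutoff for alpha = 0.05


def z_statistic(true_effect, n_per_group, rng):
    """Draw the z statistic of one hypothetical two-group study.

    Assumes sigma = 1 in both groups, so the observed mean difference is
    normal with mean true_effect and SE sqrt(2 / n); the z statistic is then
    normal with mean true_effect / SE and SD 1, and we sample it directly.
    """
    se = math.sqrt(2 / n_per_group)
    return rng.gauss(true_effect / se, 1.0)


def replication_rate(true_effect, n_per_group=30, n_originals=100_000, seed=0):
    """Of the original studies that came out significant, what share of
    independent replications were also significant (in either direction)?"""
    rng = random.Random(seed)
    significant = replicated = 0
    for _ in range(n_originals):
        if abs(z_statistic(true_effect, n_per_group, rng)) > Z_CRIT:
            significant += 1
            if abs(z_statistic(true_effect, n_per_group, rng)) > Z_CRIT:
                replicated += 1
    return replicated / significant


print(f"no real effect:    {replication_rate(0.0):.1%} of 'discoveries' replicate")
print(f"large real effect: {replication_rate(1.0):.1%} of discoveries replicate")
```

Counting a significant result in either direction as a “replication” is a deliberately lenient criterion; requiring the same direction would make flukes replicate even less often.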
But there’s a snake in paradise. There is little incentive to re-do someone else’s work, because many journals prefer to publish only novel, original studies. That means that often only the first person to find an effect gets rewarded with seeing his or her work in print. Since journal publications are an important metric for academic hiring and tenure committees, if Joe Researcher fails to beat other people to the punch, his career can suffer!
As you can probably imagine, this tends to result in a couple of problems. One major issue is the so-called “file-drawer” problem: lots of research gets filed away and forgotten because it doesn’t yield a statistically significant result. The researchers therefore conclude that there is “no effect,” even if the effect is real and they just happened to get a weird sample.
Or, conversely, researchers might study an already-known effect, find no evidence for it, and conclude that they must have messed up somewhere…even if the original, “known” effect isn’t real and the original authors were the ones with the weird sample! Both versions of the “file-drawer” problem can hurt psychologists’ understanding of how the mind works.
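The file-drawer problem can also be sketched as a simulation. This hypothetical Python example assumes a small but real effect (0.2 standard deviations), studies with 30 people per group, and a world in which only results passing the 0.05 significance cutoff get published. Most studies end up in the drawer as apparent “no effect” results, while the ones that do get published overstate the true effect, because only the luckiest samples clear the bar.

```python
import math
import random
import statistics

Z_CRIT = 1.96  # two-sided cutoff for alpha = 0.05


def run_study(true_effect, n_per_group, rng):
    """Return the observed mean difference from one hypothetical two-group study
    (sigma = 1 in both groups, by assumption)."""
    a = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    return statistics.fmean(a) - statistics.fmean(b)


def file_drawer(true_effect=0.2, n_per_group=30, n_studies=4000, seed=0):
    """Split simulated studies into 'published' (significant) and 'drawer' (not)."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)
    published, drawer = [], []
    for _ in range(n_studies):
        diff = run_study(true_effect, n_per_group, rng)
        if abs(diff / se) > Z_CRIT:
            published.append(diff)
        else:
            drawer.append(diff)
    return published, drawer


published, drawer = file_drawer()
all_results = published + drawer
print(f"studies run: {len(all_results)}, published: {len(published)}, "
      f"filed away: {len(drawer)}")
print(f"true effect: 0.20 | average of ALL studies: {statistics.fmean(all_results):.2f} "
      f"| average of PUBLISHED studies: {statistics.fmean(published):.2f}")
```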
Another result of this lack of incentive to replicate is that researchers tend to wade into hyper-specific areas of study. So, there are thousands, if not tens of thousands, of researchers worldwide studying minutiae [though I doubt that they’d put it that way!] related to how the visual system works, or how neurons are organized in a certain part of the brain. Such a strategy ensures that the researcher is contributing new findings and getting published…even if few people actually care about the results!
But very few researchers bother to re-do existing studies…because that won’t help them get tenure!
Some researchers have calculated that there are likely thousands of ‘things we know that just ain’t so’ in the psychological literature! This number could be — and should be! — vastly reduced by simply running some replication studies, as in the Many Labs Project.
In 2015, the Open Science Collaboration brought widespread attention to the replication issue by redoing 100 published studies in psychology. They found that more than half of those studies failed to replicate: an alarming result (though Gilbert, King, Pettigrew, and Wilson’s reply argued that the paper contained some errors)! Of course, that procedure raises the question of which finding is “real”: the original study, or the single replication attempt?
When a study is replicated, that replication increases our confidence that the effect isn’t just a mirage. And when a study fails to replicate, it doesn’t necessarily mean that the original researcher was wrong, or lying, or careless, or stupid; it may simply mean that one of the groups of people had some unusual characteristics. To determine which result is the “weird” one, we need to conduct further replication attempts.
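A failed replication can also happen when the effect is real but the study is too small to detect it reliably. This Python sketch uses the normal-approximation formula for the statistical power of a two-sided, two-group z-test (the chance that a study of a given size finds a significant result when the effect really exists). The effect size of 0.4 standard deviations and the sample sizes are hypothetical.

```python
from statistics import NormalDist


def power_two_group(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test with sigma = 1."""
    z = NormalDist()                           # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    shift = effect_size / (2 / n_per_group) ** 0.5  # expected z statistic
    # Probability of landing beyond either cutoff when the true mean is `shift`.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for n in (30, 100, 250):
    print(f"real effect d = 0.4, n = {n:>3} per group: "
          f"chance a replication is significant = {power_two_group(0.4, n):.2f}")
```

With only 30 people per group, a real effect of this size would come out significant only about a third of the time, so a single failed replication, on its own, says little about which study was the “weird” one.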
This is why replication is such a critical piece of the research puzzle: we don’t care if something happens only once; we do care if it happens often!
But, as the field of psychological research stands today, there’s very little incentive to conduct replications — and plenty of incentive not to! This is why I was actually pleased to see the headline that Dr. Carney has recanted her stance on the “power pose.”
Like any good scientist, she changed her mind when the preponderance of the evidence changed. When the “power pose” effect failed to replicate (and failed, and failed, and failed again), it became clear that the initial finding most likely wasn’t due to some psychological process that happens when you stand in certain postures, but instead to some other, unknown factors.
Researchers are human, too. They often cling to their own ideas, even in the face of strong counterevidence. So Dr. Carney should be commended, not scorned, for her willingness to change her mind in the face of new evidence. That’s exactly what scientists are supposed to do!
The entertainment industry, however, has no such self-correcting practices. That’s why movies such as Limitless and Lucy continue to portray the popular, and thoroughly debunked, myth that we use only 10% of our brains. Occasionally, large parts of a person’s brain really do become active all at once. But that doesn’t result in increased intelligence or superpowers. It’s called a seizure.
There is a reporting gap between the excitement of publishing a surprising new finding and the long, tedious, and unexciting process of replication. After all, nobody remembers what Buzz Aldrin said when he became the second man to step on the Moon! We’ll see whether Dr. Cuddy’s TED Talk is amended or removed in light of this new evidence…I suspect that it won’t be.
It’s best if we avoid the temptation to jump the gun and talk about something as fact before it’s been replicated. But, as long as we only reward the first researchers to “discover” some effect and ignore the important work of replication, we will get fooled again.
This article was revised from its original version on September 25, 2017.