A lot of medical and behavioral health research gets disseminated to the public every year. By some accounts, every day brings one or two "exciting, new findings" from medical and health researchers. We've all heard the joke about how one study says alcohol is bad for one system in your body, while a study released a month later says alcohol is good for another system. Contradictory results like this are not surprising, given the hundreds of thousands of researchers who work on their studies every day. But how do you make sense of it all?
When it comes to medical research, there is rarely a definitive answer, because the human body is so complex. Studies that are most often repeated in the media are not inherently better than research that is published unnoticed. Larger studies tend to come from larger organizations and universities, which often have larger public relations departments to promote them, so media attention reflects publicity resources as much as scientific merit. This is unfortunate, since it gives people the impression that every study reported on is important and perhaps even "ground-breaking" (as the media so often put it).
There are some warning signs to look out for that signal a study's results may not be all that noteworthy. We've culled a few of our favorites:
Cost/Length of Study
When a news story mentions that a study cost millions of dollars, or that it lasted for more than a year, watch out. While costly longitudinal research is important and can produce very significant, worthwhile results, cost and duration are not usually important factors to mention when reporting on a study's findings (especially in the first paragraph or two). Just because a group of researchers found a way to spend a million dollars in two years' time doesn't, by itself, mean anything.
Sample Size and Selection
The sample refers to the people who took part in the study. The smaller the sample, the less precise its estimates and the less generalizable its results. While one or two hundred people is a lot in the behavioral and social sciences, it is virtually nothing for a drug study. Even a study of 200 people doesn't mean a whole lot unless other studies can reproduce its results (see the last point, below).
Look out for studies that only examined people from one geographic location (e.g., one city in the U.S.). We already know that people's behaviors and attitudes may differ from region to region, so studies that rely on one geographic area may not generalize well to other areas. (That's why normative sampling for psychological assessment measures and for drug studies is done on a nationwide basis.) Another common sampling problem is that the participants were not chosen randomly. Any study that selects its subjects from a pre-existing pool of individuals, or that advertises for people with specific characteristics or needs, is not using random sampling. Such studies may be biased as a result, and you should be wary of their findings.
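To see why sample size matters, here is a minimal Python sketch of the textbook normal-approximation margin of error for a percentage estimated from a simple random sample. The function name and the sample sizes are illustrative assumptions, not figures from any study discussed here. Note that the formula assumes random sampling: it says nothing about bias introduced by recruiting everyone from one city or from a pool of volunteers.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of n people (normal approximation, z = 1.96)."""
    return z * math.sqrt(p * (1 - p) / n)


# p = 0.5 gives the widest (most conservative) margin of error.
for n in (50, 200, 2000):
    print(f"n = {n:4d}: +/- {margin_of_error(0.5, n) * 100:.1f} percentage points")
```

Under these assumptions, going from 200 to 2,000 people shrinks the margin from about 6.9 to about 2.2 percentage points. The error falls with the square root of the sample size, so a tenfold larger study is only about three times more precise.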
Meaningless Effect Sizes
Scientists run all sorts of statistical analyses on the data a study generates. It is standard practice to judge whether a finding is "statistically significant" by expressing it as a probability. For instance, a researcher might report that a result is significant at p < .05, meaning that if there were really no effect, a result at least this large would turn up by chance less than 5% of the time. The smaller that probability, the more confident researchers are that the result is not a fluke. But while statistical significance may be fine for researchers, what about practical significance?
For example, a recent study purportedly found that people who spend more time online become more lonely and depressed. But when the results are examined, we find that people in the study lost, on average, 2.7 people from an average social circle of 66 people (it was not clear whether online friends were included in this count). That's a decline of about 4%, so small that it may be attributable to a host of other factors. Even worse, the changes on the measures of loneliness and depression were even smaller. On the loneliness scale, people's scores increased by an average of four-tenths of 1%! On the depression scale, people's scores increased by an average of 1%. One percent is an awfully small number, and one that may be explained by fluctuations in the testing procedure, the test itself, or any number of other things. While all of these findings were statistically significant, none of them looks meaningful in terms of what non-researchers would care about.
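The arithmetic behind this point can be sketched in a few lines of Python. The first part simply re-derives the 4% figure from the numbers quoted above. The second part is a hypothetical illustration, not a reanalysis of that study: it assumes a one-sample z-test and a fixed, tiny effect of one-tenth of a standard deviation, and shows the same trivial effect going from "not significant" to "highly significant" purely because more people were tested.

```python
import math

# Figures quoted in the text above.
circle_size = 66
people_lost = 2.7
print(f"Relative decline in social circle: {people_lost / circle_size:.1%}")


def two_sided_p(d: float, n: int) -> float:
    """Two-sided p-value from a one-sample z-test when the observed mean
    change equals d standard deviations and n people were studied."""
    z = d * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical: the same tiny effect (0.1 SD) at increasing sample sizes.
for n in (25, 100, 1000, 10000):
    print(f"n = {n:5d}: p = {two_sided_p(0.1, n):.4f}")
```

The p-value answers "is there probably some effect?", while the size of the effect answers "is it big enough to matter?" A large enough sample can make almost any nonzero effect significant.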
Profit and Ego Motives
Many researchers, whether they admit it to themselves or not, have profit motives. While pharmaceutical companies often give "unrestricted" grants so that researchers can conduct the necessary drug-effectiveness studies, there are still strings attached: drug company money flows more freely to cooperative researchers than to those who find problems with a particular drug's effectiveness. Peter Breggin has written extensively about this problem. We are also beginning to see how the U.S. Food and Drug Administration (FDA) is implicated in a sloppy drug-approval review process, which contributes to the overall problem. While there is no reason to dismiss pharmaceutical-funded research out of hand, you should be wary of it and more skeptical and demanding of its results.
The other factor to watch out for is ego motives. History is littered with the discarded remains of professional reputations that suffered when it was discovered that a researcher had faked data or skewed findings. Researchers are human, and many thrive on attention and on the idea that their discoveries will be written into future history books. That desire can, unfortunately, influence not only their results but also the way they frame their findings. It is this latter problem that causes so much confusion. If a researcher claims to have found that behavior or disorder X is caused by Y, the news media simply repeat the claim. Very few studies can support claims about causality, especially in mental health, because the relationships are so often complex and hard to disentangle. Most studies show only that X and Y tend to occur together, which could also happen if some third factor drives both. Be wary of any research that is trumpeted as discovering that X causes Y.
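Why a correlation alone can't establish causality can be sketched with a small simulation. This is a hypothetical Python illustration, not drawn from any study: a hidden factor Z drives both X and Y, X has no effect on Y at all, yet the two end up clearly correlated (around 0.5 under these assumptions). Subtracting Z out, which is possible here only because the simulation knows Z exactly, makes the correlation essentially vanish.

```python
import random


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)
n = 10_000
# Z is a hypothetical hidden third factor.
z = [rng.gauss(0, 1) for _ in range(n)]
# X and Y each depend on Z plus independent noise; neither affects the other.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# Removing Z's contribution leaves only the independent noise terms.
rx = [a - c for a, c in zip(x, z)]
ry = [b - c for b, c in zip(y, z)]

print(f"corr(X, Y)         = {pearson_r(x, y):.2f}")
print(f"corr(X - Z, Y - Z) = {pearson_r(rx, ry):.2f}")
```

In real research the hidden factor is usually unknown or only partly measured, which is exactly why "X is linked to Y" should not be reported as "X causes Y".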
Unacceptable Measures or Comparisons
Sometimes researchers determine that they need to measure something, say depression, in a very specific way that typical assessment tools don't capture. They have three choices: (1) use an existing, well-accepted measurement tool anyway; (2) adapt a lesser-known measurement tool to the task; or (3) develop their own measurement tool. Option (1) is not usually chosen, since the researchers have already determined that the usual tools won't work well in their particular study. Option (2) is often the fallback. The problem is that there is a reason some measurement tools aren't often used: they suffer from problems in the way they were constructed or the way they are scored and analyzed. Yet Option (2) is often better than Option (3). Option (3) requires the researchers to conduct what is basically a whole additional study to ensure the tool they have developed is psychometrically sound and reliable. This is an additional, time-consuming process. And because one study does not a conclusion make (see below), even that extra work by no means ensures the new measure is as good as they may think it is.
Most people don't have the background to judge whether the measurement tools used in a study were appropriate. But if you read a story about research in which the researchers had to develop their own measurements (and developing those measurements was not the focus of the study), then you have another warning sign about that study.
Researchers can also be just as guilty of comparing apples to oranges as the rest of us. But when they do it in a paper published in a peer-reviewed journal, it seems less odious and more acceptable than when we do it in an argument. It is still wrong. Be on the lookout for it.
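One common check researchers run on a new questionnaire is internal-consistency reliability, often summarized with Cronbach's alpha: for k items, alpha = k / (k - 1) × (1 - sum of the item variances / variance of the total scores). The Python sketch below implements that formula. The function name and the tiny data sets are hypothetical. A high alpha shows only that the items hang together, not that they measure what the researchers intend, so it is one piece of the psychometric work a new tool needs rather than the whole of it.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha, where scores[i][j] is respondent i's answer to item j."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of answers per item
    sum_item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical 3-item questionnaire answered by 4 people (1-5 scale).
consistent = [[1, 2, 1], [2, 1, 2], [3, 3, 2], [4, 4, 4]]
# Hypothetical items whose answers do not track each other.
inconsistent = [[1, 4, 2], [4, 1, 3], [2, 2, 1], [3, 3, 4]]

print(f"consistent items:   alpha = {cronbach_alpha(consistent):.2f}")
print(f"inconsistent items: alpha = {cronbach_alpha(inconsistent):.2f}")
```

With items that do not move together, alpha drops sharply and can even turn negative, a sign the "scale" is not measuring one coherent thing.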
Results are Not the Final Word
Whenever a study claims it has discovered a new trend, a new gene, or some new curative treatment for a problem, that claim usually means little in and of itself. Research findings become trustworthy only when they can be reproduced by another set of researchers, using another set of people, at another time. A study that is the first kid on the block generally means little on its own: outside of its interest to other researchers in that specific field, its conclusions need to be reproduced before they count for much. Once the results are reproduced, trends and significance can start to be deduced. All too often, the news media jump on a new study's findings without echoing the researchers' own caveats: that the findings are often very preliminary and may not generalize to the whole population. Popular news outlets often ignore these caveats or bury them near the end of the story.
These are by no means the only factors to look for when evaluating research, but they are a good starting point. Even a good study may have one or more of these factors present, so they are not foolproof. But they are the beginning of a yardstick with which to turn a more critical eye toward the research results you see and hear disseminated in the news almost daily. Understand that just because a researcher discovered a new gene, new treatments for the associated disorder are not necessarily just months, or even a few years, away. Single studies promoting a new drug or treatment method are not as persuasive as an entire set of studies published over a number of years that all draw similar, complementary conclusions. Ideally, informed consumers look for trends in research and demand that reporters who cover this kind of research place new findings into a broader, balanced perspective. Only then will all these research findings begin to make some sense.
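A small simulation shows why replication matters. This Python sketch assumes studies in which the true effect is zero, each tested at the conventional 0.05 level; the sample size, trial count, and function name are hypothetical. Roughly 5% of such studies still report a "significant" finding by chance, but the share that is also significant in an independent second study falls to roughly 0.05 × 0.05 = 0.25%.

```python
import math
import random


def null_study_significant(n: int, rng: random.Random, alpha: float = 0.05) -> bool:
    """Simulate one study of n people in which the true effect is zero, and report
    whether a two-sided one-sample z-test (known SD of 1) calls it significant."""
    sample_mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
    z = sample_mean * math.sqrt(n)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < alpha


rng = random.Random(1998)
trials = 20_000
n = 20  # hypothetical number of participants per study

one_study = sum(null_study_significant(n, rng) for _ in range(trials)) / trials
# A "finding" counts as replicated only if an independent second study is also significant.
replicated = sum(
    null_study_significant(n, rng) and null_study_significant(n, rng)
    for _ in range(trials)
) / trials

print(f"Spurious findings from a single study:  {one_study:.3f}")
print(f"Spurious findings that also replicate:  {replicated:.4f}")
```

This sketch does not even require the second study to find an effect in the same direction; adding that requirement would make chance replications rarer still.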
Feedback and comments on these factors are welcomed! Please drop us a line to tell us what you think, and anything you might suggest we add to our list.
Grohol, J.M. (Sep 1998). Telling the good from the bad: Factors to look for in evaluating research. [Online].