Want to be a better consumer of social science research? Here’s a short crib sheet for judging the general legitimacy and generalizability of virtually any social science study. Keep in mind that this crib sheet won’t be 100% accurate, or even relevant, for every study you read about. But it’s a good shorthand guide to help get you started.

What kind of research was it?

The most robust studies employ both an experimental group and a control group. Studies that leave out the control group are usually less useful than those that include one. A survey is the least powerful type of research one can conduct, as it has neither an experimental nor a control group, but it can be helpful for identifying trends or zeroing in on concepts or hypotheses that can then be studied in more depth.

How big was the study?

A study of fewer than 50 people in virtually any experimental design is going to have very limited generalizability (because such studies nearly always lack sufficient statistical power). This means that while the results may be interesting, until they are replicated in another group (preferably a larger one), you should take them with a grain of salt. (Some research, like single-case experimental designs, can also provide single data points of interest for future research, but generally can tell us little about broader trends or treatments.)
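To make the point about statistical power concrete, here is a rough sketch of how power changes with sample size. It is an illustration under stated assumptions, not a substitute for a proper power analysis: it assumes two equal-sized groups, a two-sided test at the conventional 0.05 level, a "medium" standardized effect of 0.5, and a normal approximation in place of the exact t-distribution. The function name `approx_power` and all of the numbers are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two group means.

    Uses a normal approximation, so treat the result as a rough guide only.
    effect_size is the standardized difference between groups (Cohen's d).
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at the chosen alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic if the assumed effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability the test statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Assumed scenario: 50 people in total (25 per group) versus 200 in total.
for n in (25, 100):
    print(f"{n} per group: power is about {approx_power(n, 0.5):.2f}")
```

Under these assumptions, a study with 25 people per group would detect a real, medium-sized effect less than half the time, while one with 100 per group would detect it more than nine times out of ten.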

Who was in the study?

Good research seeks to use participants who are representative of the general population. The more representative the sample, the more readily one can generalize from the results. So a study of 200 participants that is balanced for gender, race, socio-economic status, and history is far better than a study of 200 college students at Harvard or OSU.

How long were people studied for?

A study that examines participants for fewer than 12 weeks for any type of treatment is virtually useless. No clinician or doctor I know has ever used a typical, mainstream treatment that worked in less than 12 weeks’ time. A survey of a group of people at one moment in time produces results that are good only for that specific moment.

There are good and reasonable exceptions to this rule, such as the treatment of anxiety (medications are often taken as needed, not every day) and things like acute psychosis or mania. Studies examining these specific concerns can run for shorter lengths of time and still provide valuable information.

Indeed, any shorter study (such as a 4-week or 8-week study) provides us with some information. It’s just that the information is a snapshot of the typical treatment regimen, and doesn’t give us as full a picture as a longer treatment study. Study length is less of a concern for any study that is not specifically examining a treatment for a mental disorder.

Who funded the study?

Generally, most government-funded studies will exhibit less bias than those funded by a company (such as a pharmaceutical company) with a direct interest in achieving a specific result. Virtually all studies are conducted within a university or hospital setting, however, so funding information may not be readily available (the researchers’ affiliations usually provide little information about how the study was funded). Government funding doesn’t mean a study can’t be badly designed or implemented; it just means that you have less reason to worry about “funding bias” influencing the results.

How do the authors talk about their results?

Authors should be humble and cautious about their results and should not make overly broad generalizations or summary conclusions (especially about causation, if the study was not designed to test causation, as it usually is not). Authors should also clearly describe the limitations of the current study in any journal article; articles that leave out such information should be viewed skeptically, as every study has limitations.

Authors should also clearly note the difference between clinical and statistical significance in treatment studies. A 2 or 3 point change in a scale measuring depression might be statistically significant (resulting in a “positive” result), but have little clinical significance for most participants. (See this article or this article for examples of this.) While it’s informative to know that an experimental group is statistically different from a control group (i.e., the difference is unlikely to be due to chance), that difference may not have real-world meaning to most of us.
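A small worked example may help show how a result can be statistically significant yet clinically modest. The sketch below uses entirely hypothetical numbers: a 2-point difference between groups on a depression scale, an assumed standard deviation of 10 points, and 1,000 participants per group. The function name `compare_groups` is invented for this illustration, and the p-value comes from a simple normal approximation rather than any particular study’s analysis.

```python
from math import sqrt
from statistics import NormalDist


def compare_groups(mean_diff: float, sd: float, n_per_group: int) -> tuple[float, float]:
    """Return (two-sided p-value, Cohen's d) for a difference between two group means.

    Assumes equal group sizes and a shared standard deviation, and uses a
    normal approximation for the p-value.
    """
    # Standard error of the difference between two independent group means.
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # Statistical significance: how surprising is this difference under chance alone?
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Effect size: how large is the difference relative to ordinary variation?
    cohens_d = mean_diff / sd
    return p_value, cohens_d


# Hypothetical trial: 2-point improvement, SD of 10 points, 1,000 people per group.
p_value, cohens_d = compare_groups(mean_diff=2.0, sd=10.0, n_per_group=1000)
print(f"p-value: {p_value:.6f}")
print(f"Cohen's d: {cohens_d:.2f}")
```

With these made-up numbers, the p-value falls far below the usual 0.05 cutoff, yet the effect size is only 0.2, which is conventionally described as small. A large enough sample can make almost any difference statistically significant, which is why the size of the change matters as much as the p-value.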

Beware, too, of studies that rely completely on clinician-rated measures or scales without any patient-rated scales. Who better to tell you a treatment is working than the patient themselves?

* * *

Thanks to CL Psych for reviewing an earlier draft of this article.