As anyone who reads scientific research knows, correlation does not necessarily imply causation: two variables may be associated without having a causal relationship. This maxim has led many to undervalue correlational studies. Used appropriately, however, correlation studies are important to science.

Why are correlation studies important? Stanovich (2007) points out the following:

“First, many scientific hypotheses are stated in terms of correlation or lack of correlation, so that such studies are directly relevant to these hypotheses…

Second, although correlation does not imply causation, causation does imply correlation. That is, although a correlational study cannot definitely prove a causal hypothesis, it may rule one out.

Third, correlational studies are more useful than they may seem, because some of the recently developed complex correlational designs allow for some very limited causal inferences.

…some variables simply cannot be manipulated for ethical reasons (for instance, human malnutrition or physical disabilities). Other variables, such as birth order, sex, and age are inherently correlational because they cannot be manipulated, and, therefore, the scientific knowledge concerning them must be based on correlation evidence.”

Once a correlation is known, it can be used to make predictions. When we know a score on one measure, we can make a more accurate prediction of a score on another measure that is highly related to it. The stronger the relationship among the variables, the more accurate the prediction.

When practical, evidence from correlation studies can lead to testing that evidence under controlled experimental conditions.

While it is true that correlation does not necessarily imply causation, causation does imply correlation. Correlational studies are a stepping-stone to the more powerful experimental method and, with the use of complex correlational designs (path analysis and cross-lagged panel designs), allow for very limited causal inferences.


There are two major problems when attempting to infer causation from a simple correlation:

  1. The directionality problem: before concluding that a correlation between variables 1 and 2 is due to changes in 1 causing changes in 2, it is important to realize that the direction of causation may be the opposite, that is, from 2 to 1.
  2. The third-variable problem: the correlation between the variables may occur because both are related to a third variable.

Complex correlational statistics such as path analysis, multiple regression and partial correlation “allow the correlation between two variables to be recalculated after the influence of other variables is removed, or ‘factored out’ or ‘partialed out’” (Stanovich, 2007, p. 77). Even when using complex correlational designs, it is important that researchers make only limited causal claims.

Researchers who use a path analysis approach are always very careful not to frame their models in terms of causal statements. Can you figure out why? We hope you reasoned that the internal validity of a path analysis is low because it is based on correlational data. The direction from cause to effect cannot be established with certainty, and “third variables” can never be ruled out completely. Nevertheless, causal models can be extremely useful for generating hypotheses for future research and for predicting potential causal sequences in instances where experimentation is not feasible (Myers & Hansen, 2002, p. 100).

Conditions Necessary to Infer Causation (Kenny, 1979):

Time precedence: For variable 1 to cause variable 2, 1 must precede 2. The cause must precede the effect.

Relationship: The variables must correlate. To establish the relationship of two variables, it must be determined whether the relationship could have occurred by chance. Lay observers are often not good judges of the presence of relationships; thus, statistical methods are used to measure and test the existence and strength of relationships.

Nonspuriousness (spurious meaning ‘not genuine’): “The third and final condition for a causal relationship is nonspuriousness (Suppes, 1970). For a relationship between X and Y to be nonspurious, there must not be a Z that causes both X and Y such that the relationship between X and Y vanishes once Z is controlled” (Kenny, 1979, pp. 4-5).