Measurement is an important part of the scientific process. The quality of a scientific measure rests on two key properties: reliability and validity.

Reliability is a measure of the internal consistency and stability of a measuring device.

Validity gives us an indication of whether the measuring device measures what it claims to.

Internal consistency is the degree to which the items or questions on a measure consistently assess the same construct; each question should be aimed at measuring the same thing. Internal consistency is often measured with Cronbach's alpha, a statistic that functions as a kind of super-correlation of all the items on the scale. A score of .70 or higher is generally considered acceptable, though .80 or higher is preferable. Context also matters when judging what counts as an acceptable level of internal consistency.
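
As a minimal sketch of this computation (assuming responses are stored as a NumPy array with one row per respondent and one column per item; the data below are hypothetical), Cronbach's alpha can be calculated from the item variances and the variance of the total scores:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 5 people answering 4 Likert-style items
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
])
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")  # compare against the .70 / .80 guidelines
```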

Stability is often measured with test-retest reliability: the same person takes the same test twice, and the scores from the two administrations are compared. A high correlation between the two sets of scores indicates the test is reliable. In most circumstances a correlation of at least .70 is considered acceptable, though this is a general guideline and not a statistical test.
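
A minimal sketch of this comparison, assuming hypothetical scores from the same ten people tested on two occasions, using a Pearson correlation:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for ten people tested twice, some time apart
test1 = np.array([12, 18, 9, 15, 20, 11, 14, 17, 8, 16])
test2 = np.array([13, 17, 10, 14, 19, 12, 15, 18, 9, 15])

r, _ = pearsonr(test1, test2)
print(f"Test-retest correlation r = {r:.2f}")  # >= .70 is the usual rule of thumb
```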

Interrater reliability is another coefficient sometimes used to assess reliability. With interrater reliability, two or more judges or raters make observations, record their findings, and then compare their observations. If the raters are reliable, their percentage of agreement should be high.
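
A minimal sketch of percentage agreement, assuming two raters have assigned hypothetical categorical codes to the same set of observations:

```python
# Hypothetical codes assigned by two raters to the same 8 observations
rater_a = ["aggressive", "passive", "passive", "aggressive",
           "neutral", "passive", "neutral", "aggressive"]
rater_b = ["aggressive", "passive", "neutral", "aggressive",
           "neutral", "passive", "neutral", "passive"]

# Count the observations on which the two raters gave the same code
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(f"Agreement: {percent_agreement:.0f}%")  # 6 of 8 matches -> 75%
```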

When asking whether a measure is valid, we are asking whether it measures what it is supposed to measure. Validity is a judgment based on collected data, not a statistical test. There are two primary ways to assess validity: existing measures and known group differences.

The existing-measures approach determines whether the new measure correlates with existing, relevant, valid measures. Scores on the new measure should be similar to scores obtained with already-established, valid measuring devices.
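
A minimal sketch of this check, assuming hypothetical scores on a new scale and an established one collected from the same respondents; the correlation between the two sets of scores is the evidence of interest:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores: a new scale vs. an established, validated one
new_scale = np.array([22, 35, 18, 41, 30, 27, 15, 38])
established = np.array([25, 33, 20, 44, 28, 29, 17, 40])

r, _ = pearsonr(new_scale, established)
print(f"Correlation with established measure r = {r:.2f}")
# A strong positive correlation supports the validity of the new measure
```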

Known group differences determine whether the new measure distinguishes between groups that are already known to differ. In this approach, different groups are given the same measure and are expected to score differently. For example, if you gave Democrats and Republicans a test assessing the strength of certain political views, you would expect them to score differently, since their views differ substantially on many issues. If the two groups scored differently, as expected, we could say the measure shows evidence of validity: it measures what it claims to measure.
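
A minimal sketch of this comparison, assuming hypothetical scale scores for the two groups; an independent-samples t-test is one common way to check whether the groups differ as expected:

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical scores on a political-views scale for two known groups
democrats = np.array([72, 68, 75, 80, 65, 70, 77])
republicans = np.array([35, 42, 30, 38, 45, 33, 40])

t, p = ttest_ind(democrats, republicans)
print(f"t = {t:.2f}, p = {p:.4f}")
# A large, expected difference between the groups supports validity
```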

When designing new measuring devices it is imperative to consider their reliability and validity. A measure can be reliable without being valid, but a valid measure is always reliable. For example, a bathroom scale that consistently reads five pounds too heavy is reliable (it gives the same reading every time) but not valid (it does not report true weight).