Objective Measurement of Subjective Phenomena

7. Reliability

Standards for reliability

Standards for acceptable levels of reliability vary across experts, but some general guidelines can be provided (Clark & Watson, 1995; Nunnally & Bernstein, 1994):

Table 1

General Reliability Standards
Reliability      Evaluation
.90 or higher    Excellent
.80 to .90       Strong
.70 to .80       Acceptable
.60 to .70       Weak or Poor
below .60        Unacceptable
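
As a rough illustration, the standards in Table 1 can be written as a simple lookup. This is a minimal sketch, not part of the cited sources: the function name classify_reliability is hypothetical, and because adjacent ranges in the table share their endpoints, the sketch assumes that a boundary value (e.g., exactly .80) belongs to the higher category.

```python
import math


def classify_reliability(r: float) -> str:
    """Return the Table 1 label for a reliability coefficient.

    Assumption: a value that falls exactly on a boundary (e.g., .80)
    is assigned to the higher category.
    """
    if math.isnan(r) or r > 1.0:
        raise ValueError("expected a reliability coefficient no greater than 1")
    if r >= 0.90:
        return "Excellent"
    if r >= 0.80:
        return "Strong"
    if r >= 0.70:
        return "Acceptable"
    if r >= 0.60:
        return "Weak or Poor"
    return "Unacceptable"


# Print the label for a few example coefficients.
for value in (0.93, 0.80, 0.65, 0.55):
    print(f"{value:.2f} -> {classify_reliability(value)}")
```

The labels are only general guidelines, so a lookup like this is a convenience for reporting, not a substitute for judgment about the scale and its intended use.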

The preceding standards for evaluating reliability are reasonable for multi-item scales that contain roughly 8 to 10 items or more, particularly if each item is rated on a Likert-type scale with at least four or five response options (e.g., 1-4 or 1-5).

Scales with few items or with more restricted rating scales (e.g., dichotomous scales) will likely have fewer points of discrimination among participants, and this may lead to lower levels of reliability (e.g., reliability between .50 and .60). Such scales may still be usable for research purposes, but the “proof is in the pudding” (i.e., the usability of such scales will depend on whether they are strongly related to other measures, as hypothesized).
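
One way to see why fewer items tends to mean lower reliability is the Spearman-Brown formula from classical test theory. The text above does not name this formula; it is brought in here as an illustration. The sketch assumes that the items are roughly parallel (equally good measures of the same construct), and the function name spearman_brown is hypothetical. It addresses only the number of items, not the effect of a restricted rating scale.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing test length by length_factor.

    length_factor is new length / old length (e.g., 0.3 when a 10-item
    scale is cut to 3 items). Assumes roughly parallel items.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# A 10-item scale with reliability .80, shortened to 3 items.
shortened = spearman_brown(0.80, 3 / 10)
print(f"Predicted reliability of the 3-item version: {shortened:.2f}")
```

Under these assumptions, a 10-item scale with a reliability of .80 would be expected to drop to about .55 if cut to 3 items, which falls in the .50 to .60 range mentioned above.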

In addition, measures used in high-stakes decision-making, such as deciding whether a person has an intellectual disability, should have very high levels of reliability, preferably above .95. Traditional individually administered tests of intelligence tend to attain this level of reliability.

Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309-319.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.