Observational Studies

7. Measurement Reliability

What some people call "noise," also called "chance error," does not create systematic error; instead, it makes a measure unreliable. If one measures the same thing repeatedly, as much as possible in the same manner, the results will likely vary, at least a bit. However, the mean of the measures can be a good approximation of the "true" value, because the noise tends to cancel out over a large number of measures. For example, the urine drug tests given to individuals on parole are generally thought to be usefully valid. But the measures have some "wiggle" in them: even measures for the same person on the same day will likely differ at least a bit from one another. The average over many tests, however, could be a good approximation of the noise-free value.
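
The cancelling-out of noise can be illustrated with a small simulation. This is only a sketch under stated assumptions: the noise-free value (50.0), the noise standard deviation (5.0), and the choice of normally distributed chance error are all hypothetical and do not come from any real drug-testing data.

```python
import random
import statistics


def simulate_measurements(true_value, noise_sd, n, seed=0):
    """Return n noisy measurements: the true value plus independent chance error."""
    rng = random.Random(seed)
    return [true_value + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_VALUE = 50.0  # hypothetical noise-free value
NOISE_SD = 5.0     # hypothetical standard deviation of the chance error

# The mean of the measurements tends to move closer to the true value as n grows.
for n in (1, 10, 100, 10_000):
    measures = simulate_measurements(TRUE_VALUE, NOISE_SD, n, seed=42)
    mean = statistics.fmean(measures)
    print(f"n={n:>6}  mean={mean:8.3f}  error={mean - TRUE_VALUE:+.3f}")
```

A single measurement (n=1) can be off by several units, while the mean over many measurements typically lands much closer to the assumed true value.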

A key complication in practice is that most measures we use rest on a single measurement, and that measurement is likely to be off by some unknown chance amount that is not cancelled out. Ideally, the variation across the units being measured is not dominated by noise.
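
One conventional way to make "not dominated by noise" precise is the classical true-score model, offered here only as a sketch. The symbols X (the observed measure), T (the noise-free value), and E (the chance error) are notation introduced for illustration, and E is assumed to have mean zero and to be uncorrelated with T.

```latex
X = T + E, \qquad
\operatorname{Var}(X) = \operatorname{Var}(T) + \operatorname{Var}(E), \qquad
\text{reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}
```

Under these assumptions the ratio is near 1 when variation across units mostly reflects real differences, and near 0 when it mostly reflects noise.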

One way to get a handle on how much the noise matters is to determine whether the measure varies in sensible ways with other measures to which it should be related.

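If the chance components for each measure are approximately independent of one another, this can be a very helpful analysis, because independent noise tends to weaken, not create, associations between measures. For example, city neighborhoods with a lower median household income would be expected to have more crime, more young people dropping out of school, and higher infant mortality rates. A measure that shows such expected patterns is unlikely to be dominated by noise.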
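
A rough simulation can show why this check is informative. The sketch below assumes a hypothetical setup: two measures share one underlying condition (a stand-in for something like neighborhood disadvantage), each with its own independent, normally distributed chance error. The function names and parameter values are illustrative only.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def observed_correlation(noise_sd, n_units=2000, seed=0):
    """Correlate two noisy measures that share one underlying condition."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(n_units)]       # shared condition
    measure_a = [t + rng.gauss(0.0, noise_sd) for t in latent]   # independent noise
    measure_b = [t + rng.gauss(0.0, noise_sd) for t in latent]   # independent noise
    return pearson(measure_a, measure_b)


# More independent noise in each measure means a weaker observed correlation.
for sd in (0.0, 0.5, 1.0, 3.0):
    print(f"noise_sd={sd:3.1f}  correlation={observed_correlation(sd):.3f}")
```

In this setup the observed correlation shrinks as the noise grows, so a measure that still tracks related measures in the expected direction is probably carrying real signal.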

This idea can sometimes be exploited more directly to estimate the reliability of a given measurement procedure. For example, it is common to break up a multiple-item instrument, such as a measure of depression, into two sets of randomly chosen items. The correlation between the two "parallel" sets of items is an estimate of the reliability of the instrument overall. The higher the correlation, the more reliable the instrument.
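
The split-half procedure can be sketched as follows. The data are simulated, not real: each hypothetical respondent has an underlying trait, and each item score is that trait plus independent normal noise. The sketch also reports the Spearman-Brown adjustment, 2r / (1 + r), a standard correction for each half being shorter than the full instrument; it is an addition here and is not part of the description above.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_responses(n_people, n_items, noise_sd, seed=0):
    """Hypothetical data: each item score is a person's trait plus independent noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        trait = rng.gauss(0.0, 1.0)
        rows.append([trait + rng.gauss(0.0, noise_sd) for _ in range(n_items)])
    return rows


def split_half_reliability(responses, seed=0):
    """Randomly split the items in two, sum each half, and correlate the half scores.

    Returns (r, corrected), where corrected is the Spearman-Brown adjusted value.
    """
    rng = random.Random(seed)
    items = list(range(len(responses[0])))
    rng.shuffle(items)                      # random assignment of items to halves
    half = len(items) // 2
    first, second = items[:half], items[half:]
    score_a = [sum(row[i] for i in first) for row in responses]
    score_b = [sum(row[i] for i in second) for row in responses]
    r = pearson(score_a, score_b)
    return r, 2 * r / (1 + r)


data = simulate_responses(n_people=500, n_items=20, noise_sd=1.5, seed=1)
r, corrected = split_half_reliability(data, seed=2)
print(f"split-half correlation: {r:.3f}")
print(f"Spearman-Brown adjusted: {corrected:.3f}")
```

Because the split is random, different seeds give slightly different correlations; averaging over several random splits is one way to steady the estimate.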