Multilevel Modeling

7. Levels and Variables

A natural question is whether the higher levels discussed in the previous section (e.g., neighborhoods) could instead be treated as variables in a regression equation, with an indicator (dummy) variable specified for each neighborhood.
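A sketch of the indicator-variable specification follows. The notation is introduced here for illustration: outcome y_ij for individual i in neighborhood j, J sampled neighborhoods, neighborhood 1 as the reference category, and D_kj an indicator for neighborhood k.

```latex
y_{ij} = \beta_0 + \sum_{k=2}^{J} \gamma_k D_{kj} + e_{ij},
\qquad
D_{kj} = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{otherwise} \end{cases},
\qquad
e_{ij} \sim N(0, \sigma_e^2)
```

Each γ_k is a fixed difference between neighborhood k and the reference neighborhood, so the neighborhood component of the model has J - 1 separate parameters.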

Put differently, why are variables such as gender, ethnicity/race, or social class not treated as levels?

Neighborhoods are treated as a level because the neighborhoods in the study are regarded as a random sample from a population of neighborhoods. This allows us to generalize from the observed sample of neighborhoods to that population. Gender, by contrast, is not a level: the observed gender categories are not a sample drawn from a population of possible gender categories, but an attribute of individuals. Thus, male and female in our example are ‘fixed’ categories of a variable, and each category contributes only its own mean to the model.
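The distinction can be written as a minimal random-intercept model. The notation is illustrative (outcome y_ij for individual i in neighborhood j, and a hypothetical indicator female_ij): gender enters as a fixed coefficient, while the neighborhood effects u_j are treated as draws from a population distribution.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{female}_{ij} + u_j + e_{ij},
\qquad
u_j \sim N(0, \sigma_u^2),
\qquad
e_{ij} \sim N(0, \sigma_e^2)
```

β_1 is a single fixed difference in means between the two gender categories. The u_j are not estimated as separate fixed parameters; instead the model estimates σ_u², the variance of neighborhood effects in the population from which the sampled neighborhoods are assumed to come.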

The situation becomes less clear when the study includes all individuals in the population, and hence all neighborhoods and all ethnic/racial, gender, and social class groups. Such a design arises, for example, when census data are linked to mortality data (Blakely, Salmond et al., 2000). Why might we still treat neighborhoods here as a level, but not ethnicity/race? First, given the (likely) large number of neighborhoods, it is more efficient to model neighborhoods as a random effect, summarized by a single between-neighborhood variance, than to estimate a separate fixed coefficient for each one. Second, we would usually wish to ascribe a fixed effect to each ethnic group, but not to each neighborhood. Rather, at the neighborhood level we wish to model ecologic attributes such as social capital, and a full set of neighborhood indicator variables would make the effect of any such attribute inestimable, because the attribute is perfectly collinear with the indicators.
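The collinearity point can be checked numerically. The sketch below uses a hypothetical toy dataset (six residents in three neighborhoods, with an invented neighborhood-level `social_capital` score) and NumPy's `matrix_rank`. A design matrix containing an intercept, indicators for every non-reference neighborhood, and the neighborhood-level attribute has one more column than its rank, so least squares cannot separate the attribute's effect from the neighborhood indicators.

```python
import numpy as np

# Hypothetical toy data: 6 individuals in 3 neighborhoods (2 per neighborhood).
neighborhood = np.array([0, 0, 1, 1, 2, 2])

# Hypothetical neighborhood-level attribute (e.g., a social capital score):
# one value per neighborhood, copied to every resident of that neighborhood.
social_capital_by_nbhd = np.array([0.2, 0.5, 0.9])
social_capital = social_capital_by_nbhd[neighborhood]

n_nbhd = social_capital_by_nbhd.size
intercept = np.ones(neighborhood.size)

# Indicator (dummy) columns for neighborhoods 1..J-1; neighborhood 0 is the reference.
dummies = np.column_stack(
    [(neighborhood == k).astype(float) for k in range(1, n_nbhd)]
)

# Design matrix: intercept + (J - 1) neighborhood dummies + neighborhood-level attribute.
X = np.column_stack([intercept, dummies, social_capital])

n_cols = X.shape[1]
rank = np.linalg.matrix_rank(X)

# rank < n_cols means the attribute column is a linear combination of the
# intercept and dummies, so its coefficient is not identified.
print(f"columns: {n_cols}, rank: {rank}")
```

Running it should print `columns: 4, rank: 3`. The same arithmetic underlies the efficiency point: with, say, 1,000 neighborhoods, the indicator specification needs 999 neighborhood coefficients, whereas a random-effects specification summarizes them with one variance parameter.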

It is nevertheless possible to treat ‘levels’ as ‘variables.’ When neighborhoods are entered as a variable, they are treated as a fixed classification. While this may be useful in certain circumstances, it deprives the researcher of the ability to generalize to all neighborhoods (or, in studies of pupils nested in schools, to the ‘population’ of schools): inference is restricted to the specific neighborhoods observed in the sample.
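The difference in what each approach can say about unsampled neighborhoods can be sketched in Python. The example assumes simulated, balanced data (every neighborhood has the same number of residents; all values are invented) and estimates the variance components with the classical one-way ANOVA (method-of-moments) estimators, swapped in here for the maximum likelihood or REML estimation that multilevel software typically uses. The fixed classification yields one mean per observed neighborhood and nothing for a new one, whereas the random-effects estimates describe the distribution of neighborhood means in the population.

```python
import numpy as np

# Hypothetical simulated, balanced data: J neighborhoods with n residents each.
rng = np.random.default_rng(0)
J, n = 30, 20
true_u = rng.normal(0.0, 1.0, size=J)  # neighborhood effects, sigma_u = 1
y = 5.0 + true_u[:, None] + rng.normal(0.0, 2.0, size=(J, n))  # sigma_e = 2

# Fixed classification: one mean per observed neighborhood, nothing more.
fixed_means = {j: y[j].mean() for j in range(J)}

# Random-effects view: one-way ANOVA (method-of-moments) estimators,
# valid only for this balanced design (an assumption of the sketch).
group_means = y.mean(axis=1)
grand_mean = y.mean()
msw = ((y - group_means[:, None]) ** 2).sum() / (J * (n - 1))  # within mean square
msb = n * ((group_means - grand_mean) ** 2).sum() / (J - 1)    # between mean square
sigma_e2 = msw                          # E[MSW] = sigma_e^2
sigma_u2 = max(0.0, (msb - msw) / n)    # E[MSB] = sigma_e^2 + n * sigma_u^2

# A neighborhood outside the sample (index J) has no fixed-effect estimate...
print(J in fixed_means)  # False

# ...but the random-effects model describes the population it would be drawn
# from: its mean is roughly N(grand_mean, sigma_u2), ignoring estimation error.
half_width = 1.96 * np.sqrt(sigma_u2)
print(f"new neighborhood mean roughly in "
      f"[{grand_mean - half_width:.2f}, {grand_mean + half_width:.2f}]")
```

The printed range ignores uncertainty in the estimated grand mean and variances, so it is only a rough illustration; the point is that such a statement is available at all, whereas `fixed_means` has no entry for a neighborhood outside the sample.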