Clustered data also arise as a result of sampling strategies. For instance, when planning large-scale survey data collection, it is usual to adopt a multistage sampling design for reasons of cost and efficiency. A national population survey, for example, might involve a three-stage design, with regions sampled first, then neighborhoods, and then individuals. A design of this kind generates a three-level hierarchically clustered structure of individuals at level-1, who are nested within neighborhoods at level-2, which in turn are nested within regions at level-3. Individuals living in the same neighborhood can be expected to be more alike than they would be if the sample were truly random. A similar correlation can be expected among neighborhoods within the same region.
An extensive literature exists on measuring this “design effect” and correcting for it. In traditional analysis, clustered designs (e.g., individuals at level-1, nested in neighborhoods at level-2, nested in regions at level-3) are often treated as a nuisance. However, individuals, neighborhoods, and regions can instead be seen as distinct structures that exist in the population and that should be measured and modeled.
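To make the “design effect” concrete, the sketch below uses a widely cited approximation for a single level of clustering with equal-sized clusters: the variance of a sample mean is inflated by roughly 1 + (m - 1)ρ, where m is the number of individuals per cluster and ρ is the intraclass correlation. This is a simplification of the three-stage design described above, not a method taken from the text; the function names and the example numbers (2,000 respondents, 20 per neighborhood, ρ = 0.05) are illustrative assumptions.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect for one level of equal-sized clusters.

    cluster_size: individuals sampled per cluster (assumed equal across clusters).
    icc: intraclass correlation, i.e. how alike individuals in a cluster are.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not -1.0 <= icc <= 1.0:
        raise ValueError("icc must lie between -1 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Size of a simple random sample carrying about the same information."""
    return n_total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical survey: 2,000 respondents, 20 per neighborhood, ICC of 0.05.
    deff = design_effect(cluster_size=20, icc=0.05)
    n_eff = effective_sample_size(n_total=2000, cluster_size=20, icc=0.05)
    print(f"design effect: {deff:.2f}")
    print(f"effective sample size: {n_eff:.0f}")
```

Under these assumed values the design effect is about 1.95, so the 2,000 clustered respondents carry roughly the information of 1,000 independent ones. Even a modest within-neighborhood correlation therefore has a sizeable effect on precision.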
While the conventional approach to such correlated data structures is to treat the clustering as a nuisance, multilevel models view such hierarchical structures as a feature of the population and one that is of substantive interest. Indeed, “once you know that hierarchies exist, you see them everywhere” (Kreft and de Leeuw, 1998).