# Cluster Unit Randomized Trials

## 7. Sample Size Assessment

Most trials enroll clusters of varying size. It is common in this case to replace $m$ in the previous formula by the mean cluster size $\bar{m}$, which will lead to a slightly underpowered study. However, if previous data are available on the distribution of cluster sizes, a more accurate formula may be applied (Eldridge et al., 2006). Let $cv = S_{m}/\bar{m}$ denote the coefficient of variation characterizing this distribution, where $S_{m}$ is the standard deviation of the cluster sizes. Then VIF may be replaced in the formula above by

$$\mathrm{VIF}_{A} = 1 + \left[(cv^{2} + 1)\,\bar{m} - 1\right]\rho.$$

This adjustment has been shown to have greatest impact when the number of clusters is small and/or the value of $\rho$ is high (Guittet et al., 2006).
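The adjusted variance inflation factor is straightforward to compute directly. The following is a minimal Python sketch of the expression above. The function name `adjusted_vif`, its parameter names, and the input values (mean cluster size 20, $\rho$ = 0.05) are hypothetical, chosen only for illustration.

```python
def adjusted_vif(mean_size: float, cv: float, icc: float) -> float:
    """Variance inflation factor adjusted for variable cluster sizes.

    Evaluates VIF_A = 1 + [(cv^2 + 1) * mean_size - 1] * icc, where
    cv is the coefficient of variation of the cluster sizes and icc
    is the intracluster correlation coefficient (rho).
    """
    if mean_size <= 0:
        raise ValueError("mean_size must be positive")
    if cv < 0:
        raise ValueError("cv must be non-negative")
    return 1.0 + ((cv ** 2 + 1.0) * mean_size - 1.0) * icc


# Hypothetical inputs: with cv = 0 the expression reduces to the usual
# equal-cluster-size factor 1 + (mean_size - 1) * icc.
equal_sizes = adjusted_vif(mean_size=20, cv=0.0, icc=0.05)
unequal_sizes = adjusted_vif(mean_size=20, cv=0.5, icc=0.05)
print(equal_sizes, unequal_sizes)
```

Setting `cv=0.0` recovers the unadjusted factor, which makes it easy to see how much of the inflation comes from variation in cluster size alone.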

### Example 3

Consider a family-randomized trial designed to evaluate the efficacy of a dietary intervention in lowering blood pressure. Data from previous trials performed in a similar population indicate that the intracluster correlation coefficient with respect to diastolic blood pressure may be taken as 0.20, while the mean and standard deviation of the corresponding family size distribution can be reasonably estimated as 2.2 and 0.65, respectively ($cv = 0.30$). Previous experience also indicates that the between-subject standard deviation of diastolic blood pressure is approximately 10.0 mm Hg.

Assuming it is of interest to detect a mean difference of 4 mm Hg with 80% power at the two-sided 5% level, the value of $\mathrm{VIF}_{A}$ may be obtained as $1 + [(0.30^{2} + 1)(2.2) - 1](0.2) = 1.28$, and the number of subjects required in each of two groups by

$$n = \left\{\frac{(1.96 + 0.84)^{2}\, 2(10^{2})}{4^{2}}\right\}\left\{1 + \left[(0.30^{2} + 1)(2.2) - 1\right](0.2)\right\} = (98.75)(1.28) = 127,$$

or about 64 families per group.
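The arithmetic in this example can be checked with a short script. The following Python sketch uses the rounded inputs quoted in the example (normal deviates 1.96 and 0.84, $cv$ = 0.30); the variable names are illustrative. Evaluated directly with these inputs, the unadjusted term comes to about 98.0 and the inflated total to about 125.4 subjects per group, so small discrepancies from the printed figures are to be expected.

```python
# Inputs taken from Example 3.
z_alpha = 1.96    # two-sided 5% critical value (rounded)
z_beta = 0.84     # deviate corresponding to 80% power (rounded)
sigma = 10.0      # between-subject SD of diastolic blood pressure
delta = 4.0       # mean difference to detect
icc = 0.20        # intracluster correlation coefficient (rho)
mean_size = 2.2   # mean family size
cv = 0.30         # coefficient of variation of family size (0.65 / 2.2, rounded)

# Adjusted variance inflation factor: 1 + [(cv^2 + 1) * mean_size - 1] * icc
vif_a = 1.0 + ((cv ** 2 + 1.0) * mean_size - 1.0) * icc

# Subjects per group ignoring clustering, then inflated by VIF_A.
n_unadjusted = (z_alpha + z_beta) ** 2 * 2.0 * sigma ** 2 / delta ** 2
n_per_group = n_unadjusted * vif_a

print(f"VIF_A        = {vif_a:.4f}")
print(f"n unadjusted = {n_unadjusted:.2f}")
print(f"n per group  = {n_per_group:.2f}")
```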