|Natural Resource Sampling|
When individuals in a population regularly or naturally combine to form groups or close geographical clusters, significant savings in data gathering costs can be had by the use of cluster selection. In cluster selection, we select at random a subset of clusters to represent the whole population, with every individual in the selected clusters being measured.
For example, suppose we were interested in estimating the average biomass of pond cypress in the Ocala National Forest of North Central Florida. Using aerial photography, we identify each of the cypress domes (clusters) in the study region. A random sample of domes is selected and measurement teams are sent to each selected dome to measure the biomass of each tree in the dome. Using these data, an overall biomass estimate is computed along with associated confidence intervals.
In cluster sampling, a simple random sample of clusters is taken, and all individuals in the selected clusters are included in the sample. If only a sample of individuals is taken from each of the selected clusters, the sampling method is known as two-stage selection. Often a hierarchy of clusters is used: first some large clusters are selected, next some smaller clusters are drawn from within the selected large clusters, and so on until finally individuals are selected within the final-stage clusters. This general method is known as multi-stage selection.
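The difference between one-stage cluster selection and two-stage selection can be sketched in a few lines of Python. This is only an illustration: the `domes` population, the function names, and the sample sizes are hypothetical and are not taken from the text.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """One-stage cluster selection: keep every unit in each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

def two_stage_sample(clusters, n_clusters, m_units, rng):
    """Two-stage selection: select clusters, then subsample units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], min(m_units, len(clusters[c])))
            for c in chosen}

# Hypothetical population: dome id -> biomass values of the trees in that dome.
domes = {f"dome{i}": [10.0 * i + j for j in range(5 + i)] for i in range(1, 7)}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
one_stage = cluster_sample(domes, 3, rng)
two_stage = two_stage_sample(domes, 3, 2, rng)
```

In `one_stage` each selected dome contributes all of its trees, while in `two_stage` each selected dome contributes only two.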
At first, one might think that cluster selection is just another form of stratified selection. There are, however, major differences between strata and clusters.
Although strata and clusters are both groupings of units, they serve entirely different sampling purposes. Since strata are all represented in the sample, it is advantageous if they are internally homogeneous in the variables of interest. On the other hand, with only a sample of clusters being examined, the ones selected need to represent the ones not selected. This is best done when the clusters are as internally heterogeneous in the survey variables as possible.
Whereas stratified selection generally leads to more precise estimates of the population mean than simple random selection, cluster selection, except in special circumstances, leads to a loss in precision compared with simple random selection. Unless the economy in measurement and data collection created by cluster selection permits a sufficient increase in sample size to offset the associated loss of precision, cluster selection will be inappropriate.
Systematic selection can be viewed as a type of cluster selection. For example, in systematic selection from a list, given a value of the starting count, k, and the intersample count, K, we have a sample consisting of the k-th, (k+K)-th, (k+2K)-th, etc. units. The K possible starting counts define K different clusters, and in systematic sampling we use only one of them. With only one cluster, it is impossible to obtain an estimate of variance. This is why we say that there is no acceptable estimate for the variance of the parameter estimate using only one starting point. We need at least two starting points to be able to get a true variance estimate.
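The view of systematic samples as clusters can be illustrated with a small Python sketch. The list of 20 units, the interval K = 5, and the use of two random starting points are made-up values for illustration only.

```python
import random
import statistics

def systematic_samples(units, K):
    """Return the K possible systematic samples, one per starting count."""
    # Starting count k (1-based) selects units k, k+K, k+2K, ...
    return [units[k::K] for k in range(K)]

units = list(range(1, 21))           # hypothetical list of 20 unit values
K = 5
clusters = systematic_samples(units, K)

# One starting point gives one cluster, hence no variance estimate.
# Two random starting points give two cluster means and a usable variance.
rng = random.Random(0)
starts = rng.sample(range(K), 2)
means = [statistics.mean(clusters[k]) for k in starts]
between_var = statistics.variance(means)
```

The K systematic samples partition the list, so each unit belongs to exactly one cluster.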
In addition, systematic and cluster selection share the following properties.
When clusters are not all of the same size, there are a number of techniques available which take the cluster size distribution into account in the final estimates. We may be able to initially stratify clusters by size and hence reduce the variability in cluster size. Another approach is to select clusters with probability proportional to size (PPS). With PPS selection we can have a cluster selection design which produces parameter estimates having properties very similar to those obtained from cluster selection with equal cluster sizes. Since, in many cases, the true number of individuals in each cluster is not known, selection may need to be performed using probability proportional to estimated size.
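A minimal sketch of PPS selection in Python follows. It assumes selection with replacement, which is one common way to implement PPS and is not stated in the text; the cluster names and sizes are hypothetical.

```python
import random

def pps_select(sizes, n, rng):
    """Draw n clusters with replacement, probability proportional to size."""
    ids = list(sizes)
    weights = [sizes[c] for c in ids]   # known or estimated cluster sizes
    return rng.choices(ids, weights=weights, k=n)

# Hypothetical cluster sizes (number of individuals per cluster).
sizes = {"small": 1, "large": 99}

rng = random.Random(42)
draws = pps_select(sizes, 1000, rng)
```

Passing estimated sizes as the weights gives selection with probability proportional to estimated size.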
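Assuming primary units (clusters) are selected at random, define

N = the number of clusters in the population,
n = the number of clusters in the sample,
$M_i$ = the number of individual units in the i-th cluster,
$y_{ij}$ = the measurement on the j-th unit in the i-th cluster.

Compute the sum of all measurements in the i-th cluster as:
$$y_i = \sum_{j=1}^{M_i} y_{ij}$$
The mean for the cluster is:
$$\bar{y}_i = \frac{y_i}{M_i}$$
The estimate of the mean per cluster unit (the average cluster total) is:
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
The estimated total (amount) is:
$$\hat{Y} = N \bar{y}$$
The sample variance of the cluster unit totals is:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$
From this, the sample variance of the total estimate is:
$$v(\hat{Y}) = N^2 \left(1 - \frac{n}{N}\right) \frac{s^2}{n}$$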
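The estimator of the total and its variance can be sketched in Python. The sketch assumes a simple random sample of n clusters drawn without replacement from N: the total is estimated as N times the mean cluster total, and its variance as N²(1 − n/N)s²/n, where s² is the sample variance of the cluster totals. The function name and data are illustrative only.

```python
def cluster_total_estimate(cluster_totals, N):
    """Estimate the population total and its variance from sampled cluster totals."""
    n = len(cluster_totals)
    ybar = sum(cluster_totals) / n                      # mean cluster total
    s2 = sum((y - ybar) ** 2 for y in cluster_totals) / (n - 1)
    total_hat = N * ybar                                # estimated total
    var_hat = N ** 2 * (1 - n / N) * s2 / n             # with finite population correction
    return total_hat, var_hat

# Made-up cluster totals from n = 4 sampled clusters out of N = 20.
totals = [12.0, 15.0, 9.0, 14.0]
total_hat, var_hat = cluster_total_estimate(totals, N=20)
```

For these made-up totals the mean cluster total is 12.5, so the estimated total is 250 with an estimated variance of 560.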
The variance of the overall mean estimator will depend on the variability of the individual cluster sizes. See the section on estimation in multi-stage selection for information on how to compute this variance.
If cluster sizes vary considerably, a ratio estimator may be used. This estimator is similar to that discussed for unequal length strip quadrats. Define the ratio, r, as:
$$r = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} M_i}$$
where $y_i$ is the total of the measurements in the i-th sampled cluster and $M_i$ is the number of individual units in that cluster. Then the estimate of the population total is given by:
$$\hat{Y}_r = r M$$
where M is the total number of individual units in the population. Often it may be difficult to get the exact value of M, which limits the usefulness of this estimator. If the cluster total ($y_i$) is highly correlated with the cluster size ($M_i$), the ratio estimator $\hat{Y}_r$ is more efficient than the estimator of the total defined in the previous section.
Note that $\hat{Y}_r$ is a biased estimate, but the bias usually decreases as the sample size, n, increases.
Variance estimates and confidence intervals can be obtained as for variable strip sampling.
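A minimal Python sketch of the ratio estimator of the total follows. It assumes the ratio is the sum of sampled cluster totals divided by the sum of sampled cluster sizes, and that the population size M is known; the data and function name are hypothetical.

```python
def ratio_total_estimate(cluster_totals, cluster_sizes, M):
    """Ratio estimate of the population total: r * M."""
    r = sum(cluster_totals) / sum(cluster_sizes)   # estimated mean per individual unit
    return r, r * M

# Made-up data: totals and sizes for 4 sampled clusters, M = 80 units overall.
totals = [12.0, 15.0, 9.0, 14.0]
sizes = [4, 5, 3, 4]
r, total_ratio = ratio_total_estimate(totals, sizes, M=80)
```

Here r = 50/16 = 3.125 per unit, giving a total estimate of 250.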
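Cluster sampling may also be used to estimate a population proportion. In this case, the cluster total, $y_i$, measures the number of individuals in the i-th cluster having the characteristic of interest. With $M_i$ denoting the number of individuals in the i-th cluster, the overall population proportion is estimated by:
$$\hat{p} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} M_i}$$
with associated variance:
$$v(\hat{p}) = \frac{N - n}{N n \bar{M}^2} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat{p} M_i)^2}{n - 1}$$
where N is the number of clusters in the population, n is the number of clusters sampled, and $\bar{M}$ is the average cluster size for the population.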
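The proportion estimate and its variance can be sketched in Python under the usual ratio-type formulas for cluster sampling. The data below are made up, the population average cluster size `Mbar` is assumed to be known, and the interval uses a normal approximation (1.96 standard errors), which is an assumption of this sketch rather than something stated in the text.

```python
import math

def cluster_proportion(counts, sizes, N, Mbar):
    """Estimate a proportion and its variance from a sample of clusters."""
    n = len(counts)
    p_hat = sum(counts) / sum(sizes)
    # Residuals of each cluster count about its expected value p_hat * M_i.
    ss = sum((a - p_hat * m) ** 2 for a, m in zip(counts, sizes))
    var_hat = (N - n) / (N * n * Mbar ** 2) * ss / (n - 1)
    return p_hat, var_hat

# Made-up data: infected counts and bed sizes for 4 sampled beds out of N = 20.
counts = [2, 3, 1, 2]
sizes = [10, 12, 8, 10]
p_hat, var_hat = cluster_proportion(counts, sizes, N=20, Mbar=10.0)

half_width = 1.96 * math.sqrt(var_hat)   # normal-approximation 95% interval
ci = (p_hat - half_width, p_hat + half_width)
```

When the population average cluster size is unknown, the sample average of the cluster sizes is commonly substituted for it.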
For example, suppose we were to examine planting beds in a pine tree nursery for evidence of fusiform rust. There are N = 415 beds of which we choose n = 25 to sample. We observe plants in the 25 beds, of which are found to be infected with disease. The overall proportion estimate is . If and , then the variance of the estimate is and a 95% confidence interval for the proportion is:
Copyright ©,1997 L. C. Arvanitis and K. M. Portier, University of Florida