Stratified Sampling for Estimating a Mean

Background Information

The primary purpose of this sampling design is to estimate the mean for the entire site, i.e., for all strata combined. Preexisting information is used to divide the site into non-overlapping strata, each of which is expected to be more homogeneous internally than the site as a whole. Please consult EPA's guidance document, Guidance for the Data Quality Objectives Process (EPA 2006a), to put this design in the context of environmental decision-making. There are two related steps to calculating the number of samples needed to estimate the mean for the site: (1) determine the total number of samples, and (2) allocate that total to the individual strata.

Equation Used to Determine Total Number of Samples

The equation depends on the specific method:

Method 1: Minimize Variance of Sample Mean for Fixed Cost

The total number of samples is computed to maximize the precision (i.e., minimize the variance) of the estimated population mean for a pre-specified budget, \(C-c_0\), available for collecting and measuring samples after the fixed overhead cost is deducted. Note that the calculation is for the total number of samples, i.e., for all strata combined, rather than for individual strata.

The formula used to calculate the total number of samples is: $$n=\frac{(C-c_0)\displaystyle\sum_{h=1}^{L}\frac{W_hs_h}{\sqrt{c_h}}}{\displaystyle\sum_{h=1}^{L}W_hs_h\sqrt{c_h}}$$

where

\(L\)

is the number of strata, \(h=1,2,...,L\),

\(s_h\)

is the estimated standard deviation of the measured values in stratum \(h\),

\(W_h = N_h/N\)

is the weight associated with stratum \(h\),

\(N_h\)

is the total number of possible sampling locations (units) in stratum \(h\),

\(N\)

is the total number of possible units in all strata combined, \(N = \displaystyle\sum_{h=1}^{L}N_h\),

\(C\)

is the total sampling budget, \(C = c_0 + \displaystyle\sum_{h=1}^{L}c_hn_h\),

\(c_0\)

is the fixed overhead cost,

\(c_h\)

is the cost of collecting and measuring a sample in stratum \(h\), and

\(n_h\)

is the number of samples collected in stratum \(h\).
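The Method 1 formula can be evaluated directly from the stratum inputs. The following Python sketch is illustrative only and is not VSP's implementation; the function name `total_samples_fixed_cost` and the two-stratum example values are hypothetical.

```python
import math


def total_samples_fixed_cost(budget, overhead, weights, sds, costs):
    """Method 1 (hypothetical helper): total n minimizing the variance of the mean.

    budget:   C, total sampling budget
    overhead: c_0, fixed overhead cost
    weights:  W_h = N_h / N for each stratum
    sds:      s_h, estimated standard deviation in each stratum
    costs:    c_h, cost of collecting and measuring one sample in stratum h
    """
    # Numerator sum: W_h * s_h / sqrt(c_h) over all strata
    num = sum(w * s / math.sqrt(c) for w, s, c in zip(weights, sds, costs))
    # Denominator sum: W_h * s_h * sqrt(c_h) over all strata
    den = sum(w * s * math.sqrt(c) for w, s, c in zip(weights, sds, costs))
    # The result is left unrounded; rounding up could exceed the budget
    return (budget - overhead) * num / den


# Hypothetical example: two strata with N_h = 600 and 400 (N = 1000)
n = total_samples_fixed_cost(
    budget=5000, overhead=600,
    weights=[0.6, 0.4], sds=[10, 20], costs=[100, 400],
)
print(n)  # 20.0
```

With these inputs the allocation formula for individual strata gives \(n_1 = 12\) and \(n_2 = 8\), whose variable cost \(12 \times 100 + 8 \times 400 = 4400\) equals \(C - c_0\), so the budget is spent exactly.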

 

Method 2: Minimize Cost for Required Variance of Sample Mean

The total number of samples is computed to achieve a pre-specified precision (variance) of the estimated population mean at the lowest total cost, given the specified per-sample cost in each stratum; no limit is placed on the total budget. Note that the calculation is for the total number of samples, i.e., for all strata combined, rather than for individual strata.

The formula used to calculate the total number of samples is: $$n = \frac{\Bigg(\displaystyle\sum_{h=1}^{L}W_hs_h\sqrt{c_h}\Bigg)\displaystyle\sum_{h=1}^{L}\frac{W_hs_h}{\sqrt{c_h}}}{V+\frac{1}{N}\displaystyle\sum_{h=1}^{L}W_hs_h^2}$$

where

\(L\)

is the number of strata, \(h=1,2,...,L\),

\(s_h\)

is the estimated standard deviation of the measured values in stratum \(h\),

\(W_h = N_h/N\)

is the weight associated with stratum \(h\),

\(N_h\)

is the total number of possible sampling locations (units) in stratum \(h\),

\(N\)

is the total number of possible units in all strata combined, \(N = \displaystyle\sum_{h=1}^{L}N_h\),

\(V\)

is the pre-specified (required) variance of the estimated mean, and

\(c_h\)

is the cost of collecting and measuring a sample in stratum \(h\).
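A small Python sketch can evaluate the Method 2 formula and confirm what \(V\) means: with the resulting \(n\) allocated optimally, the standard variance of the stratified mean with finite population correction, \(\sum_h W_h^2 s_h^2/n_h - \frac{1}{N}\sum_h W_h s_h^2\), equals \(V\). The function names and the example values are hypothetical, and this is not VSP's code.

```python
import math


def total_samples_required_variance(v, weights, sds, costs, n_units_total):
    """Method 2 (hypothetical helper): total n reaching variance V at minimum cost.

    v:             V, required variance of the estimated mean
    weights:       W_h = N_h / N
    sds:           s_h, estimated standard deviation in stratum h
    costs:         c_h, cost of collecting and measuring one sample in stratum h
    n_units_total: N, total number of units in all strata combined
    """
    a = sum(w * s * math.sqrt(c) for w, s, c in zip(weights, sds, costs))
    b = sum(w * s / math.sqrt(c) for w, s, c in zip(weights, sds, costs))
    # Finite-population term: (1/N) * sum of W_h * s_h^2
    fpc_term = sum(w * s ** 2 for w, s in zip(weights, sds)) / n_units_total
    return a * b / (v + fpc_term)


def variance_of_stratified_mean(weights, sds, n_per_stratum, n_units_total):
    """Variance of the stratified mean for given n_h, with finite population correction."""
    first = sum(w ** 2 * s ** 2 / k for w, s, k in zip(weights, sds, n_per_stratum))
    second = sum(w * s ** 2 for w, s in zip(weights, sds)) / n_units_total
    return first - second


weights, sds, costs, N = [0.6, 0.4], [10, 20], [100, 400], 1000
n = total_samples_required_variance(10.78, weights, sds, costs, N)
print(round(n, 6))  # 20.0
# Optimal allocation of n = 20 gives n_h = 12 and 8; check the variance reached
print(round(variance_of_stratified_mean(weights, sds, [12, 8], N), 6))  # 10.78
```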

 

Method 3: Predetermined Number

The user supplies the total number of samples. VSP provides no assurance that this user-supplied number is adequate for any particular design goal.

Equation Used to Determine Allocation of Samples to Strata

The total number of samples is allocated to the individual strata on an optimal basis using the formula: $$n_h = n\frac{N_h s_h/\sqrt{c_h}}{\displaystyle\sum_{i=1}^{L}N_i s_i/\sqrt{c_i}}$$

where

\(n_h\)

is the number of samples allocated to stratum \(h\),

\(L\)

is the number of strata,

\(N_h\)

is the total number of units in stratum \(h\),

\(s_h\)

is the estimated population standard deviation for stratum \(h\),

\(c_h\)

is the cost of collecting and measuring a sample in stratum \(h\), and

\(n\)

is the total number of samples collected in all strata, \(n = \displaystyle\sum_{h=1}^{L}n_h\).
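The allocation step can be sketched in Python as follows. Each stratum's share of \(n\) is proportional to \(N_h s_h/\sqrt{c_h}\); because \(N\) cancels, using \(W_h\) instead of \(N_h\) gives the same result. The function name `allocate_samples` and the example values are hypothetical, and the sketch returns unrounded sizes because this section does not state how fractional \(n_h\) are rounded to whole samples.

```python
import math


def allocate_samples(n, units_per_stratum, sds, costs):
    """Optimal allocation (hypothetical helper): n_h proportional to N_h * s_h / sqrt(c_h).

    Returns unrounded stratum sample sizes that sum to n.
    """
    scores = [big_n * s / math.sqrt(c)
              for big_n, s, c in zip(units_per_stratum, sds, costs)]
    total = sum(scores)
    return [n * x / total for x in scores]


costs = [100, 400]
n_h = allocate_samples(20, units_per_stratum=[600, 400], sds=[10, 20], costs=costs)
print(n_h)  # [12.0, 8.0]
# Variable sampling cost of this allocation: sum of c_h * n_h
print(sum(c * k for c, k in zip(costs, n_h)))  # 4400.0
```

The same function applies whichever method produced \(n\), including a predetermined number supplied by the user.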

 

Statistical Assumptions

The assumptions associated with the formulas for computing the number of samples are:

1. The estimated stratum standard deviations, \(s_h\), are reasonable and representative of the stratum populations being sampled.

2. The sampling locations within each stratum are selected using simple random sampling.

3. The stratum costs, \(c_h\), and the fixed cost \(c_0\), are accurate.

For an illustration of stratified sampling, please refer to Stratified Sampling in chapter 3 of the VSP User’s Guide.

Reference:

Gilbert, RO. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York. (Same text available as Gilbert, RO. 1997. John Wiley & Sons, New York.)

The Stratified dialog contains the following controls:

Total Number of Samples Method

Minimize Variance of Sample Mean for Fixed Cost Method:

Specify Total Budget

Minimize Cost for Required Variance of Sample Mean Method:

Specify Required Variance

Predetermined Number Method:

Specify Total Number of Samples

Number of Strata