Stratified Sampling for Estimating a Proportion

Background Information

The primary purpose of this sampling design is to estimate the proportion for the entire site, i.e., for all strata combined. Preexisting information is used to divide the site into non-overlapping strata, each expected to be more homogeneous internally than the site as a whole (all strata combined). Calculating the number of samples needed to estimate the proportion for the site involves two related steps: determining the total number of samples, and allocating that total to the individual strata.

Equation Used to Determine Total Number of Samples

The equation depends on the specific method:

Method 1: Minimize Variance of Sample Proportion for Fixed Cost

The total number of samples is computed to maximize the precision (minimize the variance) of the estimated population proportion for a pre-specified fixed budget, \(C-c_0\), available for collecting and measuring samples, i.e., the total budget less the fixed overhead cost. Note that the calculation is for the total number of samples, i.e., for combined strata, rather than individual strata.

The formula used to calculate the total number of samples is: $$n = \frac{(C-c_0)\displaystyle\sum_{h=1}^{L}\frac{W_h\sqrt{P_h(1-P_h)}}{\sqrt{c_h}}}{\displaystyle\sum_{h=1}^{L}W_h\sqrt{P_h(1-P_h)}\sqrt{c_h}}$$

where

\(L\)

is the number of strata, \(h=1,2,...,L\),

\(P_h\)

is the estimated proportion of measurements in stratum \(h\),

\(W_h = N_h/N\)

is the weight associated with stratum \(h\),

\(N_h\)

is the total number of possible sampling locations (units) in stratum \(h\),

\(N\)

is the total number of possible units in all strata combined, \(N = \displaystyle\sum_{h=1}^{L}N_h\),

\(C\)

is the total sampling budget, \(C = c_0 + \displaystyle\sum_{h=1}^{L}c_hn_h\),

\(c_0\)

is the fixed overhead cost,

\(c_h\)

is the cost of collecting and measuring a sample in stratum \(h\), and

\(n_h\)

is the number of samples collected in stratum \(h\).
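The Method 1 formula can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VSP's implementation: the function name `total_samples_fixed_cost`, its argument names, and the example stratum values are all hypothetical, and the result is returned unrounded.

```python
from math import sqrt

def total_samples_fixed_cost(N_h, P_h, c_h, C, c0):
    """Total n from the Method 1 formula (fixed budget C, overhead c0).

    N_h: units per stratum, P_h: estimated stratum proportions,
    c_h: per-sample cost per stratum. All names are illustrative.
    """
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]                   # stratum weights W_h = N_h / N
    S = [sqrt(p * (1.0 - p)) for p in P_h]       # sqrt(P_h (1 - P_h))
    numerator = (C - c0) * sum(w * s / sqrt(c) for w, s, c in zip(W, S, c_h))
    denominator = sum(w * s * sqrt(c) for w, s, c in zip(W, S, c_h))
    return numerator / denominator

# Hypothetical two-stratum site: made-up sizes, proportions, and costs.
n = total_samples_fixed_cost(
    N_h=[600, 400], P_h=[0.5, 0.2], c_h=[4.0, 9.0], C=1000.0, c0=100.0
)
print(round(n, 1))  # about 169.4
```

The value is generally not a whole number; how it is rounded before allocation is left open in this sketch.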

 

Method 2: Minimize Cost for Required Variance of Sample Proportion

The total number of samples is computed to achieve a pre-specified precision (variance) of the estimated population proportion at minimum cost, given the per-sample cost in each stratum; no restriction is placed on total cost. Note that the calculation is for the total number of samples, i.e., for combined strata, rather than individual strata.

The formula used to calculate the total number of samples is: $$n = \frac{\Bigg(\displaystyle\sum_{h=1}^{L}W_h\sqrt{P_h(1-P_h)}\sqrt{c_h}\Bigg)\displaystyle\sum_{h=1}^{L}\frac{W_h\sqrt{P_h(1-P_h)}}{\sqrt{c_h}}}{V+\frac{1}{N}\displaystyle\sum_{h=1}^{L}W_hP_h(1-P_h)}$$

where

\(L\)

is the number of strata, \(h=1,2,...,L\),

\(P_h\)

is the estimated proportion of measurements in stratum \(h\),

\(W_h = N_h/N\)

is the weight associated with stratum \(h\),

\(N_h\)

is the total number of possible sampling locations (units) in stratum \(h\),

\(N\)

is the total number of possible units in all strata combined, \(N = \displaystyle\sum_{h=1}^{L}N_h\),

\(V\)

is the pre-specified (required) variance of the estimated proportion, i.e., the required precision, and

\(c_h\)

is the cost of collecting and measuring a sample in stratum \(h\).
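As a rough illustration of the Method 2 formula, the following Python sketch evaluates it for made-up inputs. The function name `total_samples_required_variance`, the argument names, and the example values are assumptions for illustration only; this is not VSP's code, and the result is left unrounded.

```python
from math import sqrt

def total_samples_required_variance(N_h, P_h, c_h, V):
    """Total n from the Method 2 formula (required variance V).

    N_h: units per stratum, P_h: estimated stratum proportions,
    c_h: per-sample cost per stratum. All names are illustrative.
    """
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]                   # stratum weights W_h = N_h / N
    S = [sqrt(p * (1.0 - p)) for p in P_h]       # sqrt(P_h (1 - P_h))
    sum_times_cost = sum(w * s * sqrt(c) for w, s, c in zip(W, S, c_h))
    sum_over_cost = sum(w * s / sqrt(c) for w, s, c in zip(W, S, c_h))
    # Denominator: V plus the finite-population term (1/N) * sum W_h P_h (1 - P_h)
    denominator = V + sum(w * p * (1.0 - p) for w, p in zip(W, P_h)) / N
    return sum_times_cost * sum_over_cost / denominator

# Hypothetical two-stratum site with a required variance of 0.001.
n = total_samples_required_variance(
    N_h=[600, 400], P_h=[0.5, 0.2], c_h=[4.0, 9.0], V=0.001
)
print(round(n, 1))  # about 180.9
```

Tightening the required variance \(V\) shrinks the denominator and so increases \(n\), which matches the intent of the method.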

 

Method 3: Predetermined Number

The user supplies the total number of samples. VSP provides no assurance that this user-supplied number is adequate for any particular design goal.

Equation Used to Determine Allocation of Samples to Strata

The total number of samples is allocated to the individual strata on an optimal basis using the formula: $$n_h = n\frac{N_h\sqrt{P_h(1-P_h)}/\sqrt{c_h}}{\displaystyle\sum_{h=1}^{L}N_h\sqrt{P_h(1-P_h)}/\sqrt{c_h}}$$

where

\(n_h\)

is the number of samples allocated to stratum \(h\),

\(L\)

is the number of strata,

\(N_h\)

is the total number of units in stratum \(h\),

\(P_h\)

is the estimated proportion of measurements in stratum \(h\),

\(c_h\)

is the cost of collecting and measuring a sample in stratum \(h\),

\(n\)

is the total number of units sampled in all strata, \(n = \displaystyle\sum_{h=1}^{L}n_h\).
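The allocation formula can be sketched as follows. This is a hedged example, not VSP's implementation: the function name `allocate_samples`, the example inputs, and the choice to return unrounded values are all assumptions made here for illustration.

```python
from math import sqrt

def allocate_samples(n, N_h, P_h, c_h):
    """Split a total of n samples across strata using the allocation formula.

    Each stratum's share is proportional to N_h * sqrt(P_h (1 - P_h)) / sqrt(c_h).
    Returns unrounded values; names are illustrative only.
    """
    terms = [Nh * sqrt(p * (1.0 - p)) / sqrt(c)
             for Nh, p, c in zip(N_h, P_h, c_h)]
    total = sum(terms)
    if total == 0:
        # Happens only if every P_h is 0 or 1; the formula is undefined then.
        raise ValueError("allocation undefined: all stratum terms are zero")
    return [n * t / total for t in terms]

# Hypothetical example: 170 samples over two strata with made-up inputs.
n_h = allocate_samples(170, N_h=[600, 400], P_h=[0.5, 0.2], c_h=[4.0, 9.0])
print([round(x, 1) for x in n_h])  # about [125.4, 44.6]
```

Larger, more variable, and cheaper strata receive more samples. The unrounded allocations sum to \(n\); any rounding to whole samples may change that sum slightly.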

 

Statistical Assumptions

The assumptions associated with the formulas for computing the number of samples are:

1. The estimated stratum proportions, \(P_h\), are reasonable and representative of the stratum populations being sampled.

2. The sampling locations are selected using simple random sampling.

3. The stratum costs, \(c_h\), and the fixed cost, \(c_0\), are accurate.

The first and third assumptions will be assessed in a post-data-collection analysis. The second assumption is valid because simple random sampling is used.

Reference:

Cochran, W.G. 1977. Sampling Techniques, 3rd edition. John Wiley & Sons, New York.

The Stratified dialog contains the following controls:

Total Number of Samples Method

Minimize Variance of Sample Proportion for Fixed Cost Method:

Specify Total Budget

Minimize Cost for Required Variance of Sample Proportion Method:

Specify Required Variance

Predetermined Number Method:

Specify Total Number of Samples

Number of Strata