Confidence Interval on a Mean Design

Background Information

The purpose of the Confidence Interval on a Mean option is to calculate the minimum number of samples required to estimate the mean within a prespecified margin of error at a given confidence level. This option assumes the data will be drawn from an approximately normal distribution or that the number of samples is large enough so that the distribution of sample means is approximately normal.

Example

The following illustration is based on Example 4.4 in Statistical Methods for Environmental Pollution Monitoring (Gilbert 1987, p. 32).

Suppose we wish to estimate the true mean concentration of a pollutant in stream sediments by calculating a 90% two-sided confidence interval around the sample mean. Assume the estimated standard deviation is 50 mg/L and the desired half-width of the confidence interval is 20 mg/L. How many samples should we take?

Entering the above values into VSP shows the recommended minimum number of samples to be 19. As illustrated in Gilbert (p. 32), this value must be found by iteration since the number of samples appears on both sides of the sample-size equation. VSP does the iteration and reports the final value.
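The iteration can be sketched in Python as follows. This is a minimal illustration and not VSP's actual code. It assumes \(df = n - 1\) and searches upward for the smallest \(n\) satisfying \(n \ge (t \cdot s / d)^2\), which may differ from the iteration scheme VSP uses internally. The t quantile is computed numerically with the standard library only. All function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Value t with P(T <= t) = p for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_samples(s_total, d, confidence, two_sided=True, n_max=100000):
    """Smallest n with n >= (t * s_total / d)**2, assuming df = n - 1."""
    alpha = 1 - confidence
    p = 1 - alpha / 2 if two_sided else 1 - alpha
    for n in range(2, n_max + 1):
        t = t_quantile(p, n - 1)
        if n >= (t * s_total / d) ** 2:
            return n
    raise ValueError("no sample size up to n_max meets the requirement")


# Values from the example: s = 50 mg/L, d = 20 mg/L, 90% two-sided interval
print(min_samples(s_total=50, d=20, confidence=0.90))  # 19
```

Under these assumptions the sketch reproduces the value of 19 reported above.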

For an extended discussion of confidence intervals, see Statistical Intervals: A Guide for Practitioners (Hahn and Meeker 1991). This reference also clarifies the differences among confidence, prediction, and tolerance intervals.

Equations Used to Calculate Recommended Minimum Number of Samples

One-Sided Confidence Interval

The number of samples is calculated using Equation (1) (Gilbert 1987, p. 32) when the Measurement Quality Objectives (MQO) option is not selected, and using Equation (2) (Gilbert et al. 2001, p. 3-10) when the MQO option is selected.

\begin{equation} n = \left[ \frac{t_{1- \alpha ,df} s_{total}}{d} \right]^2 \end{equation}

\begin{equation} n = \left( \frac{t_{1- \alpha ,df}}{d} \right)^2 \left(s_{sample}^2 + \frac{s_{analytical}^2}{r} \right) \end{equation}

where:

\(n\)

is the recommended minimum sample size for the study area.

\( t_{1- \alpha ,df}\)

is the value of the Student's t-distribution with \(df\) degrees of freedom, where \(df = n - 1\). By definition, the proportion of the distribution to the left of the value \( t_{1- \alpha ,df}\) is \( 1 - \alpha \).

\( s_{total} \)

is the estimated standard deviation due to both sampling and analytical variability.

\( d\)

is the width of the one-sided confidence interval, that is, the largest acceptable difference between the true mean and the sample mean.

 

MQO Specific:

\( s_{sample} \)

is the standard deviation due to the inherent variability in the sampling process when analytical error is zero.

\( s_{analytical}\)

is the standard deviation due to the inherent variability in the analysis process alone.

\(r\)

is the number of times an individual sample is analyzed.
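Comparing Equations (1) and (2) term by term suggests how the two are related: Equation (2) appears to be Equation (1) with the total variance split into its sampling and analytical parts. This is a reading of the formulas rather than a statement taken from the cited references:

\[ s_{total}^2 = s_{sample}^2 + \frac{s_{analytical}^2}{r} \]

As a purely hypothetical illustration, take \( s_{sample} = 40 \) and \( s_{analytical} = 30 \). Analyzing each sample once (\( r = 1 \)) gives

\[ s_{total} = \sqrt{40^2 + \frac{30^2}{1}} = \sqrt{2500} = 50 \]

while analyzing each sample twice (\( r = 2 \)) gives

\[ s_{total} = \sqrt{40^2 + \frac{30^2}{2}} = \sqrt{2050} \approx 45.3 \]

so replicate analyses lower the effective standard deviation and, with it, the recommended number of samples.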

 

Confidence Interval Calculation

Once data are collected, the total standard deviation is estimated by the sample standard deviation (\(s\)). Equation (3) is used to compute an upper one-sided confidence limit, and Equation (4) to compute the corresponding lower one-sided limit (Gilbert 1987, p. 139).

\begin{equation} UL_{1- \alpha} = \bar x + t_{1- \alpha ,n-1} \frac{s}{\sqrt{n}} \end{equation}

\begin{equation} LL_{1- \alpha} = \bar x - t_{1- \alpha ,n-1} \frac{s}{\sqrt{n}} \end{equation}
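Equations (3) and (4) can be sketched in Python as shown below. The function name and the measurements are hypothetical, and the t value is assumed to be supplied by the caller from a t table (here \( t_{0.95,4} \approx 2.132 \) for five observations at 95% confidence).

```python
import math
import statistics


def one_sided_limits(data, t_value):
    """Return (lower, upper) one-sided confidence limits on the mean.

    t_value is t_{1-alpha, n-1}, looked up by the caller for n = len(data).
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    margin = t_value * s / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical measurements, for illustration only
measurements = [48.0, 52.0, 50.0, 47.0, 53.0]
lower, upper = one_sided_limits(measurements, t_value=2.132)
print(f"lower limit: {lower:.2f}, upper limit: {upper:.2f}")
```

Each limit is meant to be used on its own: either the upper limit or the lower limit carries the \( 1 - \alpha \) confidence statement, not the pair taken together.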

Two-Sided Confidence Interval

The number of samples is calculated using Equation (5) (Gilbert 1987, p. 32) when the MQO option is not selected, and using Equation (6) (Gilbert et al. 2001, p. 3-10) when the MQO option is selected.

\begin{equation} n = \left[ \frac{t_{1- \alpha /2,df} s_{total}}{d} \right]^2 \end{equation}

\begin{equation} n = \left( \frac{t_{1- \alpha /2,df}}{d} \right)^2 \left(s_{sample}^2 + \frac{s_{analytical}^2}{r} \right) \end{equation}

where:

\(n\)

is the recommended minimum sample size for the study area.

\( t_{1- \alpha /2 ,df}\)

is the value of the Student's t-distribution with \(df\) degrees of freedom, where \(df = n - 1\). By definition, the proportion of the distribution to the left of the value \( t_{1- \alpha /2 ,df}\) is \( 1 - \alpha /2 \).

\( s_{total} \)

is the estimated standard deviation due to both sampling and analytical variability.

\( d\)

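is the half-width of the two-sided confidence interval, that is, the largest acceptable difference between the true mean and the sample mean.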

 

MQO Specific:

\( s_{sample} \)

is the standard deviation due to the inherent variability in the sampling process when analytical error is zero.

\( s_{analytical}\)

is the standard deviation due to the inherent variability in the analysis process alone.

\(r\)

is the number of times an individual sample is analyzed.

 

Confidence Interval Calculation

Once data are collected, the total standard deviation is estimated by the sample standard deviation (\(s\)). Equation (7) is used to compute the upper limit of the two-sided confidence interval, and Equation (8) to compute the corresponding lower limit (Gilbert 1987, p. 139).

\begin{equation} UL_{1- \alpha} = \bar x + t_{1- \alpha /2,n-1} \frac{s}{\sqrt{n}} \end{equation}

\begin{equation} LL_{1- \alpha} = \bar x - t_{1- \alpha /2,n-1} \frac{s}{\sqrt{n}} \end{equation}
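As a rough check on the Example above, suppose (hypothetically) that the 19 samples yield a sample standard deviation equal to the planning value of 50 mg/L. Using the tabulated value \( t_{0.95,18} \approx 1.734 \), the half-width of the 90% two-sided interval from Equations (7) and (8) would be

\[ t_{0.95,18} \frac{s}{\sqrt{n}} \approx 1.734 \times \frac{50}{\sqrt{19}} \approx 19.9 \text{ mg/L} \]

which is within the desired 20 mg/L. With only 18 samples and \( t_{0.95,17} \approx 1.740 \), the same calculation gives

\[ t_{0.95,17} \frac{s}{\sqrt{n}} \approx 1.740 \times \frac{50}{\sqrt{18}} \approx 20.5 \text{ mg/L} \]

which exceeds it. In practice the realized half-width depends on the standard deviation actually observed, so it may be larger or smaller than planned.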

Statistical Assumptions

The assumptions associated with the formulas for computing the number of samples are:

  1. The sample mean is normally distributed

  2. The population values are not spatially or temporally correlated

  3. The sampling locations will be selected randomly

The first two assumptions will be assessed in a post-data-collection analysis. The last assumption is valid because the gridded sample locations were selected based on a random process.

References:

Gilbert, RO. 1987. Statistical methods for environmental pollution monitoring. Van Nostrand Reinhold, New York. (Same text available as Gilbert, RO. 1997. John Wiley & Sons, New York.)

Gilbert, RO, JR Davidson, JE Wilson, BA Pulsipher. 2001. Visual Sample Plan (VSP) models and code verification. PNNL-13450, Pacific Northwest National Laboratory, Richland, Washington.

Hahn, GJ, and WQ Meeker. 1991. Statistical intervals: a guide for practitioners. John Wiley & Sons, New York.

The Confidence Interval on a Mean dialog contains the following controls:

For a One-Sided Confidence Interval:

Largest acceptable difference between true mean and sample mean (Width of C.I.)

For a Two-Sided Confidence Interval:

Largest acceptable difference between true mean and sample mean (1/2 Width of C.I.)

Confidence Level

For Non-Measurement Quality Objectives:

Estimated Standard Deviation

For Measurement Quality Objectives:

Estimated Sampling Standard Deviation

Estimated Analytical Standard Deviation

Analyses per Sample

MQO Button

Sample Placement page

Cost page

Data Analysis page

Data Entry sub-page

Summary Statistics sub-page

Tests sub-page

Plots sub-page

Analyte page