Two-Sample t-Test Design

Background Information

The two-sample t-test is a standard statistical test that can be used to test whether the true survey-unit mean exceeds the true reference-area mean. The difference between the two means is compared to a specified difference, i.e., the Action Level, which could be zero. Please consult EPA's guidance document, Guidance on Systematic Planning Using the Data Quality Objectives Process (EPA 2006a), to put this test in the context of environmental decision-making.

Before deciding to develop a sampling plan based on the two-sample t-test, consider the assumptions and limitations involved. For example, this test assumes equality of variances, i.e., the standard deviations of the two populations are approximately equal. For a discussion of these assumptions and limitations, and for the details of the test, please consult EPA's Data Quality Assessment: Statistical Methods for Practitioners (EPA 2006, pp. 66-71). This document and the DQO guidance document are currently available at: http://www.epa.gov/quality/qa_docs.html.

Equations Used to Calculate Recommended Minimum Number of Samples

The number of samples is calculated using Equation (1) below (as used in EPA 2006, p. 67) when the MQO option is not selected. The number of samples is calculated using Equation (2) below when the MQO option is selected (Gilbert et al. 2001, p. 3-11).

\begin{equation} m = n = \frac{2s_{Total}^2 (Z_{1- \alpha } + Z_{1- \beta })^2}{\Delta^2} +0.25Z_{1- \alpha}^2 \end{equation}

\begin{equation} m = n = \frac {2 \left ( s_{Sample}^2 + \frac{s_{Analytical}^2}{r} \right) (Z_{1-\alpha} + Z_{1- \beta} )^2}{\Delta^2} + 0.25 Z_{1- \alpha }^2 \end{equation}

where:

\( n, \, m \)	is the recommended minimum number of samples, which is the same for the survey unit and the reference area
\( s_{Total} \)	is the estimated standard deviation due to both sampling and analytical variability, assumed common to both the survey unit and the reference area
\( Z_{1- \alpha} \)	is the value of the standard normal distribution for which the proportion of the distribution to the left of \( Z_{1- \alpha} \) is \( 1 - \alpha \)
\( Z_{1- \beta} \)	is the value of the standard normal distribution for which the proportion of the distribution to the left of \( Z_{1- \beta} \) is \( 1 - \beta \)
\( \Delta \)	is the width of the gray region
\( \alpha \)	is the probability of rejecting the null hypothesis when the null hypothesis is true
\( \beta \)	is the probability of not rejecting the null hypothesis when the null hypothesis is false
MQO Specific:
\( s_{Sample} \)	is the standard deviation (assumed equal for the survey unit and the reference area) due to the inherent variability in the sampling process when analytical error is zero
\( s_{Analytical} \)	is the standard deviation due to the inherent variability in the analysis process alone
\( r \)	is the number of times an individual sample is analyzed
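
As an illustration of how Equations (1) and (2) might be evaluated, the following is a minimal Python sketch, not the software's own implementation. The function names and the input values are hypothetical and chosen only for the example. Rounding the result up to the next whole sample is an assumption made here, since the equations themselves yield a non-integer value.

```python
import math
from statistics import NormalDist


def z_quantile(p: float) -> float:
    """Standard normal value with proportion p of the distribution to its left."""
    return NormalDist().inv_cdf(p)


def _samples_from_variance(variance: float, delta: float,
                           alpha: float, beta: float) -> int:
    """Shared form of Equations (1) and (2) for a given total variance."""
    z_a = z_quantile(1.0 - alpha)
    z_b = z_quantile(1.0 - beta)
    n = 2.0 * variance * (z_a + z_b) ** 2 / delta ** 2 + 0.25 * z_a ** 2
    # Assumption: round up so the recommended size is a whole number of samples.
    return math.ceil(n)


def samples_per_area(s_total: float, delta: float,
                     alpha: float, beta: float) -> int:
    """Equation (1): MQO option not selected."""
    return _samples_from_variance(s_total ** 2, delta, alpha, beta)


def samples_per_area_mqo(s_sample: float, s_analytical: float, r: int,
                         delta: float, alpha: float, beta: float) -> int:
    """Equation (2): MQO option selected, each sample analyzed r times."""
    variance = s_sample ** 2 + s_analytical ** 2 / r
    return _samples_from_variance(variance, delta, alpha, beta)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(samples_per_area(s_total=3.0, delta=2.0, alpha=0.05, beta=0.10))
    print(samples_per_area_mqo(s_sample=2.5, s_analytical=1.5, r=2,
                               delta=2.0, alpha=0.05, beta=0.10))
```

In this sketch the returned value applies to each area separately, because \( m = n \). Increasing \( r \) reduces only the analytical term \( s_{Analytical}^2 / r \), so replicate analyses lower the sample count only to the extent that analytical variability contributes to the total.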

Statistical Assumptions

The assumptions associated with the formulas for computing the number of samples are:

1. The data from each area (site and reference area) originate from normal populations.

2. The variances of the site and reference populations are equal.

3. The variance estimate, \( s^2 \), is reasonable and representative of the populations being sampled.

4. The population values are not spatially or temporally correlated.

5. The sampling locations will be selected randomly.

The first four assumptions will be assessed in a post-data-collection analysis. The last assumption is valid because the gridded sample locations were selected using a random process.

References:

EPA. 2006a. Guidance on Systematic Planning Using the Data Quality Objectives Process. EPA QA/G-4, EPA/240/B-06/001, U.S. Environmental Protection Agency, Office of Environmental Information, Washington DC.

EPA. 2006. Data Quality Assessment: Statistical Methods for Practitioners. EPA QA/G-9S, EPA/240/B-06/003, U.S. Environmental Protection Agency, Office of Environmental Information, Washington DC.

Gilbert, RO, JR Davidson, JE Wilson, BA Pulsipher. 2001. Visual Sample Plan (VSP) models and code verification. PNNL-13450, Pacific Northwest National Laboratory, Richland, Washington.