Design Proportion Confidence Interval

Background Information

The purpose of this method is to estimate the proportion of sample locations on the site that have a characteristic of concern (such as the presence of a contaminant above a specified level, or any detectable presence) and to compute an upper confidence limit (UCL) on that proportion. The goal is to be X% confident that the true proportion of sample locations on the site with the characteristic of concern is below the UCL.

Equations Used to Calculate Recommended Minimum Number of Samples

The equation used to calculate the number of samples is based on a standard normal approximation of the binomial distribution. This approximation is used instead of the exact binomial proportion confidence interval because, owing to the discreteness of the binomial distribution, statistical power under the exact method fluctuates: small changes in the assumed proportion or in the sample size can cause abrupt changes in power, and power does not always increase as the sample size increases (Chernick & Liu, 2002). This in turn produces seemingly contradictory sample size requirements, such as a larger sample giving less power than a smaller one. Using the standard normal approximation guarantees that greater uncertainty always leads to a larger required number of samples. Provided the number of samples is moderately large and the proportion of unacceptable items is not close to 0 or 1, the sample size given by the standard normal approximation is adequate.
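To illustrate the saw-tooth behavior of exact methods, the following Python sketch computes the exact power of a one-sided binomial test over a range of sample sizes. The test setup (null hypothesis p ≥ 0.20, alternative p = 0.10, α = 0.05) and the function names `binom_cdf` and `exact_power` are hypothetical, chosen only for illustration; they are not part of this design.

```python
from math import comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p); returns 0.0 when c < 0."""
    return sum((comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1)), 0.0)


def exact_power(n: int, p0: float, p1: float, alpha: float) -> float:
    """Power of a one-sided exact binomial test of H0: p >= p0 at p = p1.

    H0 is rejected when X <= c, where c is the largest count whose
    null probability P(X <= c | p0) does not exceed alpha.
    """
    c = -1
    while binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return binom_cdf(c, n, p1)


# Hypothetical design values, for illustration only.
P0, P1, ALPHA = 0.20, 0.10, 0.05
powers = {n: exact_power(n, P0, P1, ALPHA) for n in range(20, 61)}

# Report every step where adding one sample LOWERS the power.
for n in range(20, 60):
    if powers[n + 1] < powers[n]:
        print(f"n={n}: power {powers[n]:.3f} -> n={n + 1}: power {powers[n + 1]:.3f}")
```

For these values the rejection cutoff stays at 0 for both n = 20 and n = 21, so the power falls from about 0.12 to about 0.11 as the sample size grows. This is the kind of non-monotone behavior that the normal approximation avoids.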

The equation used to calculate the number of samples is:

\begin{equation} n = {(z_{1-\alpha})^2 p (1-p) \over d^2} \end{equation}

where

\(n\)

is the number of samples required,

\(\alpha\)

is the maximum acceptable probability that the true proportion exceeds the UCL,

\(z_{1-\alpha}\)

is the value of the standard normal distribution such that the proportion of the distribution less than \(z_{1-\alpha}\) is \(1-\alpha\),

\(d\)

is the maximum desired difference between the estimated proportion and the UCL, and

\(p\)

is the maximum expected proportion. If the user does not specify it, \(p\) is set to 0.5, which maximizes \(p(1-p)\) and therefore gives the largest (most conservative) number of samples.
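As a rough sketch (not the software's own code), the sample-size formula can be evaluated with the Python standard library's `statistics.NormalDist`. The function name `required_samples` is hypothetical, and rounding up to the next whole sample is an assumption, since the formula itself returns a real number.

```python
from math import ceil
from statistics import NormalDist
from typing import Optional


def required_samples(confidence: float, d: float, p: Optional[float] = None) -> int:
    """Normal-approximation sample size n = z^2 * p * (1 - p) / d^2.

    confidence: 1 - alpha, e.g. 0.95
    d: maximum desired difference between the estimated proportion and the UCL
    p: maximum expected proportion; defaults to 0.5 when not supplied
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    if p is None:
        p = 0.5  # no prior knowledge: p(1 - p) is largest at p = 0.5
    z = NormalDist().inv_cdf(confidence)  # one-sided z_{1-alpha}
    # Assumption: round up so the requirement is met with a whole number of samples.
    return ceil(z**2 * p * (1 - p) / d**2)


print(required_samples(0.95, 0.10))         # no prior knowledge (p = 0.5): 68
print(required_samples(0.95, 0.10, p=0.2))  # proportion expected below 0.2: 44
```

With 95% confidence and \(d\) = 0.10, using no prior knowledge requires 68 samples, while expecting the proportion to be at most 0.2 reduces the requirement to 44.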

Statistical Assumptions

The assumptions associated with the formulas for computing the number of samples are:

1. The number of samples with the characteristic of concern follows a Binomial(\(n\), \(p\)) distribution, where \(n\) is the total number of samples and \(p\) is the proportion of unacceptable samples.

2. The sampling locations are selected randomly, or any judgmentally selected samples are representative of the population. If judgment sampling is used to target the locations where unacceptable samples are most likely, the estimated proportion could be biased high. This may be acceptable if one desires an upper bound on the true proportion.

These assumptions will be assessed in a post-data-collection analysis.

Data Analysis

Given the total number of samples obtained and the number of those samples that have the characteristic of concern (unacceptable or contaminated), the estimated proportion and an upper confidence limit (UCL) on that proportion can be calculated. The UCL is calculated with the Agresti-Coull method (Agresti & Coull, 1998). This method is preferred over the UCL given by the standard normal (Wald) approximation, which does not have sufficient coverage probability when the number of samples is small; for example, the Wald UCL equals 0 whenever none of the samples are unacceptable. The Agresti-Coull method keeps the actual coverage close to the nominal level, so in theory the true proportion lies below the UCL approximately \(100(1-\alpha)\)% of the time. The formula used is:

\begin{equation} UCL = \tilde{p} +z_{(1-\alpha)} \sqrt{{1 \over \tilde{n}}\tilde{p} (1-\tilde{p})} \end{equation}

where

\( \tilde{n}\)

= \(n+z_{(1-\alpha)}^2 \)

\( \tilde{p}\)

= \({1 \over \tilde{n}} ( X+{1 \over 2}z_{(1-\alpha)}^2 ) \)

\(n\)

is the number of samples obtained,

\(\alpha\)

is the maximum acceptable probability that the true proportion exceeds the UCL,

\(z_{1-\alpha}\)

is the value of the standard normal distribution such that the proportion of the distribution less than \(z_{1-\alpha}\) is \(1-\alpha\), and

\(X\)

is the number of unacceptable or contaminated samples.
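The following Python sketch computes the one-sided Agresti-Coull UCL from the formula above and, for comparison, the UCL from the standard normal (Wald) approximation. It also evaluates the exact coverage probability (the chance that the UCL is at or above the true proportion) by summing binomial probabilities over every possible count. The function names, the clipping of the UCL at 1, and the example values (n = 10, true proportion 0.05) are illustrative assumptions, not part of the method description.

```python
from math import comb, sqrt
from statistics import NormalDist
from typing import Callable


def agresti_coull_ucl(x: int, n: int, confidence: float) -> float:
    """One-sided Agresti-Coull UCL; x unacceptable samples out of n."""
    z = NormalDist().inv_cdf(confidence)  # z_{1-alpha}
    n_tilde = n + z**2
    p_tilde = (x + z**2 / 2) / n_tilde
    ucl = p_tilde + z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return min(ucl, 1.0)  # assumption: clip, since a proportion cannot exceed 1


def wald_ucl(x: int, n: int, confidence: float) -> float:
    """One-sided UCL from the standard normal (Wald) approximation."""
    z = NormalDist().inv_cdf(confidence)
    p_hat = x / n
    return min(p_hat + z * sqrt(p_hat * (1 - p_hat) / n), 1.0)


def coverage(ucl_fn: Callable[[int, int, float], float],
             p: float, n: int, confidence: float) -> float:
    """Exact probability that the true proportion p is at or below the UCL."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
        if p <= ucl_fn(x, n, confidence)
    )


print(agresti_coull_ucl(2, 30, 0.95))                # about 0.19
print(coverage(wald_ucl, 0.05, 10, 0.95))           # about 0.40
print(coverage(agresti_coull_ucl, 0.05, 10, 0.95))  # about 1.0
```

With n = 10 and a true proportion of 0.05, no unacceptable samples are observed with probability \(0.95^{10} \approx 0.60\); in that case the Wald UCL is 0, so its coverage is only about 0.40. The Agresti-Coull UCL remains positive when \(X = 0\) and in this case covers the true proportion for every possible \(X\).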

References:

Agresti, A., & Coull, B. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. The American Statistician, 52(2), 119-126.

Chernick, M. R., & Liu, C. Y. (2002). The saw-toothed behavior of power versus sample size and software solutions: Single binomial proportion using exact methods. The American Statistician, 56, 149-155.

The dialog for this design contains the following controls and inputs:

Analyte

Confidence

Maximum desired difference between the estimated proportion and the UCL

Radio button - Don't use prior knowledge

Radio button - Expect the proportion to be below a specified level

Maximum expected proportion