Upper Tolerance Limits for the Normal Distribution

Background Information

One-sided upper tolerance limits can be used to statistically test whether a specified area or room in a building is contaminated with biological agents, chemicals or radionuclides at concentrations greater than their respective fixed action levels (ALs). The statistical meaning, use, and computation of tolerance limits are discussed in Hahn and Meeker (1991) and Helsel (2005, Chapter 6).

In this module VSP computes the number of measurements, \(n\), needed to compute a one-sided upper tolerance limit for statistically testing whether the true \(P\)th percentile of a normally distributed population exceeds a fixed AL. A discussion of this use of tolerance limits is given in Millard and Neerchal (2001, page 339). Once the data are obtained, if the VSP user enters the \(n\) measurements into VSP using the Data Analysis tab of the dialog box, VSP will compute the one-sided upper tolerance limit as well as the mean, standard deviation, median and estimated \(P\)th percentile of the \(n\) data. VSP also performs the Shapiro-Wilk W test (Gilbert 1987, pp. 158-160) and displays a histogram and box plot of the data to help assess whether the data are normally distributed.

A one-sided upper tolerance limit on the true \(P\)th percentile of a population of measurements is identical to a one-sided upper confidence limit on the true \(P\)th percentile of the population (Hahn and Meeker 1991, page 61). The true \(P\)th percentile is the value above which \(100(1-P)\%\) of the population lies and below which \(100P\%\) of the population lies.

A one-sided upper tolerance limit on the true \(P\)th percentile of a population is a value computed using the \(n\) measurements, denoted here by \(UTL_{P,\alpha}\), such that at least \(100P\%\) of the population of measurements is less than \(UTL_{P,\alpha}\) with \(100(1-\alpha)\%\) confidence. For example, if \(P\) = 0.90 and \(\alpha\) = 0.05, then at least 90% of the population is less than the computed value \(UTL_{0.90,0.05}\) with 95% confidence.

The method VSP uses to compute one-sided upper tolerance limits assumes the data are normally distributed. If this assumption is false, the computed tolerance limit will not be accurate and decisions based on that computed limit may be in error. Hence, prior to using the one-sided tolerance limits, the Shapiro-Wilk W test and graphical plots mentioned above should be examined to evaluate if the measurements obtained are normally distributed. If they are not normally distributed, then the nonparametric (distribution-free) upper tolerance limits provided in VSP can be used to test the null hypothesis. These nonparametric limits are valid for any distribution, but more measurements will be needed to attain the same confidence in the decision.

Other assumptions that underlie the use of tolerance limits are:

1. Representative measurements have been obtained from a defined target population (e.g., a wall, section of a wall, the floor in a hallway, the walls and floors of a selected set of rooms, etc.) using simple random sampling or a systematic grid pattern that has a randomly selected starting location.
2. The measurements are statistically independent, i.e., there is no spatial correlation (no spatial patterns) of contaminant levels throughout the target population.

The assumption of statistical independence implies that tolerance limits may be most useful and defensible for building areas that are not expected to contain hot spots or other dominant spatial patterns. Hence, tolerance limits may be most defensible after decontamination has occurred and the objective is to test if the room is ready to be re-occupied.

Method Used in VSP to Estimate a Percentile of a Normal Distribution

The estimated \(P\)th percentile of a normal distribution, denoted by \(x_P\), is computed from \(n\) measurements as follows (Millard and Neerchal, 2001, page 276, Equation 5.138):

\begin{equation} x_P = \bar{x} + Z_P s \end{equation}

where

\(\bar{x} = \frac{1}{n}\displaystyle\sum_{i=1}^{n}x_i\) = mean of the \(n\) measurements \begin{equation} \end{equation}

\(x_i\) = \(i\)th measurement

\(Z_P\) = \(P\)th percentile of the standard normal distribution (normal distribution with mean zero and standard deviation 1); e.g., if \(P\) = 0.95, then \(Z_{0.95}\) = 1.645. Tables of \(Z_P\) values are in many statistical books, e.g., Gilbert (1987, Table A1, page 254). \begin{equation} \end{equation}

\(s = \sqrt{\frac{1}{n-1}\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})^2}\) = standard deviation of the \(n\) measurements \begin{equation} \end{equation}
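As an illustration of Equations (1) through (4), the following Python sketch estimates the \(P\)th percentile from a small set of measurements. It is not VSP's own code: the function name `estimate_percentile` and the data values are made up for this example, and only the Python standard library is assumed.

```python
from statistics import NormalDist, mean, stdev


def estimate_percentile(data, p):
    """Estimate the P-th percentile of a normal population as xbar + Z_P * s."""
    xbar = mean(data)               # Equation (2): sample mean
    s = stdev(data)                 # Equation (4): standard deviation with n - 1 divisor
    z_p = NormalDist().inv_cdf(p)   # Equation (3): P-th percentile of the standard normal
    return xbar + z_p * s           # Equation (1)


# Hypothetical measurements, for illustration only.
measurements = [4.1, 5.3, 3.8, 4.9, 5.6, 4.4, 5.0, 4.7]
x95 = estimate_percentile(measurements, 0.95)
print(f"Estimated 95th percentile: {x95:.3f}")
```

Because \(Z_{0.50} = 0\), calling the function with `p = 0.5` simply returns the sample mean.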

Method Used in VSP to Compute a One-Sided Upper Tolerance Limit on the \(P\)th Percentile of a Normal Distribution

A one-sided upper tolerance limit for the true \(P\)th percentile of a normal distribution is computed as follows:

\begin{equation}UTL_{P,\alpha} = \bar{x}+t_{n-1,Z_P\sqrt{n},1-\alpha}\frac{s}{\sqrt{n}} \end{equation}

where \(\bar{x}\) and \(s\) are computed using Equations (2) and (4), respectively, and

\begin{equation}\large t_{n-1,Z_P\sqrt{n},1-\alpha} \end{equation} = the \(100(1-\alpha)\)th percentile of the non-central t distribution with \(n-1\) degrees of freedom and non-centrality parameter \(Z_P\sqrt{n}\).
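The computation in Equation (5) can be sketched in Python as follows. This is an illustrative sketch, not VSP's implementation. The Python standard library has no non-central t distribution, so the helpers `nct_cdf` and `nct_ppf` (hypothetical names) approximate it by Simpson integration and bisection. A statistics library's non-central t quantile function could be used in their place. The data values are made up.

```python
import math
from statistics import NormalDist, mean, stdev

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def nct_cdf(t, df, nc, steps=600):
    """Approximate P(T <= t) for a non-central t variable with df degrees of freedom.

    Uses T = (Z + nc) / S with S = sqrt(chi-square_df / df), so that
    P(T <= t) = E[Phi(t * S - nc)], integrated over S by Simpson's rule.
    """
    log_c = math.log(2.0) + 0.5 * df * math.log(df / 2.0) - math.lgamma(0.5 * df)

    def density(s):  # probability density of S
        if s <= 0.0:
            return math.exp(log_c) if df == 1 else 0.0
        return math.exp(log_c + (df - 1) * math.log(s) - 0.5 * df * s * s)

    upper = 1.0 + 12.0 / math.sqrt(2.0 * df)  # S has negligible mass beyond this point
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)  # Simpson weights
        s = i * h
        total += w * density(s) * _PHI(t * s - nc)
    return total * h / 3.0


def nct_ppf(q, df, nc):
    """Approximate the q-th quantile of the non-central t distribution by bisection."""
    lo, hi, step = nc, nc, 1.0
    for _ in range(60):  # widen the bracket until it contains the quantile
        if nct_cdf(lo, df, nc) > q:
            lo -= step
        elif nct_cdf(hi, df, nc) < q:
            hi += step
        else:
            break
        step *= 2.0
    for _ in range(50):  # bisection on the bracketed interval
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df, nc) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_tolerance_limit(data, p=0.95, alpha=0.05):
    """One-sided upper tolerance limit of Equation (5)."""
    n = len(data)
    z_p = NormalDist().inv_cdf(p)
    t_quantile = nct_ppf(1.0 - alpha, n - 1, z_p * math.sqrt(n))
    return mean(data) + t_quantile * stdev(data) / math.sqrt(n)


# Hypothetical measurements, for illustration only.
measurements = [4.1, 5.3, 3.8, 4.9, 5.6, 4.4, 5.0, 4.7]
utl = upper_tolerance_limit(measurements, p=0.95, alpha=0.05)
print(f"UTL(0.95, 0.05) = {utl:.3f}")
```

The multiplier \(t_{n-1,Z_P\sqrt{n},1-\alpha}/\sqrt{n}\) is the one-sided tolerance factor tabulated in many statistics texts, so the sketch can be checked against such tables.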

The null hypothesis being tested is

\(H_0\): The true \(P\)th percentile \(\ge\) AL

or equivalently,

\(H_0\): Less than \(100P\%\) of the population is \(\lt\) AL.

\(H_0\) is rejected if \(UTL_{P,\alpha} \lt\) AL, in which case the alternative hypothesis,

\(H_a\): More than \(100P\%\) of the population is \(\lt\) AL,

is accepted as being true.

VSP states "Conclude the Population is Contaminated" if \(UTL_{P,\alpha} \ge\) AL.

VSP states "Conclude the Population is Not Contaminated" if \(UTL_{P,\alpha} \lt\) AL.
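The decision rule above amounts to a single comparison, sketched here in Python. The function name `tolerance_limit_decision` is hypothetical, and the returned strings simply repeat the two conclusions quoted above.

```python
def tolerance_limit_decision(utl, action_level):
    """Return the conclusion for a computed upper tolerance limit and action level."""
    if utl < action_level:
        # H0 (true P-th percentile >= AL) is rejected.
        return "Conclude the Population is Not Contaminated"
    # H0 is not rejected.
    return "Conclude the Population is Contaminated"


# Hypothetical values, for illustration only.
print(tolerance_limit_decision(utl=6.4, action_level=8.0))
print(tolerance_limit_decision(utl=8.0, action_level=8.0))
```

Note that a tolerance limit exactly equal to the AL does not reject \(H_0\).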

Method Used in VSP to Compute the Number of Measurements Needed to Test \(H_0\) Using a One-Sided Upper Tolerance Limit on the \(P\)th Percentile of a Normal Distribution

VSP computes the number of measurements, \(n\), needed to test the null hypothesis, \(H_0\), using the exact method given by Lyles and Kupper (1996, Equation 5). The procedure is to find the smallest integer \(n\) such that

\begin{equation} t_{n-1,-Z_P\sqrt{n},\alpha}-t_{n-1,-Z_{1-\theta}\sqrt{n},1-\beta} \ge 0 \end{equation}

where

\begin{equation} \large t_{n-1,-Z_P\sqrt{n},\alpha}\end{equation}

= the \(100(\alpha)\)th percentile of the non-central t distribution with \(n-1\) degrees of freedom and non-centrality parameter \(-Z_P\sqrt{n}\),

\begin{equation}t_{n-1,-Z_{1-\theta}\sqrt{n},1-\beta}\end{equation}

= the \(100(1-\beta)\)th percentile of the non-central t distribution with \(n-1\) degrees of freedom and non-centrality parameter \(-Z_{1-\theta}\sqrt{n}\),

\begin{equation} \theta = 1-\phi\Big[\frac{\text{UBGR}-\text{LBGR}}{\sigma}+Z_P\Big] \end{equation}

\(\alpha\)	is the false rejection rate specified by the VSP user, i.e., \(\alpha\) is the probability the VSP user can tolerate that the data will falsely indicate that the null hypothesis, \(H_0\), should be rejected.
\(\beta\)	is the false acceptance rate specified by the VSP user, i.e., \(\beta\) is the probability the VSP user can tolerate that the data will falsely indicate that the null hypothesis should be accepted.
\(\phi(x)\)	is the probability that a measurement from a standard normal distribution falls below the value \(x\).
\(\text{UBGR}\)	is the upper bound of the gray region of the Decision Performance Goal Diagram (DPGD) specified by the VSP user.
\(\text{LBGR}\)	is the lower bound of the gray region of the DPGD specified by the VSP user.
\(Z_P\)	is defined by Equation (3) above.
\(\sigma\)	is the true standard deviation of all possible measurements from the target population. In practice, an estimate of \(\sigma\) is used in Equation (10).
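The search for the smallest \(n\) satisfying Equation (7) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not VSP's implementation. The helpers `nct_cdf` and `nct_ppf` (hypothetical names) approximate the non-central t distribution numerically using only the standard library. The function name `required_n`, the search cap `n_max`, and the input values are all made up for the example. Because \(Z_{1-\theta}\) is the standard normal percentile corresponding to \(1-\theta\), Equation (10) implies \(Z_{1-\theta} = (\text{UBGR}-\text{LBGR})/\sigma + Z_P\), which the sketch uses directly.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def nct_cdf(t, df, nc, steps=600):
    """Approximate P(T <= t) for a non-central t variable (Simpson integration).

    Uses T = (Z + nc) / S with S = sqrt(chi-square_df / df), so that
    P(T <= t) = E[Phi(t * S - nc)].
    """
    log_c = math.log(2.0) + 0.5 * df * math.log(df / 2.0) - math.lgamma(0.5 * df)

    def density(s):  # probability density of S
        if s <= 0.0:
            return math.exp(log_c) if df == 1 else 0.0
        return math.exp(log_c + (df - 1) * math.log(s) - 0.5 * df * s * s)

    upper = 1.0 + 12.0 / math.sqrt(2.0 * df)  # S has negligible mass beyond this point
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)  # Simpson weights
        s = i * h
        total += w * density(s) * _STD_NORMAL.cdf(t * s - nc)
    return total * h / 3.0


def nct_ppf(q, df, nc):
    """Approximate the q-th quantile of the non-central t distribution by bisection."""
    lo, hi, step = nc, nc, 1.0
    for _ in range(60):  # widen the bracket until it contains the quantile
        if nct_cdf(lo, df, nc) > q:
            lo -= step
        elif nct_cdf(hi, df, nc) < q:
            hi += step
        else:
            break
        step *= 2.0
    for _ in range(50):  # bisection on the bracketed interval
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df, nc) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def required_n(p, alpha, beta, lbgr, ubgr, sigma, n_max=500):
    """Smallest n satisfying Equation (7), or None if none is found up to n_max."""
    z_p = _STD_NORMAL.inv_cdf(p)
    z_alt = (ubgr - lbgr) / sigma + z_p  # Z_{1-theta}, from Equation (10)
    for n in range(2, n_max + 1):
        root_n = math.sqrt(n)
        t_alpha = nct_ppf(alpha, n - 1, -z_p * root_n)
        t_beta = nct_ppf(1.0 - beta, n - 1, -z_alt * root_n)
        if t_alpha - t_beta >= 0.0:  # Equation (7)
            return n
    return None


# Hypothetical inputs: the gray region is one standard deviation wide.
n_example = required_n(p=0.90, alpha=0.05, beta=0.10, lbgr=1.0, ubgr=2.0, sigma=1.0)
print("Required number of measurements:", n_example)
```

Widening the gray region relative to \(\sigma\) reduces the required \(n\). Since the quantiles are approximated numerically, a borderline case could differ by one measurement from an exact calculation.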

Statistical Assumptions

The assumptions that underlie the equations used to compute \(n\) are:

1. The \(n\) measurements are normally distributed.

2. Representative measurements have been obtained from a defined target population (e.g. a wall, section of a wall, the floor in a hallway, the walls and floors of a selected set of rooms, etc.) using simple random sampling or a systematic grid pattern that has a randomly selected starting location.

3. The \(n\) measurements are statistically independent, i.e. there is no spatial correlation (no spatial patterns) of contaminant levels throughout the target population.

4. The estimated standard deviation, \(s\), is reasonably close to the true standard deviation, \(\sigma\), for the population. (Sensitivity of \(n\) to uncertainty in \(s\) can be evaluated below.)

References:

Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring, Wiley & Sons, New York, NY.

Hahn, G.J. and W.Q. Meeker. 1991. Statistical Intervals. Wiley & Sons, Inc, New York, NY.

Helsel, D.R. 2005. Nondetects and Data Analysis, Statistics for Censored Environmental Data, Wiley & Sons, New York, NY.

Lyles, R.H. and L.L. Kupper. 1996. On Strategies for Comparing Occupational Exposure Data to Limits, American Industrial Hygiene Association Journal 57:6-15.

Millard, S.P. and N.K. Neerchal. 2001. Environmental Statistics with S-Plus. CRC Press, New York, NY.

The Parametric Upper Tolerance Limit dialog contains the following controls:

Required fraction of the population to be less than the action level

Type I Error Rate (Alpha)

Type II Error Rate (Beta)

Width of Gray Area (Delta) / LBGR / UBGR

Action Level

Estimated Standard Deviation

Cost page

Data Analysis page

Data Entry sub-page

Summary Statistics sub-page

Tests sub-page

Plots sub-page