Confidence Interval on a Median Design

Background Information

The purpose of the Confidence Interval (CI) on a Median option is to calculate the minimum number of samples required to estimate the median within a prespecified margin of error at a given confidence level. This option makes no distributional assumptions about the data.

Equations Used to Calculate Recommended Minimum Number of Samples

Two methods for calculating the confidence interval are available, one for \(n \le 20\) and one for \(n \gt 20\) (Conover, 1980, pp. 111-117; Gilbert, 1987, pp. 141-142). In VSP, the user specifies whether the CI is one-sided or two-sided, the desired percent confidence, and the allowable number of percentiles from the median. A binary search is then performed on the CI equations to find the minimum sample size that satisfies these parameters. VSP first uses the \(n \le 20\) method to determine whether a solution exists with \(n \le 20\); if not, it uses the \(n \gt 20\) method.

For a one-sided CI, the \(n \le 20\) method is based on the binomial distribution and finds the lowest value of \(n\) for which the true median is at or above the \(y\)th quantile of the sample at least X% of the time. (The sample sizes are the same whether the CI is lower- or upper-tailed, so the same formula can be used for any one-sided CI.) X% is the specified confidence for a one-sided CI and the average of 100% and the specified confidence for a two-sided CI; for example, a two-sided confidence of 95% gives X = 97.5%. The \(y\)th quantile is 50% minus the allowable number of percentiles from the median specified by the user.
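The \(n \le 20\) search can be sketched in Python as follows. This is an illustration under stated assumptions, not VSP's code: it assumes the sample's \(y\)th quantile sits at rank \(r = \lceil y n \rceil\) (a hypothetical rounding convention), and it scans \(n = 1, \ldots, 20\) linearly instead of using a binary search, because the exact coverage is not monotone in \(n\). The names `coverage` and `min_n_small` are hypothetical.

```python
from math import comb


def coverage(n: int, r: int) -> float:
    """P(Y >= r) for Y ~ Binomial(n, 0.5): the probability that the
    r-th smallest of n observations is at or below the true median."""
    return sum(comb(n, k) for k in range(r, n + 1)) / 2 ** n


def min_n_small(confidence: float, allowable_pct: int,
                two_sided: bool = False, n_max: int = 20):
    """Smallest n <= n_max meeting the target coverage, or None.

    confidence:    e.g. 0.95
    allowable_pct: allowable percentiles from the median, e.g. 20
    """
    # X% from the text: the confidence itself (one-sided) or the
    # average of 100% and the confidence (two-sided).
    target = (1 + confidence) / 2 if two_sided else confidence
    y_pct = 50 - allowable_pct  # the yth quantile, in percent
    for n in range(1, n_max + 1):
        # Assumed rank of the sample yth quantile: ceil(y_pct * n / 100),
        # computed with integer arithmetic to avoid floating-point error.
        r = max(1, -(-y_pct * n // 100))
        if coverage(n, r) >= target:
            return n
    return None  # no solution with n <= n_max; fall back to n > 20
```

With 95% one-sided confidence and 20 allowable percentiles (\(y\) = 30%), `min_n_small(0.95, 20)` returns 13, and the two-sided version returns 20. The scan matters: under this convention the coverage at \(n = 10\) (about 0.945) is higher than at \(n = 11\) (about 0.887), so a bisection over \(n\) could skip past the true minimum.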

For a two-sided CI, the \(n \le 20\) method is the same as the one-sided method, except that the specified percent confidence is replaced with the average of 100% and the specified percent confidence. If no solution is found with \(n \le 20\), a binary search is conducted using Equation (1) for a one-sided CI or Equation (2) for a two-sided CI to compute \(r\), rounding \(r\) up to the next higher integer. The search continues until it finds the smallest \(n\) for which \(100 \cdot 2r/n\) is greater than 50% minus the allowable number of percentiles from the median specified by the user.

\begin{equation} r = 0.5n - 0.5\,w_{\alpha}\sqrt{n} \end{equation}

\begin{equation} r = 0.5n - 0.5\,w_{\alpha/2}\sqrt{n} \end{equation}

Where:

\(r\) is the rank of the lowest value expected to fall within the confidence interval when ranking the \(n\) samples from smallest to largest.

\(n\) is the sample size.

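\(w_{\alpha}\) is the \(\alpha\) quantile of the standard normal distribution, where \(\alpha\) is one minus the specified confidence; e.g., if \(\alpha = 0.2\), then \(w_{\alpha}\) is the 0.2 quantile (20th percentile) of the standard normal distribution, approximately \(-0.8416\). Because this quantile is negative for \(\alpha \lt 0.5\), its magnitude must be used in Equations (1) and (2) for \(r\) to fall below \(n/2\).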
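Equations (1) and (2) can be evaluated with a short Python sketch. It assumes \(\alpha\) is one minus the specified confidence and takes the magnitude of the standard normal quantile (from the standard library's `statistics.NormalDist`) so that \(r\) falls below \(n/2\). The function name `approx_rank` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_rank(n: int, confidence: float, two_sided: bool = False) -> int:
    """Rank r from Equation (1) (one-sided) or Equation (2) (two-sided),
    rounded up to the next higher integer."""
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    # Magnitude of the standard normal quantile for the tail area,
    # e.g. about 1.645 for a tail of 0.05.
    w = abs(NormalDist().inv_cdf(tail))
    r_star = 0.5 * n - 0.5 * w * math.sqrt(n)
    return math.ceil(r_star)
```

For \(n = 25\) at 95% confidence, this gives \(r = 9\) (one-sided) and \(r = 8\) (two-sided), i.e., rank fractions \(r/n\) of 36% and 32%.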

The \(n \gt 20\) method described above is an approximation, so VSP also performs a final check using the binomial distribution, in a manner similar to the \(n \le 20\) method, and may increase the sample size by 1 to ensure that the specified confidence level is met.
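The final check might be sketched as follows, using the same hypothetical rank convention \(r = \lceil y n \rceil\) as the \(n \le 20\) method. Given a candidate \(n\) from the approximation, it computes the exact binomial coverage and, if the target is missed, tries \(n + 1\). Returning `None` when neither candidate passes is an assumption of this sketch, not documented VSP behavior; `exact_coverage` and `final_check` are hypothetical names.

```python
from math import comb


def exact_coverage(n: int, r: int) -> float:
    """P(Y >= r) for Y ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(r, n + 1)) / 2 ** n


def final_check(n: int, confidence: float, allowable_pct: int,
                two_sided: bool = False):
    """Return n or n + 1, whichever first meets the exact target."""
    target = (1 + confidence) / 2 if two_sided else confidence
    y_pct = 50 - allowable_pct
    for candidate in (n, n + 1):
        # Assumed rank convention: ceil(y_pct * candidate / 100).
        r = max(1, -(-y_pct * candidate // 100))
        if exact_coverage(candidate, r) >= target:
            return candidate
    return None  # sketch-only signal that the approximation was too far off
```

For instance, `final_check(12, 0.95, 20)` returns 13, because the exact coverage is about 0.927 at \(n = 12\) but about 0.954 at \(n = 13\).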

Confidence Interval Calculation

Once the data are collected, a CI can be computed using the binomial distribution with parameters \(n\) and 0.5. Order the values from smallest to largest and number them from 1 to \(n\). Then, using each value from 1 to \(n\) for both \(r\) and \(s\), compute the probability of \(r\) or fewer observations (lower tail) and/or of \(s\) or more observations (upper tail). For a lower one-sided CI, determine the largest value of \(r\) such that the probability of \(r\) or fewer occurrences is less than \(\alpha\). For an upper one-sided CI, determine the smallest value of \(s\) such that the probability of fewer than \(s\) occurrences is greater than \(1-\alpha\) (equivalently, the probability of \(s\) or more occurrences is less than \(\alpha\)). For a two-sided CI, follow the same steps, substituting \(\alpha/2\) for \(\alpha\). The measured values with ranks \(r\) and \(s\) are the confidence limits.
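A Python sketch of the order-statistic interval, assuming continuous data so that ties can be ignored. It uses the usual textbook construction: the lower limit is the value at rank \(r\), the largest rank with \(P(Y \le r-1) \le \alpha\) (or \(\alpha/2\) for two-sided), and the upper limit is at the symmetric rank \(n + 1 - r\). References differ by one rank in where the cutoff is applied, so this may not reproduce VSP's limits exactly. The function `median_ci` and its `side` parameter are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int) -> float:
    """P(Y <= k) for Y ~ Binomial(n, 0.5)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) for i in range(min(k, n) + 1)) / 2 ** n


def median_ci(data, confidence=0.95, side="two"):
    """Order-statistic CI on the median; returns (lower, upper).

    side: "two", "lower", or "upper"; the unused limit is None.
    """
    if side not in ("two", "lower", "upper"):
        raise ValueError("side must be 'two', 'lower', or 'upper'")
    x = sorted(data)
    n = len(x)
    alpha = 1 - confidence
    tail = alpha / 2 if side == "two" else alpha
    # Largest 1-based rank r with P(Y <= r - 1) <= tail, so that
    # P(x_(r) <= median) >= 1 - tail.
    r = 0
    for cand in range(1, n + 1):
        if binom_cdf(cand - 1, n) <= tail:
            r = cand
        else:
            break
    if r == 0:
        raise ValueError("too few observations for this confidence level")
    s = n + 1 - r  # symmetric upper rank
    lower, upper = x[r - 1], x[s - 1]
    if side == "lower":
        return lower, None
    if side == "upper":
        return None, upper
    return lower, upper
```

For the ten values 1 to 10 (in any order), the two-sided 95% interval is (2, 9), with exact coverage \(1 - 2P(Y \le 1) \approx 0.979\).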

References:

Conover, WJ. 1980. Practical Nonparametric Statistics. John Wiley & Sons, New York.

Gilbert, RO. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York. (Same text available as Gilbert, RO. 1997. John Wiley & Sons, New York.)