Walsh's Outlier Test

Walsh's test can be used to detect multiple outliers in a data set that is not required to be normally distributed. This test will detect outliers that are both much smaller and much larger than the rest of the data.

Although Walsh's test does not require that the data be normally distributed, it does require a large sample: more than 60 observations to be performed at a significance level of \(\alpha = 0.10\), and more than 220 observations to be performed at a significance level of \(\alpha = 0.05\).

Performing Walsh's Test

VSP performs Walsh's test as described in section 4.4.6 of the EPA's QA/G-9S document (EPA 2006). The \( n \) observed values are ordered from smallest to largest, giving the order statistics \( X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)} \). We specify the number of suspected outliers \( k \) and compute the following values:

$$ c = \text{ceiling}\left(\sqrt{2n}\right) $$

$$ r = k + c $$

$$ b^2 = \frac{1}{\alpha} $$

$$ a = \frac{1 + b \sqrt{\frac{(c-b^2)}{c-1}}}{c-b^2-1} $$

where \(\alpha = 0.10\) for \(60 < n \leq 220\), \(\alpha = 0.05\) for \(n > 220\), and ceiling( ) indicates rounding the value up to the next integer.
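
As a worked illustration of these quantities, assume a hypothetical data set with \( n = 100 \) observations and \( k = 2 \) suspected outliers (both values are chosen only for this example). Because \( 60 < n \leq 220 \), the test uses \( \alpha = 0.10 \), and the values come out approximately as follows:

$$ c = \text{ceiling}\left(\sqrt{200}\right) = \text{ceiling}(14.14\ldots) = 15, \qquad r = 2 + 15 = 17 $$

$$ b^2 = \frac{1}{0.10} = 10, \qquad b \approx 3.162 $$

$$ a = \frac{1 + 3.162 \sqrt{\frac{15-10}{15-1}}}{15-10-1} \approx \frac{1 + 1.890}{4} \approx 0.722 $$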

If the following equation holds:

\( X_{(k)}-(1+a)X_{(k+1)}+aX_{(r)} < 0 \)

then the \( k \) smallest points are outliers with an \(\alpha\) level of significance.

The \( k \) largest points are outliers with an \(\alpha\) level of significance if

\( X_{(n+1-k)}-(1+a)X_{(n-k)}+aX_{(n+1-r)} > 0 \).

If both of the inequalities are true, then the test concludes that both the \( k \) smallest and the \( k \) largest points are outliers, with a significance level of \( \alpha \).
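
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VSP's actual implementation: the function name `walsh_test`, its return format, and the error handling are hypothetical and chosen only for this example. The formulas follow the definitions of \( c \), \( r \), \( b^2 \), and \( a \) given earlier.

```python
import math


def walsh_test(values, k):
    """Illustrative sketch of Walsh's outlier test (hypothetical helper).

    Returns (alpha, low_outliers, high_outliers), where the two flags say
    whether the k smallest / k largest points are declared outliers.
    """
    x = sorted(values)          # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if n <= 60:
        raise ValueError("Walsh's test needs more than 60 observations")
    if k < 1:
        raise ValueError("k must be at least 1")

    # Significance level is fixed by the sample size.
    alpha = 0.10 if n <= 220 else 0.05

    c = math.ceil(math.sqrt(2 * n))
    r = k + c
    if r > n:
        raise ValueError("k is too large for this sample size")

    b2 = 1.0 / alpha
    b = math.sqrt(b2)
    a = (1 + b * math.sqrt((c - b2) / (c - 1))) / (c - b2 - 1)

    def X(i):
        """1-based order statistic X_(i)."""
        return x[i - 1]

    # k smallest points are outliers if this expression is negative.
    low = X(k) - (1 + a) * X(k + 1) + a * X(r) < 0
    # k largest points are outliers if this expression is positive.
    high = X(n + 1 - k) - (1 + a) * X(n - k) + a * X(n + 1 - r) > 0
    return alpha, low, high


# Made-up data: 98 evenly spaced values plus two very small values.
data = [-500.0, -450.0] + [float(i) for i in range(1, 99)]
alpha, low, high = walsh_test(data, k=2)
print(alpha, low, high)
```

With this made-up data set the two very small values are flagged as low outliers while the upper tail is not. The sample-size check uses strict inequalities because the denominator \( c - b^2 - 1 \) of \( a \) is zero at exactly \( n = 60 \) with \( \alpha = 0.10 \).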

References:

EPA. 2006. Data Quality Assessment: Statistical Methods for Practitioners. EPA QA/G-9S, EPA/240/B-06/003, U.S. Environmental Protection Agency, Office of Environmental Information, Washington DC.