Summary Statistics Sub-page

This sub-page of the Data Analysis page displays basic summary statistics computed from the individual data values entered on the Data Entry Sub-page.

image\pagesummary.gif

This page contains the following controls / displays:

Analyte Drop-List

Data Set Drop-List

Suspected Outlier

Significance Level

Statistic Output

\(n\)

is the number of measurements in a data set

Min

is the minimum of the \(n\) data

Max

is the maximum of the \(n\) data

Range

is the maximum value minus the minimum value, i.e., \( Range = x_{[n]} - x_{[1]} \)

Mean

is the arithmetic mean computed as: \( \bar x= \frac{1}{n} \displaystyle\sum\limits_{i=1}^n x_i \)

Median

is the 50th percentile of the data set, i.e., the value above which and below which half the data set lies. The sample median is computed from the ordered data (order statistics) \( x_{[1]} \leq x_{[2]} \leq \dotsb \leq x_{[n]} \) as follows:

Median = \( x_{[(n+1)/2]} \) if \(n\) is an odd number

Median = \( \frac{1}{2} (x_{[n/2]} + x_{[(n+2)/2]}) \) if \(n\) is an even number
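The odd/even rule above can be sketched in a few lines of Python. This is only an illustration of the formulas on this page, not VSP's own code; the function name `sample_median` is hypothetical.

```python
def sample_median(data):
    """Median from the order statistics x[1] <= x[2] <= ... <= x[n]."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data set is empty")
    if n % 2 == 1:
        # Odd n: the middle order statistic x[(n+1)/2] (1-based index).
        return x[(n + 1) // 2 - 1]
    # Even n: average of x[n/2] and x[(n+2)/2] (1-based indices).
    return 0.5 * (x[n // 2 - 1] + x[(n + 2) // 2 - 1])
```

For example, `sample_median([3, 1, 2])` picks the second order statistic, while `sample_median([4, 1, 3, 2])` averages the second and third.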

Variance

is computed as \( Var = s^2 \) (see below)

Standard Deviation

is computed as:  \( s = \sqrt{ \frac{1}{n-1} \displaystyle\sum\limits_{i=1}^n (x_i - \bar x)^2} \)

Standard Error

is the standard deviation of the estimated mean. It is computed as:

\( SE = \frac{s}{ \sqrt{n}} = \sqrt{ \frac{1}{n(n-1)} \displaystyle\sum\limits_{i=1}^n (x_i - \bar x)^2} \)
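As a rough illustration, the range, mean, variance, standard deviation, and standard error formulas above can be written directly in Python. The function name `basic_stats` and the dictionary keys are hypothetical choices for this sketch and do not reflect how VSP is implemented.

```python
import math


def basic_stats(data):
    """Summary statistics following the formulas on this page."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two measurements are needed for s")
    mean = sum(data) / n
    # Sum of squared deviations from the mean.
    ss = sum((x - mean) ** 2 for x in data)
    var = ss / (n - 1)          # Var = s^2
    s = math.sqrt(var)          # sample standard deviation
    se = s / math.sqrt(n)       # standard error of the mean
    return {
        "n": n,
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),  # x[n] - x[1]
        "mean": mean,
        "variance": var,
        "std_dev": s,
        "std_error": se,
    }
```

Note that the variance uses the \(n-1\) divisor, so the function requires at least two measurements.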

Interquartile Range

is the 75th percentile of the data set minus the 25th percentile of the data set

Skewness

is a measure of the symmetry of the data set. It is computed as:

\( SKEW = \frac{ \frac{n}{(n-1)(n-2)} \displaystyle\sum\limits_{i=1}^n (x_i - \bar x)^3}{s^3} \)

where \( \bar x \) and \(s\) are computed as given above.
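The skewness formula can be checked with a short Python sketch. The function name `skewness` is hypothetical, and the requirement of at least three measurements is inferred from the \((n-1)(n-2)\) divisor rather than stated on this page.

```python
import math


def skewness(data):
    """SKEW = [n / ((n-1)(n-2)) * sum((x_i - mean)^3)] / s^3."""
    n = len(data)
    if n < 3:
        raise ValueError("at least three measurements are needed")
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    third_moment = sum((x - mean) ** 3 for x in data)
    return (n / ((n - 1) * (n - 2))) * third_moment / s ** 3
```

A symmetric data set such as 1, 2, 3, 4, 5 gives a skewness of zero, and a data set with a long right tail gives a positive value.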

\(P\)th Percentile

is the value below which \(P\)% of the \(n\) data fall and above which (100 - \(P\))% of the \(n\) data fall.

The \(P\)th percentile is computed in VSP by first computing \(k = p(n+1)\), where \(p = P/100\) is the percentile expressed as a fraction. If \(k\) is an integer, the \(P\)th percentile is \( x_{[k]} \), which is the \(k\)th smallest of the \(n\) data values (the \(k\)th order statistic). For example, to compute the 50th percentile of a data set of \(n\) = 9 measurements, we have \(p\) = 0.50 and \(k\) = \(p\)(\(n\)+1) = 0.50(10) = 5, which is an integer. Hence, the 50th percentile (the median) is the 5th smallest datum, \( x_{[5]} \).

If \(k\) is not an integer, the \(P\)th percentile is obtained by linear interpolation between the two closest order statistics. For example, if \(n\) = 11 and \(p\) = 0.70, then \(k\) = \(p\)(\(n\)+1) = 0.70(12) = 8.4, and the 70th percentile is found by linear interpolation between the 8th and 9th order statistics, i.e., between \( x_{[8]} \) and \( x_{[9]} \).
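The percentile rule can be sketched as follows. This is an illustrative reading of the description above, not VSP's code. The function names are hypothetical, and two points are assumptions: \(k\) values that fall outside 1..\(n\) are clamped to the minimum or maximum, and the interquartile range uses this same percentile rule.

```python
import math


def percentile(data, p):
    """Percentile for fraction p (0 < p < 1) using k = p(n + 1)."""
    if not 0 < p < 1:
        raise ValueError("p must be a fraction strictly between 0 and 1")
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data set is empty")
    k = p * (n + 1)
    # Assumption: clamp when k falls outside the order statistics.
    if k <= 1:
        return x[0]
    if k >= n:
        return x[-1]
    lo = math.floor(k)   # 1-based index of the lower order statistic
    frac = k - lo        # zero when k is an integer
    # Linear interpolation between x[lo] and x[lo + 1] (1-based).
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])


def interquartile_range(data):
    """75th percentile minus 25th percentile (assumes the same rule)."""
    return percentile(data, 0.75) - percentile(data, 0.25)
```

When \(k\) is an integer the interpolation weight is zero, so the function returns \( x_{[k]} \) itself and no separate integer branch is needed.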

VSP also conducts outlier tests on the data.

If the sample design is non-parametric (no distributional assumptions), Walsh's Outlier Test is performed on the data.

If the sample design is parametric (assuming a normal distribution), Rosner's Outlier Test will be performed if there are at least 25 samples and Dixon's Outlier Test will be performed if there are 3 to 24 samples.
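The selection rules above amount to a small decision function. The sketch below only restates those rules; the function name and the return strings are hypothetical, and returning `None` for a parametric design with fewer than 3 samples is an assumption, since this page does not say what happens in that case.

```python
def select_outlier_test(parametric, n):
    """Return the name of the outlier test implied by the rules above."""
    if not parametric:
        # Non-parametric design: no distributional assumptions.
        return "Walsh"
    if n >= 25:
        return "Rosner"
    if 3 <= n <= 24:
        return "Dixon"
    # Assumption: no test applies to a parametric design with n < 3.
    return None
```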

Summary Statistics Sub-page when Accounting for Non-Detects

When accounting for non-detects, the Summary Statistics page appears as follows:

image\PageSummaryND.gif

This page contains the following controls / displays:

Analyte Drop-List

Statistic Output

\(n\)

is the total number of measurements in the data set

# Detects

is the number of measurements that are considered to be detects

Min Detect

is the minimum of the data that are considered to be detects

Max Detect

is the maximum of the data that are considered to be detects

# Non-Detects

is the number of measurements that are flagged as non-detects

Min Non-Detect

is the minimum of the data that are flagged as non-detects

Max Non-Detect

is the maximum of the data that are flagged as non-detects
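The counts and extremes listed above can be sketched as follows, assuming each measurement carries a flag saying whether it is a detect. The function name, the parallel-list input format, and the dictionary keys are hypothetical and chosen only for illustration.

```python
def detect_summary(values, detected):
    """Counts and extremes for detects and non-detects.

    values   -- reported values (the detection limit for a non-detect)
    detected -- parallel list of booleans, True for a detect
    """
    if len(values) != len(detected):
        raise ValueError("values and detected must have the same length")
    detects = [v for v, d in zip(values, detected) if d]
    nondetects = [v for v, d in zip(values, detected) if not d]

    def lo_hi(xs):
        # An empty group has no minimum or maximum.
        return (min(xs), max(xs)) if xs else (None, None)

    min_d, max_d = lo_hi(detects)
    min_nd, max_nd = lo_hi(nondetects)
    return {
        "n": len(values),
        "n_detects": len(detects),
        "min_detect": min_d,
        "max_detect": max_d,
        "n_nondetects": len(nondetects),
        "min_nondetect": min_nd,
        "max_nondetect": max_nd,
    }
```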

Mean

is the mean obtained from the Product Limit Estimator (Kaplan-Meier) analysis

Standard Error of Mean

is the standard deviation of the mean obtained from the Product Limit Estimator (Kaplan-Meier) analysis

Median

is the 50th percentile of the data set obtained from the Product Limit Estimator (Kaplan-Meier) analysis

Interquartile Range

is the 75th percentile (see below) of the data set minus the 25th percentile of the data set

\(P\)th Percentile

is the estimate of the value below which \(P\)% of the population values fall and above which (100 - \(P\))% of the population values fall. These values are obtained from the Product Limit Estimator (Kaplan-Meier) analysis.

The Product Limit Estimator for left-censored data is calculated directly as discussed by Bechtel Jacobs Company (2000) and Singh et al. (2006).
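To make the idea concrete, here is a minimal sketch of a product limit estimate of the mean for left-censored data. It follows the commonly described direct form of the estimator and is not taken from VSP; the function name and input format are hypothetical. It assumes that a non-detect reported at detection limit DL is counted among the observations less than or equal to a detected value \(x\) whenever DL \(\leq x\), and that any probability mass remaining below the smallest detected value is assigned to that value.

```python
def km_left_censored(values, detected):
    """Product limit (Kaplan-Meier) estimate for left-censored data.

    values   -- reported values (the detection limit for a non-detect)
    detected -- parallel list of booleans, True for a detect
    Returns (distinct detected values, estimated CDF at each, mean).
    """
    if len(values) != len(detected):
        raise ValueError("values and detected must have the same length")
    detects = sorted({v for v, d in zip(values, detected) if d})
    if not detects:
        raise ValueError("at least one detected value is required")

    cdf = [0.0] * len(detects)
    f = 1.0  # the CDF equals 1 at the largest detected value
    # Work downward from the largest detected value.
    for j in range(len(detects) - 1, -1, -1):
        cdf[j] = f
        xj = detects[j]
        # n_j: all observations (detects and non-detects) <= x_j.
        n_j = sum(1 for v in values if v <= xj)
        # d_j: detects exactly equal to x_j.
        d_j = sum(1 for v, d in zip(values, detected) if d and v == xj)
        f *= (n_j - d_j) / n_j

    # Mean = sum of x_j times the jump in the CDF at x_j.
    mean, prev = 0.0, 0.0
    for xj, fj in zip(detects, cdf):
        mean += xj * (fj - prev)
        prev = fj
    return detects, cdf, mean
```

With no non-detects, this estimate reduces to the ordinary arithmetic mean, which is a convenient sanity check.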

References:

Bechtel Jacobs Company, LLC. 2000. Improved Methods for Calculating Concentrations used in Exposure Assessment. Prepared for the DOE. Report # BJC/OR-416. pp 14-20. http://rais.ornl.gov/documents/bjc_or416.pdf.

Helsel, D.R. 2005. Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley & Sons, Inc. Hoboken, NJ. pp 63-68.

Singh, A., R. Maichle, and S.E. Lee. 2006. On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. Prepared for the EPA. Report # EPA/600/R-06/022. pp 30-32. http://www.epa.gov/osp/hstl/tsc/Singh2006.pdf.