This sub-page of the Data Analysis page displays basic summary statistics computed from the individual data values entered on the Data Entry sub-page.
Statistic |
Output |
n |
is the number of measurements in a data set |
Min |
is the minimum of the n data |
Max |
is the maximum of the n data |
Range |
is the maximum value minus the minimum value, i.e., $\text{Range} = x_{[n]} - x_{[1]}$ |
Mean |
is the arithmetic mean, computed as: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
Median |
is the 50th percentile of the data set, i.e., the value above which and below which half the data set lies. The sample median is computed from the ordered data (order statistics) $x_{[1]} \le x_{[2]} \le \cdots \le x_{[n]}$ as follows: $\text{Median} = x_{[(n+1)/2]}$ if n is an odd number; $\text{Median} = \frac{1}{2}\left(x_{[n/2]} + x_{[(n+2)/2]}\right)$ if n is an even number |
Variance |
is computed as $\text{Var} = s^2$, where s is the standard deviation (see below) |
Standard Deviation |
is computed as: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ |
Standard Error |
is the standard deviation of the estimated mean. It is computed as: $SE = \frac{s}{\sqrt{n}} = \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ |
Interquartile Range |
is the 75th percentile of the data set minus the 25th percentile of the data set |
Skewness |
is a measure of the symmetry of the data set. It is computed as: $\text{SKEW} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\frac{(x_i - \bar{x})^3}{s^3}$, where $\bar{x}$ and $s$ are computed as given above. |
Pth Percentile |
is the value below which P% of the n data fall and above which (100 − P)% of the n data fall. VSP computes the Pth percentile by first computing $k = p(n+1)$, where p = P/100 is the percentile expressed as a fraction. If k is an integer, the Pth percentile is $x_{[k]}$, the kth smallest of the n data values (the kth order statistic). For example, to compute the 50th percentile of a data set of n = 9 measurements, we have p = 0.50 and k = p(n+1) = 0.50(10) = 5, which is an integer. Hence, the 50th percentile (the median) is the 5th smallest datum, $x_{[5]}$. If k is not an integer, the Pth percentile is obtained by linear interpolation between the two closest order statistics. For example, if n = 11 and p = 0.70, then k = p(n+1) = 0.70(12) = 8.4, so the 70th percentile is found by linear interpolation between the 8th and 9th smallest of the 11 data, i.e., between $x_{[8]}$ and $x_{[9]}$, giving $x_{[8]} + 0.4\,(x_{[9]} - x_{[8]})$. |
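The formulas in the table above can be sketched in Python as follows. This is a minimal illustration and not VSP's own code: the function names `percentile` and `summary_statistics` are hypothetical, and two behaviours are assumptions not stated in the text (clamping the percentile to the sample extremes when k falls outside 1..n, and returning NaN for skewness when s = 0).

```python
import math


def percentile(data, p):
    """Percentile by the k = p(n+1) rule; p is a fraction such as 0.70."""
    x = sorted(data)                 # order statistics x[1] <= ... <= x[n]
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    k = p * (n + 1)                  # 1-based, possibly fractional, position
    # Assumption (not stated in the text): clamp to the sample extremes
    # when k falls outside the range 1..n.
    if k <= 1:
        return x[0]
    if k >= n:
        return x[-1]
    lower = math.floor(k)            # 1-based index of the lower neighbour
    frac = k - lower                 # 0 when k is an integer
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])


def summary_statistics(data):
    """Summary statistics following the formulas in the table."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("the skewness formula needs at least 3 values")
    mean = sum(x) / n
    sum_sq = sum((v - mean) ** 2 for v in x)
    variance = sum_sq / (n - 1)      # Var = s^2
    s = math.sqrt(variance)          # standard deviation
    se = s / math.sqrt(n)            # standard error of the mean
    if n % 2 == 1:
        median = x[(n + 1) // 2 - 1]             # x[(n+1)/2], 1-based
    else:
        median = 0.5 * (x[n // 2 - 1] + x[n // 2])  # mean of x[n/2], x[(n+2)/2]
    if s > 0:
        skew = n / ((n - 1) * (n - 2)) * sum((v - mean) ** 3 for v in x) / s ** 3
    else:
        skew = float("nan")          # assumption: undefined for constant data
    return {
        "n": n,
        "min": x[0],
        "max": x[-1],
        "range": x[-1] - x[0],
        "mean": mean,
        "median": median,
        "variance": variance,
        "std_dev": s,
        "std_error": se,
        "iqr": percentile(x, 0.75) - percentile(x, 0.25),
        "skewness": skew,
    }
```

For the n = 11 example, `percentile(range(1, 12), 0.70)` interpolates 40% of the way from the 8th to the 9th smallest value. With the p(n+1) rule, the 50th percentile agrees with the median formula for both odd and even n.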
If the sample design is non-parametric (no distributional assumptions), Walsh's Outlier Test is performed on the data.
If the sample design is parametric (assuming a normal distribution), Rosner's Outlier Test is performed if there are at least 25 samples, and Dixon's Outlier Test is performed if there are 3 to 24 samples.
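The selection rule described above can be written as a small decision function. This is only a sketch of the rule as stated here: the function name `choose_outlier_test` is hypothetical, and returning `None` for a parametric design with fewer than 3 samples is an assumption, since the text does not say what happens in that case.

```python
from typing import Optional


def choose_outlier_test(parametric: bool, n: int) -> Optional[str]:
    """Return the name of the outlier test implied by the design and sample size."""
    if not parametric:
        return "Walsh"       # non-parametric design: no distributional assumption
    if n >= 25:
        return "Rosner"      # parametric design, at least 25 samples
    if 3 <= n <= 24:
        return "Dixon"       # parametric design, 3 to 24 samples
    return None              # assumption: no test named for fewer than 3 samples
```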
When accounting for non-detects, the Summary Statistics page appears as follows:
Statistic |
Output |
n |
is the total number of measurements in the data set |
# Detects |
is the number of measurements that are considered to be detects |
Min Detect |
is the minimum of the data that are considered to be detects |
Max Detect |
is the maximum of the data that are considered to be detects |
# Non-Detects |
is the number of measurements that are flagged as non-detects |
Min Non-Detect |
is the minimum of the data that are flagged as non-detects |
Max Non-Detect |
is the maximum of the data that are flagged as non-detects |
Mean |
is the mean obtained from the Product Limit Estimator (Kaplan-Meier) analysis |
Standard Error of Mean |
is the standard deviation of the mean obtained from the Product Limit Estimator (Kaplan-Meier) analysis |
Median |
is the 50th percentile of the data set obtained from the Product Limit Estimator (Kaplan-Meier) analysis |
Interquartile Range |
is the 75th percentile (see below) of the data set minus the 25th percentile of the data set |
Pth Percentile |
is the estimate of the value below which p% of the population values fall and above which (100 - p)% of the population values fall. These values are obtained from the Product Limit Estimator (Kaplan-Meier) analysis. |
The Product Limit Estimator for left-censored data is calculated directly as discussed by Bechtel Jacobs Company (2000) and Singh et al. (2006).
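To make the idea concrete, the following Python sketch shows one common formulation of the Kaplan-Meier (product limit) estimate for left-censored data. It is an illustration under stated assumptions and may differ in detail from VSP's computation: the function name `km_left_censored` is hypothetical, non-detects are assumed to be recorded at their detection limits, a non-detect is counted as lying at or below any detected value that is greater than or equal to its limit, and the mean is computed with F set to 0 below the smallest detected value.

```python
def km_left_censored(values, detected):
    """Kaplan-Meier CDF and mean for left-censored data (illustrative sketch).

    values   -- measured values, or detection limits for non-detects
    detected -- booleans, True where the value is a detect
    """
    detects = sorted({v for v, d in zip(values, detected) if d})
    if not detects:
        raise ValueError("at least one detected value is required")

    # For each distinct detected value y_j:
    #   n_j = number of observations (detects and non-detects) <= y_j
    #   d_j = number of detects equal to y_j
    factors = []
    for y in detects:
        n_j = sum(1 for v in values if v <= y)
        d_j = sum(1 for v, d in zip(values, detected) if d and v == y)
        factors.append((n_j - d_j) / n_j)

    # F(y_j) is the product of the factors of all detected values above y_j;
    # F equals 1 at the largest detected value.
    m = len(detects)
    cdf = [1.0] * m
    for j in range(m - 2, -1, -1):
        cdf[j] = cdf[j + 1] * factors[j + 1]

    # Mean = sum of y_j * (F(y_j) - F(y_{j-1})), with F(y_0) = 0 (assumption).
    mean = 0.0
    previous = 0.0
    for y, f in zip(detects, cdf):
        mean += y * (f - previous)
        previous = f
    return detects, cdf, mean
```

When every value is a detect, this reduces to the ordinary empirical distribution, so the estimated mean equals the arithmetic mean.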
Bechtel Jacobs Company, LLC. 2000. Improved Methods for Calculating Concentrations used in Exposure Assessment. Prepared for the DOE. Report # BJC/OR-416. pp 14-20. http://rais.ornl.gov/documents/bjc_or416.pdf.
Helsel, D.R. 2005. Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley & Sons, Inc. Hoboken, NJ. pp 63-68.
Singh, A., R. Maichle, and S.E. Lee. 2006. On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. Prepared for the EPA. Report # EPA/600/R-06/022. pp 30-32.