Helsel and Hirsch (1995) show how to compute a nonparametric estimate of a straight line using the Kendall-Theil method when there are no seasonal differences in the trend. This method does not require that the residuals about the line be normally distributed. The estimate of the slope of the line was first developed by Theil (1950) and is discussed by Sen (1968) and illustrated in Gilbert (1987, pages 217-218). The intercept of the line is estimated using the method in Conover (1999, page 336). Neither the slope estimate nor the intercept estimate is strongly affected by outliers. The slope can still be estimated if there are missing data or when fewer than 20% of the measurements are reported as less than the detection limit (Helsel and Hirsch 1995, page 371).
The observations obtained over time are not serially correlated.
The observations are representative of the true conditions at sampling times.
The sample collection, handling, and measurement methods provide unbiased and representative observations of the underlying populations over time.
The true trend is linear over time.
There are no differences in the trend line for different seasons, e.g., months or calendar quarters.
There is no requirement that the measurements be normally distributed.
A measurement is obtained at each of $n$ points in time, $t_1, t_2, \ldots, t_n$, at a specified location. Compute the $N = n(n-1)/2$ slope estimates
$$Q = \frac{y_j - y_k}{t_j - t_k}$$
for all $j > k$, with $k = 1, 2, \ldots, n-1$ and $j = 2, 3, \ldots, n$, where $y_j$ and $y_k$ are the measurements at times $t_j$ and $t_k$, respectively. The median of these $N$ estimates of slope is the nonparametric slope estimate $\hat{\beta}_1$. Compute $\hat{\beta}_1$ as follows:
Rank the N values of Q from smallest to largest:
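$$Q_{[1]} \le Q_{[2]} \le \cdots \le Q_{[N]}$$
Compute the nonparametric estimate of the slope as
$$\hat{\beta}_1 = Q_{[(N+1)/2]} \quad \text{if } N \text{ is odd}$$
and
$$\hat{\beta}_1 = \tfrac{1}{2}\left(Q_{[N/2]} + Q_{[(N+2)/2]}\right) \quad \text{if } N \text{ is even}$$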
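The slope computation above can be sketched in a few lines of Python. This is a minimal illustration, not VSP's implementation: the function names `pairwise_slopes` and `kendall_theil_slope` and the sample data are hypothetical, and the sketch assumes that all sampling times are distinct (tied times would cause a division by zero).

```python
from itertools import combinations
from statistics import median


def pairwise_slopes(t, y):
    """Return the N = n(n-1)/2 slopes (y_j - y_k)/(t_j - t_k), j > k, sorted ascending.

    Assumption: all times in t are distinct.
    """
    if len(t) != len(y):
        raise ValueError("t and y must have the same length")
    slopes = [
        (y[j] - y[k]) / (t[j] - t[k])
        for k, j in combinations(range(len(t)), 2)  # every index pair with j > k
    ]
    return sorted(slopes)


def kendall_theil_slope(t, y):
    """Median of the pairwise slopes.

    statistics.median returns the middle value when N is odd and the
    average of the two middle values when N is even, which matches the
    two cases given in the text.
    """
    return median(pairwise_slopes(t, y))


if __name__ == "__main__":
    # Hypothetical data: a roughly linear series with one outlier at the end.
    t = [1, 2, 3, 4, 5]
    y = [2.0, 4.1, 5.9, 8.2, 30.0]
    print(len(pairwise_slopes(t, y)))  # N = 5 * 4 / 2 pairs
    print(kendall_theil_slope(t, y))   # close to 2.2 despite the outlier
```

In this made-up series the last observation is far from the others, yet the median of the ten pairwise slopes stays near the trend of the first four points.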
Denote the upper and lower limits for the slope by UL and LL, respectively. Then UL and LL are computed as follows:
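UL is the $U$th slope estimate in the ranked list, $Q_{[U]}$, where
$$U = 1 + \frac{N + C_\alpha}{2}$$
and
$$C_\alpha = Z_{1-\alpha/2}\,[\mathrm{Var}(S)]^{1/2}$$
where $Z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution and
$$\mathrm{Var}(S) = \frac{n(n-1)(2n+5)}{18}$$
LL is the $L$th slope estimate in the ranked list, $Q_{[L]}$, where
$$L = \frac{N - C_\alpha}{2}$$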
A 100(1−α)% confidence interval on the true slope is given by
$$LL \le \beta_1 \le UL$$
This method assumes that n is large, say n>10. Helsel and Hirsch (1995, page 273) provide a method for determining UL and LL when n=10. VSP uses the approximate method above even when n=10, but displays the warning message "The upper and lower limits on the estimated slope are approximate when n=10" in that case.
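The large-sample confidence limits described above might be computed along the following lines. This is a sketch under stated assumptions rather than VSP's code: the function name `slope_confidence_limits` is hypothetical, and because the text does not say how to treat non-integer values of U and L, the sketch rounds L down and U up and clamps both to the range 1 to N. Other conventions (rounding to the nearest rank, or interpolating) are equally plausible.

```python
from itertools import combinations
from math import ceil, floor, sqrt
from statistics import NormalDist


def slope_confidence_limits(t, y, alpha=0.05):
    """Approximate 100(1 - alpha)% limits (LL, UL) on the Kendall-Theil slope.

    Assumptions (not from the source text): distinct times, and
    non-integer ranks rounded outward and clamped to 1..N.
    """
    n = len(t)
    # Ranked pairwise slopes Q[1] <= ... <= Q[N]
    q = sorted((y[j] - y[k]) / (t[j] - t[k])
               for k, j in combinations(range(n), 2))
    big_n = len(q)

    var_s = n * (n - 1) * (2 * n + 5) / 18      # Var(S), no tie correction
    z = NormalDist().inv_cdf(1 - alpha / 2)     # Z_{1 - alpha/2}
    c_alpha = z * sqrt(var_s)

    lower_rank = (big_n - c_alpha) / 2          # L in the text
    upper_rank = 1 + (big_n + c_alpha) / 2      # U in the text

    # Assumption: round outward, then clamp to valid 1-based ranks.
    lower_idx = min(max(floor(lower_rank), 1), big_n)
    upper_idx = max(min(ceil(upper_rank), big_n), 1)

    # Convert 1-based ranks to 0-based list indices.
    return q[lower_idx - 1], q[upper_idx - 1]


if __name__ == "__main__":
    # Hypothetical data with n = 12 observations.
    t = list(range(1, 13))
    y = [1.0, 2.5, 2.9, 4.2, 5.1, 5.8, 7.4, 8.0, 8.7, 10.3, 11.1, 11.8]
    ll, ul = slope_confidence_limits(t, y, alpha=0.05)
    print(ll, ul)
```

Rounding outward can only widen the interval, which seemed the safer default for an illustration. A caller who needs to reproduce a particular package's output should check which rounding rule that package uses.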
VSP computes an estimate of the nonparametric intercept of the assumed linear trend as follows (Helsel and Hirsch 1995, page 267; Conover 1999, page 336):
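$$\hat{\beta}_0 = y_{\mathrm{MED}} - \hat{\beta}_1 \times t_{\mathrm{MED}}$$
where
$y_{\mathrm{MED}}$ = median of the n measurements $y_1, y_2, \ldots, y_n$
$t_{\mathrm{MED}}$ = median of the n times $t_1, t_2, \ldots, t_n$
and
$\hat{\beta}_1$ = nonparametric slope estimate.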
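The estimated nonparametric line is
$$y = \hat{\beta}_0 + \hat{\beta}_1 \times t$$
or
$$y = y_{\mathrm{MED}} + \hat{\beta}_1 (t - t_{\mathrm{MED}})$$
Hence the estimated value of y at time $t_j$ is
$$\hat{y}_j = y_{\mathrm{MED}} + \hat{\beta}_1 (t_j - t_{\mathrm{MED}})$$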
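As an illustration of the intercept and fitted-line formulas above, the following sketch computes the slope, the intercept, and a fitted value. The names `kendall_theil_line` and `predict` and the sample data are hypothetical, and the sketch again assumes distinct sampling times.

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(t, y):
    """Return (intercept, slope) of the Kendall-Theil line.

    slope     = median of all pairwise slopes (y_j - y_k)/(t_j - t_k), j > k
    intercept = y_MED - slope * t_MED
    Assumption: all times in t are distinct.
    """
    slope = median(
        (y[j] - y[k]) / (t[j] - t[k])
        for k, j in combinations(range(len(t)), 2)
    )
    intercept = median(y) - slope * median(t)
    return intercept, slope


def predict(t_new, intercept, slope):
    """Estimated value of y at time t_new on the fitted line."""
    return intercept + slope * t_new


if __name__ == "__main__":
    # Hypothetical data: roughly linear with one outlier at the end.
    t = [1, 2, 3, 4, 5]
    y = [2.0, 4.1, 5.9, 8.2, 30.0]
    b0, b1 = kendall_theil_line(t, y)
    print(b0, b1)
    # By construction the line passes through (t_MED, y_MED).
    print(predict(3, b0, b1))
```

Because the intercept is defined from the two medians, the fitted line always passes through the point (tMED, yMED), which the last line of the example checks for the made-up data.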
Conover, W.J. 1999. Practical Nonparametric Statistics, 3rd edition, Wiley, New York.
Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring, Wiley, New York.
Helsel, D.R. and R.M. Hirsch. 1995. Statistical Methods in Water Resources, Elsevier, New York.
Sen, P.K. 1968. Estimates of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association 63:1379-1389.
Theil, H. 1950. A rank-invariant method of linear and polynomial regression analysis, 1, 2, and 3. Nederl. Akad. Wetensch. Proc. 53:386-392, 521-525, and 1397-1412.