Nonparametric Estimate of Trend

Background Information

Helsel and Hirsch (1995) show how to compute a nonparametric estimate of a straight trend line using the Kendall-Theil method when the trend does not differ among seasons. This method does not require that the residuals about the line be normally distributed. The estimate of the slope of the line was first developed by Theil (1950) and is discussed by Sen (1968) and illustrated in Gilbert (1987, pages 217-218). The intercept of the line is estimated using the method in Conover (1999, page 336). Neither the slope estimate nor the intercept estimate is strongly affected by outliers. The slope can still be estimated when some data are missing or when fewer than 20% of the measurements are reported as less than the detection limit (Helsel and Hirsch 1995, page 371).

Assumptions

The observations obtained over time are not serially correlated.
The observations are representative of the true conditions at sampling times.
The sample collection, handling, and measurement methods provide unbiased and representative observations of the underlying populations over time.
The true trend is linear over time.
There are no differences in the trend line for different seasons, e.g., months or calendar quarters.
There is no requirement that the measurements be normally distributed.

Nonparametric Slope Using the Method of Theil (1950) and Sen (1968)

A measurement is obtained at a specified location at each of \( n \) points in time, \( t_1 , t_2, ... , t_n \). Compute the \( N = n(n-1)/2 \) slope estimates

\begin{equation} Q = \frac{y_j - y_k}{t_j - t_k} \end{equation}

for all \( j > k \) and \( k = 1,2,...,(n-1) \quad \mbox{and} \quad j = 2,3,...,n \) , where \( y_j \) and \( y_k \) are the measurements at times \( t_j \) and \( t_k \), respectively. The median of these \( N \) estimates of slope is the nonparametric slope estimate \( \hat \beta_1 \). Compute \( \hat \beta_1 \) as follows:

Rank the \( N \) values of \( Q \) from smallest to largest:

\begin{equation} Q_{[1]} \leq Q_{[2]} \leq ... \leq Q_{[N]} \end{equation}

Compute the nonparametric estimate of the slope as

\( \hat \beta_1 = Q_{[(N+1)/2]} \quad \) if \( N \) is odd

and

\( \hat \beta_1 = \frac{1}{2} ( Q_{[N/2]} + Q_{[(N+2)/2]}) \) if \( N \) is even
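The slope calculation above can be sketched in a few lines of Python. This is a minimal illustration, not VSP's implementation: the function names `pairwise_slopes` and `theil_sen_slope` are hypothetical, and the sketch assumes that all sampling times are distinct and that the data contain no missing values or nondetects.

```python
from itertools import combinations
from statistics import median


def pairwise_slopes(t, y):
    """Return the N = n(n-1)/2 slopes Q = (y_j - y_k)/(t_j - t_k), j > k, ranked ascending.

    Assumption: all times in t are distinct, so no denominator is zero.
    """
    if len(t) != len(y):
        raise ValueError("t and y must have the same length")
    slopes = [
        (y[j] - y[k]) / (t[j] - t[k])
        for k, j in combinations(range(len(t)), 2)  # every pair with j > k
    ]
    return sorted(slopes)


def theil_sen_slope(t, y):
    """Median of the ranked pairwise slopes.

    statistics.median returns the middle value when N is odd and the
    average of the two middle values when N is even, matching the two
    cases in the text.
    """
    return median(pairwise_slopes(t, y))


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    values = [1, 3, 5, 7, 100]  # last value is a deliberate outlier
    print(theil_sen_slope(times, values))
```

In this small example six of the ten pairwise slopes equal 2, so the median is unaffected by the single outlying measurement.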

Approximate Confidence Limits for the Nonparametric Slope Estimate \( \hat \beta_1 \)

Denote the upper and lower limits for the slope by \( UL \) and \( LL \), respectively. Then \( UL \) and \( LL \) are computed as follows:

\( UL \) is the \( U^{th} \) slope estimate in the ranked list, \( Q_{[U]} \), where

\begin{equation} U = 1 + (N + C_{\alpha})/2 \end{equation}

and

\begin{equation} C_{\alpha} = Z_{1- \alpha /2} [Var(S)]^{1/2} \end{equation}

where \( Z_{1- \alpha /2} \) is the \( 100(1- \alpha /2) \) percentile of the standard normal distribution and

\begin{equation} Var(S) = \frac{n(n-1)(2n+5)}{18} \end{equation}

\( LL \) is the \( L^{th} \) slope estimate in the ranked list, \( Q_{[L]} \), where

\begin{equation} L = (N - C_{\alpha})/2 \end{equation}

A \( 100(1 - \alpha )\)% confidence interval on the true slope is given by

\begin{equation} LL \leq \beta_1 \leq UL \end{equation}

This method assumes that \( n \) is large, say \( n > 10 \). Helsel and Hirsch (1995, page 273) provide a method for determining \( UL \) and \( LL \) when \( n \leq 10 \). VSP uses the approximate method above even when \( n \leq 10 \), but in that case it displays the warning message "The upper and lower limits on the estimated slope are approximate when \( n \leq 10 \)."
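The approximate limits can be sketched as follows. This is an illustration under stated assumptions rather than VSP's code: the function name `slope_confidence_limits` is hypothetical, the ranks \( L \) and \( U \) are rounded to the nearest integer and clamped to the range 1 to \( N \) (the text does not say how non-integer ranks are handled, and interpolating between adjacent ranked slopes is another option), and the formula for \( Var(S) \) is used as given, which does not adjust for tied values.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def slope_confidence_limits(t, y, alpha=0.05):
    """Approximate 100(1 - alpha)% limits (LL, UL) for the nonparametric slope."""
    n = len(t)
    # Ranked pairwise slopes Q_[1] <= ... <= Q_[N]; assumes distinct times.
    q = sorted((y[j] - y[k]) / (t[j] - t[k])
               for k, j in combinations(range(n), 2))
    big_n = len(q)

    var_s = n * (n - 1) * (2 * n + 5) / 18           # Var(S), no tie correction
    z = NormalDist().inv_cdf(1 - alpha / 2)          # Z_{1 - alpha/2}
    c_alpha = z * sqrt(var_s)

    lower_rank = (big_n - c_alpha) / 2               # L
    upper_rank = 1 + (big_n + c_alpha) / 2           # U

    # Assumption: round to the nearest integer rank and clamp to [1, N].
    l_idx = min(max(round(lower_rank), 1), big_n)
    u_idx = min(max(round(upper_rank), 1), big_n)

    # Ranks are 1-based in the text; Python lists are 0-based.
    return q[l_idx - 1], q[u_idx - 1]


if __name__ == "__main__":
    times = list(range(12))
    values = [2 * ti + (-1) ** ti for ti in times]   # slope 2 plus alternating noise
    print(slope_confidence_limits(times, values))
```

With small \( n \) the computed rank \( L \) can fall below 1 or \( U \) above \( N \), which is one reason the limits are only approximate in that case; the clamping step here is a convenience of the sketch.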

Nonparametric Intercept, \( \hat \beta_o \) , of the Linear Trend

VSP computes an estimate of the nonparametric intercept of the assumed linear trend as follows (Helsel and Hirsch 1995, page 267; Conover 1999, page 336):

\begin{equation} \hat \beta_o = y_{MED} - \hat \beta_1 \times t_{MED} \end{equation}

where

\( y_{MED} \) = median of the \( n \) measurements \( y_1 , y_2 , ... , y_n \)

\( t_{MED} \) = median of the \( n \) times \( t_1 , t_2, ... , t_n \)

and

\( \hat \beta_1 \) = nonparametric slope estimate.
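As a rough sketch of the intercept calculation, the following Python function computes the slope as the median of the pairwise slopes and then applies the intercept formula above. The function name `kendall_theil_line` is hypothetical, and distinct sampling times are assumed.

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(t, y):
    """Return (intercept, slope) of the nonparametric trend line.

    slope     = median of all pairwise slopes (assumes distinct times)
    intercept = median(y) - slope * median(t)
    """
    slope = median((y[j] - y[k]) / (t[j] - t[k])
                   for k, j in combinations(range(len(t)), 2))
    intercept = median(y) - slope * median(t)
    return intercept, slope


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    values = [1, 3, 5, 7, 100]  # last value is a deliberate outlier
    print(kendall_theil_line(times, values))
```

Because both the slope and the intercept are built from medians, the outlier in the last measurement does not move the fitted line in this example.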

Estimating the Nonparametric Trend Line

The estimated nonparametric trend line is

\begin{equation} y = \hat \beta_o + \hat \beta_1 \times t \end{equation}

or, after substituting the expression for \( \hat \beta_o \),

\begin{equation} y = y_{MED} + \hat \beta_1 (t - t_{MED}) \end{equation}

Hence the estimated value of \( y \) at time \( t_j \) is

\begin{equation} \hat y_j = y_{MED} + \hat \beta_1 (t_j - t_{MED} ) \end{equation}
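The fitted values can be computed directly from the two medians and a slope estimate. The sketch below is illustrative only: the function name `fitted_values` is hypothetical, and the slope is passed in as an argument so that the example stays independent of how it was obtained.

```python
from statistics import median


def fitted_values(t, y, slope):
    """Evaluate y_hat_j = y_MED + slope * (t_j - t_MED) at every sampling time."""
    t_med = median(t)
    y_med = median(y)
    return [y_med + slope * (tj - t_med) for tj in t]


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    values = [1, 3, 5, 7, 100]
    y_hat = fitted_values(times, values, slope=2.0)
    # Residuals about the line; a large residual flags a possible outlier.
    residuals = [yi - fi for yi, fi in zip(values, y_hat)]
    print(y_hat, residuals)
```

Written this way, the line always passes through the point \( (t_{MED}, y_{MED}) \), which is the design behind the median-based intercept.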

References:

Conover, W.J. 1999. Practical Nonparametric Statistics, 3rd edition, Wiley, New York.

Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring, Wiley, New York.

Helsel, D.R. and R.M. Hirsch. 1995. Statistical Methods in Water Resources, Elsevier, New York.

Sen, P.K. 1968. Estimates of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association 63:1379-1389.

Theil, H. 1950. A rank-invariant method of linear and polynomial regression analysis, 1, 2, and 3. Nederl. Akad. Wetensch. Proc. 53:386-392, 521-525, and 1397-1412.