Seasonal Kendall Test for Monotonic Trend

Background Information

The purpose of the Seasonal Kendall (SK) test (described in Hirsch, Slack and Smith 1982, Gilbert 1987, and Helsel and Hirsch 1995) is to test for a monotonic trend of the variable of interest when the data collected over time are expected to change in the same direction (up or down) for one or more seasons, e.g., months. A monotonic upward (downward) trend means that the variable consistently increases (decreases) over time, but the trend may or may not be linear. The presence of seasonality implies that the data have different distributions for different seasons (e.g., months) of the year. For example, a monotonic upward trend may exist over years for January but not for June.

The SK test is an extension of the Mann-Kendall (MK) test, which is also available in VSP. The MK test should be used when seasonality is not expected to be present or when trends occur in different directions (up or down) in different seasons. The SK test is a nonparametric (distribution-free) test; that is, it does not require that the data be normally distributed. The test can also be used when there are missing data and data less than one or more limits of detection (LD).

The SK test was proposed by Hirsch, Slack and Smith (1982) for use with 12 seasons (months). The SK test may also be used for other seasons, for example, the four quarters of the year, the three 8-hour periods of the day, or the 52 weeks of the year. VSP assumes that the method described below (using the standard normal distribution to test whether a trend is present based on the computed SK test statistic) is valid for any definition of season (week, month, 8-hour period, hour of the day) that may be used. Hirsch, Slack and Smith (1982) showed that it is appropriate to use the standard normal distribution to conduct the SK test for monthly data when there are 3 or more years of monthly data. For any combination of seasons and years, they also show how to determine the exact distribution of the SK test statistic rather than assuming that it is standard normal.

Assumptions

The following assumptions underlie the SK test:

1. When no trend is present the observations are not serially correlated over time.

2. The observations obtained over time are representative of the true conditions at sampling times.

3. The sample collection, handling, and measurement methods provide unbiased and representative observations of the underlying populations over time.

4. Any monotonic trends present are all in the same direction (up or down). If the trend is up in some seasons and down in other seasons, the SK test will be misleading.

5. The standard normal distribution may be used to evaluate if the computed SK test statistic indicates the existence of a monotonic trend over time.

There are no requirements that the measurements be normally distributed or that any monotonic trend, if present, be linear. Hirsch and Slack (1984) developed a modification of the SK test that can be used when serial correlation over time is present.

Notation

Following Hirsch, Slack and Smith (1982), let \(x_{ij}\) denote the datum obtained for season \(i\) in year \(j\), where a season may be a month, week, day, etc. Here, for illustration purposes, we assume the season is a month. Then define

\begin{equation} X = (X_1 , X_2 , ..., X_{12}) \end{equation}

to be the entire data set collected over years consisting of data subsets \(X_1\) through \(X_{12}\), where

\( X_1 = (x_{11}, x_{12}, ..., x_{1n_1})\) is the subset of January data for \(n_1\) years,

\( X_2 = (x_{21}, x_{22}, ..., x_{2n_2})\) is the subset of February data for \(n_2\) years,

\( \vdots \)

\( X_{12} = (x_{12,1}, x_{12,2}, ..., x_{12,n_{12}})\) is the subset of December data for \(n_{12}\) years.

The null hypothesis \(H_0\) for the SK test is that \(X\) (Equation 1) is a sample of independent random variables (\(x_{ij}\)) and that each \(X_i\) is a subsample of independent and identically distributed random variables over years. The alternative hypothesis \(H_a\) is that the \(X_i\) are not distributed identically over years. That is, there is an upward or downward monotonic trend over years for one or more months. Another way of stating the null hypothesis is that for each season (month), the data over years are randomly ordered. The alternative hypothesis is then that there is a monotonic trend in one or more seasons (months) (Hirsch and Slack, 1984).

There is no restriction that \(n_i = n_l\) for \( i \neq l\); that is, the number of years for which data are collected can be different for different months. Also, there need not be a data value for every month-year combination during the sampling period. However, it is assumed that there is only one datum, \(x_{ij}\), per month-year combination. If there are multiple data for a month, VSP assumes that the median of the multiple data for that month is used in the SK test. Kendall (1975) considered an alternative approach (not currently in VSP) where the multiple data for a month are considered to be ties. Kendall (1975) and Gilbert (1987) give the changes needed in computing the SK test for that case.
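The data layout described above can be sketched in a few lines of Python. This is only an illustration, not VSP's implementation: the function name `group_by_season`, the `(season, year, value)` record layout, and the sample values are all hypothetical. The sketch collapses multiple values for one month-year combination to their median and allows months to have different numbers of years.

```python
from collections import defaultdict
from statistics import median


def group_by_season(records):
    """Collapse (season, year, value) records to one value per season-year.

    Returns {season: [(year, value), ...]} with each list sorted by year.
    Multiple values for the same season and year are replaced by their median.
    """
    cells = defaultdict(list)
    for season, year, value in records:
        cells[(season, year)].append(value)

    by_season = defaultdict(list)
    for (season, year), values in cells.items():
        by_season[season].append((year, median(values)))

    return {season: sorted(pairs) for season, pairs in by_season.items()}


# Hypothetical records: (month, year, measurement)
records = [
    (1, 2001, 4.0), (1, 2001, 6.0),  # two January 2001 values, median 5.0
    (1, 2002, 7.0),
    (6, 2001, 3.0), (6, 2003, 2.5),  # no June 2002 value; that is allowed
]
seasons = group_by_season(records)
```

Each entry of `seasons` then plays the role of one subset \(X_i\), with its own number of years \(n_i\).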

Method of Computing the Seasonal Kendall Test

The SK test is conducted as described in the following steps. Note that the procedure begins by computing the Mann-Kendall (MK) test separately for each month.

1. List the data obtained for the \(i^{th}\) month in the order in which they were collected over time, \(x_{i1}, x_{i2}, ..., x_{in_i}\), which denote the measurements obtained for month \(i\) for years 1, 2, …, \(n_i\), respectively.

2. Determine the sign of all \(n_i(n_i - 1)/2 \) possible differences \( x_{ij} - x_{ik}\) for the \(i^{th}\) month, where \(j > k\). These differences are
$$ x_{i2} - x_{i1}, x_{i3} - x_{i1}, ..., x_{in_i} - x_{i1}, x_{i3} - x_{i2}, x_{i4} - x_{i2}, ..., x_{in_i} - x_{i,n_i-2}, x_{in_i} - x_{i,n_i-1} $$

3. Let \(sgn( x_{ij} - x_{ik})\) be the indicator function for month \(i\). This function takes on the values 1, 0, or -1 according to the sign of \(x_{ij} - x_{ik} \); that is,

$$ sgn( x_{ij} - x_{ik}) = \begin{cases} 1 & \text{if } x_{ij} - x_{ik} > 0 \\ 0 & \text{if } x_{ij} - x_{ik} = 0 \text{, or if the sign of } x_{ij} - x_{ik} \text{ cannot be determined due to non-detects} \\ -1 & \text{if } x_{ij} - x_{ik} < 0 \end{cases} $$

For example, if \(x_{ij} - x_{ik} > 0\), the observation for year \(j\) in month \(i\), denoted by \(x_{ij}\), is greater than the observation for year \(k\) in month \(i\), denoted by \(x_{ik}\).

4. Compute

\begin{equation} \large S_i = \displaystyle\sum_{k=1}^{n_i - 1}  \displaystyle\sum_{j=k+1}^{n_i} sgn(x_{ij} - x_{ik}) \end{equation}

which is the number of positive differences minus the number of negative differences for the \(i^{th}\) month. If \(S_i\) is a positive number, observations made in month \(i\) in later years tend to be larger than those made in month \(i\) in earlier years. If \(S_i\) is a negative number, then observations made in month \(i\) in later years tend to be smaller than those made in month \(i\) in earlier years.
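As a minimal sketch of Equation 2, the Python below counts positive minus negative differences for one month. The function names `sgn` and `mk_s` and the four-year series are hypothetical, and the sketch assumes every value is a detected number; it does not handle the non-detect case in which the sign of a difference cannot be determined.

```python
def sgn(difference):
    """Return 1, 0, or -1 according to the sign of the difference."""
    return (difference > 0) - (difference < 0)


def mk_s(series):
    """Compute S_i for one season's values listed in time order.

    Sums sgn(x_ij - x_ik) over all pairs with j > k.
    """
    n = len(series)
    return sum(
        sgn(series[j] - series[k])
        for k in range(n - 1)
        for j in range(k + 1, n)
    )


# Hypothetical values for one month over four years
example = [4, 6, 5, 8]
s_i = mk_s(example)
```

For this series the six differences are 2, 1, 4, -1, 2, and 3, so there are five positive differences and one negative difference, giving \(S_i = 4\).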

5. Compute the variance of \(S_i\) as follows:

\begin{equation} \large VAR( S_i) = \frac{1}{18} \left[ n_i (n_i - 1)(2n_i + 5) - \displaystyle\sum_{p=1}^{g_i}t_{ip} (t_{ip} - 1)(2t_{ip} + 5) \right] \end{equation}

where \(g_i\) is the number of tied groups for the \(i^{th}\) month and \(t_{ip}\) is the number of data in the \(p^{th}\) group for the \(i^{th}\) month. For example, if the sequence of measurements over 9 years for the \(i^{th}\) month is {23, 24, 29, 6, 29, 24, 24, 29, 23} we have \(g_i\) = 3 tied groups for that month: \(t_{i1} = 2\) for the tied value 23, \(t_{i2} = 3\) for the tied value 24, and \(t_{i3} = 3\) for the tied value 29. When there are ties in the data due to equal values or non-detects, the variance is adjusted by a tie correction method described in Helsel (2005, p. 191).
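Equation 3 can be checked against the nine-year example above with a short Python sketch. The function name `mk_variance` is hypothetical, and the sketch covers only ties among equal numeric values; it is not the non-detect adjustment from Helsel (2005).

```python
from collections import Counter


def mk_variance(series):
    """Compute VAR(S_i) with the tie correction for equal values."""
    n = len(series)
    # Sizes t_ip of the tied groups: values that occur more than once
    tie_sizes = [count for count in Counter(series).values() if count > 1]
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18


series = [23, 24, 29, 6, 29, 24, 24, 29, 23]
var_s = mk_variance(series)
```

Here \(n_i = 9\) gives \(9 \cdot 8 \cdot 23 = 1656\), and the three tied groups contribute \(18 + 66 + 66 = 150\), so \(VAR(S_i) = (1656 - 150)/18 \approx 83.67\).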

6. Compute

\begin{equation} S' = \displaystyle\sum_{i=1}^m S_i \end{equation}

and

\begin{equation} VAR(S') = \displaystyle\sum_{i=1}^m VAR(S_i) \end{equation}

where \(m\) is the number of months for which data have been obtained over years. For example, if data were obtained over years for each month of the year, then \(m\) = 12. If season is a week and data were obtained over years for all 52 weeks, then \(m\) = 52.

7. Compute the SK test statistic \(Z_{SK}\), as follows:

$$ Z_{SK} = \begin{cases} \dfrac{S' - 1}{\sqrt{VAR(S')}} & \text{if } S' > 0 \\ 0 & \text{if } S' = 0 \\ \dfrac{S' + 1}{\sqrt{VAR(S')}} & \text{if } S' < 0 \end{cases} $$

A positive (negative) value of \(Z_{SK}\) indicates that the data tend to increase (decrease) over time.
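Steps 6 and 7 amount to two sums and one piecewise formula, sketched below in Python. The function name `sk_statistic` and the per-season inputs are hypothetical; the variances are those of three seasons with four untied years each, \(4 \cdot 3 \cdot 13 / 18 = 26/3\).

```python
import math


def sk_statistic(s_values, variances):
    """Combine per-season S_i and VAR(S_i) into the SK statistic Z_SK."""
    s_total = sum(s_values)      # S'
    var_total = sum(variances)   # VAR(S')
    if s_total > 0:
        return (s_total - 1) / math.sqrt(var_total)
    if s_total < 0:
        return (s_total + 1) / math.sqrt(var_total)
    return 0.0


# Hypothetical per-season results for m = 3 seasons
s_values = [4, 2, -1]
variances = [26 / 3, 26 / 3, 26 / 3]
z_sk = sk_statistic(s_values, variances)
```

With these inputs \(S' = 5\) and \(VAR(S') = 26\), so \(Z_{SK} = 4/\sqrt{26}\), a positive value consistent with a tendency to increase over time.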

8. If the null hypothesis for the SK test is

\(H_0\): No monotonic trend over time

and the alternative hypothesis is

\(H_a\): For one or more seasons there is an upward monotonic trend over time

then \(H_0\) is rejected and \(H_a\) is accepted if \(Z_{SK} \geq Z_{1 - \alpha}\), where \(Z_{1 - \alpha} \) is the \( 100(1 - \alpha )^{th}\) percentile of the standard normal distribution. Here \(\alpha\) is the probability, chosen by the stakeholders as tolerable, that the SK test will falsely reject the null hypothesis.

9. If the null hypothesis is

\(H_0\): No monotonic trend over time

and the alternative hypothesis is

\(H_a\): For one or more seasons there is a downward monotonic trend over time,

then \(H_0\) is rejected and \(H_a\) is accepted if \( Z_{SK} \leq -Z_{1 - \alpha}\).

10. If the null hypothesis is

\(H_0\): No monotonic trend over time

and the alternative hypothesis is

\(H_a\): For one or more seasons there is an upward or downward monotonic trend over time,

then \(H_0\) is rejected and \(H_a\) is accepted if \( |Z_{SK}| \geq Z_{1 - \alpha /2} \), where the vertical bars denote absolute value.
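The three decision rules in steps 8 through 10 can be sketched with the standard normal percentile function from Python's standard library. The function name `sk_reject` and the labels "upward", "downward", and "two-sided" are hypothetical names chosen for this illustration, not VSP settings.

```python
from statistics import NormalDist


def sk_reject(z_sk, alpha, alternative):
    """Return True if H0 (no monotonic trend) is rejected at level alpha."""
    normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "upward":
        return z_sk >= normal.inv_cdf(1 - alpha)
    if alternative == "downward":
        return z_sk <= -normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z_sk) >= normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'upward', 'downward', or 'two-sided'")


# Hypothetical statistic tested at alpha = 0.05
reject_upward = sk_reject(1.7, 0.05, "upward")
reject_two_sided = sk_reject(1.7, 0.05, "two-sided")
```

For \(\alpha = 0.05\) the one-sided critical value is about 1.645 and the two-sided critical value is about 1.960, so a statistic of 1.7 rejects \(H_0\) against the upward alternative but not against the two-sided alternative.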

References:

Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring. Wiley, NY.

Helsel, D.R. and R.M. Hirsch. 1995. Statistical Methods in Water Resources. Elsevier, NY.

Hirsch, R.M. and J.R. Slack. 1984. A nonparametric trend test for seasonal data with serial dependence. Water Resources Research 20(6):727-732.

Hirsch, R.M., J.R. Slack and R.A. Smith. 1982. Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1):107-121.

The Mann-Kendall dialog contains the following controls:

Alternative Hypothesis

False Rejection Rate (Alpha)

False Acceptance Rate (Beta)

Change that is Important to Detect Trend Time Period

Data will be sampled every Sample Collection Period

Standard Deviation of Residuals

Calculate Button

Data Analysis page

Data Entry sub-page

Summary Statistics sub-page

Tests sub-page

Plots sub-page