LOWESS Smoothing of Trend Data

$image\LOWESS.gif$

For LOWESS (Locally Weighted Scatterplot Smoothing), VSP uses the method developed by William S. Cleveland (1979). The method, as implemented by VSP, proceeds as follows:

1. The neighborhood size, $r$, is computed as $f \times N$, where $N$ is the number of data points and $f$ is the neighborhood scaling factor, $0 < f \leq 1$. (Note: VSP uses a default value of $f = 0.5$.)

2. A fitted value, $\hat y_i$, is obtained for each data point $(x_i, y_i)$, $i = 1..N$, using locally weighted regression:

A weight, $W_j$, $j = 1..N$, is calculated for each data point using the formula:

$$ W_j = Tri \left( \frac{x_j - x_i}{h} \right) $$

Where:

$h$ is the $r^{th}$ smallest distance $|x_j - x_i|$, $j = 1..N$ ,

and $Tri(x)$ is the TriCube function:

$$ Tri(x) = ( 1 - |x|^3)^3 , \quad \text{for } |x| < 1 $$

$$ Tri(x) = 0 , \quad \text{for } |x| \geq 1 $$

The fitted value is computed as:

$$ \hat y_i = x_i \times slope + intercept \text{,} $$

Where $slope$ and $intercept$ are the results of weighted linear regression using the weights $W_1..W_N$.
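
The locally weighted fit in step 2 can be sketched in Python as follows. This is a minimal illustration, not VSP's actual code: the function names (`tricube`, `weighted_line`, `local_fit`) are hypothetical, and rounding $f \times N$ up to an integer of at least 2 is an assumption, since the rounding rule is not stated here. The handling of degenerate neighborhoods (zero bandwidth, a single weighted point) is likewise an assumption.

```python
import math

def tricube(u):
    """TriCube function: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0

def weighted_line(x, y, w):
    """Weighted least-squares line; returns (slope, intercept).

    Assumes at least one weight is positive.
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        # Assumption: only one distinct weighted x, so use a flat line.
        return 0.0, y_bar
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def local_fit(x, y, x_i, f=0.5):
    """Fitted value at x_i from a tricube-weighted linear regression."""
    n = len(x)
    # Assumption: r = f * N is rounded up and kept between 2 and N.
    r = min(n, max(2, math.ceil(f * n)))
    # h is the r-th smallest distance |x_j - x_i|.
    h = sorted(abs(xj - x_i) for xj in x)[r - 1]
    if h == 0.0:
        # Assumption: with zero bandwidth, use only points equal to x_i.
        w = [1.0 if xj == x_i else 0.0 for xj in x]
    else:
        w = [tricube((xj - x_i) / h) for xj in x]
    slope, intercept = weighted_line(x, y, w)
    return x_i * slope + intercept

# Data that lie exactly on y = 2x + 1 are reproduced by the local fits.
x = [float(k) for k in range(10)]
y = [2.0 * xk + 1.0 for xk in x]
fitted = [local_fit(x, y, xi) for xi in x]
```

Because the $r^{th}$ smallest distance itself maps to $Tri(1) = 0$, the point that defines $h$ receives zero weight, so fewer than $r$ points actually contribute to each regression.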

3. A robustness weight, $\delta_i$, $i = 1..N$, is computed for each data point using the formula:

$$ \delta_i = Bi \left( \frac{|y_i - \hat y_i|}{6s} \right) $$

Where:

$s$ is the median of the residual values $| y_i - \hat y_i | \text{,} \quad i = 1.. N$,

and $Bi(x)$ is the BiSquare function:

$$ Bi(x) = (1 - x^2)^2 , \quad \text{for } |x| < 1 $$

$$ Bi(x) = 0 , \quad \text{for } |x| \geq 1 $$
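
As an illustration of step 3, the sketch below computes robustness weights from a set of observed and fitted values. The function names are hypothetical, the sample numbers are made up for demonstration, and the handling of a zero median residual is an assumption, since that case is not described here.

```python
from statistics import median

def bisquare(u):
    """BiSquare function: (1 - u^2)^2 for |u| < 1, otherwise 0."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def robustness_weights(y, y_hat):
    """Robustness weight for each point from its absolute residual."""
    abs_res = [abs(yi - fi) for yi, fi in zip(y, y_hat)]
    s = median(abs_res)  # median of the absolute residuals
    if s == 0.0:
        # Assumption: if the median residual is zero, keep only the
        # points that are fitted exactly.
        return [1.0 if a == 0.0 else 0.0 for a in abs_res]
    return [bisquare(a / (6.0 * s)) for a in abs_res]

# Illustrative values only: the last point is far from its fitted value.
y = [1.0, 2.0, 3.0, 4.0, 50.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
delta = robustness_weights(y, y_hat)
```

Here the last point has a residual much larger than $6s$, so its robustness weight is 0 and it drops out of the next regression, while the well-fitted points keep weights close to 1.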

4. A new fitted value, $ \hat y_i \text{,} \quad i = 1.. N$, is obtained for each data point using weighted regression based on the robustness weights:

A weight, $W_j \text{,} \, j = 1.. N$, is calculated for each data point using the formula:

$$ W_j = Tri \left( \frac{x_j - x_i}{h} \right) \delta_j $$

Where:

$h$ is the $r^{th}$ smallest distance $|x_j - x_i| \text{,} \quad j = 1.. N$,

$\delta_j$ is the robustness weight defined above,

and $Tri(x)$ is the TriCube function defined above.

The fitted value is computed as:

$$ \hat y_i = x_i \times slope + intercept \text{,} $$

Where $slope$ and $intercept$ are the results of weighted linear regression using the weights $W_1..W_N$.

5. Steps 3 and 4 are performed a total of $t$ times. (Note: VSP uses a default value of $t = 2$.)
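
The iteration described in steps 2 through 5 can be sketched end to end as shown below. This is a hedged, self-contained illustration rather than VSP's implementation: all function names are hypothetical, the rounding of $f \times N$ is an assumption, and so are the two fallbacks (stopping when the median residual is zero, and ignoring the robustness weights if they cancel every point in a neighborhood). The test data, a noisy line with one injected outlier, are synthetic.

```python
import math
from statistics import median

def tricube(u):
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0

def bisquare(u):
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def weighted_line(x, y, w):
    """Weighted least-squares line; returns (slope, intercept)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return 0.0, y_bar  # assumption: flat line for a degenerate fit
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def fit_at(x, y, x0, r, delta):
    """Fitted value at the data point x0 using weights Tri(...) * delta_j."""
    h = sorted(abs(xj - x0) for xj in x)[r - 1]
    tri = [tricube((xj - x0) / h) if h > 0.0 else float(xj == x0) for xj in x]
    w = [ti * dj for ti, dj in zip(tri, delta)]
    if sum(w) == 0.0:
        w = tri  # assumption: fall back to the tricube weights alone
    slope, intercept = weighted_line(x, y, w)
    return x0 * slope + intercept

def lowess_fitted(x, y, f=0.5, t=2):
    """Return (fitted values, robustness weights) after t robust passes."""
    n = len(x)
    r = min(n, max(2, math.ceil(f * n)))  # assumption: round f * N up
    delta = [1.0] * n
    y_hat = [fit_at(x, y, xi, r, delta) for xi in x]      # step 2
    for _ in range(t):                                     # step 5
        abs_res = [abs(yi - fi) for yi, fi in zip(y, y_hat)]
        s = median(abs_res)
        if s == 0.0:
            break  # assumption: nothing left to reweight
        delta = [bisquare(a / (6.0 * s)) for a in abs_res]  # step 3
        y_hat = [fit_at(x, y, xi, r, delta) for xi in x]    # step 4
    return y_hat, delta

# Synthetic data: y = 2x + 1 plus a small wiggle, with one outlier at x = 20.
x = [float(k) for k in range(40)]
y = [2.0 * xk + 1.0 + 0.5 * math.sin(3.0 * xk) for xk in x]
y[20] += 30.0
plain, _ = lowess_fitted(x, y, t=0)       # no robustness passes
robust, delta = lowess_fitted(x, y, t=2)  # two robustness passes
```

Comparing `plain[20]` with `robust[20]` shows the purpose of the robustness passes: the outlier pulls the unweighted local fit upward, while the reweighted fit stays close to the underlying line.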

6. The output points $( x_i' , y_i') \text{,} \quad i = 1.. n$, are computed as follows:

$$ x_i' = x_{min} + (i-1) \left(\frac{x_{max} - x_{min}}{n - 1} \right) $$

Where:

$n$ is the number of output points,

$x_{min}$ is the minimum date in the data set,

and $x_{max}$ is the maximum date in the data set.

A weight, $W_j \text{,} \, j = 1.. N$, is calculated for each data point using the formula:

$$ W_j = Tri \left( \frac{x_j - x_i'}{h} \right) \delta_j $$

Where:

$h$ is the $r^{th}$ smallest distance $|x_j - x_i'| \text{,} \quad j = 1.. N$,

$ \delta_j$ is the robustness weight defined above,

and $Tri(x)$ is the TriCube function defined above.

The fitted value is computed as:

$$ y_i' = x_i' \times slope + intercept \text{,} $$

Where $slope$ and $intercept$ are the results of weighted linear regression using the weights $W_1..W_N$.

The output points ($x_i',y_i'$) are plotted as the line for the visual display.
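
A short sketch of step 6 follows: it builds the evenly spaced output positions $x_i'$ and evaluates a weighted regression at each one. As before, the function names are hypothetical and this is not VSP's code. The robustness weights `delta` are taken as an input (all 1.0 in the example), the rounding of $f \times N$ is an assumption, and the sketch assumes `n_out` is at least 2 and that each neighborhood contains at least one point with positive weight.

```python
import math

def tricube(u):
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0

def weighted_line(x, y, w):
    """Weighted least-squares line; returns (slope, intercept)."""
    sw = sum(w)  # assumed positive
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return 0.0, y_bar  # assumption: flat line for a degenerate fit
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def output_points(x, y, delta, n_out, f=0.5):
    """Return the n_out plotted points (x_i', y_i'); requires n_out >= 2."""
    n = len(x)
    r = min(n, max(2, math.ceil(f * n)))  # assumption: round f * N up
    x_min, x_max = min(x), max(x)
    step = (x_max - x_min) / (n_out - 1)
    points = []
    for i in range(1, n_out + 1):
        xp = x_min + (i - 1) * step
        # h is measured from the output position, not from a data point.
        h = sorted(abs(xj - xp) for xj in x)[r - 1]
        if h == 0.0:
            tri = [1.0 if xj == xp else 0.0 for xj in x]
        else:
            tri = [tricube((xj - xp) / h) for xj in x]
        w = [ti * dj for ti, dj in zip(tri, delta)]
        slope, intercept = weighted_line(x, y, w)
        points.append((xp, xp * slope + intercept))
    return points

# Data on the line y = 3x - 2, with every robustness weight equal to 1.
x = [float(k) for k in range(10)]
y = [3.0 * xk - 2.0 for xk in x]
pts = output_points(x, y, [1.0] * len(x), n_out=4)
```

With these inputs the output positions are 0, 3, 6, and 9, and each $y_i'$ falls on the line, since a weighted linear regression reproduces exactly linear data.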

Reference:

William S. Cleveland. 1979. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, Vol. 74, No. 368, pp. 829-836.