Trend Regression Calculations

The Trend tool in EQuIS Professional and EQuIS LakeWatch reports performs regression calculations to identify trends in seasonal data. On the bottom chart, the observed and residual data are plotted against time, and a straight line is fitted to each data set using ordinary least squares regression. A p-value is then calculated for each fitted line. A low p-value means that there is a low probability of observing a trend at least as large as the one calculated when there is no trend in the data; in other words, the fitted trend is unlikely to be attributable to chance alone. This article explains how each of the values is calculated.

Two separate regressions are calculated and displayed on the chart: the data regression (fitted to the actual seasonalized data points) and the residuals regression (fitted to the differences between the actual seasonalized data points and the best-fit polynomial curve). The input to both regressions is the set of independent and dependent value pairs (x, y), where the independent value (x) is a numeric representation of the date and the dependent value (y) is the observed value (actual or residual) on that date.
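As a rough illustration of how the (x, y) inputs might be assembled, the Python sketch below converts dates to day counts and subtracts a fitted curve to obtain residuals. The day-count encoding, the function names, and the flat placeholder curve are assumptions made for this example only; this article does not specify how EQuIS encodes dates or fits the polynomial.

```python
from datetime import date

def date_to_x(d):
    # Assumption: encode a date as a plain day count (Python's ordinal day number).
    return float(d.toordinal())

def build_pairs(dates, values):
    # Pair each numeric date (x) with its observed value (y).
    return [(date_to_x(d), float(v)) for d, v in zip(dates, values)]

def residual_pairs(pairs, curve):
    # curve is a hypothetical callable x -> fitted value, standing in for
    # the best-fit polynomial curve; the residual is observed minus fitted.
    return [(x, y - curve(x)) for x, y in pairs]

dates = [date(2020, 1, 15), date(2020, 4, 15), date(2020, 7, 15)]
values = [4.0, 5.5, 5.0]

data_pairs = build_pairs(dates, values)
resid_pairs = residual_pairs(data_pairs, lambda x: 5.0)  # placeholder flat curve
```

Any day-based encoding that differs only by a fixed offset or a positive scale factor gives the same correlation coefficient, t-statistic, and p-value; only the units of the slope and the value of the intercept change.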

The simple linear regression of the form y = b + mx is computed using ordinary least squares estimation as defined below.

•Sx = ∑ xᵢ = The sum of all x (independent) values in the set.

•Sy = ∑ yᵢ = The sum of all y (dependent) values in the set.

•Sxy = ∑ xᵢyᵢ = The sum of the products (x*y) of each data point in the set.

•Sxx = ∑ xᵢ² = The sum of the squares (x*x) of each x (independent) value in the set.

•Syy = ∑ yᵢ² = The sum of the squares (y*y) of each y (dependent) value in the set.

•n = The number of data points in the set.

•b = ( SySxx − SxSxy ) / ( nSxx − SxSx ) = The intercept constant of the linear regression (compare to Excel's INTERCEPT and LINEST functions).

•m = ( nSxy − SxSy ) / ( nSxx − SxSx ) = The slope coefficient of the linear regression (compare to Excel's SLOPE and LINEST functions).

•R = ( nSxy − SxSy ) / √((nSxx − SxSx) * (nSyy − SySy)) = The Pearson correlation coefficient indicating goodness-of-fit (compare to Excel's PEARSON function). The square of this value (R-squared) is the coefficient of determination (compare to Excel's LINEST function).

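•SEr² = ( Syy − bSy − mSxy ) / ( n − 2 ) = The square of the standard error of the regression (the residual variance). SEr is the square root of this value.

•SEm = SEr / √( Sxx − SxSx/n ) = The standard error of the slope coefficient (compare to Excel's LINEST function).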

•t = m / SEm = The t-statistic representing the ratio of the slope coefficient and its standard error.

•df = n − 2 = The degrees of freedom used in calculating the p-value.

•p = The two-tailed p-value from the Student's t-distribution, based on the absolute value of the t-statistic and the degrees of freedom (n − 2). Compare to Excel's T.DIST.2T function, or the legacy TDIST function with two tails.
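The definitions above can be sketched in Python as follows. This is a minimal illustration rather than the EQuIS implementation: the function names, the returned dictionary, and the sample data are hypothetical, and the two-tailed probability is approximated here by numerical integration (Simpson's rule) as a stand-in for a statistics library. Degenerate inputs, such as identical x values or a perfect fit, are not handled.

```python
import math

def two_tailed_t_probability(t, df, steps=1000):
    # Two-tailed Student t probability for |t| with df degrees of freedom.
    # Substituting u = sqrt(df) * tan(theta) turns the tail integral of the
    # t density into a bounded integral of cos(theta) ** (df - 1).
    lo = math.atan(abs(t) / math.sqrt(df))
    hi = math.pi / 2
    h = (hi - lo) / steps  # steps must be even for Simpson's rule

    def f(theta):
        return math.cos(theta) ** (df - 1)

    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    integral = total * h / 3
    # Normalizing constant: Gamma((df + 1) / 2) / (sqrt(pi) * Gamma(df / 2))
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi)
    return 2 * math.exp(log_c) * integral

def trend_regression(pairs):
    # pairs is a list of (x, y) tuples; at least 3 are needed so that df > 0.
    n = len(pairs)
    if n < 3:
        raise ValueError("at least 3 data points are required")
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)

    denom = n * sxx - sx * sx
    b = (sy * sxx - sx * sxy) / denom                # intercept
    m = (n * sxy - sx * sy) / denom                  # slope
    r = (n * sxy - sx * sy) / math.sqrt(denom * (n * syy - sy * sy))
    ser2 = (syy - b * sy - m * sxy) / (n - 2)        # residual variance
    sem = math.sqrt(ser2) / math.sqrt(sxx - sx * sx / n)
    t = m / sem
    df = n - 2
    p = two_tailed_t_probability(t, df)
    return {"b": b, "m": m, "R": r, "SEm": sem, "t": t, "df": df, "p": p}

# Hypothetical sample data, chosen only to make the arithmetic easy to follow.
result = trend_regression([(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)])
print("m={m:.3f} b={b:.3f} R={R:.3f} t={t:.3f} p={p:.3f}".format(**result))
# prints: m=0.600 b=2.200 R=0.775 t=2.121 p=0.124
```

Because these formulas work from raw sums, large x values (such as day counts measured from a distant epoch) can lose precision to rounding. Subtracting a constant from every x, for example the first sample date, leaves m, R, t, and p unchanged and alters only the intercept b.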