This page is intended to describe how the long-term water-level trends are initially identified and quantified, and how they are subsequently used.
As discussed in the trend detection section of this page, the primary testing for trends in this study is conducted using regression analysis and the nonparametric Mann-Kendall test (Kendall 1938), which is commonly used for hydrologic data analysis (Hirsch and Slack 1984, Helsel and Hirsch 1992). However, a modified version of the standard test is required for water-elevation data that have a significant seasonal component.
Hirsch and others (1982) developed such a test by computing the Kendall score separately for each month. The separate monthly scores are then summed to obtain the test statistic. The variance of the test statistic is obtained by summing the variances of the Kendall score statistic for each month. In this test, the null hypothesis is that the time series is of the form z_t = µ_m + e_t, where e_t is white-noise error and µ_m represents the mean for period m.
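In symbols, the statistic described above can be sketched as follows. The notation is assumed here for illustration and does not come from the original page: x_{ij} is the value for month i in year j, and the inner sum runs over all pairs of years with the later year k compared to the earlier year j.

```latex
S_i = \sum_{j<k} \operatorname{sgn}\left(x_{ik} - x_{ij}\right), \qquad
S = \sum_{i=1}^{12} S_i, \qquad
\operatorname{Var}(S) = \sum_{i=1}^{12} \operatorname{Var}(S_i)
```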
Among the advantages of the Seasonal Kendall trend test is that it is a rank-based procedure especially suitable for non-normally distributed data, censored data, data containing outliers, and non-linear trends. The null hypothesis of randomness, H0, states that the data (x_1, ..., x_n) are a sample of n independent and identically distributed random variables. The trend test statistic Z is used as a measure of trend significance; it is not a direct quantification of trend magnitude.
The methods used for calculating the Seasonal Kendall trend test statistic and the slope estimator are derived from the discussion of these methods and their limitations in McBride (2000).
Classify a time-series data set by month of the year.
For the purposes of this project, daily maximum water levels are grouped by month of observation, and the mean of these values is computed. The resulting monthly mean daily maximum values are then sorted by month of the year.
For each month i:
Compute the sign of all possible value differences within the set of values for that month, sign(value_m - value_n), where value_m is from a year that is later than value_n. For example, all October values are compared to each other, but not to any November or September values, and the October 1978 value would be subtracted from the October 1981 value, but not vice versa. For 5 years of data, there would be 10 pairs, for 6 years of data 15 pairs, and so forth.
Because the published accuracy of the historical ground-water elevation data is +/- 0.05 ft, differences between -0.05 and 0.05 are considered to be 0.
Convert the positive signs to +1, negative signs to -1, and 0 results to 0. Then add the results and call that S_i.
Compute the variance of S_i, Var(S_i), from [n(n - 1)(2n + 5) - Σ t_ip(t_ip - 1)(2t_ip + 5)] / 18, where n is the number of (monthly) values in the set, t_ip is the number of tied data in the pth tied group for the ith month, and the summation is over the tied groups for that month.
For computational purposes, group size is considered to be 2 (the number of terms in the initial difference comparison). This results in a higher variance where multiple values are tied at the same level, for example, by treating three equivalent values as 2 ties of two members instead of 1 tie of three values. The higher variance, in turn, results in a lower test statistic and, consequently, an overall stricter test for overturning the null hypothesis (that there is no monotonic trend in the data). Because each tied group of size 2 contributes 2(2 - 1)(2*2 + 5) = 18 to the summation, the resulting variance computation is the simpler [n(n - 1)(2n + 5) - 18*t_ip] / 18.
Compute S as the sum of the S_i series over all months.
Compute the variance of S by summing the variance Var(S_i) over all months.
Compute the test statistic (Z_SK) from the large-sample normal approximation, with a continuity correction of one unit, where:
Z_SK = (S - 1) / Var(S)^0.5 if S > 0
Z_SK = 0 if S = 0
Z_SK = (S + 1) / Var(S)^0.5 if S < 0
Consider the null hypothesis to be invalid if |Z_SK| > Z_(a/2), where a is the chosen significance level and Z_(a/2) is the value of the abscissa that cuts off an area = a/2 in the right tail of the unit normal distribution.
For the purposes of the project, we are looking for 5 percent of the pairs of monthly mean values compared to be different by 0.05 ft or more, with a minimum of two pairs that are different.
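The steps above can be sketched in Python as follows. This is a minimal illustration, not the project's SAS or Perl code. The function names and the (year, month, value) input layout are hypothetical, and the simplified variance is read here as subtracting 18 for every tied pair within a month, which is one interpretation of the group-size-2 rule described above.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist


def sign_with_tolerance(diff, tol=0.05):
    """Return +1, -1, or 0; differences smaller than tol (ft) count as ties."""
    if diff >= tol:
        return 1
    if diff <= -tol:
        return -1
    return 0


def seasonal_kendall(monthly_means, tol=0.05):
    """monthly_means: iterable of (year, month, value). Returns (S, Var(S), Z)."""
    by_month = defaultdict(list)
    for year, month, value in monthly_means:
        by_month[month].append((year, value))

    s_total = 0
    var_total = 0.0
    for series in by_month.values():
        series.sort()  # earlier years first
        n = len(series)
        s_i = 0
        tied_pairs = 0
        for a in range(n - 1):
            for b in range(a + 1, n):
                # later-year value minus earlier-year value, same month only
                sgn = sign_with_tolerance(series[b][1] - series[a][1], tol)
                s_i += sgn
                if sgn == 0:
                    tied_pairs += 1
        s_total += s_i
        # assumed reading of the simplified variance: 18 per tied pair
        var_total += (n * (n - 1) * (2 * n + 5) - 18 * tied_pairs) / 18

    if s_total > 0:
        z = (s_total - 1) / sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / sqrt(var_total)
    else:
        z = 0.0
    return s_total, var_total, z


def critical_value(alpha=0.05):
    """Abscissa cutting off alpha/2 in the right tail of the unit normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)
```

With this sketch, the null hypothesis would be rejected when abs(z) exceeds critical_value(a) for the chosen significance level a.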
While the Seasonal Kendall trend test is being performed, record the slope of each pair, where slope = (value_m - value_n) / (m - n).
Sort the slopes in ascending order and take the median as the estimate. In the case of an even number of slopes, the mean of the two middle values is used. For example, if there are 100 slope values, the average of the 50th and 51st values is used.
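A minimal Python sketch of the slope estimator follows. It assumes that m and n in the slope formula are the years of the two values being compared, and it uses a hypothetical (year, month, value) input layout. The standard-library median already averages the two middle values when the count is even.

```python
from collections import defaultdict
from statistics import median


def seasonal_kendall_slope(monthly_means):
    """Median of all within-month pairwise slopes; None if no pairs exist.

    monthly_means: iterable of (year, month, value).
    """
    by_month = defaultdict(list)
    for year, month, value in monthly_means:
        by_month[month].append((year, value))

    slopes = []
    for series in by_month.values():
        series.sort()  # earlier years first
        for a in range(len(series) - 1):
            for b in range(a + 1, len(series)):
                (year_n, value_n), (year_m, value_m) = series[a], series[b]
                if year_m != year_n:  # guard against duplicate years
                    slopes.append((value_m - value_n) / (year_m - year_n))
    return median(slopes) if slopes else None
```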
Another statistical test carried out on the time-series water-level data was the Mann-Kendall trend test for correlation. The annual mean daily maximum water level was chosen for testing of correlation with time. Like the Seasonal Kendall trend test (SKTT), this test is best suited to monotonic trends. Because annual means are used, the test is somewhat less sensitive to very small amplitude trends, but it is also less sensitive to seasonal effects. Some sensitivity to extreme events does pose a potential problem given the smaller sample size used, compared to the SKTT.
Helsel and Hirsch (1992) provide further information on the Mann-Kendall trend test.
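For illustration, the Mann-Kendall test on annual means can be sketched in plain Python. The function name and inputs are hypothetical. The sketch computes Kendall's tau-b and a two-sided p-value from the large-sample normal approximation with a continuity correction, so the p-value may differ somewhat from what SAS reports.

```python
from collections import Counter
from math import erfc, sqrt


def mann_kendall(years, annual_means):
    """Return (S, tau_b, p_value) for annual_means ordered by years."""
    pairs = sorted(zip(years, annual_means))
    n = len(pairs)
    s = 0
    for a in range(n - 1):
        for b in range(a + 1, n):
            diff = pairs[b][1] - pairs[a][1]
            s += (diff > 0) - (diff < 0)

    # tie groups in time (normally none) and in the water-level values
    t_groups = Counter(y for y, _ in pairs).values()
    v_groups = Counter(v for _, v in pairs).values()
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in t_groups)
    n2 = sum(t * (t - 1) / 2 for t in v_groups)
    denom = sqrt((n0 - n1) * (n0 - n2))
    tau_b = s / denom if denom > 0 else 0.0

    # variance of S with the standard tie correction for the values
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in v_groups)) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail area
    return s, tau_b, p_value
```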
The initial assumption is that water levels at all of the ground-water sites will display a long-term trend with time. For the purpose of examining how water levels at a site change through time, even a demonstrably 0.000 ft/yr slope in a water-level trend is considered a significant result. For operational purposes, and because of historical site management practices, surface-water sites in southern Florida are not analyzed for long-term trends.
As a consequence of a long-term trend, water levels from a site, when binned by any consistent time interval shorter than the period of record being examined, will display some measure of auto-correlation. This is expected to affect the p-values derived from the Mann-Kendall trend test; however, this is one of the few tests available in SAS that performs a correlation analysis and outputs results in a format amenable to automated processing. While the Mann-Kendall trend test is computed for both water year mean values and daily values binned by month, only the results of the test against the annual mean values are used.
Five years of record is considered the minimum period necessary for water-level trend determination. This is based on review of ground-water data from southern Florida, where water levels throughout the region are actively managed to some extent; 5 years may not be sufficient record in all locations.
SAS Institute statistical software is used for the statistical analyses. Data retrieval, formatting, storage, and output processing (i.e., for graphs) are conducted using Perl scripts written for the project.
Daily water-level data for the past 25 years, not including the current water year, are retrieved. If less than 5 years from the start to the end of the data are available within this retrieval, then a zero-order regression (best fit of the data to a zero-slope line, generally having an intercept approximating the mean water level) is used for the water-level trend.
Data timestamps are converted to the following values and added to the data set: week of the year, USGS water year, decimal year by month (base 1975 as zero year), and decimal year by second (base 1975 as zero year).
Data for current water year and data prior to 1975 are assigned a statistical weight of 0. All other data are assigned a statistical weight of 1. Because of how the data retrieval procedure is coded, all data should have a weight of 1.
Period of record means are computed and grouped by data element. Because of how the data retrieval procedure is coded, all data should have the same data element ID number, but it remains useful when merging data sets in the SAS scripts used.
Data are sorted by data element, date and time. Then the means dataset is merged into the working data set.
A data characteristics analysis is then run by data element, data value day (day of year, 1-366), and data time of day. The number of elements for a given Julian day is retrieved from the output to determine the number of years of record available. Again, 5 years of record is considered the minimum period necessary for water-level trend determination.
A regression is performed of water-level data relative to decimal date (date recorded to the second). Predicted and residual values from the regression model are merged back to the working data set. For stations with insufficient record, the predicted values are nulled, and the residual values are reassigned as the original data values.
Another data characteristics analysis is then run by data element and water year to obtain the annual mean of data values by water year. This data set is retained for the Mann-Kendall trend test.
The Mann-Kendall trend test is performed on the water year annual means data set for correlation of the mean water levels to time.
The data set of predicted and residual values from the regression analysis, the regression results output data set, and the Mann-Kendall results output data set are then saved to disk.
Another data characteristics analysis is then run by data element, calendar year, and month to obtain the monthly means of data values. This data set is saved to disk for the Seasonal Kendall trend test.
A univariate population statistics analysis is then run on the working data set, and the population statistics for the retrieved period are saved to disk.
A second Mann-Kendall trend test is performed on the retrieved period of data for correlation of the water levels to time expressed by calendar month (August 1976 = time 1.75; October 1980 = 5.833). The results of this analysis are then saved to disk.
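The derived time variables named in the data-preparation steps above can be sketched as follows. All function names are hypothetical. The water year follows the USGS convention (October 1 through September 30, named for the calendar year in which it ends). The other conventions are assumptions: the decimal year by month is taken as (year - 1975) + month/12, which reproduces the October 1980 = 5.833 example, and the week of the year is taken as consecutive 7-day blocks counted from January 1. The project scripts may define these differently.

```python
from datetime import datetime

BASE_YEAR = 1975  # zero year for the decimal-year variables


def water_year(ts):
    """USGS water year: Oct 1 to Sep 30, named for the year in which it ends."""
    return ts.year + 1 if ts.month >= 10 else ts.year


def decimal_year_by_month(ts):
    """Assumed convention: years since 1975 plus month/12."""
    return (ts.year - BASE_YEAR) + ts.month / 12


def decimal_year_by_second(ts):
    """Years since 1975 plus the elapsed fraction of the current year."""
    start = datetime(ts.year, 1, 1)
    end = datetime(ts.year + 1, 1, 1)
    fraction = (ts - start).total_seconds() / (end - start).total_seconds()
    return (ts.year - BASE_YEAR) + fraction


def week_of_year(ts):
    """Assumed convention: 7-day blocks from January 1 (weeks 1 to 53)."""
    return (ts.timetuple().tm_yday - 1) // 7 + 1
```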
After each iteration:
The Seasonal Kendall trend test is performed on the monthly means data previously saved to disk.
The period of (retrieved) record mean value is retrieved from the data saved to disk. This is used as the intercept value for the zero-slope trend line. Because the procedure used to evaluate water-level trends is run annually, while the population statistics may be run for the site separately, the mean stored at this point may not be identical to the most recent (i.e., later) mean of a 25-year retrieval to date.
The regression output results data are retrieved from disk and are parsed to retrieve the root mean squared error (RMSE), order of the latest regression analysis, regression equation line intercept, tau-b from the Mann-Kendall trend test, and the p-value from the Mann-Kendall test.
The most recent regression order, RMSE, tau-b, p-value, regression intercept, and period of record mean are stored in the project database as appropriate to the data.
After the first iteration:
If a zero-slope trend line is required because the site is not a ground-water site or because of insufficient record, no further iterations are performed.
If only one iteration has been performed, the SAS-based analyses are performed again.
If more than one iteration has been performed, the RMSE is checked against the previously stored RMSE value as a measure of the fit of the regression curve equation to the data. If the RMSE has decreased by 5 percent or more, the revised regression and Mann-Kendall statistical values are stored and the SAS-based analyses are performed again. The highest regression order used in the automated trend testing is the fourth order. Of the south Florida sites examined, more than 95 percent of the sites are best described by a second- or lower-order regression.
If no improvement of fit is identified by a second-order regression, and if the p-value from the Mann-Kendall trend test > 0.05, then a zero-slope trend line is recorded for the site.
However, if the Seasonal Kendall trend test does show that there is a significant linear trend in the water-level data, then the linear regression intercept is retained and a linear (first-order) reference is stored in the database.
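One possible reading of the stopping rules above is sketched below as a small decision function. It is an interpretation, not the project's script: the function name, its arguments, and the dictionary of RMSE values by regression order are hypothetical, and the sketch assumes the site is a ground-water site with sufficient record.

```python
def select_regression_order(rmse_by_order, mk_p_value, sk_significant,
                            max_order=4, min_gain=0.05, alpha=0.05):
    """Pick a regression order from RMSE values keyed by order (1, 2, ...).

    The order is raised while each step lowers the RMSE by at least min_gain
    (5 percent), up to max_order. If nothing beats the first order and
    neither trend test is significant, a zero-slope line (order 0) results.
    """
    order = 1
    while order < max_order and (order + 1) in rmse_by_order:
        previous = rmse_by_order[order]
        candidate = rmse_by_order[order + 1]
        if candidate <= previous * (1 - min_gain):
            order += 1  # fit improved enough; try the next order
        else:
            break

    if order == 1 and mk_p_value > alpha and not sk_significant:
        return 0  # no quantifiable trend: zero-slope trend line
    return order
```

For example, RMSE values of 1.00, 0.90, and 0.89 for orders 1 to 3 would select the second order, because the third-order fit improves the RMSE by less than 5 percent.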
Another Perl/SAS script is run, using maximum regression order information from the above computations to compute the parameters of a regression equation for the data period.
The data for the 25-year period of record are graphed against the computed trend line and reviewed periodically for hydrologic correctness by a USGS hydrologist. At that time, a higher- or lower-order regression may be run, and parameters stored, to obtain a better fit of the trend line to the available data.
To compute a regression line from the stored parameters, the following equation is used:
value = 0 + (intercept + a*(date)^1 + b*(date)^2 + ... )
up to the maximum order of regression selected for best fit to the data.
To compute the de-trended population statistics for data comparison, the following equation is used:
value = stored residual + (intercept + a*(date)^1 + b*(date)^2 + ... )
up to the maximum order of regression selected for best fit to the data.
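Both equations above evaluate the same polynomial in the decimal date, so they can be sketched with one helper. The function and parameter names are hypothetical; coefficients holds a, b, and so on, in increasing order of power, and an empty list gives the zero-order (zero-slope) line.

```python
def trend_value(date, intercept, coefficients):
    """intercept + a*date**1 + b*date**2 + ... up to the stored order."""
    return intercept + sum(c * date ** (k + 1)
                           for k, c in enumerate(coefficients))


def value_from_residual(residual, date, intercept, coefficients):
    """Stored residual added to the regression line at the same date."""
    return residual + trend_value(date, intercept, coefficients)
```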
SAS Institute statistical software is used for the statistical analyses. Data retrieval, formatting, storage, and output processing (i.e., for graphs) are conducted using Perl scripts written for the project.
Daily water-level data for the past 25 years, not including the current water year, are retrieved.
Data timestamps are converted to the following values and added to the data set: week of the year, USGS water year, decimal year by month (base 1975 as zero year) and decimal year by second (base 1975 as zero year).
Data for the current water year and data prior to 1975 are assigned a statistical weight of 0. All other data are assigned a statistical weight of 1.
Period of record means are computed and grouped by data element. Because of how the data retrieval procedure is coded, all data should have the same data element ID number, but it remains useful when merging data sets in the SAS scripts used.
Data are sorted by data element, date, and time. Then the means data set is merged into the working data set.
A data characteristics analysis is then run by data element, data value day (day of year, 1-366), and data time of day. The number of elements for a given Julian day is retrieved from the output to determine the number of years of record available. Again, 5 years of record is considered the minimum period necessary for water-level trend determination.
A regression is performed of water-level data relative to decimal date (date recorded to the second). For sites with non-linear trend lines, the regression model is altered to also include decimal date raised to the appropriate order. Predicted and residual values from the regression model are merged back to the working data set. For stations with less than 5 years of record, the predicted values are nulled, and the residual values are reassigned as the original data values.
The data set of predicted and residual values from the regression analysis and the regression results output data set are then saved to disk.
A univariate population statistics analysis is then run on the working data set, and the population statistics for the retrieved period are saved to disk.
A univariate population statistics analysis is then run on the working data set by calendar month, and the period of record by month population statistics are saved to disk.
A univariate population statistics analysis is then run on the working data set by day of the year, and the period of record by day population statistics are saved to disk.
A univariate population statistics analysis is then run on the working data set by week of the year, and the period of record by week population statistics are saved to disk.
These four analyses are then performed on the "predicted" data set from the regression analysis. The corresponding population statistics are saved to disk.
A univariate population statistics analysis is then run on the working data set by water year, and the annual population statistics are saved to disk. This is repeated for the "predicted" data set from the regression analysis.
The data stored on the disk are then transcribed to the appropriate storage locations in the project database.
For periods shorter than the (25-year) period of record, the same analyses are carried out. However, the univariate population statistics are only computed and stored for the "period of record." The yearly, monthly, weekly, and daily statistical analyses are omitted.
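The grouped population statistics described in the steps above (by month, day, week, or water year) follow one pattern: bin the values by a time key, then summarize each bin. A minimal Python sketch of that pattern follows. The function name and the (timestamp, value) record layout are hypothetical, and the SAS univariate analysis reports many more statistics than the few shown here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def grouped_stats(records, key):
    """Summarize values by a grouping key derived from each timestamp.

    records: iterable of (datetime, value); key: function of the datetime.
    """
    groups = defaultdict(list)
    for ts, value in records:
        groups[key(ts)].append(value)
    return {
        k: {"n": len(v), "mean": mean(v), "median": median(v),
            "min": min(v), "max": max(v)}
        for k, v in sorted(groups.items())
    }


# Example: statistics by calendar month and by day of the year
records = [(datetime(2000, 1, 1), 1.0), (datetime(2001, 1, 1), 3.0),
           (datetime(2000, 2, 1), 5.0)]
by_month = grouped_stats(records, key=lambda ts: ts.month)
by_day = grouped_stats(records, key=lambda ts: ts.timetuple().tm_yday)
```

Passing a different key function (week of the year, water year) gives the other groupings without changing the summary code.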
The original idea was that water levels at most of these sites will display a long-term trend with time. For this reason, the scripts that compute the population statistics are primarily concerned with generating a regression curve to describe the long-term trend line. For operational purposes and because of historical site management practices, surface-water sites are treated programmatically the same as those sites for which no trend can be quantified to a reasonable confidence level (greater than 95 percent).
The population statistics used on these sites are those derived from the result sets of the regression analyses (Computation of population statistics).
While the presentation of data that have been corrected for long-term trends has proven valuable, there is also a need to present the data in their own context. In the case of periodic-measurement data, it is not certain that a standardized method will reliably quantify long-term trends in the data. Because trend removal is not being carried out at periodic measurement sites, the existing period of record, including the current water year but extending no further back than the past 25 years, is used.
The population statistics used on these sites are those derived from the working data set, and not the result sets of the regression analyses (Computation of population statistics).
Helsel, D.R., and Hirsch, R.M., 1992, Statistical methods in water resources: Amsterdam, Elsevier Publishers, 529 p.
Hirsch, R.M., Slack, J.R., and Smith, R.A., 1982, Techniques of trend analysis for monthly water quality data: Water Resources Research, v. 18, p. 107-121.
Hirsch, R.M., and Slack, J.R., 1984, A nonparametric trend test for seasonal data with serial dependence: Water Resources Research, v. 20, p. 727-732.
Kendall, M.G., 1938, A new measure of rank correlation: Biometrika, v. 30, p. 81-93.
McBride, G., 2000, Anomalies and remedies in nonparametric Seasonal Kendall trend tests and estimates: NIWA (Hamilton, NZ), March 2000. This paper is available as an Adobe PDF file (copied to local disk for bandwidth purposes), from the National Institute of Water and Atmospheric Research, Ltd. (New Zealand) Improved Statistical Methods page.
Funding for the USGS to design and maintain this site has been provided through a cooperative agreement with the South Florida Water Management District (SFWMD). Water-level conditions are monitored by the USGS with support from Federal, State, and local cooperators.