This page provides an overview of the configuration work on a Delft-FEWS system for the implementation of the Extreme Values statistics functions developed in Python. In addition to the configuration work, it also provides a description of how the functionality should be used.
Extreme Values statistics
The purpose of the extreme values functionality is to calculate and show the average recurrence interval (ARI) for discharges and water levels. The statistics are generated for a user-specified period on the basis of hourly time series. The extreme values module was developed in Python and is made available as a script with this page. The Python module is managed and started from the Delft-FEWS application.
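To make the idea of a recurrence interval concrete, the sketch below derives annual maxima from a time series and attaches an empirical recurrence interval to each of them. It is only an illustration: the function names are hypothetical, and the Weibull plotting position T = (n + 1) / rank is an assumption of this sketch, not necessarily the method used by the Python module on this page.

```python
from datetime import datetime


def annual_maxima(series):
    """Return {year: maximum value} for (datetime, value) pairs, skipping None."""
    maxima = {}
    for timestamp, value in series:
        if value is None:
            continue
        year = timestamp.year
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return maxima


def empirical_ari(maxima):
    """Rank annual maxima (largest first) and attach a recurrence interval in years.

    Assumption: Weibull plotting position, T = (n + 1) / rank.
    """
    ranked = sorted(maxima.values(), reverse=True)
    n = len(ranked)
    return [(value, float(n + 1) / rank) for rank, value in enumerate(ranked, start=1)]


# Hypothetical hourly samples (only a few shown); None marks a missing value.
series = [
    (datetime(2014, 1, 5, 12), 3.0),
    (datetime(2014, 7, 1, 0), 7.5),
    (datetime(2015, 2, 1, 6), None),
    (datetime(2015, 3, 1, 6), 5.0),
    (datetime(2016, 11, 9, 18), 9.0),
]
maxima = annual_maxima(series)   # {2014: 7.5, 2015: 5.0, 2016: 9.0}
intervals = empirical_ari(maxima)  # largest maximum 9.0 gets T = 4.0 years
```

With only three years of data the largest value gets a recurrence interval of four years, which shows why a long calculation period is needed for meaningful statistics.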
Delft-FEWS configuration Extreme Values statistics
First, the time series for which the extreme values statistic will be calculated must be selected. In this example, we'll use time series for all locations for both water levels and flows as input. The time series used are all validated / processed time series that have been created by specific Delft-FEWS module instances. The following time series from a Delft-FEWS system will be used.
Parameter | LocationSet | Moduleinstance | Time step |
Q.meting | TMX_LM_Q.meting_10min | Aggregeren_uur | 1 hour |
H.meting | TMX_LM_H.meting_10min | Aggregeren_uur | 1 hour |
For the calculation of the extreme values statistics, the following Delft-FEWS configuration files are important. Since the configuration files are similar for the different parameters, only the files for one parameter are discussed here.
- Workflows: \WorkflowFiles\Recurrence interval\Recurrence interval_Save.xml
- Start of official workflow: \DisplayConfigFiles\TaskRunDialog.xml
- Start of project workflow: \RegionConfigFiles\Topology.xml
- Transformations: \ModuleConfigFiles\Recurrence interval\Merge_Fill.xml
- EV Module: \ModuleConfigFiles\Recurrence interval\Calculate_Recover times_Report.xml
- Reports: \ModuleConfigFiles\Recurrence interval\Recurrence interval_Report_Report.xml
- Presenting characteristics: \DisplayConfigFiles\GridDisplay.xml
- Adjustments to parameters: \DisplayConfigFiles\ModifiersDisplay.xml
- Threshold levels: \RegionConfigFiles\Thresholds.xml
In addition to the official workflow for calculating extreme values statistics, it is also possible to perform the same calculations locally for a project. No extra configuration files or workflows have been added for these local calculations. The difference with the official calculations is that the local calculations are performed on the local PC and do not write data to the central Delft-FEWS database. Local reports are generated during the calculation; the time series produced are deleted from temporary memory after the Delft-FEWS SA or OC has been closed. The following sections go deeper into the configuration files of the official calculations.
WorkflowFiles, Start Task and Local Tasks
A number of new workflow files have been created for the calculation of extreme values. These workflows are stored in the subfolder Recurrence intervals:
- Recurrence intervals_Report.xml
- Recurrence intervals_Report_Location.xml
- Recurrence intervals_Complete.xml
- Recurrence intervals_Waterstands.xml
- Review times_Waterstands_Location.xml
There are separate workflows per parameter, and separate workflows for calculating a single location or all locations at once. The workflow Recurrence intervals_Complete.xml can be started via the 'Start Task' screen; the other workflows can only be started via the 'Local Tasks' screen. The workflows for the different parameters have the following structure:
Recurrence intervals_Report.xml
Recurrence intervals_Discharges_Location.xml
The biggest difference is that the first workflow performs the calculations for all locations of the TMX Q.metric location set. This location set has been added specifically for the bandwidth calculation and contains all TMX locations. The above-mentioned workflows have been added to the \RegionConfigFiles\WorkflowDescriptors.xml file. The new workflows are not intended to run on a frequent schedule on the central Delft-FEWS server. It is advisable to update the extreme values calculations annually, so the workflow only needs to be run once a year. This can be done manually or by scheduling the Recurrence intervals_Complete.xml workflow via the Admin Interface.
The workflow Recurrence intervals_Complete.xml has been added to the 'Start Task' screen of Delft-FEWS; this requires an extra element in the \DisplayConfigFiles\TaskRunDialog.xml file.
The extreme values calculation can also be performed locally via the Local Tasks screen. To make this possible, the \RegionConfigFiles\Topology.xml file has been added. In this file the calculations are organized per parameter. It is possible to start the Recurrence intervals_Replacement.xml workflow so that all discharges are calculated, or the recurrence intervals_ execute_Location.xml workflow to perform the calculation per location.
The Topology specifies that the calculation is carried out for a period of 10,000 days (coldstate) before the current time.
Each node specifies which location to use in the calculation; the SelectedLocation property is used for this.
Transformations, General Adapter and Reports
For the calculations of the extreme values statistics, three activities are carried out; this can be seen in the workflows. These activities perform the following operations.
- Transformation for merging the input sequences into 1 temporary time series.
- General Adapter for controlling the Python Extreme Values module.
- Report for generating an HTML report per location.
The configuration files for these activities have been added to the \ModuleConfigFiles\Recurrence intervals folder.
Merge_Fill.xml
The configuration of the transformation function follows the standard way for Delft-FEWS transformation files. The file is divided into 2 parts:
- Definition of input and output time series (variable)
- Setup of the transformation function (transformation)
For the input and output time series, the discharge time series is used as an example. Because the Q.meting time series can come from 2 sources (TMX LM and TMX TSI), the time series are first merged into a temporary merged series. This step is necessary so that the following modules only have to work with the combined series, which simplifies the configuration. What actually happens is that one of the 2 sources (LM or TSI locations) is chosen.
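The selection between the two sources can be illustrated with a small Python sketch. This is not the Delft-FEWS transformation itself, which is configured in Merge_Fill.xml; the function name, the location ids and the rule "use the first source if it holds any value, otherwise the second" are assumptions made for illustration.

```python
def merge_sources(primary, secondary):
    """Pick one source per location: the primary if it holds any value, else the secondary.

    Both arguments map a location id to a list of hourly values (None = missing).
    """
    merged = {}
    for location in sorted(set(primary) | set(secondary)):
        candidate = primary.get(location, [])
        if any(value is not None for value in candidate):
            merged[location] = list(candidate)
        else:
            merged[location] = list(secondary.get(location, []))
    return merged


# Hypothetical location ids and values for the two sources.
lm = {"loc_A": [1.2, 1.4, None], "loc_B": [None, None, None]}
tsi = {"loc_B": [0.8, 0.9, 1.1], "loc_C": [2.0, 2.1, 2.2]}
merged = merge_sources(lm, tsi)
# loc_A comes from lm, loc_B and loc_C come from tsi
```

The result is a single series per location, so later steps do not need to know which source delivered the data.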
The merged location set is TMX_Q.stimulation_Bandwidths, which has been added to the \RegionConfigFiles\LocationSets.xml file.
Calculate_Recording times_Decute.xml
The Python module is called by the General Adapter; this is a generic adapter for calling external modules from Delft-FEWS. The configuration of a General Adapter module contains a number of sections; the most important ones are described below.
In the General section, the folders are specified in which the Python module runs and where it reads its input and writes its output. Via an Id mapping file, the parameters used by Delft-FEWS are also converted to the parameters used by the Python module.
In the Activities section, the General Adapter activities to be performed are specified. These are activities to write data (time series and parameters for the Python module), activities to start the Python module and one activity to import the results (calculated time series) again and to store them in the Delft-FEWS database.
In the Execute activity, Python is started ($PYTHON_EXE$) with the Python script as argument. The variable $PYTHON_EXE$ must be defined in the global.properties of the Delft-FEWS OC, FSS and SA applications.
Recurrence intervals_Report_Report.xml
The last activity is to generate an HTML report with the results of the extreme values analysis. The HTML report is made with the Delft-FEWS Report module and a report template file. The configuration of a report module consists of a declarations section and a report section. The declarations section includes the time series to be used, the layout of the tables and the location of the generated reports. The report section shows which location a report should be made for and which template to use.
As in the General Adapter module, the Report module contains a variable ($EVREPORTSDIR$) that refers to a global.properties variable. This variable indicates in which folder the reports must be stored.
The HTML template (\ReportTemplateFiles\recurrence interval_template.html) is used for all reports and can be modified with a standard text editor.
Global.properties
For the Extreme Values calculation some variables have been added to the global.properties files of Delft-FEWS. The following lines should be present in the global.properties of a Delft-FEWS Stand Alone, OC and FSS.
GA_DUMPFILEDIR=%REGION_HOME%/Dumpfiles
PYTHON_EXE=c:/Program Files/Anaconda2/python.exe
EVREPORTSDIR=%REGION_HOME%/Reports
# temporarily added for the 2015.02 Delft-FEWS release to prevent error logging; resolved in the 2016.01 release
LOOP_LOCATION_ID=0001
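When setting up a new machine it can help to check that these variables are present and resolve to the expected paths. The sketch below is a hypothetical helper, not part of Delft-FEWS: it parses key=value lines and expands %NAME% and $NAME$ tokens. The region home d:/fews/region is an invented example path, and the expansion rules are a simplification of what Delft-FEWS does internally.

```python
import re


def parse_properties(text):
    """Parse key=value lines, ignoring blank lines and # comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def expand(value, variables):
    """Replace %NAME% and $NAME$ tokens; unknown tokens are left unchanged."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))
    return re.sub(r"%(\w+)%|\$(\w+)\$", lookup, value)


sample = """
GA_DUMPFILEDIR=%REGION_HOME%/Dumpfiles
PYTHON_EXE=c:/Program Files/Anaconda2/python.exe
EVREPORTSDIR=%REGION_HOME%/Reports
# comment lines are skipped
LOOP_LOCATION_ID=0001
"""
props = parse_properties(sample)
required = ("GA_DUMPFILEDIR", "PYTHON_EXE", "EVREPORTSDIR")
missing = [key for key in required if key not in props]  # empty when all are set
reports_dir = expand(props["EVREPORTSDIR"], {"REGION_HOME": "d:/fews/region"})
python_exe = expand("$PYTHON_EXE$", props)
```

Running the check against each of the modified global.properties files gives a quick indication of whether an SA, OC or FSS installation is ready for the extreme values workflows.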
The following global.properties files from the RootConfigFiles have been modified:
- global.properties
- fss_global.properties
- citrix_global.properties
Qualifiers.xml
A qualifier is an extra characteristic of a time series; it can be seen as a kind of sub-parameter. Qualifiers are defined in the \RegionConfigFiles\Qualifiers.xml file. Qualifiers were already used in Delft-FEWS, and a number of qualifiers have been added for the extreme values calculations.
The shortName element (column 3) is used for the legend of time series in the graphs of Delft-FEWS.
Modifiers, adjusting parameters
Delft-FEWS provides the modifiers module for adjusting parameters. This module is used to adjust the parameters of the extreme values module. These parameters are location attributes that are stored in a csv file and read in by the LocationSets.xml file.
The following files are important for the use of the location attributes:
- \RootConfigFiles\TMX.xlsm: This is the Excel file that manages the metadata of the locations in Delft-FEWS. The Recurrence intervals.csv file is saved from this file.
- \MapLayerFiles\Recurrence intervals.csv: This is the location attribute file that contains all properties of the extreme values module and the recurrence intervals.
- \RegionConfigFiles\LocationSets.xml: Reads the Recurrence intervals.csv as location attributes of the TMX locations.
The \RegionConfigFiles\ModifierTypes.xml file indicates which location attributes may be modified and used in the Python module. This file also defines the layout of the Model Parameters display that can be opened via the Local Tasks.
The modifiers only work for local runs and are removed after 1 day from the local data cache of Delft-FEWS.
Thresholds and Warning levels
The results of the extreme values module are recurrence intervals of water levels and discharges. These recurrence intervals are calculated and stored in the HTML page; they are not stored in the Delft-FEWS database. The recurrence intervals can also be shown in the time series graphs; the following configuration files are used for this.
\MapLayerFiles\Recurrence intervals.csv
This file contains the recurrence intervals as warning levels. There are warning levels for discharges and water levels per location.
\RegionConfigFiles\LocationSets.xml
This file reads the recurrence intervals as warning levels and links them as location attributes to the locations.
\RegionConfigFiles\Thresholds.xml
This file contains the warning levels (thresholds) that are used within Delft-FEWS. Three groups have been added, each with a number of recurrence interval thresholds.
\RegionConfigFiles\ThresholdValueSets.xml
In this file the time series are linked to the thresholds and the location attributes. In the example for the water levels, the TMX LM and TSI hourly time series are linked.
\SystemConfigFiles\TimeSeriesDisplayConfig.xml
This file specifies how the thresholds are presented and which threshold group is shown by default in the graphs.
Grid display
Not all results of the extreme values module are stored in the database of Delft-FEWS. The time series that are saved are visible in the Grid display; these are the results of the homogeneity tests and other characteristics of the time series that have been used.
Perform extreme values calculations in Delft-FEWS
Perform an official calculation
The calculation of the official Recurrence intervals (extreme values) workflow can be started from the Start Task tab of Delft-FEWS. A new task has been configured with the name 'Recurrence intervals complete continuous'.
After selecting the task, the start time and end time can be specified; pressing the <Execute> button starts the task. For all water level and discharge locations the extreme values analyses are performed and the statistics are stored in the database of Delft-FEWS.
This calculation cannot be performed for one or a few locations via the Start Task display; the Local Tasks functionality has been developed for this.
When the calculations are complete, the results can be viewed in the Grid display. The most interesting results are the extreme values charts and tables, which are available in the configured folder on the Server.
Perform an alternative calculation
For the hydrologists of the water board, the Local Tasks display has been added. Via this display the extreme values analyses can be performed for a single location or for a group of locations. To perform the extreme values analysis for a single location, the Local Tasks display must be opened.
Next, select a location from the list and press the Run button, which is the leftmost button at the top of the screen.
When the calculation is complete, the HTML page stored in the \Reports folder can be opened. The HTML page contains some tables and graphs generated by the Python module. When the standard parameters of the Python module have to be adjusted, the Model Parameters screen can be used. In this screen the standard model parameters for the selected location can be adjusted and the calculation can be performed locally. Adjusting the model parameters and re-running the calculation works as follows.
- Select a location in the Local Tasks screen
- Open the Model Parameters screen
- Change a value in the 'adjusted value' column of the Model Parameters screen, for example the Maximum X-axis.
- Press the <Save> button to save the modifier to the local Delft-FEWS cache files.
- Press the Run button in the Local Tasks screen or the "Execute" button in the Model Parameters screen to start the custom calculation.
When the extreme values have been calculated and the adjusted model parameters give better results than the original values, the adjusted values must be entered in the TMX.xlsm file.
Use of Python
To perform the extreme values analysis, it is important to have Python installed on the computer where the calculations are performed. This can be an FSS for official calculations and a local computer or CITRIX profile for local calculations. It is important that Python version 2.7 is installed, preferably 32-bit Anaconda.
When installing Anaconda2 4.2.0 on a PC or server, the following settings must be used:
- Install for all users.
- Install in the standard folder (otherwise change the reference in the relevant global.properties).
- Add Anaconda to PATH, but do not register it as the system Python (registering is possible, but may cause conflicts if another Python installation is already present).
Background for calculations performed
When executing the extreme values calculations, a number of operations and calculations are performed in the Python module. Below is a summary of these operations.
Adjust calculation period time series
The time series that is sent to the Python module covers a period of 10000 days by default: from 10000 days before the current system time up to the system time. This period can be adjusted in the Local Tasks screen; however, it is recommended to use a long period. The Python module reads the entire supplied time series and removes all missing values at the beginning and end of the time series before it starts the analysis. The calculation period is shown in the extreme values graph.
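The two steps described here, determining the default period and trimming missing values at both ends, can be sketched as follows. The function names are hypothetical and the sketch assumes that the period runs backward from the system time and that missing values are represented by None (a numeric missing-value marker can be passed instead; NaN would need a separate check).

```python
from datetime import datetime, timedelta


def default_period(system_time, days=10000):
    """Return (start, end) of the period that ends at the system time."""
    return system_time - timedelta(days=days), system_time


def trim_missing(values, missing=None):
    """Drop missing values at the start and end of a series, keeping interior gaps."""
    first = 0
    last = len(values)
    while first < last and values[first] == missing:
        first += 1
    while last > first and values[last - 1] == missing:
        last -= 1
    return values[first:last]


start, end = default_period(datetime(2018, 1, 1))
trimmed = trim_missing([None, None, 1.0, None, 2.5, None])
# trimmed == [1.0, None, 2.5]: only the leading and trailing gaps are removed
```

Because only the ends are trimmed, the effective calculation period can be shorter than the requested 10000 days when a location has a short record.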
Test homogeneity
Various statistical test methods are available, each using one specific characteristic that is derived from the measurement series. The following tests are performed:
Pearson t-test: test on the slope of the trend line.
Mann-Kendall test: comparison of all pairs of annual maxima. This test makes a comparison for all possible combinations of pairs of observations. If the measurement series contains N values, there are N * (N-1) / 2 different pairs in total. For example, for a measurement series of 100 values this is on the order of 5000 comparisons (4950). With each comparison, it is determined whether the first measured value is larger or smaller than the other value. In a homogeneous process, both outcomes should occur about equally often. If this is clearly not the case for the series being tested, the null hypothesis of homogeneity is rejected.
Spearman rank correlation test: ranking of years. This test compares, for all N observations, the place in the ranking (in order of magnitude) with the time at which the value was observed. For the measurement series of a homogeneous process, no correlation is expected between the ranking of the measurements and the moment of observation. If this correlation deviates significantly from 0 for a particular measurement series, the null hypothesis of homogeneity is rejected.
Wilcoxon-Mann-Whitney U test: split of the series into 2 parts. This test splits the measurement series in two and compares the medians of the two sub-series. If these differ greatly from each other, the null hypothesis of homogeneity is rejected. Note that the two sub-series do not necessarily have to be of equal length. So if a "jump" is suspected in the measurements at, for example, 2/3 of the measurement series, it is advisable to split the series at the moment of the jump in this test.
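Two of the tests above can be sketched in a few lines of plain Python. This is an illustration of the textbook statistics, not the code of the extreme values module: the function names are hypothetical, ties are not corrected for, and the z score uses the usual normal approximation with continuity correction.

```python
import math


def mann_kendall(values):
    """Return (number of pairs, S statistic, z score); no correction for ties."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # +1 if later value is larger, -1 if smaller
    pairs = n * (n - 1) // 2
    variance = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    return pairs, s, z


def spearman_rho(values):
    """Spearman rank correlation between time order and value rank (assumes no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    d_squared = sum((ranks[i] - (i + 1)) ** 2 for i in range(n))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


# Hypothetical annual maxima in chronological order.
maxima = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
pairs, s, z = mann_kendall(maxima)  # pairs == 15, s == 11
rho = spearman_rho(maxima)
```

For a homogeneous series S and rho are expected to be close to 0. With the normal approximation, an absolute z score above roughly 1.96 would reject homogeneity at the 5% level; which significance level the module applies is not stated here.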
Colors of Exponential fit
A number of time series characteristics are used to determine the color of the exponential fit line in the graph. The color is based on the following criteria.
Configuration files
The Delft-FEWS configuration files, including the Python scripts, as set up for a Dutch system can be downloaded here. For an explanation in Dutch using the Dutch file names and terms, please have a look at this wiki.