hddtools: Hydrological Data Discovery Tools

Claudia Vitolo

2020-05-25

Introduction

hddtools stands for Hydrological Data Discovery Tools. This R package is an open source project designed to facilitate access to a variety of online open data sources relevant for hydrologists and, in general, environmental scientists and practitioners.

Accessing such data typically involves downloading a metadata catalogue, selecting the information needed, formally requesting the dataset(s), then de-compressing, converting, manually filtering and parsing the files. hddtools wraps these operations in re-usable functions to make them more efficient.
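The manual steps that the package automates can be sketched in base R. This is a minimal sketch, not hddtools code. A gzip-compressed CSV written to a temporary file stands in for a downloaded catalogue. The file and column names are hypothetical. The values are three SEPA stations that appear later in this document.

```r
# Stand-in for a downloaded, compressed catalogue (hypothetical file and columns)
catalogue_file <- tempfile(fileext = ".csv.gz")
con <- gzfile(catalogue_file, "w")
write.csv(
  data.frame(station = c("Perth", "Ballathie", "Norham"),
             river   = c("Tay", "Tay", "Tweed"),
             area    = c(4991.0, 4587.1, 4390.0)),
  con, row.names = FALSE
)
close(con)

# De-compress and parse in one step: gzfile() reads the gzip stream transparently
catalogue <- read.csv(gzfile(catalogue_file), stringsAsFactors = FALSE)

# Manual filtering: keep only the stations on the river Tay
tay <- catalogue[catalogue$river == "Tay", ]
print(tay)
```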

Depending on the data license, functions can provide offline and/or online modes. When redistribution is allowed, for instance, a copy of the dataset is cached within the package and updated twice a year. This is the fastest option and also allows the package's functions to be used offline. When redistribution is not allowed, only the online mode is provided.
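The offline/online decision described above can be illustrated with a small cache-then-download helper. This is a hypothetical sketch and not the package's internal code. The name `get_with_cache()`, its arguments and the `.rds` cache file are all assumptions. hddtools ships its cached copies inside the package rather than writing them at run time.

```r
# Hypothetical helper: offline-first lookup with an online fallback
get_with_cache <- function(cache_file, fetch, redistributable = TRUE) {
  # Offline mode: only possible when a redistributable copy is cached
  if (redistributable && file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  # Online mode: query the data provider
  result <- fetch()
  # Keep a local copy only if the license allows redistribution
  if (redistributable) saveRDS(result, cache_file)
  result
}

# Usage with a stand-in fetch function, so no network access is needed
cache_file <- file.path(tempdir(), "catalogue.rds")
unlink(cache_file)
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(id = c("15042", "15006"), area = c(4991.0, 4587.1))
}
first  <- get_with_cache(cache_file, fake_fetch)  # online: calls fake_fetch()
second <- get_with_cache(cache_file, fake_fetch)  # offline: reads the cache
n_calls
```

With `redistributable = FALSE`, the helper always calls `fetch()` and never writes a cache. This mirrors the online-only mode.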

Installation

Install the released version from CRAN or, alternatively, the development version from GitHub using devtools. Then load the hddtools package.
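These installation and loading steps correspond to the standard R commands below. The GitHub repository path `ropensci/hddtools` is an assumption, so check the package's CRAN page for the current development URL.

```r
# Released version from CRAN
install.packages("hddtools")

# Development version from GitHub (repository path assumed; requires devtools)
devtools::install_github("ropensci/hddtools")

# Load the package
library("hddtools")
```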

Data sources and Functions

The functions provided can retrieve hydrological information from a variety of data providers. To filter the data, it is advisable to use the dplyr package.

The Koppen Climate Classification map

The Koppen Climate Classification is the most widely used system for classifying the world's climates. Its categories are based on the annual and monthly averages of temperature and precipitation. It was first updated by Rudolf Geiger in 1961, and later by Kottek et al. (2006), Peel et al. (2007) and Rubel et al. (2010).

The hddtools package contains a function to identify the updated Koppen-Geiger climate zones within a given bounding box.
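A sketch of that lookup follows, assuming the function is `KGClimateClass()` with arguments `areaBox` (here a `raster::extent`) and `updatedBy` (taking `"Peel"` or `"Kottek"`). These names are our understanding of the 2020 release, so check `?KGClimateClass`. The call is guarded so the block is skipped when the packages are not installed.

```r
# Runs only if hddtools and raster are installed
if (requireNamespace("hddtools", quietly = TRUE) &&
    requireNamespace("raster", quietly = TRUE)) {
  # Bounding box: xmin, xmax, ymin, ymax in decimal degrees (UK and Ireland)
  areaBox <- raster::extent(-10, 5, 48, 62)

  # Climate zones from the Peel et al. (2007) map (argument values assumed)
  kg_peel <- hddtools::KGClimateClass(areaBox = areaBox, updatedBy = "Peel")
  # Climate zones from the Kottek et al. (2006) map
  kg_kottek <- hddtools::KGClimateClass(areaBox = areaBox, updatedBy = "Kottek")
  print(kg_peel)
}
```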

The Global Runoff Data Centre

The Global Runoff Data Centre (GRDC) is an international archive hosted by the Federal Institute of Hydrology in Koblenz, Germany. The Centre operates under the auspices of the World Meteorological Organisation and retains services and datasets for all the major rivers in the world. The catalogue, the KML files and the Long-Term Mean Monthly Discharges product are open data and can be accessed via hddtools.

Information on all the GRDC stations can be retrieved using the function catalogueGRDC with no input arguments, as in the example below:
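A minimal sketch of that call is shown below. It is guarded so it only runs when hddtools is installed, and it needs internet access. The structure of the returned data frame may differ between package versions.

```r
# Runs only if hddtools is installed (the catalogue is downloaded from GRDC)
if (requireNamespace("hddtools", quietly = TRUE)) {
  # No arguments: returns information on all GRDC stations
  GRDC_catalogue <- hddtools::catalogueGRDC()
  str(GRDC_catalogue)
}
```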

It is advisable to use the dplyr package for convenient filtering; some examples are provided below.
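The filters below are a sketch of typical dplyr queries on the GRDC catalogue. The column names `country`, `d_start`, `long` and `lat` are assumptions about the catalogue's layout, so inspect `names(GRDC_catalogue)` first. The bounding box matches the Koppen example.

```r
# Runs only if hddtools and dplyr are installed
if (requireNamespace("hddtools", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  GRDC_catalogue <- hddtools::catalogueGRDC()

  # Stations in a given country (country code column assumed)
  GRDC_italy <- GRDC_catalogue %>% filter(country == "IT")

  # Stations whose daily record starts in 2000 or later (column assumed)
  GRDC_recent <- GRDC_catalogue %>% filter(d_start >= 2000)

  # Stations inside a longitude/latitude bounding box
  GRDC_bbox <- GRDC_catalogue %>%
    filter(between(long, -10, 5), between(lat, 48, 62))
}
```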

The GRDC catalogue (or a subset) can be used to create a map.
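One way to map the stations is an interactive leaflet map. This choice is an assumption, since any mapping package would do. The coordinate and label columns `long`, `lat` and `station` are assumed. Marker clustering keeps several thousand stations readable; pass a filtered subset instead of the full catalogue to map fewer points.

```r
# Runs only if hddtools and leaflet are installed
if (requireNamespace("hddtools", quietly = TRUE) &&
    requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)
  GRDC_catalogue <- hddtools::catalogueGRDC()
  # One marker per station, clustered; popups show the station name
  grdc_map <- leaflet(data = GRDC_catalogue) %>%
    addTiles() %>%
    addMarkers(lng = ~long, lat = ~lat, popup = ~station,
               clusterOptions = markerClusterOptions())
  grdc_map
}
```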

Top-Down modelling Working Group (Data60UK and MOPEX)

The Top-Down modelling Working Group (TDWG) for the Prediction in Ungauged Basins (PUB) Decade (2003-2012) is an initiative of the International Association of Hydrological Sciences (IAHS), which collected datasets for hydrological modelling and made them available free of charge. This package provides a common interface to retrieve, browse and filter this information.

The Data60UK dataset

The Data60UK initiative collated datasets of areal precipitation and streamflow discharge across 61 gauging sites in England and Wales (UK). The database was prepared from source databases for research purposes, with the intention of making it re-usable. It is now available in the public domain, free of charge.

hddtools contains two functions to interact with this database: one to retrieve the catalogue and another to retrieve the time series of areal precipitation and streamflow discharge.
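A sketch of both steps follows, assuming the two functions are `catalogueData60UK()` and `tsData60UK(id = ...)`. The `id` column name and the format of the returned series are assumptions, so see the help pages.

```r
# Runs only if hddtools is installed (both calls need internet access)
if (requireNamespace("hddtools", quietly = TRUE)) {
  # Catalogue of the Data60UK catchments
  Data60UK_catalogue <- hddtools::catalogueData60UK()

  # Areal precipitation and discharge for the first catchment listed
  station_id <- Data60UK_catalogue$id[1]
  Data60UK_ts <- hddtools::tsData60UK(id = station_id)
  head(Data60UK_ts)
}
```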

SEPA river level data

The Scottish Environment Protection Agency (SEPA) manages river level data for hundreds of gauging stations in the UK. The catalogue of stations is derived from the list here: https://www2.sepa.org.uk/waterlevels/CSVs/SEPA_River_Levels_Web.csv.

The time series for the last few days is available from the SEPA website and can be downloaded using the following function:
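A sketch of the download follows, assuming the function is `tsSEPA()` and that its `id` argument takes a station's `LOCATION_CODE` from the catalogue. Here the code is 10048, the Perth station on the Tay listed in the output below. Because the data are live, the plot changes from run to run.

```r
# Runs only if hddtools is installed (downloads live data from SEPA)
if (requireNamespace("hddtools", quietly = TRUE)) {
  # Recent river levels at Perth (LOCATION_CODE 10048); id format assumed
  SEPA_ts <- hddtools::tsSEPA(id = "10048")
  plot(SEPA_ts)
}
```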


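library(dplyr)
# Retrieve the SEPA catalogue of gauging stations
SEPA_catalogue <- catalogueSEPA()

# Get only catchments with an area of at least 4000 km2
SEPA_catalogue %>%
  filter(CATCHMENT_AREA >= 4000)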
#>   SEPA_HYDROLOGY_OFFICE STATION_NAME LOCATION_CODE NATIONAL_GRID_REFERENCE
#> 1                 Perth        Perth         10048            NO1160525332
#> 2                 Perth    Ballathie         14937            NO1475036680
#> 3            Galashiels       Norham          9514            NT8983647709
#>   CATCHMENT_NAME RIVER_NAME GAUGE_DATUM CATCHMENT_AREA   START_DATE
#> 1           <NA>        Tay      2.0800         4991.0  August 1991
#> 2           <NA>        Tay     26.2000         4587.1 October 1952
#> 3           <NA>      Tweed      2.8796         4390.0    June 1959
#>              END_DATE SYSTEM_ID LOWEST_VALUE   LOW MAX_VALUE  HIGH
#> 1 2020-05-25 07:45:00  58156010        0.000 0.165     4.928 3.504
#> 2 2020-05-25 08:45:00  71098010       -0.023 0.281     7.099 4.377
#> 3 2020-05-25 07:15:00  91922010       -0.100 0.917     7.036 4.714
#>                    MAX_DISPLAY  MEAN UNITS WEB_MESSAGE
#> 1 4.928m @ 17/01/1993 19:30:00 0.887     m        <NA>
#> 2 7.099m @ 17/01/1993 18:15:00 1.192     m        <NA>
#> 3 7.036m @ 22/10/2002 20:45:00 1.101     m        <NA>
#>                                        NRFA_LINK
#> 1 https://nrfa.ceh.ac.uk/data/station/info/15042
#> 2 https://nrfa.ceh.ac.uk/data/station/info/15006
#> 3 https://nrfa.ceh.ac.uk/data/station/info/21009

# Get only the gauging stations on the river Ayr
SEPA_catalogue %>%
  filter(RIVER_NAME == "Ayr")
#>   SEPA_HYDROLOGY_OFFICE STATION_NAME LOCATION_CODE NATIONAL_GRID_REFERENCE
#> 1         East Kilbride      Catrine        133071            NS5252225909
#> 2         East Kilbride     Mainholm        133111            NS3617921634
#> 3         East Kilbride     Wellwood        133135            NS6598926182
#> 4         East Kilbride  Limmerhaugh        542772            NS6187226954
#>   CATCHMENT_NAME RIVER_NAME GAUGE_DATUM CATCHMENT_AREA   START_DATE
#> 1           <NA>        Ayr      86.660         166.30 October 1970
#> 2           <NA>        Ayr       2.881         574.00   March 1978
#> 3           <NA>        Ayr     191.762          60.00 October 1991
#> 4           <NA>        Ayr     168.711         126.34 October 2017
#>              END_DATE SYSTEM_ID LOWEST_VALUE   LOW MAX_VALUE  HIGH
#> 1 2020-05-25 07:30:00  64197010        0.032 0.088     2.993 2.045
#> 2 2020-05-25 07:15:00  65712010        0.154 0.259     5.254 3.937
#> 3 2020-05-25 09:00:00  66627010        0.291 0.368     2.285 1.815
#> 4 2020-05-25 03:30:00 143188010        0.515 0.572     2.691 2.458
#>                    MAX_DISPLAY  MEAN UNITS WEB_MESSAGE
#> 1 2.993m @ 10/12/1994 22:15:00 0.314     m        <NA>
#> 2 5.254m @ 02/01/1981 18:00:00 0.755     m        <NA>
#> 3 2.285m @ 16/01/2010 11:30:00 0.559     m        <NA>
#> 4 2.691m @ 11/08/2019 23:00:00 0.781     m        <NA>
#>                                        NRFA_LINK
#> 1 https://nrfa.ceh.ac.uk/data/station/info/83003
#> 2 https://nrfa.ceh.ac.uk/data/station/info/83006
#> 3 https://nrfa.ceh.ac.uk/data/station/info/83011
#> 4

Please note that these data are updated every 15 minutes, so the code above will produce different outputs and plots each time it is run.
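Because the live catalogue keeps changing, the two filters above can be checked offline on a few rows copied from the printed output. This sketch uses base R `subset()` instead of dplyr and keeps only four of the catalogue's columns. It also sorts the Ayr stations by catchment area.

```r
# Rows and columns copied from the catalogue output printed above
sepa_rows <- data.frame(
  STATION_NAME   = c("Perth", "Ballathie", "Norham",
                     "Catrine", "Mainholm", "Wellwood", "Limmerhaugh"),
  LOCATION_CODE  = c("10048", "14937", "9514",
                     "133071", "133111", "133135", "542772"),
  RIVER_NAME     = c("Tay", "Tay", "Tweed", "Ayr", "Ayr", "Ayr", "Ayr"),
  CATCHMENT_AREA = c(4991.0, 4587.1, 4390.0, 166.30, 574.00, 60.00, 126.34),
  stringsAsFactors = FALSE
)

# Base R equivalent of filter(CATCHMENT_AREA >= 4000)
large <- subset(sepa_rows, CATCHMENT_AREA >= 4000)

# Base R equivalent of filter(RIVER_NAME == "Ayr"), largest catchment first
ayr <- subset(sepa_rows, RIVER_NAME == "Ayr")
ayr <- ayr[order(ayr$CATCHMENT_AREA, decreasing = TRUE), ]
ayr$STATION_NAME
```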