Introduction to the efts R package

Sample data

The package includes rainfall data for hydrologic modelling of the upper Murray river in Australia, along the border of the states of New South Wales and Victoria. The data is downsized for inclusion in the package but otherwise realistic.

efts opens netCDF data files by wrapping low-level ncdf4 objects and functions into an R reference object EftsDataSet.

library(efts)
ext_data <- system.file('extdata', package='efts')
rain_file <- file.path(ext_data, 'Upper_Murray_sample_rain.nc')
stopifnot(file.exists(rain_file))

rain_dat <- open_efts(rain_file)
class(rain_dat)

#> [1] "EftsDataSet"
#> attr(,"package")
#> [1] "efts"

The default representation via print simply uses the one of ncdf4, which can be lengthy but remains the best informative overview.

print(rain_dat)

#> File /tmp/RtmpZLthWi/Rinst1e025790d662/efts/extdata/Upper_Murray_sample_rain.nc (NC_FORMAT_CLASSIC):
#> 
#>      9 variables (excluding dimension variables):
#>         double area[station]   
#>             standard_name: area
#>             long_name: subarea area
#>             units: sqm
#>         double lat[station]   
#>             long_name: latitude
#>             units: degrees_north
#>             axis: y
#>         double lon[station]   
#>             long_name: longitude
#>             units: degrees_east
#>             axis: x
#>         double rain_der[station,time]   
#>             standard_name: rain_der
#>             long_name: derived (from observations) rainfall
#>             units: mm
#>             _FillValue: -9999
#>             type: 2
#>             type_description: accumulated over the preceding interval
#>             Location_type: Area
#>         int rain_der_qual[station,time]   
#>             standard_name: rain_der_qual
#>             long_name: derived (from observations) rainfall data quality
#>             units: Quality codes
#>             _FillValue: -1
#>         int station_id[station]   
#>             long_name: station or node identification code
#>         char station_name[strLen,station]   
#>             long_name: station or node name
#>         double x[station]   
#>             standard_name: easting_MGA94_zone55
#>             long_name: easting from the MGA94 datum in Zone 55
#>             axis: x
#>         double y[station]   
#>             standard_name: northing_MGA94_zone55
#>             long_name: northing from the MGA94 datum in Zone 55
#>             axis: y
#> 
#>      5 dimensions:
#>         station  Size:3
#>         ens_member  Size:1
#>             standard_name: ens_member
#>             long_name: ensemble member
#>             units: member id
#>             axis: u
#>         lead_time  Size:1
#>             standard_name: lead time
#>             long_name: forecast lead time
#>             axis: v
#>             units: 
#>         time  Size:8760   *** is unlimited ***
#>             standard_name: time
#>             long_name: time
#>             time_standard: UTC
#>             axis: t
#>             units: hours since 1989-12-31 13:00:00 +0000
#>         strLen  Size:30
#> 
#>     9 global attributes:
#>         title: Subarea rainfall from rain gauges
#>         institution: CSIRO Land & Water
#>         source: 
#>         catchment: Upper Murray
#>         STF_convention_version: 1
#>         STF_nc_spec: https://wiki.csiro.au/display/wirada/NetCDF+for+SWIFT
#>         comment: Rainfall calculated from inverse-distance weighting of four closest rain gauges by James Bennett.
#> Gauged rainfall data cleaned before aggregation by D. Robertson.
#> 
#> Quality codes are as follows:
#> 71 - excellent data: interpolated from 4 nearest gauges
#> 72 - very good data: interpolated from 4 gauges
#> 73 - good data: interpolated from 3 gauges
#> 74 - fair data: interpolated from 2 gauges
#> 75 - poor data: interpolated from 1 gauge
#> 76 - poor data: infilled from the preceding time step
#> 255 - null data: no gauge data available for interpolation
#> 
#> Gauge weightings by subarea for excellent data:
#> 1 - 0.42 * 223214a, 0.25 * 223213, 0.18 * 582016, 0.15 * 71032
#> 2 - 0.48 * 71032, 0.21 * 71042, 0.16 * 582015, 0.16 * 52
#> 3 - 0.46 * 582015, 0.26 * 71032, 0.15 * 72162, 0.13 * 52
#>         history: Fri Mar 30 12:26:30 2018: ncks -d time,140266,149025 /home/per202/Data/UnitTests/Obs_data/Upper_Murray_rain_1hr.nc Upper_Murray_sample_rain.nc
#> 2014-03-19 21:18:47 +10.0 - File created
#>         NCO: 4.7.3

rain_dat has methods to discover and retrieve data in the file.

cat(sprintf("This rainfall data set has data for %s stations, the lead time dimension is '%s' because this is not forecast data\n", rain_dat$get_station_count(),
rain_dat$get_lead_time_count()))

#> This rainfall data set has data for 3 stations, the lead time dimension is '1' because this is not forecast data

rain_dat$get_variable_names()

#> [1] "area"          "lat"           "lon"           "rain_der"     
#> [5] "rain_der_qual" "station_id"    "station_name"  "x"            
#> [9] "y"

rain_dat$get_variable_dim_names("rain_der")

#> [1] "station" "time"

rain_der in this instance has two dimensions, but even if it had been defined as a 3 or 4 dimension data, or in different orders, the method get_all_series just does the low-level processing to present a meaninful multivariate xts series.

d <- rain_dat$get_all_series(variable_name = 'rain_der')
head(d)

#> Warning: timezone of object (UTC) is different than current timezone ().

#>                              1          2          3
#> 2006-01-01 00:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 01:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 02:00:00 0.03022585 0.06189353 0.13274320
#> 2006-01-01 03:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 04:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 05:00:00 0.01819110 0.02752321 0.06931218

xts may insist on warning that the “timezone of object (UTC) is different than current timezone ().”. This is normal and the series is exactly as it should be.

plot(d[1:48], main="Interpolated rainfall (mm/h)")

The data set also has quality codes that can be retrieved:

d_q <- rain_dat$get_all_series(variable_name = 'rain_der_qual')
plot(d_q, main="Interpolated rainfall quality code")

Let’s close the rainfall observation data set and look at a data set of ensemble forecast rainfall.

rain_dat$close()

ext_data <- system.file('extdata', package='efts')
ens_fcast_file <- file.path(ext_data, 'Upper_Murray_sample_ensemble_rain_fcast.nc')
stopifnot(file.exists(ens_fcast_file))
rain_fct_ens <- open_efts(ens_fcast_file)
rain_fct_ens

#> File /tmp/RtmpZLthWi/Rinst1e025790d662/efts/extdata/Upper_Murray_sample_ensemble_rain_fcast.nc (NC_FORMAT_CLASSIC):
#> 
#>      3 variables (excluding dimension variables):
#>         double rain_fcast_ens[lead_time,station,ens_member,time]   
#>             standard_name: rain_fcast_ens
#>             long_name: Post processed ACCESS-G Rainfall Forecasts with Schaake Shuffle (Multiple Sites) Disaggregated from 3h
#>             units: mm
#>             _FillValue: -9999
#>             type: 2
#>             type_description: accumulated over the preceding interval
#>             Location_type: Area
#>         int station_id[station]   
#>             long_name: station or node identification code
#>         char station_name[strLen,station]   
#>             long_name: station or node name
#> 
#>      5 dimensions:
#>         ens_member  Size:10
#>             standard_name: ens_member
#>             long_name: ensemble member
#>             units: member id
#>             axis: u
#>         lead_time  Size:231
#>             standard_name: lead time
#>             long_name: forecast lead time
#>             axis: v
#>             units: hours since time of forecast
#>         time  Size:2   *** is unlimited ***
#>             standard_name: time
#>             long_name: time
#>             time_standard: UTC
#>             axis: t
#>             units: hours since 2010-08-01 21:00:00 +0000
#>         station  Size:3
#>         strLen  Size:30
#> 
#>     10 global attributes:
#>         title: Post Processed ACCESS-G (APS1) Rainfall Forecasts
#>         institution: CSIRO Land & Water
#>         source: 
#>         catchment: Upper_Murray
#>         STF_convention_version: 1
#>         STF_nc_spec: https://wiki.csiro.au/display/wirada/NetCDF+for+SWIFT
#>         comment: Data extracted from:
#> 'X:\Projects\WIRADA_4_1\Work\Activity_2\Testbed\Upper_Murray\Subareas\BJP_Calibration\F1\1h\Upper_Murray_F1_1_2010080121_shuffle.nc'
#> station_ID 7 in source file changed to station_ID 3 to compensate for catchment delineation error
#>         history: Fri Mar 30 12:55:43 2018: ncks -d ens_member,1,10 /home/per202/Data/UnitTests/Fct_Data/tmp/tmp_collate.nc Upper_Murray_sample_ensemble_rain_fcast.nc
#> Fri Mar 30 12:55:22 2018: ncrcat ../Upper_Murray_F1_1_2010080121_shuffle.nc ../Upper_Murray_F1_1_2010080221_shuffle.nc tmp_collate.nc
#> 2013-08-16 12:55:01 +10.0 - File created
#>         NCO: 4.7.3
#>         nco_openmp_thread_number: 1

all_vars_names <- rain_fct_ens$get_variable_names()
station_ids <- rain_fct_ens$get_values("station_id")
variable_names <- "rain_fcast_ens"
issue_times <- rain_fct_ens$get_time_dim()

We can retrieve ensemble of forecasts using get_ensemble_forecasts, only having to specify high level information such as the variable of interest, the station and the forecast issue time. The object rain_fct_ens takes care of the lower level level information and slicing working directly with ncdf4.

We’ll store ensemble forecasts in a list; in the future efts may offer a more convenient structure for multidimensional time series.

ensfcasts = list()
ensfcasts[["1"]] <- rain_fct_ens$get_ensemble_forecasts(variable_names[1], station_ids[1], start_time = issue_times[1])

#> Note: method with signature 'Duration#ANY' chosen for function '*',
#>  target signature 'Duration#array'.
#>  "vector#structure" would also be valid

ensfcasts[["2"]] <- rain_fct_ens$get_ensemble_forecasts(variable_names[1], station_ids[1], start_time = issue_times[2])

demo_efts_plot <- function(index) {
  d <- ensfcasts[[as.character(index)]]
  # TODO: would be nice to have the title retrieved, but not realistic for multivariate files
  dat_desc <- paste("Post Processed Forecasts mm/h",issue_times[index],lubridate::tz(issue_times)) # units: no ylab in plot.xts
  plot(d[,1:3], main=dat_desc)
}

demo_efts_plot(1)

demo_efts_plot(2)

rain_fct_ens$close()

Introduction to the ‘efts’ R package

Jean-Michel Perraud

2018-04-26

Overview

Sample data