efts is an R package to access ensemble forecast time series stored (EFTS) in netCDF format. It offers convenient functions to access time series, hiding the bug-prone details of netCDF array manipulations.
EFTS netCDF data sets follow the schema described at this location at the time of writing.
The package comes with API documentation, as well as the present vignette. You can access both navigating via
?efts
The package includes rainfall data for hydrologic modelling of the upper Murray river in Australia, along the border of the states of New South Wales and Victoria. The data is downsized for inclusion in the package but otherwise realistic.
efts
opens netCDF data files by wrapping low-level ncdf4
objects and functions into an R reference object EftsDataSet
.
library(efts)
ext_data <- system.file('extdata', package='efts')
rain_file <- file.path(ext_data, 'Upper_Murray_sample_rain.nc')
stopifnot(file.exists(rain_file))
rain_dat <- open_efts(rain_file)
class(rain_dat)
#> [1] "EftsDataSet"
#> attr(,"package")
#> [1] "efts"
The default representation via print
simply uses the one of ncdf4
, which can be lengthy but remains the best informative overview.
print(rain_dat)
#> File /tmp/RtmpZLthWi/Rinst1e025790d662/efts/extdata/Upper_Murray_sample_rain.nc (NC_FORMAT_CLASSIC):
#>
#> 9 variables (excluding dimension variables):
#> double area[station]
#> standard_name: area
#> long_name: subarea area
#> units: sqm
#> double lat[station]
#> long_name: latitude
#> units: degrees_north
#> axis: y
#> double lon[station]
#> long_name: longitude
#> units: degrees_east
#> axis: x
#> double rain_der[station,time]
#> standard_name: rain_der
#> long_name: derived (from observations) rainfall
#> units: mm
#> _FillValue: -9999
#> type: 2
#> type_description: accumulated over the preceding interval
#> Location_type: Area
#> int rain_der_qual[station,time]
#> standard_name: rain_der_qual
#> long_name: derived (from observations) rainfall data quality
#> units: Quality codes
#> _FillValue: -1
#> int station_id[station]
#> long_name: station or node identification code
#> char station_name[strLen,station]
#> long_name: station or node name
#> double x[station]
#> standard_name: easting_MGA94_zone55
#> long_name: easting from the MGA94 datum in Zone 55
#> axis: x
#> double y[station]
#> standard_name: northing_MGA94_zone55
#> long_name: northing from the MGA94 datum in Zone 55
#> axis: y
#>
#> 5 dimensions:
#> station Size:3
#> ens_member Size:1
#> standard_name: ens_member
#> long_name: ensemble member
#> units: member id
#> axis: u
#> lead_time Size:1
#> standard_name: lead time
#> long_name: forecast lead time
#> axis: v
#> units:
#> time Size:8760 *** is unlimited ***
#> standard_name: time
#> long_name: time
#> time_standard: UTC
#> axis: t
#> units: hours since 1989-12-31 13:00:00 +0000
#> strLen Size:30
#>
#> 9 global attributes:
#> title: Subarea rainfall from rain gauges
#> institution: CSIRO Land & Water
#> source:
#> catchment: Upper Murray
#> STF_convention_version: 1
#> STF_nc_spec: https://wiki.csiro.au/display/wirada/NetCDF+for+SWIFT
#> comment: Rainfall calculated from inverse-distance weighting of four closest rain gauges by James Bennett.
#> Gauged rainfall data cleaned before aggregation by D. Robertson.
#>
#> Quality codes are as follows:
#> 71 - excellent data: interpolated from 4 nearest gauges
#> 72 - very good data: interpolated from 4 gauges
#> 73 - good data: interpolated from 3 gauges
#> 74 - fair data: interpolated from 2 gauges
#> 75 - poor data: interpolated from 1 gauge
#> 76 - poor data: infilled from the preceding time step
#> 255 - null data: no gauge data available for interpolation
#>
#> Gauge weightings by subarea for excellent data:
#> 1 - 0.42 * 223214a, 0.25 * 223213, 0.18 * 582016, 0.15 * 71032
#> 2 - 0.48 * 71032, 0.21 * 71042, 0.16 * 582015, 0.16 * 52
#> 3 - 0.46 * 582015, 0.26 * 71032, 0.15 * 72162, 0.13 * 52
#> history: Fri Mar 30 12:26:30 2018: ncks -d time,140266,149025 /home/per202/Data/UnitTests/Obs_data/Upper_Murray_rain_1hr.nc Upper_Murray_sample_rain.nc
#> 2014-03-19 21:18:47 +10.0 - File created
#> NCO: 4.7.3
rain_dat
has methods to discover and retrieve data in the file.
cat(sprintf("This rainfall data set has data for %s stations, the lead time dimension is '%s' because this is not forecast data\n", rain_dat$get_station_count(),
rain_dat$get_lead_time_count()))
#> This rainfall data set has data for 3 stations, the lead time dimension is '1' because this is not forecast data
rain_dat$get_variable_names()
#> [1] "area" "lat" "lon" "rain_der"
#> [5] "rain_der_qual" "station_id" "station_name" "x"
#> [9] "y"
rain_dat$get_variable_dim_names("rain_der")
#> [1] "station" "time"
rain_der
in this instance has two dimensions, but even if it had been defined as a 3 or 4 dimension data, or in different orders, the method get_all_series
just does the low-level processing to present a meaninful multivariate xts
series.
d <- rain_dat$get_all_series(variable_name = 'rain_der')
head(d)
#> Warning: timezone of object (UTC) is different than current timezone ().
#> 1 2 3
#> 2006-01-01 00:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 01:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 02:00:00 0.03022585 0.06189353 0.13274320
#> 2006-01-01 03:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 04:00:00 0.00000000 0.00000000 0.00000000
#> 2006-01-01 05:00:00 0.01819110 0.02752321 0.06931218
xts
may insist on warning that the “timezone of object (UTC) is different than current timezone ().”. This is normal and the series is exactly as it should be.
plot(d[1:48], main="Interpolated rainfall (mm/h)")
The data set also has quality codes that can be retrieved:
d_q <- rain_dat$get_all_series(variable_name = 'rain_der_qual')
plot(d_q, main="Interpolated rainfall quality code")
Let’s close the rainfall observation data set and look at a data set of ensemble forecast rainfall.
rain_dat$close()
ext_data <- system.file('extdata', package='efts')
ens_fcast_file <- file.path(ext_data, 'Upper_Murray_sample_ensemble_rain_fcast.nc')
stopifnot(file.exists(ens_fcast_file))
rain_fct_ens <- open_efts(ens_fcast_file)
rain_fct_ens
#> File /tmp/RtmpZLthWi/Rinst1e025790d662/efts/extdata/Upper_Murray_sample_ensemble_rain_fcast.nc (NC_FORMAT_CLASSIC):
#>
#> 3 variables (excluding dimension variables):
#> double rain_fcast_ens[lead_time,station,ens_member,time]
#> standard_name: rain_fcast_ens
#> long_name: Post processed ACCESS-G Rainfall Forecasts with Schaake Shuffle (Multiple Sites) Disaggregated from 3h
#> units: mm
#> _FillValue: -9999
#> type: 2
#> type_description: accumulated over the preceding interval
#> Location_type: Area
#> int station_id[station]
#> long_name: station or node identification code
#> char station_name[strLen,station]
#> long_name: station or node name
#>
#> 5 dimensions:
#> ens_member Size:10
#> standard_name: ens_member
#> long_name: ensemble member
#> units: member id
#> axis: u
#> lead_time Size:231
#> standard_name: lead time
#> long_name: forecast lead time
#> axis: v
#> units: hours since time of forecast
#> time Size:2 *** is unlimited ***
#> standard_name: time
#> long_name: time
#> time_standard: UTC
#> axis: t
#> units: hours since 2010-08-01 21:00:00 +0000
#> station Size:3
#> strLen Size:30
#>
#> 10 global attributes:
#> title: Post Processed ACCESS-G (APS1) Rainfall Forecasts
#> institution: CSIRO Land & Water
#> source:
#> catchment: Upper_Murray
#> STF_convention_version: 1
#> STF_nc_spec: https://wiki.csiro.au/display/wirada/NetCDF+for+SWIFT
#> comment: Data extracted from:
#> 'X:\Projects\WIRADA_4_1\Work\Activity_2\Testbed\Upper_Murray\Subareas\BJP_Calibration\F1\1h\Upper_Murray_F1_1_2010080121_shuffle.nc'
#> station_ID 7 in source file changed to station_ID 3 to compensate for catchment delineation error
#> history: Fri Mar 30 12:55:43 2018: ncks -d ens_member,1,10 /home/per202/Data/UnitTests/Fct_Data/tmp/tmp_collate.nc Upper_Murray_sample_ensemble_rain_fcast.nc
#> Fri Mar 30 12:55:22 2018: ncrcat ../Upper_Murray_F1_1_2010080121_shuffle.nc ../Upper_Murray_F1_1_2010080221_shuffle.nc tmp_collate.nc
#> 2013-08-16 12:55:01 +10.0 - File created
#> NCO: 4.7.3
#> nco_openmp_thread_number: 1
all_vars_names <- rain_fct_ens$get_variable_names()
station_ids <- rain_fct_ens$get_values("station_id")
variable_names <- "rain_fcast_ens"
issue_times <- rain_fct_ens$get_time_dim()
We can retrieve ensemble of forecasts using get_ensemble_forecasts
, only having to specify high level information such as the variable of interest, the station and the forecast issue time. The object rain_fct_ens
takes care of the lower level level information and slicing working directly with ncdf4
.
We’ll store ensemble forecasts in a list; in the future efts
may offer a more convenient structure for multidimensional time series.
ensfcasts = list()
ensfcasts[["1"]] <- rain_fct_ens$get_ensemble_forecasts(variable_names[1], station_ids[1], start_time = issue_times[1])
#> Note: method with signature 'Duration#ANY' chosen for function '*',
#> target signature 'Duration#array'.
#> "vector#structure" would also be valid
ensfcasts[["2"]] <- rain_fct_ens$get_ensemble_forecasts(variable_names[1], station_ids[1], start_time = issue_times[2])
demo_efts_plot <- function(index) {
d <- ensfcasts[[as.character(index)]]
# TODO: would be nice to have the title retrieved, but not realistic for multivariate files
dat_desc <- paste("Post Processed Forecasts mm/h",issue_times[index],lubridate::tz(issue_times)) # units: no ylab in plot.xts
plot(d[,1:3], main=dat_desc)
}
demo_efts_plot(1)
demo_efts_plot(2)
rain_fct_ens$close()