basics

library(foieGras)

Disclaimer

this vignette is an extended set of examples to highlight foieGras’s functionality. Please, do NOT interpret these examples as instructions for conducting analysis of animal movement data. Numerous essential steps in a proper analysis have been left out of this document. It is your job to understand your data, ensure you are asking the right questions of your data, and that the analyses you undertake appropriately reflect those questions. We can not do this for you!

foieGras models

this vignette provides a (very) brief overview of how to use foieGras to filter animal track locations obtained via the Argos satellite system or via processed light-level geolocation (GLS). foieGras provides two state-space models (SSM’s) for filtering (ie. estimating “true” locations and associated movement model parameters, while accounting for error-prone observations):

both models are continuous-time models, that is, they account for the time intervals between successive observations, thereby naturally accounting for the commonly irregularly-timed nature of animal tracking data. We won’t dwell on the details of the models here (see Jonsen et al. 2020 for details on the crw model), except to say there may be advantages to choosing one over the other in certain circumstances. The Random Walk model tends not to deal well with small to moderate gaps (relative to a specified time step) in observed locations and can over-fit to particularly noisy data. The Correlated Random Walk model can often deal better with these small to moderate data gaps and appropriately smooth through noisy data but tends to estimate nonsensical movement through larger data gaps.

additionally, foieGras provides fast models (mpm, jmpm) for estimating a behavioural index along animals’ tracks (see Jonsen et al. 2019 for details). The mpm is fit to individual tracks, whereas the jmpm is fit to multiple tracks simultaneously with a variance parameter that is estimated jointly across the tracks. This latter model can often better resolve subtle changes in movement behaviour along tracks that lack much contrast in movements. Now, both models can be fit to time-regularized locations (discrete-time models) or to time-irregular locations (continuous-time models). See Auger-Méthé et al. 2017 for an example of the latter.

input data

foieGras expects data to be provided in one of several possible formats.

  1. a data.frame or tibble that looks like this
#>      id                date lc      lon       lat  smaj smin eor
#> 1 54591 2012-03-05 05:09:33  1 110.5707 -66.42752  2442  416  42
#> 2 54591 2012-03-06 04:55:14  0 110.3402 -66.39579 49660  391  90
#> 3 54591 2012-03-07 04:23:10  A 110.4778 -66.45266  5032  472  93
#> 4 54591 2012-03-07 21:23:06  A 110.3749 -66.39622  4007  286 116
#> 5 54591 2012-03-09 04:27:49  B 110.4732 -66.48743 13063  956  82
#> 6 54591 2012-03-10 00:10:41  A 110.5014 -66.43516  5099  478  79

where the Argos data are provided via CLS Argos’ Kalman filter model (KF) and include error ellipse information for each observed location.

  1. a data.frame or tibble that looks like this
#>          id                date lc   lon    lat
#> 1 ct36-F-09 2009-02-10 19:42:44  A 70.61 -49.67
#> 2 ct36-F-09 2009-02-11 07:56:36  A 70.18 -50.18
#> 3 ct36-F-09 2009-02-12 01:53:07  A 70.13 -51.07
#> 4 ct36-F-09 2009-02-12 19:06:55  B 69.51 -51.96
#> 5 ct36-F-09 2009-02-13 12:13:19  B 70.95 -53.06
#> 6 ct36-F-09 2009-02-14 01:10:58  B 70.09 -53.37

where the Argos data are provided via CLS Argos’ Least-Squares model (LS) and do not include error ellipse information.

  1. a data.frame or tibble that includes observations with missing KF error ellipse information
#>      id                date lc      lon       lat  smaj smin eor
#> 1 54591 2012-03-05 05:09:33  1 110.5707 -66.42752  2442  416  42
#> 2 54591 2012-03-06 04:55:14  0 110.3402 -66.39579 49660  391  90
#> 3 54591 2012-03-07 04:23:10  A 110.4778 -66.45266    NA   NA  NA
#> 4 54591 2012-03-07 21:23:06  A 110.3749 -66.39622    NA   NA  NA
#> 5 54591 2012-03-09 04:27:49  B 110.4732 -66.48743    NA   NA  NA
#> 6 54591 2012-03-10 00:10:41  A 110.5014 -66.43516  5099  478  79

in this situation, foieGras treats observations with missing error ellipse information as though they are LS-based observations.

  1. an sf-tibble where observations have any of the previous 3 structures and also include CRS information
#>      id                date lc  smaj smin eor                   geometry
#> 1 54591 2012-03-05 05:09:33  1  2442  416  42 POINT (2430.943 -912.3109)
#> 2 54591 2012-03-06 04:55:14  0 49660  391  90  POINT (2437.96 -903.7763)
#> 3 54591 2012-03-07 04:23:10  A  5032  472  93 POINT (2429.753 -907.3739)
#> 4 54591 2012-03-07 21:23:06  A  4007  286 116 POINT (2437.367 -905.2335)
#> 5 54591 2012-03-09 04:27:49  B 13063  956  82 POINT (2426.138 -905.8045)
#> 6 54591 2012-03-10 00:10:41  A  5099  478  79 POINT (2431.234 -909.0692)
  1. a data.frame, tibble or sf-tibble where processed GLS data are provided and include longitude and latitude error SD’s (in degrees). In this case, the lc class is set to GL for all GLS locations.
#>      id                date lc   lon lat    lonerr    laterr
#> 1 54632 2020-07-22 18:16:50 GL 100.0 -55 0.3018386 5.0810790
#> 2 54632 2020-07-23 06:16:50 GL 100.5 -54 0.2621930 1.0652059
#> 3 54632 2020-07-23 18:16:50 GL 101.0 -53 0.3423596 0.5170783
#> 4 54632 2020-07-24 06:16:50 GL 101.5 -52 0.1068469 2.8934039
#> 5 54632 2020-07-24 18:16:50 GL 102.0 -51 0.1248569 0.8679292
  1. a data.frame, tibble or sf-tibble where GPS data are provided. In this case, the lc class is set to G for all GPS locations.
#>         id                date lc  lon   lat
#> 1 F02-B-17 2020-07-22 18:16:50  G 70.1 -49.2
#> 2 F02-B-17 2020-07-22 19:16:50  G 70.6 -48.2
#> 3 F02-B-17 2020-07-22 20:16:50  G 71.1 -47.2
#> 4 F02-B-17 2020-07-22 21:16:50  G 71.6 -46.2
#> 5 F02-B-17 2020-07-22 22:16:50  G 72.1 -45.2
  1. a data.frame, tibble or sf-tibble where any combination of Argos, GLS or GPS locations can be intermixed - though, most typically this would be a combination of Argos and GPS locations.
#>         id                date lc  lon   lat  smaj smin eor
#> 1 F02-B-17 2017-09-17 05:20:00  G 70.1 -49.2    NA   NA  NA
#> 2 F02-B-17 2017-10-04 14:35:01  2 70.2 -49.1  1890   45  77
#> 3 F02-B-17 2017-10-05 04:03:25  G 70.1 -49.3    NA   NA  NA
#> 4 F02-B-17 2017-10-05 06:28:20  A 71.1 -48.7 28532 1723 101
#> 5 F02-B-17 2017-10-05 10:21:18  B 70.8 -48.5 45546 3303  97

fitting a foieGras model

model fitting for quality control of locations is comprised of 2 steps: a prefilter step where a number of checks are made on the input data (see ?foieGras::fit_ssm for details), including applying the trip::sda filter to identify extreme outlier observations. Additionally, if the input data are not supplied as an sf object,the prefilter guesses at an appropriate projection (typically world mercator, EPSG 3395) to apply to the data. The SSM is then fit to this projected version of the data. Users invoke this process via the fit_ssm function:

## load foieGras example data - 2 southern elephant seals
data("ellies")
## prefilter and fit Random Walk SSM using a 24 h time step
fit <- fit_ssm(ellies, model = "crw", time.step = 24, verbose = 0)

these are the minimum arguments required: the input data, the model (rw or crw) and the time.step (in h) to which locations are predicted (the argument verbose = 0 is included for vignette tidyness). Additional control can be exerted over the prefiltering step, via the vmax, ang, distlim, spdf and min.dt arguments. see ?foieGras::fit_ssm for details, the defaults for these arguments are quite conservative (for non-flying species), usually leading to relative few observations being flagged to be ignored by the SSM. Additional control over the SSM fitting step can also be exerted but these should rarely need to be accessed by users and will not be dealt with here.

fit_ssm can be applied to single tracks or to multiple tracks, as shown above. The SSM is fit to each individual separately and the resulting output is a compound tibble with rows corresponding to each individual foieGras fit object. The converged column indicates whether each model fit converged successfully.

## list fit outcomes for both seals
fit
#> # A tibble: 2 x 5
#>   id         ssm          converged pdHess pmodel
#>   <chr>      <named list> <lgl>     <lgl>  <chr> 
#> 1 ct36-F-09  <ssm [15]>   TRUE      TRUE   crw   
#> 2 ct96-16-13 <ssm [15]>   TRUE      TRUE   crw

individual id is displayed in the 1st column, all fit output (ssm) in the 2nd column, convergence status (whether the optimizer found a global minimum) of each model fit is displayed in the 3rd column, whether the Hessian matrix was positive-definite and could be solved to obtain standard errors (pdHess) is displayed in the 4th column, and the specified process model (rw or crw) in the 4th column. In some cases, the optimizer will converge but the Hessian matrix is not positive-definite, which typically indicates the optimizer converged on a local minimum. In this case, some standard errors can often be calculated but not all. One possible solution is to try specifying a longer time.step or set time.step = NA to turn off predictions and return only fitted values (location estimates at the pre-filtered observation times). If pdHess = FALSE persists then careful inspection of the supplied data is warranted to determine if suspect observations not identified by prefilter are present. The excellent glmmTMB troubleshooting vignette may also provide hints at solutions.

accessing and visualizing model fit objects

simple summary information about the foieGras fit can be obtained by calling the fit object:

fit$ssm[[1]]
#> Process model: crw 
#> Time interval: 24 hours 
#> number of observations: 117 
#> number of regularised state estimates: 112 
#> 
#> parameter estimates
#> -------------------
#>       Estimate Std. Error
#> D      0.05573    0.00789
#> rho_o  0.11291    0.15161
#> tau_x  1.29488    0.12246
#> tau_y  0.68194    0.07928
#> -------------------
#> negative log-likelihood: 1257.149 
#> convergence: yes

the table lists all estimated parameters, the specific ones listed depend on the process model selected and the data type. Here, sigma_x and sigma_y are the process error standard deviations in the x and y directions, rho_p is the correlation between sigma_x and sigma_y, tau_x and tau_y are the observation error standard deviations, and rho_o is the correlation between them. The Std. Error column lists the standard errors, calculated via the Delta method (see TMB documentation for details), for each estimated parameter.

a summary plot method allows a quick visual of the SSM fit to the data:

# plot time-series of the predicted values
plot(fit, what = "predicted", type = 1)

plot(fit, what = "fitted", type = 2)

the predicted values (red) are the state estimates predicted at regular time intervals, specified by time.step (here every 24 h). These estimates are plotted on top of the observations that passed the prefilter stage (blue points and blue rug at bottom). Fitted values are the state estimates corresponding to the time of each observation; their time series are plotted by default - plot(fit). A 2-D time series plot of the track is invoked by the argument type = 2.