How to use the R2ucare package to assess the fit of capture-recapture to data?

Olivier Gimenez

2017-04-13

What it does (and does not do)

Ths package contains R functions to perform goodness-of-fit tests for capture-recapture models. It also has various functions to manipulate capture-recapture data. Please email all suggestions for improvements, questions, comments and bugs to olivier.gimenez [AT] cefe.cnrs.fr.

For Cormack-Jolly-Seber models (single-state), we refer to Lebreton et al. (1992) and Pradel et al. (2005) for the theory. For Arnason-Schwarz models (multistate), have a look to Pradel et al. (2003). Chapter 5 of the Gentle Introduction to MARK also provides a good start for understanding goodness-of-fit test.

Warning: to date, no goodness-of-fit test exists for models with individual covariates (unless you discretize them and use groups), individual time-varying covariates (unless you treat them as states) or temporal covariates; therefore, remove these covariates from your dataset before using it with R2ucare. For groups, just treat the group separately as in the Dipper example below.

To install the package

The latest stable version of the package can be downloaded from CRAN with the R command

install.packages("unmarked")

The repository on GitHub https://github.com/oliviergimenez/R2ucare hosts the development version of the package, to install it:

if(!require(devtools)) install.packages("devtools")
library("devtools")
install_github('oliviergimenez/R2ucare')

Despite what its name might suggest, you do not need to download and install U-CARE to run the R2ucare package. This package is basically a Matlab to R translation of U-CARE (Choquet et al. 2009).

Goodness-of-fit tests for the Cormack-Jolly-Seber model

First things first, load the R2ucare package:

library(R2ucare)

Let’s begin by reading in the classical dipper using the read_inp function. Note that data with the Headed format can also be read using the function read_headed. The dataset is provided with the package (thanks to Gilbert Marzolin for sharing his data).

dipper = system.file("extdata", "ed.inp", package = "R2ucare")
dipper = read_inp(dipper,group.df=data.frame(sex=c('Male','Female')))

Get encounter histories, counts and groups:

dip.hist = dipper$encounter_histories
dip.freq = dipper$sample_size
dip.group = dipper$groups

Split the dataset in females/males:

mask = (dip.group == 'Female')
dip.fem.hist = dip.hist[mask,]
dip.fem.freq = dip.freq[mask]
mask = (dip.group == 'Male')
dip.mal.hist = dip.hist[mask,]
dip.mal.freq = dip.freq[mask]

Tadaaaaan, now you’re ready to perform Test.3Sr, Test3.Sm, Test2.Ct and Test.Cl for females:

test3sr_females = test3sr(dip.fem.hist, dip.fem.freq)
test3sm_females = test3sm(dip.fem.hist, dip.fem.freq)
# we need the m-array to perform test2ct and test2cl
test2ct_females = test2ct(dip.fem.hist, dip.fem.freq)
test2cl_females = test2cl(dip.fem.hist, dip.fem.freq)
# display results:
test3sr_females
## $test3sr
##      stat        df     p_val sign_test 
##     4.985     5.000     0.418     1.428 
## 
## $details
##   component  stat p_val signed_test  test_perf
## 1         2 0.858 0.354       0.926 Chi-square
## 2         3 3.586 0.058       1.894 Chi-square
## 3         4 0.437 0.509       0.661 Chi-square
## 4         5 0.103 0.748      -0.321 Chi-square
## 5         6 0.001 0.982       0.032 Chi-square
test3sm_females
## $test3sm
##  stat    df p_val 
## 2.041 3.000 0.564 
## 
## $details
##   component  stat df p_val test_perf
## 1         2 1.542  1 0.214    Fisher
## 2         3     0  1     1    Fisher
## 3         4 0.499  1  0.48    Fisher
## 4         5     0  0     0      None
## 5         6     0  0     0      None
test2ct_females
## $test2ct
##      stat        df     p_val sign_test 
##     3.250     4.000     0.517    -0.902 
## 
## $details
##   component dof stat p_val signed_test test_perf
## 1         2   1    0     1           0    Fisher
## 2         3   1    0     1           0    Fisher
## 3         4   1    0     1           0    Fisher
## 4         5   1 3.25 0.071      -1.803    Fisher
test2cl_females
## $test2cl
##  stat    df p_val 
##     0     0     1 
## 
## $details
##   component dof stat p_val test_perf
## 1         2   0    0     0      None
## 2         3   0    0     0      None
## 3         4   0    0     0      None

Or perform all tests at once:

overall_CJS(dip.fem.hist, dip.fem.freq)
##                           chi2 degree_of_freedom p_value
## Gof test for CJS model: 10.276                12   0.592

Goodness-of-fit tests for the Arnason-Schwarz model

Read in the geese dataset. It is provided with the package (thanks to Jay Hesbeck for sharing his data).

geese = system.file("extdata", "geese.inp", package = "R2ucare")
geese = read_inp(geese)

Get encounter histories and number of individuals with corresponding histories

geese.hist = geese$encounter_histories
geese.freq = geese$sample_size

And now, perform Test3.GSr, Test.3.GSm, Test3G.wbwa, TestM.ITEC and TestM.LTEC:

test3Gsr(geese.hist,geese.freq)
## $test3Gsr
##    stat      df   p_val 
## 117.753  12.000   0.000 
## 
## $details
##    occasion site         stat df        p_val  test_perf
## 1         2    1 3.894777e-03  1 9.502378e-01 Chi-square
## 2         2    2 2.715575e-04  1 9.868523e-01 Chi-square
## 3         2    3 8.129814e+00  1 4.354322e-03 Chi-square
## 4         3    1 1.139441e+01  1 7.366526e-04 Chi-square
## 5         3    2 2.707742e+00  1 9.986223e-02 Chi-square
## 6         3    3 3.345916e+01  1 7.277633e-09 Chi-square
## 7         4    1 1.060848e+01  1 1.125702e-03 Chi-square
## 8         4    2 3.533332e-01  1 5.522323e-01 Chi-square
## 9         4    3 1.016778e+01  1 1.429165e-03 Chi-square
## 10        5    1 1.101349e+01  1 9.045141e-04 Chi-square
## 11        5    2 1.292013e-01  1 7.192616e-01 Chi-square
## 12        5    3 2.978513e+01  1 4.826802e-08 Chi-square
test3Gsm(geese.hist,geese.freq)
## $test3Gsm
##    stat      df   p_val 
## 302.769 119.000   0.000 
## 
## $details
##    occasion site      stat df        p_val  test_perf
## 1         2    1 23.913378 14 4.693795e-02     Fisher
## 2         2    2 24.810007 16 7.324561e-02     Fisher
## 3         2    3 11.231939  8 1.889004e-01     Fisher
## 4         3    1 36.521484 14 8.712879e-04     Fisher
## 5         3    2 21.365358 17 2.103727e-01 Chi-square
## 6         3    3 23.072982 10 1.048037e-02     Fisher
## 7         4    1 55.338866  8 3.793525e-09     Fisher
## 8         4    2 17.172011 11 1.028895e-01     Fisher
## 9         4    3 45.089296 10 2.095523e-06 Chi-square
## 10        5    1  9.061514  3 2.848411e-02 Chi-square
## 11        5    2  5.974357  4 2.010715e-01 Chi-square
## 12        5    3 29.217786  4 7.060092e-06 Chi-square
test3Gwbwa(geese.hist,geese.freq)
## $test3Gwbwa
##    stat      df   p_val 
## 472.855  20.000   0.000 
## 
## $details
##    occasion site       stat df        p_val  test_perf
## 1         2    1 19.5914428  2 5.568936e-05 Chi-square
## 2         2    2 37.8676763  2 5.986026e-09 Chi-square
## 3         2    3  4.4873614  1 3.414634e-02     Fisher
## 4         3    1 80.5903050  1 2.777187e-19 Chi-square
## 5         3    2 98.7610833  4 1.805369e-20 Chi-square
## 6         3    3  0.8071348  1 3.689687e-01     Fisher
## 7         4    1 27.7054638  1 1.412632e-07 Chi-square
## 8         4    2 53.6936048  2 2.190695e-12 Chi-square
## 9         4    3 25.2931602  1 4.924519e-07 Chi-square
## 10        5    1 43.6547442  1 3.917264e-11 Chi-square
## 11        5    2 50.9264976  2 8.738795e-12 Chi-square
## 12        5    3 29.4763896  2 3.974507e-07 Chi-square
testMitec(geese.hist,geese.freq)
## $testMitec
##   stat     df  p_val 
## 68.224 27.000  0.000 
## 
## $details
##   occasion     stat df        p_val  test_perf
## 1        2 14.26712  9 0.1131354478 Chi-square
## 2        3 30.83835  9 0.0003155366 Chi-square
## 3        4 23.11879  9 0.0059349648 Chi-square
testMltec(geese.hist,geese.freq)
## $testMltec
##   stat     df  p_val 
## 20.989 19.000  0.337 
## 
## $details
##   occasion      stat df     p_val  test_perf
## 1        2 14.102324 10 0.1683752 Chi-square
## 2        3  6.886312  9 0.6489547 Chi-square

Or all tests at once:

overall_JMV(geese.hist,geese.freq)
##                            chi2 degree_of_freedom p_value
## Gof test for JMV model: 982.589               197       0

Various useful functions

Several U-CARE functions to manipulate and process capture-recapture data can be mimicked with R built-in functions. For example, recoding multisite/state encounter histories in single-site/state ones is easy:

# Assuming the geese dataset has been read in R (see above):
geese.hist[geese.hist>1] = 1

So is recoding states:

# Assuming the geese dataset has been read in R (see above):
geese.hist[geese.hist==3]=2 # all 3s become 2s

Also, reversing time is not that difficult:

# Assuming the female dipper dataset has been read in R (see above):
t(apply(dip.fem.hist,1,rev))

What about cleaning data, i.e. deleting empty histories, and histories with no individuals?

# Assuming the female dipper dataset has been read in R (see above):
mask = (apply(dip.fem.hist,1,sum)>0 & dip.fem.freq>0) # select non-empty histories, and histories with at least one individual
sum(!mask) # how many histories are to be dropped?
dip.fem.hist[mask,] # drop these histories from dataset
dip.fem.freq[mask] # from counts

Selecting or gathering occasions is as simple as that:

# Assuming the female dipper dataset has been read in R (see above):
dip.fem.hist[,c(1,4,6)] # pick occasions 1, 4 and 6 (might be a good idea to clean the resulting dataset)
gather_146 = apply(dip.fem.hist[,c(1,4,6)],1,max) # gather occasions 1, 4 and 6 by taking the max
dip.fem.hist[,1] = gather_146 # replace occasion 1 by new occasion
dip.fem.hist = dip.fem.hist[,-c(4,6)] # drop occasions 4 and 6

Finally, suppressing the first encounter is achieved as follows:

# Assuming the geese dataset has been read in R (see above):
for (i in 1:nrow(geese.hist)){
occasion_first_encounter = min(which(geese.hist[i,]!=0)) # look for occasion of first encounter
geese.hist[i,occasion_first_encounter] = 0 # replace the first non zero by a zero
}
# delete empty histories from the new dataset
mask = (apply(geese.hist,1,sum)>0) # select non-empty histories
sum(!mask) # how many histories are to be dropped?
geese.hist[mask,] # drop these histories from dataset
geese.freq[mask] # from counts

The R packages plyr, dplyr and tidyr might also be very useful to work with capture-recapture datasets.

Besides these simple manipulations, several useful U-CARE functions need a proper R equivalent. I have coded a few of them, see the list below and ?name-of-the-function for more details.

References