---
title: "Introduction to R2ucare"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to R2ucare}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## What it does (and does not do)

The `R2ucare` package contains `R` functions to perform goodness-of-fit tests for capture-recapture models. It also has various functions to manipulate capture-recapture data.

First things first, load the `R2ucare` package:
```{r, message=FALSE, warning=FALSE}
library(R2ucare)
```

## Data formats

There are 3 main data formats when manipulating capture-recapture data, corresponding to the 3 main computer software available to fit corresponding models: `RMark`, `E-SURGE` and `Mark`. With `R2ucare`, it is easy to work with any of these formats. We will use the classical dipper dataset, which is provided with the package (thanks to Gilbert Marzolin for sharing his data).

### Read in `RMark` file

```{r, message=FALSE, warning=FALSE}
# # read in text file as described at pages 50-51 in http://www.phidot.org/software/mark/docs/book/pdf/app_3.pdf
dipper <- system.file("extdata", "dipper.txt", package = "RMark")
dipper <- RMark::import.chdata(dipper, field.names = c("ch", "sex"), header = FALSE)
dipper <- as.data.frame(table(dipper))
str(dipper)
```

Get encounter histories, counts and groups:
```{r, message=FALSE, warning=FALSE}
dip.hist <- matrix(as.numeric(unlist(strsplit(as.character(dipper$ch),""))),
                   nrow = length(dipper$ch),
                   byrow = T)
dip.freq <- dipper$Freq
dip.group <- dipper$sex
head(dip.hist)
head(dip.freq)
head(dip.group)
```

### Read in `E-SURGE` files

Let's use the `read_headed` function. 

```{r, message=FALSE, warning=FALSE}
dipper <- system.file("extdata", "ed.txt", package = "R2ucare")
dipper <- read_headed(dipper)
```

Get encounter histories, counts and groups:
```{r, message=FALSE, warning=FALSE}
dip.hist <- dipper$encounter_histories
dip.freq <- dipper$sample_size
dip.group <- dipper$groups
head(dip.hist)
head(dip.freq)
head(dip.group)
```

### Read in `Mark` files

Let's use the `read_inp` function. 

```{r, message=FALSE, warning=FALSE}
dipper <- system.file("extdata", "ed.inp", package = "R2ucare")
dipper <- read_inp(dipper, group.df = data.frame(sex = c("Male", "Female")))
```

Get encounter histories, counts and groups:
```{r, message=FALSE, warning=FALSE}
dip.hist <- dipper$encounter_histories
dip.freq <- dipper$sample_size
dip.group <- dipper$groups
head(dip.hist)
head(dip.freq)
head(dip.group)
```

## Goodness-of-fit tests for the Cormack-Jolly-Seber model

Split the dataset in females/males:
```{r, message=FALSE, warning=FALSE}
mask <- (dip.group == "Female")
dip.fem.hist <- dip.hist[mask,]
dip.fem.freq <- dip.freq[mask]
mask <- (dip.group == "Male")
dip.mal.hist <- dip.hist[mask,]
dip.mal.freq <- dip.freq[mask]
```

Tadaaaaan, now you're ready to perform Test.3Sr, Test3.Sm, Test2.Ct and Test.Cl for females:
```{r, message=FALSE, warning=FALSE}
test3sr_females <- test3sr(dip.fem.hist, dip.fem.freq)
test3sm_females <- test3sm(dip.fem.hist, dip.fem.freq)
test2ct_females <- test2ct(dip.fem.hist, dip.fem.freq)
test2cl_females <- test2cl(dip.fem.hist, dip.fem.freq)
# display results:
test3sr_females
test3sm_females
test2ct_females
test2cl_females
```

Or perform all tests at once:
```{r, message=FALSE, warning=FALSE}
overall_CJS(dip.fem.hist, dip.fem.freq)
```

What to do if these tests are significant? If you detect a transient effect, then 2 age classes should be considered on the survival probability to account for this issue, which is straightforward to do in `RMark` (Cooch and White 2017; [appendix C](http://www.phidot.org/software/mark/docs/book/pdf/app_3.pdf)) or `E-SURGE` (Choquet et al. 2009). If trap dependence is significant, [Cooch and White (2017)](http://www.phidot.org/software/mark/docs/book/pdf/app_3.pdf) illustrate how to use a time-varying individual covariate to account for this effect in `RMark`, while [Gimenez et al. (2003)](https://oliviergimenez.github.io/pubs/Gimenezetal2003BiomJ.pdf) suggest the use of multistate models that can be fitted with `RMark` or `E-SURGE`, and [Pradel and Sanz (2012)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0032666) recommend multievent models that can be fitted in `E-SURGE` only.

Now how to assess the fit of a model including trap-dependence and/or transience? For example, let's assume we detected a significant effect of trap-dependence, we accounted for it in a model, and now we'd like to know whether our efforts paid off. Because the overall statistic is the sum of the four single components (Test.3Sr, Test3.Sm, Test2.Ct and Test.Cl), we obtain a test for the model with trap-dependence as follows:
```{r}
overall_test <- overall_CJS(dip.fem.hist, dip.fem.freq) # overall test
twoct_test <- test2ct(dip.fem.hist, dip.fem.freq) # test for trap-dependence
stat_tp <- overall_test$chi2 - twoct_test$test2ct["stat"] # overall stat - 2CT stat
df_tp <- overall_test$degree_of_freedom - twoct_test$test2ct["df"] # overall dof - 2CT dof
pvalue <- 1 - pchisq(stat_tp, df_tp) # compute p-value for null hypothesis: 
                                     # model with trap-dep fits the data well
pvalue
```


## Goodness-of-fit tests for the Arnason-Schwarz model

Read in the geese dataset. It is provided with the package (thanks to Jay Hestbeck for sharing his data). 

```{r, message=FALSE, warning=FALSE}
geese <- system.file("extdata", "geese.inp", package = "R2ucare")
geese <- read_inp(geese)
```

Get encounter histories and number of individuals with corresponding histories
```{r, message=FALSE, warning=FALSE}
geese.hist <- geese$encounter_histories
geese.freq <- geese$sample_size
```

And now, perform Test3.GSr, Test.3.GSm, Test3G.wbwa, TestM.ITEC and TestM.LTEC:
```{r, message=FALSE, warning=FALSE}
test3Gsr(geese.hist, geese.freq)
test3Gsm(geese.hist, geese.freq)
test3Gwbwa(geese.hist, geese.freq)
testMitec(geese.hist, geese.freq)
testMltec(geese.hist, geese.freq)
```

Or all tests at once:
```{r, message=FALSE, warning=FALSE}
overall_JMV(geese.hist, geese.freq)
```

If trap-dependence or transience is significant, you may account for these lacks of fit as in the Cormack-Jolly-Seber model example. If there are signs of a memory effect, it gets a bit trickier but you may fit a model to account for this issue using hidden Markov models (also known as multievent models when applied to capture-recapture data).

## Various useful functions

Several `U-CARE` functions to manipulate and process capture-recapture data can be mimicked with `R` built-in functions. For example, recoding multisite/state encounter histories in single-site/state ones is easy:
```{r, message=FALSE, warning=FALSE}
# Assuming the geese dataset has been read in R (see above):
geese.hist[geese.hist > 1] <- 1
```
So is recoding states:
```{r, message=FALSE, warning=FALSE}
# Assuming the geese dataset has been read in R (see above):
geese.hist[geese.hist == 3] <- 2 # all 3s become 2s
```
Also, reversing time is not that difficult:
```{r, message=FALSE, warning=FALSE,eval=FALSE}
# Assuming the female dipper dataset has been read in R (see above):
t(apply(dip.fem.hist, 1, rev))
```
What about cleaning data, i.e. deleting empty histories, and histories with no individuals?
```{r, message=FALSE, warning=FALSE,eval=FALSE}
# Assuming the female dipper dataset has been read in R (see above):
mask = (apply(dip.fem.hist, 1, sum) > 0 & dip.fem.freq > 0) # select non-empty histories, and histories with at least one individual
sum(!mask) # how many histories are to be dropped?
dip.fem.hist[mask,] # drop these histories from dataset
dip.fem.freq[mask] # from counts
```
Selecting or gathering occasions is as simple as that:
```{r, message=FALSE, warning=FALSE, eval=FALSE}
# Assuming the female dipper dataset has been read in R (see above):
dip.fem.hist[, c(1,4,6)] # pick occasions 1, 4 and 6 (might be a good idea to clean the resulting dataset)
gather_146 <- apply(dip.fem.hist[,c(1,4,6)], 1, max) # gather occasions 1, 4 and 6 by taking the max
dip.fem.hist[,1] <- gather_146 # replace occasion 1 by new occasion
dip.fem.hist <- dip.fem.hist[, -c(4,6)] # drop occasions 4 and 6
```
Finally, suppressing the first encounter is achieved as follows:
```{r, message=FALSE, warning=FALSE, eval=FALSE}
# Assuming the geese dataset has been read in R (see above):
for (i in 1:nrow(geese.hist)){
occasion_first_encounter <- min(which(geese.hist[i,] != 0)) # look for occasion of first encounter
geese.hist[i, occasion_first_encounter] <- 0 # replace the first non zero by a zero
}
# delete empty histories from the new dataset
mask <- (apply(geese.hist, 1, sum) > 0) # select non-empty histories
sum(!mask) # how many histories are to be dropped?
geese.hist[mask,] # drop these histories from dataset
geese.freq[mask] # from counts
```

Besides these simple manipulations, several useful `U-CARE` functions needed a proper `R` equivalent. 
I coded a few of them, see the list below and ?name-of-the-function for more details. 

* `marray` build the m-array for single-site/state capture-recapture data;
* `multimarray` build the m-array for multi-site/state capture-recapture data;
* `group_data` pool together individuals with the same encounter capture-recapture history.
* `ungroup_data` split encounter capture-recapture histories in individual ones.