---
title: Community occupancy models with occuComm
author: Ken Kellner
date: October 25, 2024
bibliography: unmarked.bib
csl: ecology.csl
output: 
  rmarkdown::html_vignette:
    fig_width: 5
    fig_height: 3.5
    number_sections: true
    toc: true
vignette: >
  %\VignetteIndexEntry{Community occupancy models with occuComm}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---



# Introduction

In a community occupancy model, detection-nondetection data from multiple species are analyzed together.
Intercepts and slopes in the model are species-specific and come from common distributions, allowing for information sharing across species.
This structure also allows estimation of site richness.
For example, suppose you have sites indexed $i$, occasions $j$ and species $s$.
The true occupancy state at a site is

$$z_{is} \sim \mathrm{Bernoulli}(\psi_{is})$$

with detection data $y_{ijs}$ modeled as

$$y_{ijs} \sim \mathrm{Bernoulli}(p_{ijs} \cdot z_{is}) $$

Occupancy probability $\psi_{is}$ can be modeled as a function of covariates with species-specific random intercepts and slopes coming from common distributions:

$$\psi_{is} = \mathrm{logit}(\beta_{0,s} + \beta_{i,s} \cdot x_i) $$

$$ \beta_{0,s} \sim \mathrm{Normal}(\mu_{\beta_0}, \sigma_{\beta_0}) $$

$$ \beta_{1,s} \sim \mathrm{Normal}(\mu_{\beta_1}, \sigma_{\beta_1}) $$

A similar structure can be implemented for detection probability $p$.

Note there is a variety of this model that incorporates hypothetical completely unobserved species using data augmentation, but that is not a model that `unmarked` is able to fit at this time.

# Example dataset

We'll use an avian point count dataset from the Swiss breeding bird survey (MHB) in 2014.
The data are counts of 158 species at 267 sites. 
The dataset is included in the `AHMbook` package.
For more detailed analyses of this dataset, see [Applied Hiearchical Modeling in Ecology](https://www.sciencedirect.com/book/9780128013786/applied-hierarchical-modeling-in-ecology), Volume 1, section 11.6.
You can also check `?MHB2014` for more information.


``` r
set.seed(123)
library(AHMbook)
data(MHB2014)
names(MHB2014)
```

```
## [1] "species" "sites"   "counts"  "date"    "dur"
```

Included in this dataset:

1. `species`: Species-level information and covariates
2. `sites`: Site covariates
3. `counts`: Counts by site and species
4. `date`: Sampling dates (site by occasion)
5. `dur`: Sampling duration (site by occasion) 

# Organize the dataset

## Detection-nondetection data

We'll start by converting the count data to detection-nondetection data (`y`).
The counts are provided in an array format.
For use with `occuComm`, the `y` array should have dimensions site (M) x occasion (J) x species (S).
You can also specify `y` as a list with length equal to S, with each list element a matrix with dimensions M x J (similar to `occuMulti`).
We'll keep things as an array.


``` r
y <- MHB2014$counts
dim(y)
```

```
## [1] 267   3 158
```

``` r
y[y > 0] <- 1
```

## Species covariates

The model allows for species-level covariates.
The covariates must be provided as a named list.
Each list element is a separate covariate and can have one of three dimensions: either a vector of length S, a matrix M x S, or an array M x J x S depending on how the covariate varies.
For example, you could provide the mean body weight of the species as a vector of length S, or the density of the preferred prey of the species at each site (which could differ by species) as a matrix M x S.
In the MHB data, all the species covariates are body size measurements and thus are vectors of length S.


``` r
spcov <- as.list(MHB2014$species[,c("body.length", "body.mass", "wing.span")])
str(spcov)
```

```
## List of 3
##  $ body.length: int [1:158] 27 48 90 35 102 150 80 58 62 55 ...
##  $ body.mass  : int [1:158] 150 1000 1800 150 3250 11000 3300 1100 1300 1350 ...
##  $ wing.span  : int [1:158] 43 87 165 55 200 220 165 89 91 130 ...
```

## Site covariates

These are organized as usual for `unmarked` models and are contained in `MHB2014$sites`.


``` r
sitecov <- MHB2014$site[c("elev", "rlength", "forest")]
head(sitecov)
```

```
##   elev rlength forest
## 1  450     6.4      3
## 2  450     5.5     21
## 3 1050     4.3     32
## 4  950     4.5      9
## 5 1150     5.4     35
## 6  550     3.6      2
```

## Observation covariates

Again these are organized as usual for `unmarked` (as a named list).


``` r
obscov <- MHB2014[c("date", "dur")]
lapply(obscov, head)
```

```
## $date
##      date141 date142 date143
## Q001      21      52      70
## Q002      26      47      59
## Q003      25      52      73
## Q004      40      55      65
## Q005      16      38      62
## Q006      52      61      69
## 
## $dur
##      dur141 dur142 dur143
## Q001    215    220    240
## Q002    195    175    185
## Q003    210    270    210
## Q004    310    300    285
## Q005    240    240    210
## Q006    180    145    140
```

## Create `unmarkedFrame`

The `unmarkedFrame` type for this model is called `unmarkedFrameOccuComm`.
Note the `speciesCovs` argument which is unique to this `unmarkedFrame` type.


``` r
library(unmarked)
umf <- unmarkedFrameOccuComm(y = y, siteCovs = sitecov,
                             obsCovs = obscov, speciesCovs = spcov)
head(umf)
```

```
## Data frame representation of unmarkedFrame object.
## Only showing observation matrix for species 1.
##      y.1 y.2 y.3 elev rlength forest date.1 date.2 date.3 dur.1 dur.2 dur.3
## Q001   0   0   0  450     6.4      3     21     52     70   215   220   240
## Q002   0   0   0  450     5.5     21     26     47     59   195   175   185
## Q003   0   0   0 1050     4.3     32     25     52     73   210   270   210
## Q004   0   0   0  950     4.5      9     40     55     65   310   300   285
## Q005   0   0   0 1150     5.4     35     16     38     62   240   240   210
## Q006   0   0   0  550     3.6      2     52     61     69   180   145   140
## Q007   0   0   0  750     3.9      6     18     40     60   180   195   180
## Q008   0   0   0  650     6.1     60     17     39     61   195   225   210
## Q009   0   0   0  550     5.8      5     18     45     59   190   180   205
## Q010   0   0   0  550     4.5     13     25     50     76   195   203   215
```

# Fit a model

With the dataset organized we can now fit a model with `occuComm`.
We'll specify a model with an effect of elevation (site-level) and body mass (species-level) on occupancy, and no covariates on detection.
The `occuComm` function automatically sets up the species-level random effects for intercepts and slopes.
Note however that there will be no random slope by species for species-level covariates of length S (as this would not make sense).
We can specify the model formulas exactly the same as with the single-species `occu` function.
This model takes a while to fit given the large number of species.


``` r
fit <- occuComm(~1 ~scale(elev) + scale(body.mass),
                data = umf)
```

```
## Warning: 158 sites have been discarded because of missing data.
```

``` r
fit
```

```
## 
## Call:
## occuComm(formula = ~1 ~ scale(elev) + scale(body.mass), data = umf)
## 
## Occupancy (logit-scale):
## Random effects:
##   Groups        Name Variance Std.Dev.
##  species (Intercept)    9.245    3.041
##  species scale(elev)    3.169    1.780
## 
## Fixed effects:
##                  Estimate    SE      z  P(>|z|)
## (Intercept)        -2.980 0.258 -11.57 5.61e-31
## scale(elev)        -0.717 0.155  -4.63 3.64e-06
## scale(body.mass)   -0.769 0.272  -2.83 4.72e-03
## 
## Detection (logit-scale):
## Random effects:
##   Groups        Name Variance Std.Dev.
##  species (Intercept)    1.663    1.289
## 
## Fixed effects:
##  Estimate    SE    z  P(>|z|)
##     0.633 0.122 5.19 2.15e-07
## 
## AIC: 46368.93 
## Number of species: 158
## Number of sites: 266
## ID of sites removed due to NA: 30
```

We get estimates of the overall (mean) intercepts and slopes.
We also get estimates of the random effects SDs for the intercepts and for the elevation slopes (but not for body mass, as noted above).

There is some missing data in the model which generates a warning.
You can ignore the exact number of missing sites reported by the warning - in fact there is only one missing site (#30) which is reported correctly in the summary.

## Estimates of random intercepts and slopes

We can get estimates of the random intercept and slope terms with `randomTerms`.


``` r
rand_est <- randomTerms(fit)
head(rand_est)
```

```
##   Model  Groups        Name               Level  Estimate       SE     lower
## 1   psi species (Intercept)        Little Grebe -2.659992 1.106212 -4.828127
## 2   psi species (Intercept) Great Crested Grebe -2.359038 1.241233 -4.791811
## 3   psi species (Intercept)          Grey Heron -1.517640 1.168330 -3.807526
## 4   psi species (Intercept)      Little Bittern -3.668043 1.461535 -6.532599
## 5   psi species (Intercept)         White Stork -1.106583 1.439474 -3.927901
## 6   psi species (Intercept)           Mute Swan  2.836215 2.545419 -2.152715
##         upper
## 1 -0.49185740
## 2  0.07373509
## 3  0.77224458
## 4 -0.80348815
## 5  1.71473497
## 6  7.82514579
```

``` r
tail(rand_est)
```

```
##     Model  Groups        Name               Level    Estimate        SE
## 469     p species (Intercept)        Corn Bunting -1.04598590 1.0409023
## 470     p species (Intercept)        Yellowhammer  1.49688629 0.2277654
## 471     p species (Intercept)        Cirl Bunting -0.71401947 0.5916209
## 472     p species (Intercept)     Ortolan Bunting -0.07100565 1.3653792
## 473     p species (Intercept)        Rock Bunting  0.40225149 0.2812063
## 474     p species (Intercept) Common Reed Bunting -0.44832431 0.6252049
##          lower     upper
## 469 -3.0861169 0.9941451
## 470  1.0504743 1.9432983
## 471 -1.8735751 0.4455361
## 472 -2.7470997 2.6050885
## 473 -0.1489027 0.9534057
## 474 -1.6737033 0.7770547
```

The estimates provided by this function are mean-0 (i.e., they come from a normal distribution with mean 0).
Thus to get the complete intercept and slope for each species, we need to add the "fixed" part of the intercept and slope (shown in the `Fixed effects` section of the summary, or with `coef()`) to the random part shown above.
We can do this with an argument:


``` r
rand_est <- randomTerms(fit, addMean = TRUE)
head(rand_est)
```

```
##   Model  Groups        Name               Level   Estimate       SE     lower
## 1   psi species (Intercept)        Little Grebe -5.6402732 1.088450 -7.773597
## 2   psi species (Intercept) Great Crested Grebe -5.3393185 1.226390 -7.742998
## 3   psi species (Intercept)          Grey Heron -4.4979213 1.147969 -6.747900
## 4   psi species (Intercept)      Little Bittern -6.6483243 1.452614 -9.495396
## 5   psi species (Intercept)         White Stork -4.0868639 1.422657 -6.875221
## 6   psi species (Intercept)           Mute Swan -0.1440654 2.533089 -5.108828
##       upper
## 1 -3.506950
## 2 -2.935639
## 3 -2.247943
## 4 -3.801252
## 5 -1.298507
## 6  4.820697
```

## Get estimates of occupancy

As usual with `unmarked`, we can use `predict`.
The `predict` function will return estimates of occupancy for each site and species by default.


``` r
pr <- predict(fit, type='state')
lapply(pr[1:3], head)
```

```
## $`Little Grebe`
##     Predicted          SE       lower      upper
## 1 0.062629441 0.027968118 0.025591420 0.14527977
## 2 0.062629441 0.027968118 0.025591420 0.14527977
## 3 0.006932087 0.006147293 0.001211306 0.03862620
## 4 0.010069015 0.007592241 0.002280547 0.04330204
## 5 0.004767739 0.004893379 0.000634300 0.03489633
## 6 0.043842917 0.018741413 0.018733213 0.09920657
## 
## $`Great Crested Grebe`
##     Predicted          SE        lower      upper
## 1 0.040469236 0.023611754 0.0126461928 0.12194572
## 2 0.040469236 0.023611754 0.0126461928 0.12194572
## 3 0.004956348 0.004995188 0.0006837188 0.03499409
## 4 0.007061007 0.006133672 0.0012785527 0.03800045
## 5 0.003476823 0.004020987 0.0003586577 0.03281439
## 6 0.028694428 0.015960275 0.0095246153 0.08320543
## 
## $`Grey Heron`
##     Predicted          SE        lower      upper
## 1 0.047915787 0.024412407 0.0173265386 0.12560618
## 2 0.047915787 0.024412407 0.0173265386 0.12560618
## 3 0.006356595 0.005769278 0.0010666779 0.03691114
## 4 0.008941182 0.006947514 0.0019367894 0.04025524
## 5 0.004515718 0.004725964 0.0005775214 0.03438506
## 6 0.034456683 0.016614842 0.0132319646 0.08673438
```

## Plot covariate relationships by species

We'll plot estimated occupancy by elevation for four species: Common Buzzard, Northern Raven, Eurasian Jay, and European Crested Tit.
We also need to get the corresponding body mass of each species.


``` r
sp <- c("Common Buzzard", "Northern Raven", "Eurasian Jay", "European Crested Tit")
body_mass <- MHB2014$species$body.mass
names(body_mass) <- MHB2014$species$engname
body_mass <- unname(body_mass[sp])
```

First create a sequence of elevation values.


``` r
elev_seq <- seq(min(sitecov$elev), max(sitecov$elev), length.out=100)
```

For each species we create a `newdata` data frame with elevation and the species name and matching body mass.
Then we use `predict`, and plot the results.


``` r
par(mfrow=c(2,2), mar=c(4,3,2,1))
for (i in 1:length(sp)){
  nd <- data.frame(elev=elev_seq,
                   species = sp[i],
                   body.mass = body_mass[i])
  pr <- predict(fit, 'state', newdata=nd)
  plot(elev_seq, pr$Predicted, type='l', xlab="Elevation", ylab="Occupancy",
       main = sp[i], ylim=c(0,1))
  polygon(c(elev_seq, rev(elev_seq)), c(pr$lower, rev(pr$upper)), col='grey90', border=NA)
  lines(elev_seq, pr$Predicted)
}
```

![Predicted occupancy by species](occuComm-figures/occuComm-predict-1.png)

``` r
par(mfrow=c(1,1))
```

## Richness

There's a built-in function for calculating richness at each site from the model.


``` r
rich <- richness(fit)
head(rich)
```

```
## [1] 33.49 34.93 52.52 42.07 45.55 43.46
```

To get the uncertainty of this estimate, we can return a complete posterior and calculate CIs:


``` r
post <- richness(fit, posterior=TRUE)
est <- apply(post@samples, 1, mean)
low <- apply(post@samples, 1, quantile, 0.025)
up <- apply(post@samples, 1, quantile, 0.975)
```

And compare them to naive richness:


``` r
rich_naive <- MHB2014$counts
rich_naive[rich_naive > 0] <- 1
rich_naive <- apply(rich_naive, c(1,3), 
                    function(x) if(all(is.na(x))) return(NA) else return(max(x, na.rm=TRUE))) 
rich_naive <- apply(rich_naive, 1, sum, na.rm=TRUE)

plot(rich_naive, est, xlab="Observed species", ylab="Estimated richness", 
     pch=19, col=rgb(0,0,0,alpha=0.2))
segments(rich_naive, low, rich_naive, up)
abline(a=0, b=1)
```

![Richness by site](occuComm-figures/occuComm-rich-1.png)

This figure should look similar to Figure 11.8 in the AHM 1 book (it's not quite the same model).

# Handling guilds or similar groups

In community occupancy models it is common to specify an additional hierarchical structure above species, e.g. guild.
In this version of the model, rather than the random intercepts and slopes coming from common distributions across *all* species, instead information is shared only among species belonging to the same guild.
The model structure is the same as above except the random effect hyperparameters are guild-specific where each species $s$ belongs to one guild $g$.

$$\psi_{is(g)} = \mathrm{logit}(\beta_{0,s(g)} + \beta_{i,s(g)} \cdot x_i) $$

$$ \beta_{0,s(g)} \sim \mathrm{Normal}(\mu_{\beta_{0,g}}, \sigma_{\beta_{0,g}}) $$

$$ \beta_{1,s(g)} \sim \mathrm{Normal}(\mu_{\beta_{1,g}}, \sigma_{\beta_{1,g}}) $$

For example, see the analysis in section 11.6.3 of AHM 1.

The `occuComm` function cannot directly handle this extra hierarchical layer as you can in say, JAGS or NIMBLE.
However, we can get pretty close.
Note that often when we fit a guild-based model like this, we are in effect fitting entirely separate models for each guild.
That is, since no information is being shared among guilds (only within them) we can simply separate the data by guild and fit separate models to each piece and get the same result.
We can then combine the results to get richness, if needed.
This `occuComm` is able to do.

## Dividing the data by guild

We'll divide the species into three "guilds" based on body mass.
This code is identical to the code in AHM 1, section 11.6.3.


``` r
# Look at distribution of body mass among 136 observed species
# Get mean species mass
mass <- MHB2014$species$body.mass 
hist(log10(mass), breaks = 40, col = "grey")
```

![Distribution of log10(body mass)](occuComm-figures/occuComm-hist-1.png)

``` r
# Look at log10
gmass <- as.numeric(log10(mass) %/% 1.3 + 1)
# Size groups 1, 2 and 3
gmass[gmass == 4] <- 3 # Mute swan is group 3, too
```

## Fit separate models by guild

Now we simply divide our complete dataset into pieces by guild and fit the same model to each dataset separately.


``` r
# Guild 1
spcovs1 <- lapply(spcov, function(x) x[gmass == 1])
umf1 <- unmarkedFrameOccuComm(y = umf@ylist[gmass == 1],
                              siteCovs = sitecov,
                              obsCovs = obscov,
                              speciesCovs = spcovs1)
fit1 <- update(fit, data = umf1)
```

```
## Warning: 45 sites have been discarded because of missing data.
```

``` r
# Guild 2
spcovs2 <- lapply(spcov, function(x) x[gmass == 2])
umf2 <- unmarkedFrameOccuComm(y = umf@ylist[gmass == 2],
                              siteCovs = sitecov,
                              obsCovs = obscov,
                              speciesCovs = spcovs2)
fit2 <- update(fit, data = umf2)
```

```
## Warning: 86 sites have been discarded because of missing data.
```

``` r
# Guild 3
spcovs3 <- lapply(spcov, function(x) x[gmass == 3])
umf3 <- unmarkedFrameOccuComm(y = umf@ylist[gmass == 3],
                              siteCovs = sitecov,
                              obsCovs = obscov,
                              speciesCovs = spcovs3)
fit3 <- update(fit, data = umf3)
```

```
## Warning: 27 sites have been discarded because of missing data.
```

## Calculate richness

To get richness across the entire community, we get posterior samples of richness for each guild and combine them. 


``` r
post1 <- richness(fit1, posterior=TRUE)
post2 <- richness(fit2, posterior=TRUE)
post3 <- richness(fit3, posterior=TRUE)

rich_all <- post1@samples + post2@samples + post3@samples
rich_all <- drop(rich_all)
```

The result is more or less identical to our previous richness estimate using a single model.


``` r
rich_mean <- apply(rich_all, 1, mean)
plot(rich, rich_mean, xlab="Combined model", ylab="Guilds model")
abline(a=0, b=1)
```

![Richness estimate comparison](occuComm-figures/occuComm-comp-1.png)