---
title: "Introduction to `broom.mixed`"
author: "Ben Bolker"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{introduction to broom.mixed}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE, message=FALSE}
library(knitr)
```

# Introduction

`broom.mixed` is a spinoff of the [broom package](https://CRAN.R-project.org/package=broom). The goal of `broom` is to bring the modeling process into a "tidy"(TM) workflow, in particular by providing standardized verbs that provide information on

- `tidy`: estimates, standard errors, confidence intervals, etc.
- `augment`: residuals, fitted values, influence measures, etc.
- `glance`: whole-model summaries: AIC, R-squared, etc.

`broom.mixed` aims to provide these methods for as many packages and model types in the R ecosystem as possible. These methods have been separated from those in the main `broom` package because there are issues that need to be dealt with for these models (e.g. different types of parameters: fixed, random-effects parameters, conditional modes/BLUPs of random effects, etc.) that are not especially relevant to the broader universe of models that `broom` deals with.

# Mixed-model-specific issues

## Terminology

- the upper-level parameters that describe the distribution of random variables (variance, covariance, precision, standard deviation, or correlation) are called *random-effect parameters* (`ran_pars` in the `effects` argument when tidying)
- the values that describe the deviation of the observations in a group level from the population-level effect (which could be posterior means or medians, conditional modes, or BLUPs depending on the model) are called *random-effect values* (`ran_vals` in the `effects` argument when tidying)
- the parameters that describe the population-level effects of (categorical and continuous) predictor variables are called *fixed effects* (`fixed` in the `effects` argument when tidying)
- the categorical variable (factor) that identifies which group or cluster an observation belongs to is called a *grouping variable* (`group` column in `tidy()` output)
- the particular level of a factor that specifies which level of the grouping variable an observation belongs to is called a *group level* (`level` column in `tidy()` output)
- the categorical or continuous predictor variables that control the expected value (i.e., enter into the linear predictor for some part of the model) are called *terms* (`term` column in `tidy()` output); note that unlike in base `broom`, **the term column may have duplicated values**, because the same term may enter multiple model components (e.g. zero-inflated and conditional models; models for more than one response; fixed effects and random effects)

## Time-consuming computations

Some kinds of computations needed for mixed model summaries are computationally expensive, e.g. likelihood profiles or parametric bootstrapping. In this case `broom.mixed` may offer an option for passing a pre-computed object to `tidy()`, eg. the `profile` argument in the `tidy.merMod` (`lmer`/`glmer`) method.


# Related packages

There are many, many things one might want to do with a fitted model, and `broom.mixed` can only do a few of them.

- `emmeans`
- `multcomp`
- `car`
- `afex`
- `sjStats`/`sjPlots`
- `rockchalk`

## huxtable + broom.mixed

## dotwhisker + broom.mixed

`dotwhisker` is a convenient platform for creating dot-whisker plots - either directly from models or lists of models (`tidy()` methods are automatically called to convert the models to a tidy format), or from the (possibly post-processed) output of a `tidy()` call. There are a couple of caveats and issues to be aware of when using `dotwhisker` in conjunction with `broom.mixed`, however.

1. For fixed effects, the `group` value is set to `NA`: in the current CRAN version (0.5.0), an unfortunate `na.omit()` within the `dwplot` code will eliminate all of the fixed effects unless you drop this column before passing the results to `dwplot` (this [has been fixed](https://github.com/fsolt/dotwhisker/commit/e014e8dba95181dfcf5a68964cd7fdeb844e97cd) in the current GitHub version, which you can install with `devtools::install_github("fsolt/dotwhisker")`).
2. In `broom.mixed` output, it is fairly common for a single tidied model to have duplicated entries in the `term` column (e.g. effects that appear in both the conditional and the zero-inflated model, or intercept standard deviations for several different grouping variables). `dotwhisker::dwplot` takes this as evidence that it has been handed a tidied object containing the results from several different models, and asks for a `model` column that will distinguish the non-unique terms. There are (at least) two strategies you can take:
     - `dwplot(list(fitted_model))` will plot all of the non-unique values together
     - `tidy(fitted_model) %>% tidyr::unite(term, group, term)` will create a new `term` column that's the combination of the `group` and `term` columns (which will disambiguate random-effect terms from different grouping variables); `unite(term, component, term)` will disambiguate conditional and zero-inflation parameters. The code below shows a slightly more complicated (but prettier) approach. (Some sort of `disambiguate_terms()` function could be added in a future version of the package ...)

```{r dwplot1,message=FALSE,warning=FALSE}
library(dplyr)
library(tidyr)
require(rstan) ## workaround for r-devel problem
library(broom.mixed)
if (require("brms") && require("dotwhisker") && require("ggplot2")) {
    L <- load(system.file("extdata", "brms_example.rda", package="broom.mixed"))
    gg0 <- (tidy(brms_crossedRE)
        ## disambiguate
        %>% mutate(term=ifelse(grepl("sd__(Int",term,fixed=TRUE),
                               paste(group,term,sep="."),
                               term))
        %>% dwplot
    )
    gg0 + geom_vline(xintercept=0,lty=2)
}
```

# Capabilities

Automatically retrieve table of available methods:

```{r get_methods}
get_methods()
```

Manually compiled list of capabilities (possibly out of date):

```{r capabilities, results="asis",echo=FALSE, message=FALSE}
cc <- read.csv(system.file("capabilities.csv", package="broom.mixed"))
if (require("pander")) {
    pander::pander(cc,split.tables=Inf)
}
```