%\VignetteIndexEntry{mediation}

\documentclass[nojss]{jss}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% almost as usual
\author{Dustin Tingley\\Harvard \And
  \hspace{.25in}Teppei Yamamoto\\\hspace{.25in}MIT \And
  \hspace{.5in}Kentaro Hirose\\\hspace{.5in}Princeton \And 
  \hspace{.25in}Luke Keele\\\hspace{.25in}Penn State \And
  Kosuke Imai\\Princeton        
}
\title{\pkg{mediation}: \proglang{R} Package for Causal Mediation Analysis}

%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Dustin Tingley, Teppei Yamamoto, Kentaro Hirose, Luke
  Keele, Kosuke Imai} %% comma-separated
\Plaintitle{mediation: R Package for Causal Mediation Analysis} %% without formatting
\Shorttitle{Causal Mediation Analysis}  %% a short title (if necessary)

%% an abstract and keywords
\Abstract{

  In this paper, we describe the \proglang{R} package \pkg{mediation}
  for conducting causal mediation analysis in applied empirical
  research.  In many scientific disciplines, the goal of researchers
  is not only estimating causal effects of a treatment but also
  understanding the process in which the treatment causally affects
  the outcome.  Causal mediation analysis is frequently used to assess
  potential causal mechanisms.  The \pkg{mediation} package implements
  a comprehensive suite of statistical tools for conducting such an
  analysis.  The package is organized into two distinct approaches.
  Using the model-based approach, researchers can estimate causal
  mediation effects and conduct sensitivity analysis under the
  standard research design.  Furthermore, the design-based approach
  provides several analysis tools that are applicable under different
  experimental designs. This approach requires weaker assumptions
  than the model-based approach. We also implement a
  statistical method for dealing with multiple (causally dependent)
  mediators, which are often encountered in practice. Finally, the package
  also offers a methodology for assessing causal mediation in the presence
  of treatment noncompliance, a common problem in randomized trials.

}
\Keywords{causal mechanisms, mediation analysis, \pkg{mediation}, \proglang{R}}
\Plainkeywords{causal mechanisms, mediation analysis, mediation, R} %% without formatting
%% at least one keyword must be supplied

%% publication information
%% NOTE: Typically, this can be left commented and will be filled out by the technical editor
\Volume{59}
\Issue{5}
\Month{August}
\Year{2014}
\Submitdate{2012-06-04}
\Acceptdate{2014-04-17}

%% The address of (at least) one author should be given
%% in the following format:
\Address{
  Dustin Tingley\\
  Department of Government\\
  Harvard University\\
  1737 Cambridge St\\
  Cambridge, MA, United States of America\\
  E-mail: \email{dtingley@gov.harvard.edu}\\
  URL: \url{http://scholar.harvard.edu/dtingley}\\

  Teppei Yamamoto\\
  Department of Political Science\\
  Massachusetts Institute of Technology\\
  77 Massachusetts Avenue, E53-463\\
  Cambridge, MA, United States of America\\
  E-mail: \email{teppei@mit.edu}\\
  URL: \url{http://web.mit.edu/teppei/www/}\\

  Kentaro Hirose, Kosuke Imai\\
  Department of Politics \\
  Princeton University\\
  Princeton, NJ, United States of America\\
  E-mail: \email{hirose@princeton.edu}, \email{kimai@princeton.edu} \\
  URL: \url{http://imai.princeton.edu.}\\

  Luke Keele\\
  Department of Political Science\\
  Pennsylvania State University\\
  211 Pond Lab\\
  University Park, PA, United States of America\\
  Email: \email{ljk20@psu.edu}\\
  URL: \url{http://www.personal.psu.edu/ljk20}
}

%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/1/31336-5053
%% Fax: +43/1/31336-734

%% for those who use Sweave please include the following line (with % symbols):
%% need no \usepackage{Sweave.sty}

%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% == BibTeX packages
\usepackage{natbib}

% == Other Packages
\usepackage{amsmath, amsfonts, amssymb, url, bm}
\usepackage{rotating}
\usepackage{latexsym}
\usepackage{graphicx}
\usepackage{ulem}

% == New Commands
% = For general typesetting
\newcommand\spacingset[1]{\renewcommand{\baselinestretch}{#1}\small\normalsize}
\renewcommand\r{\right}
\renewcommand\l{\left}

% = For bibtex
\def\citepos#1{\citeauthor{#1}'s (\citeyear{#1})}
\newcommand\citea{\citeauthor}

% = For general math
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}{Proposition}
\newtheorem{assumption}{Assumption}
\newcommand{\qed}{\hfill \ensuremath{\Box}}
\DeclareMathOperator*\argmax{argmax}
\DeclareMathOperator*\argmin{argmin}

% = For stats & econometrics
\renewcommand{\epsilon}{\varepsilon}
\newcommand\ud{\mathrm{d}}
\newcommand\dist{\buildrel\rm d\over\sim}
\newcommand\ind{\stackrel{\rm indep.}{\sim}}
\newcommand\iid{\stackrel{\rm i.i.d.}{\sim}}
\newcommand\logit{{\rm logit}}
\newcommand\cA{\mathcal{A}}
\newcommand\cN{\mathcal{N}}
\renewcommand\E{\mathbb{E}}
\newcommand\V{\mathbb{V}}
\newcommand\Cov{\textrm{Cov}}
\newcommand\Corr{\textrm{Corr}}
\newcommand\cJ{\mathcal{J}}
\newcommand\cT{\mathcal{T}}
\newcommand\y{{\bm y}}
\newcommand\X{{\bm X}}
\newcommand\x{{\bm x}}
\newcommand\eps{{\bm\varepsilon}}
\newcommand\zero{{\bm 0}}
\newcommand\be{{\bm\beta}}
\renewcommand\b{{\bm b}}
\newcommand\C{{\bm C}}
\newcommand\D{{\bm D}}
\newcommand\I{{\bm I}}
\newcommand\Z{{\bm Z}}
\newcommand\eeta{{\bm \eta}}
\DeclareMathOperator*\plim{plim}
\DeclareMathOperator\rank{rank}
\newcommand{\indep}{\mbox{$\perp\!\!\!\perp$}}
\DeclareMathOperator{\sgn}{sgn}
\def\independenT#1#2{\mathrel{\rlap{$#1#2$}\mkern2mu{#1#2}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% == document begins here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

%% include your article here, just as usual
%% Note that you should use the \pkg{}, \proglang{} and \code{} commands.

\SweaveOpts{keep.source=TRUE, echo = TRUE, results = verbatim, eval = TRUE}
%\SweaveOpts{keep.source=TRUE, echo = TRUE, results = hide, eval = FALSE}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

Scholars across a wide range of disciplines are increasingly interested in
identifying causal mechanisms, going beyond the estimation of causal
effects. Once they ascertain that certain variables causally affect
the outcome, the next natural step is to understand how these
variables exert their influence.  The standard procedure for analyzing
causal mechanisms in applied research is called {\it mediation
  analysis}, where a set of linear regression models are fitted and
then the estimates of ``mediation effects'' are computed from the
fitted models
\citep[e.g.,][]{haav:43,baro:kenn:86,shad:cook:camp:01,MacKinnon:2008}. In
recent years, however, causal mechanisms have been studied within the
modern framework of causal inference with an emphasis on the
assumptions required for identification. This approach has highlighted
limitations of earlier methods and pointed the way towards a more
flexible estimation strategy.  In addition, new research designs have
been proposed for identifying causal mechanisms.

In this paper, we introduce a full featured \proglang{R} package,
\pkg{mediation} \citep{mediation}, for studying causal mechanisms. The
\pkg{mediation} package allows users to (1) investigate the role of
causal mechanisms using different types of data and statistical
models, (2) explore how results change as identification assumptions
are relaxed, and (3) calculate quantities of interest under
alternative research designs. We focus on the demonstration of the
functionalities available through the \pkg{mediation} package.  The
statistical theory that underlies the procedures implemented in the
\pkg{mediation} package is presented elsewhere along with various
empirical examples
\citep{imai:keel:yama:10,ImaiAPSR,imai:keel:ting:10,ImaiDesign,yama:13}.

The \pkg{mediation} package is freely available for download via the
Comprehensive \proglang{R} Archive Network (CRAN) at
\url{http://CRAN.R-project.org/package=mediation} and runs on a
variety of computing platforms \citep{CRAN:2012}.  In addition, a
\proglang{Stata} \citep{Stata13} version of the package is available
but has a more limited functionality \citep{MediateStata}.  The first
version of the \pkg{mediation} package appeared at CRAN in 2009, and
\citet{imai:etal:10} discuss an earlier version of the package.  Since
then, however, we have dramatically improved the package with a
significant number of new functionalities and improvements. The
current paper thus provides an up-to-date description of the analyses
that can be conducted via the \pkg{mediation} package.  To install the
\pkg{mediation} package, use the following standard syntax for
installing an \proglang{R} package,
<<eval=TRUE,echo=FALSE,results=hide,keep.source=TRUE>>=
options(prompt = "R> ")
@ 
<<eval=FALSE>>=
install.packages("mediation")
@
where users may be prompted to select a CRAN mirror from
which the package will be downloaded.  This step needs to be done only
once (unless one wishes to update the \pkg{mediation} package to the
new version).

In the next section, we present an overview of the \pkg{mediation}
package.  We then describe the functionalities of the package for the
model-based causal mediation analysis (Section~\ref{sec:mediate}),
multilevel mediation analysis (Section~\ref{sec:multilevel}),
the design-based causal mediation analysis (Section~\ref{sec:design}),
the analysis of causally dependent multiple mediators
(Section~\ref{sec:multimech}), and causal mediation analysis with
treatment noncompliance (Section~\ref{sec:ivmediate}). Finally,
Section~\ref{sec:conclude} concludes.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section[Overview of the mediation package]{Overview of the
  \pkg{mediation} package}

The \pkg{mediation} package consists of several main functions as well
as various methods for summarizing output from these functions (e.g.,
\code{plot} and \code{summary}). The package requires little
programming knowledge on the user's side. Figure~\ref{fg:structure}
illustrates the core structure of the \pkg{mediation} package, which
distinguishes between model-based and design-based inference.
Model-based inference has been standard practice in the mediation
analysis to date. In the experimental setting, the treatment variable
is randomized and the mediating and outcome variables are observed
without any intervention by researchers. \citet{imai:keel:ting:10}
show that a range of parametric and semi-parametric models may then be
used to estimate the average causal mediation effect, defined below,
and other quantities of interest. This modeling approach relies on the
sequential ignorability assumption for point identification, which as
\citet{imai:keel:ting:10} show, provides a general purpose algorithm
for estimating quantities of interest. In contrast, design-based
inference primarily employs the features of the experimental design
and does not require the sequential ignorability assumption. The
formal identification properties of these designs are studied by
\citet{ImaiDesign} and the examples from experimental and
observational studies are contained in \citet{ImaiAPSR,ImaiDesign}.
We refer readers to these papers for the details about the statistical
methods implemented via the \pkg{mediation} package.

\begin{figure}[t!]
\centering
  \includegraphics[width=\textwidth]{mediation-chart-new.jpg}
\caption{Core structure of the \pkg{mediation} package as of version 4.0.}
\label{fg:structure}
\end{figure}

Before describing the functions available in \pkg{mediation}, we
briefly define the quantities of interest that our software is
designed to estimate.  Here, we use the potential outcomes framework to
define these quantities. Let $M_i(t)$ denote the potential value of a
mediator of interest for unit $i$ under the treatment status $T_i =
t$.  Let $Y_i(t,m)$ denote the potential outcome that would result if
the treatment and mediating variables equal $t$ and $m$, respectively.
Consider a standard experimental design where only the treatment
variable is randomized.  We observe only one of the potential
outcomes, and the observed outcome, $Y_i$, equals $Y_i(T_i, M_i(T_i))$
where $M_i(T_i)$ represents the observed value of the mediator $M_i$.
With this notation, the total unit treatment effect can be written as,
\begin{eqnarray}
  \tau_i & \equiv & Y_i(1,M_i(1))-Y_i(0,M_i(0)).
\end{eqnarray}
We can decompose this total effect into the two components.  First,
the {\it causal mediation effects} are represented by
\citep{robi:gree:92,pear:01},
\begin{eqnarray}
  \delta_i(t) & \equiv & Y_i(t, M_i(1)) - Y_i(t, M_i(0)), \label{eq:deltai}
\end{eqnarray}
for each treatment status $t=0,1$. All other causal mechanisms can be
represented by the {\it direct effects} of the treatment as,
\begin{eqnarray}
  \zeta_i(t) & \equiv & Y_i(1, M_i(t)) - Y_i(0,
  M_i(t)), \label{eq:zetai}
\end{eqnarray}
for each unit $i$ and each treatment status $t=0,1$.  Together, we see
that they sum up to the total effect,
\begin{eqnarray}
  \tau_i & = & \delta_i(t) + \zeta_i(1-t)
\end{eqnarray}
for $t=0,1$.  The case of multiple candidate mediating variables
requires additional notation and is
discussed in Section~\ref{sec:multimech}.  The {\it average causal
  mediation effects} (ACME) $\bar\delta(t)$ and the average direct
effects (ADE) $\bar\zeta(t)$, represent the population averages of
these causal mediation and direct effects.

Identification of the ACME requires an additional assumption beyond
the strong ignorability of the treatment, which is sufficient for
identifying the average total effect of the treatment.  Let $X_i$ be
a vector of the observed pre-treatment confounders for unit $i$. The
key identifying assumption is called sequential ignorability and can
be written as,
\begin{assumption}[Sequential Ignorability;
  \citealt{imai:keel:yama:10}] \label{asm:ignorable} \spacingset{1.25}
\begin{eqnarray}
 \{Y_i(t^\prime,m), M_i(t)\} & \indep & T_i  \mid X_i = x, \label{eq:YMindepTgivenX}\\
 Y_i(t^\prime,m) & \indep & M_i(t) \mid T_i = t, X_i = x, \label{eq:YindepMgivenTX}
\end{eqnarray}
where $0<\Prob(T_i = t \mid X_i = x)$ and $0 < p(M_i = m \mid T_i =
t, X_i = x)$ for $t = 0,1$, and all $x$ and $m$ in the support of
$X_i$ and $M_i$, respectively.
\end{assumption}
Equation~\ref{eq:YMindepTgivenX} is the standard strong ignorability
of the treatment assignment and is satisfied, for example, if the
treatment is randomized (possibly conditional on $X_i$).  However,
Equation~\ref{eq:YindepMgivenTX} requires that the mediator is also
ignorable given the observed treatment and pre-treatment confounders.
This additional assumption is quite strong because it excludes the
existence of (measured or unmeasured) post-treatment confounders as
well as that of unmeasured pre-treatment confounders.  This
assumption, therefore, rules out the possibility of multiple
mediators that are causally related to each other (see
Section~\ref{sec:multimech} for the method that is designed to deal
with such a scenario).

\section{Model-based causal mediation analysis}
\label{sec:mediate}

In this section, we discuss the functionalities of the
\pkg{mediation} package for model-based causal mediation analysis
under the assumption of sequential ignorability.  Many of these
functionalities are described in detail in \citet{imai:etal:10}, but
the current version of the package accommodates a larger class of
statistical models.

The model-based causal mediation analysis proceeds in two steps.
First, the researcher specifies two statistical models, the mediator
model for the conditional distribution of the mediator $M_i$ given the
treatment $T_i$ and a set of the observed pre-treatment covariates
$X_i$ and the outcome model for the conditional distribution of the
outcome $Y_i$ given $T_i$, $M_i$, and $X_i$.  These models are fitted
separately and then their fitted objects comprise the main inputs to
the \code{mediate} function, which computes the estimated ACME and
other quantities of interest under
these models and the sequential ignorability assumption. Since the
sequential ignorability assumption is untestable, we recommend that
the researchers conduct a sensitivity analysis via the \code{medsens}
function, which can be applied to certain statistical models.  We now
illustrate these functionalities with an empirical example.
\pagebreak
\subsection{Estimation of the average causal mediation effects}
\label{sec:mediateest}

\begin{table}[t]
 \spacingset{1}
  \begin{center}
    \setlength{\tabcolsep}{4pt}
\begin{tabular}{lccccccc}
\hline
 & \multicolumn{7}{c}{\it Outcome model types} \\
\cline{2-8}
{\it Mediator model types} & Linear & GLM & Ordered & Censored & Quantile & GAM & Survival \\
\hline
Linear (\code{lm}/\code{lmer}) & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark  \\
GLM (\code{glm}/\code{bayesglm}/  & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark  \\
\phantom{GLM (}\code{glmer})\\
Ordered (\code{polr}/\code{bayespolr}) & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark  \\
Censored (tobit via \code{vglm}) & -- & -- & -- & -- & -- & -- & -- \\
Quantile (\code{rq}) & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark \\
GAM (\code{gam})   & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ & \checkmark$^\ast$ \\
Survival (\code{survreg}) & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark & \checkmark & \checkmark$^\ast$ & \checkmark  \\
\hline
\end{tabular}
\caption{Types of statistical models that can be used with the
  \code{mediate} function.  Asterisks, $^\ast$, indicate the model
  combinations that can only be estimated using the nonparametric
  bootstrap (i.e., with the argument \code{boot = TRUE} for the
  \code{mediate} function).}
  \label{tab:MediateOptions}
  \end{center}
\end{table}


The \code{mediate} function takes various standard model objects (such
as obtained with \code{lm} and \code{glm}), which correspond to
mediator and outcome models, and returns the estimates of the average
causal mediation effects along with other causal quantities of
interest.  The output of the \code{mediate} function can be passed to
the \code{plot} and \code{summary} functions in order to obtain
graphical and numerical summaries, respectively. The \code{mediate}
function automatically detects the type of models used for the
mediator and outcome models and calculates the estimates of the ACME
and other quantities of interest via the general algorithms described
in \citet{imai:keel:ting:10}.  Our estimation strategy overcomes the
limitation of the standard methods based on the product or difference
of coefficients, which are only appropriate for the analysis of causal
mediation effects when both the mediator and outcome models are linear
regressions where $T_i$ and $M_i$ enter the models additively (e.g.,
without interaction). In contrast, the algorithms used in the
\pkg{mediation} package nest this as a special case and accommodate a
greater range of statistical models as shown in
Table~\ref{tab:MediateOptions}.

We now illustrate the use of the \code{mediate} function with an
empirical example based on the \code{framing} data of
\citet{Brader:2008}.  This data set is a part of the \pkg{mediation}
package and can be loaded via the following syntax,
<<>>=
library("mediation")
set.seed(2014)
data("framing", package = "mediation")
@
A brief description of each variable in the data can be obtained
through a help file,
<<eval=FALSE>>=
?framing
@
\citet{Brader:2008} conducted a randomized experiment where subjects
are exposed to different media stories about immigration and the
authors investigated how their framing influences attitudes and
political behavior regarding immigration policy. They posit anxiety as
the mediating variable for the causal effect of framing on public
opinion.  We first fit the mediator model where the measure of anxiety
(\code{emo}) is modeled as a function of the framing treatment
(\code{treat}) and pre-treatment covariates (\code{age}, \code{educ},
\code{gender}, and \code{income}).  Next, we model the outcome
variable, which is a binary variable indicating whether or not the
participant agreed to send a letter about immigration policy to his or
her member of Congress (\code{cong_mesg}).  The explanatory variables
of the outcome model include the mediator, treatment status, and the
same set of pre-treatment variables as those used in the mediator
model.\footnote{Using different sets of pre-treatment covariates for
  the mediator and outcome models may be justified depending on the
  causal relationships assumed for those covariates. See
  \citet{Pearl:2013} and \citet{imai:etal:2013}.}  In this example,
the treatment is expected to increase the level of respondents'
emotional response, which in turn is hypothesized to make subjects
more likely to send a letter to his or her member of Congress.  We use
the linear regression fit with least squares and the probit regression
for the mediator and outcome models, respectively.
<<>>=
med.fit <- lm(emo ~ treat + age + educ + gender + income, data = framing)
out.fit <- glm(cong_mesg ~ emo + treat + age + educ + gender + income,
               data = framing, family = binomial("probit"))
@

We now use the \code{mediate} function to estimate the ACME and ADE. As
the inputs to this function, we must specify the model fits (in this
case \code{med.fit} and \code{out.fit}) as well as the names of the
treatment and mediating variables, which are represented as the
arguments \code{treat} and \code{mediator}, respectively.  Here and
throughout the rest of this paper, we use a small
number % Vignette-version
of simulations (\code{sims = 100}) for the purpose of illustration
to % Vignette-version
calculate the uncertainty estimates, but one may wish to use the
default % Vignette-version
(1000) or even larger number if the estimates % Vignette-version
vary too much from one simulation to another.  % Vignette-version
The default simulation type is the quasi-Bayesian Monte Carlo method
based on normal approximation \citep{imai:keel:ting:10}. We use
White's heteroskedasticity-consistent estimator for the covariance
matrix from the \pkg{sandwich} package 
\citep[\code{vcovHC};][]{Zeileis:2006} by setting the \code{robustSE} argument to
\code{TRUE}. This argument can be omitted if standard uncertainty
estimates are desired. Finally, like most functions in \proglang{R},
the results of the \code{mediate} function can be summarized
numerically by the \code{summary} function, which calculates point
estimates, confidence intervals, and the $p$-values, for the average
direct, indirect, and total effects.\footnote{Note that the results
  will be slightly different in each run of \code{mediate} because of
  Monte Carlo errors, especially when \code{sims} is set to a small
  number. If exact reproduction of results is desired, users can set a
  specific randomness seed (\code{set.seed}) before calling the
  \code{mediate} function.}  The syntax is now given as,
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.out <- mediate(med.fit, out.fit, treat = "treat", mediator = "emo",
                   robustSE = TRUE, sims = 100)
summary(med.out)
@

One new feature in the tabular output from the \code{mediate}
functions is the addition of $p$-values for the various estimates.  In
this example, the estimated ACMEs are statistically significantly
different from zero but the estimated average direct and total effects
are not. The results suggest that the treatment in the framing
experiment may have increased emotional response, which in turn made
subjects more likely to send a message to his or her member of
Congress. Here, since the outcome is binary all estimated effects are
expressed as the increase in probability that the subject sent a
message to his or her Congress person.

In addition, we can use the nonparametric bootstrap rather than the quasi-Bayesian
Monte Carlo simulation for variance estimation via the \code{boot =
  TRUE} argument,
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.out <- mediate(med.fit, out.fit, boot = TRUE, treat = "treat",
                   mediator = "emo", sims = 100)
summary(med.out)
@
The output now indicates that the bootstrap is used for inferences.
As expected, the results are largely the same.  In general, as long as
computing power is not an issue, analysts should estimate confidence
intervals via the bootstrap with more than 1000 resamples, which is
the default number of simulations.

Two types of methods for calculating bootstrap-based confidence intervals
are available via the \code{boot.ci.type} argument. The basic percentile
intervals are calculated by default or setting the argument to \code{"perc"}.
The bias-corrected and accelerated (BCa) intervals are computed if the argument
is set to \code{"bca"} \citep[see][for the definition of the method]{dici:efro:96}.
The latter has better asymptotic properties and is often recommended for the estimation of mediation effects \citep{prea:haye:08}.

As an alternative to the numerical summary, using the output from the
\code{mediate} function as the input to the \code{plot} command
provides a graphical summary of the three parameters (indirect,
direct, and total effects) along with their confidence intervals.
Figure~\ref{fig:mediate} shows the result of plotting the \code{med.out}
object.\footnote{Users may make further modifications to the plot via
standard \code{plot} options, including changes to the labels.}

\begin{figure}[t!]
\begin{center}
<<fig=TRUE,echo=FALSE>>=
plot(med.out)
@
\caption{Graphical display of results from the \code{mediate} function.}
\label{fig:mediate}
\end{center}
\end{figure}

\subsubsection{Treatment and mediator interaction}
It is possible that the ACME takes different values depending on the
baseline treatment status.  In such a situation, the researcher can
add an interaction term between the treatment and mediator to the
outcome model.  Then, the \code{mediate} function automatically
detects the change in the specification and explicitly estimates the
ACME conditional on treatment status.\footnote{When the outcome model
  is nonlinear, the ACME and direct effect estimates will differ
  between the treatment and control conditions even when the model
  does not include an interaction term.  The \code{summary} output in
  such cases includes average values of these two estimates to ease
  interpretation of the results.}  In the output given below, the
estimated ACME now varies with treatment status.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.fit <- lm(emo ~ treat + age + educ + gender + income, data=framing)
out.fit <- glm(cong_mesg ~ emo * treat + age + educ + gender + income,
               data = framing, family = binomial("probit"))
med.out <- mediate(med.fit, out.fit, treat = "treat", mediator = "emo",
                   robustSE = TRUE, sims = 100)
summary(med.out)
@
The statistical significance of the treatment-mediator interaction can be
tested via the \code{test.TMint} function in the following manner.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
test.TMint(med.out, conf.level = .95)
@

The \code{mediate} function's output contains a range of additional
quantities that users might find helpful. Each is stored as part of
the model's output. This includes vectors of the simulation output for
all quantities of interests (see \code{?mediate} for details), which
can be used for a variety of tasks, such as more intensive plotting.


\subsubsection{Missing data}

Our simulation-based approach to the estimation of mediation effects
enables users to deal with missing data via standard multiple
imputation procedures in a straightforward fashion. The
\pkg{mediation} package includes a pair of utility functions --
\code{mediations} and \code{amelidiate} -- to facilitate such
analysis. First, users simulate multiple data sets using their
preferred imputation software. Next, run \code{mediate} on each data
set by simply passing the data sets through \code{mediations}. Next,
pass the output of \code{mediations} to the \code{amelidiate}
function, which combines the components of the output from
\code{mediations} into a format that can be analyzed with the standard
\code{summary} and \code{plot} commands.\footnote{Note that
  \code{amelidiate} does not support some models and features yet; see
  \code{?amelidiate} for details.} Alternatively, users can manually
run \code{mediate} on their imputed data sets and simply stack the
vectors of quantities they are interested in, and use basic functions
like \code{quantile} to calculate confidence intervals.

\subsection{Moderated mediation}
\label{sec:modmed}

One new important feature of the \code{mediate} function is the
ability to study moderated mediation.  Often analysts hypothesize that
the magnitude of the ACME depends on (or is moderated by) a
pre-treatment covariate.  Such a pre-treatment covariate is called a
moderator. In the framing example, the ACME may be much stronger among
older respondents than younger ones. In other words, the ACME may be
moderated by age.

There are two alternative routes to the analysis of moderated
mediation with the \pkg{mediation} package. The first method involves
alteration of both the statistical models as well as the syntax for
the \code{mediate} function.  First, the mediator and outcome models
should contain the moderator and its interaction terms with respect to
the treatment and mediating variables that are theoretically
justified.  For example, we may modify the previous models as follows,
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.fit <- lm(emo ~ treat * age + educ + gender + income, data=framing)
out.fit <- glm(cong_mesg ~ emo + treat * age + emo * age + educ + gender
               + income, data = framing, family = binomial("probit"))
@

Once the two models are fitted, the researcher must specify the levels
of the moderator at which effects will be calculated by the
\code{mediate} function.\footnote{If the models include
  moderator-treatment interactions and yet this option is not
  specified, then the resulting ACME and direct effect estimates will
  simply be averages over the empirical distribution of the
  covariates.}  In the current example, this can be done by setting
the \code{age} covariate to a specific value. To allow the mediation
effects to be moderated by age, we set the value of \code{age} to be
20 in one model and 60 in another model. More complicated moderated
mediation involving multiple moderators could be specified by
expanding the list of the covariates.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.age20 <- mediate(med.fit, out.fit, treat = "treat",
                     mediator = "emo", covariates = list(age = 20), sims = 100)
med.age60 <- mediate(med.fit, out.fit, treat = "treat",
                     mediator = "emo", covariates = list(age = 60), sims = 100)
summary(med.age20)
summary(med.age60)
@
Thus, the researcher receives two different sets of output. In the
first output, the average mediation effect is estimated for those who
are 20 years old.  In contrast, the second output applies to those who
are 60 years old.

The second approach to moderated mediation consists of directly testing the
statistical significance of the difference in the ACME and ADE between two chosen
levels of pre-treatment covariates. This analysis is conducted via the
\code{test.modmed} function. For example, the following syntax can be
used to test whether the ACME and ADE significantly differ between the subjects
who are 20 years old and those who are 60 years old.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.init <- mediate(med.fit, out.fit, treat = "treat", mediator = "emo", sims=2)
test.modmed(med.init, covariates.1 = list(age = 20),
                      covariates.2 = list(age = 60), sims = 100)
@
Unlike the first approach, the initial \code{mediate} fit does not need the
\code{covariates} argument set to specific values; the choice of covariate
levels is made directly in the call to the \code{test.modmed} function. Note
that the initial \code{mediate} call does not require a large number of
simulation draws, for the actual calculation of uncertainty for the test happens
within the \code{test.modmed} function.

\subsection{Non-binary treatment variables}
\label{sec:cattreat}

Experimental manipulations are often complex and go beyond simple
treatment and control conditions. In the framing experiment, for
example, the researchers actually used a $2 \times 2$ factorial
design.  That is, each subject was exposed to two different binary
treatments, yielding four different experimental manipulations.  In
the analysis presented above, we have focused on a comparison of one
of these conditions relative to the other three conditions.  The
\code{mediate} function, however, has the capability to handle more
complex experimental contrasts, which can be represented by a
non-binary treatment variable.

Here, instead of using the binary \code{treat} variable, we use a
variable named \code{cond}, which records which of the four conditions
the subject was randomly exposed to.  Using the \code{control.value}
and \code{treat.value} options, the user can calculate the specific
contrast of interest.  For example, the comparison between the second
and third conditions can be done with the following code.
<<keep.source=TRUE, echo = TRUE, results = hide>>=
med.fit <- lm(emo ~ cond + age + educ + gender + income, data = framing)
out.fit <- glm(cong_mesg ~ emo + cond + age + educ + gender + income,
               data = framing, family = binomial("probit"))
med23.out <- mediate(med.fit, out.fit, treat = "cond", mediator = "emo",
                     control.value = 2, treat.value = 3, sims = 100)
summary(med23.out)
@

Similarly, the researcher can compare the first and fourth
experimental conditions via the following syntax,
<<keep.source=TRUE, echo = TRUE, results = hide>>=
med14.out <- mediate(med.fit, out.fit, treat = "cond", mediator = "emo",
                     control.value = 1, treat.value = 4, sims = 100)
summary(med14.out)
@

Nothing changes in the format of the output, but the contrasts differ
depending on the categories chosen for comparison by the
researcher. In the case of a continuous treatment variable, the
researcher would specify two values of the treatment to make the
contrast \citep{imai:keel:ting:10}. For example, the causal mediation
effects can be defined for any two levels of the treatment,
\begin{eqnarray}
\delta_i(t; t_1, t_0) & \equiv & Y_i(t, M_i(t_1)) - Y_i(t,M_i(t_0)),
\end{eqnarray}
where $t_1 \ne t_0$.  The corresponding average causal mediation
effect is defined as $\bar\delta(t; t_1,t_0)\equiv\E(\delta_i(t;
t_1,t_0))$. Thus, the researcher can set \code{control.value} to $t_0$
and \code{treat.value} to $t_1$.  The researcher may also vary the
value of $t_1$, while fixing the base line value of $t_0$, to examine
how the ACME changes as the function of $t_1$.



\subsection{Sensitivity analysis for sequential ignorability}
\label{subsec:medsens}


\begin{table}[t!]
 \spacingset{1}
  \centering
\begin{tabular}{lcc}
\hline
                     &\multicolumn{2}{c}{\it Outcome model types} \\
\cline{2-3}
{\it Mediator model types} & Linear & Binary probit \\
\hline
Linear                     & \checkmark & \checkmark \\
Binary probit              & \checkmark & -- \\
\hline
\end{tabular}
\caption{The types of models that can be handled by \code{medsens} for
  sensitivity analysis. } \label{tab:SensitivityOptions}
\end{table}

Sequential ignorability is a strong assumption, and therefore a
sensitivity analysis is recommended.  The \pkg{mediation} package
allows the researcher to conduct a sensitivity analysis for the
possible existence of unobserved pre-treatment covariates.
Specifically, the output of the \code{mediate} function can be passed
to the \code{medsens} function, which then computes the values of
causal quantities as a function of sensitivity parameters. Both
\code{summary} and \code{plot} functions are available for sensitivity
analysis, and they display the results in a tabular and graphical
form, respectively. Since derivation of sensitivity formulas must be
done on a case-by-case basis, the range of options for conducting
sensitivity analyses is somewhat limited.
Table~\ref{tab:SensitivityOptions} gives the model combinations
currently supported by the \code{medsens} function.

In our running example, after computing the ACME, we conduct a
sensitivity analysis by passing the object from \code{mediate} to the
\code{medsens} function.  We first choose as the sensitivity parameter
the correlation $\rho$ between the residuals of the mediator and
outcome regressions \citep{imai:keel:ting:10,imai:keel:yama:10}.  If
there exist unobserved pre-treatment confounders which affect both the
mediator and the outcome, we expect that the sequential ignorability
assumption is violated and $\rho$ is no longer zero.  The sensitivity
analysis is conducted by varying the value of $\rho$ and examining how
the estimated ACME changes.  The following syntax can be used to
conduct this analysis,
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.fit <- lm(emo ~ treat + age + educ + gender + income, data = framing)
out.fit <- glm(cong_mesg ~ emo + treat + age + educ + gender + income,
               data = framing, family = binomial("probit"))
med.out <- mediate(med.fit, out.fit, treat = "treat", mediator = "emo",
                   robustSE = TRUE, sims = 100)
sens.out <- medsens(med.out, rho.by = 0.1, effect.type = "indirect", sims = 100)
summary(sens.out)
@
where \code{rho.by = 0.1} specifies that $\rho$ will vary from $-0.9$
to $0.9$ by $0.1$ increments, and \code{effect.type = "indirect"}
means that sensitivity analysis is conducted for the
ACME. Alternatively, specifying \code{effect.type = "direct"} performs
sensitivity analysis for the ADE and \code{"both"} returns sensitivity
analysis for the ACME and ADE.

The tabular output from the \code{summary} function displays the
values of $\rho$ at which the confidence intervals contain zero for
the ACME.  For both the control and treatment conditions, the
confidence intervals for the ACME contain zero when $\rho$ equals
$0.3$ and $0.4$.  An alternative but mathematically equivalent way to
conduct sensitivity is in terms of the product of $R^2$ (or
coefficients of determination) statistics from the mediator and
outcome models. Discussed in more detail elsewhere
\citep{imai:keel:yama:10,ImaiAPSR,imai:keel:ting:10}, the first row
captures the point at which the ACME is $0$ as a function of the
proportions of residual variance in the mediator and outcome explained
by the hypothesized unobserved confounder. The second line uses the
total variance instead of residual variance. We use ${R^\ast}^2$ for
residual variance and $\tilde{R}^2$ for total variance. For example,
when the product of the original variance explained by the omitted
confounding is $.049$ the point estimate for ACME would be $0$.

\begin{figure}[t!]
\begin{center}
<<fig=TRUE,echo=FALSE>>=
par(mfrow = c(2,2))
plot(sens.out, sens.par = "rho", main = "Anxiety", ylim = c(-.2,.2))
@
\vspace{-2.5in}
\caption{Graphical display of results from the \code{medsens} function. Results as a function of $\rho$.}
\label{fig:medsens}
\end{center}
\end{figure}

A graphical display is often more intuitive and useful for the
sensitivity analysis, especially for the $R^2$ interpretations.  This
can be done, as before, by passing the object from the \code{medsens}
function to the \code{plot} function.  The \code{plot} function allows
the researcher to graphically summarize the results of sensitivity
analysis either in terms of $\rho$ (\code{sens.par = "rho"}) or $R^2$
statistics (\code{sens.par = "R2"}).
<<eval=FALSE,fig=FALSE,echo=TRUE>>=
plot(sens.out, sens.par = "rho", main = "Anxiety", ylim = c(-0.2, 0.2))
@

When using the $R^2$ statistic version of sensitivity analysis the
user must specify whether the hypothesized confounder affects the
mediator and outcome variables in the same direction or in different
directions. This matters because the sensitivity analysis is in terms
of the product of $R^2$ statistics. In the current example, we assume
that the confounder influences both variables in the same direction by
setting \code{sign.prod = "positive"} (rather than \code{sign.prod =
  "negative"}). Here, we plot the total variance version of the
sensitivity analysis. The bold line represents the various
combinations of the $R^2$ statistics where the ACME would be $0$ (in
this case the product equals $.049$). The graphical display also
presents the corresponding contour plots for other products of the
$R^2$ statistics.
<<eval=FALSE,fig=FALSE,echo=TRUE>>=
plot(sens.out, sens.par = "R2", r.type = "total", sign.prod = "positive")
@


\begin{figure}[t!]
\begin{center}
<<fig=TRUE,echo=FALSE>>=
par(mfrow = c(2,2))
plot(sens.out, sens.par = "R2", r.type = "total", sign.prod = "positive")
@
\vspace{-2.5in}
\caption{Graphical display of results from the \code{medsens}
  function. Results as a function of $\tilde{R}^2$.}
\label{fig:medsens2}
\end{center}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Causal mediation analysis of multilevel data}
\label{sec:multilevel}

As of version 4.2, the \pkg{mediation} package supports causal
mediation analysis of multilevel data via the \code{lmer} and
\code{glmer} functions in the \pkg{lme4} package
\citep{lme4}. Researchers are often interested in analyzing data where
individual observations such as students, patients, and employees are
clustered within groups such as schools, hospitals, and
companies. Data on individuals may be correlated within groups, but
also different groups may have different data generating
processes. Multilevel models take into account such heterogeneity
within and between groups simultaneously.

Mediation analysis of multilevel data can be categorized into various
types depending on whether the treatment, mediator and outcome
variables are each measured at the individual or group level
\citep[see][]{Krull:2001,Zhang:2009}. Regardless of these types,
researchers can use the \code{mediate} function to analyze multilevel
data by choosing appropriate statistical models for the mediator and
outcome variables. In this section, we illustrate the use of our package
for multilevel data by focusing on two types of data structure: (1) the
treatment is assigned at the group level whereas the mediator and
outcome are measured at the individual level, and (2) both the treatment
and mediator are group-level variables while the outcome is recorded at
the individual level. Other combinations of data levels can be handled
via straightforward modifications to the syntax used in these examples.\footnote{We note that as of the writing of this article the lme4 package is known to generate slightly different random number draws across computing platforms (Windows, Mac, etc.) for a given seed which given the simulation method used can generate small numerical differences in some cases.}  

To illustrate the usage, we analyze data from the Education
Longitudinal Study (2002)\footnote{To protect the anonymity of the
  individuals involved in the study, we generated hypothetical
  individual-level variables via multiple imputation.  The results
  reported below do not take into account any statistical uncertainty
  due to the imputation procedure and should thus be regarded only as
  illustration.  The original data can be obtained from
  \textit{Education Longitudinal Study (ELS), 2002: Base Year (ICPSR
    4275)} by the United States Department of Education, National
  Center of Education
  Statistics. \url{http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/4275}.}
where students are clustered within schools. The \pkg{mediation}
package contains two related data sets. The \code{student} data set
contains both student- and school-level variables organized at the
student level. The \code{school} data set only contains school-level
variables, such that the number of observations in this data set
equals the number of unique levels of the school identifier variable
(\code{SCH_ID}) in the \code{student} data set. As explained below in
detail, the group-level data set (\code{school}) is required only when
we analyze the data where both the treatment and the mediator are
group-level variables.

\subsection{Group-level treatment and individual-level mediator}

First, consider the case where the treatment is a group-level variable but
the mediator and outcome variables are measured at the individual level. In this
case, we only need the student-level data set,
<<>>=
data("student", package = "mediation")
@
Here, we analyze as an example whether a school is Catholic or not
(\code{catholic}) affects a student's likelihood of fighting
(\code{fight}) at the school, and hypothesize that a student's
emotional attachment to the school (\code{attachment}) functions as
the causal mechanism. That is, we postulate that students in a
Catholic school may have an increased sense of attachment to their
school, which may in turn decrease their likelihood of getting
involved in a fight. We model these causal processes using the
following hierarchical logistic-normal regression model for the
(binary) mediator,
\begin{eqnarray*}
\Prob(M_{ij}=1) & = & \logit^{-1}\l(\alpha_j + \gamma^\top X_{ij}\r), \\
\alpha_j & = & \alpha + \beta T_{j} + \epsilon_{j},
\end{eqnarray*}
where $i$ and $j$ are student and school indicators, respectively, $\epsilon_j$ is a normally distributed group-level stochastic error with mean zero, and $X_{ij}$ represents the vector of student-level pre-treatment covariates (\code{gender}, \code{income} and \code{pared}). Likewise, we use the following model for the (binary) outcome,
\begin{eqnarray*}
\Prob(Y_{ij}=1) & = & \logit^{-1}\l(\lambda_j + \phi_j M_{ij} + \zeta^\top X_{ij}\r), \\
\lambda_j & = & \lambda + \psi T_j + \upsilon_j, \\
\phi_j & = & \phi + \theta T_j + \nu_j,
\end{eqnarray*}
where $\upsilon_j$ and $\nu_j$ are group-level errors jointly bivariate normally distributed with mean zero. If desired, more complex data generating processes can be assumed (with appropriate changes in the syntax for the models below), such as allowing for group-varying slopes on the treatment variable or incorporating group-level pre-treatment covariates.

Now, note that these two models can be equivalently written as follows,
$$ \Prob(M_{ij}=1) \ = \ \logit^{-1}\l( (\alpha+\epsilon_j) + \beta T_j + \gamma^\top X_{ij} \r),$$
and
$$ \Prob(Y_{ij}=1) \ = \ \logit^{-1}\l((\lambda+\upsilon_j) + \psi T_j + (\phi+\nu_j)M_{ij} + \theta T_jM_{ij} + \zeta^\top X_{ij}\r),$$
which can both be estimated via the \code{glmer} function,
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
library(lme4)  
med.fit <- glmer(attachment ~ catholic + gender + income + pared + (1|SCH_ID),
                 family = binomial(link = "logit"), data = student)
out.fit <- glmer(fight ~ catholic*attachment +
                         gender + income + pared + (1 + attachment|SCH_ID),
                 family = binomial(link = "logit"), data = student)
@
These fitted objects can then be fed into the \code{mediate} function in the usual manner.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.out <- mediate(med.fit, out.fit, treat = "catholic", mediator = "attachment",
                   sims = 100)
summary(med.out)
@
The estimated mediation, direct, and total effects are all
significantly different from zero. The results suggest that the
school-level treatment (\code{catholic}) increases the value of the
individual-level mediator (\code{attachment}), which in turn decreases
the value of the outcome (\code{fight}), and also that the
treatment decreases the value of the outcome directly or in different causal
paths.

\subsection{Group-level treatment and mediator}

Next, consider the case where both the treatment and mediator are
group-level variables while the outcome is measured at the individual
level. In this case, we need the second data set containing only the
group-level variables,
<<>>=
data("school", package = "mediation")
@
Note that the group-level data set must also contain the group
indicator used in the individual-level data set under the same
variable name (\code{SCH_ID} in our running example). The current
version of \code{mediate} also requires that the model frames of the
mediator and outcome models contain the exact same set of groups,
which becomes important when each model contains different covariates
and some groups drop out of the model frames due to missingness.

As an illustration, we investigate the effects of school-level
economic condition (\code{free}; proportion of students who receive
free lunch) on students' tardiness (\code{late}; days per semester
they are late for school). As a causal path, we postulate that
school-level poverty negatively impacts school-level morale
(\code{smorale}), which in turn increases tardiness among students.
We use the following hierarchical regressions to model the
hypothesized causal mechanism,
\begin{eqnarray*}
M_j & = & \alpha + \beta T_j + \gamma^\top V_j + \epsilon_j, \\
Y_{ij} & = & \lambda_j + \zeta^\top X_{ij} + \upsilon_{ij}, \\
\lambda_j & = & \lambda + \theta T_j + \phi M_j + \kappa^\top V_j + \nu_j,
\end{eqnarray*}
where $V_j$ is the vector of school-level covariates (\code{catholic}
and \code{coed}), $X_{ij}$ is the vector of student-level covariates
(\code{gender}, \code{income} and \code{pared}), and $\epsilon_j$,
$\upsilon_{ij}$ and $\nu_j$ are each normally distributed stochastic
errors with mean zero. Again, more complex models can be used (e.g.,
adding a treatment-mediator interaction term to the outcome model) if
desired.

In this case, the mediator model is solely composed of the
school-level variables and fixed coefficients. Hence the mediator
model can be fit via the \code{lm} function using the school-level
data set,
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.fit <- lm(smorale ~  free + catholic + coed, data = school)
@
and the outcome model, which can be equivalently written as,
$$ Y_{ij} \ = \ (\lambda + \upsilon_j) + \theta T_j + \phi M_j +
(\gamma^\top + \kappa^\top) V_j + \zeta^\top X_{ij} + \upsilon_{ij}, $$
can be estimated with the \code{lmer} function and the student-level data set,
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
out.fit <- lmer(late ~ free + smorale + catholic + coed +
                       gender + income + pared + (1|SCH_ID),
                data = student)
@
These fitted model objects can then be passed to the \code{mediate}
function.  Since the treatment variable is a continuous variable, we
use the values of $3$ and $4$ as the control and treatment values,
respectively, and estimate the quantities of interest in terms of
these values.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
med.out <- mediate(med.fit, out.fit, treat = "free", mediator = "smorale",
                   control.value = 3, treat.value = 4, sims = 100)
summary(med.out)
@
The estimated mediation effect is significantly different from zero,
suggesting that the school-level treatment (\code{free}) decreases the
value of the school-level mediator (\code{smorale}), which in turn
increases the value of the outcome (\code{late}).

We conclude this section by providing more details about the current
version of our package for multilevel mediation analysis. First, the
\code{summary} function can produce estimates of group-specific
effects by adding the \code{output} argument, which can be set to
either \code{"bygroup"} or \code{"byeffect"}. In the above example,
\code{summary(med.out, output = "bygroup")} produces the output
organized by schools, and \code{summary(med.out, output = "byeffect")}
produces the output organized into quantities of
interest. Group-specific effects can also be graphically displayed by
\code{plot(med.out, group.plots = TRUE)}. Second, the \code{mediate}
function allows researchers to specify different groups in the
mediator and outcome models (nested or non-nested). For example, it
may be reasonable to assume that the mediator variable is correlated
within schools but the outcome variable is clustered at the district
level. In such a case, the \code{group.out} argument for the
\code{mediate} function allows researchers to choose the group into
which the estimated group-specific effects are aggregated.

The current version of the package also has some limitations for
multilevel mediation analysis. First, it only allows for one group
type for each model. For example, it is not possible to let
coefficients of the mediator (or outcome) model vary not only for
schools but also for districts. Second, the bootstrap-based
uncertainty estimates for the mediation effects are not yet
available. Third, the \code{medsens} function for sensitivity analysis
cannot be applied to the \code{mediate} outputs based on multilevel
regression models. Future updates may add these missing
functionalities. Finally, it is important to reiterate that the
validity of the estimates crucially rests on
Assumption~\ref{asm:ignorable}, regardless of whether hierarchical
models are fitted to the data or not.




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Design-based causal mediation analysis}
\label{sec:design}

An alternative approach to model-based inference is to use different
research designs that are specifically designed for identifying causal
mechanisms.  \citet{ImaiDesign} propose several such designs and
describe the assumptions required for the identification of causal
mediation effects under each of the designs.  In this section we
briefly illustrate how to use our software to calculate the estimates
of the quantities of interest under each design.

%%%%%%%%%%%%%%
\subsection{Single experiment design}

The single experiment design randomizes the treatment variable and
measures the mediating and outcome variables. In
Section~\ref{sec:mediate}, we discussed estimation functions that can
be used with parametric and semi-parametric models. If the researchers
wish to pursue a completely nonparametric approach the
\pkg{mediation} package offers two options via the \code{mediate.sed}
function. First, the researchers can continue to make the sequential
ignorability assumption and nonparametrically estimate the ACME.  This
approach works only when the mediator variable is discrete. Second,
the sharp bounds on the ACME can be computed under the assumption that
only the treatment is randomized.  \citet{ImaiDesign}
derive the bounds in the case with all binary variables (treatment,
mediator, and outcome) and show that, unfortunately, the bounds are never informative
about the sign of the ACME (i.e., they always include 0).

Most mediation analysis proceeds under the sequential ignorability
assumption.  Those analyses also tend to be model-based, but they need
not be. \citet{imai:keel:yama:10} outline a design-based estimator for
the ACME for when the mediator is discrete.  This estimator for the
ACME is fully nonparametric.  One drawback to this estimator is that
one can encounter mediator-treatment combinations for which there are
no subjects because of data sparsity. Standard error calculation for
this estimator is based on either the Delta method or the nonparametric
bootstrap.

The \code{mediate.sed} function requires the names of the outcome,
mediator, and treatment variables, along with the name of the data
frame that contains these variables.  When \code{SI = TRUE}, the
function will return the point estimates under the sequential
ignorability assumption, and otherwise the results will be a set of
sharp bounds for the ACME.  The method for inference also differs
slightly from the \code{mediate} function.  When \code{boot = TRUE}
the bootstrap is used, but when \code{boot = FALSE}, the Delta method
is used to compute standard errors.

Below, we present an example using the framing data from
\citet{Brader:2008}.  The treatment variable is the same as before,
i.e., \code{treat}, and the mediator is \code{anx}, which refers to a
subject's reported level of anxiety.  This four level measure is one
component of the \code{emo} variable that was previously used as the
mediator and in the data all treatment-mediator combinations are
present (a requirement for the estimator). The outcome variable in
this example is \code{english} and measures on a four point scale how
much someone supports English only laws, from strongly support to
strongly oppose. Note that the \code{mediate.sed} function only takes
numeric variables as arguments.  Variables that are stored as factors
must be converted to numeric variables as we show below.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
framing$english <- as.numeric(framing$english)
framing$anx <- as.numeric(framing$anx)
sed.est <- mediate.sed("english", "anx", "treat", data = framing, SI = TRUE,
                       boot = TRUE, sims = 100)
summary(sed.est)
@
The results from the \code{summary} function display the mediation
effects along with the default 95\% confidence intervals. In this
example both $\bar\delta(0)$ and $\bar\delta(1)$ are not significantly
different from 0.

%DT Note, we previously did not do an example for this. We can, but I'm not sure we need to.
%<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
%sed.2 <- mediate.sed("english", "anx", "treat", data=framing, SI=FALSE,
%boot=TRUE, sims=50)
%summary(sed.2)
%@

\subsection{Parallel design}
\label{subsec:parallel}
An alternative to the single experiment design is the ``parallel
design'' proposed by \citet{ImaiDesign}. In this design there are two
separate experiments that are run in parallel with subjects randomly
assigned to one of the two experiments. The first experiment follows
the single experiment design. In the second experiment, subjects are
randomly assigned to treatment or control. Then, a randomly selected
set of subjects in each condition is assigned a value of the mediating
variable. A key assumption of this design is that the manipulation of
the mediating variable is possible and has no direct effect on the
outcome variable.

Under the parallel design, the ACME is not point identified without an
additional assumption.  The \pkg{mediation} package offers two options
via the \code{mediate.pd} function.  First, the researchers can assume
no interaction between the treatment and mediating variables by
setting \code{NINT = TRUE}.  In this case, the \code{mediate.pd}
function will calculate the ACME along with its bootstrap confidence
intervals.  Second, the assumption of no-interaction between treatment
and mediator can be dropped via \code{NINT = FALSE}, and then the
sharp bounds can be calculated for the ACME.  These bounds may be
informative about the sign (i.e., do not cover 0) and are always
narrower compared to the bounds under the single experiment design
where the only assumption is randomization of the treatment.

For illustration, we simulated data based on the media framing
experiment by \citet{Brader:2008} by creating a population
distribution of potential mediators and outcomes (see
\citet{ImaiDesign} for more details). We then sampled 1000 cases from
this distribution. In this example, \code{out} represents the outcome
variable (immigration attitudes), \code{med} represents the mediator
(anxiety), and \code{ttt} represents the treatment. All variables are
binary. The variable \code{manip} represents whether the subject had the mediator
manipulated ($-1$ if mediator is manipulated down, $0$ if no
manipulation, and $1$ if manipulated up). First, the no-interaction
assumption is made and options for the number of bootstrap simulations
and confidence intervals are specified. In this case, the mediation
effect is estimated at $-0.12$ with $95\%$ confidence intervals
spanning $[-0.21, -0.03]$. In the second example, the no interaction
assumption is dropped and the sharp bounds are calculated to span
$[-0.3, 0.3]$ for the control condition and $[0.2, 0.77]$ for the
treatment condition.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
data("boundsdata", package = "mediation")
pd <- mediate.pd("out", "med", "ttt", "manip", boundsdata,
                 NINT = TRUE, sims = 100, conf.level = 0.95)
summary(pd)
pd1 <- mediate.pd("out", "med", "ttt", "manip", boundsdata,
                  NINT = FALSE)
summary(pd1)
@

\subsection{Parallel encouragement design}

In many situations, perfect manipulation of the mediating variable may
be difficult. In the parallel encouragement design, subjects
are split into two separate experiments. The first experiment is based
on the single experiment design. In the second experiment subjects are
randomly assigned to the treatment and control conditions and then,
within each condition, a subset of subjects are randomly encouraged to
have a high or low value of the mediator. Both the mediator and
outcome variable are then measured. The \code{mediate.ped} function
reports the sharp bounds on two estimands. First is the ACME and
second is the ACME for the ``compliers'' who respond to the
encouragement. The calculation of these bounds is accomplished via a
standard linear programming approach as discussed in
\citet{ImaiDesign}.

The parallel encouragement design requires the analyst to specifically
design some form of encouragement.  The functionality of the
\code{mediate.ped} closely mirrors that of \code{mediate.sed}. The key
difference is that the analyst must also include an indicator for
encouragement.  For illustration, we simulated data based on the media
framing experiment by \citet{Brader:2008}. We did this by creating a
population distribution of potential mediators and outcomes, and
compliance types. We then randomly draw the joint probabilities of the
causal types and assign an encouragement status for those in the
encouragement condition (see \citet{ImaiDesign} for more
details). Based on the encouragement condition and encouragement
status (\code{enc}, $-1$ if mediator is encouraged down, $0$ if no
encouragement, and $1$ if encouraged up), the observed binary values
of the mediator (\code{med.enc}) and outcome (\code{out.enc}) are
determined. Using this simulated data we can then pass it to the
\code{mediate.ped} function for the parallel encouragement design.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
data("boundsdata", package = "mediation")
ped <- mediate.ped("out.enc", "med.enc", "ttt", "enc", boundsdata)
summary(ped)
@
Here, the results from \code{mediate.ped} function are a set of sharp
bounds. We see that for the compliers, the sharp bounds on ACME under
the treatment condition are informative as they do not cross 0.


\subsection{Crossover encouragement design}

The fourth experimental design included in the \pkg{mediation} package
is the crossover encouragement design.  Under this design,
subjects are exposed to two experiments, with each subject participating in each experiment. In the first experiment, the
treatment variable is randomized and the mediator and outcome
variables observed. In the second experiment, the treatment condition
is set to the opposite value from the first period, but an encouragement
is given to a randomly selected set of subjects
so that the mediator variable will take on the value
observed in the first experiment. Under this design, the ACME is
point identified for the set of subjects that are able to have their
mediator value manipulated (known as ``pliable units''). A crucial
identification assumption is that the first experiment does not
influence behavior in the second experiment. For this experimental
design the \code{mediate.ced} function calculates point estimates and
the bootstrap is used for estimates of uncertainty.

For illustration, we simulated data based on the identification
assumptions necessary for this design. \code{Y2} is the value of the
outcome variable in the second experiment, \code{M1} and \code{M2} are
the mediator values for the first and second experiment, \code{T1} is
the value of the treatment in the first experiment, and \code{Z}
indicates whether the subject's mediator value in the second
experiment is encouraged to take on the value opposite to that observed
in the first experiment. All variables are binary.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
data("CEDdata", package = "mediation")
ced <- mediate.ced("Y2", "M1", "M2", "T1", "Z", CEDdata, sims = 100)
summary(ced)
@
The results from the \code{mediate.ced} function are point estimates
and confidence intervals for the ACME under the treatment and control
conditions. These estimates apply only to the pliable units. In this
example, both values of the ACME are positive but the 95\% confidence
intervals overlap with zero.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Analysis of causally dependent multiple mechanisms}
\label{sec:multimech}

Our discussion so far has focused on a single mediator, $M$.
Frequently, however, researchers take measurements for more than one
mediating variable.  Accounting for alternative
mechanisms is indeed crucial for the identification of the mechanism
of primary interest, especially when such mechanisms are causally not
independent. This is because the alternative dependent mediators
affect both the mediator of primary interest and the outcome variable,
which, by definition, violates the sequential ignorability assumption
(Assumption~\ref{asm:ignorable}).

\subsection{The methodology}

\citet{imai:yama:11} develop methods for dealing with multiple mediators based
on the current framework. We briefly review this framework. First, in the case of
causally unrelated multiple mediators, it turns out that there is no need to
fundamentally modify the current framework or estimation procedure. To see this,
suppose that there are multiple causally unrelated mediators, and one is interested in
estimating the causal mediation effects with respect to each of them.
In this scenario, note that for each mediator, the other mediators are neither
pre-treatment nor post-treatment confounders (since by construction
they have no causal effect on the mediator of interest). Therefore, one can consistently
estimate the desired effects by simply applying the \code{mediate} function successively
for the mediators as explained in Section~\ref{sec:mediate},
ignoring the existence of the other, causally unrelated, mediators each time. Likewise,
sensitivity analysis via the \code{medsens} function can be conducted for the
mediators of interest in the usual fashion. The \code{mediations} function can
be useful for such analysis.

Second, when the multiple mediators are causally related (or equivalently, when
one mediator acts as a post-treatment confounder for the other mediator
on the outcome), we need to expand the notational framework, and the analysis requires new assumptions.  Let $W_i(t)$ denote the vector of the potential
values of those alternative mediators given treatment status $t$. To
allow the causal dependence of both the primary mediator and outcome
on $W$, we write the potential mediator and outcome as $M_i(t,w)$ and
$Y_i(t,m,w)$, respectively. The observed values of these potential
response variables can then be expressed as $W_i = W_i(T_i)$, $M_i =
M_i(T_i, W_i(T_i))$, and $Y_i = Y_i(T_i, M_i(T_i,W_i(T_i)),
W_i(T_i))$.  The causal mediation effects can now be re-expressed using
this notation as,
\begin{eqnarray*}
    \delta_i(t) & = & Y_i(t, M_i(1,W_i(1)), W_i(t)) - Y_i(t, M_i(0,W_i(0)), W_i(t)),
\end{eqnarray*}
for $t = 0,1$. Note that this quantity represents the treatment
effects that are transmitted through the mediator of primary interest
$M_i$, irrespective of whether they also come through the alternative
mediators $W_i$ or not.  Therefore, the quantity of interest remains
unchanged from the previous sections, except that the existence of the
other mediators are now explicitly taken into consideration.

The framework of \citet{imai:yama:11} is based on the following varying
coefficient linear structural equations model,
\begin{eqnarray}
    M_i(t, w) & = & \alpha_2 + \beta_{2i}t + \xi_{2i}^\top w + \mu_{2i}^\top tw
        + \lambda_{2i}^\top x + \epsilon_{2i}, \label{eq:varsemM} \\
    Y_i(t, m, w) & = & \alpha_3 + \beta_{3i}t + \gamma_i m + \kappa_i tm
        + \xi_{3i}^\top w + \mu_{3i}^\top tw + \lambda_{3i}^\top x_i + \epsilon_{3i}, \label{eq:varsemY}
\end{eqnarray}
where $\E(\epsilon_{2i}) = \E(\epsilon_{3i}) = 0$ without loss of
generality.  Although these equations may resemble a traditional
linear structural equations model at a first glance, they are
considerably more flexible because the coefficients are all allowed to
vary arbitrarily across individual units.

\citet{imai:yama:11} propose two strategies for the analysis of the
average causal mediation effects, $\bar\delta(t) \equiv \E(\delta_i(t))$.
First, it can be shown that the ACME is point identified
under the above model and sequential ignorability (a weaker version allowing
for post-treatment confounding; see \citeauthor{imai:yama:11}) if the
{\it homogeneous interaction} assumption is satisfied. This additional
assumption is formally written as,
\begin{eqnarray*}
  Y_i(1,m,W_i(1)) - Y_i(0, m, W_i(0)) & = & B_i + Cm \label{eq:homoint}
\end{eqnarray*}
for any $m$. The assumption states that the degree of interaction between the
treatment and the primary mediator is constant across individual units, which
may or may not be plausible depending on the empirical context.

Second, when this assumption is violated, one can express the sharp
bounds on the ACME as functions of a parameter representing the degree
of the violation, and conduct a sensitivity analysis. The sensitivity
parameter here is the standard deviation of the coefficient on the
treatment-mediator interaction term, i.e.,
\begin{eqnarray*}
	\sigma & \equiv & \sqrt{\VAR(\kappa_i)},
\end{eqnarray*}
and the expression for the sharp bounds are given in \citet[Footnote
6]{imai:yama:11}.  Researchers can then analyze robustness to the
potential violation of the homogeneous interaction assumption by
examining how the location and width of the bounds vary as $\sigma$
changes.

The sensitivity analysis can also be formulated in terms of two
alternative sensitivity parameters, both based on coefficients of
determination as in the single mediator case (see
Section~\ref{subsec:medsens}). Specifically, we use the proportion of
the residual or total variance of the outcome variable that would be
explained by allowing the heterogeneity in the treatment-mediator
interaction in the outcome model. These parameters are formally
defined as
\begin{eqnarray}
  R^{2\ast} & = & \frac{\VAR(\tilde\kappa_i T_i
    M_i)}{\VAR(\eta_{3i}(T_i,M_i,W_i))} \quad {\rm and} \quad
  \widetilde{R}^2
  \ = \ \frac{\VAR(\tilde\kappa_i T_i M_i)}{\VAR(Y_i)}, \label{eq:R2}
\end{eqnarray}
where $\tilde{\kappa_i} = \kappa_i - \E(\kappa_i)$. Researchers may
find these parameters to be easier to interpret in substantive terms,
as they represent how important it would be to incorporate the
interaction heterogeneity in order to explain the variation in the
outcome model. \citet{imai:yama:11} show that these parameters have a
one-to-one relationship with $\sigma$, implying that the ACME can also
be written as a function of $R^{2\ast}$ or $\widetilde{R}^2$.


\subsection{Single experiment design}

The above framework has been implemented in the \pkg{mediation}
package as the \code{multimed} function. The function takes a data
frame containing the necessary variables (outcome, primary mediator,
alternative mediator, treatment, and pre-treatment covariates if any)
and outputs an object of class `\code{multimed}', a list consisting of
estimated bounds along with uncertainty estimates.  In the current
version, only a single post-treatment confounder is allowed, although
the theoretical framework accommodates more than one such confounder.

The functionality of \code{multimed} differs in important ways from
\code{mediate}. First, there is not a separate function for
sensitivity analysis. Instead, a sensitivity analysis is conducted
within the function along with the estimates of the mediation effects.
Second, the arguments for the \code{multimed} function are rather
different. Here, the names of the
outcome (\code{outcome}), first mediator (\code{med.main}),
second mediator (\code{med.alt}) and treatment (\code{treat})
variables are passed to the function along with a vector of
the names of the pre-treatment covariates to condition on
(\code{covariates}). In the \code{multimed} function,
inference can only be done with the nonparametric bootstrap.

To illustrate the use of the function we revisit the media framing example
in Section~\ref{sec:mediate}. Here, we use a different outcome variable
\code{immigr}, which is a five category measure of whether immigration
should be increased or decreased (treated as a continuous measure for the purpose
of illustration). The main mediator is the same composite measure of anxiety, \code{emo},
and the treatment and pre-treatment covariates are defined as before.
We now introduce an alternative mediator \code{p_harm}, which is an eight
category measure of the perceived economic harm of immigrants. The
reasoning behind the inclusion of this variable is that the media framing
treatment may also affect participants' opinion about immigrants by
changing their factual belief about the economic impact of increased
immigration, which may also affect the level of anxiety and therefore
confound the mediator-outcome relationship.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
Xnames <- c("age", "educ", "gender", "income")
m.med <- multimed(outcome = "immigr", med.main = "emo", med.alt = "p_harm",
                  treat = "treat", covariates = Xnames,
                  data = framing, sims = 100)
summary(m.med)
@

The \code{summary} function produces two tables. The first table shows
the estimated ACME and total treatment effect and their confidence intervals
(default at 95\%) under the homogeneous interaction assumption.
Three variants of the ACME are shown: the ACME conditional on the
treatment group, the control group, and the weighted average of the two with
the weights being equal to the proportions of the treatment and control groups.
The second table shows key summary results from the sensitivity analysis
with respect to possible heterogeneity in treatment-mediator interactions.
Specifically, the table presents the values of $\sigma$ (column 1),
$R^{2\ast}$ (column 3), and $\widetilde{R}^2$ (column 5) at which the
estimated ACMEs equal zero. The remaining columns (2, 4 and 6) show
those values for the confidence bands of the three ACMEs.

The results from the \code{multimed} function can also be analyzed graphically
using the \code{plot} function. One can produce two types of plots, corresponding
to the two tables in the \code{summary} output.  First, one can plot the
point estimates under the homogeneous interaction assumption by setting the
\code{type} argument to \code{"point"}, as shown below. The output is in
Figure~\ref{fig:multimedpoint}.
<<eval=FALSE,fig=FALSE,echo=TRUE>>=
plot(m.med, type = "point")
@

\begin{figure}[!t]
\begin{center}
<<fig=TRUE,echo=FALSE>>=
plot(m.med, type = "point")
@
\caption{Graphical summary of the results from the \code{multimed} function
under the homogeneous interaction assumption.}
\label{fig:multimedpoint}
\end{center}
\end{figure}

Second, the results from the sensitivity analysis with respect to
$\sigma$, $R^{2\ast}$ or $\widetilde{R}^2$ can be plotted.
In this case, the \code{type} argument can be used to specify
which parameter(s) the estimated ACME should be plotted against.
The possible values are \code{"sigma"}, \code{"R2-residual"},
or \code{"R2-total"}. One can also choose the types of the ACME
from \code{"treated"}, \code{"control"} and \code{"average"}
via the \code{tgroup} argument. In the example below, we plot the
estimated ACME for both treatment and control conditions as a function
of $\sigma$ and $\widetilde{R}^2$. The output is in Figure~\ref{fig:multimedsens}.
<<eval=FALSE,fig=FALSE,echo=TRUE>>=
plot(m.med, type = c("sigma", "R2-total"), tgroup = c("treated", "control"))
@

\begin{figure}[!t]
\begin{center}
<<fig=TRUE,echo=FALSE>>=
oldpar <- par(mfrow = c(2,2))
plot(m.med, type = c("sigma", "R2-total"), tgroup = c("treated", "control"))
par(oldpar)
@
\caption{Graphical summary of sensitivity analysis using the \code{multimed} function.
Results as a function of $\sigma$ and $\widetilde{R}^2$.}
\label{fig:multimedsens}
\end{center}
\end{figure}


\subsection{Parallel design}

\citet[][Section 7]{imai:yama:11} show that the above framework can
also be applied to the data collected under the parallel design.  As
discussed in Section~\ref{subsec:parallel}, the parallel design
consists of two separate experiments to which subjects are randomly
selected. In one experiment, only the treatment is randomized and the
researcher observes the mediator and outcome variables, whereas in the
other experiment both the treatment and mediator are randomly
manipulated and the outcome variable is measured and recorded.

Unlike the single experiment design, one need not assume any kind of
sequential ignorability under the parallel design. This is due to the
existence of the second experimental group where both the treatment
and mediator are randomly assigned. This implies that there is no need
to explicitly incorporate an alternative mediator in the analysis, for
any kind of post-treatment confounding (observed or unobserved) is
allowed to exist in the natural causal process to identify the ACME
under the parallel design using the proposed framework.

To apply the framework for the parallel design, one can use the
\code{multimed} function with slightly modified inputs. First, the
\code{med.alt} is no longer needed because the estimation framework is
agnostic about what particular alternative mechanisms are confounding
the mediator-outcome relationship. Second, one need to supply an
additional variable (\code{experiment}) indicating whether units are
assigned to the experiment with (\code{1}) or without (\code{0})
mediator manipulations. Finally, the \code{design} argument must be
set to \code{"parallel"}, as opposed to the default value of
\code{"single"}. For illustration, we again use the simulated data
introduced in Section~\ref{subsec:parallel} and apply the varying
coefficient linear structural equations framework.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
m.med.para <- multimed(outcome = "out", med.main = "med", treat = "ttt",
                       experiment = "manip", design = "parallel",
                       data = boundsdata, sims = 100)
summary(m.med.para)
@
The \code{plot} function can also be used in the same manner as in the
single experiment case. The key differences between the above analysis
and Section~\ref{subsec:parallel} are threefold. First, the point
estimates in the first summary table here only rely on the homogeneous
interaction assumption, not the stronger assumption of no
interaction. Second, however, the estimates here depend on the
additional assumption of additivity, which is embodied in the varying
coefficient structural equations model in
Equations~\ref{eq:varsemM}~and~\ref{eq:varsemY}. The additivity
assumption may not be plausible in some applications and needs to be
carefully examined. Finally, the second summary table shows the result
of the sensitivity analysis where the homogeneous interaction
assumption is gradually relaxed until an arbitrary interaction
heterogeneity is allowed. This may be preferred to the nonparametric
bounds approach in Section~\ref{subsec:parallel}, which offers less
nuanced information about how robust the point estimates are to the
violation of the identification assumption.


\section{Causal mediation analysis with treatment noncompliance}
\label{sec:ivmediate}

A common complication in randomized controlled trials is treatment
noncompliance. That is, experimental subjects may not follow the
assigned treatment and instead choose to take another treatment. This
poses a serious challenge to causal analysis in randomized experiments
because, even though the {\it assigned} treatment is randomized by the
researcher, the {\it actual} treatment is selected by the subjects
themselves, quite possibly depending on certain characteristics
unobserved to the researcher. In this section, we provide an overview
of the method developed by \citet{yama:13} to cope with the challenge
of treatment noncompliance in the context of causal mediation
analysis. The estimation method is implemented by the \code{ivmediate}
function in our package, as discussed below.

\subsection{The methodology}
\label{subsec:ivmethod}

Analysis of causal mediation in the presence of treatment
noncompliance requires additional notation and assumptions. Let
$Z_i\in\{0,1\}$ denote the binary indicator of treatment assignment,
or encouragement, for unit $i$. Then, we use $T_i(z) \in \{0,1\}$ to
denote the potential treatment which unit $i$ would actually receive
when the unit were assigned to the treatment ($z=1$) or control
($z=0$) condition. The observed treatment for unit $i$ can then be
written as $T_i = T_i(Z_i)$. Following the standard practice in the
analysis of treatment noncompliance \citep[see][]{angr:imbe:rubi:96},
we assume that the treatment assignment itself does not directly
affect either the mediator or the outcome. Under these {\it exclusion
  restrictions}, we can write the potential mediator and outcome as,
respectively, $M_i(t)$ and $Y_i(t,m)$. Likewise, the observed
mediator and outcome can respectively be expressed as $M_i = M_i(T_i)$
and $Y_i(T_i, M_i(T_i))$. Another standard assumption we make is the
{\it monotonicity} of treatment reception. That is, we assume that
there is no unit in the population who would only take the treatment
if assigned to the control condition. This assumption is thus often
called the ``no defier'' assumption and can be written in our notation
as $T_i(0)\leq T_i(1)$ for any $i$.

The final and key assumption is {\it local sequential ignorability},
which can be formally written as follows.
\begin{assumption}[Local Sequential Ignorability;
  \citealt{yama:13}] \label{asm:localSI} \spacingset{1.25}
\begin{eqnarray}
 \{Y_i(t^\prime,m), M_i(t), T_i(z)\} & \indep & Z_i  \mid X_i = x, \label{eq:localSI1}\\
 Y_i(t^\prime,m) & \indep & M_i(t) \mid T_i = t, T_i(0)=0, T_i(1)=1, X_i = x, \label{eq:localSI2}
\end{eqnarray}
for $t = 0,1$, $z=0,1$, and all $x$ and $m$ in the support of
$X_i$ and $M_i$, respectively.
\end{assumption}
This assumption is closely related to but slightly weaker than the
{\it global} sequential ignorability introduced earlier
(Assumption~\ref{asm:ignorable}). Equation~\ref{eq:localSI1} implies
that the treatment assignment, $Z_i$, must be either randomized or can
be regarded to be ``as-if randomized'' after conditioning on a set of
observed pre-encouragement covariates,
$X_i$. Equation~\ref{eq:localSI2}, on the other hand, requires that
the observed mediator in each treatment condition be as-if randomized
among the {\it compliers}, i.e., those units whose actual treatment
would always agree with their assigned treatment ($T_i(0) = 0$ and
$T_i(1)=1$). In other words, Assumption~\ref{asm:localSI} implies that
sequential ignorability must hold locally among the compliers.

In the context of treatment noncompliance, researchers typically focus
on the estimation of the intent-to-treat (ITT) effect and the local
average treatment effect (LATE) among the compliers. The former
quantity refers to the average causal effect of the treatment
assignment itself (regardless of the actual treatment) on the outcome,
whereas the latter represents the average effect of the actual
treatment on the outcome among the compliers. Here, we consider the
problem of decomposing each of these effects into the direct and
indirect portions with respect to the mediator of interest. First, the
ITT effect can be written as the sum of these two quantities.

{\it Mediated intent-to-treat (ITT) effect:}
\begin{eqnarray}
\bar\lambda(z) & \equiv & \E[Y_i(T_i(z),M_i(T_i(1))) - Y_i(T_i(z),M_i(T_i(0)))]. \label{eq:medITT}
\end{eqnarray}

{\it Unmediated ITT effect:}
\begin{eqnarray}
\bar\mu(z) & \equiv & \E[Y_i(T_i(1),M_i(T_i(z))) - Y_i(T_i(0),M_i(T_i(z)))]. \label{eq:unmedITT}
\end{eqnarray}

Second, the LATE can be decomposed into the following two quantities.

{\it Local average causal mediation effect (LACME):}
\begin{eqnarray}
\bar\phi(t) & \equiv & \E[Y_i(t,M_i(1)) - Y_i(t,M_i(0))\mid T_i(0)=0,T_i(1)=1]. \label{eq:LACME} 
\end{eqnarray}

{\it Local average natural direct effect (LANDE):}
\begin{eqnarray}
\bar\psi(t) & \equiv & \E[Y_i(1,M_i(t)) - Y_i(0,M_i(t))\mid T_i(0)=0,T_i(1)=1]. \label{eq:LANDE}
\end{eqnarray}

\citet{yama:13} shows that the above four quantities (each defined for
$z=0,1$ or $t=0,1$) can be nonparametrically identified under the set
of assumptions introduced thus far in this section. More specifically,
each of $\bar\lambda(z)$, $\bar\mu(z)$, $\bar\phi(t)$ and
$\bar\psi(t)$ can be expressed in terms of the conditional
expectations and distributions of the observed outcome, mediator and
treatment variables. These effects of interest can therefore be
estimated by fitting regression models to each of those conditionals
(i.e., a model for $Y_i$ given $M_i$, $T_i$, $Z_i$ and $X_i$, a model
for $M_i$ given $T_i$, $Z_i$ and $X_i$, and a model for $T_i$ given
$Z_i$ and $X_i$) and calculating the sample analogues of the
identified quantities. Uncertainty estimates can then be obtained via
simulation-based methods. The \code{ivmediate} function in our package
implements this procedure for a selection of models, as we illustrate
with an empirical example in the next section.


\subsection{Illustration}
\label{subsec:ivjobs}

We illustrate the use of the \code{ivmediate} function through an
analysis of data from the Job Search Intervention Study \citep[JOBS
II; see][for more information about the study]{Vinokur:1995}. JOBS II
was a randomized job training intervention for unemployed workers
which aimed at increasing the prospect of reemployment and improving
mental health of the job seekers involved in the study. A random
subsample of the participants were offered to receive the treatment of
job-skills workshops which covered skills for job search and coping
with stress, while the remaining participants were assigned to the
control group and sent a booklet containing job-search tips. Despite
the random assignment of the treatment conditions, some participants
failed to comply with their assigned treatment status. Namely, 39\% of
those who were offered to participate in job-skills workshops actually
did not enroll. None of the workers in the control group, on the other
hand, participated in workshops, so noncompliance in this study was
strictly one-sided. Several outcome measures were taken after the
completion of the program via a survey. Here, we focus on the question
of whether participation in the workshops improved the mental health
of the unemployed workers (measured with a continuous scale based on
the Hopkins Symptom Checklist) by increasing their self-confidence in
job search ability (measured by a dichotomous indicator).

The relevant portion of the JOBS II data is included as part of the
\pkg{mediation} package.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
data("jobs", package = "mediation")
@
We begin by estimating three regression models. First, we model the
actual treatment status (\code{comply}) conditional on the assigned
treatment (\code{treat}) and observed pre-encouragement covariates
(\code{sex}, \code{age}, \code{marital}, \code{nonwhite}, \code{educ},
and \code{income}). Here we postulate a linear probability model for
the probability of actually participating in the job-skills workshops.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
a <- lm(comply ~ treat + sex + age + marital + nonwhite + educ + income, 
        data = jobs)
@
Next, we model the mediator (\code{job_dich}) and outcome
(\code{depress2}) as functions of causally precedent variables. That
is, we fit a logit model for the dichotomous mediator conditional on
the actual treatment, assigned treatment, and observed
pre-encouragement covariates, and a linear regression model for the
outcome as a function of the mediator, actual treatment, assigned
treatment, and pre-encouragement covariates.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
b <- glm(job_dich ~ comply + treat + sex + age + marital + 
          nonwhite + educ + income, data = jobs, family = binomial)
c <- lm(depress2 ~ job_dich * (comply + treat) + sex + age + marital + 
          nonwhite + educ + income, data = jobs)
@
Generally, it is wise to include an interaction term between the
actual and assigned treatment in these models in order to allow for
the regression functions to vary across treatment conditions. Here,
however, the interaction term is not needed because noncompliance is
strictly one-sided (i.e., there is no observation for which
\code{comply} equals \code{1} and \code{treat} equals \code{0}). These
three models can in theory be of any form, as in the case of
estimating the ACME via the \code{mediate} function. However, the
current version of the \code{ivmediate} function only supports binary
outcome models (fitted via \code{glm} with \code{family = binomial})
and linear models (fitted via \code{lm}).

After fitting the three models, the LACME and LANDE can be easily
estimated via the \code{ivmediate} function, which takes those three
fitted model objects as main inputs.\footnote{Currently,
  \code{ivmediate} only estimates the complier average effects and is
  not capable of estimating the mediated and unmediated ITT
  effects. This limitation will be addressed in a future update.}
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
out <- ivmediate(a, b, c, sims = 100, boot = FALSE, 
                 enc = "treat", treat = "comply", mediator = "job_dich")
@
In the \code{ivmediate} function, the analyst must specify the names
of the assigned treatment, actual treatment, and mediator (as they
appear in the data frames used to fit the three models) via the
\code{enc}, \code{treat}, and \code{mediator} arguments,
respectively. The analyst can also set the number of simulations used
for the construction of confidence intervals (\code{sims}) as well as
whether the nonparametric bootstrap should be used for the confidence
intervals (\code{boot}; if \code{FALSE}, the quasi-Bayesian Monte
Carlo method will be used). Regardless of this choice, only the
intervals for the confidence levels specified by the \code{conf.level}
argument (defaults to \code{.95}) will be calculated and retained in
the output object (unless the \code{long} option is set to
\code{TRUE}, in which case the entire set of simulation draws will be
available).

An important remark is required on the computational demand of the
\code{ivmediate} function. The estimation method of \cite{yama:13}
involves numerical integration over the support of the mediator for
each observation in each simulation iteration. This implies that, if
the mediator is continuous (i.e., if \code{model.m} is an `\code{lm};
object) and the model contains any pre-encouragement covariate, the
calculation of confidence intervals via \code{ivmediate} is extremely
time-consuming. To ameliorate the situation, we have implemented a
parallel execution of the simulation routine across multiple CPU
cores. To utilize this function, the analyst should set the
\code{multicore} option to \code{TRUE} and \code{mc.cores} to the
desired number of cores available on his or her machine. Using $N$
cores will approximately decrease the total computation time by the
factor of $N$. This option, unfortunately, is implemented via
\code{mclapply} and therefore not currently available on Windows
machines.

The output of \code{ivmediate} can be summarized using the usual
\code{summary} function.
<<keep.source=TRUE, echo = TRUE, results = verbatim>>=
summary(out)
@
The resulting summary table shows the point estimates of the LACME for
the control and treatment baseline conditions ($\bar\phi(0)$ and
$\bar\phi(1)$), the LANDE for the control and treatment baselines
($\bar\psi(0)$ and $\bar\psi(1)$), and the total LATE for the
compliers in the first column, as well as their confidence intervals
in the next two columns (unless the \code{ci} argument in
\code{ivmediate} is set to \code{FALSE}, which makes the evaluation
much faster). The confidence level is by default set at the first
level used in the original \code{ivmediate} run, but can be changed to
any level used via the \code{conf.level} option. Here, we observe that
the indirect effect of job-skills workshop on depressive symptoms via
increase in job-search self-efficacy is on average negative and barely
statistically significant among the compliers, although the overall
negative effect of treatment on depression among the compliers misses
the conventional level of statistical significance.

The results can also be represented graphically via the \code{plot}
function.
<<eval=FALSE,fig=FALSE,echo=TRUE>>=
plot(out)
@

\begin{figure}[t!]
\begin{center}
<<fig=TRUE,echo=FALSE>>=
plot(out)
@
\caption{Graphical display of results from the \code{ivmediate} function.}
\label{fig:ivmediate}
\end{center}
\end{figure}

Again, the plotted confidence level can be changed via the
\code{conf.level} option to any of the levels used in the original
\code{ivmediate} call. The \code{effect.type} option can be used to
specify which of the estimated quantities will be plotted (the default
plots everything). Most of the standard graphical parameters can be
set in the usual manner.

\section{Concluding remarks}
\label{sec:conclude}

In this paper, we described the functionalities of the \pkg{mediation}
package, which allows applied researchers to conduct causal mediation
analysis in a variety of settings.  The package implements a general
algorithm for estimating causal mediation effects with a variety of
statistical models.  Since the causal mediation analysis under the
standard research design requires a strong and untestable assumption,
we recommend sensitivity analysis which is also implemented in our
package.  Finally, this strong identification assumption can be
relaxed by adopting alternative research designs and we show how to
use our package to conduct causal mediation analysis under those new
designs.  The literature on causal mediation analysis is fast growing,
and we expect new methods to be developed in the coming years.  We
hope that the \pkg{mediation} package can serve as a platform to which
other researchers can add new methods.


\clearpage
\pdfbookmark[1]{References}{References}
\bibliography{v59i05}

\end{document}