\documentclass[article,nojss]{jss} %% -- LaTeX packages and custom commands --------------------------------------- %% recommended packages \usepackage{thumbpdf,lmodern} %% other packages \usepackage{amsmath} \usepackage{amssymb} \usepackage{makecell} \usepackage{rotating} % \usepackage{longtable} % \usepackage{pdflscape} %% new custom commands \newcommand{\mat}[1]{\boldsymbol{#1}} % matrix \newcommand{\vect}[1]{\boldsymbol{#1}} % vector \newcommand{\obs}[1]{\mathbf{#1}} % observation \newcommand{\cor}{\mathrm{cor}} % correlation \newcommand{\argmin}{\mathop{\text{argmin}}} % argmin %% stronger penalty of widow and orphan lines \widowpenalty=10000 \clubpenalty=10000 %% -- Article metainformation (author, title, ...) ----------------------------- %% - \author{} with primary affiliation %% - \Plainauthor{} without affiliations %% - Separate authors by \And or \AND (in \author) or by comma (in \Plainauthor). %% - \AND starts a new line, \And does not. \author{Andreas Alfons \\ Erasmus University Rotterdam \And N\"{u}fer Y. Ate\c{s} \\ Sabanc{\i} University \And Patrick J.F. Groenen \\ Erasmus University Rotterdam} \Plainauthor{Andreas Alfons, Nufer Y. Ates, Patrick J.F. Groenen} %% - \title{} in title case %% - \Plaintitle{} without LaTeX markup (if any) %% - \Shorttitle{} with LaTeX markup (if any), used as running title \title{Robust Mediation Analysis: The \proglang{R} Package \pkg{robmed}} \Plaintitle{Robust Mediation Analysis: The R Package robmed} \Shorttitle{Robust Mediation Analysis: The \proglang{R} Package \pkg{robmed}} %% - \Abstract{} almost as usual \Abstract{ Mediation analysis is one of the most widely used statistical techniques in the social, behavioral, and medical sciences. Mediation models allow to study how an independent variable affects a dependent variable indirectly through one or more intervening variables, which are called mediators. The analysis is often carried out via a series of linear regressions, in which case the indirect effects can be computed as products of coefficients from those regressions. Statistical significance of the indirect effects is typically assessed via a bootstrap test based on ordinary least-squares estimates. However, this test is sensitive to outliers or other deviations from normality assumptions, which poses a serious threat to empirical testing of theory about mediation mechanisms. The \proglang{R} package \pkg{robmed} implements a robust procedure for mediation analysis based on the fast-and-robust bootstrap methodology for robust regression estimators, which yields reliable results even when the data deviate from the usual normality assumptions. Various other procedures for mediation analysis are included in package \pkg{robmed} as well. Moreover, \pkg{robmed} introduces a new formula interface that allows to specify mediation models with a single formula, and provides various plots for diagnostics or visual representation of the results. } %% - \Keywords{} with LaTeX markup, at least one required %% - \Plainkeywords{} without LaTeX markup (if necessary) %% - Should be comma-separated and in sentence case. \Keywords{mediation analysis, robust statistics, bootstrap, \proglang{R}} \Plainkeywords{mediation analysis, robust statistics, bootstrap, R} %% - \Address{} of at least one author %% - May contain multiple affiliations for each author %% (in extra lines, separated by \emph{and}\\). %% - May contain multiple authors for the same affiliation %% (in the same first line, separated by comma). \Address{ Andreas Alfons\\ Econometric Institute\\ Erasmus School of Economics\\ Erasmus University Rotterdam\\ PO Box 1738\\ 3000DR Rotterdam, The Netherlands\\ E-mail: \email{alfons@ese.eur.nl}\\ URL: \url{https://personal.eur.nl/alfons/} } %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{Robust Mediation Analysis: The R Package robmed} %\VignetteDepends{robmed} %\VignetteKeywords{mediation analysis, robust statistics, bootstrap, R} %\VignettePackage{robmed} \begin{document} % ------------- % knitr options % ------------- <>= library("knitr") options(prompt="R> ", continue = "+ ", width = 75, useFancyQuotes = FALSE) opts_chunk$set(fig.path = "knitr-figures/figure-", fig.align = "center", fig.lp = "fig:", fig.pos = "t", tidy = FALSE) render_sweave() # use Sweave environments set_header(highlight = "") # do not use the Sweave.sty package @ %% -- Introduction ------------------------------------------------------------- %% - In principle "as usual". %% - But should typically have some discussion of both _software_ and _methods_. %% - Use \proglang{}, \pkg{}, and \code{} markup throughout the manuscript. %% - If such markup is in (sub)section titles, a plain text version has to be %% added as well. %% - All software mentioned should be properly \cite-d. %% - All abbreviations should be introduced. %% - Unless the expansions of abbreviations are proper names (like "Journal %% of Statistical Software" above) they should be in sentence case (like %% "generalized linear models" below). This vignette is an updated version of \citet{alfons22c}. Specifically, it has been adapted to reflect the change in default values for the bootstrap confidence intervals introduced in version~1.1.0. \section{Introduction} \label{sec:intro} In the social, behavioral, and medical sciences, mediation analysis is a popular statistical technique for studying how an independent variable affects a dependent variable indirectly through an intervening variable called a mediator. For instance, \citet{erreygers18} find that poor sleep quality in adolescents explains cyberbullying through anger, and \citet{gaudiano10} report that the believability of hallucinations after treatment for psychotic disorders mediates the relationship between the type of treatment and distress after treatment. Figure~\ref{fig:simple} shows a diagram of the mediation model in its simplest form, which is given by the equations \begin{align} M &= i_{1} + a X + e_{1}, \label{eq:simpleM} \\ Y &= i_{2} + b M + c X + e_{2}, \label{eq:simpleY} \\ Y &= i_{3} + c' X + e_{3}, \label{eq:simpleY'} \end{align} where $Y$ denotes the dependent variable, $X$ the independent variable, $M$ the hypothesized mediator, $i_{1}$, $i_{2}$, $i_{3}$, $a$, $b$, $c$, and $c'$ are regression coefficients to be estimated, and $e_{1}$, $e_{2}$, and $e_{3}$ are random error terms. The coefficients $c$ and $c'$ are called the direct effect and total effect, respectively, of $X$ on $Y$. The product of coefficients $ab$ is called the indirect effect of $X$ on $Y$ and constitutes the main parameter of interest in mediation analysis. Under the usual assumption of independent and normally distributed error terms $e_{1}$, $e_{2}$, and $e_{3}$, it holds that $c' = ab + c$ \citep[e.g.,][]{mackinnon95}, and the same holds for the corresponding ordinary least-squares (OLS) estimates. \begin{figure}[t] \begin{center} \includegraphics[width=0.5\textwidth]{figures/mediation-simple} \end{center} \caption{Diagram visualizing a simple mediation model.} \label{fig:simple} \end{figure} The indirect effect $ab$ can be interpreted in the following way: a change of one unit in $X$ explains a change of $a$ units in $M$, which in turn explains a change of $ab$ units in $Y$. It is therefore an important question whether or not to standardize the variables in some way. If the scales of the variables differ by orders of magnitude, certain coefficients may dominate the relationship $c' = ab + c$. However, variables used in mediation analysis often measure constructs that are aggregated from several rating-scale items (e.g., on a scale of 1--5). In such cases, a researcher may prefer not to standardize to keep the interpretation in terms of the original measurement scales. Similarly, a researcher may prefer not to standardize a binary $X$ variable to keep the interpretation in terms of a change from one group to the other. For a more detailed discussion on whether or not to use standardized coefficients in mediation analysis, we refer to \citet[p.~519]{hayes18}. Mediation analysis goes back to \citet{judd81} and \citet{baron86}, however their stepwise approach has been superseded by approaches that focus on the indirect effect. \citet{sobel82} proposed a test for the indirect effect that assumes a normal distribution of the corresponding estimator, which is an unrealistic assumption for a product of coefficients. \citet{bollen90}, \citet{shrout02}, \citet{mackinnon04}, and \citet{preacher04, preacher08} therefore advocate for a bootstrap test based on OLS regressions, which is the most frequently applied method for mediation analysis according to literature reviews of \citet{wood08} and \citet{alfons22a}. More recently, several authors have emphasized that outliers or deviations from normality assumptions are detrimental to the reliability and validity of mediation analysis, and introduced more robust procedures. \citet{zu10} propose a bootstrap test after an initial data cleaning step, whereas \citet{yuan14} suggest a bootstrap test based on median regressions. \citet{alfons22a} combine the robust MM-estimator of regression \citep{yohai87} with the the fast-and-robust bootstrap \citep{salibian02, salibian08}, and demonstrate that this procedure outperforms the aforementioned approaches for a wide range of error distributions (with different levels of skewness and kurtosis) and outlier configurations. Various software packages are available for mediation analysis. The macro \code{INDIRECT} \citep{preacher04, preacher08} and its successor \code{PROCESS} \citep{hayes18} for \proglang{SPSS} \citep{SPSS} and \proglang{SAS} \citep{SAS} implement the bootstrap test based on OLS regressions. For the statistical computing environment \proglang{R} \citep{R}, the general purpose packages \pkg{psych} \citep{psych} and \pkg{MBESS} \citep{MBESS} for statistical analysis in the behavioral sciences also provide functionality for a bootstrap test in mediation analysis. Package \pkg{WRS2} \citep{mair20} is a collection of robust statistical methods, which offers mediation analysis via the bootstrap test after data cleaning proposed by \citet{zu10}. Other packages concentrate on mediation analysis or specific aspects thereof. Package \pkg{mediation} \citep{tingley14} is focused on causal mediation analysis in a potential outcome framework, and package \pkg{medflex} \citep{steen17} implements recent developments in mediation analysis embedded within the counterfactual framework. Bayesian multilevel mediation models can be estimated with package \pkg{bmlm} \citep{bmlm}, while package \pkg{mma} \citep{mma} offers functionality for general multiple mediation analysis with continuous or binary/categorical variables. In addition, general purpose software for structural equation modeling such as \proglang{Mplus} \citep{Mplus} or the \proglang{R} packages \pkg{sem} \citep{sem} and \pkg{lavaan} \citep{rosseel12} can be used for mediation analysis. The former also allows for maximum likelihood estimation with skew-normal, $t$, or skew-$t$ error distributions \citep{asparouhov16}. Despite the growing number of \proglang{R} packages that address mediation analysis, there are no common interfaces or class structures. Instead, each package uses its own way of specifying mediation models and storing the results. Additionally, only package \pkg{WRS2} contains some functionality for robust mediation analysis. Package \pkg{robmed} \citep{robmed} aims to address these issues. Its main functionality is the robust bootstrap procedure proposed in \citet{alfons22a}, which is highly robust to outliers and other deviations from normality assumptions. Furthermore, \pkg{robmed} implements various other methods of estimating mediation models, as well as different tests for the indirect effects. All implemented methods share the same function interface and a clear class structure. In addition, \pkg{robmed} introduces a simple formula interface for specifying mediation models, and provides several plots for diagnostics or visualization of the results from mediation analysis. % Finally, we take advantage of the \proglang{R} integration plug-in for % \proglang{SPSS} and provide the extension bundle \pkg{ROBMED}, which allows % to use the main functionality of the \proglang{R} package \pkg{robmed} from % within \proglang{SPSS}. Package \pkg{robmed} is available on CRAN (the Comprehensive \proglang{R} Archive Network, \url{https://CRAN.R-project.org/}) and can be installed from the \proglang{R} console with the following command: % <>= install.packages("robmed") @ The rest of the paper is organized as follows. Section~\ref{sec:methodology} discusses various extensions of the simple mediation model, as well as the implemented methodology for estimation and testing. Implementation details are provided in Section~\ref{sec:implementation}, while Section~\ref{sec:illustrations} illustrates the use of package \pkg{robmed} with code examples. The final Section~\ref{sec:summary} concludes. %% -- Manuscript --------------------------------------------------------------- %% - In principle "as usual" again. %% - When using equations (e.g., {equation}, {eqnarray}, {align}, etc. %% avoid empty lines before and after the equation (which would signal a new %% paragraph. %% - When describing longer chunks of code that are _not_ meant for execution %% (e.g., a function synopsis or list of arguments), the environment {Code} %% is recommended. Alternatively, a plain {verbatim} can also be used. %% (For executed code see the next section.) % ----------- % methodology % ----------- \section{Methodology} \label{sec:methodology} We first provide overviews of the mediation models and estimation techniques supported by package \pkg{robmed} in Sections~\ref{sec:models} and~\ref{sec:methods}, respectively. Section~\ref{sec:ROBMED} then gives technical details of the robust bootstrap procedure of \citet{alfons22a}. \subsection{Extensions of the simple mediation model} \label{sec:models} The simple mediation model \eqref{eq:simpleM}--\eqref{eq:simpleY'} can easily be extended in various ways, for instance with (i) multiple parallel mediators, (ii) multiple serial mediators, and (iii) multiple independent variables to be mediated. All those extensions may include additional control variables (covariates) as well. \subsubsection{Parallel multiple mediator model} In the parallel multiple mediator model, an independent variable $X$ is hypothesized to influence a dependent variable $Y$ through multiple mediators $M_{1}, \dots, M_{k}$, while the mediator variables do not influence each other. A diagram of the model is displayed in Figure~\ref{fig:parallel}, and the corresponding regression equations are \begin{align} M_{j} &= i_{j} + a_{j} X + e_{j}, \qquad j = 1, \dots, k, \label{eq:parallelM} \\ Y &= i_{k+1} + b_{1} M_{1} + \dots + b_{k} M_{k} + c X + e_{k+1}, \label{eq:parallelY} \\ Y &= i_{k+2} + c' X + e_{k+2}, \label{eq:parallelY'} \end{align} where $i_{1}, \dots, i_{k+2}$, $a_{1}, \dots, a_{k}$, $b_{1}, \dots, b_{k}$, $c$, and $c'$ are regression coefficients to be estimated, and $e_{1}, \dots, e_{k+2}$ are random error terms. With the usual assumptions of independent and normally distributed error terms, we now have that $c' = \sum_{j = 1}^{k} a_{j} b_{j} + c$. The main parameters of interest are the individual indirect effects $a_{1}b_{1}, \dots, a_{k}b_{k}$, and it can also be of interest to make pairwise comparisons between the individual indirect effects or their absolute values \citep[e.g.,][Chapter~5.1]{hayes18} if the hypothesized mediators are scaled similarly. \begin{figure}[b] \begin{center} \includegraphics[width=0.6\textwidth]{figures/mediation-parallel} \end{center} \caption{Diagram visualizing a parallel multiple mediator model.} \label{fig:parallel} \end{figure} \subsubsection{Serial multiple mediator model} A distinctive feature of the serial multiple mediator model is that the hypothesized mediators $M_{1}, \dots, M_{k}$ may influence each other in a sequential manner, unlike the parallel multiple mediator model in which the mediators do not affect one another. Figure~\ref{fig:serial} contains a diagram of the model with two serial mediators, while the model in its general form is given by \begin{align} \begin{split} M_{1} &= i_{1} + a_{1} X + e_{1}, \\ M_{2} &= i_{2} + d_{21} M_{1} + a_{2} X + e_{2}, \\ &\hspace{0.5em} \vdots \\ M_{k} &= i_{k} + d_{k1} M_{1} + \dots + d_{k,k-1} M_{k-1} + a_{k} X + e_{k}, \end{split} \label{eq:serialM} \\ Y &= i_{k+1} + b_{1} M_{1} + \dots + b_{k} M_{k} + c X + e_{k+1}, \label{eq:serialY} \\ Y &= i_{k+2} + c' X + e_{k+2}, \label{eq:serialY'} \end{align} where $i_{1}, \dots, i_{k+2}$, $a_{1}, \dots, a_{k}$, $d_{j1}, \dots, d_{j,j-1}$, $j = 2, \dots, k$, $b_{1}, \dots, b_{k}$, $c$, and $c'$ are regression coefficients to be estimated, and $e_{1}, \dots, e_{k+2}$ are random error terms. It is easy to see that the serial multiple mediator model quickly grows in complexity with increasing number of mediators due to the combinatorial increase in indirect paths through the mediators (the number of indirect paths is given by $\sum_{j = 1}^{k} {k \choose j}$ for $k$ serial mediators). In package \pkg{robmed}, it is therefore only implemented for two and three mediators to maintain a focus on easily interpretable models. Here, we only discuss the model for two serial mediators, and we refer to \citet[p.169--171]{hayes18} for a diagram and a description of the various indirect effects in the case of three serial mediators. \begin{figure}[t] \begin{center} \includegraphics[width=0.7\textwidth]{figures/mediation-serial} \end{center} \caption{Diagram visualizing a serial multiple mediator model with two mediators.} \label{fig:serial} \end{figure} For two serial mediators, the three indirect effects $a_{1}b_{1}$ ($X \rightarrow M_{1} \rightarrow Y$), $a_{2}b_{2}$ ($X \rightarrow M_{2} \rightarrow Y$), and $a_{1}d_{21}b_{2}$ ($X \rightarrow M_{1} \rightarrow M_{2} \rightarrow Y$) are the main parameters of interest. However, not all pairwise comparisons of the indirect effects may be meaningful (even if the mediators are scaled similarly), as $a_{1}d_{21}b_{2}$ can be expected to be different in magnitude from $a_{1}b_{1}$ and $a_{2}b_{2}$. Finally, we have that $c' = a_{1}b_{1} + a_{2}b_{2} + a_{1}d_{21}b_{2} + c$ under the usual assumptions of independent and normally distributed error terms. \subsubsection{Multiple independent variables to be mediated} Instead of having multiple mediators, one can also allow for multiple independent variables $X_{1}, \dots, X_{l}$ to influence the dependent variable $Y$ through a hypothesized mediator $M$. The resulting model is visualized in Figure~\ref{fig:multiple} and defined by the equations \begin{align} M &= i_{1} + a_{1} X_{1} + \dots + a_{l} X_{l} + e_{1}, \label{eq:multipleM} \\ Y &= i_{2} + b M + c_{1} X_{1} + \dots + c_{l} X_{l} + e_{2}, \label{eq:multipleY} \\ Y &= i_{3} + c_{1}' X_{1} + \dots + c_{l}' X_{l} + e_{3}, \label{eq:parallelY'} \end{align} where $i_{1}, i_{2}, i_{3}$, $a_{1}, \dots, a_{l}$, $b$, $c_{1}, \dots, c_{l}$, and $c_{1}', \dots, c_{l}'$ are regression coefficients to be estimated, and $e_{1}$, $e_{2}$, and $e_{3}$ are random error terms. The indirect effects $a_{1}b, \dots, a_{l}b$ are the main parameters of interest, and with the direct effects $c_{1}, \dots, c_{l}$ and total effects $c_{1}', \dots, c_{l}'$, it holds that $c_{j}' = a_{j}b + c_{j}$, $j = 1, \dots, l$, under the usual independence and normality assumptions on the error terms. If the independent variables are on a comparable scale, it can also be of interest to make pairwise comparisons between the indirect effects or their absolute values. \begin{figure}[t!] \begin{center} \includegraphics[width=0.525\textwidth]{figures/mediation-multiple} \end{center} \caption{Diagram visualizing a mediation model with multiple independent variables.} \label{fig:multiple} \end{figure} This model is commonly used when the hypothesized mediator is the main (explanatory) variable of interest and its antecedents are being studied. Furthermore, an important special case of this model occurs when a categorical independent variable is represented by a group of dummy variables. \subsubsection{Control variables} In many study designs, it may be necessary to isolate the effects of the independent variables of interest from other factors. For instance, consider a study on whether exercise-induced feelings such as physical exhaustion mediate the relationship between physical activity and depression \citep[cf.][]{pickett12}. If the participants vary in demographics such as age and gender, the researcher may need to control for the effects of those variables \citep[e.g., older people may be less capable of engaging in strenuous activities;][]{cerin09}. Such control variables should be added to all regression equations of a mediation model. This means that there is no intrinsic difference between independent variables of interest and control variables in terms of the model or its estimation. The difference is purely conceptual in nature: for the control variables, the estimates of the direct and indirect paths are not of particular interest to the researcher. Package \pkg{robmed} therefore allows to specify control variables separately from the independent variables of interest. Only for the latter, results for the indirect effects are included in the output. While we omitted control variables from the above equations and diagrams for notational simplicity, package \pkg{robmed} supports additional control variables in all implemented models. \subsubsection{More complex models} The models described above do not exist in isolation and some of them can be combined. For instance, \pkg{robmed} supports parallel and serial multiple mediator models with multiple independent variables of interest. Other variations of the mediation model, such as the moderated mediation and mediated moderation models \citep[e.g.,][]{muller05} are out of scope for this paper. They are not yet implemented in package \pkg{robmed} but we aim to add support in future versions. \subsection{Overview of implemented methods} \label{sec:methods} \begin{sidewaystable} \centering \begin{tabular}{llp{1.6cm}lp{8.75cm}ccccc} \hline\noalign{\smallskip} \code{test} & \code{method} & \code{robust} & \code{family} & Description & \rotatebox{90}{Simple mediation} & \rotatebox{90}{Parallel mediators} & \rotatebox{90}{Serial mediators} & \rotatebox{90}{\makecell[l]{Multiple independent \\ variables of interest}} & \rotatebox{90}{Control variables} \\ \noalign{\smallskip}\hline\noalign{\medskip} % \code{"boot"} & \code{"regression"} & \code{"MM"} or $\enskip$ \code{TRUE} & & MM-regression \citep{yohai87} and the fast-and-robust bootstrap \citep{salibian02} are used to construct a confidence interval \citep{alfons22a}. & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark \\ \noalign{\smallskip} % \code{"boot"} & \code{"regression"} & \code{"median"} & & A bootstrap confidence interval is computed based on median regressions \citep{yuan14}. & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark \\ \noalign{\smallskip} \code{"boot"} & \code{"regression"} & \code{FALSE} & \code{"gaussian"} & % A bootstrap confidence interval is computed based on OLS regressions \citep{bollen90, shrout02, mackinnon04, preacher04, preacher08}. & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark \\ \noalign{\smallskip} % \code{"boot"} & \code{"regression"} & \code{FALSE} & \code{"select"} & Regression models with normal, skew-normal, $t$, or skew-$t$ error distributions are estimated \citep{azzalini13}, and a bootstrap confidence interval is computed. The best fitting error distribution is selected via BIC. & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark \\ \noalign{\smallskip} % \code{"boot"} & \code{"covariance"} & \code{TRUE} & & Following multivariate winsorization, a bootstrap confidence interval is computed for coefficient \mbox{estimation} via the maximum likelihood estimator of the covariance matrix \citep{zu10}. & \checkmark & & & & \\ \noalign{\smallskip} % \code{"boot"} & \code{"covariance"} & \code{FALSE} & & A bootstrap confidence interval is computed for coefficient estimation via the maximum likelihood \mbox{estimator} of the covariance matrix. & \checkmark & & & & \\ \noalign{\smallskip} % \noalign{\medskip}\hline\noalign{\bigskip} \end{tabular} \caption{Overview of bootstrap procedures for mediation analysis implemented in the \proglang{R} package \pkg{robmed}, and corresponding argument values to use in function \code{test\_mediation()}.} \label{tab:methods-boot} \end{sidewaystable} \begin{sidewaystable} \centering \begin{tabular}{llp{1.6cm}lp{8.5cm}ccccc} \hline\noalign{\smallskip} \code{test} & \code{method} & \code{robust} & \code{family} & Description & \rotatebox{90}{Simple mediation} & \rotatebox{90}{Parallel mediators} & \rotatebox{90}{Serial mediators} & \rotatebox{90}{\makecell[l]{Multiple independent \\ variables of interest}} & \rotatebox{90}{Control variables} \\ \noalign{\smallskip}\hline\noalign{\medskip} % \code{"sobel"} & \code{"regression"} & \code{"MM"} or $\enskip$ \code{TRUE} & & A variation of the Sobel test based on coefficient estimation via MM-regressions. & \checkmark & & & & \checkmark \\ % \code{"sobel"} & \code{"regression"} & \code{"median"} & & A variation of the Sobel test based on coefficient estimation via median regressions. & \checkmark & & & & \checkmark \\ \noalign{\medskip} % \code{"sobel"} & \code{"regression"} & \code{FALSE} & \code{"gaussian"} & A variation of the Sobel test based on coefficient estimation via OLS regressions. & \checkmark & & & & \checkmark \\ \noalign{\medskip} % \code{"sobel"} & \code{"regression"} & \code{FALSE} & \code{"select"} & A variation of the Sobel test in which regression models with normal, skew-normal, $t$, or skew-$t$ error distributions are estimated, with the best fitting distribution being selected via BIC. & \checkmark & & & & \checkmark \\ \noalign{\medskip} % \code{"sobel"} & \code{"covariance"} & \code{TRUE} & & Following multivariate winsorization, a variation of the Sobel test is applied based on coefficient \mbox{estimation} via the maximum likelihood estimator of the covariance matrix. & \checkmark & & & & \\ % \code{"sobel"} & \code{"covariance"} & \code{FALSE} & & A variation of the Sobel test based on coefficient \mbox{estimation} via the maximum likelihood estimator of the covariance matrix. & \checkmark & & & & \\ \noalign{\medskip} % \noalign{\medskip}\hline\noalign{\bigskip} \end{tabular} \caption{Overview of variations of the Sobel test \citep{sobel82} for mediation analysis implemented in the \proglang{R} package \pkg{robmed}, and corresponding argument values to use in function \code{test\_mediation()}. Those tests are included for benchmarking purposes and are not recommended for empirical analyses.} \label{tab:methods-sobel} \end{sidewaystable} While package \pkg{robmed} is focused on the fast-and-robust bootstrap procedure for mediation analysis introduced by \citet{alfons22a}, various other methods are available as well. Tables~\ref{tab:methods-boot} and~\ref{tab:methods-sobel} provide an overview of the available methods together with the corresponding argument values to use in function \code{test_mediation()}, which implements mediation analysis in \pkg{robmed}. A bootstrap test is considered state-of-the art for mediation analysis, with many authors advocating to use the bootstrap with ordinary least-squares (OLS) estimation of the coefficients in the mediation model \citep{bollen90, shrout02, mackinnon04, preacher04, preacher08}. However, a bootstrap test can easily be applied to other methods of estimation. For instance, the mediation model can also be estimated via regressions with more flexible error distributions such as the skew-normal, $t$, or skew-$t$ distributions \citep[see][for maximum likelihood estimation of such regression models]{azzalini13}. Note that a similar procedure for mediation analysis, but using structural equation modeling, has been suggested in \citet{asparouhov16}. Package \pkg{robmed} goes a step further in that it allows to select the best fitting error distribution via the Bayesian information criterion (BIC) \citep{schwarz78}. In addition, other robust methods for mediation analysis are implemented in \pkg{robmed}. \citet{yuan14} proposed a bootstrap test that replaces OLS estimation with median regression. \citet{zu10} proposed to first winsorize the data via a Huber M-estimator of the covariance matrix, and then to perform a bootstrap test on the cleaned data with coefficient estimation based on the maximum likelihood covariance matrix. A discussion of advantages and disadvantages of those approaches, as well as a comparison in extensive simulation studies, can be found in \citet{alfons22a}. Besides bootstrap tests, variations of the Sobel test \citep{sobel82} are implemented in \pkg{robmed}. The Sobel test was originally proposed for maximum likelihood estimation of structural equation models, of which mediation models are a special case. It assumes a normal distribution of the indirect effect estimator and simplifies the calculation of the standard error by taking a first- or second-order approximation. This test has been criticized in the literature for its incorrect assumptions \citep[e.g.,][]{mackinnon02}, and a bootstrap test is generally preferred. Nevertheless, the Sobel test can easily be generalized to other \mbox{estimation} methods, and it is implemented in \pkg{robmed} for all estimation procedures of the (simple) mediation model. We emphasize that the Sobel tests are implemented for comparisons in benchmarking experiments and that they are not recommended for empirical analyses. \subsection{Fast-and-robust bootstrap test for mediation analysis} \label{sec:ROBMED} The robust procedure of \citet{alfons22a} follows the state-of-the-art bootstrap approach for testing mediation \citep{bollen90, shrout02, mackinnon04, preacher04, preacher08}, but it replaces OLS regressions with the robust MM-estimator of regression \citep{yohai87} and the standard bootstrap with the fast-and-robust bootstrap \citep{salibian02, salibian08}. \subsubsection{Robust regression} For a response variable $Y$, a $(p+1)$-dimensional random vector $\vect{X}$ in which the first component is fixed at~1, and a random error term $\varepsilon \sim N(0, \sigma^{2})$, the linear regression model is given by \begin{equation*} Y = \vect{X}^\top \vect{\beta} + \varepsilon. \end{equation*} Denoting the corresponding observations by $(y_{i}, \obs{x}_{i}^\top)^\top$, $i = 1, \dots, n$, the MM-estimate of regression with loss function $\rho$ \citep{yohai87} is defined as \begin{equation} \label{eq:MM} \vect{\hat{\beta}}_{n} = \argmin_{\beta} \sum_{i=1}^{n} \rho \left( \frac{r_{i}(\vect{\beta})}{\hat{\sigma}_{n}} \right), \end{equation} where $r_{i}(\vect{\beta}) = y_{i} - \obs{x}_{i}^\top \vect{\beta}$, $i = 1, \dots, n$, are the residuals, and $\hat{\sigma}_{n}$ is an initial estimate of the residual scale. We take $\hat{\sigma}_{n}$ from a highly robust but inefficient S-estimator of regression \citep{rousseeuw84, salibian06}, i.e., \begin{equation*} \hat{\sigma}_{n} = \min_{\vect{\beta}} \hat{\sigma}_{n}(\vect{\beta}), \end{equation*} where $\hat{\sigma}_{n}(\vect{\beta})$ is defined implicitly as the solution of \begin{equation*} \frac{1}{n} \sum_{i=1}^{n} \rho_{S} \left( \frac{r_{i}(\vect{\beta})}{\hat{\sigma}_{n}(\vect{\beta})} \right) = \delta \end{equation*} with loss function $\rho_{S}$ and $\delta = E \left[ \rho_{S} \left( \frac{X}{\sigma} \right) \right]$ for %a normal distribution a random variable $X \sim N(0, \sigma^{2})$. For both $\rho$ and $\rho_{S}$, we use Tukey's bisquare loss function defined as \begin{equation} \label{eq:loss} \rho(x) = \left\{ \begin{array}{ll} \displaystyle \frac{x^{6}}{6c^{4}} - \frac{x^4}{2c^{2}} + \frac{x^{2}}{2}, & \text{if } |x| \leq c \\ \noalign{\smallskip} \displaystyle \frac{c^{2}}{6}, & \text{if } |x| > c. \end{array} \right. \end{equation} The value of the tuning constant $c$ in $\rho_{S}$ determines the robustness of the MM-estimator, and the value of $c$ in $\rho$ determines the efficiency \citep[cf.][]{yohai87}. By default, we set $c = 1.54764$ in $\rho_{S}$ for maximal robustness and $c = 3.443689$ in $\rho$ for 85\% efficiency at the model with normally distributed errors. Taking the derivative of the objective function in~\eqref{eq:MM} and equating the derivative to~$\vect{0}$ yields the system of equations \begin{equation} \label{eq:EstEq} \sum_{i=1}^{n} \rho' \left( \frac{r_{i}(\vect{\beta})}{\hat{\sigma}_{n}} \right) \obs{x}_{i} = \vect{0}. \end{equation} With weights \begin{equation*} w_{i} = \frac{\rho'(r_{i}(\vect{\beta})/\hat{\sigma}_{n})}% {r_{i}(\vect{\beta})/\hat{\sigma}_{n}}, \qquad i = 1, \dots, n, \end{equation*} the system of equations in \eqref{eq:EstEq} can be rewritten as a weighted version of the normal equations: \begin{equation} \label{eq:wEstEq} \sum_{i=1}^{n} w_{i} r_{i}(\vect{\beta}) \obs{x}_{i} = \vect{0}. \end{equation} Therefore, the MM-estimator can be seen as a weighted least-squares estimator with data-dependent weights. Those weights indicate how much each observation deviates, as an observation with a large residual (large deviation) receives a weight of 0 or close to 0, while an observation with a small residual (small deviation) receives a weight close to 1. The loss function from \eqref{eq:loss} and the resulting weight function are displayed in Figure~\ref{fig:loss}, which also includes the loss function and weight function from OLS regression for comparison. \begin{figure}[t!] \begin{center} \includegraphics[width=0.975\textwidth]{figures/loss} \caption{Loss function (left) and corresponding weight function (right) for OLS regression and the robust MM-estimator of regression.} \label{fig:loss} \end{center} \end{figure} \subsubsection{Fast-and-robust bootstrap} Here we briefly present the main idea of the fast-and-robust bootstrap. For a detailed discussion and complete derivations, the reader is referred to \citet{salibian02} and \citet{salibian08}. There are two concerns with bootstrapping robust estimators: \begin{enumerate} \item Outliers may be oversampled in some bootstrap samples to the extent that those samples contain more outliers than the robust estimator can handle, in which case bootstrap confidence intervals become unreliable. \item Robust estimators typically have higher computational complexity than their nonrobust counterparts, which is amplified when computing many bootstrap replicates. \end{enumerate} In many empirical applications of mediation analysis, the first concern is unlikely to be an issue when using the MM-estimator of regression due to its high robustness. We therefore use the fast-and-robust bootstrap mainly for its computational efficiency, although the extra robustness does provide additional peace of mind. To derive the fast-and-robust bootstrap for the MM-estimator, note that the solution of \eqref{eq:wEstEq} can be written as \begin{equation*} \vect{\hat{\beta}}_{n} = \left( \sum_{i=1}^{n} w_{i} \obs{x}_{i} \obs{x}_{i}^\top \right)^{-1} \sum_{i=1}^{n} w_{i} \obs{x}_{i} y_{i}. \end{equation*} For a bootstrap sample $(y_{i}^{*}, {\obs{x}_{i}^{*}}^\top)^\top$, $i = 1, \dots, n$, one can compute $r_{i}^{*} = y_{i}^{*} - {\obs{x}_{i}^{*}}^\top \vect{\hat{\beta}}_{n}$ and $w_{i}^{*} = \rho'(r_{i}^{*}/\hat{\sigma}_{n}) / (r_{i}^{*}/\hat{\sigma}_{n})$ for $i = 1, \dots, n$. It is important to note that $\vect{\hat{\beta}}_{n}$ and $\hat{\sigma}_{n}$ are computed in advance on the original sample such that the robustness weights $w_{i}^{*}$ are inherited from the respective observations in the original sample. Then only a weighted least-squares fit is computed on the bootstrap sample to obtain \begin{equation} \label{eq:WLS} \vect{\hat{\beta}}_{n}^{\text{WLS}} = \left( \sum_{i=1}^{n} w_{i}^{*} \obs{x}_{i}^{*} {\obs{x}_{i}^{*}}^\top \right)^{-1} \sum_{i=1}^{n} w_{i}^{*} \obs{x}_{i}^{*} y_{i}^{*}. \end{equation} However, using \eqref{eq:WLS} for the bootstrap distribution would not capture all the variability in the MM-estimator, as the robustness weights are not recomputed on the bootstrap samples. Nevertheless, a linear correction of the coefficients can be applied to overcome this loss of variability. The correction matrix only needs to be computed once based on the original sample and is given by \begin{equation*} \mat{K}_{n} = \left( \sum_{i=1}^{n} \rho''(r_{i}/\hat{\sigma}_{n}) \obs{x}_{i} \obs{x}_{i}^\top \right)^{-1} \sum_{i=1}^{n} w_{i} \obs{x}_{i} \obs{x}_{i}^\top. \end{equation*} Then the fast-and-robust bootstrap replicate on the given bootstrap sample is computed as \begin{equation} \label{eq:FRB} \vect{\hat{\beta}}_{n}^{*} = \vect{\hat{\beta}}_{n} + \mat{K}_{n} \left( \vect{\hat{\beta}}_{n}^{\text{WLS}} - \vect{\hat{\beta}}_{n} \right). \end{equation} Since the MM-estimator $\vect{\hat{\beta}}_{n}$ is consistent for $\vect{\beta}$ \citep{yohai87}, $\sqrt{n} (\vect{\hat{\beta}}_{n}^{*} - \vect{\hat{\beta}}_{n})$ has the same asymptotic distribution as $\sqrt{n} (\vect{\hat{\beta}}_{n} - \vect{\beta})$ \citep{salibian08, salibian02}. \subsubsection{Bootstrapping the indirect effects in mediation analysis} For simplicity, we focus on the indirect effect in the simple mediation model from~\eqref{eq:simpleM}--\eqref{eq:simpleY'}. Similar calculations apply to the indirect effects in the mediation models described in Section~\ref{sec:models}. On each bootstrap sample, \eqref{eq:FRB} is used to obtain estimates $\hat{a}$, $\hat{b}$, and $\hat{c}$ of the coefficients in~\eqref{eq:simpleM} and~\eqref{eq:simpleY}, and therefore estimates $\hat{a}\hat{b}$ of the indirect effect. Note that we do not perform the regression corresponding to~\eqref{eq:simpleY'} and instead estimate the total effect by $\hat{c}' = \hat{a}\hat{b} + \hat{c}$. With the empirical distribution of the indirect effect over the bootstrap samples, we construct a percentile-based confidence interval. By default, we report a simple percentile confidence interval \citep{davison97}. Furthermore, we advocate to report the mean over the bootstrap distribution as the final point estimates of the indirect effect. % -------------- % implementation % -------------- \section{Package contents and implementation} \label{sec:implementation} We describe the included data set in Section~\ref{sec:data}, introduce the formula interface for specifying mediation models in Section~\ref{sec:formula}, and briefly discuss the main functions as well as the class structure of package \pkg{robmed} in Section~\ref{sec:classes}. Moreover, we load the package and the data in order to use them in code examples. % <>= library("robmed") data("BSG2014") @ \subsection{Example data} \label{sec:data} The \code{BSG2014} data included in package \pkg{robmed} come from a business simulation game that was played by senior business administration students as part of a course at a Western European university. The simulation game was played twice, and a survey was conducted in three waves (before the first game, in between the two games, and after the second game). A total of 354 students formed 92 randomly assigned teams, and the responses of the individual students were aggregated to the team level. Leaving out teams with less than 50 percent response rate yields a sample size of $n = 89$ teams. Below, we provide an overview of the variables that are used later on in the case studies in Section~\ref{sec:illustrations}. For a complete description of the data, we refer to its \proglang{R} help file, which can be accessed from the console with \code{?BSG2014}. \begin{description} % \item{\code{ValueDiversity}:} Using the short Schwartz’s value survey \citep{lindeman05}, the team members rated ten items on the importance of certain values (1~= not important, 10 = highly important). For each value item, the coefficient of variation of the individual responses across team members was computed, and the resulting coefficients of variation were averaged across the value items. % \item{\code{TaskConflict}:} Using the intra-group conflict scale of \citet{jehn95}, the team members rated four items on the presence of conflict regarding the work on a 5-point scale (1~= none, 5~= a lot). The individual responses were aggregated by taking the average across items and team members. % \item{\code{TeamCommitment}:} The team members indicated the extent to which they agree or disagree with four items on commitment to the team, which are based on \citet{mowday79}, using a 5-point scale (1~= strongly disagree, 5~= strongly agree). The individual responses were aggregated by taking the average across items and team members. % \item{\code{TeamScore}:} The team performance scores on the second game were computed at the end of the simulation through a mix of five objective performance measures: return on equity, earnings-per-share, stock price, credit rating, and image rating. The computation of the scores is handled by the simulation game software, and details can be found in \citet{mathieu09}. % \item{\code{SharedLeadership}:} Following \citet{carson07}, every team member assessed each of their peers on the question of ``To what degree does your team rely on this individual for leadership?'' using a 5-point scale (1~= not at all, 5~= to a very large extent). The leadership ratings were aggregated by taking the sum and dividing it by the number of pairwise relationships among team members. % \item{\code{AgeDiversity}:} Following \citet{harrison07}, age diversity was operationalized by the coefficient of variation of the team members' ages. % \item{\code{GenderDiversity}:} Gender diversity was measured with Blau's index, $1 - \sum_{j} p_{j}^{2}$, where $p_{j}$ is the proportion of team members in the $j$-th category \citep{blau77}. % \item{\code{ProceduralJustice}:} Based on the intra-unit procedural justice climate scale of \citet{li09}, the team members indicated the extent to which they agree or disagree with four items on a 5-point scale (1~= strongly disagree, 5~= strongly agree). The individual responses were aggregated by taking the average across items and team members. % \item{\code{InteractionalJustice}:} Using the intra-unit interactional justice climate scale of \citet{li09}, the team members indicated the extent to which they agree or disagree with four items on a 5-point scale (1~= strongly disagree, 5~= strongly agree). The individual responses were aggregated by taking the average across items and team members. % \item{\code{TeamPerformance}:} Following \citet{hackman86}, the team members indicated the extent to which they agree or disagree with four items on the team's functioning, using a 5-point scale (1~= strongly disagree, 5~= strongly agree). The individual responses were aggregated by taking the average across items and team members. % \end{description} To gain some insight into the distribution of those variables (including their ranges), we extract them from the data set and produce a summary: % <<>>= keep <- c("ValueDiversity", "TaskConflict", "TeamCommitment", "TeamScore", "SharedLeadership", "AgeDiversity", "GenderDiversity", "ProceduralJustice", "InteractionalJustice", "TeamPerformance") summary(BSG2014[, keep]) @ % For instance, the objective team performance scores in variable \code{TeamScore} range from~$\Sexpr{min(BSG2014$TeamScore)}$ to~$\Sexpr{max(BSG2014$TeamScore)}$. \subsection{Formula interface} \label{sec:formula} The equations in the mediation model follow a specific structure regarding which variable is used as the response variable and which variables are the explanatory variables. Some \proglang{R} packages for mediation analysis, e.g., \pkg{mediation} \citep{tingley14}, require the user to specify one formula for each equation, which can be tedious and prone to mistakes, in particular for models with multiple mediators and multiple independent variables or control variables. Other packages, e.g., \pkg{psych} \citep{psych} or \pkg{MBESS} \citep{MBESS}, do not provide a formula interface at all, despite formulas being the standard way of describing models in \proglang{R}. For package \pkg{robmed}, we designed a formula interface that builds upon the standard formula interface in \proglang{R}, but allows to specify the mediation model with a single formula. As usual, the dependent variable is defined on the left hand side of the formula, and the independent variable is given on the right hand side. In addition, the functions \code{m()} and \code{covariates()} can be used on the right hand side to define the hypothesized mediators and any control variables, respectively. If multiple mediators are supplied, function \code{m()} provides the argument \code{.model}, which accepts the values \code{"parallel"} for parallel mediators (the default) and \code{"serial"} for serial mediators. The corresponding wrapper functions \code{parallel_m()} and \code{serial_m()} are available for convenience. For example, a simple mediation model can be defined as follows (see also the case study in Section~\ref{sec:simple}): % <>= TeamCommitment ~ m(TaskConflict) + ValueDiversity @ % An example for a serial multiple mediator model is specified with the following formula (see also the case study in Section~\ref{sec:serial}), where the serial mediators are listed in consecutive order from left to right: % <>= TeamScore ~ serial_m(TaskConflict, TeamCommitment) + ValueDiversity @ % The formula specification for an example of a parallel multiple mediator model with control variables is given by (see also the case study in Section~\ref{sec:parallel}): % <>= TeamPerformance ~ parallel_m(ProceduralJustice, InteractionalJustice) + SharedLeadership + covariates(AgeDiversity, GenderDiversity) @ % Note that different variables within \code{m()}, \code{parallel_m()}, \code{serial_m()}, and \code{covariates()} are separated by commas. \subsection{Main functions and class structure} \label{sec:classes} The two main functions of package \pkg{robmed} are \code{fit_mediation()}, which implements various methods for the estimation of a mediation model, and \code{test_mediation()}, which performs statistical tests on the indirect effects in the mediation model. Furthermore, \pkg{robmed} follows a clear object-oriented design using \code{S3} classes \citep{chambers92}. Function \code{fit_mediation()} is mainly intended to be used internally by \code{test_mediation()}, but it is also useful for a user who wants to compare different tests on the indirect effects for the same method of estimation, such that the estimation on the given sample only has to be performed once. It returns an object inheriting from class \code{"fit_mediation"}. The currently available subclasses are \code{"reg_fit_mediation"} if the mediation model was estimated via a series of regressions, and \code{"cov_fit_mediation"} if the model was estimated based on the covariance matrix of the involved variables. We expect most users to find it more convenient to use \code{test_mediation()} directly in order to perform model estimation and testing the indirect effects with one function call. See Tables~\ref{tab:methods-boot} and~\ref{tab:methods-sobel} for an overview of which argument values to use in \code{test_mediation()} for the various available methods. Furthermore, function \code{robmed()} is a wrapper function for the fast-and-robust bootstrap test of \citet{alfons22a}. The results are returned as an object inheriting from class \code{"test_mediation"}. The currently available subclasses are \code{"boot_test_mediation"} for bootstrap tests, and \code{"sobel_test_mediation"} for tests based on the normal approximation of \citet{sobel82}. Among other information, the component \code{fit} stores the estimation results as an object inheriting from class \code{"fit_mediation"}. Objects of class \code{"boot_test_mediation"} also contain a component \code{reps}, which stores the bootstrap replicates as an object of class \code{"boot"}, as returned by function \code{boot()} from package \pkg{boot} \citep{boot}. It should be noted that the internal use of function \code{boot()} implies that the user can easily take advantage of parallel computing to reduce computation time. Functions \code{fit_mediation()} and \code{test_mediation()} are implemented as generic functions. Two methods are available for both functions: one method uses the formula interface described in Section~\ref{sec:formula}, while the default method provides an alternative way of specifying mediation models. Additionally, \code{test_mediation()} has a method for objects inheriting from class \code{"fit_mediation"}, as returned by \code{fit_mediation()}. The default methods take the data set as their first argument in order to work nicely with the pipe operator, i.e., \code{|>} introduced in \proglang{R} version~4.1.0 or \code{\%>\%} from package \pkg{magrittr} \citep{magrittr}. Arguments \code{x}, \code{y}, \code{m}, and \code{covariates} take character, integer, or logical vectors to select the independent variables, the dependent variable, the hypothesized mediators, and any additional control variables, respectively, from the data set. Note that this interface offers various ways to select the variables in a programmable manner. In case of multiple mediators, argument \code{model} allows to specify whether multiple mediators are treated as parallel or serial mediators. Package \pkg{robmed} provides various accessor functions to extract relevant information from the returned objects, such as \code{coef()} and \code{confint()} methods. In addition, it contains the plot functions \code{weight_plot()} and \code{ellipse_plot()} for diagnostics, as well as \code{ci_plot()} to visualize confidence intervals and \code{density_plot()} to plot density estimates of the indirect effect estimators. %% -- Illustrations ------------------------------------------------------------ %% - Virtually all JSS manuscripts list source code along with the generated %% output. The style files provide dedicated environments for this. %% - In R, the environments {Sinput} and {Soutput} - as produced by Sweave() or %% or knitr using the render_sweave() hook - are used (without the need to %% load Sweave.sty). %% - Equivalently, {CodeInput} and {CodeOutput} can be used. %% - The code input should use "the usual" command prompt in the respective %% software system. %% - For R code, the prompt "R> " should be used with "+ " as the %% continuation prompt. %% - Comments within the code chunks should be avoided - these should be made %% within the regular LaTeX text. % ------------- % illustrations % ------------- \section[Illustrations: Using package robmed]% {Illustrations: Using package \pkg{robmed}} \label{sec:illustrations} We demonstrate the use of package \pkg{robmed} in three illustrative mediation analyses using the included data set \code{BSG2014} (see Section~\ref{sec:data}). While the package and the data have already been loaded in Section~\ref{sec:implementation}, we store the seed to be used for the random number generator in an object, as it will be needed in all examples for the purpose of replicating the results. % <<>>= seed <- 20150601 @ The following subsections provide examples for a simple mediation model (Section~\ref{sec:simple}), a serial multiple mediator model (Section~\ref{sec:serial}), as well as a parallel multiple mediator model with additional control variables (Section~\ref{sec:parallel}). \vspace{5ex} % avoid that the header is left dangling at the bottom of the page \subsection{Simple mediation} \label{sec:simple} In the first code example, we replicate parts of the empirical example of \citet{alfons22a}. The illustrative hypothesis to be investigated is that task conflict mediates the relationship between team value diversity and team commitment. Using \pkg{robmed}'s formula interface (see Section~\ref{sec:formula}), we specify a simple mediation model with the dependent variable \code{TeamCommitment} on the left hand side. On the right hand side, we have the hypothesized mediator \code{TaskConflict}, which is wrapped in a call to function \code{m()}, as well as the independent variable \code{ValueDiversity}. As we will compare the robust bootstrap test of \citet{alfons22a} with the OLS bootstrap test \citep[e.g.,][]{preacher04, preacher08, hayes18}, we store the formula object for later use. % <<>>= f_simple <- TeamCommitment ~ m(TaskConflict) + ValueDiversity @ Next, we perform the two bootstrap tests using function \code{test_mediation()}. As usual for functions that fit models, we supply the model specification and the data via the \code{formula} and \code{data} arguments. By default, \code{test_mediation()} fits the mediation model via regressions (argument \code{method = "regression"}) and performs a bootstrap test for the indirect effect (argument \code{test = "boot"}) with 5000 bootstrap replications (argument \code{R = 5000}). In that case, setting \code{robust = TRUE} (the default) or \code{robust = "MM"} specifies the robust bootstrap procedure of \citet{alfons22a}, while \code{robust = FALSE} yields the nonrobust OLS bootstrap test. Before each call to \code{test_mediation()}, we set the seed of the random number generator. % <>= set.seed(seed) robust_boot_simple <- test_mediation(f_simple, data = BSG2014, robust = TRUE) set.seed(seed) ols_boot_simple <- test_mediation(f_simple, data = BSG2014, robust = FALSE) @ % Other estimation methods for a bootstrap test can be specified via a combination of arguments, as outlined in Table~\ref{tab:methods-boot}. Function \code{test_mediation()} returns an object inheriting from class \code{"test_mediation"}. The corresponding \code{summary()} method shows the relevant information on the fitted models and emphasizes the significance tests of the total, direct, and indirect effects. For bootstrap tests, the information displayed by \code{summary()} by default stays within the bootstrap framework. For effects other than the indirect effect, asymptotic tests are performed using the normal approximation of the bootstrap distribution. That is, the sample mean and the sample standard deviation of the bootstrap replicates are used for asymptotic $z$~tests. Furthermore, bootstrap estimates of all effects are shown in addition to the estimates on the original data. At the bottom of the output, the indirect effect is summarized by the estimate on the original data (column \code{Data}), the bootstrap estimate (i.e., the sample mean of the bootstrap replicates; column \code{Boot}), and the lower and upper limits of the confidence interval (columns \code{Lower} and \code{Upper}, respectively). % <>= summary(robust_boot_simple) @ % The above results are similar to those reported in \citet{alfons22a}, but they are not identical due to a change in the random number generator in \proglang{R} version~3.6.0 and a change in the default type of bootstrap confidence intervals in \pkg{robmed} version~1.1.0. Specifically, the robust bootstrap test detects a significant indirect effect, as the confidence interval is strictly negative (albeit barely so at the 95\% confidence level). This negative indirect effect is composed of a positive effect of value diversity on task conflict (see the output of the first regression model), and a negative effect of task conflict on team commitment (see the output of the second regression model). For further interpretation, recall that value diversity is measured as a coefficient of variation averaged over various value dimensions, and that task conflict and team commitment are measured as averages of items on a 5-point rating scale. On average, an increase in value diversity by one relative standard deviation explains an increase in task conflict by about 0.32 points, which in turn explains a decrease in team commitment by about 0.11 points. Furthermore, we observe that the direct effect of value diversity on team commitment is not significantly different from 0, meaning that value diversity affects team commitment only via the indirect path through task conflict. In the typology of mediations of \citet{zhao10}, we find indirect-only mediation. Note that the output corresponding to the regression models is similar to that of the \code{summary()} method for \code{"lmrob"} objects from package \pkg{robustbase} \citep{robustbase}, but it is shortened as the output is already rather long. In particular, we emphasize that the indices of potential outliers are displayed for each regression model. Those potential outliers should always be investigated further, as they may be interesting observations that could lead to new insights when studied separately \citep[see][for a detailed discussion on outliers in the mediation model]{alfons22a}. Moreover, when the summary output for the robust bootstrap procedure of \citet{alfons22a} is printed, by default also a diagnostic plot is shown that allows to detect deviations from normality assumptions. Keep in mind that this procedure uses the robust MM-estimator of regression \citep{yohai87, salibian02}, which assigns robustness weights to all observations. Those weights can take any value in the interval $[0, 1]$, with lower values indicating a higher degree of deviation. For a varying threshold on the horizontal axis, the diagnostic plot displays how many observations have a weight below this threshold. The plot is thereby split into separate panels for negative and positive residuals. For comparison, a reference line is drawn for the expected percentages under normally distributed errors. Figure~\ref{fig:summary} shows the plot for this example. For the regression of the hypothesized mediator (\code{TaskConflict}) on the independent variable in the top row of the plot, it reveals much more downweighted observations with positive residuals than expected and fewer with negative residuals. This indicates right skewness with a heavy upper tail. It is possible to suppress the plot by setting \code{plot = FALSE} in \code{summary()}. Then function \code{weight_plot()} can be used to create the diagnostic plot. In this example, Figure~\ref{fig:summary} can also be produced with the commands below. % <>= weight_plot(robust_boot_simple) + scale_color_manual("", values = c("black", "#00BFC4")) + theme(legend.position = "top") @ It should also be noted that the output from \code{summary()} is structured in a similar way as the output of the widely-used macro \code{PROCESS} \citep{hayes18}, which implements the OLS bootstrap test for conditional process models such as the mediation model. The intention is to facilitate the use of package \pkg{robmed} for users of the \code{PROCESS} macro. While \code{PROCESS} constructs a bootstrap confidence interval for the indirect effect, it reports estimates on the original data and the usual normal-theory $t$~tests for all other effects. In \pkg{robmed}, the same can be achieved by setting the argument \code{type = "data"} in the \code{summary()} method. The results from the regressions are then summarized in the usual way, as shown below for the OLS bootstrap. % <<>>= summary(ols_boot_simple, type = "data") @ % Unlike the robust bootstrap test above, the OLS bootstrap does not detect a significant indirect effect, since the confidence interval covers 0. Due to the influential heavy tail indicated by the diagnostic plot in Figure~\ref{fig:summary}, the results of the robust bootstrap test can be considered more reliable. Methods for common generic functions to extract information from objects are implemented in \pkg{robmed}, such as a \code{coef()} method to extract the relevant effects of the mediation model, and \code{confint()} to extract confidence intervals of those effects. % <<>>= coef(robust_boot_simple) confint(robust_boot_simple) @ % While the confidence intervals in this example do not add much in terms of interpretation over the output of \code{summary()}, the latter reports significance tests instead of confidence intervals for the effects other than the indirect effect. For researchers who prefer to report confidence intervals, the \code{confint()} method allows to easily extract this information. For objects corresponding to bootstrap tests (class \code{"boot_test_mediation"}), argument \code{type} of the \code{coef()} method allows to specify whether to extract the bootstrap estimates (\code{"boot"}, the default) or the estimates on the original data (\code{"data"}). Similarly, argument \code{type} of the \code{confint()} method allows to specify whether the confidence intervals for the effects other than the indirect effect should be bootstrap confidence intervals (\code{"boot"}, the default) or normal theory intervals based on the original data (\code{"data"}). % <<>>= coef(ols_boot_simple, type = "data") confint(ols_boot_simple, type = "data") @ % In addition, argument \code{parm} allows to specify which coefficients or confidence intervals to extract. % <<>>= coef(robust_boot_simple, parm = "Indirect") confint(robust_boot_simple, parm = "Indirect") @ % While the bootstrap tests implemented in \code{test_mediation()} construct a confidence interval for the indirect effect based on a pre-specified confidence level $1-\alpha$, function \code{p_value()} allows to analyze the bootstrap distribution and extract the smallest significance level $\alpha$ for which the $(1-\alpha)\cdot 100\%$ confidence interval of the indirect effect does not contain 0. % <>= p_value(robust_boot_simple, parm = "Indirect") p_value(ols_boot_simple, parm = "Indirect") @ % It should be noted that depending on the seed of the random number generator, the $p$~value for the robust bootstrap test may fall below or above the common 5\% significance level. But given the arbitrariness of this threshold, we find that robust bootstrap test shows at least weak evidence against the null hypothesis of no mediation, whereas the OLS bootstrap fails to do so. Several plots are implemented to visualize the results of mediation analysis. They can be applied to each object individually, but it is also possible to combine multiple objects from mediation analysis into a list in order to compare different methods graphically. If names are given to the list elements, those names will be used by the plots to identify the different methods. Note that we use the name \code{"ROBMED"} here to refer to the robust bootstrap test. % <<>>= boot_list <- list("OLS bootstrap" = ols_boot_simple, "ROBMED" = robust_boot_simple) @ Function \code{density_plot()} plots the density estimates of the indirect effect. It also adds vertical lines for the point estimates and illustrates the confidence intervals by shaded areas. The plot resulting from the following command is displayed in Figure~\ref{fig:density}. % <>= density_plot(boot_list) @ In order to aid with interpretation of the results from mediation analysis, function \code{ci_plot()} allows to visualize the point estimates and confidence intervals of selected effects. The direct effect and the indirect effect are plotted by default, as the typology of mediations of \citet{zhao10} is based on those two effects. Nevertheless, argument \code{parm} can be used to specify which effects to plot. Figure~\ref{fig:ci} contains the plot created by the command below. % <>= ci_plot(boot_list, parm = c("a", "b", "Direct", "Indirect")) @ Finally, function \code{ellipse_plot()} produces a bivariate scatterplot together with a tolerance ellipse that illustrates how well the regression results represent the data, exploiting the relationship between regression coefficients and the covariance matrix. For the robust bootstrap procedure of \citet{alfons22a}, the robustness weights from the robust regression estimator can be used to compute a weighted sample covariance matrix, from which the tolerance ellipse is computed. It is important to note that such a weighted covariance matrix is not Fisher consistent (that is, the functional form of the estimator applied to the model distribution does not equal the true covariance matrix), as observations are also expected to be downweighted when all observations follow the model. However, it is straightforward to obtain a correction for a Fisher consistent covariance matrix (see Appendix~\ref{app:ellipse}). For instance, we can produce such a plot with the independent variable on the horizontal axis and the hypothesized mediator on the vertical axis. In that case the plot represents the results from the regression of the hypothesized mediator on the independent variable. As the independent variable is the only explanatory variable in this regression model, the plot also adds lines representing the respective regression coefficients. % <>= ellipse_plot(boot_list, horizontal = "ValueDiversity", vertical = "TaskConflict") @ % The resulting plot is shown in Figure~\ref{fig:ellipse}. Since we are comparing a nonrobust method with a robust method that assigns a robustness weight to each observation, by default those robustness weights are visualized by plotting the points on a grayscale. Clearly, the tolerance ellipse corresponding to the robust method fits the main bulk of the data better, as the tolerance ellipse corresponding to the nonrobust method contains more empty space. This plot further suggests that the detected potential outliers (see also the printed output of \code{summary()} above) are a result of the heavy upper tail in the hypothesized mediator (\code{TaskConflict}). All plot functions in \pkg{robmed} allow customization via the underlying package \pkg{ggplot2} \citep{wickham16}. Arguments can be passed down to \code{geom_xxx()} functions, and additional elements can be added to the plot as usual with the \code{+} operator. We refer to the \proglang{R} help files of the plots for some examples. Nevertheless, there are limits to the customization, in particular for the plots that contain various elements such as the diagnostic plot with the tolerance ellipse. For further customization, \pkg{robmed} provides the workhorse functions \code{setup_weight_plot()}, \code{setup_ci_plot()}, \code{setup_density_plot()}, and \code{setup_ellipse_plot()}. They do not produce the plot, but they extract the relevant information to be displayed. They are useful for users who want to create similar plots, but who want more control over what information to display or how to display that information. With the commands below, we manually produce the same plot as before, but only plot the tolerance ellipses and the data without the regression lines. Figure~\ref{fig:ellipse-custom} displays the resulting plot. % <>= setup <- setup_ellipse_plot(boot_list, horizontal = "ValueDiversity", vertical = "TaskConflict") ggplot() + geom_path(aes(x = x, y = y, color = Method), data = setup$ellipse) + geom_point(aes(x = x, y = y, fill = Weight), data = setup$data, shape = 21) + scale_fill_gradient(limits = 0:1, low = "white", high = "black") + labs(x = setup$horizontal, y = setup$vertical) @ \subsection{Serial multiple mediators} \label{sec:serial} The second example extends the simple mediation model from the previous section to a serial multiple mediator model. We investigate the following illustrative hypothesis: value diversity negatively impacts team commitment through increased task conflict, and in turn task conflict negatively affects team performance through decreased team commitment. Here the dependent variable is an objective assessment of team performance, measured by the team's score on the simulation game. The corresponding formula is stored in an object for later use and contains the dependent variable \code{TeamScore} on the left hand side. On the right hand side, the hypothesized serial mediators \code{TaskConflict} and \code{TeamCommitment} are wrapped in a call to \code{serial_m()}, separated by the usual \code{+} operator from the independent variable \code{ValueDiversity}. % <<>>= f_serial <- TeamScore ~ serial_m(TaskConflict, TeamCommitment) + ValueDiversity @ % Function \code{test_mediation()} is then called in the same way as in the previous section to compare the robust bootstrap procedure of \citet{alfons22a} with the nonrobust OLS bootstrap, but here we use confidence level 90\% for the confidence intervals of indirect effects. % <>= set.seed(seed) robust_boot_serial <- test_mediation(f_serial, data = BSG2014, level = 0.9, robust = TRUE) set.seed(seed) ols_boot_serial <- test_mediation(f_serial, data = BSG2014, level = 0.9, robust = FALSE) @ The output of \code{summary()} looks very similar to that of the previous example, except that the last part shows results for multiple indirect effects. In order to save space, we only print the objects returned by \code{test_mediation()}, which shows the results for the bootstrap confidence intervals of the indirect effects. For completeness, the full \code{summary()} output of the robust bootstrap test can be found in Appendix~\ref{app:output}. % <<>>= robust_boot_serial ols_boot_serial @ % Note that the row labeled \code{Total} in the above output contains the results for the sum of the three individual indirect effects. We observe that the robust bootstrap test detects mediation in the indirect path that goes first through task conflict and subsequently through team commitment (effect \code{Indirect3}), whereas none of the indirect effects are significant in the OLS bootstrap. We can use function \code{weight_plot()} to produce a diagnostic plot to investigate deviations from normality assumptions that could explain the differences in results for the two methods. The plot created with the commands below is shown in Figure~\ref{fig:weight}. % <>= weight_plot(robust_boot_serial) + scale_color_manual("", values = c("black", "#00BFC4")) + theme(legend.position = "top") @ % As in the previous example, we see that there are strong indications of a heavy upper tail in the regression of task conflict on value diversity (top row of Figure~\ref{fig:weight}) and a heavy lower tail in the regression for objective team performance (bottom row of Figure~\ref{fig:weight}), which makes the results of the OLS bootstrap unreliable. \subsection{Parallel multiple mediators and control variables} \label{sec:parallel} In the final example, we investigate the illustrative hypothesis that procedural justice and interactional justice are parallel mediators for the relationship between shared leadership and the team's perception of its performance, controlled for diversity in age and gender within the team. Unlike in the previous section, the dependent variable is a subjective measure of team performance, as evaluated by the team members themselves. In \pkg{robmed}'s formula interface, parallel multiple mediators can be specified by wrapping multiple variables in a call to function \code{parallel_m()}, and control variables can be specified in a similar manner with function \code{covariates()}. % <<>>= f_parallel <- TeamPerformance ~ parallel_m(ProceduralJustice, InteractionalJustice) + SharedLeadership + covariates(AgeDiversity, GenderDiversity) @ Function \code{test_mediation()} is then called in the usual way to compare the robust bootstrap test of \citet{alfons22a} and the nonrobust OLS bootstrap. % <>= set.seed(seed) robust_boot_parallel <- test_mediation(f_parallel, data = BSG2014, robust = TRUE) set.seed(seed) ols_boot_parallel <- test_mediation(f_parallel, data = BSG2014, robust = FALSE) @ % To save space, we again do not show the entire \code{summary()} of the resulting objects, but only the results for the indirect effect by printing the objects. Appendix~\ref{app:output} contains the full \code{summary()} output of the robust bootstrap test, including the diagnostic plot of the regression weights. % <<>>= robust_boot_parallel ols_boot_parallel @ % As for the serial multiple mediator model, the row \code{Total} in the output above contains the results for the sum of the two individual indirect effects through the hypothesized mediators \code{ProceduralJustice} and \code{InteractionalJustice}. We observe that the robust procedure of \citet{alfons22a} finds support for mediation via both procedural justice and interactional justice (since the confidence intervals are strictly positive), whereas only the indirect effect of procedural justice is significant at the 5\% level for the OLS bootstrap. The output of \code{summary()} in Appendix~\ref{app:output}, indicates that the residual distributions are fairly normal (see the diagnostic plot in Figure~\ref{fig:summary-parallel}), but that there is a small number of potential outliers that should be investigated further. As the regression of the dependent variable (\code{TeamPerformance}) on the remaining variables shows more deviations from normality than the other regressions (bottom row of Figure~\ref{fig:summary-parallel}), we use \code{ellipse_plot()} to create the diagnostic plot with a tolerance ellipse that is related to the regression results. If several explanatory variables are included, it is often more insightful to plot the partial residuals on the vertical axis, i.e., to subtract from the response the linear predictor without the variable that is displayed on the horizontal axis. This has the additional advantage that the regression coefficient can be visualized by a line, which is otherwise not possible in the case of multiple explanatory variables. With function \code{ellipse_plot()}, plotting the partial residuals can easily be achieved by setting the argument \code{partial = TRUE}. With the command below, we plot the partial residuals of team performance against the independent variable shared leadership. Figure~\ref{fig:ellipse-partial} contains the resulting plot, which clearly visualizes the noisy data points. % <>= ellipse_plot(robust_boot_parallel, horizontal = "SharedLeadership", vertical = "TeamPerformance", partial = TRUE) @ In multiple mediator models, it can be of interest if the indirect effects are different from one another, or if they differ in magnitude \citep[p.163--166]{hayes18}. Function \code{test_mediation()} allows to make pairwise comparisons of indirect effects via the argument \code{contrast}. By setting this argument to \code{"estimates"}, the pairwise differences of the estimates of the indirect effect are computed, whereas setting this argument to \code{"absolute"} yields the computation of pairwise differences in absolute values. The output of \code{test_mediation()} then includes the estimates and confidence intervals for those contrasts, and also displays information on the definition of the contrasts. % <>= set.seed(seed) test_mediation(f_parallel, data = BSG2014, contrast = "absolute") @ However, it is not actually necessary to run the entire bootstrap procedure again if a bootstrap test had already been performed without computing those contrasts. Function \code{retest()} allows to reanalyze the bootstrap estimates with different parameter settings, which saves computation time in such a case. % <<>>= retest(robust_boot_parallel, contrast = "absolute") @ We emphasize that function \code{retest()} is available for computational convenience in case the analysis was by mistake conducted with the wrong parameter settings. It must not be abused for $p$~hacking. %% -- Summary/conclusions/discussion ------------------------------------------- \section{Summary and discussion} \label{sec:summary} The \proglang{R} package \pkg{robmed} provides easy-to-use functionality for robust mediation analysis. It implements the robust bootstrap procedure of \citet{alfons22a}, which yields reliable results under outliers and other deviations from normality assumptions, as well as diagnostic plots that allow to detect and further investigate such deviations. In addition, package \pkg{robmed} provides functionality for various other procedures for mediation analysis, as well as plots that allow to visualize and compare results from different methods. All implemented methods thereby share the same function interface and a clear class structure of the results. In particular, \pkg{robmed} introduces a new formula interface that allows to specify various types of mediation models with a single formula. At present, there are some limitations of package \pkg{robmed}. First of all, the current version requires a numeric dependent variable and numeric mediators, although the independent variable and additional control variables may be binary or categorical. We aim to extend the fast-and-robust bootstrap methodology for robust estimators of logistic regression in order to add support for mediation analysis with binary dependent variables. Second, adding support for additional mediation models, such as moderated mediation and mediated moderation models \citep[e.g.,][]{muller05} is planned for future versions. Third, the diagnostic plot of the regression weights would be even more useful if it included confidence bands, which could perhaps be constructed from the bootstrap procedure. Finally, a graphical user interface (GUI) could be beneficial for less proficient \proglang{R} users. Developing such a GUI, for instance as a web application based on package \pkg{shiny} \citep{shiny}, is future work. In the meantime, we provide the \proglang{R} extension bundle \pkg{ROBMED} \citep{ROBMED-RSPSS} for \proglang{SPSS} \citep{SPSS}, which links to the \proglang{R} package \pkg{robmed} and allows to use its main functionality through a GUI from within \proglang{SPSS}. Interested readers can obtain the extension bundle from \url{https://github.com/aalfons/ROBMED-RSPSS}. %% -- Optional special unnumbered sections ------------------------------------- \section*{Computational details} % \begin{leftbar} % If necessary or useful, information about certain computational details % such as version numbers, operating systems, or compilers could be included % in an unnumbered section. Also, auxiliary packages (say, for visualizations, % maps, tables, \dots) that are not cited in the main text can be credited here. % \end{leftbar} The results in this paper were obtained using \proglang{R} version~\Sexpr{paste(R.Version()[c("major", "minor")], collapse = ".")} \citep{R} with package \pkg{robmed} version~\Sexpr{packageVersion("robmed")} \citep{robmed} and its dependencies \pkg{boot} version~\Sexpr{packageVersion("boot")} \citep{boot}, \pkg{ggplot2} version~\Sexpr{packageVersion("ggplot2")} \citep{wickham16}, and \pkg{robustbase} version~\Sexpr{packageVersion("robustbase")} \citep{robustbase}. Package \pkg{robmed} also builds upon packages \pkg{quantreg} \citep{quantreg} and \pkg{sn} \citep{sn} for functionality not shown in this paper, as well as package \pkg{testthat} \citep{wickham11} for unit tests. \proglang{R} itself and all mentioned packages are available from the Comprehensive \proglang{R} Archive Network (CRAN) at \url{https://CRAN.R-project.org/}. The latest development version of package \pkg{robmed} can be obtained from \url{https://github.com/aalfons/robmed}. \section*{Acknowledgments} % \begin{leftbar} % All acknowledgments (note the AE spelling) should be collected in this % unnumbered section before the references. It may contain the usual information % about funding and feedback from colleagues/reviewers/etc. Furthermore, % information such as relative contributions of the authors may be added here % (if any). % \end{leftbar} Andreas Alfons is supported by a grant of the Dutch Research Council (NWO), research program Vidi, project number \mbox{VI.Vidi.195.141}. N\"{u}fer Y.\ Ate\c{s} is supported by the Science Academy Young Scientists Award Program (BAGEP) of the Science Academy Society of Turkey. %% -- Bibliography ------------------------------------------------------------- %% - References need to be provided in a .bib BibTeX database. %% - All references should be made with \cite, \citet, \citep, \citealp etc. %% (and never hard-coded). See the FAQ for details. %% - JSS-specific markup (\proglang, \pkg, \code) should be used in the .bib. %% - Titles in the .bib should be in title case. %% - DOIs should be included where available. \vspace{5ex} % avoid that the header is left dangling at the bottom of the page \bibliography{robmed} %% -- Appendix (if any) -------------------------------------------------------- %% - After the bibliography with page break. %% - With proper section titles and _not_ just "Appendix". \newpage \begin{appendix} % ------------------------------------------------------- % Details on the diagnostic plot with a tolerance ellipse % ------------------------------------------------------- \section{Details on the diagnostic plot with a tolerance ellipse} \label{app:ellipse} The diagnostic plot from function \code{ellipse_plot()} exploits the relationship between regression coefficients and the covariance matrix to draw a tolerance ellipse that illustrates how well the regression results represent the data. Our recommended robust bootstrap test for mediation analysis is based on a robust regression estimator that assigns robustness weights to the observations. Those robustness weights lie between 0 and 1, with lower weights indicating a higher degree of deviation. The corresponding tolerance ellipse is computed based on a weighted sample covariance matrix, using the weights returned by the robust regression. However, for such a plot to be meaningful, the weighted sample covariance matrix needs to yield a Fisher consistent estimator of the true covariance matrix under the model distribution. The diagnostic plot is most useful if the data come from a multivariate elliptical distribution. Consider the regression model \begin{equation*} Y = \alpha + \vect{X}^\top \vect{\beta} + \sigma \varepsilon, \end{equation*} In addition to the usual assumptions that $\varepsilon \sim N(0, 1)$ and that $\vect{X}$ and $\varepsilon$ are uncorrelated, we therefore also assume for the diagnostic plot that $\vect{X} \sim N(\vect{\mu}_{\vect{X}}, \mat{\Sigma}_{\vect{X}\vect{X}})$. We emphasize that this additional assumption is made only for the diagnostic plot, it is not required for the general applicability of our robust bootstrap test. Then we have $Z = (Y, \vect{X}^\top)^\top \sim N(\vect{\mu}, \mat{\Sigma})$ with \begin{equation*} \vect{\mu} = \left( \begin{array}{c} \mu_{Y} \\ \vect{\mu}_{\vect{X}} \end{array} \right) % \qquad \text{and} \qquad % \mat{\Sigma} = \left( \begin{array}{cc} \Sigma_{YY} & \vect{\Sigma}_{Y\vect{X}} \\ \vect{\Sigma}_{\vect{X}Y} & \mat{\Sigma}_{\vect{X}\vect{X}} \end{array} \right). \end{equation*} Furthermore, the regression coefficients can be written as \begin{align*} \beta &= \mat{\Sigma}_{\vect{X}\vect{X}}^{-1} \mat{\Sigma}_{\vect{X}Y}, \\ \alpha &= \mu_{Y} - \vect{\mu}_{\vect{X}}^\top \vect{\beta}. \end{align*} With observations $\obs{z}_{i} = (y_{i}, \obs{x}_{i}^\top)^\top$ and with with robustness weights $w_{i}$ from robust regression, $i = 1, \dots, n$, we define the weighted center estimate $\obs{m}$ and the weighted scatter matrix $\mat{S}$ as \begin{align*} \obs{m} &= \frac{1}{\sum_{i = 1}^{n} w_{i}} \sum_{i = 1}^{n} w_{i} \obs{z}_{i}, \\ \mat{S} &= \frac{1}{\left( \sum_{i = 1}^{n} w_{i} \right) - 1} \sum_{i = 1}^{n} w_{i} (\obs{z}_{i} - \obs{m}) (\obs{z}_{i} - \obs{m})^\top. \end{align*} Note that the denominator of $\mat{S}$ is chosen such that the weighted scatter matrix reduces to the unbiased sample covariance matrix if all observations receive full weight $w_{i} = 1$, \mbox{$i = 1, \dots, n$}. Let $F$ denote the cumulative distribution function (CDF) of the multivariate normal distribution $N(\vect{\mu}, \mat{\Sigma})$ and let $\Phi$ denote the CDF of the univariate standard normal distribution $N(0, 1)$. Then the functional forms of the estimators are given by \begin{align*} \vect{m}(F) &= \frac{1}{\delta} \int w(\varepsilon) \vect{z} \: dF(\vect{z}), \\ \mat{S}(F) &= \frac{1}{\delta} \int w(\varepsilon) (\vect{z} - \vect{m}(F)) (\vect{z} - \vect{m}(F))^\top \: dF(\vect{z}), \end{align*} where $\delta = \int w(\varepsilon) \: d\Phi(\varepsilon)$. Keep in mind that we only consider robust regression with symmetric loss functions such that the weight function $w$ is also symmetric, and let $\vect{m}(F) = (m_{Y}(F), \vect{m}_{\vect{X}}(F)^\top)^\top$. For the explanatory variables $\vect{X}$, we have \begin{equation*} \vect{m}_{\vect{X}}(F) = \frac{1}{\delta} \int w(\varepsilon) \vect{x} \: dF(\vect{z}) = \frac{1}{\delta} \underbrace{\int w(\varepsilon) \: d\Phi(\varepsilon)}_{=\delta} \int \vect{x} \: dF_{\vect{X}}(\vect{x}) = \vect{\mu}_{\vect{X}}, \end{equation*} where $F_{\vect{X}}$ denotes the joint CDF of $\vect{X}$. To show that $m_{Y}(F) = \mu_{Y}$, we can assume without loss of generality that $\mu = (0, \dots, 0)^\top$ and that $\vect{\beta} = (0, \dots, 0)^\top$. Then $\alpha = 0$ and $\varepsilon = Y / \sigma$, and we need to show that $m_{Y}(F) = 0$. We obtain \begin{equation} \label{eq:mean_Y} m_{Y}(F) = \frac{1}{\delta} \int w(\varepsilon) y \: dF(\vect{z}) = \frac{1}{\delta} \int_{-\infty}^{\infty} \underbrace{w \left( \frac{y}{\sigma} \right) y f_{Y}(y)}_{=g(y)} \: dy, \end{equation} where $f_{Y}$ denotes the probability density function of the marginal distribution of $Y$. For the integrand in \eqref{eq:mean_Y}, it holds that $g(-y) = -g(y)$, hence we have $m_{Y}(F) = 0$. Thus $\vect{\hat{\mu}} = \vect{m}$ is Fisher consistent for $\vect{\mu}$. Since $0 \leq w(\varepsilon) \leq 1$, $\mat{S}(F)$ is expected to underestimate (some elements of) the covariance matrix $\mat{\Sigma}$. However, for the submatrix involving only the explanatory variables $\vect{X}$, we obtain \begin{align*} \mat{S}_{\vect{X}\vect{X}}(F) &= \frac{1}{\delta} \int w(\varepsilon) (\vect{x} - \underbrace{\vect{m}_{\vect{X}}(F)}_{=\vect{\mu}_{\vect{X}}}) (\vect{x} - \underbrace{\vect{m}_{\vect{X}}(F)}_{=\vect{\mu}_{\vect{X}}})^\top \: dF(\vect{z}) \\ &= \frac{1}{\delta} \underbrace{\int w(\varepsilon) \: d\Phi(\varepsilon)}_{=\delta} \int (\vect{x} - \vect{\mu}_{\vect{X}}) (\vect{x} - \vect{\mu}_{\vect{X}})^\top \: dF_{\vect{X}}(\vect{x}) = \mat{\Sigma}_{\vect{X}\vect{X}}. \end{align*} Thus \begin{equation*} \mat{\hat{\Sigma}}_{\vect{X}\vect{X}} = \mat{S}_{\vect{X}\vect{X}} = \frac{1}{\left( \sum_{i = 1}^{n} w_{i} \right) - 1} \sum_{i = 1}^{n} w_{i} (\obs{x}_{i} - \obs{\bar{x}}) (\obs{x}_{i} - \obs{\bar{x}})^\top \end{equation*} is Fisher consistent for $\mat{\Sigma}_{\vect{X}\vect{X}}$. With Fisher consistent estimators $\vect{\hat{\beta}}$ and $\hat{\sigma}$ from robust regression, we can therefore construct a Fisher consistent estimator $\mat{\hat{\Sigma}}$ of the covariance matrix $\mat{\Sigma}$ by computing \begin{align*} \vect{\hat{\Sigma}}_{\vect{X}Y} &= \mat{\hat{\Sigma}}_{\vect{X}\vect{X}} \vect{\hat{\beta}}, \\ % \vect{\hat{\Sigma}}_{Y\vect{X}} &= \vect{\hat{\Sigma}}_{\vect{X}Y}^\top, \\ \hat{\Sigma}_{YY} &= \vect{\hat{\beta}}^\top \mat{\hat{\Sigma}}_{\vect{X}\vect{X}} \vect{\hat{\beta}} + \hat{\sigma}^{2}, \\ \mat{\hat{\Sigma}} &= \left( \begin{array}{cc} \hat{\Sigma}_{YY} & \vect{\hat{\Sigma}}_{\vect{X}Y}^\top \\ \vect{\hat{\Sigma}}_{\vect{X}Y} & \mat{\hat{\Sigma}}_{\vect{X}\vect{X}} \end{array} \right) . \end{align*} \end{appendix} % --------------------------------------------------------------- % Summary output of robust bootstrap tests for multiple mediators % --------------------------------------------------------------- \section{Additional summary output of robust bootstrap tests} \label{app:output} We first produce the summary of the robust bootstrap test for the serial multiple mediator model from Section~\ref{sec:serial}. Since the diagnostic plot for this example is already shown in Figure~\ref{fig:weight}, we set the argument \code{plot = FALSE} to suppress the diagnostic plot. % <<>>= summary(robust_boot_serial, plot = FALSE) @ \bigskip Finally, we display the summary of the robust bootstrap test for the parallel multiple mediator model with control variables from Section~\ref{sec:parallel}. Figure~\ref{fig:summary-parallel} contains the resulting diagnostic plot. % <>= summary(robust_boot_parallel) @ \newpage % make sure affiliation is printed in full after diagnostic plot %% ----------------------------------------------------------------------------- \end{document}