---
title: "Mediation Analysis for survival data"
author: Klaus Holst & Thomas Scheike
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
    fig_width: 7.15
    fig_height: 5.5        
vignette: >
  %\VignetteIndexEntry{Mediation Analysis for survival data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  #dev="png",
  comment = "#>"  
)
fullVignette <- Sys.getenv("_R_FULL_VIGNETTE_") %in% c("1","TRUE")
library(mets)
```

Overview 
========

Fit 

 * binomial-regression IPCW, binreg 
 * additive Lin-Ying model, aalenMets
 * cox model phreg 
 * standard logistic regression via binreg

in the context of mediation analysis using mediation weights as in the medFlex package. 
We thus fit natural effects models, that for example on the binary scale might state
that
\begin{align*}
\mbox{logit}(P(Y(x,M(x^*))=1| Z)  = \beta_0+ \beta_1 x + \beta_2 x^* + \beta_3^T Z,
\end{align*}
in this case the the Natural Direct Effect (NDE) for fixed covariates $Z$ is 
\begin{align*}
 \mbox{OR}_{1,0|Z}^{\mbox{NDE}} = \frac{\mbox{odds}(Y(1,M(x))|Z)}{\mbox{odds}(Y(0,M(x))|Z)}  = \exp(\beta_1),
\end{align*}
and the Natural Inderect Effect (NIE) for fixed covariates $Z$ is 
\begin{align*}
 \mbox{OR}_{1,0|Z}^{\mbox{NIE}} = \frac{\mbox{odds}(Y(x,M(1))|Z)}{\mbox{odds}(Y(x,M(0))|Z)} = \exp(\beta_2).
\end{align*}
See the medFlex package for additional discussion of the parametrization.

The mediator can  be

 * binomial using glm-binomial.
 * multnomial via the mlogit function of mets

Both mediator and exposure must be coded as factors.

In the below example these are

 * mediator: gp.f
 * exposure : dnr.f

and the outcome model is concerned with the risk/hazard of cause=2. 

The key is that the standard errors are computed using the i.i.d influence
functions and a Taylor expansion to deal with the uncertainty from the
mediation weights. 


Simulated Data 
==============

First we simulate some data that mimics that of Kumar et al 2012. 
This is data from multiple myeloma patients treated with allogeneic stem cell
transplantation from the Center for International Blood and Marrow Transplant Research
(CIBMTR) Kumar et al (2012), "Trends in allogeneic stem cell transplantation for multiple myeloma: a CIBMTR
analysis".  The data used in this paper consist of patients transplanted from 1995 to
2005, and we compared the outcomes between transplant periods: 2001-2005 (N=488)
versus 1995-2000 (N=375). The two competing events were
relapse (cause 2) and treatment-related mortality (TRM, cause 1)) 
defined as death without relapse.
\cite{kumar-2012} considered the following risk covariates: 
transplant time period (gp (main interest of the study): 1 for transplanted in
2001-2005 versus 0 for transplanted in 1995-2000), donor type (dnr: 1 for Unrelated or
other related donor (N=280) versus 0 for HLA-identical sibling (N=584)), prior
autologous transplant (preauto: 1 for Auto+Allo transplant (N=399) versus 0 for
allogeneic transplant alone (N=465)) and time to transplant (ttt24: 1 for more than 24 months (N=289) versus 0 for less than or
equal to 24 months (N=575))). 

The interest is then on the effect of the period (gp) and the possible mediation
via the amount of unrealted or related donors (dnr). A somewhat artificial example ! 
All adjusted for other important counfounders. 

```{r}
 library(mets)
 runb <- 0
 options(warn=-1)
 set.seed(1000) # to control output in simulatins for p-values below.

n <- 200; k.boot <- 10; 

dat <- kumarsimRCT(n,rho1=0.5,rho2=0.5,rct=2,censpar=c(0,0,0,0),
          beta = c(-0.67, 0.59, 0.55, 0.25, 0.98, 0.18, 0.45, 0.31),
    treatmodel = c(-0.18, 0.56, 0.56, 0.54),restrict=1)
dfactor(dat) <- dnr.f~dnr
dfactor(dat) <- gp.f~gp
drename(dat) <- ttt24~"ttt24*"
dat$id <- 1:n
dat$ftime <- 1
```


Mediation Weights 
=================

Then  compute the mediation weights  based on a mediation model

```{r}
weightmodel <- fit <- glm(gp.f~dnr.f+preauto+ttt24,data=dat,family=binomial)
wdata <- medweight(fit,data=dat)
```

Binomial Regression 
===================

A simple multvariate regression of the probaibility of relapse at 50 months with both
exposure and mediator (given the other covariates)

```{r}
aaMss2 <- binreg(Event(time,status)~gp+dnr+preauto+ttt24+cluster(id),data=dat,time=50,cause=2)
summary(aaMss2)
```

Binomial regression IPCW Mediation Analysis 
===========================================

We first look at the probability of  relapse at 50 months

```{r, label=firstmodel}
### binomial regression ###########################################################
aaMss <- binreg(Event(time,status)~dnr.f0+dnr.f1+preauto+ttt24+cluster(id),data=wdata,
		time=50,weights=wdata$weights,cause=2)
summary(aaMss)

ll <- mediatorSurv(aaMss,fit,data=dat,wdata=wdata)
summary(ll)
if (runb>0) { bll <- BootmediatorSurv(aaMss,fit,data=dat,k.boot=k.boot); summary(bll)}
```

So the  NDE is $1.40 (0.72,2.76)$ and the NIE is $1.32 (1.05,1.66)$. 

Mediation Analysis 
====================

We here also illustrate how to use the other models mentioned above. 

```{r, label=multiplemodels}
### lin-ying model ################################################################
aaMss <- aalenMets(Surv(time/100,status==2)~dnr.f0+dnr.f1+preauto+ttt24+cluster(id),data=wdata,
		   weights=wdata$weights)
ll <- mediatorSurv(aaMss,fit,data=dat,wdata=wdata)
summary(ll)
if (runb>0) { bll <- BootmediatorSurv(aaMss,fit,data=dat,k.boot=k.boot); summary(bll)}

### cox model ###############################################################################
aaMss <- phreg(Surv(time,status==2)~dnr.f0+dnr.f1+preauto+ttt24+cluster(id),data=wdata,
	       weights=wdata$weights)
summary(aaMss)
ll <- mediatorSurv(aaMss,fit,data=dat,wdata=wdata)
summary(ll)
if (runb>0) { bll <- BootmediatorSurv(aaMss,fit,data=dat,k.boot=k.boot); summary(bll)}

### Fine-Gray #############################################################3
aaMss <- cifreg(Event(time,status)~dnr.f0+dnr.f1+preauto+ttt24+cluster(id),data=wdata,
		weights=wdata$weights,propodds=NULL,cause=2)
summary(aaMss)
ll <- mediatorSurv(aaMss,fit,data=dat,wdata=wdata)
summary(ll)
if (runb>0) { bll <- BootmediatorSurv(aaMss,fit,data=dat,k.boot=k.boot); summary(bll)}

### logit model  #############################################################3
aaMss <- cifreg(Event(time,status)~dnr.f0+dnr.f1+preauto+ttt24+cluster(id),data=wdata,
		weights=wdata$weights,cause=2)
summary(aaMss)
ll <- mediatorSurv(aaMss,fit,data=dat,wdata=wdata)
summary(ll)
if (runb>0) { bll <- BootmediatorSurv(aaMss,fit,data=dat,k.boot=k.boot); summary(bll)}

### binomial outcome  ############################
aaMss <- binreg(Event(ftime,status)~dnr.f0+dnr.f1+preauto+ttt24+cluster(id),data=wdata,
		time=50,weights=wdata$weights,cens.weights=1,cause=2)
summary(aaMss)
ll <- mediatorSurv(aaMss,fit,data=dat,wdata=wdata)
summary(ll)
if (runb>0) { bll <- BootmediatorSurv(aaMss,fit,data=dat,k.boot=k.boot); summary(bll)}
```


Multinomial regression
======================

Also works with mediator with more than two levels

 * meditor: wmi in 4 categories
 * exposure: age in 4 categories

```{r, label=multinom, cache=TRUE, eval=fullVignette}
data(tTRACE)
dcut(tTRACE) <- ~. 

weightmodel <- fit <- mlogit(wmicat.4 ~agecat.4+vf+chf,data=tTRACE,family=binomial)
wdata <- medweight(fit,data=tTRACE)

aaMss <- binreg(Event(time,status)~agecat.40+ agecat.41+ vf+chf+cluster(id),data=wdata,
		time=7,weights=wdata$weights,cause=9)
summary(aaMss)
MultMed <- mediatorSurv(aaMss,fit,data=tTRACE,wdata=wdata)
```

```{r  results="hide", echo=FALSE }
## To save time building the vignettes on CRAN, we cache time consuming computations
if (fullVignette) {
  MultMed[c('iid','iid.w','iid.surv')] <- NULL
  saveRDS(MultMed, "data/MultMed.rds")
} else {
  MultMed <- readRDS("data/MultMed.rds")
}
```

```{r}
summary(MultMed)
```

SessionInfo
============


```{r}
sessionInfo()
```