The dpasurv package

Matthias Kormaksson

August 27, 2024

library(dpasurv)
library(ggplot2)

1. Introduction

Dynamic path analysis introduced by (Fosen et al. 2006) is a method for performing mediation analysis with a longitudinal mediator and time-to-event outcome. A thorough review of the methodology can be found in the tutorial paper of (Kormaksson et al. 2024). In this vignette we will outline the modeling framework and define direct, indirect and total effects in the context of dynamic path analysis. We then demonstrate the main functionalities of the package, namely estimation, inference, and visualization.

2. Methods

Dynamic path analysis

Dynamic path analysis involves modeling so-called dynamic path diagrams such as the one depicted in Figure 1 (a). A dynamic path diagram is defined in (Fosen et al. 2006) as a time-indexed sequence of directed acyclic graphs that is characterized by a treatment variable, \(Z\), one or more longitudinal mediator variables, \(M(t)\), and a survival (time-to-event) outcome \(dN(t)=1(t \leq T < t+dt)\), where \(T\) denotes the time of event with hazard \(\lambda(t) = \lim_{dt \rightarrow 0} P(dN(t)=1 | T \geq t)\). The survival outcome takes the role of a single terminal node in the dynamic path diagram. Additional time-fixed confounder variables, such as baseline characteristics, \(X\), may be incorporated as well.

Figure 1. (a) A dynamic path diagram with treatment $Z$, a single mediator $M(t)$, and baseline confounders $X$; (b) Corresponding dynamic path diagram without the mediator process.

Figure 1. (a) A dynamic path diagram with treatment \(Z\), a single mediator \(M(t)\), and baseline confounders \(X\); (b) Corresponding dynamic path diagram without the mediator process.

Relationships between variables are represented by directed edges between nodes and collectively the nodes and edges imply an underlying model structure. More specifically, each child node represents an entity to be modeled while the corresponding parent nodes represent the underlying model covariates. In Figure 1 (a) there are two implied models, one corresponding to the survival outcome variable, \(dN(t)\), and the other corresponding to the mediator response \(M(t)\).

A defining feature of dynamic path analysis as defined in (Fosen et al. 2006) is the specific choice of models for the dynamic path diagram, namely the so-called dynamic path models. At the core of these is Aalen’s additive regression model for the survival outcome at the terminal node while all other models involve least squares regressions at each time point \(t\). In the following equations we see the dynamic path models corresponding to Figure 1 (a) \[\begin{align} \lambda(t) &= R(t) \cdot \left\{\beta_0(t) + \beta_1(t) Z + \beta_2(t) M(t) + X\eta \right\}, \tag{1} \\ M(t) &= \alpha_0(t) + \alpha_1(t) Z + X\eta + \varepsilon(t). \tag{2} \end{align}\] In the above \(R(t)\) is the at-risk function indicating whether subject is still under observation (i.e. alive and still under follow up) just before time \(t\), and \(\varepsilon(t)\) is a time-dependent error term. Under the outcome model (1) the hazard is assumed to change linearly as a function of treatment, mediator(s) and confounders, while in the mediator model (2) the longitudinal mediator is regressed on treatment and confounders.

The regression functions of principal interest are those characterizing the relationships between treatment, mediator(s) and survival, namely \(\beta_1(t), \beta_2(t)\), and \(\alpha_1(t)\). The goal of dynamic path analysis (and the dpasurv package) is to estimate these key functions that serve as building blocks for the direct, indirect and total effects that are defined in the next section.

Direct, indirect and total effects

The cumulative direct and indirect effects at time \(t\) are defined as as follows \[\begin{align} cumdir(t) &= \int_0^t \beta_1(s)ds \label{eqn:direct_effect}, \\ cumind(t) &= \int_0^t \alpha_1(s)\beta_2(s)ds \label{eqn:indirect_effect}. \end{align}\] The direct effect corresponds to the regression function along the direct arrow between treatment \(Z\) and outcome \(dN(t)\) in Figure 1 (a), while the indirect effect is obtained by multiplying the regression functions along the mediation path \(Z \rightarrow M(t) \rightarrow dN(t)\).

The total effect is defined as the regression function \(\gamma_1(t)\) from the mediator free Aalen’s additive model from Figure 1 (b) \[ \lambda(t) = R(t) \cdot \left\{\gamma_0(t) + \gamma_1(t) Z + X\eta \right\}, \tag{3} \] and the corresponding cumulative total effect is defined as: \[\begin{align} cumtot(t) &= \int_0^t \gamma_1(s)ds \label{eqn:total_effect}. \\ \end{align}\] One of the most appealing features of dynamic path analysis is the following analytical decomposition \[\begin{align} cumtot(t) &= cumdir(t) + cumind(t). \\ \end{align}\]

Estimation and inference

The analytical estimators for the cumulative direct, indirect, and total effects above are derived in (Kormaksson et al. 2024) and confidence bands are obtained by bootstrap inference. More specifically, we repeatedly sample subjects with replacement from the set of all subjects and estimate the direct, indirect and total effects each time. Once we have obtained \(B\) bootstrap estimates of the effects of interest, i.e. \((\widehat{cumdir}_b(t), \widehat{cumind}_b(t), \widehat{cumtot}_b(t)\)), for \(b=1,\dots,B\), then we calculate pointwise \(100\cdot(1-\alpha)\)% bootstrap confidence intervals at each timepoint \(t\) by calculating the corresponding empirical \(100\cdot\alpha/2\) and \(100\cdot(1-\alpha/2)\) percentiles.

3. Getting started with dpasurv

The dpasurv package implements the complete dynamic path analysis workflow.

Simulated data set

The dpasurv package comes with a simulated data set called simdata

simdata
#> # A tibble: 1,229 × 7
#>    subject     x dose      M start  stop event
#>    <fct>   <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#>  1 1           0 ctrl   6.74     0   4       0
#>  2 1           0 ctrl   6.91     4   8       0
#>  3 1           0 ctrl   6.9      8  12       0
#>  4 1           0 ctrl   6.71    12  26       0
#>  5 1           0 ctrl   6.45    26  46.8     1
#>  6 2           1 high   6.11     0   4       0
#>  7 2           1 high   6.28     4   8       0
#>  8 2           1 high   7.04     8  12       0
#>  9 2           1 high   6.93    12  26       0
#> 10 2           1 high   7.86    26  52       0
#> # … with 1,219 more rows

The data contains the following variables:

At each visiting time, start, a new mediator measurement is taken. However, the mathematical assumption is that the intervals (start,stop] are open on the left and closed on the right. Under this assumption the risk of an event in the interval (start, stop] is influenced by the last observed mediator value.

Implementation in a nutshell

Below we see the minimal code required for performing dynamic path analysis on simdata. It explores the direct effect of treatment x on survival as well as the indirect effect mediated through M:

# Perform dynamic path analysis
s <- dpa(Surv(start, stop, event) ~ M + x, list(M ~ x), id = "subject", data = simdata, boot.n = 500)

# Extract cumulative direct and indirect effects with 95% pointwise bootstrap confidence bands:
direct <- effect(x ~ outcome, s, alpha=0.05)
indirect <- effect(x ~ M ~ outcome, s, alpha=0.05)

# Use sum() method to obtain total effect
total <- sum(direct, indirect)

# Plot the results
layout1x3 <- par(mfrow=c(1,3))
plot(direct); abline(h=0, lty=2, col=2)
plot(indirect); abline(h=0, lty=2, col=2)
plot(total); abline(h=0, lty=2, col=2)

# restore user's graphical parameters:
par(layout1x3)

# Plot the results with ggplot2 graphics:
ggplot.effect(list(direct, indirect, total))

4. Simulated Use Case

Let us now use the dpasurv package to perform a slightly more in depth analysis of simdata.

Look at the data

Firstly, we note that there is a dose response trend when looking at longitudinal trends in the subject mediator profiles:

ggplot(mapping=aes(x = start, y = M, group=dose), data=simdata) + geom_smooth(aes(colour=dose))

The higher the dose, the higher the average mediator value is.

We also note that there is a dose response trend when looking at association between dose levels and survival

fit <- survival::survfit(Surv(start, stop, event) ~ dose, data=simdata)
plot(fit, col=2:4, xlab="Time", ylab="Survival", main="KM-curves by dose level")
legend(150, 1, c("control", "low dose", "high dose"), col=2:4, lwd=2, bty='n')

A Cox model with dose and mediator as covariates reveals a significant hazard ratio for high dose, but only a trend for the low dose (both as compared to control), while there is no significant association between the mediator and survival:

summary(survival::coxph(Surv(start, stop, event) ~ dose + M, data=simdata))
#> Call:
#> survival::coxph(formula = Surv(start, stop, event) ~ dose + M, 
#>     data = simdata)
#> 
#>   n= 1229, number of events= 172 
#> 
#>              coef exp(coef) se(coef)      z Pr(>|z|)   
#> doselow  -0.32281   0.72411  0.18826 -1.715  0.08640 . 
#> dosehigh -0.66230   0.51566  0.20542 -3.224  0.00126 **
#> M         0.05274   1.05416  0.04863  1.085  0.27808   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>          exp(coef) exp(-coef) lower .95 upper .95
#> doselow     0.7241     1.3810    0.5007    1.0473
#> dosehigh    0.5157     1.9392    0.3448    0.7713
#> M           1.0542     0.9486    0.9583    1.1596
#> 
#> Concordance= 0.555  (se = 0.025 )
#> Likelihood ratio test= 10.57  on 3 df,   p=0.01
#> Wald test            = 10.51  on 3 df,   p=0.01
#> Score (logrank) test = 10.7  on 3 df,   p=0.01

Dynamic path analysis

Let’s now perform dynamic path analysis with the dpa() function

set.seed(1)
s <- dpa(Surv(start, stop, event) ~ M + dose, list(M ~ dose), id = "subject", data = simdata, boot.n = 500)

and calculate some effects of interest with the effect() function.

We confirm a dose response on mediator, which appears significant for the high-dose vs control:

# The dose effect on mediator response:
dose.on.mediator <- effect(dose ~ M, s)

ggplot.effect(dose.on.mediator)

We also confirm a dose response on survival outcome, although only the high-dose appears significant:

# direct effect of dose on outcome
direct <- effect(dose ~ outcome, s)

ggplot.effect(direct)

In contrast to the conclusions from the Cox-model there appears to be a meaningful dynamic association between the mediator and survival:

# effect of mediator on outcome
mediator.on.outcome <- effect(M ~ outcome, s)

ggplot.effect(mediator.on.outcome)

In particular, the mediator effect is neutral for the first 100 days and then starts having a significant detrimental effect on survival after that (the higher the mediator value, the higher the risk).

Explore output of Aalen’s additive model

The output object from the call to the dpa() function above stores the output from Aalen’s additive model (1) and we can explore its content for inferential purposes

# Output from Aalen's additive model from the timereg::aalen() implementation:
summary(s$aalen)
#> Additive Aalen Model 
#> 
#> Test for nonparametric terms 
#> 
#> Test for non-significant effects 
#>             Supremum-test of significance p-value H_0: B(t)=0
#> (Intercept)                          2.56               0.163
#> M                                    3.30               0.025
#> doselow                              2.99               0.061
#> dosehigh                             5.12               0.000
#> 
#> Test for time invariant effects 
#>                   Kolmogorov-Smirnov test p-value H_0:constant effect
#> (Intercept)                         1.400                       0.023
#> M                                   0.176                       0.066
#> doselow                             0.595                       0.473
#> dosehigh                            0.587                       0.534
#>                     Cramer von Mises test p-value H_0:constant effect
#> (Intercept)                        148.00                       0.008
#> M                                    2.05                       0.072
#> doselow                             14.90                       0.530
#> dosehigh                            13.30                       0.623
#> 
#>    
#>    
#>   Call: 
#> timereg::aalen(formula = out.formula, data = data, id = data[[id]])

According to the resampling tests of the time-varying effects we see that the mediator effect is indeed significantly different from zero as well as the high-dose effect, while the low-dose effect is only marginally significant.

Plot the direct, indirect, and total effects

Finally, let’s also calculate the indirect and total effects

# indirect effect of dose on outcome, mediated through M
indirect <- effect(dose ~ M ~ outcome, s)

# total effect
total <- sum(direct, indirect)

and then plot together the direct, indirect, and total effects:

ggplot.effect(list(direct, indirect, total))

We see that the indirect effect of dose on survival is neutral for the first 100 days but then becomes detrimental for survival after that (only significant for the high dose). The detrimental indirect effect of the high dose counteracts the corresponding significant and beneficial direct effect.

Fosen, J., E. Ferkingstad, Ø. Borgan, and O. O. Aalen. 2006. “Dynamic Path Analysis - a New Approach to Analyzing Time-Dependent Covariates.” Lifetime Data Anal. 12 (2): 143–67. https://doi.org/10.1007/s10985-006-9004-2.
Kormaksson, M., M. R. Lange, D. Demanse, S. Strohmaier, J. Duan, Q. Xie, M. Carbini, C. Bossen, A. Guettner, and A. Maniero. 2024. “Dynamic Path Analysis for Exploring Treatment Effect Mediation Processes in Clinical Trials with Time-to-Event Endpoints.” (Submitted).