--- title: "Plot Cumulative Incidence and Survival Curves" author: "Sahir Rai Bhatnagar" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: fig_height: 6 fig_width: 8 toc: yes toc_depth: 4 vignette: > %\VignetteIndexEntry{Plot Cumulative Incidence and Survival Curves} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} bibliography: reference.bib editor_options: chunk_output_type: console --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, results='hide', fig.show='all', comment = "#>" ) ``` ## Introduction In this short vignette, we will introduce the `plot` method for `absoluteRisk` objects. This method allows you to plot cumulative incidence (CI) or survival curves as a function of time and a given covariate profile. More specifically, the cumulative incidence is given by: $$ CI(x, t) = 1 - exp\left[ - \int_0^t h(x, u) \textrm{d}u \right] $$ where \\( h(x, t) \\) is the hazard function, \\( t \\) denotes the numerical value (number of units) of a point in prognostic/prospective time and \\( x \\) is the realization of the vector \\( X \\) of variates based on the patient's profile and intervention (if any). And the survival function is given by $$ S(x, t) = 1 - CI(x,t) = exp\left[ - \int_0^t h(x, u) \textrm{d}u \right] $$ ## Analysis of the `brcancer` dataset To illustrate hazard function plots, we will use the breast cancer dataset which contains the observations of 686 women taken from the [`TH.data`](https://cran.r-project.org/package=TH.data) package. This dataset is also available from the `casebase` package. In the following, we will show the CI curve for several covariate profiles. ```{r eval=TRUE} library(casebase) library(survival) library(ggplot2) data("brcancer") mod_cb_glm <- fitSmoothHazard(cens ~ estrec*log(time) + horTh + age + menostat + tsize + tgrade + pnodes + progrec, data = brcancer, time = "time", ratio = 10) summary(mod_cb_glm) ``` ### Plotting Cumulative Incidence Curves We can use the `plot` method for objects of class `absRiskCB`, which is returned by the `absoluteRisk` function, to plot cumulative incidence curves. For example, suppose we want to compare the cumulative incidence curves of the 1st and 50th individuals in the `brcancer` dataset. We first call the `absoluteRisk` function and specify the `newdata` argument. Note that since time is missing, the risk estimate is calculated at the observed failure times. ```{r} smooth_risk_brcancer <- absoluteRisk(object = mod_cb_glm, newdata = brcancer[c(1,50),]) class(smooth_risk_brcancer) plot(smooth_risk_brcancer) ``` These curves can be further customized. For example, suppose we want to change the legend title and legend keys: ```{r} plot(smooth_risk_brcancer, id.names = c("Covariate Profile 1","Covariate Profile 50"), legend.title = "Type", xlab = "time (days)", ylab = "Cumulative Incidence (%)") ``` The call to `plot` on a `absRiskCB` object returns a `ggplot2` object, and therefore can be used downstream with other `ggplot2` functions. For example, suppose we want to change the theme: ```{r} plot(smooth_risk_brcancer, id.names = c("Covariate Profile 1","Covariate Profile 50"), legend.title = "Type", xlab = "time (days)", ylab = "Cumulative Incidence (%)") + ggplot2::theme_linedraw() ``` ### Using `graphics::matplot` By default, the `plot` method uses `ggplot2` to produce the curves. Alternatively, you can use `graphics::matplot` by specifying `gg = FALSE`. This option is particularly useful if you want to add the cumulative incidence curve to an existing plot, e.g., adding the adjusted smooth curve to a Kaplan-Meier curve. In this example, we calculate the cumulative incidence for a _typical_ individual in the dataset: ```{r} cols <- c("#8E063B","#023FA5") smooth_risk_typical <- absoluteRisk(object = mod_cb_glm, newdata = "typical") y <- with(brcancer, survival::Surv(time, cens)) plot(y, fun = "event", conf.int = F, col = cols[1], lwd = 2) plot(smooth_risk_typical, add = TRUE, col = cols[2], lwd = 2, gg = FALSE) legend("bottomright", legend = c("Kaplan-Meier", "casebase"), col = cols, lty = 1, lwd = 2, bg = "gray90") ``` ## Survival Curves We can also easily calculate and plot survival curves by specifying `type = 'survival'` in the call to `absoluteRisk`. The corresponding call to `plot` is the same as with cumulative incidence curves: ```{r} smooth_surv_brcancer <- absoluteRisk(object = mod_cb_glm, newdata = brcancer[c(1,50),], type = "survival") plot(smooth_surv_brcancer) ``` ## Other families ### `glmnet` We can also plot cumulative incidence curves for other families. For example, using the `family = "glmnet"`, we can plot the cumulative incidence curves for the first 10 individuals in the `brcancer` dataset, using the tuning parameter which minimizes the 10-fold cross-validation error ($\lambda_{min}$): ```{r eval=requireNamespace("glmnet", quietly = TRUE)} mod_cb_glmnet <- fitSmoothHazard(cens ~ estrec*time + horTh + age + menostat + tsize + tgrade + pnodes + progrec, data = brcancer, time = "time", ratio = 1, family = "glmnet") smooth_risk_glmnet <- absoluteRisk(object = mod_cb_glmnet, newdata = brcancer[1:10,], s = "lambda.min") plot(smooth_risk_glmnet) ``` ### `gam` Here we produce the same plot but for `family = "gam"` for generalised additive models. ```{r} mod_cb_gam <- fitSmoothHazard(cens ~ estrec + time + horTh + age + menostat + tsize + tgrade + pnodes + progrec, data = brcancer, time = "time", ratio = 1, family = "gam") smooth_risk_gam <- absoluteRisk(object = mod_cb_gam, newdata = brcancer[1:10,]) plot(smooth_risk_gam) ``` # Session information ```{r echo=FALSE, eval=TRUE} print(sessionInfo(), locale = F) ```