---
title: "CRE"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{CRE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Installation

Installing from CRAN.

```{r, eval=FALSE}
install.packages("CRE")
```

Installing the latest development version from GitHub.

```{r, eval=FALSE}
library(devtools)
install_github("NSAPH-Software/CRE", ref = "develop")
```

Loading the package.

```{r, eval=FALSE}
library("CRE")
```

# Arguments

__Data (required)__   

**`y`** The observed response/outcome vector (binary or continuous).

**`z`** The treatment/exposure/policy vector (binary).  

**`X`** The covariate matrix (binary or continuous).    

__Parameters (not required)__   

**`method_params`** The list of parameters to define the models used, including:    
- **`ratio_dis`** The ratio of data delegated to the discovery sub-sample (default: 0.5).     
- **`ite_method`** The method to estimate the individual treatment effect (default: "aipw") [1].      
- **`learner_ps`** The ([SuperLearner](https://CRAN.R-project.org/package=SuperLearner)) model for the propensity score estimation (default: "SL.xgboost", used only for "aipw","bart","cf" ITE estimators).     
- **`learner_y`** The ([SuperLearner](https://CRAN.R-project.org/package=SuperLearner)) model for the outcome estimation (default: "SL.xgboost", used only for "aipw","slearner","tlearner" and "xlearner" ITE estimators).       
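
As a rough illustration of what `ratio_dis` controls, the sketch below splits a sample into a discovery part and an inference part using base R only. It is a minimal sketch under the assumption of a simple random split; the names `dis_idx` and `inf_idx` are hypothetical and the package's internal splitting may differ.

```R
# Illustrative only: a random discovery/inference split driven by ratio_dis.
set.seed(1)
n <- 1000
ratio_dis <- 0.5

# Draw the discovery indices without replacement; the remaining units
# form the inference sub-sample.
dis_idx <- sample(n, size = floor(ratio_dis * n))
inf_idx <- setdiff(seq_len(n), dis_idx)

length(dis_idx)
length(inf_idx)
```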

**`hyper_params`** The list of hyperparameters to fine-tune the method, including:    
- **`intervention_vars`** Intervenable variables used for rule generation (default: `NULL`).   
- **`ntrees`** The number of decision trees for random forest (default: 20).       
- **`node_size`** Minimum size of the trees' terminal nodes (default: 20).    
- **`max_rules`** Maximum number of candidate decision rules (default: 50).     
- **`max_depth`** Maximum rules length (default: 3).     
- **`t_decay`** The decay threshold for rules pruning (default: 0.025).        
- **`t_ext`** The threshold to define too generic or too specific (extreme) rules (default: 0.01).        
- **`t_corr`** The threshold to define correlated rules (default: 1).     
- **`stability_selection`** The stability selection method used to select the rules: `vanilla` for standard stability selection, `error_control` for stability selection with error control, and `no` to skip stability selection (default: `vanilla`).    
- **`B`** Number of bootstrap samples for stability selection in rules selection and uncertainty quantification in estimation (default: 20).
- **`subsample`** Fraction of the data drawn in each bootstrap sample for stability selection in rules selection and uncertainty quantification in estimation (default: 0.5).
- **`offset`** Name of the covariate to use as offset (e.g., "x1") for T-Poisson ITE Estimation. `NULL` if not used (default: `NULL`).   
- **`cutoff`** Threshold defining the minimum cutoff value for the stability scores in Stability Selection (default: 0.9).    
- **`pfer`** Upper bound for the per-family error rate (tolerated amount of falsely selected rules) in Error Control Stability Selection (default: 1).
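
To make the roles of `B`, `subsample` and `cutoff` more concrete, here is a minimal base R sketch of the general idea behind stability selection: a selector is run on `B` random subsamples, and only the rules selected in at least a `cutoff` fraction of the runs are kept. Everything in it is an assumption made for illustration: the simulated `rules` matrix, the `tau_hat` vector, and in particular the `select_rules()` helper, which is a simple correlation screen standing in for whatever selector the package actually uses.

```R
set.seed(42)
n <- 400
# Hypothetical binary rule matrix: each column flags the units satisfying a rule.
rules <- matrix(rbinom(n * 6, 1, 0.5), nrow = n,
                dimnames = list(NULL, paste0("rule", 1:6)))
# Hypothetical ITE estimates: only rule1 and rule2 modify the effect.
tau_hat <- 2 * rules[, "rule1"] - 2 * rules[, "rule2"] + rnorm(n)

B <- 20          # number of subsamples
subsample <- 0.5 # fraction of units drawn in each subsample
cutoff <- 0.9    # minimum selection frequency to keep a rule

# Stand-in selector (NOT the package's selector): keep the rules whose
# absolute correlation with tau_hat exceeds 0.3 on the subsample.
select_rules <- function(idx) {
  abs(cor(rules[idx, , drop = FALSE], tau_hat[idx]))[, 1] > 0.3
}

# One column of selection indicators per subsample.
selected <- replicate(B, select_rules(sample(n, floor(subsample * n))))

# Stability score = share of subsamples in which each rule was selected.
stability_scores <- rowMeans(selected)
stable_rules <- names(stability_scores)[stability_scores >= cutoff]
stable_rules
```

A higher `cutoff` keeps fewer rules, while a larger `B` gives smoother stability scores at a higher computational cost.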

__Additional Estimates (not required)__  

**`ite`** The estimated ITE vector. If given, both the ITE estimation steps in Discovery and Inference are skipped (default: `NULL`).    


## Notes

### Options for the ITE estimation

**[1]** Options for the ITE estimation are as follows:   
- [S-Learner](https://CRAN.R-project.org/package=SuperLearner) (`slearner`)  
- [T-Learner](https://CRAN.R-project.org/package=SuperLearner) (`tlearner`)  
- T-Poisson (`tpoisson`)  
- [X-Learner](https://CRAN.R-project.org/package=SuperLearner) (`xlearner`)  
- [Augmented Inverse Probability Weighting](https://CRAN.R-project.org/package=SuperLearner) (`aipw`)  
- [Causal Forests](https://CRAN.R-project.org/package=grf) (`cf`)  
- [Causal Bayesian Additive Regression Trees](https://CRAN.R-project.org/package=bartCause) (`bart`)  

If ITE estimates are supplied through the additional `ite` argument, the ITE estimation steps in both discovery and inference are skipped and the supplied values are used instead. The ITE estimators also require an outcome learner and/or a propensity score learner from the [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) package (e.g., "SL.lm", "SL.svm"). Both learners are simple classifiers/regressors. By default, the XGBoost algorithm is used for both steps.     
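
For readers who want to supply their own estimates through `ite`, the following is a minimal sketch of a T-learner written with base R linear models on simulated data. The data-generating process, the use of `lm()`, and the name `ite_custom` are assumptions made for illustration; this is not the package's `tlearner` implementation.

```R
set.seed(123)
n <- 500
X <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
z <- rbinom(n, 1, 0.5)
# Hypothetical outcome model: the treatment effect is 2 when x1 == 1, else 0.
y <- 1 + X$x2 + 2 * z * X$x1 + rnorm(n)

# T-learner: fit one outcome model per treatment arm ...
dat <- data.frame(y = y, X)
fit_treated <- lm(y ~ x1 + x2, data = dat[z == 1, ])
fit_control <- lm(y ~ x1 + x2, data = dat[z == 0, ])

# ... and take the difference of the two predictions for every unit.
ite_custom <- as.numeric(predict(fit_treated, newdata = X) -
                           predict(fit_control, newdata = X))
head(ite_custom)
```

A vector built this way has one entry per observation and could then be passed as `cre(y, z, X, ite = ite_custom)`.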

### Customized wrapper for SuperLearner

One can create a customized wrapper for SuperLearner internal packages. The following is an example of providing the number of cores (e.g., 12) for the xgboost package in a shared memory system. 

```R
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}
```

Then pass "m_xgboost" instead of "SL.xgboost" as `learner_ps` and/or `learner_y`.  
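
Put together, the wrapper and the parameter list might look like the sketch below. It assumes the wrapper is defined in the global environment so that SuperLearner can find it by name; the call to `cre()` is left commented out because it needs `y`, `z` and `X` as created in the Examples section.

```R
# Wrapper from the text above; defining it does not call xgboost yet.
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}

# Refer to the wrapper by its name (a character string), as for built-in learners.
method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "m_xgboost",
                      learner_y = "m_xgboost")

# cre_results <- cre(y, z, X, method_params)  # assumes y, z, X already exist
```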

# Examples

Example 1 (*default parameters*)
```R
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000, 
                                rho = 0, 
                                n_rules = 2, 
                                p = 10,
                                effect_size = 2, 
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

cre_results <- cre(y, z, X)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

Example 2 (*personalized ITE estimation*)
```R
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000, 
                                rho = 0, 
                                n_rules = 2, 
                                p = 10,
                                effect_size = 2, 
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

ite_pred <- ... # personalized ite estimation
cre_results <- cre(y, z, X, ite = ite_pred)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

Example 3 (*setting parameters*)
```R
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000, 
                                rho = 0, 
                                n_rules = 2, 
                                p = 10,
                                effect_size = 2, 
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")

hyper_params <- list(intervention_vars = c("x1","x2","x3","x4"),
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.8,
                     pfer = 1,
                     B = 10,
                     subsample = 0.5)

cre_results <- cre(y, z, X, method_params, hyper_params)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

More synthetic data sets can be generated using `generate_cre_dataset()`.