---
title: "ATE.ERROR.XY: Estimating Average Treatment Effect with Measurement Error in X and Misclassification in Y"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ATE.ERROR.XY: Estimating Average Treatment Effect with Measurement Error in X and Misclassification in Y}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ggplot2)
```

## Introduction

The `ATE.ERROR.XY` function estimates the average treatment effect (ATE) when a covariate is measured with error and the binary outcome Y is misclassified. It combines a simulation step, bootstrap resampling, and extrapolation to correct the estimate for both sources of error.

## Loading the Simulated Data

First, we load the simulated dataset shipped with the package. It contains the observed outcome `Y_star`, which may be misclassified, and the covariate `X_star`, which is subject to measurement error, alongside their error-free counterparts `Y` and `X`.

```{r setup}
library(ATE.ERROR)
set.seed(1)
data(Simulated_data)
Y_star <- Simulated_data$Y_star
Y <- Simulated_data$Y
A <- Simulated_data$T
Z <- Simulated_data$Z
X_star <- Simulated_data$X_star
X <- Simulated_data$X
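p11 <- 0.8                     # misclassification parameters for Y_star
p10 <- 0.2
sigma_epsilon <- 0.1           # measurement error variance for X_star
B <- 100                       # number of simulation steps
Lambda <- seq(0, 2, by = 0.5)  # grid of lambda values used for extrapolation
bootstrap_number <- 10         # number of bootstrap samples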
```

This chunk loads the package and the `Simulated_data` dataset and extracts the variables used below. The probabilities `p11` and `p10`, which govern misclassification of the outcome, are set to 0.8 and 0.2, respectively. We also set the measurement error variance `sigma_epsilon`, the number of simulation steps (`B`), the sequence of lambda values (`Lambda`), and the number of bootstrap samples (`bootstrap_number`).
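
To see why `p11` and `p10` matter, the following sketch simulates a misclassified binary outcome. It assumes, as a labeled convention rather than something stated above, that `p11` is $P(Y^* = 1 \mid Y = 1)$ and `p10` is $P(Y^* = 1 \mid Y = 0)$, and that misclassification does not depend on treatment. Under those assumptions $E[Y^* \mid A] = p_{10} + (p_{11} - p_{10})\,E[Y \mid A]$, so a difference in means computed from $Y^*$ is shrunk by the factor $p_{11} - p_{10}$. All variable names (`trt`, `y_true`, `y_obs`, and the `_demo` suffixes) are hypothetical, and the data are generated on the spot rather than taken from `Simulated_data`.

```r
# Illustrative sketch only; not code from the ATE.ERROR package.
set.seed(123)
n_demo   <- 100000
p11_demo <- 0.8   # assumed P(Y* = 1 | Y = 1)
p10_demo <- 0.2   # assumed P(Y* = 1 | Y = 0)

# Randomized binary treatment and a true binary outcome.
trt    <- rbinom(n_demo, 1, 0.5)
y_true <- rbinom(n_demo, 1, ifelse(trt == 1, 0.7, 0.4))

# Misclassify the outcome independently of treatment.
y_obs <- rbinom(n_demo, 1, ifelse(y_true == 1, p11_demo, p10_demo))

# Difference in means on the true and on the misclassified outcome.
diff_true <- mean(y_true[trt == 1]) - mean(y_true[trt == 0])
diff_obs  <- mean(y_obs[trt == 1])  - mean(y_obs[trt == 0])

# Dividing by (p11 - p10) undoes the attenuation.
diff_corrected <- diff_obs / (p11_demo - p10_demo)

round(c(true = diff_true, observed = diff_obs, corrected = diff_corrected), 3)
```

In this randomized toy setting the correction is a simple rescaling. With confounding and covariate measurement error, as in the data used here, the correction is more involved.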

### Applying the ATE.ERROR.XY Function

We apply the `ATE.ERROR.XY` function using different types of extrapolation: linear, quadratic, and nonlinear. The results from each extrapolation are stored in separate variables.
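
The role of the extrapolation function can be illustrated with a generic simulation-extrapolation (SIMEX) sketch for a regression slope. This is not the package's internal code. It assumes a simple linear model, a known measurement error variance, and the usual SIMEX convention of evaluating the fitted curve at lambda = -1. The names `sigma2_u`, `B_demo`, `lam_grid`, and `slope_at` are hypothetical; they play roles loosely analogous to `sigma_epsilon`, `B`, and `Lambda` above.

```r
# Illustrative SIMEX sketch for a regression slope; not the package's internal code.
set.seed(2024)
n_sim    <- 5000
sigma2_u <- 0.5                  # measurement error variance (assumed known)
B_demo   <- 50                   # simulation replicates per lambda
lam_grid <- seq(0, 2, by = 0.5)  # lambda grid

x_true <- rnorm(n_sim)
y_out  <- 1 + 2 * x_true + rnorm(n_sim)               # true slope is 2
x_obs  <- x_true + rnorm(n_sim, sd = sqrt(sigma2_u))  # error-prone covariate

# Simulation step: add extra noise with variance lambda * sigma2_u,
# refit the naive model, and average the slope over B_demo replicates.
slope_at <- function(lam) {
  mean(replicate(B_demo, {
    x_noisy <- x_obs + rnorm(n_sim, sd = sqrt(lam * sigma2_u))
    coef(lm(y_out ~ x_noisy))[2]
  }))
}
slopes <- vapply(lam_grid, slope_at, numeric(1))

# Extrapolation step: fit a curve in lambda and evaluate it at lambda = -1.
fit_lin  <- lm(slopes ~ lam_grid)
fit_quad <- lm(slopes ~ lam_grid + I(lam_grid^2))
new_pt   <- data.frame(lam_grid = -1)

c(naive     = slopes[1],
  linear    = unname(predict(fit_lin,  new_pt)),
  quadratic = unname(predict(fit_quad, new_pt)))
```

In this toy example the slope shrinks as lambda grows, and both extrapolants move the estimate back toward the true value of 2, the quadratic one further than the linear one. Neither recovers it exactly, which is why the choice of extrapolation function can change the corrected estimate.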

```{r}
ATE.ERROR.XY_results_linear <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon, 
                                            B, Lambda, extrapolation = "linear", 
                                            bootstrap_number)
```

```{r}
ATE.ERROR.XY_results_quadratic <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon, 
                                               B, Lambda, extrapolation = "quadratic", 
                                               bootstrap_number)

```

```{r}
ATE.ERROR.XY_results_nonlinear <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon, 
                                               B, Lambda, extrapolation = "nonlinear", 
                                               bootstrap_number)
```

### Combining Summaries

The summaries from the three extrapolation methods are stacked into one table.

```{r}
combined_summary <- rbind(
  ATE.ERROR.XY_results_linear$summary,
  ATE.ERROR.XY_results_quadratic$summary,
  ATE.ERROR.XY_results_nonlinear$summary)
```

### Adding the True ATE to the Result Summary

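The true ATE, estimated from the error-free variables `Y` and `X`, is prepended to the result summary as its first column. The naive estimate, computed from the error-prone `Y_star` and `X_star`, is also stored here for use as a reference line in the boxplot: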

```{r}
True_ATE <- True_Estimation(Y, A, Z, X)
```

```{r}
Naive_ATE_XY <- Naive_Estimation(Y_star, A, Z, X_star)
```
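
For intuition about what an ATE estimator of this kind can look like, here is a minimal inverse probability weighting (IPW) sketch in base R. Whether `True_Estimation` and `Naive_Estimation` use exactly this estimator is an assumption not confirmed in this vignette. The data are simulated on the spot, and every name (`conf`, `trt`, `y_bin`, `ps`, `ate_ipw`, `ate_crude`) is hypothetical.

```r
# Illustrative IPW sketch on freshly simulated data; not the package's implementation.
set.seed(42)
n_ipw <- 20000
conf  <- rnorm(n_ipw)                          # a single confounder
trt   <- rbinom(n_ipw, 1, plogis(0.5 * conf))  # treatment depends on the confounder
y_bin <- rbinom(n_ipw, 1, plogis(-0.5 + 1.0 * trt + 0.8 * conf))

# Estimate propensity scores with logistic regression.
ps <- fitted(glm(trt ~ conf, family = binomial()))

# Normalized inverse probability weighted means for each treatment arm.
mu1 <- sum(trt * y_bin / ps) / sum(trt / ps)
mu0 <- sum((1 - trt) * y_bin / (1 - ps)) / sum((1 - trt) / (1 - ps))
ate_ipw <- mu1 - mu0

# Unadjusted difference in means, which is confounded by `conf`.
ate_crude <- mean(y_bin[trt == 1]) - mean(y_bin[trt == 0])

round(c(ipw = ate_ipw, crude = ate_crude), 3)
```

Applying the same estimator to error-free inputs and then to error-prone inputs is one way to obtain a "true" and a "naive" benchmark of the kind compared in this vignette.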

```{r}
combined_summary <- data.frame(True_ATE = round(True_ATE, 3), combined_summary)
print(combined_summary)
```

This table summarizes the results from the `ATE.ERROR.XY` function with different extrapolation methods. It includes the True ATE, Naive ATE, measurement error variance `sigma_epsilon`, probabilities `p10` and `p11`, type of extrapolation, ATE, a standard error (SE), and a 95% confidence interval (CI).
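
The SE and CI columns can be read with a generic bootstrap in mind. The sketch below resamples rows with replacement, re-estimates a simple difference in means, and forms a normal-approximation 95% interval from the bootstrap standard deviation. It is only an assumption that the package's interval is built in a comparable way. The estimator, the data, and the names `est_fun`, `boot_est`, `se_boot`, and `ci_95` are hypothetical.

```r
# Illustrative bootstrap sketch; the estimator here is a simple difference in means.
set.seed(7)
n_bs  <- 500
trt   <- rbinom(n_bs, 1, 0.5)
y_bin <- rbinom(n_bs, 1, ifelse(trt == 1, 0.6, 0.4))

# Difference in means computed on the rows selected by `idx`.
est_fun <- function(idx) {
  mean(y_bin[idx][trt[idx] == 1]) - mean(y_bin[idx][trt[idx] == 0])
}

point_est <- est_fun(seq_len(n_bs))

# Resample rows with replacement and re-estimate 200 times.
boot_est <- replicate(200, est_fun(sample.int(n_bs, replace = TRUE)))

# Bootstrap standard error and normal-approximation 95% interval.
se_boot <- sd(boot_est)
ci_95   <- point_est + c(-1, 1) * qnorm(0.975) * se_boot

list(estimate = point_est, se = se_boot, ci = ci_95)
```

A standard error computed from only a handful of bootstrap replicates is itself noisy, so a larger `bootstrap_number` than the 10 used here for speed is usually preferable in a real analysis.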

### Visualizing the Distribution of ATE Estimates Using a Boxplot

We create a boxplot of the ATE estimates returned by the `ATE.ERROR.XY` function for each extrapolation method, with the naive and true values drawn as horizontal reference lines.

```{r, fig.width=8.5, fig.height=4}
combined_data <- rbind(
  ATE.ERROR.XY_results_linear$boxplot$data,
  ATE.ERROR.XY_results_quadratic$boxplot$data,
  ATE.ERROR.XY_results_nonlinear$boxplot$data
)

combined_plot <- ggplot(combined_data, aes(x = Method, y = ATE, fill = Method)) +
  geom_boxplot() +
  geom_hline(aes(yintercept = Naive_ATE_XY, color = "naive estimate"), 
             linetype = "dashed") +
  geom_hline(aes(yintercept = True_ATE, color = "true estimate"), 
             linetype = "dashed") +
  scale_color_manual(name = NULL, values = c("naive estimate" = "red", 
                                             "true estimate" = "blue")) +
  labs(title = paste0("ATE Estimates from the ATE.ERROR.XY Method with Different\n",
                      "Approximations of the Extrapolation Function"),
       y = "ATE Estimate") +
  theme_minimal() +
  theme(legend.position = "right") +
  guides(fill = guide_legend(title = NULL, order = 1),
         color = guide_legend(title = NULL, override.aes = list(linetype = "dashed"),
                              order = 2))

print(combined_plot)
```

### Explanation

- **Naive Estimation**: The `Naive_Estimation` function calculates the naive estimate of the ATE without correcting for measurement error or misclassification.
- **ATE.ERROR.XY Function**: The `ATE.ERROR.XY` function corrects for both measurement error in the covariate and misclassification in the outcome, with the aim of reducing the bias of the naive estimate.
- **Summary Table**: The summary table includes naive and true estimates, measurement error variance, extrapolation method, standard error, and confidence intervals.
- **Boxplot**: The boxplot visualizes the distribution of ATE estimates using different extrapolation methods, with dashed lines indicating the naive and true estimates.

This vignette showed how to apply the `ATE.ERROR.XY` function with three extrapolation functions, how to read the resulting summary table, and how to compare the corrected estimates against the naive and true values in a boxplot.