---
title: "Confidence intervals with proportions"
bibliography: "../inst/REFERENCES.bib"
csl: "../inst/apa-6th.csl"
output: 
  rmarkdown::html_vignette
description: >
  This vignette describes how to plot confidence intervals with proportions.
vignette: >
  %\VignetteIndexEntry{Confidence intervals with proportions}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE, results = 'hide', warning = FALSE}
cat("this will be hidden; use for general initializations.\n")
library(ANOPA)
```

Probably the most useful tool for data analysis is a plot with
suitable error bars [@cgh21]. In this vignette, we show how
to compute and plot confidence intervals for proportions.

## Theory behind confidence intervals for proportions

For proportions, ANOPA is based on the Anscombe transform 
[@a48]. This transformed score has a known theoretical 
standard error that depends only on the sample size $n$:

$$SE_{A}(n) = 1/\sqrt{4(n+1/2)}.$$ 

Consequently, the standard error does not depend on the proportion itself,
and when the groups' sizes are similar, homogeneity of variances holds.
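The following base-R sketch illustrates this property. It assumes the usual form of the Anscombe transform for $s$ successes out of $n$ trials, $A = \arcsin\sqrt{(s+3/8)/(n+3/4)}$; the helper names `anscombe()` and `se_anscombe()` are hypothetical and are not part of the ANOPA API. Three groups of equal size but very different proportions receive the same standard error.

```r
# Hypothetical helpers, for illustration only (not ANOPA's API).
# Anscombe arcsine transform of s successes out of n trials.
anscombe <- function(s, n) asin(sqrt((s + 3/8) / (n + 3/4)))

# Theoretical standard error: a function of n only.
se_anscombe <- function(n) 1 / sqrt(4 * (n + 1/2))

# Three groups of equal size with very different proportions
s <- c(2, 10, 18)
n <- c(20, 20, 20)

A  <- anscombe(s, n)
SE <- se_anscombe(n)
print(round(A, 3))    # transformed scores differ across groups
print(round(SE, 4))   # standard errors are identical across groups
```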

From this, we can decompose the total test statistic $F$ into 
a component for each cell of the design. We thus get, for the
transformed score $A$ of each cell, the confidence interval

$$\left[
A + z_{0.5-\gamma/2} \times SE_{A}(n), \; A + z_{0.5+\gamma/2} \times SE_{A}(n)
\right]$$

in which $SE_{A}(n)$ is the theoretical standard error based
only on $n$, $z_p$ is the $p$ quantile of the standard normal distribution,
and $\gamma$ is the desired confidence level (often .95).
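As a minimal sketch of this formula in base R (assuming the usual Anscombe form $A = \arcsin\sqrt{(s+3/8)/(n+3/4)}$; the function `standalone_ci()` is hypothetical, not an ANOPA function), the interval for a single cell with 12 successes out of 30 can be computed as follows. `qnorm()` supplies the $z$ quantiles, so the lower bound uses a negative quantile.

```r
# Hypothetical helper: stand-alone CI on the Anscombe scale for one cell.
standalone_ci <- function(s, n, gamma = 0.95) {
  A  <- asin(sqrt((s + 3/8) / (n + 3/4)))   # Anscombe-transformed score
  SE <- 1 / sqrt(4 * (n + 1/2))             # theoretical standard error
  # z_{0.5 - gamma/2} is negative, z_{0.5 + gamma/2} is positive
  c(lower = A + qnorm(0.5 - gamma / 2) * SE,
    upper = A + qnorm(0.5 + gamma / 2) * SE)
}

ci <- standalone_ci(s = 12, n = 30)
print(ci)
```

The interval is symmetric around $A$ on the transformed scale, with a half-width of about $1.96 \times SE_A(n)$ when $\gamma = .95$.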

This technique returns _stand-alone_ confidence intervals, that is, intervals
which can be used to compare the proportion to a fixed point. However,
such _stand-alone_ intervals cannot be used to compare one proportion
to another proportion [@cgh21]. To compare an observed
proportion to another observed proportion, the intervals must be adjusted
for pair-wise differences [@b12]. This is achieved by 
increasing the width of the intervals by a factor of $\sqrt{2}$.
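The factor $\sqrt{2}$ comes from the variance of a difference: two independent scores with the same standard error $SE$ have a difference whose standard error is $\sqrt{SE^2 + SE^2} = \sqrt{2}\,SE$. The sketch below (hypothetical helper `adjusted_ci()`, same assumed Anscombe form as before, not an ANOPA function) checks that the adjusted interval is $\sqrt{2}$ times wider while keeping the same centre.

```r
# Hypothetical helper: optionally difference-adjusted CI on the Anscombe scale.
adjusted_ci <- function(s, n, gamma = 0.95, difference_adjusted = TRUE) {
  A  <- asin(sqrt((s + 3/8) / (n + 3/4)))
  SE <- 1 / sqrt(4 * (n + 1/2))
  if (difference_adjusted) SE <- sqrt(2) * SE   # SE of a pair-wise difference
  A + qnorm(c(0.5 - gamma / 2, 0.5 + gamma / 2)) * SE
}

standalone <- adjusted_ci(12, 30, difference_adjusted = FALSE)
adjusted   <- adjusted_ci(12, 30)
print(diff(adjusted) / diff(standalone))   # ratio of widths
```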

Also, in repeated-measures designs, the correlation between measures improves
the precision of the estimates. As such, when the correlation is positive, the
interval width can be reduced by multiplying it by $\sqrt{1-\alpha_1}$, where $\alpha_1$ is 
a measure of correlation in a matrix containing the repeated measures
(based on the unitary alpha measure).
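A minimal sketch of the correlation adjustment follows. It assumes that the difference adjustment and the correlation adjustment are applied together as a product, $\sqrt{2}\,\sqrt{1-\alpha_1}$, and it takes $\alpha_1$ as given rather than computing it; the value 0.40 is purely illustrative and `ci_repeated()` is a hypothetical helper, not an ANOPA function.

```r
# Hypothetical helper: difference- and correlation-adjusted CI.
# alpha1 is supplied by the user here; it is not estimated from data.
ci_repeated <- function(s, n, alpha1, gamma = 0.95) {
  A  <- asin(sqrt((s + 3/8) / (n + 3/4)))
  SE <- 1 / sqrt(4 * (n + 1/2))
  SE <- SE * sqrt(2) * sqrt(1 - alpha1)   # both adjustments combined
  A + qnorm(c(0.5 - gamma / 2, 0.5 + gamma / 2)) * SE
}

no_corr  <- ci_repeated(12, 30, alpha1 = 0)      # no correlation
pos_corr <- ci_repeated(12, 30, alpha1 = 0.40)   # illustrative positive value
print(diff(pos_corr) / diff(no_corr))            # shrinkage factor sqrt(1 - 0.40)
```

With $\alpha_1 = 0$ the interval reduces to the difference-adjusted one; the larger the positive correlation, the narrower the interval.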

Finally, the above returns confidence intervals for the *transformed* scores. 
However, in a plot, it is typically more convenient to show
proportions (from 0 to 1) rather than Anscombe scores (from 0 to $\pi/2 \approx$ 1.57).
Thus, the vertical axis can be rescaled using the inverse Anscombe
transform so that it shows proportions.
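One simple way to map interval endpoints back to proportions is $p = \sin^2(A)$, the inverse of $\arcsin\sqrt{p}$ on $[0, \pi/2]$. The sketch below uses that inverse, ignores the small $3/8$ and $3/4$ corrections, and clamps endpoints that fall outside $[0, \pi/2]$; this is an assumption for illustration and not necessarily what `anopaPlot()` does internally. The helper `to_proportion()` and the endpoint values are hypothetical.

```r
# Hypothetical helper: back-transform Anscombe-scale values to proportions.
to_proportion <- function(A) {
  A <- pmin(pmax(A, 0), pi / 2)   # keep values inside [0, pi/2]
  sin(A)^2                        # inverse of asin(sqrt(p))
}

# Illustrative endpoints, two of them outside [0, pi/2]
A_ci <- c(lower = -0.05, centre = 0.60, upper = 1.70)
p <- to_proportion(A_ci)
print(p)
```

Clamping matters because $\sin^2$ is not monotone outside $[0, \pi/2]$: without it, a lower bound slightly below 0 would be mapped to a small positive proportion instead of 0.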

This is it.

## Complicated?

Well, not really:

```{r, message=FALSE, warning=FALSE, fig.width=5, fig.height=3, fig.cap="**Figure 1**. The proportions as a function of Class and Difficulty. Error bars show difference-adjusted 95% confidence intervals."}
library(ANOPA)
w <- anopa( {success;total} ~ Class * Difficulty, twoWayExample)
anopaPlot(w) 
```

Because the analysis `summary(w)` suggests that only the factor
`Difficulty` has a significant effect, you may select only that factor for plotting, 
e.g., 

```{r, message=FALSE, warning=FALSE, fig.width=4, fig.height=3,  fig.cap="**Figure 2**. The proportions as a function of Difficulty only. Error bars show difference-adjusted 95% confidence intervals."}
anopaPlot(w, ~ Difficulty ) 
```

As is the case with any ``ggplot2`` figure, you can customize it at will. 
For example,

```{r, message=FALSE, warning=FALSE, fig.width=4, fig.height=3, fig.cap="**Figure 3**. Same as Figure 2 with some visual improvements."}
library(ggplot2)
anopaPlot(w, ~ Difficulty) + 
    theme_bw() +                                                   # change theme
    scale_x_discrete(limits = c("Easy", "Moderate", "Difficult"))  # change order of levels
```

As you can see from this plot, the proportions differ markedly across the levels of
Difficulty, and the two most different conditions are Easy and Difficult.

Here you go.


# References