---
title: "AUtests: approximate unconditional and permutation tests for 2x2 tables"
author: "Arjun Sondhi"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{AUtests: approximate unconditional and permutation tests for 2x2 tables}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

This package contains functions for association testing in 2x2 tables (i.e., tables of two binary variables). The scientific setting that motivated its development was testing for associations between diseases and rare genetic variants in case-control studies. When the expected number of subjects possessing a variant is small, standard methods perform poorly: they tend to be overly conservative in controlling the Type I error.

The two alternative methods implemented in the package are permutation testing and approximate unconditional (AU) testing. 

## Permutation tests

Permutation testing works by computing a test statistic T for the observed data, generating all plausible datasets with the same total number of exposed subjects, then adding up the probabilities of those datasets which give more extreme test statistics than T.
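
The enumeration described above can be sketched in base R for a single statistic. This is a minimal illustration, not the package's implementation: `score_stat` and `perm_score_sketch` are hypothetical helpers, the score statistic is assumed to be the Pearson chi-square statistic without continuity correction, and tables whose statistic ties with the observed one are counted as extreme.

```{r}
# Hypothetical helper (not part of AUtests): score statistic for a 2x2 table
# with m0 controls, m1 cases, r0 exposed controls and r1 exposed cases.
score_stat <- function(m0, m1, r0, r1) {
  m0 <- as.numeric(m0); m1 <- as.numeric(m1)
  n <- m0 + m1
  r <- r0 + r1
  denom <- m0 * m1 * r * (n - r)
  # The statistic is set to 0 when nobody (or everybody) is exposed
  ifelse(denom == 0, 0, n * (r1 * (m0 - r0) - r0 * (m1 - r1))^2 / denom)
}

# Hypothetical sketch of a permutation p-value based on the score statistic
perm_score_sketch <- function(m0, m1, r0, r1) {
  r <- r0 + r1
  # Every table that keeps the total number of exposed subjects fixed at r,
  # indexed by the number of exposed cases
  r1_all <- max(0, r - m0):min(r, m1)
  # Null probability of each table: hypergeometric allocation of the
  # r exposed subjects between cases and controls
  prob <- dhyper(r1_all, m1, m0, r)
  t_all <- score_stat(m0, m1, r - r1_all, r1_all)
  t_obs <- score_stat(m0, m1, r0, r1)
  # Small relative tolerance so that ties survive floating-point rounding
  sum(prob[t_all >= t_obs * (1 - 1e-7)])
}

perm_score_sketch(15000, 5000, 45, 55)
```

Because the total number of exposed subjects is held fixed, only one count varies across the enumerated tables, which keeps the calculation cheap even for large samples.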

The `perm.tests` function returns p-values from permutation tests based on score, likelihood ratio, Wald (with and without regularization), and Firth statistics. 

The following code runs the tests for a dataset containing 5,000 cases (55 with a minor allele of interest) and 15,000 controls (45 with a minor allele of interest):

```{r}
library(AUtests)
# Example data, 1:3 case-control ratio
perm.tests(15000, 5000, 45, 55)
```

For comparison purposes, the `basic.tests` function returns p-values for the standard score, likelihood ratio, Wald, Firth, and Fisher's exact tests:

```{r}
basic.tests(15000, 5000, 45, 55)
```

## Approximate unconditional tests

AU testing works by computing a test statistic T for the observed data, generating all plausible datasets with *any* number of variants, then adding up the probabilities of those datasets which give more extreme test statistics than T.
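
A comparable base R sketch of the AU idea is given here. It rests on assumptions that are not stated in this vignette: the null probability of each dataset is taken to be a product of two binomial probabilities with the exposure probability replaced by its pooled estimate, the statistic is the Pearson chi-square form of the score statistic, and ties with the observed statistic count as extreme. `au_score_sketch` is a hypothetical helper, and the counts in the example call are made up and kept small because the number of enumerated datasets grows with the product of the two group sizes.

```{r}
# Hypothetical sketch of an AU p-value based on the score statistic
au_score_sketch <- function(m0, m1, r0, r1) {
  # Local score statistic for m0 controls, m1 cases, a exposed controls
  # and b exposed cases (vectorised over a and b)
  stat <- function(a, b) {
    n <- m0 + m1
    r <- a + b
    denom <- as.numeric(m0) * m1 * r * (n - r)
    ifelse(denom == 0, 0, n * (b * (m0 - a) - a * (m1 - b))^2 / denom)
  }
  # Assumption: plug in the pooled estimate of the exposure probability
  p_hat <- (r0 + r1) / (m0 + m1)
  r0_all <- 0:m0
  r1_all <- 0:m1
  # Joint null probability of every (exposed controls, exposed cases) pair
  prob <- outer(dbinom(r0_all, m0, p_hat), dbinom(r1_all, m1, p_hat))
  t_all <- outer(r0_all, r1_all, stat)
  t_obs <- stat(r0, r1)
  # Small relative tolerance so that ties survive floating-point rounding
  sum(prob[t_all >= t_obs * (1 - 1e-7)])
}

# Small hypothetical data set: 200 controls (2 exposed), 200 cases (9 exposed)
au_score_sketch(200, 200, 2, 9)
```

Unlike the permutation sketch, both exposed counts vary here, so the full grid of datasets is enumerated; a practical implementation would need to avoid this brute-force grid for samples in the thousands.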

The `au.tests` function returns p-values from AU tests based on score, likelihood ratio, and Wald (with and without regularization) statistics. The `au.firth` function returns a p-value from the AU Firth test. It was implemented as a separate function due to its increased computational time.

The following code runs the tests for a dataset containing 10,000 cases (60 with a minor allele of interest) and 10,000 controls (45 with a minor allele of interest):

```{r}
# Example data, balanced case-control ratio
au.tests(10000, 10000, 45, 60)
au.firth(10000, 10000, 45, 60)
```

## AU and permutation likelihood ratio tests with categorical covariates

To gain precision or adjust for a confounding variable, it can be of interest to perform a stratified analysis. The `perm.test.strat` function implements a permutation likelihood ratio test that allows for categorical covariates, and the `au.test.strat` function implements a similar AU test. Both functions take vectors of controls, cases, controls with the exposure, and cases with the exposure, where the i-th element of each vector is the count for the i-th stratum.

Consider the following example data, with two strata (i.e., a binary covariate):
```{r}
m0list <- c(500, 1250) # controls
m1list <- c(150, 100)  # cases
r0list <- c(60, 20)    # exposed controls
r1list <- c(25, 5)     # exposed cases
```
A non-stratified analysis would yield a highly significant result:
```{r}
perm.tests(1750, 250, 80, 30)
au.tests(1750, 250, 80, 30)
```
When adjusting for the covariate, however, the result is much less significant:
```{r}
perm.test.strat(m0list, m1list, r0list, r1list)
au.test.strat(m0list, m1list, r0list, r1list)
```