---
title: "Three Ways to Test the Same Hypothesis"
author: "Donny Williams"
date: "5/23/2020"
bibliography: ../inst/REFERENCES.bib
output:
rmarkdown::html_vignette:
  toc: yes
vignette: >
  %\VignetteIndexEntry{Three Ways to Test the Same Hypothesis}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

# Introduction
On a Facebook methods group, there was a question about testing hypotheses in networks. In 
the comments, it was suggested that **BGGM** could be used to test the hypothesis. And it turns
out that **BGGM** really shines for testing expectations [see for example @rodriguez2020formalizing]. 

In this vignette, I demonstrate three ways to go about testing the same hypothesis, which is
essentially testing for a difference in the **sum** of partial correlations between groups.

### R package
```{r, eval = FALSE, message=FALSE}
# need the developmental version
if (!requireNamespace("remotes")) { 
  install.packages("remotes")   
}   

# install from github
remotes::install_github("donaldRwilliams/BGGM")
library(BGGM)
```

### Data
For demonstrative purposes, I use the `bfi` data and test the hypotheses in males and females

```
# data
Y <- bfi

# males
Y_males <- subset(Y, gender == 1, select = -c(education, gender))[,1:5]

# females
Y_females <- subset(Y, gender == 2, select = -c(education, gender))[,1:5]
```

# Approach 1: Posterior Difference
The first approach is rather straightforward, with the caveat that the method needs to be 
implemented by the user. Note that I could certainly implement this in **BGGM**, assuming there 
is enough interest. Please make a feature request [here](https://github.com/donaldRwilliams/BGGM/issues).

## Hypothesis
The hypothesis was that a sum of relations was larger in one group, for example,

$$
\begin{align}
\mathcal{H}_0: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) = (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3}) \\
\mathcal{H}_1: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) > (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3})
\end{align}
$$
Note that the hypothesis is related to the sum of relations, which is readily tested in **BGGM**. 

## Fit Models
The first step is to estimate the model for each group

```r
# fit female
fit_female <- estimate(Y_females, seed = 2)

# fit males
fit_male <- estimate(Y_males, seed = 1)
```
For an example, I used the default which is to assume the data is Gaussian. This can be changed with `type = ` either `binary`, `ordinal`, or `mixed`.

## Extract the Samples
The next step is to extract the posterior samples for each relation

```r
post_male <- posterior_samples(fit_male)[,c("A1--A2", "A1--A3")]

post_female <- posterior_samples(fit_female)[,c("A1--A2", "A1--A3")]
```

Note that the column names reflect the upper-triangular elements of the 
partial correlation matrix. Hence, the first name (e.g.,`A1`) must be located before 
the second name (e.g., `A2`) in the data matrix. This can be understood in reference 
to the column numbers: `1--2` is correct whereas `2--1` will result in an error.

## Sum and Compute Difference
The next step is to sum the relations and compute the difference

```r
# sum males
sum_male <- rowSums(post_male) 

# sum females
sum_female <- rowSums(post_female)

# difference
diff <- sum_male - sum_female
```
which can then be plotted
```r
# three column
par(mfrow=c(1,3))

# male sum
hist(sum_male)

# female sum
hist(sum_female)

# difference
hist(diff)
```
![](../man/figures/hyp_3ways_hist.png)




## Posterior Probability
Next compute the posterior probability the sum is larger in males than females

```r
# posterior prob
mean(sum_male > sum_female)

#> 0.737
```
and then the credible interval for the difference

```
quantile(diff, probs = c(0.025, 0.975))

#>        2.5%       97.5% 
#> -0.06498586  0.12481253 
```

# Approach 2: Predictive Check
The next approach is based on a posterior predictive check. The hypothesis is essentially the same as above, but for the predictive distribution, that is,

$$
\begin{align}
\mathcal{H}_0: (\rho^{male^{yrep}}_{A1--A2}\; + \; \rho^{male^{yrep}}_{A1--A3}) = (\rho^{female^{yrep}}_{A1--A2}\; + \; \rho^{female^{yrep}}_{A1--A3}) \\
\mathcal{H}_1: (\rho^{male^{yrep}}_{A1--A2}\; + \; \rho^{male^{yrep}}_{A1--A3}) > (\rho^{female^{yrep}}_{A1--A2}\; + \; \rho^{female^{yrep}}_{A1--A3})
\end{align}
$$
where the only difference is $yrep$. See more details [here](https://donaldrwilliams.github.io/BGGM/articles/ppc_custom.html).

## Define Function
The first step is to define a function to compute the difference in sums
```r
# colnames
cn <- colnames(Y_males)

# function
f <- function(Yg1, Yg2){
  
  # data
  Yg1 <- na.omit(Yg1)  
  Yg2 <- na.omit(Yg2)

  # estimate partials
  fit1 <- pcor_mat(estimate(Yg1, analytic = TRUE))
  fit2 <- pcor_mat(estimate(Yg2, analytic = TRUE))
  
  # names (not needed)
  colnames(fit1) <- cn
  rownames(fit1) <- cn
  colnames(fit2) <- cn
  rownames(fit2) <- cn
  
  # take sum
  sum1 <- fit1["A1", "A2"] + fit1["A1", "A3"]
  sum2 <- fit2["A1", "A2"] + fit2["A1", "A3"]
  
  # difference
  sum1 - sum2

}
```

Note that the function takes two data matrices and then returns a single value. 
Also, the default in **BGGM** does not require a custom function 
(only needs the data from each group).

## Predictive Check
The next step is to compute the observed difference and then perform the check.

```r
# observed
obs <- f(Y_males, Y_females)

# check
ppc <- ggm_compare_ppc(Y_males, Y_females, 
                       iter = 250, 
                       FUN = f, 
                       custom_obs = obs)

# print
ppc

#> BGGM: Bayesian Gaussian Graphical Models 
#> --- 
#> Test: Global Predictive Check 
#> Posterior Samples: 250 
#>   Group 1: 896 
#>   Group 2: 1813 
#> Nodes:  5 
#> Relations: 10 
#> --- 
#> Call: 
#> ggm_compare_ppc(Y_males, Y_females, iter = 250, FUN = f, custom_obs = obs)
#> --- 
#> Custom: 
#>  
#>    contrast custom.obs p.value
#>  Yg1 vs Yg2      0.029   0.264
#> --- 
```

Note this requires the user to determine $\alpha$. 

## Plot 
The check can also be plotted

```r
plot(ppc)
```
![](../man/figures/hyp_3ways_ppc.png)

where the red is the critical region.

# Approach 3: Bayesian Hypothesis Testing
The above approaches cannot provide evidence that the sum is equal. In other words, just because there was 
not a difference, this does not provide evidence for equality. The Bayes factor methods allow for formally 
assessing the equality model, that is,

$$
\begin{align}
\mathcal{H}_1&: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) > (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3}) \\
\mathcal{H}_2&: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) = (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3}) \\
\mathcal{H}_3&: \text{not} \; \mathcal{H}_1 \; \text{or} \; \mathcal{H}_2
\end{align}
$$

where $\mathcal{H}_3$ is the complement and can be understood as neither the first or second hypothesis.

## Test Hypothesis 
The hypothesis is easily translated to `R` code
```r
hyp <- c("g1_A1--A2 + g1_A1--A3 > g2_A1--A2 + g2_A1--A3; 
          g1_A1--A2 + g1_A1--A3 = g2_A1--A2 + g2_A1--A3")
```

Note the `g1` indicates the group and `;` separates the hypotheses. I again assume the data is Gaussian 
(although this can be changed to `type = "ordinal"` or `type = "mixed"`; see [here](https://donaldrwilliams.github.io/BGGM/reference/ggm_compare_confirm.html))

```r
test <- ggm_compare_confirm(Y_males, Y_females, 
                            hypothesis = hyp)


# print
test


#> BGGM: Bayesian Gaussian Graphical Models 
#> Type: continuous 
#> --- 
#> Posterior Samples: 25000 
#>   Group 1: 896 
#>   Group 2: 1813 
#> Variables (p): 5 
#> Relations: 10 
#> Delta: 15 
#> --- 
#> Call:
#> ggm_compare_confirm(Y_males, Y_females, hypothesis = hyp)
#> --- 
#> Hypotheses: 
#> 
#> H1: g1_A1--A2+g1_A1--A3>g2_A1--A2+g2_A1--A3
#> H2: g1_A1--A2+g1_A1--A3=g2_A1--A2+g2_A1--A3
#> H3: complement
#> --- 
#> Posterior prob: 
#> 
#> p(H1|data) = 0.13
#> p(H2|data) = 0.825
#> p(H3|data) = 0.046
#> --- 
#> Bayes factor matrix: 
#>       H1    H2     H3
#> H1 1.000 0.158  2.853
#> H2 6.349 1.000 18.113
#> H3 0.351 0.055  1.000
#> --- 
#> note: equal hypothesis prior probabilities
```


Note the posterior hypothesis probability for the equality model is 0.825. The Bayes factor matrix then divides those values, for example, $BF_{21}$ indicates the data were about 6 times more likely under $\mathcal{H}_2$ than $\mathcal{H}_1$. 

## Plot Hypothesis
The hypothesis can be plotted

```r
plot(test)
```
![](../man/figures/hyp_3ways_pie.png)

### Sensitivity Analysis
It is also important to check the robustness. Here the width of the prior distribution is decreased

```r
test <- ggm_compare_confirm(Y_males, Y_females, 
                            hypothesis = hyp, 
                            prior_sd = 0.15)
# print
test$out_hyp_prob

#> 0.18523406 0.74906147 0.06570447
```
which results in a probability of 0.75 for $\mathcal{H}_2$ ($BF_{21} = 4.04$).


# Conclusion
Three approaches for testing the same hypothesis were demonstrated in this vignette. This highlights that any hypothesis can be tested in **BGGM** and in several ways. 


# References