---
title: "NCT: Null Distributions"
bibliography: ../inst/REFERENCES.bib
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{NCT: Null Distributions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

The **GGMncv** network comparison test was vetted by investigating 
the distribution of $p$-values under the null hypothesis, which 
should be uniform. If so, this would indicate that the implementation
is correct, in so far as providing error rates at the nominal level 
(i.e., $\alpha$).


On the other hand, power to detect differences was not investigated 
compared to the implementation in the `R` package **NetworkComparisonTest** [@van2017comparing]. This could be an interesting future direction.

# De-Sparsified Glasso

Following the approach of @jankova2015confidence, the de-sparsified glasso
estimator can be computed without data driven selection
of the regularization parameter. This is accomplished by setting 
$\lambda = \sqrt(log(p)/n)$. The advantage is that this will be very (very) 
fast. That speed, however, should not come at the cost of a compromised
test statistic under the null. 

The following looks at the default test statistics using 
`desparsified = TRUE` (the default). Note for actual use, `iter`
should be set to, say, at least 1,000 and perhaps even 10,000 to get 
a stable $p$-value.

```r
library(GGMncv)
library(car)

main <- gen_net(p = 10, edge_prob = 0.1)

sims <- 1000

p_st <- p_max <- p_kl <- p_sse <- rep(0, sims)

for(i in 1:sims){
  
  y1 <- MASS::mvrnorm(n = 500, 
                      mu = rep(0, 10), 
                      Sigma = main$cors)
  
  y2 <- MASS::mvrnorm(n = 500, 
                      mu = rep(0, 10), 
                      Sigma = main$cors)
  
  fit <- nct(y1, y2, 
               desparsify = TRUE, 
               cores = 1,
               iter = 250,
               penalty = "lasso",
               update_progress = 2,
               penalize_diagonal = FALSE)
  
  p_kl[i]  <- fit$jsd_pvalue
  p_st[i]  <- fit$glstr_pvalue
  p_max[i] <- fit$max_pvalue
  p_sse[i] <- fit$sse_pvalue

}
```


## Distribution of $p$-values

The following distribution should be uniform. Because
I only used 1,000 simulations, however, there might be some
slight departures.

```r
par(mfrow=c(2,2))

hist(p_kl, main = "KL-divergence", 
     xlab = "p-values")

hist(p_st, main = "Global Strength", 
     xlab = "p-values")

hist(p_max, main = "Maximum Difference", 
     xlab = "p-values")

hist(p_sse, main = "Sum of Squares", 
     xlab = "p-values")
```


![](../man/figures/p_dist.png)

That appears really close! Also note a seed was not set, so if 
the above code is run, the results could be a bit different.

## QQplot

Next is a qqplot. Here the $p$-values should fall along the line that
corresponds to the quantiles calculated from a theoretical, in this 
case uniform, distribution.

```r
par(mfrow=c(2,2))

car::qqPlot(
  p_kl,
  distribution  = "unif",
  ylab = "p-values",
  id = FALSE,
  main = 'KL-divergence'
)

car::qqPlot(
  p_st,
  distribution  = "unif",
  ylab = "p-values",
  id = FALSE,
  main = "Global Strength"
)

car::qqPlot(
  p_max,
  distribution  = "unif",
  ylab = "p-values",
  id = FALSE,
  main = "Maximum Difference"
)

car::qqPlot(
  p_sse,
  distribution  = "unif",
  ylab = "p-values",
  id = FALSE,
  main = "Sum of Squares"
)
```

![](../man/figures/qqplot.png)

Again, that looks pretty good.

## Error rate

Here is the type 1 error rate for each test statistic

```r
# kl
mean(p_kl < 0.05)

#> [1] 0.042

# global strength
mean(p_st < 0.05)

#> [1] 0.06

# max diff
mean(p_max < 0.05)

#> [1] 0.041

# sum of squares
mean(p_sse < 0.05)

#> [1] 0.043
```

*Really* close to the nominal level.

# Discussion
For custom test stats (e.g., provided to `FUN`), it is really 
informative to investigate the the p-value distribution. Perhaps
there are some that are overly conservative, which would indicate
the test is not very powerful.

Although not included here, setting `desparsified = FALSE` was
also investigated. The results were much the same, but took
a lot longer (because, for each permutation, 
the tuning parameter is selected).

# References