---
title: "Refine SuSiE model"
author: "Yuxin Zou"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Refine SuSiE model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE,comment = "#",fig.width = 5,
                      fig.height = 3,fig.align = "center",
                      fig.cap = " ",dpi = 120)
```

In this vignette, we demonstrate a procedure that helps SuSiE get out of local optimum.

We simulate phenotype using UK Biobank genotypes from 50,000 individuals. There are 1001 SNPs.
It is simulated to have exactly 2 non-zero effects at 234, 287.

```{r}
library(susieR)
library(curl)
data_file <- tempfile(fileext = ".RData")
data_url <- paste0("https://raw.githubusercontent.com/stephenslab/susieR/",
                   "master/inst/datafiles/FinemappingConvergence1k.RData")
curl_download(data_url,data_file)
load(data_file)
b <- FinemappingConvergence$true_coef
susie_plot(FinemappingConvergence$z, y = "z", b=b)
```

The strongest marginal association is a non-effect SNP. 

Since the sample size is large, we use sufficient statistics ($X^\intercal X, X^\intercal y, y^\intercal y$ and sample size $n$) to fit susie model. It identifies 2 Credible Sets, one of them is false positive. This is because `susieR` get stuck around a local minimum. 

```{r}
fitted <- with(FinemappingConvergence,
               susie_suff_stat(XtX = XtX, Xty = Xty, yty = yty, n = n))
susie_plot(fitted, y="PIP", b=b, main=paste0("ELBO = ", round(susie_get_objective(fitted),2)))
```

Our refine procedure to get out of local optimum is

1. fit a susie model, $s$ (suppose it has $K$ CSs).

2. for CS in $s$, set SNPs in CS to have prior weight 0, fit susie model --> we have K susie models: $t_1, \cdots, t_K$.

3. for each $k = 1, \cdots, K$, fit susie with initialization at $t_k$ ($\alpha, \mu, \mu^2$) --> $s_k$

4. if $\max_k \text{elbo}(s_k) > \text{elbo}(s)$, set $s = s_{kmax}$ where $kmax = \arg_k \max \text{elbo}(s_k)$ and go to step 2; if no, break.

We fit susie model with above procedure by setting `refine = TRUE`.

```{r}
fitted_refine <- with(FinemappingConvergence,
                      susie_suff_stat(XtX = XtX, Xty = Xty, yty = yty,
					                  n = n, refine=TRUE))
susie_plot(fitted_refine, y="PIP", b=b, main=paste0("ELBO = ", round(susie_get_objective(fitted_refine),2)))
```

With the refine procedure, it identifies 2 CSs with the true signals, and the achieved evidence lower bound (ELBO) is higher.

## Session information

Here are some details about the computing environment, including the
versions of R, and the R packages, used to generate these results.

```{r}
sessionInfo()
```