---
title: "Illustration on a simulated multipartite network"
author: "Sophie Donnet, Pierre Barbillon"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 4
bibliography: references.bib
link-citations: yes
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Illustration on a simulated multipartite network}
  %\VignetteEncoding{UTF-8}
---


<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
 # fig.path = "man/figures/README-",
  out.width = "100%"
)
```


<!-- badges: start -->
<!-- badges: end -->




We present the performances of GREMLINS on a simulated multipartite network.  GREMLINS includes a function `rMBM` to simulate multipartite networks.  Mathematical details can be found in @multipartite. 



## Simulation of a complex multipartite network. 


We use the function  `rMBM` provided in the package to simulate a  multipartite network involving $2$ functional groups (namely A and B) of respective sizes 
$$n_A = 60, \quad, n_B = 50.$$ 

A and B are  divided respectively into   $3$ and $2$ blocks. The sizes of the blocks are generated randomly. For reproductibility, we fix the random seed to an arbitrarily chosen value.  

```{r param FG}
namesFG <- c('A','B')
v_NQ <-  c(60,50) #size of each FG
list_pi = list(c(0.16 ,0.40 ,0.44),c(0.3,0.7)) #proportion of each block in each  FG
list_pi[[1]]
``` 

We assume that we observe $3$ interactions matrices

    - A-B : continuous weighted interactions
    - B-B : binary interactions
    - A-A : counting directed interactions
  

```{r param 1 net}
E  <-  rbind(c(1,2),c(2,2),c(1,1))
typeInter <- c( "inc","diradj", "adj")
v_distrib <- c('ZIgaussian','bernoulli','poisson')
```

Note that the distributions may be `Bernoulli`, `Poisson`, `Gaussian` or `Laplace` (with null mean). For the Gaussian distribution, a mean and a variance must be given. 
We generate randomly the  emission parameters $\theta$. 


```{r param net}
list_theta <- list()
list_theta[[1]] <- list()
list_theta[[1]]$mean  <- matrix(c(6.1, 8.9, 6.6, 9.8, 2.6, 1.0), 3, 2)
list_theta[[1]]$var  <-  matrix(c(1.6, 1.6, 1.8, 1.7 ,2.3, 1.5),3, 2)
list_theta[[1]]$p0  <-  matrix(c(0.4, 0.1, 0.6, 0.5 , 0.2, 0),3, 2)
list_theta[[2]] <- matrix(c(0.7,1.0, 0.4, 0.6),2, 2)
m3 <- matrix(c(2.5, 2.6 ,2.2 ,2.2, 2.7 ,3.0 ,3.6, 3.5, 3.3),3,3 )
list_theta[[3]] <- (m3 + t(m3))/2# for symetrisation
``` 

We are now ready to simulate the data



```{r simul 2, eval = TRUE, echo = TRUE}
library(GREMLINS)
dataSim <- rMBM(v_NQ,E , typeInter, v_distrib, list_pi,
                list_theta, namesFG = namesFG, seed = 4,keepClassif  = TRUE)
list_Net <- dataSim$list_Net
length(list_Net)
names(list_Net[[1]])
list_Net[[1]]$typeInter
list_Net[[1]]$rowFG
list_Net[[1]]$colFG
```

## Inference with model selection 

The model selection and the estimation are performed with the function `multipartiteBM`.

```{r MBM simul, echo = TRUE, eval = TRUE}
res_MBMsimu <- multipartiteBM(list_Net, 
                              v_distrib = v_distrib, 
                              namesFG = c('A','B'),
                              v_Kinit = c(2,2),
                              nbCores = 2,
                              initBM = FALSE,
                              keep = FALSE)
```

We can now get the estimated parameters. 

```{r estim param, eval=TRUE}
res_MBMsimu$fittedModel[[1]]$paramEstim$list_theta$AB$mean
```


`extractClustersMBM` produces the clusters in each functional group. 
```{r extract cluster, eval=TRUE}
Cl <- extractClustersMBM(res_MBMsimu)
``` 


## Inference without model selection 
One may also want to estimate the parameters for given numbers of clusters. The function `multipartiteBMFixedModel` is designed for this task. 
```{r MBM fixed, echo = TRUE, eval = TRUE}
res_MBMsimu_fixed <- multipartiteBMFixedModel(list_Net, v_distrib = v_distrib, nbCores = 2,namesFG = namesFG, v_K = c(3,2))
res_MBMsimu_fixed$fittedModel[[1]]$paramEstim$v_K
extractClustersMBM(res_MBMsimu_fixed)$A
```


## Missing data 

GREMLINS is also able to handle missing data. In the following experiment, we artificially set missing data in the previously simulated matrices. 


```{r sim NA}
############# NA data at random in any matrix
epsilon =  10/100
list_Net_NA <- list_Net
for (m in 1:nrow(E)){
   U <-  sample(c(1,0),v_NQ[E[m,1]]*v_NQ[E[m,2]],replace=TRUE,prob  = c(epsilon, 1-epsilon))
   matNA <- matrix(U,v_NQ[E[m,1]],v_NQ[E[m,2]])
   list_Net_NA[[m]]$mat[matNA== 1] = NA
   if (list_Net_NA[[m]]$typeInter == 'adj') {
     M <- list_Net_NA[[m]]$mat
     diag(M) <- NA
     M[lower.tri(M)] = t(M)[lower.tri(M)]
     list_Net_NA[[m]]$mat <- M
     }
}
``` 

```{r MBM simul NA, echo = TRUE, eval = TRUE}
res_MBMsimuNA <- multipartiteBM(list_Net_NA, 
                              v_distrib = v_distrib, 
                              namesFG = c('A','B'),
                              v_Kinit = c(2,2),
                              nbCores = 2,
                              keep = FALSE)
```

We then have a function to predict the missing edges (probability if binary or intensity if weighted)

```{r MBM predict NA, echo = TRUE, eval = TRUE}
pred <- predictMBM(res_MBMsimuNA)
```




## References