---
title: "MariNET - A Novel Framework for Inferring Dynamic Network Relationships from Longitudinal EHRs Using Linear Mixed Models"
author:
  - name: "Marina Vargas-Fernández"   
    affiliation: 
    - Department of Statistics and Operational Research. University of Granada
    - GENYO, Centre for Genomics and Oncological Research
    email: marina.vargas@genyo.es
  - name: "Jordi Martorell-Marugán"
    affiliation: 
    - GENYO, Centre for Genomics and Oncological Research
    - Andalusian Foundation for Biomedical Research in Eastern Andalusia (FIBAO)
    email: jordi.martorell@genyo.es
  - name: "Pedro Carmona-Sáez"
    affiliation: 
    - Department of Statistics and Operational Research. University of Granada
    - GENYO, Centre for Genomics and Oncological Research
    email: pcarmona@ugr.es
package: MariNET
date: "`r BiocStyle::doc_date()`"
abstract: >
  The rapid digitization of healthcare data, particularly through electronic health records (EHRs), has created unprecedented opportunities for biomedical research. EHRs contain rich, heterogeneous, and longitudinal data that, when analyzed at a systems level, can reveal complex patterns underlying disease progression, comorbidities, and patient trajectories. However, the high-dimensional and interdependent nature of these data poses significant analytical challenges, particularly when accounting for temporal dependencies and hierarchical structures inherent in longitudinal studies. Traditional methods, such as Gaussian Graphical Modeling and Vector Autoregression, often fall short in addressing these complexities due to strict assumptions of independence and stationarity, limiting their applicability to real-world EHR data. To overcome these limitations, we introduce MariNET, a novel methodology that leverages linear mixed models (LMMs) to infer network relationships from longitudinal EHR data. By incorporating weights derived from LMMs, our method effectively handles correlated observations and provides a robust framework for analyzing dynamic interactions among clinical variables over time. This approach not only enhances the understanding of temporal dependencies in healthcare data but also offers a scalable and practical solution for uncovering clinically relevant insights.
output: 
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
    css: styles.css
bibliography: references.bib  # Link the bibliography file
vignette: >
  %\VignetteIndexEntry{MariNET - A Novel Framework for Inferring Dynamic Network Relationships from Longitudinal EHRs Using Linear Mixed Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction to MariNET

The `MariNET` package provides tools for analyzing longitudinal clinical data using linear mixed models (LMM) and visualizing the results as networks. This vignette demonstrates how to use the package to perform longitudinal analysis and generate network plots.

The purpose of this vignette is to showcase the functionality of the package, including:

- Fitting linear mixed models to clinical data
- Visualizing the results as networks
- Comparing different network structures

# Installation

You can install the `MariNET` package from CRAN using:

```{r setup}
# Uncomment the next line to install the package
# install.packages("MariNET")
library("MariNET")
```

# Loading Data

In this section, we load the dataset included in the package. The sample data come from a previous assessment study of the relationships between COVID-19 and clinical variables related to mental health and social contact [@fried2022mental].

```{r load-data}
# Load the dataset from the package
data(example_data)

# Display the first few rows
head(example_data)

```

# Linear mixed-effects model network

This package focuses on using linear mixed models to construct networks. The methodology is not restricted to clinical data: it can be applied to longitudinal data from other fields, because the origin of the data does not affect the method's applicability [@lme42015].

For network construction, a separate linear mixed model is fitted for each clinical variable, which acts as the dependent variable while the remaining variables enter the model as predictors. This process is repeated for every variable in turn, as in previous studies [@van2018network].
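
The node-wise procedure described above can be sketched with `lme4` on simulated data. This is a minimal illustration under stated assumptions, not the internal code of `MariNET`: the data frame `toy`, the node names `a`, `b`, and `c`, and the convention of storing each predictor's t-value in the column of its response variable are hypothetical choices made for this example. The chunk is not evaluated when the vignette is built.

```{r, eval = FALSE}
library(lme4)

set.seed(1)
n_id  <- 20   # number of simulated participants
n_obs <- 15   # repeated measurements per participant
toy <- data.frame(id = factor(rep(seq_len(n_id), each = n_obs)))

# Draw one random intercept per participant and repeat it for each of their rows
subject_intercept <- function() rnorm(n_id)[as.integer(toy$id)]

toy$a <- subject_intercept() + rnorm(nrow(toy))
toy$b <- 0.5 * toy$a + subject_intercept() + rnorm(nrow(toy))
toy$c <- -0.3 * toy$b + subject_intercept() + rnorm(nrow(toy))

# Standardize the node variables (hypothetical preprocessing step)
toy_nodes <- c("a", "b", "c")
toy[toy_nodes] <- lapply(toy[toy_nodes], function(x) as.numeric(scale(x)))

# Empty weighted adjacency matrix; the diagonal stays at zero
adj <- matrix(0, nrow = length(toy_nodes), ncol = length(toy_nodes),
              dimnames = list(toy_nodes, toy_nodes))

for (response in toy_nodes) {
  predictors <- setdiff(toy_nodes, response)
  # Build e.g. a ~ b + c + (1 | id)
  model_formula <- reformulate(c(predictors, "(1 | id)"), response = response)
  fit <- lmer(model_formula, data = toy)
  # Store the fixed-effect t-values as edge weights (assumed convention)
  adj[predictors, response] <- coef(summary(fit))[predictors, "t value"]
}

round(adj, 2)
```

Because each model is fitted separately, the resulting matrix is generally not symmetric: the weight from `a` to `b` comes from a different model than the weight from `b` to `a`.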

```{r}
# Extract column names from the dataset
# These represent all available variables in the dataset
varLabs <- colnames(example_data)

# Define a list of variables to be removed from the analysis
# These variables are not included as nodes in the network visualization
remove <- c("id", "day", "beep", "conc")

# Filter out the unwanted variables
# Keeps only the variables that are not in the "remove" list
varLabs <- varLabs[!varLabs %in% remove]

# Print the final list of selected variables to be used as nodes in the network
print(varLabs)
```

The function *lmm_analysis()* is the main tool of this package. It takes the following arguments:

 - *clinical_data*: Data frame containing the clinical variables and metadata for each participant, including a participant identifier (named *participant_id* in the default settings). The identifier must be the first column of the data frame.
 - *variables_to_scale*: Character vector with the names of the variables to analyze. These variables must be numeric, because the function scales them.
 - *random_effects*: Character string specifying the random-effects part of the model formula (default: "(1 | participant_id)").
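
The input requirements listed above can be checked before fitting any model. The helper below is a hypothetical sketch in base R: `check_lmm_input()` and the toy data frame `toy_input` are not part of `MariNET`, and the checks simply mirror the conditions stated in this vignette. The chunk is not evaluated when the vignette is built.

```{r, eval = FALSE}
# Hypothetical helper (not part of MariNET): validate inputs before modeling
check_lmm_input <- function(clinical_data, variables_to_scale) {
  stopifnot(is.data.frame(clinical_data), is.character(variables_to_scale))

  # Every requested variable must exist in the data
  missing_vars <- setdiff(variables_to_scale, names(clinical_data))
  if (length(missing_vars) > 0) {
    stop("Variables not found in the data: ",
         paste(missing_vars, collapse = ", "))
  }

  # Every requested variable must be numeric so that it can be scaled
  is_num <- vapply(clinical_data[variables_to_scale], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric variables: ",
         paste(variables_to_scale[!is_num], collapse = ", "))
  }

  # The first column is reserved for the participant identifier
  if (any(variables_to_scale == names(clinical_data)[1])) {
    stop("The first column should hold the participant identifier, ",
         "not a variable to analyze.")
  }

  invisible(TRUE)
}

# Toy data with the identifier in the first column
toy_input <- data.frame(
  participant_id = rep(1:3, each = 2),
  mood  = c(1.2, 0.8, 2.1, 1.9, 0.3, 0.5),
  sleep = c(7, 6, 8, 7.5, 5, 6)
)

check_lmm_input(toy_input, c("mood", "sleep"))
```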

```{r}
# Perform Linear Mixed Model (LMM) analysis on the dataset
# This function iterates over selected variables (varLabs) and models their relationships
# while accounting for individual-level variability using a random effect.

model <- lmm_analysis(
  example_data,   # Input dataset containing clinical/longitudinal data
  varLabs,        # List of selected variables to be analyzed in the model
  random_effects = "(1|id)"  # Specifies a random intercept for each individual (id)
)

# Print the model results (optional, useful for debugging or reviewing output)
# print(model)

```

# Network visualization

To color the nodes of the plot by group, each variable must first be assigned to a community. Here, related symptoms are grouped together and each group is given its own color. Visualization is based on the qgraph package [@R-qgraph].

```{r qgraph-plot2, fig.width=3, fig.height=2, dpi=300}
# Define the community structure for the variables
# Assigns labels to different groups based on symptoms or categories
community_structure <- c(
  rep("Stress", 8),   # First 8 variables belong to the "Stress" group
  rep("Social", 6),   # Next 6 variables belong to the "Social" group
  rep("Covid-19", 4)  # Last 4 variables belong to the "Covid-19" group
)

# Create a dataframe linking variable names to their assigned community group
structure <- data.frame(varLabs, community_structure)

# Define labels for the network plot (using variable names)
labels <- varLabs

# Load the qgraph package for network visualization
library(qgraph)

# Generate the network plot using qgraph
qgraph(
  model,                                # Adjacency matrix or network model input
  groups = structure$community_structure, # Assign colors based on community groups
  labels = labels,                        # Display variable names as node labels
  legend = TRUE,                           # Include a legend in the plot
  layout = "spring",                       # Use a force-directed "spring" layout for better visualization
  color = c("orange", "lightblue", "#008080"), # Define colors for different groups
  legend.cex = 0.3                          # Adjust the size of the legend text
)


```
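
Assigning groups with `rep()` relies on the column order of the data. As an alternative sketch, groups can be looked up by node name, which fails loudly if a node has no group. The node names below are hypothetical and do not correspond to the columns of `example_data`. The `groups` argument of `qgraph()` accepts either a factor or a list of node indices, so the resulting list could be passed to it directly. The chunk is not evaluated when the vignette is built.

```{r, eval = FALSE}
# Hypothetical node names (not the columns of example_data)
node_names <- c("worry", "tired", "alone", "friends", "symptoms")

# Named lookup table: node name -> community
node_group <- c(
  worry    = "Stress",
  tired    = "Stress",
  alone    = "Social",
  friends  = "Social",
  symptoms = "Covid-19"
)

# Look up the community of each node, in the order of node_names
groups_vec <- unname(node_group[node_names])
stopifnot(!anyNA(groups_vec))  # stop if any node has no assigned community

# List of node indices per community, keeping the order of first appearance
groups_list <- split(
  seq_along(node_names),
  factor(groups_vec, levels = unique(groups_vec))
)

groups_list
```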

# Comparison between models

Because the weighted adjacency matrix is built from t-values, its entries are not bounded between -1 and 1. It is therefore not directly comparable with networks obtained from common modeling methods, which rely on correlations and pairwise estimation. To make networks comparable, each adjacency matrix is normalized by scaling its values by their range. The normalized matrices are then subtracted to highlight the differences between them.
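
The normalization and subtraction described above can be sketched in base R. This is one plausible reading of "scaling by range", namely dividing each matrix by its largest absolute value so that signs and zeros are preserved; the function in `MariNET` may normalize differently. The helpers `normalize_by_range()` and `network_difference()` and the matrices `m1` and `m2` are hypothetical. The chunk is not evaluated when the vignette is built.

```{r, eval = FALSE}
# Hypothetical helper: scale a matrix so that all entries lie in [-1, 1]
normalize_by_range <- function(m) {
  max_abs <- max(abs(m), na.rm = TRUE)
  if (max_abs == 0) return(m)  # an all-zero matrix is returned unchanged
  m / max_abs
}

# Hypothetical helper: difference between two normalized networks
network_difference <- function(m1, m2) {
  # Both matrices must describe the same nodes in the same order
  stopifnot(
    identical(dim(m1), dim(m2)),
    identical(dimnames(m1), dimnames(m2))
  )
  normalize_by_range(m1) - normalize_by_range(m2)
}

# Two toy weighted adjacency matrices on different scales
toy_nodes <- c("a", "b", "c")
m1 <- matrix(c( 0,  4, -2,
                4,  0,  1,
               -2,  1,  0), nrow = 3, dimnames = list(toy_nodes, toy_nodes))
m2 <- matrix(c( 0, 10,  0,
               10,  0, -5,
                0, -5,  0), nrow = 3, dimnames = list(toy_nodes, toy_nodes))

toy_difference <- network_difference(m1, m2)
toy_difference
```

In this toy case the strongest edge of both networks, between `a` and `b`, cancels out after normalization, while the edge between `b` and `c` stands out because it changes sign. With this scheme the entries of the difference matrix lie between -2 and 2.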

```{r qgraph-plot3, fig.width=3, fig.height=2, dpi=300}
# Fit a second Linear Mixed Model (LMM) with a more complex random effects structure
# This model accounts for repeated measures within individuals (id) over different days (day)
# and also considers an additional random effect for the variable "conc" (context or condition)

model2 <- lmm_analysis(
  example_data,    # Input dataset containing clinical/longitudinal data
  varLabs,         # List of selected variables to be analyzed in the model
  random_effects = "(1|id/day) + (1|conc)"  # Random effects structure:
                                            # (1|id/day) -> Nested random effect for each day within an individual
                                            # (1|conc) -> Additional random effect for "conc" variable
)

# Generate a network visualization from the second LMM model
qgraph(
  model2,                                # Adjacency matrix or network model derived from LMM
  groups = structure$community_structure, # Assign colors based on predefined symptom groups
  labels = labels,                        # Display variable names as node labels
  legend = TRUE,                           # Include a legend in the plot
  layout = "spring",                       # Use a force-directed "spring" layout for better visualization
  color = c("orange", "lightblue", "#008080"), # Define colors for different variable groups
  legend.cex = 0.3                          # Adjust the legend text size to avoid oversized labels
)

```


The difference between two networks is computed with the *differentiation()* function, which normalizes both adjacency matrices to the range from -1 to 1 before subtracting them. The function takes two adjacency matrices as input; both must have the same dimensions and node names.

```{r qgraph-plot4, fig.width=3, fig.height=2, dpi=300}
# Compute the difference between the two Linear Mixed Model (LMM) networks
# This highlights changes in relationships when considering different random effect structures
difference <- differentiation(model, model2)

# Generate a network visualization of the differences between the two models
qgraph(
  difference,                            # Adjacency matrix representing differences between model and model2
  groups = structure$community_structure, # Assign colors based on predefined variable groups
  labels = labels,                        # Display variable names as node labels
  legend = TRUE,                           # Include a legend in the plot
  layout = "spring",                       # Use a force-directed "spring" layout for better visualization
  color = c("orange", "lightblue", "#008080"), # Define colors for different variable groups
  legend.cex = 0.3                          # Adjust legend text size to keep it readable
)

```

# Additional information

The following chunk prints the R session information, including the R version, loaded packages, and system details.

```{r}
sessionInfo()
```

# References
```{r, echo=FALSE, results="asis"}