DrDimont: Drug Response Prediction from Differential Multi-Omics Networks

Installation

The R package DrDimont can be installed via CRAN. The R dependencies of the package will also be installed when installing the package. The complete source code can be accessed through https://gitlab.com/PHiort/DrDimont.

The R package can be installed with:

install.packages('DrDimont')

After installation, you can load the package into your session with:

library(DrDimont)

Installation of Python and its dependencies

The pipeline uses a Python script for one of the intermediate steps. For differential drug response score computation, Python (3.8 or 3.9) has to be installed on the system prior to running DrDimont. Please install either the standalone Python 3.9 (or 3.8) (https://www.python.org/downloads/), or Python via Anaconda (https://www.anaconda.com/products/distribution) or via miniconda (https://docs.conda.io/en/latest/miniconda.html) before running DrDimont.

If Python is installed on your system, the install_python_dependencies() function of DrDimont can be used to install the required Python dependencies. The Python packages will be installed in a virtual python or conda environment called ‘r-DrDimont’. Depending on which Python package manager (standalone or anaconda/miniconda Python) is installed on your system, the dependencies can be installed using pip (default; if standalone Python is installed):

install_python_dependencies(package_manager="pip")

or conda (if anaconda/miniconda Python is installed):

install_python_dependencies(package_manager="conda")

ATTENTION : When using pip, Python version 3.8 or 3.9 must be installed on the system (currently one of the Python dependencies (‘ray’) only work with Python <= 3.9). When using conda Python 3.9 will be automatically installed in the conda environment. If the installation does not work with DrDimont’s internal function please refer to a manual installation of the libraries as described below.

To manually create and install the python packages in the r-DrDimont environment, please run the following on your command line (outside of R):

With conda run:

conda create -n r-DrDimont -c conda-forge --yes python=3.9
conda activate r-DrDimont
pip install numpy tqdm igraph ray

With pip run:

#on Windows run the following in your home folder:
mkdir  .\Documents\.virtualenvs\
python -m venv .\Documents\.virtualenvs\r-DrDimont
.\Documents\.virtualenvs\r-DrDimont\Scripts\activate
pip install --upgrade pip numpy tqdm igraph ray

# on Linux and Mac run the following in your home folder:
mkdir .virtualenvs/
python -m venv .virtualenvs/r-DrDimont
source .virtualenvs/r-DrDimont/bin/activate
pip install --upgrade pip numpy tqdm igraph ray

ATTENTION : The python dependencies have to be installed into a virtual or conda environment with the name r-DrDimont otherwise the execution of the Python script will not work.

A text file (requirments_pip.txt or requirements_conda.txt) with all required packages can also be downloaded from gitlab in the inst/ directory or you can find them in your R package directory folder in the DrDimont/ folder.

Example Data Set Description

The following exemplary pipeline application showcases the usage of molecular breast cancer data with ER+ (Estrogen receptor-positive) patient samples as group A and ER- (Estrogen receptor-negative) as group B. A reduced exemplary data set is included within the package.

The breast cancer data by Krug et al. (2020) used for this tutorial is already preprocessed and only includes samples with tumor purity > 0.5 and known ER status. Metabolite data was sampled randomly to generate distributions similar to those reported, e.g., in Terunuma et al. (2014).

The data set contains observations from:

78 ER+ samples
34 ER- samples

	Number of genes, etc.	Preprocessing	Identifier
mRNA	13915	quantified mRNA expression; log2-transformed FPKM values, NAs set to -11, removed mRNAs with > 90% of zero measurements, reduced	gene name
Protein	5809 (ER+) and 5845 (ER-)	quantified proteomics data; normalized, standardized, removed proteins with > 20% NAs, reduced	NCBI RefSeq ID, gene name
phosphosites	10272 (ER+) and 11318 (ER-)	quantified phosphoproteomics data; normalized, removed phosphosites with > 20% NAs, reduced	phosphosite, gene name, NCBI RefSeq ID
Metabolite	275 from 33 (ER+) and 34 (ER-) samples	randomly sampled metabolomics data; removed metabolites with > 50% NAs	biochemical name, PubChem ID, metabolon ID

To limit runtime and space requirements of the example we reduced the mRNA, protein and phosphosite data to a random set of 50 genes. The 50 genes were randomly selected from the set of genes with known drug targets from The Drug Gene Interaction Database (https://www.dgidb.org/). The metabolite data was also randomly reduced to 50 metabolites.

Load the data

First you load the pre-processed data. This data is included in the package and does not need to be manually loaded but can be directly accessed once library(DrDimont) is called.

data("mrna_data")
data("protein_data")
data("phosphosite_data")
data("metabolite_data")
data("metabolite_protein_interactions")
data("drug_gene_interactions")

Transform the data to the required input format

After loading the data, you can use formatting functions to bring your data into the required input formats:

make_layer() - creates individual molecular layers from raw data and unique identifiers
make_connection() - specifies connections between two individual layers
make_drug_target() - formats drug target interactions

Create individual layers data structure from the molecular data

Before running the pipeline, you can create individual layer objects using make_layer(). Please supply raw data stratified over two patient groups and unique identifiers for the molecular entities, e.g, genes. The function make_layer() requires the following input parameters: name, data_groupA, data_groupB, identifiers_groupA and identifiers_groupB. Please give each layer a unique name with the name argument. The identifiers_groupA and identifiers_groupB parameters are given data frames which should contain one or more uniquely named columns with identifiers of the molecular entities in the rows, e.g., gene names. You can supply the raw data with the data_groupA and data_groupB parameters with the molecular entities (e.g, genes) as rows and the samples as columns. Please make sure that the identifiers of the molecular entities are in the same order as the columns in the raw data. If you have only one group to analyse then you can set the parameters data_groupB=NULL and identifiers_groupB=NULL.

Run the code below for exemplary raw data frames:

# Data inspection
mrna_data$groupA[1:3, 1:5]
#>   gene_name X01BR015 X01BR018 X01BR023 X01BR025
#> 1     ABCA1   2.6616   3.0655   2.4372   2.0085
#> 2  CACNA2D1  -0.0491   2.4549   0.4724  -2.6449
#> 3     CASP3   3.9785   4.5653   3.3765   3.7385
protein_data$groupA[1:3, 1:5]
#>       ref_seq gene_name X01BR015 X01BR018 X01BR023
#> 1 NP_004360.2    COL6A3   2.8159  -1.0654  -0.1252
#> 2 NP_476507.3    COL6A3   0.7548  -2.8463   0.3242
#> 3 NP_005600.1      PYGM   0.4418   0.6022   0.5072
phosphosite_data$groupA[1:3, 1:5]
#>                             site_id gene_name        ref_seq X01BR015 X01BR018
#> 1 NP_001006666.1_S230s _1_1_230_230   RPS6KA1 NP_001006666.1   0.4518   0.0663
#> 2 NP_001006666.1_S372s _1_1_372_372   RPS6KA1 NP_001006666.1   0.1946  -0.4792
#> 3 NP_001006666.1_S389s _1_1_389_389   RPS6KA1 NP_001006666.1  -0.7127  -0.1509
metabolite_data$groupA[1:3, 1:5]
#>                   biochemical_name metabolon_id pubchem_id       X1        X2
#> 2             1-methylnicotinamide        27665        457  61561.9  38714.07
#> 4 1-stearoylglycerophosphoinositol        19324       6563 106015.4        NA
#> 6     cytidine 5'-diphosphocholine        34418      13804       NA 736205.19

Run the code below to create the individual layers:

# Create individual layers
mrna_layer <- make_layer(name="mrna",
                         data_groupA=mrna_data$groupA[,-1],
                         data_groupB=mrna_data$groupB[,-1],
                         identifiers_groupA=data.frame(gene_name=mrna_data$groupA$gene_name),
                         identifiers_groupB=data.frame(gene_name=mrna_data$groupB$gene_name))
#> [22-09-23 16:29:54] Layer "mrna", group "groupA" contains 78 samples and 50 genes/proteins/entities.
#> [22-09-23 16:29:54] Layer "mrna", group "groupB" contains 34 samples and 50 genes/proteins/entities.

protein_layer <- make_layer(name="protein",
                            data_groupA=protein_data$groupA[, c(-1,-2)],
                            data_groupB=protein_data$groupB[, c(-1,-2)],
                            identifiers_groupA=data.frame(gene_name=protein_data$groupA$gene_name, 
                                ref_seq=protein_data$groupA$ref_seq),
                            identifiers_groupB=data.frame(gene_name=protein_data$groupB$gene_name, 
                                ref_seq=protein_data$groupB$ref_seq))
#> [22-09-23 16:29:54] Layer "protein", group "groupA" contains 78 samples and 53 genes/proteins/entities.
#> [22-09-23 16:29:54] Layer "protein", group "groupB" contains 34 samples and 53 genes/proteins/entities.

phosphosite_layer <- make_layer(name="phosphosite",
                                data_groupA=phosphosite_data$groupA[, c(-1,-2, -3)],
                                data_groupB=phosphosite_data$groupB[, c(-1,-2, -3)],
                                identifiers_groupA=data.frame(phosphosite_data$groupA[, 1:3]),
                                identifiers_groupB=data.frame(phosphosite_data$groupB[, 1:3]))
#> [22-09-23 16:29:54] Layer "phosphosite", group "groupA" contains 78 samples and 61 genes/proteins/entities.
#> [22-09-23 16:29:54] Layer "phosphosite", group "groupB" contains 34 samples and 65 genes/proteins/entities.

metabolite_layer <- make_layer(name="metabolite",
                               data_groupA=metabolite_data$groupA[, c(-1,-2, -3)],
                               data_groupB=metabolite_data$groupB[, c(-1,-2, -3)],
                               identifiers_groupA=data.frame(metabolite_data$groupA[, 1:3]),
                               identifiers_groupB=data.frame(metabolite_data$groupB[, 1:3]))
#> [22-09-23 16:29:54] Layer "metabolite", group "groupA" contains 33 samples and 50 genes/proteins/entities.
#> [22-09-23 16:29:54] Layer "metabolite", group "groupB" contains 34 samples and 49 genes/proteins/entities.

Run the code below to create a list of all individual layers for the pipeline input:

all_layers <- list(mrna_layer, protein_layer, phosphosite_layer, metabolite_layer)

Create inter-layer connections data structure

The inter-layer connections can be supplied by the user with make_connection(). The parameters from and to have to match to a name given in the previously created layers by make_layer(). The established connection will result in an undirected combined graph. The parameter group indicates whether the connection will be applied to both groups (default) or only group A or B. There are two options to connect layers: (i) based on identical identifiers of entities, or (ii) based on a given interaction table.

For (i), two layers should contain one matching column name in their identifiers_groupA/identifiers_groupB data frames that is passed as the parameter connect_on. Two entities in the different layers with the same ID therein are connected with an edge of fixed weight (indicated by the weight parameter, default 1).

For example:

# (i) make inter-layer connection 
make_connection(from='mrna', to='protein', connect_on='gene_name', weight=1, group="both")

For (ii), an interaction table containing three columns is required. Two columns should contain entity IDs that are also given in the respective identifiers identifiers_groupA/identifiers_groupB of the two layers to be connected. One column of those should have the same name as a column name given in the identifiers_groupA/identifiers_groupB data frames of one layer and the second column the same for the second layer. The third column should contain the weights with which the respective entities of the two layers are to be connected.

See data(metabolite_protein_interactions) for an exemplary interaction table. The table contains the columns “pubchem_id” also given for the metabolite layer, “gene_name” also given for the protein layer, and “combined_score” containing the weights for the respective interactions:

# Data inspection
metabolite_protein_interactions[1:3, ]
#>   pubchem_id gene_name combined_score
#> 1          1      CHAT          0.946
#> 2          1      CRAT          0.993
#> 3          1        CS          0.911

The interaction table is passed to the connect_on parameter of make_connection() and the column name of the column containing the weights to the weight parameter. For example:

# (ii) make inter-layer connection 
make_connection(from='protein', to='metabolite', 
                connect_on=metabolite_protein_interactions, 
                weight='combined_score', group="both")

If you have only one layer you can skip the next step and set the parameter inter_layer_connections=NULL later on.

Run the code below to create a list of all inter-layer connections for pipeline input:

all_inter_layer_connections = list(
    make_connection(from='mrna', to='protein', connect_on='gene_name', weight=1, group="both"),
    make_connection(from='protein', to='phosphosite', connect_on='gene_name', weight=1, group="both"),
    make_connection(from='protein', to='metabolite', 
        connect_on=metabolite_protein_interactions, weight='combined_score', group="both")
)

Create drug-target interaction data structure

To run the entire pipeline, drug-target interactions are required. For that you need an interaction table mapping drugs to their targets, e.g, proteins. The table should contain two columns: one column containing the drug ids with the name drug_name and another column containing the drug targets with a name matching a column name in the identifiers_groupA/identifiers_groupB data frames of the target layer. The example data contains a table from The Drug Gene Interaction Database providing interactions of drugs with genes. The exemplary data frame has three columns (gene_name, drug_name, drug_chembl_id), one containing the gene names also given for the target protein layer, the second containing the drug names which are used to identify the drugs and a third column containing the ChEMBL IDs of drugs which will be ignored in the pipeline. The data frame of the drug-target interactions should have an column named drug_name containing drug identifiers.

Example:

# Data inspection
drug_gene_interactions[1:3, ]
#>   gene_name   drug_name drug_chembl_id
#> 1      CDK7  BMS-387032   CHEMBL296468
#> 2   ADORA2A CHEMBL72862    CHEMBL72862
#> 3      APOE  PREDNISONE      CHEMBL635

The function make_drug_target() generates the required format of the drug-target interactions for the pipeline. The parameter target_molecules should match one of the layer names, e.g., protein. The data frame supplied with the parameter interaction_table should map drugs to their target as described above. The column in the interaction table containing the targets should be given with the match_on parameter, e.g, match_on=gene_name for protein as targets.

Run the code below to create a list containing the drug-target input for the pipeline:

all_drug_target_interactions <- make_drug_target(
                                    target_molecules='protein',
                                    interaction_table=drug_gene_interactions,
                                    match_on='gene_name')

Check input data structures

When the input data structures of the individual layers, the inter-layer connections, and the drug target interactions are created they are checked automatically for validity. Additionally, the function below checks for a variety of possible input formatting and connection errors and reports registered data set sizes (samples, entities) for the user to compare with the intended input.

return_errors(check_input(layers=all_layers, 
                          inter_layer_connections=all_inter_layer_connections, 
                          drug_target_interactions=all_drug_target_interactions))
#> [22-09-23 16:29:55] Layer "mrna", group "groupA" contains 78 samples and 50 genes/proteins/entities.
#> [22-09-23 16:29:55] Layer "mrna", group "groupB" contains 34 samples and 50 genes/proteins/entities.
#> [22-09-23 16:29:55] Layer "protein", group "groupA" contains 78 samples and 53 genes/proteins/entities.
#> [22-09-23 16:29:55] Layer "protein", group "groupB" contains 34 samples and 53 genes/proteins/entities.
#> [22-09-23 16:29:55] Layer "phosphosite", group "groupA" contains 78 samples and 61 genes/proteins/entities.
#> [22-09-23 16:29:55] Layer "phosphosite", group "groupB" contains 34 samples and 65 genes/proteins/entities.
#> [22-09-23 16:29:55] Layer "metabolite", group "groupA" contains 33 samples and 50 genes/proteins/entities.
#> [22-09-23 16:29:55] Layer "metabolite", group "groupB" contains 34 samples and 49 genes/proteins/entities.

Run the complete pipeline

The pipeline can be run entirely or in individual steps. To set global pipeline options you can create a settings list using the drdimont_settings() function. This function contains default parameters that can be modified as shown below. For a detailed explanation of all possible settings and parameters please refer to the function documentation by calling ?drdimont_settings().

Please be aware of the Python script used in one of the pipeline steps (see Requirements above). If you have installed python and the required packages via pip then you should set the drdimont_settings() parameter conda=FALSE". If you have installed python and the required packages via conda then set the drdimont_settings() parameters conda=TRUE. drdimont_settings() will automatically check if Python can be found and prints a warning if not.

The intermediate pipeline and drug response scores output (parameter save_data) is deactivated (default) but especially for large data files consider turning it on. You can specify the output location of files with the saving_path parameter. If not specified all files will be written to a temporary file created by R. For this example, the data will be saved in a temporary directory. If you want to save the data elsewhere you need to change the parameter saving_path below. The intermediate output data includes RData-files of the correlation matrices, the individual graphs, the combined graphs, the drug target edges, the interaction score graphs, and the differential score graph. The drug response scores are saved in a tsv-file in the specified output directory if save_data=TRUE.

See Running the individual pipeline steps and call ?drdimont_settings() for further explanations of settings parameters.

Run the following code to create a settings list for the example:

example_settings <- drdimont_settings(
                        handling_missing_data = list(
                            default = "pairwise.complete.obs",
                            mrna = "all.obs"),
                        reduction_method = "pickHardThreshold",
                        r_squared=list(default=0.65, metabolite=0.1),
                        cut_vector=list(default=seq(0.2, 0.65, 0.01)),
                        conda=FALSE,
                        save_data = FALSE,
                        saving_path = tempdir())
# disable multi-threading for example run; 
# not recommended for actual data processing
WGCNA::disableWGCNAThreads()
#>

To run the entire pipeline from beginning-to-end the run_pipeline() function can be used:

run_pipeline(layers=all_layers, 
             inter_layer_connections=all_inter_layer_connections, 
             drug_target_interactions=all_drug_target_interactions, 
             settings=example_settings)

Run the individual pipeline steps

The pipeline can also be used in a modular fashion. The modules then refer to the different steps:

Compute correlation matrices
Generate individual graphs
Combine graphs
Identify drug targets and their edges
Calculate integrated interaction score
Generate differential graph
Calculate differential drug response score

Step 1: Compute correlation matrices

In step one, correlation matrices are computed for the specified layers created above. The parameter correlation_method in drdimont_settings() can be set to “spearman” (default), “pearson”, or “kendall” as the correlation methods. The list of layers and the settings list are passed to compute_correlation_matrices().

To reduce runtime the following example will only analyze the first 10 genes and patients of the mRNA layer (to compute all layers set layers=all_layers in compute_correlation_matrices()):

reduced_mrna_layer <- make_layer(name="mrna",
                          data_groupA=t(mrna_data$groupA[1:10,2:11]),
                          data_groupB=t(mrna_data$groupB[1:10,2:11]),
                          identifiers_groupA=data.frame(gene_name=mrna_data$groupA$gene_name[1:10]),
                          identifiers_groupB=data.frame(gene_name=mrna_data$groupB$gene_name[1:10]))

example_correlation_matrices <- compute_correlation_matrices(
                                    layers=list(reduced_mrna_layer), 
                                    settings=example_settings)

The resulting data structure example_correlation_matrices is a nested named list with 3 levels containing the correlation matrices and annotation data frames. The first level are correlation_matrices and annotations. The correlation matrices are separated on the second level by group (groupA and groupB) and on the third level by layer name (e.g., mrna, protein, etc.). The annotations element contains annotations for each groups and both combined at the second level (groupA, groupB, and both). The third level consist of named data frames, for each layer one data frame, which contain the pipeline-internal mapping of the identifiers_groupA/identifiers_groupB data frames of the individual layers to layer specific node IDs.

Example:

# Data inspection
data("correlation_matrices_example")
correlation_matrices_example$annotations$groupA$protein[1:3, ]
#>   gene_name     ref_seq   node_id   layer
#> 1    COL6A3 NP_004360.2 protein_1 protein
#> 2    COL6A3 NP_476507.3 protein_2 protein
#> 3      PYGM NP_005600.1 protein_3 protein

Step 2: Generate individual graphs

Next, the individual graphs are generated. In this step edge weights are established based on the correlation computation and the edges are reduced by the specified reduction method. Reduction can be done based on maximizing scale-freeness employing WGCNA::pickHardThreshold (reduction_method="pickHardThreshold" in drdimont_settings(); default) or based on significance of the correlation (reduction_method="p_value"). Please call ?drdimont_settings() for more information on additional “pickHardThreshold” and “p_value” settings. With “pickHardThreshold” the networks can also be reduced to at most a given number of mean edges or a given density with the parameters mean_number_edges and edge_density respectively (default of both: NULL).

Run the following code to generate the individual graphs:

data("correlation_matrices_example")
example_individual_graphs <- generate_individual_graphs(
                                 correlation_matrices=correlation_matrices_example, 
                                 layers=all_layers, 
                                 settings=example_settings)

The resulting data structure example_individual_graphs is a nested named list with 3 levels containing the graphs and annotation data frames. The element graphs contains, similar to the correlation matrices, the two groups on the second level and the graphs as iGraphs objects for each molecular layer on the third level. The annotations element is a copy from the annotations element of the example_correlation_matrices data (see Step 1).

Step 3: Combine graphs

In this step, the individual graphs are combined to a single combined graph per group based on the inter-layer connections created above. The function creates the disjoint union of the individual graphs and adds inter-layer edges with the specified weight.

Run the following code to combine the individual layers:

example_combined_graphs <- generate_combined_graphs(
                               graphs=example_individual_graphs[["graphs"]], 
                               annotations=example_individual_graphs[["annotations"]], 
                               inter_layer_connections=all_inter_layer_connections, 
                               settings=example_settings)

The resulting data structure example_combined_graphs is a nested named list with 2 levels containing the combined graphs and a combined annotation data frame. The element graphs contains the two groups on the second level with the combined graphs as iGraphs objects. The annotations element consists of a data frame on the second level named both which contains the mapping of the identifier data frames of the individual layers to the layer specific node IDs for all layers together.

Example:

# Data inspection
example_combined_graphs$annotations$both[1:3, ]
#>   gene_name node_id layer ref_seq site_id biochemical_name metabolon_id
#> 1     ABCA1  mrna_1  mrna    <NA>    <NA>             <NA>           NA
#> 2  CACNA2D1  mrna_2  mrna    <NA>    <NA>             <NA>           NA
#> 3     CASP3  mrna_3  mrna    <NA>    <NA>             <NA>           NA
#>   pubchem_id
#> 1         NA
#> 2         NA
#> 3         NA

Step 4: Identify drug targets and their edges

Next, in order to extract the list of relevant drugs the drug targets are identified in the combined graph for each group. Here, the node IDs of the specified drug targets are found in the combined graph and the drugs are mapped to their target nodes. Additionally, edge lists are returned containing the incident edges of drug target nodes for which integrated interaction scores are computed in the next step.

Run the following code to extract the drug targets and their edges:

example_drug_target_edges <- determine_drug_targets(
                                 graphs=example_combined_graphs[["graphs"]], 
                                 annotations=example_combined_graphs[["annotations"]], 
                                 drug_target_interactions=all_drug_target_interactions, 
                                 settings=example_settings)

The resulting data structure example_drug_target_edges is a nested named list with 2 levels. The first level consists of the elements targets and edgelists. The element targets contains the data frame target_nodes and the dictionary-like list drugs_to_target_nodes. The data frame target_nodes contains the node IDs of the nodes that are drug targets and TRUE/FALSE values if they are present in the graph of each group. The list drugs_to_target_nodes maps the drugs to the node IDs of their targets. The element edgelists consists of a data frame for each of the groups (groupA and groupB) which, respectively, contain the incident edges of the drug targets and their weights (columns from, to and weight).

Step 5: Calculate integrated interaction score

In this step, the combined graphs for each group together with the edge list from drug_target_edges are used to calculate the integrated interaction scores for all edges incident to drug targets. The pipeline uses a Python script to compute the integrated interaction scores in the function generate_interaction_score_graphs(). The input data for the Python script (combined graphs for both groups in gml format and relevant edges lists for both groups in tsv format) are written to disk and the script is called to calculate the scores. Output files written by the Python script are two graphs in gml format containing the interaction score as an additional edge attribute called interactionweight. These are then loaded into R and returned in a named list containing the graphs for groupA and groupB, respectively.

ATTENTION : Data exchange via files is necessary and can take long for large data. Additionally, the interaction score computation can be slow. Therefore, do not set max_path_length in drdimont_settings() to a large value (default: 3). If max_path_length=1 then the integrated interaction scores will be the same as the correlation-based edge weights.

The Python script for integrated interaction score computation is parallelized using ray (https://www.ray.io/). Refer to the Ray documentation if you encounter problems with running the Python script in parallel. Use the setting int_score_mode="sequential" in drdimont_settings() for forced sequential computation or int_score_mode="ray" for parallel computation otherwise one of the two will be automatically chosen based on the size of the data.

Running Ray on a compute cluster : If you want to run DrDimont on a cluster, run ray start --head --num-cpus 12 (change number of CPUs as fit) on the command line, before starting the R session.

Run the following code to calculate the integrated interaction scores:

example_interaction_score_graphs <- generate_interaction_score_graphs(
                                        graphs=example_combined_graphs[["graphs"]], 
                                        drug_target_edgelists=example_drug_target_edges[["edgelists"]], 
                                        settings=example_settings)

Step 6: Generate differential graph

To generate a differential graph, the difference of the interaction scores between the two groups is computed by subtracting the values of the edge attributes of groupB from groupA (i.e., groupA - goupB). A single differential graph with differential_score and differential_interaction_score as edge attributes is returned. The edge attribute differential_score is the difference of the correlation-based edge weights and differential_interaction_score is the difference in integrated interaction scores. Missing edges in one of the two groups are set to zero before computing the difference. The differential integrated interaction score is set to NA if the integrated interaction score was not computed in both groups, i.e., for all edges not incident to a drug target.

data("interaction_score_graphs_example")
example_differential_graph <- generate_differential_score_graph(
                                  interaction_score_graphs=interaction_score_graphs_example, 
                                  settings=example_settings)

# if interaction score graphs have been computed use the following:
#example_differential_score_graph <- generate_differential_score_graph(
#                                        interaction_score_graphs=example_interaction_score_graphs, 
#                                        settings=example_settings)

Step 7: Calculate differential drug response score

In the last step, the differential drug response score is calculated based on the differential graph. The score of a drug is the mean (default) or the median of all differential integrated interaction scores of the edges incident to the drug targets. Drugs that have only targets without any edges have a NA as differential drug response. Only drugs with at least one target present in the network are analysed. The differential drug response score can be calculated in different ways: by computing the mean (default) or the median of the differential integrated interaction scores of the edges incident to a drug’s targets (set median_drug_response=TRUE in drdimont_settings() for median computation). The drug response score can also be computed from the differential (default) or the absolute differential integrated interaction scores (set absolute_difference=TRUE in drdimont_settings() to use absolute differential interaction scores). The drug response score is reported in absolute values.

example_drug_response_scores <- compute_drug_response_scores(
                                    differential_graph=example_differential_graph,
                                    drug_targets=example_drug_target_edges[["targets"]],
                                    settings=example_settings)

The first few lines of the resulting data frame are shown below. The data frame is saved as drug_response_score.tsv in the specified output folder (see Running the complete pipeline above). The drug response score is an indirect measure of how the strength of connectivity differs between the groups for the drug targets of the particular drug.

head(dplyr::filter(example_drug_response_scores, !is.na(drug_response_score)))
#>                     drug_name drug_response_score
#> 1 (7S)-HYDROXYL-STAUROSPORINE           0.1805916
#> 2                 ABEMACICLIB           0.2581332
#> 3               ACALABRUTINIB           0.2378911
#> 4                ADECATUMUMAB           0.5709812
#> 5                    AFATINIB           0.2581332
#> 6                     ALCOHOL           0.0000000

DrDimont: Drug Response Prediction from Differential Multi-Omics Networks

Katharina Baum, Pauline Hiort, Julian Hugo, Spoorthi Kashyap, Nataniel Mueller, and Justus Zeinert

2022-09-23

Introduction

Installation

Installation of Python and its dependencies

Example Data Set Description

Load the data

Transform the data to the required input format

Create individual layers data structure from the molecular data

Create inter-layer connections data structure

Create drug-target interaction data structure

Check input data structures

Run the complete pipeline

Run the individual pipeline steps

Step 1: Compute correlation matrices

Step 2: Generate individual graphs

Step 3: Combine graphs

Step 4: Identify drug targets and their edges

Step 5: Calculate integrated interaction score

Step 6: Generate differential graph

Step 7: Calculate differential drug response score

References