---
title: "Applying Functions"
output: rmarkdown::html_vignette
description: >
  What you need to know to apply functions to matrices in the matrixset object,
  in the spirit of the apply() function.
vignette: >
  %\VignetteIndexEntry{Applying Functions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(matrixset)
```



```{r, echo=FALSE, message=FALSE}
library(tidyverse)
animals <- as.matrix(MASS::Animals)
log_animals <- log(animals)
animal_info <- MASS::Animals %>% 
  rownames_to_column("Animal") %>% 
  mutate(is_extinct = case_when(Animal %in% c("Dipliodocus", "Triceratops", "Brachiosaurus") ~ TRUE,
                                TRUE ~ FALSE),
         class = case_when(Animal %in% c("Mountain beaver", "Guinea pig", "Golden hamster", "Mouse", "Rabbit", "Rat") ~ "Rodent",
                           Animal %in% c("Potar monkey", "Gorilla", "Human", "Rhesus monkey", "Chimpanzee") ~ "Primate",
                           Animal %in% c("Cow", "Goat", "Giraffe", "Sheep") ~ "Ruminant",
                           Animal %in% c("Asian elephant", "African elephant") ~ "Elephantidae",
                           Animal %in% c("Grey wolf") ~ "Canine",
                           Animal %in% c("Cat", "Jaguar") ~ "Feline",
                           Animal %in% c("Donkey", "Horse") ~ "Equidae",
                           Animal == "Pig" ~ "Sus",
                           Animal == "Mole" ~ "Talpidae",
                           Animal == "Kangaroo" ~ "Macropodidae",
                           TRUE ~ "Dinosaurs")) %>% 
  select(-body, -brain)
animals_ms <- matrixset(msr = animals, log_msr = log_animals, row_info = animal_info,
                row_key = "Animal")
animals_ms <- animals_ms %>% 
  annotate_column(unit = case_when(.colname == "body" ~ "kg",
                                   TRUE ~ "g")) 
```



## Apply Functions to `matrixset` Matrices

There are two ways to apply functions to the matrices of a `matrixset` object.
The first one is through the `apply_*` family, which will be covered here.

The second is through `mutate_matrix()`, covered in the next section.

There are 3 functions in the `apply_*` family:

  * `apply_matrix()`: The functions must take a matrix as input. In base R, this
    is similar to simply calling `fun(matrix_object)`.
  * `apply_row()`: The functions must take a vector as input. The vector will be
    a matrix row. In base R, this is akin to 
    `apply(matrix_object, 1, fun, simplify = FALSE)`.
  * `apply_column()`: The functions must take a vector as input. The vector will 
    be a matrix column. In base R, this is similar to 
    `apply(matrix_object, 2, fun, simplify = FALSE)`.

Each of these function will loop on the `matrixset` object's matrices to apply
the functions. In the case of `apply_row()` and `apply_column()`, an additional
loop on the margin (row or column, as applicable) is executed, so that the
functions are applied to each matrix and margin.

To see the functions in action, we will use the following object:
```{r}
animals_ms
```


We will use the following custom printing functions for compactness purposes.
```{r}
show_matrix <- function(x) {
  if (nrow(x) > 4) {
    newx <- head(x, 4)
    storage.mode(newx) <- "character"
    newx <- rbind(newx, rep("...", ncol(x)))
  }  else newx <- x
  newx
}
show_vector <- function(x) {
  newx <- if (length(x) > 4) {
    c(as.character(x[1:4]), "...")
  } else x
  newx
}
show_lst <- function(x) {
  lapply(x, function(u) {
    if (is.matrix(u)) show_matrix(u) else if (is.vector(u)) show_vector(u) else u
  })
}
```


So now, let's see the `apply_matrix()` in action.
```{r, message=FALSE}
library(magrittr)
library(purrr)
out <- animals_ms %>% 
   apply_matrix(exp,
                ~ mean(.m, trim=.1),
                foo=asinh,
                pow = ~ 2^.m,
                reg = ~ {
                  is_alive <- !is_extinct
                  lm(.m ~ is_alive + class)
                  })
# out[[1]] %>% map(~ if (is.matrix(.x)) {head(.x, 5)} else .x)
show_lst(out[[1]])
```

We have showcased several features of the `apply_*` functions:

  * Many functions can be supplied at once.
  * Functions can be supplied by bare function name. Note that this mean that
    only the current matrix will be used as input. You can also supply the 
    function call or even, when submitting a one-sided formula, complex 
    expressions.
  * The name of the function result can be controlled.
  * Object traits are accessible by bare name. The condition for this to work
    is that traits are unique across row and column annotation.
    
You probably have noticed the use of `.m`. This is a pronoun that is accessible
inside `apply_matrix()` and refers to the current matrix in the internal loop.
Similar pronouns exists for `apply_row()` and `apply_column()`, and they are
respecticely `.i` and `.j`.

The returned object is a list of lists. The first layer is for each matrix and
the second layer is for each function call.

Let's now showcase the row/column version with a `apply_column()` example:
```{r}
out <- animals_ms %>% 
   apply_column(exp,
                ~ mean(.j, trim=.1),
                foo=asinh,
                pow = ~ 2^.j,
                reg = ~ {
                  is_alive <- !is_extinct
                  lm(.j ~ is_alive + class)
                  })
out[[1]] %>% map(show_lst)
```

The idea is similar, but in the returned object, there is a third list layer:
the first layer for the matrices, the second layer for the columns (it would
be rows for `apply_row()`) and the third layer for the functions.

Note as well the use of the `.j` pronoun instead of `.m`.

### Grouped Data

The `apply_*` functions understand data grouping and will execute on the proper
matrix/vector subsets.

```{r}
animals_ms %>% 
  row_group_by(class) %>% 
  apply_matrix(exp,
               ~ mean(.m, trim=.1),
               foo=asinh,
               pow = ~ 2^.m,
               reg = ~ {
                 is_alive <- !is_extinct
                 lm(.m ~ is_alive)
                 })
```

As one can see, the output format differs in situation of grouping. We still end
up with a list with an element for each matrix, but each of these element is now
a `tibble`.

Each tibble has a column called `.vals`, where the function results are stored. 
This column is a list, one element per group. The group labels are given by the 
other columns of the tibble. For a given group, things are like the ungrouped 
version: further sub-lists for rows/columns - if applicable - and function 
values.

### Simplified Results

Similar to the `apply()` function that has a `simplify` argument, the output
structured can be simplified, baring two conditions:

  * Each function returns a vector, where a vector is every object for which
    `is.vector` returns `TRUE`.
  * Each vector must be of the same length $\geq$ 1.
  
If the conditions are met, each `apply_*` function has two simplified version
available: `_dfl` and `dfw`.

Below is the `_dfl` flavor in action. We point out two things to notice:

  * For `apply_column_dfl` (and `_dfw`), a `.column` column stores the column ID
    (`.row` for `apply_row_*`).
  * We wrapped the `lm` result in a `list` so that the outcome is vector.

```{r}
animals_ms %>% 
    apply_matrix_dfl(~ mean(.m, trim=.1),
                     MAD=mad,
                     reg = ~ {
                         is_alive <- !is_extinct
                         list(lm(.m ~ is_alive + class))
                     })
```

```{r}
animals_ms %>% 
    apply_column_dfl(~ mean(.j, trim=.1),
                     MAD=mad,
                     reg = ~ {
                         is_alive <- !is_extinct
                         list(lm(.j ~ is_alive + class))
                     })
```

If using `apply_column_dfw` in this context, you wouldn't notice a difference in
output format.

The difference between the two lies when the vectors are of length > 1.

```{r}
animals_ms %>% 
    apply_row_dfl(rg = ~ range(.i),
                  qt = ~ quantile(.i, probs = c(.25, .75)))   
```

```{r}
animals_ms %>% 
    apply_row_dfw(rg = ~ range(.i),
                  qt = ~ quantile(.i, probs = c(.25, .75)))   
```

We can observe three things:

  1. df**l** stands for *long* and stacks the elements of the function output
    into different rows, adding a column to identify the different elements.
  2. df**w** stands for *wide* and put the elements of the function output
    into different columns.
  3. Element names are made unique if necessary.


### Knowing the current context

It may happen that you need to get information about the current group. For this
reason, the following context functions are made available:

 * `current_n_row()` and `current_n_column()`. They each give the number of rows
     and columns, respectively, of the current matrix.

     They are the context equivalent of `nrow()` and `ncol()`.

 * `current_row_info()` and `current_column_info()`. They give access to the
     current row/column annotation data frame. The are the context equivlent
     of `row_info()` and `column_info()`.

 * `row_pos()` and `column_pos()`. They give the current row/column indices.
     The indices are the the ones before matrix subsetting.

 * `row_rel_pos()` and `column_rel_pos()`. They give the row/column indices
     relative to the current matrix. They are equivalent to
     `seq_len(current_n_row())`/`seq_len(current_n_column())`.

For instance, a simple way of knowing the number of animals per group could be
```{r}
animals_ms %>% 
    row_group_by(class) %>% 
    apply_matrix_dfl(n = ~ current_n_row()) %>% 
    .$msr
```

### With common row and column annotation trait

The context functions can also be of use when one or more traits are shared
(in name) between rows and columns.

Here's a pseudo-code example:

```{r}
# ms_object %>% 
#     apply_matrix( ~ {
#       ctrt <- current_column_info()$common_trait
#       rtrt <- current_row_info()$common_trait
#       
#       do something with ctrt and rtrt
#     })
```

### Pronouns, or dealing with ambiguous variables

It may happen that a variable in the calling environment shares its name with
a trait of a `matrixset` object.

You can make it explicit which version of the variable you are using the pronouns
`.data` (the trait annotation version) and `.env`.

## Quasi quotation

```{r}
reg_expr <- expr({
    is_alive <- !is_extinct
    list(lm(.j ~ is_alive + class))
})

animals_ms %>% 
    apply_column_dfl(~ mean(.j, trim=.1),
                     MAD=mad,
                     reg = ~ !!reg_expr)
```

## Multivariate

# mutate_matrix