---
title: "User Guide: 2 Manipulating ggplots"
subtitle: "'gginnards' `r packageVersion('gginnards')`"
author: "Pedro J. Aphalo"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{User Guide: 2 Manipulating ggplots}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE, echo=FALSE}
library(knitr)
opts_chunk$set(fig.align = 'center', 
               fig.show = 'hold', fig.width = 7, fig.height = 4)
options(warnPartialMatchArgs = FALSE)
```

## Introduction

The functions described here are not expected to be useful in everyday plotting,
because when using the _grammar of graphics_ one can simply change the order in
which layers are added to a ggplot, or remove unused variables from the data
before passing them as an argument to the `ggplot()` constructor.

However, if one uses high level methods like `autoplot()` or other functions
that automatically produce a full plot using 'ggplot2' internally, one may need
to add, move or delete layers so as to profit from such canned methods and
retain enough flexibility.

Some time ago I needed to manipulate the layers of a `ggplot`, and found a
[matching question on
Stack Overflow](https://stackoverflow.com/questions/13407236/remove-a-layer-from-a-ggplot2-chart).
I used the answers found there as the starting point for writing the
functions described in the first part of this vignette.

In a `ggplot` object, layers reside in a list, and their positions in the list
determine the plotting order when generating the graphical output. The _grammar
of graphics_ treats the list of layers as a _stack_ using only _push_
operations. In other words, the most recently added layer always resides at the
end of the list and, during rendering, is plotted over all previously added
layers. The functions described in this vignette allow overriding the **normal**
syntax at the cost of breaking the expectations of the grammar. These functions
are, as stated above, to be used only in exceptional cases. This
notwithstanding, they are rather easy to use and the user interface is
consistent across all of them. Moreover, they are designed to return objects
that are identical to objects created using the normal syntax rules of the
_grammar of graphics_. The table below lists the names and purpose of these
functions.

Function | Use                     
-------- | -------------------------------------
`delete_layers()` | delete one or more layers 
`append_layers()` | append layers at a specific position 
`move_layers()`   | move layers to an absolute position
`shift_layers()`  | move layers to a relative position
`which_layers()`  | obtain the index positions of layers
`extract_layers()`   | extract matched or indexed layers
`num_layers()`    | obtain number of layers
`top_layer()`    | obtain position of top layer
`bottom_layer()`    | obtain position of bottom layer

Although their definitions do not rely on code internal to 'ggplot2', they rely
on the internal structure of objects belonging to class `gg` and `ggplot`.
Consequently, long-term backwards and forward compatibility cannot be
guaranteed, or even expected.
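
As a minimal sketch of the storage described above, assuming only that the
`layers` member of a `ggplot` object behaves as a list (the name `p.stack` and
the use of the `mpg` data set are arbitrary choices for illustration), we can
check that each use of `+` pushes a layer onto the end of the list.

```{r}
library(ggplot2)

# a plot without layers: the list of layers is empty
p.stack <- ggplot(mpg, aes(displ, hwy))
length(p.stack$layers)

# each `+` pushes one layer onto the end of the list
p.stack <- p.stack + geom_point()
p.stack <- p.stack + geom_smooth(method = "lm", formula = y ~ x)
length(p.stack$layers)

# the most recently added layer is the last member of the list
class(p.stack$layers[[2]]$geom)[1]
```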

## Preliminaries

```{r}
library(ggplot2)
library(gginnards)
library(tibble)
library(magrittr)
library(stringr)
eval_pryr <- requireNamespace("pryr", quietly = TRUE)
```

We generate some artificial data and create a data frame with them.

```{r}
set.seed(4321)
# generate artificial data
my.data <- data.frame(
  group = factor(rep(letters[1:4], each = 30)),
  panel = factor(rep(LETTERS[1:2], each = 60)),
  y = rnorm(120), # one independent value per row (4 groups x 30)
  unused = "garbage"
)
```

We add attributes to the data frame with the fake data.

```{r}
attr(my.data, "my.atr.char") <- "my.atr.value"
attr(my.data, "my.atr.num") <- 12345678
```

We change the default theme to an uncluttered one.

```{r}
old_theme <- theme_set(theme_bw())
```

We generate a plot to be used later to demonstrate the use of the functions. We
use `expand_limits()` to ensure that the effect of later manipulations is easier
to notice.

```{r}
p <- ggplot(my.data, aes(group, y)) + 
  geom_point() +
  stat_summary(fun.data = mean_se, colour = "cornflowerblue", size = 1) +
  facet_wrap(~panel, scales = "free_x", labeller = label_both) +
  expand_limits(y = c(-2, 2))
p
```

## Exploring how ggplots are stored

To display summary textual information about a `gg` object we use method
`summary()` from package 'ggplot2', while methods `print()` and `plot()` will
display the actual plot.

```{r}
summary(p)
```

Layers in a ggplot object are stored in a list as nameless members. This means
that they have to be accessed using numerical indexes, and that we need to use
some indirect way of finding the indexes corresponding to the layers of
interest.

```{r}
names(p$layers)
```
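
One indirect way of finding such indexes by hand is to look at the class of the
geometry and statistic stored in each layer. The sketch below assumes that each
layer exposes `geom` and `stat` members, as in current 'ggplot2' versions, and
uses a throw-away plot named `p.demo`; it only illustrates the kind of lookup
that the functions described in the next section make unnecessary.

```{r}
library(ggplot2)

p.demo <- ggplot(mpg, aes(class, hwy)) +
  geom_point() +
  stat_summary(fun.data = mean_se)

# most derived class of the geometry and of the statistic in each layer
layer.geoms <- sapply(p.demo$layers, function(layer) class(layer$geom)[1])
layer.stats <- sapply(p.demo$layers, function(layer) class(layer$stat)[1])
layer.geoms
layer.stats

# positions of the layers whose geometry inherits from "GeomPoint"
point.idxs <- which(sapply(p.demo$layers,
                           function(layer) inherits(layer$geom, "GeomPoint")))
point.idxs
```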

The output of `summary()` is compact.

```{r}
summary(p$layers)
```

The default `print()` method for a list of layers displays only a small part of 
the information in a layer.

```{r}
print(p$layers)
```

To see all the fields, we need to use `str()`, which we use here for a single
layer.

```{r}
str(p$layers[[1]])
```

## Manipulation of plot layers

We start by using `which_layers()` as it simply returns a vector of indexes
into the list of layers. The last statement is useless here, but demonstrates
how layers are selected by position in all the functions described in this
document. We can see that each layer, as described in the first volume of this
User Guide, contains one geometry and one statistic.

```{r}
which_layers(p, "GeomPoint")
which_layers(p, "StatIdentity")
which_layers(p, "GeomPointrange")
which_layers(p, "StatSummary")
which_layers(p, idx = 1L)
```

We can also easily extract matching layers with `extract_layers()`. Here one
layer is returned, and displayed using the default `print()` method. Method
`str()` can also be used as shown above.

```{r}
extract_layers(p, "GeomPoint")
```

With `delete_layers()` we can remove layers from a plot, selecting them using
the match to a class, as shown here, or by a positional index as shown next.

```{r}
delete_layers(p, "GeomPoint")
```

```{r}
delete_layers(p, idx = 1L)
```

```{r}
delete_layers(p, "StatSummary")
```

With `move_layers()` we can alter the stacking order of layers. The layers to
move are selected in the same way as in the examples above, while `position`
gives where to move the layers to. Two character strings, `"top"` and
`"bottom"`, are accepted as `position` argument, as well as `integer`s. In the
latter case, the selected layers are inserted after the supplied position,
counted with reference to the list of layers not being moved.

```{r}
move_layers(p, "GeomPoint", position = "top")
```
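
The case of an `integer` passed to `position` is easier to follow with three
layers. In this sketch, which uses a throw-away plot named `p.three`, the two
layers not being moved are the line (first) and the smooth (second), so
`position = 1L` asks for the points to be inserted between them. The resulting
index is queried with `which_layers()` rather than stated, as it depends on the
behaviour described in the paragraph above.

```{r}
library(ggplot2)
library(gginnards)

p.three <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x)

# original position of the layer with points
which_layers(p.three, "GeomPoint")

# insert the points after the first of the layers that stay in place
p.moved <- move_layers(p.three, "GeomPoint", position = 1L)
which_layers(p.moved, "GeomPoint")
num_layers(p.moved)
```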

The equivalent operation can be expressed using a relative position with
`shift_layers()`. A positive value for `shift` is interpreted as an upward
displacement and a negative one as a downward displacement.

```{r}
shift_layers(p, "GeomPoint", shift = +1)
```

Here we show how to add a layer behind all other layers.

```{r}
append_layers(p, geom_line(colour = "orange", size = 1), position = "bottom")

```

It is also possible to append the new layer immediately above an arbitrary
existing layer using a numeric index, which as shown here can also be obtained
by matching to a class name. In this example we insert a new layer in-between
two layers already present in the plot. As with the `+` operator of the Grammar
of Graphics, `object` also accepts a list of layers as argument (no example
shown).

```{r}
append_layers(p, object = geom_line(colour = "orange", size = 1), 
              position = which_layers(p, "GeomPoint"))
```
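
As a complement, here is a hedged sketch of passing a list of layers as
`object`. The plot `p.base` and the two added layers are arbitrary choices; the
only assumption is the one stated above, that a list of layers is accepted in
the same way as by the `+` operator.

```{r}
library(ggplot2)
library(gginnards)

p.base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# two layers passed together, to be placed behind the existing points
p.under <- append_layers(p.base,
                         object = list(geom_line(colour = "orange"),
                                       geom_smooth(method = "lm",
                                                   formula = y ~ x)),
                         position = "bottom")
num_layers(p.under)
which_layers(p.under, "GeomPoint")
```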

Annotations add layers, so they can be manipulated in the same way as other
layers.

```{r}
p1 <- p + 
  annotate("text", label = "text label", x = 1.1, y = 0, hjust = 0)
p1
```

```{r}
delete_layers(p1, "GeomText")
```

## Replacing scales, coordinates, whole themes and data

Elements that are normally _added_ to a ggplot with operator `+`, such as
scales, themes and aesthetics, can be replaced with the `%+%` operator. The
situation with layers is different, as a plot may contain multiple layers and
layers are nameless. With layers, `%+%` is not a replacement operator.

```{r}
num_layers(p)
num_layers(p %+% geom_point(colour = "blue"))
num_layers(p + geom_point(colour = "blue"))
```

```{r}
p1 <- p + theme_bw()
p1
p1 + theme_void()
p1 %+% theme_void()
```
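
Replacing the default data is the remaining case named in the section title.
The sketch below uses a throw-away plot named `p.mpg` and a subset of the `mpg`
data; it relies on `%+%` accepting a data frame as its right-hand side, and
recent 'ggplot2' versions may issue a deprecation message for this operator.

```{r}
library(ggplot2)

p.mpg <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# replace the default data; mappings and layers are retained
p.sub <- p.mpg %+% subset(mpg, cyl == 4)
nrow(p.mpg$data)
nrow(p.sub$data)
length(p.sub$layers)
```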

## Editing theme elements

Method `summary()` is available for themes.

```{r,eval=FALSE}
summary(theme_bw())
```

However, to see the actual values stored, we need to use `str()`. To avoid
excessive output we first find the names for the elements of the theme and then
look at how the default text settings are stored.

```{r, eval=FALSE}
names(theme_bw())
```

```{r, eval=FALSE}
str(theme_bw()$text)
```

Themes can be modified using `theme()`. See the 'ggplot2' documentation for
details.
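
As a small sketch of such a modification (the name `my.theme` and the colour are
arbitrary), adding a call to `theme()` to a complete theme overrides only the
properties that are set explicitly. The example assumes that theme elements can
be queried with `$`, as done with `theme_bw()$text` above.

```{r}
library(ggplot2)

# override only the colour of the base text element
my.theme <- theme_bw() +
  theme(text = element_text(colour = "darkblue"))

my.theme$text$colour
# properties not set in the call to theme() keep their values
my.theme$text$size
theme_bw()$text$size
```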

## Removing unused data

The argument passed through `data` to `ggplot()` or a layer is stored in whole
in the `ggplot` object, even the data columns not mapped to any aesthetic. In
most cases this does not matter, but in the case of huge datasets, the use of
RAM and disk space can add up, and occasionally printing of each plot can slow
down. The reason for storing the whole data set is that it is always possible to
add layers with the grammar of graphics to an existing plot and consequently
only the user can know which variables can be removed or not.

One obvious way of not storing unused data in `ggplot` objects is for the user
to select the required variables and pass only these to the `ggplot()`
constructor or layers. A less efficient alternative, but possibly easier to use
for some users, is for users to drop the unused variables when they consider
that a plot is ready. We show here how to do this, with a function that started
as a self-imposed exercise.
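
The first approach, selecting variables before building the plot, needs nothing
beyond base R indexing. This sketch uses the `mpg` data set and the arbitrary
names `small.data` and `p.small`.

```{r}
library(ggplot2)

# keep only the variables used in mappings and faceting
small.data <- mpg[ , c("displ", "hwy", "class")]

p.small <- ggplot(small.data, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~class)

names(p.small$data)
object.size(mpg)
object.size(small.data)
```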

To simplify the embedded data objects we need to find which variables are mapped
to aesthetics and which are not. Here is a naive attempt at handling mappings to
expressions involving computations and multiple variables per mapping, as well
as facets. It is naive in that it ignores mappings within layers and looks for
faceting variables only in `p$facet$params$facets`, where `facet_wrap()` stores
them.

```{r}
mapped.vars <- 
  gsub("[~*\\%^]", " ", as.character(p$mapping)) %>%
  str_split(boundary("word")) %>%
  unlist() %>%
  c(names(p$facet$params$facets))
```
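
A different technique from the text-based one used above is to let R parse the
mapped expressions. The sketch below is not what the package does; it assumes
that each mapping is stored as a quosure, as in 'ggplot2' 3.0.0 and later, and
uses `rlang::quo_get_expr()` together with base R's `all.vars()`. The names
`p.expr` and `mapped.vars.alt` are arbitrary.

```{r}
library(ggplot2)

p.expr <- ggplot(mpg, aes(displ, hwy / cty)) +
  geom_point()

# extract the names of all variables used in each mapped expression
mapped.vars.alt <-
  unique(unlist(lapply(p.expr$mapping,
                       function(q) all.vars(rlang::quo_get_expr(q)))))
mapped.vars.alt
```

Like the approach in the text, this sketch looks only at the plot's default
mappings, and it would also return pronouns such as `.data` if they were used.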

We need also to find which variables are present in the data.

```{r}
data.vars <- names(p$data)
```

Next we identify which variables in `data` are not used, and delete them.

```{r}
unused.vars <- setdiff(data.vars, c(mapped.vars))
keep.idxs <- which(!data.vars %in% unused.vars)
```

```{r}
p1 <- p
# drop = FALSE keeps a data frame even if a single column remains
p1$data <- p$data[ , keep.idxs, drop = FALSE]
```

For a data set this small, removing a single column saves very little space.

```{r}
object.size(my.data)
object.size(p)
object.size(p1)
```

```{r}
names(my.data)
names(p$data)
names(p1$data)
```

The plot has not changed.

```{r}
p1
```

We can assemble all the code into a function for convenience, and expand the
code to also recognize mappings within layers and variables used in faceting.
Such a function, only cursorily tested, is included in the package as
`drop_vars()`. Given its design, the most likely failure mode is keeping too
many variables rather than removing too many.

```{r}
drop_vars(p)
```
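
To see the effect on the embedded data rather than on the rendered plot, we can
compare the names of the variables before and after. This sketch uses a
throw-away plot named `p.full` and default arguments; as noted above, more
variables than strictly needed may be kept, so no exact result is claimed.

```{r}
library(ggplot2)
library(gginnards)

p.full <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()
p.lean <- drop_vars(p.full)

names(p.full$data)
names(p.lean$data)
```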

When saving `ggplot` objects to disk, it can be beneficial to avoid carrying
along unused data. Of course, removing unused data means that they will not be
available at a later time if we want to add more layers to the same saved
ggplot object.

It was not clear to me when R makes a copy of the data embedded in a `ggplot`
object and when it does not. R's policy is to copy data objects lazily, that is,
only when they are modified. Does the 'ggplot2' code modify the argument passed
to its `data` parameter, triggering a real copy operation? We can check this
with the help of package 'pryr'.

```{r, eval = eval_pryr}
pryr::address(my.data)
z <- p$data
pryr::address(z)
```

In this case, R has not created a copy. So, from the point of view of total
memory usage, deleting the unused columns in `p` is not always beneficial. If
the object is saved to disk, or `my.data` is modified in any way after `p` was
created, a copy of `my.data` will be created at this later time. In this simple
example we modify the value of an attribute.

```{r, eval = eval_pryr}
attr(my.data, "my.atr.num") <- 1324567
pryr::address(z)
pryr::address(my.data)
```

## Attributes of the embedded data object

'ggplot2' version 3.1.0 and later preserves most attributes of the object passed
as argument to the data parameter of the `ggplot()` constructor. The class of
the object seems to be modified if it is derived from data frame or tibble, but
other attributes are retained in the copy stored in the `gg` object.

```{r}
data_attributes(p)
```
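
The same can be checked directly on the stored data with base R's `attr()`. In
this sketch the data frame `df.attr` and the attribute name `"units"` are made
up for illustration, and a plain data frame is used so that no class conversion
is involved.

```{r}
library(ggplot2)

df.attr <- data.frame(x = 1:3, y = 3:1)
attr(df.attr, "units") <- "cm"

p.attr <- ggplot(df.attr, aes(x, y)) +
  geom_point()

# the user-defined attribute is still present in the embedded data
attr(p.attr$data, "units")
```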

Another interesting question is whether these user attributes are copied when
data are passed to geometries and statistics. We can find out with
`geom_debug_panel()` that they are not.

```{r}
p + geom_debug_panel(dbgfun.data = attributes, dbgfun.params = NULL)
```

## Coda

There are many other things that we could explore about ggplot objects, but a
package to be submitted to CRAN cannot have too many pages of documentation, so
we hope this package and its documentation can serve as a starting point for
further exploration.