---
title: "User Guide: 1 Debugging ggplots"
subtitle: "'gginnards' `r packageVersion('gginnards')`"
author: "Pedro J. Aphalo"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{User Guide: 1 Debugging ggplots}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE, echo=FALSE}
library(knitr)
opts_chunk$set(fig.align = 'center', 
               fig.show = 'hold', fig.width = 7, fig.height = 4)
options(warnPartialMatchArgs = FALSE)
```

## Preliminaries

```{r}
library(gginnards)
library(tibble)
```

We generate some artificial data.

```{r}
set.seed(4321)
# generate artificial data
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, 
                      y, 
                      group = c("A", "B"), 
                      y2 = y * c(0.5, 2),
                      block = c("a", "a", "b", "b"))
```

We change the default theme to an uncluttered one.

```{r}
old_theme <- theme_set(theme_bw())
```

# ggplot construction

Package 'ggplot2' defines its own class system, and function `ggplot()` can be
considered as a constructor.

```{r}
class(ggplot())
```

These objects contain all the information needed to render a plot into
graphical output, but not the rendered plot itself. They are list-like
objects with heterogeneous named members.

The structure of objects of class `"ggplot"` can be explored with R's
method `str()` as is the case for any structured R object. Package 'gginnards'
defines a a specialization of  `str()` for class `"ggplot"`. Our `str()` allows
us to see the different _slots_ of these special type of lists. The difference
with the default `str()` method is in the values of default arguments, and in
the ability to control which components or members are displayed. 

We will use
the `str()` to follow the step by step construction of a `"ggplot"` object.

If we pass no arguments to the `ggplot()` constructor an empty plot will be
rendered if we print it.

```{r}
p0 <- ggplot()
p0
```

Object `p` contains members, but `"data"`, `"layers"`, `"mapping"`, `"theme"`
 and `"labels"` are empty lists.

```{r}
str(p0)
```

If we pass an argument to parameter `data` the data is copied into the
list slot with name `data`. As we also map the data to aesthetics, this
mapping is stored in slot `maaping`. 

```{r}
p1 <- ggplot(data = my.data, aes(x, y, colour = group))
str(p1)
```

```{r}
str(p1, max.level = 2, components = "data")
```

A geometry adds a layer.

```{r}
p2 <- p1 + geom_point()
str(p2)
```

A `summary()` method that produces a more compact output is available in recent
versions of 'ggplot2'. However, it does not reveal the internal structure of
the objects.

```{r}
summary(p2)
```

```{r}
str(p2, max.level = 2, components = "mapping")
```

```{r}
p3 <- p2 + theme_classic()
str(p3)
```

Themes are stored as nested lists. To keep the output short we use 
`max.level = 2` although using `max.level = 3` would be needed to see all
nested members.

```{r}
str(p3, max.level = 2, components = "theme")
```

# Data mappings in ggplots

How does mapping work? _Geometries_ (geoms) and _statistics_ (stats) do not "see"
the original variable names, instead the `data` passed to them is named according
to the _aesthetics_ user variables are mapped to. Geoms and stats work in
tandem, with geoms doing the actual plotting and stats summarizing or 
transforming the data. It can be instructive to be able to see what data is
received as input by a geom or stat, and what data is returned by a stat.

Both geoms and stats can have either panel- or group functions. Panel functions
receive as input the subset of the data that corresponds to a whole panel,
mapped to the aesthetics and with factors indicating the grouping (set by
the user by mapping to a discrete scale). Group functions receive as input
the subset of data corresponding to a single group based on the mapping, and
called once for each group present in a panel.

The motivation for writing the "debug" stats and geoms included in package
'gginnards' is that at the moment it is in many cases not possible to set
breakpoints inside the code of stats and geoms, because frequently nameless
panel and group functions are stored within list-like `"ggplot"` objects 
as seen above.

This can make it tedious to analyse how these functions work, as one may need to
add `print` statements to their definitions to see the data. I wrote the "debug"
stats and geoms as tools to help in the development of my packages 'ggpmisc' and
'ggspectra', and as a way of learning myself how data are passed around within
the different components of a `ggplot` object when it is printed.

# Data input to geometries

Data pass through a statistics before being received by a geometry. However,
many geometries, like `geom_point()` and `geom_line()` use by default
`stat_identity()` which simply relays the unmodified data to the geometries.

The _debug_ geometries and statistics in package 'gginnards' by default do
not add any graphical element to the plot but instead they make visible the
`data` as received as their input.

The geometry `geom_debug_panel()` uses `stat_identity()` by default. Here the same
data as rendered by `geom_point()` is printed as a tibble to the R console.
We can see that the columns are named according to the aesthetics the variables
in the user-supplied data have been mapped. In the case of colour, the levels
of the factor have been replaced by colour definitions. Columns `PANEL` and
`group` have been also added.

```{r}
ggplot(mpg, aes(cyl, hwy, colour = factor(cyl))) + 
  geom_point() +
  geom_debug_panel()
```

Below we show how `geom_debug_panel()` can be used together with functions that take
a data frame as input and return a value that can be printed. We use here 
`head()` but other functions such `summary()`, `nrow()` and `colnames()` as
well as user defined functions can be useful when `data` is large. As shown
here, additional arguments can be passed by name to the function.

```{r}
ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  geom_debug_panel(dbgfun.data = head, dbgfun.data.args = list(n = 3))
```

When using a statistic that modifies the data, we can pass `geom_debug_panel()` as
argument in the call to this statistic. In this way the data printed to the 
console will be those returned by the statistics and received by the geometry.

```{r}
ggplot(mpg, aes(cyl, hwy)) +
  stat_summary(fun.data = "mean_se") +
  stat_summary(fun.data = "mean_se", geom = "debug_panel") 
```

As shown above an important use of `geom_debug_panel()` it to display the data returned by a statistic and received as input by geometries. Not all extensions to 'ggplot2' document all the computed variables returned by statistics. In other cases like in the next example, the values returned will depend on the arguments passed. While in the previous example the statistic returned a data frame with one row per group, here the returned data frame has 160 rows. The data are by default plotted as a line with a confidence band.

```{r}
ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), 
              geom = "debug_panel", dbgfun.data = head)
```

# Data input to statistics

Statistics can be defined to operate on data corresponding to a whole panel or
separately on data corresponding to each individual group, as created by mapping
aesthetics to factors. The statistics described below print a summary of their
`data` input by default to the console. These statistics, in addition return a
data frame containing summary information including `labels` suitable for
"plotting" with `geom = "text"` or `geom = "label"`. However, package
'gginnards' defines a "null" geom, `geom_null()`, which is used as default by
the _debug_ statistics. This geom is similar to the more recently added
`ggplot2::geom_blank()`.

```{r}
ggplot(my.data, aes(x, y)) + 
  geom_null()
```

Using geom "null" allows to add the debug _stats_ for the side effect
of console output without altering the rendering of the plot when there is at
least one other plot layer. The default geom `"null"` does not alter the
rendering of the plot or print to the console the `data` output by the
debug stats.

Because of the way 'ggplot2' works, the values are listed to the console at the
time when the `ggplot` object is printed. As shown here, no other geom or stat
is required, however in the remaining examples we add `geom_point()` to make the
data also visible in the plot. 

```{r}
ggplot(my.data, aes(x, y)) + 
  stat_debug_group()
```
 
In the absence of facets or groups we get the printout of a single data 
frame, which is similar to that returned by `geom_debug_panel()`. Without grouping,
group is set to `-1` for all observations. As the we override the default geom with `geom_debug_panel()`
a summary computed by the stat is also printed to the console.

```{r}
ggplot(my.data, aes(x, y)) + 
  geom_point() + 
  stat_debug_group(geom = "debug_panel")
```

In a plot with no grouping, there is no difference in the `data` input for
`compute_panel()` and `compute_group()` functions except for the order of the
variables or columns in the data frame (this applies in general to ggplot
statistics).

```{r}
ggplot(my.data, aes(x, y)) + 
  geom_point() + 
  stat_debug_panel()
```

By mapping the `colour` aesthetic we create a grouping. In the case,
`compute_group()` is called with the data subset by group, and a separate
data frame is displayed for each call `compute_group()`, corresponding each to a
level in the mapped factor. In this case `group` takes as values positive 
consecutive integers. As a factor was mapped to colour, colour is encoded as
a factor.

```{r}
ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_group()
```

Without facets, we still have only one panel.

```{r}
ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_panel()
```

When we map the same factor to a different aesthetic the data remain similar, 
except for the column named after the aesthetic, in this case `shape`.

```{r}
ggplot(my.data, aes(x, y, shape = group)) + 
  geom_point() + 
  stat_debug_group()
```

Facets based on factors create panels within a plot. Here we create a plot with both facets and grouping. In this case, for each _panel_ the `compute_panel()` function is called once with a subset of the data that corresponds to one panel, but not split by groups. For our example, it is called twice.

```{r}
ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_panel(dbgfun.data = "nrow") +
  facet_wrap(~block)
```

with grouping and facets, within each _panel_ the `compute_group()` function is called for each group, in total four times.

```{r}
ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_group(dbgfun.data = "nrow") +
  facet_wrap(~block)
```

# Controlling the debug output

In the examples above we have demonstrated the use of the statistics and geometries
using default arguments. Here we show examples of generation of other types of
debug output.

`stat_debug_group()` and `stat_debug_panel()` return summary data that can be 
inspected using a geometry in addition to printing the data received as argument. 
If we use `geom_debug_panel()` a summary is printed to the console. With two groups,
we get two summaries when we use `stat_debug_group()`.

```{r}
ggplot(my.data, aes(x, y, shape = group)) + 
  geom_point() + 
  stat_debug_group(geom = "debug_panel")
```

If we use `stat_debug_panel()` we get a single summary.

```{r}
ggplot(my.data, aes(x, y, shape = group)) + 
  geom_point() + 
  stat_debug_panel(geom = "debug_panel")
```

In principle one can use other _geoms_ to annotate the plot with the debug summary.
In this case we silence all output to the R console and use the stat as any other
ggplot stat.

```{r}
ggplot(my.data, aes(x, y, colour = group)) + 
  geom_point() + 
  stat_debug_group(geom = "text",
                   mapping = aes(label = sprintf("group = %i", 
                                                 after_stat(group))),
                   dbgfun.data = function(x) {NULL})
```