---
title: "Customising Histogram Plots with Formula Input"
author: "Tom Kelly"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{histoplot: Customising Histogram Plots with Formula Input}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Since boxplots have become the _de facto_ standard for plotting the distribution of data, most users are familiar with them and with the formula input for data frames. However, this input is not available in the original `vioplot` package (0.2). It is therefore supported here for better backwards compatibility with `boxplot`.

As shown below for the `iris` dataset, histogram plots show distribution information and accept the formula input that `boxplot` implements but `vioplot` (0.2) does not. This vignette repeats the customisation shown in [the main histoplot vignette using histoplot syntax](histogram_customisation.html), using the formula method commonly used with `boxplot`, `t.test`, and `lm`.

```{r}
library("vioplot")
```

```{r, message=FALSE, eval=FALSE}
data(iris)
boxplot(Sepal.Length~Species, data = iris)
```

```{r, message=FALSE, echo=FALSE}
data(iris)
boxplot(Sepal.Length~Species, data = iris, main = "Sepal Length")
```

The same call does not work with `vioplot` (0.2):

```{r, message=FALSE, eval=FALSE}
devtools::install_version("vioplot", version = "0.2")
library("vioplot")
vioplot(Sepal.Length~Species, data = iris)
```

```
Error in min(data) : invalid 'type' (language) of argument
```
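The error message suggests that the formula itself reached a numeric function (`min`) without first being split into groups. As a rough sketch of what a formula interface has to do, and not necessarily how `vioplot` implements it, base R can turn a formula into one numeric vector per group. The names `mf` and `groups` are illustrative only.

```r
data(iris)

# Build a model frame from the formula:
# column 1 holds the response, column 2 the grouping factor
mf <- model.frame(Sepal.Length ~ Species, data = iris)

# Split the response by group, giving one numeric vector per species
groups <- split(mf[[1]], mf[[2]])

# Inspect the resulting list of vectors
str(groups)
```

A list like `groups` is the kind of input that functions without a formula method expect; for example, `boxplot(groups)` draws the same plot as the formula call above.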

## Plot Defaults

```{r, message=FALSE, eval=FALSE}
vioplot(Sepal.Length~Species, data = iris)
```

```{r, message=FALSE, echo=FALSE}
vioplot(Sepal.Length~Species, data = iris, main = "Sepal Length", col="magenta")
```

Another concern we see here is that the `vioplot` defaults are not aesthetically pleasing, with a rather glaring colour scheme unsuitable for professional or academic usage. Thus the plot default colours have been changed as shown here:

```{r}
vioplot(Sepal.Length~Species, data = iris, main = "Sepal Length")
```

## Plot colours: Histogram Fill

Plot colours can be further customised, as with the original `vioplot` package, using the `col` argument:

```{r}
histoplot(Sepal.Length~Species, data = iris, main = "Sepal Length", col="lightblue")
```

### Vectorisation

However, the `vioplot` (0.2) function is unable to colour each group separately. This is enabled with a vectorised `col` in `histoplot` (0.4):

```{r}
histoplot(Sepal.Length~Species, data = iris, main = "Sepal Length", col=c("lightgreen", "lightblue", "palevioletred"))
legend("topleft", legend=c("setosa", "versicolor", "virginica"), fill=c("lightgreen", "lightblue", "palevioletred"), cex = 0.5)
```
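When colours are passed as a vector, the plot and the legend are easier to keep in step if both come from a single lookup. The sketch below uses a named vector reordered by factor level. It assumes that groups are drawn in factor-level order, as they are for `boxplot`; `species_cols` and `cols` are hypothetical names.

```r
data(iris)

# Hypothetical lookup table: one fill colour per species name, in any order
species_cols <- c(virginica = "palevioletred",
                  setosa = "lightgreen",
                  versicolor = "lightblue")

# Reorder by factor level so the colours follow the plotting order
# (assumes groups are drawn in level order)
cols <- species_cols[levels(iris$Species)]
print(cols)

# The same vector can then drive both the plot and the legend, e.g.
# histoplot(Sepal.Length ~ Species, data = iris, col = unname(cols))
# legend("topleft", legend = names(cols), fill = cols, cex = 0.5)
```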


## Plot colours: Histogram Lines and Boxplot

Colours can also be customised for the histogram fill and border separately using the `col` and `border` arguments:

```{r}
histoplot(Sepal.Length~Species, data = iris, main = "Sepal Length", col="lightblue", border="royalblue")
```

Similarly, the arguments `lineCol` and `rectCol` specify the colours of the boxplot outline and rectangle fill. For simplicity the box and whiskers of the boxplot will always have the same colour.

```{r}
histoplot(Sepal.Length~Species, data = iris, main = "Sepal Length", rectCol="palevioletred", lineCol="violetred")
```
 
The same applies to the colour of the median point with `colMed`:
 
```{r}
histoplot(Sepal.Length~Species, data = iris, main = "Sepal Length", colMed="violet")
```
 
### Combined customisation

These customised colours can be combined:
 
```{r}
histoplot(Sepal.Length~Species, data = iris, main = "Sepal Length", col="lightblue", border="royalblue", rectCol="palevioletred", lineCol="violetred", colMed="violet")
```
 
### Vectorisation

These colour and shape settings can also be customised separately for each histogram:

```{r}
histoplot(Sepal.Length~Species, data = iris, main="Sepal Length", col=c("lightgreen", "lightblue", "palevioletred"), border=c("darkolivegreen4", "royalblue4", "violetred4"), rectCol=c("forestgreen", "blue", "palevioletred3"), lineCol=c("darkolivegreen", "royalblue", "violetred4"), colMed=c("green", "cyan", "magenta"), pchMed=c(15, 17, 19))
```
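With this many vectorised arguments it is easy to supply too few values or to mistype a colour name. A minimal sketch of a check in base R follows. It assumes one value per group is intended, and the `style` list is a hypothetical container rather than part of the package.

```r
data(iris)
n_groups <- nlevels(iris$Species)

# Hypothetical container for the per-group colour settings used above
style <- list(
  col     = c("lightgreen", "lightblue", "palevioletred"),
  border  = c("darkolivegreen4", "royalblue4", "violetred4"),
  rectCol = c("forestgreen", "blue", "palevioletred3"),
  lineCol = c("darkolivegreen", "royalblue", "violetred4"),
  colMed  = c("green", "cyan", "magenta")
)

# Check that every argument has one entry per group
lengths_ok <- vapply(style, function(x) length(x) == n_groups, logical(1))

# Check that every entry is a colour name known to R
names_ok <- vapply(style, function(x) all(x %in% colors()), logical(1))

# Stop early if any setting is malformed
stopifnot(all(lengths_ok), all(names_ok))
```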

## Split Bihistogram Plots

We split the data into two categories by sepal width (above the mean, and at or below it) as follows:

```{r, message=FALSE}
data(iris)
summary(iris$Sepal.Width)
table(iris$Sepal.Width > mean(iris$Sepal.Width))
iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]
```
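Each half of a split plot is drawn from a different subset, so it can help to check how many observations of each species fall on each side; a group with few observations gives a coarse histogram. The sketch below cross-tabulates the split in base R. The names `is_large` and `counts` are illustrative.

```r
data(iris)

# Logical flag for the split used above (iris has no missing values)
is_large <- iris$Sepal.Width > mean(iris$Sepal.Width)
iris_large <- iris[is_large, ]
iris_small <- iris[!is_large, ]

# Cross-tabulate species against the split to see the size of each group
counts <- table(Species = iris$Species, Large = is_large)
print(counts)

# The two subsets should partition the full dataset
stopifnot(nrow(iris_large) + nrow(iris_small) == nrow(iris))
```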

A direct comparison of two datasets can be made with the `side` argument, with `add = TRUE` on the second plot:

```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
histoplot(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line", side = "right")
histoplot(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line", side = "left", add = TRUE)
title(xlab = "Species", ylab = "Sepal Length")
legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
```