---
title: "Data sets in the heplots package"
author: Michael Friendly
date: "`r Sys.Date()`"
package: heplots
output: 
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    fig_caption: yes
    toc: true
pkgdown:
  as_is: true
bibliography: "HE-examples.bib"
link-citations: yes
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Data sets in the heplots package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  fig.height=5,
  fig.width=5,
  # results='hide',
  # fig.keep='none',
  fig.path='fig/datasets-',
  echo=TRUE,
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, echo=FALSE}
set.seed(1071)
options(width=80, digits=5, continue="  ")
library(heplots)
library(candisc)
library(ggplot2)
library(dplyr)
```

## Documenting package datasets {-}
Datasets used in package examples are an important part of making a package understandable and usable, but they are often overlooked.
In developing the `heplots` package I assembled a large collection of data sets illustrating a
variety of multivariate linear models, along with some analyses and graphical displays. Each of these has much more than the
usual stub example, which often looks like:

```{r eval=FALSE}
data(dataset)
# str(dataset); plot(dataset)
```

But `.Rd`, and now `roxygen`, don't make it easy to work with numerous datasets in a package or, more importantly, to document what they illustrate. I show the steps used to create this vignette in case these ideas are useful to others.

In this release, I started with a file generated by:

```{r}
vcdExtra::datasets("heplots") |> head(4)
```

Then, in the roxygen documentation, I added `@concept` tags to classify these datasets according to the methods used.
(`@concept` entries are indexed with the package, so they can be found with `help.search()`.)
For example,
the documentation for the `AddHealth` data contains these lines:

```{r eval=FALSE}
#' @name AddHealth
#' @docType data
 ...
#' @keywords datasets
#' @concept MANOVA
#' @concept ordered
```
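
As a sketch of what such a lookup could look like, the call below restricts `help.search()` to the concept entries of the `heplots` help index and searches for the `MANOVA` tag shown above. It assumes `heplots` is installed; the `requireNamespace()` guard is added here only for illustration.

```{r eval=FALSE}
# Search only the \concept entries of the heplots help pages.
# Assumption: heplots is installed, so its help index is available.
hits <- NULL
if (requireNamespace("heplots", quietly = TRUE)) {
  hits <- help.search("MANOVA", fields = "concept", package = "heplots")
  print(hits)   # lists the help topics whose concepts match the pattern
}
```

The pattern is matched as a regular expression and ignores case by default, so other concept entries containing "manova" would also be returned.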

With standard
processing, these concepts, along with the keywords, appear in the **Index** section of the manual constructed by `devtools::build_manual()`. In the `pkgdown`
site for this package, they are also searchable in the **search** box.

With a bit of extra processing, I created a file, [datasets.csv](https://raw.githubusercontent.com/friendly/heplots/master/extra/datasets.csv),
which is used below.
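
That extra processing is not shown in this vignette. As one hypothetical way to collect the tags, the sketch below parses a small made-up `.Rd` stub (mirroring the `AddHealth` tags above) with `tools::parse_Rd()` and pulls out its `\name` and `\concept` entries. The helper `rd_entries()` and the stub itself are assumptions for illustration, not the code used to build `datasets.csv`.

```{r}
# A made-up Rd stub; a real one would come from the package's man/ directory.
rd_lines <- c(
  "\\name{AddHealth}",
  "\\docType{data}",
  "\\title{Hypothetical stub for illustration}",
  "\\keyword{datasets}",
  "\\concept{MANOVA}",
  "\\concept{ordered}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)
rd <- tools::parse_Rd(rd_file)

# Hypothetical helper: collect the text of all top-level entries with a given tag.
rd_entries <- function(rd, tag) {
  out <- character(0)
  for (el in rd) {
    if (identical(attr(el, "Rd_tag"), tag)) {
      out <- c(out, trimws(paste(unlist(el), collapse = "")))
    }
  }
  out
}

# One row per dataset, with its concepts collapsed into a space-separated string
data.frame(
  dataset = rd_entries(rd, "\\name"),
  tags    = paste(rd_entries(rd, "\\concept"), collapse = " ")
)
```

Applying the same helper to every file in `man/` and row-binding the results would give a table with one row per dataset, which is the shape assumed by the code later in this vignette.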

## Methods {-}
The main methods used in the example datasets are listed below:

* **MANOVA**: Multivariate analysis of variance
* **MANCOVA**: Multivariate analysis of covariance
* **MMRA**: Multivariate multiple regression
* **cancor**: Canonical correlation (using the [candisc](https://github.com/friendly/candisc/) package)
* **candisc**: Canonical discriminant analysis (using [candisc](https://github.com/friendly/candisc/))
* **repeated**: Repeated measures designs, analyzed from the multivariate perspective
* **robust**: Robust estimation of MLMs

In addition, a few examples illustrate special handling for
linear hypotheses concerning factors:

* **ordered**: ordered factors
* **contrasts**: other contrasts
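
To make these two tags concrete, here is a minimal base R sketch using made-up factors (`dose` and `group` are hypothetical, not variables from any `heplots` dataset). With R's default options, an ordered factor is coded with polynomial contrasts, while other contrasts can be assigned explicitly with `contrasts<-`.

```{r}
# Hypothetical ordered factor: R's default coding is polynomial (contr.poly)
dose <- factor(c("low", "medium", "high"),
               levels = c("low", "medium", "high"), ordered = TRUE)
contrasts(dose)

# Hypothetical unordered factor with user-defined contrasts
group <- factor(c("a", "b", "c"))
contrasts(group) <- cbind(a_vs_bc = c(2, -1, -1),
                          b_vs_c  = c(0,  1, -1))
contrasts(group)
```

Each column of a contrast matrix corresponds to one single-degree-of-freedom comparison among the factor levels, which is what makes such hypotheses testable individually in a linear model.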

The dataset names are linked to the documentation with graphical output on the
`pkgdown` website, <http://friendly.github.io/heplots/>.

<!-- # fix problem with Probe1, Probe2, which are documented together and caused 404 errors -->
<!-- # fixed by deleting Probe2 from the list and renaming Probe1 -> Probe -->


## Dataset table {-}

```{r datasets}
library(here)
library(dplyr)
library(tinytable)
#dsets <- read.csv(here::here("extra", "datasets.csv"))  # doesn't work in a vignette
dsets <- read.csv("https://raw.githubusercontent.com/friendly/heplots/master/extra/datasets.csv")
dsets <- dsets |> 
  dplyr::select(-X) |> 
  arrange(tolower(dataset))

# link dataset to pkgdown doc
refurl <- "http://friendly.github.io/heplots/reference/"

dsets <- dsets |>
  mutate(dataset = glue::glue("[{dataset}]({refurl}{dataset}.html)")) 

#knitr::kable(dsets)
tinytable::tt(dsets)  |> format_tt(markdown = TRUE)
```

## Concept table {-}

This table can be inverted to list the datasets that illustrate each concept:

```{r concepts}
concepts <- dsets |>
  select(dataset, tags) |>
  tidyr::separate_longer_delim(tags, delim = " ") |>
  arrange(tags, dataset) |>
  summarize(datasets = toString(dataset), .by = tags) |>
  rename(concept = tags)

#knitr::kable(concepts)
tinytable::tt(concepts) |> format_tt(markdown = TRUE)
```
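
To see what the inversion does step by step, the sketch below applies the same idea to a small made-up table using only base R. The data frame `toy` and its contents are hypothetical; the real tags come from `datasets.csv`.

```{r}
# Hypothetical input: one row per dataset, tags separated by spaces
toy <- data.frame(
  dataset = c("A", "B", "C"),
  tags    = c("MANOVA ordered", "MANOVA", "MMRA ordered")
)

# Step 1: split each tag string, giving one (dataset, concept) pair per row
tag_list <- strsplit(toy$tags, " ", fixed = TRUE)
long <- data.frame(
  dataset = rep(toy$dataset, lengths(tag_list)),
  concept = unlist(tag_list)
)

# Step 2: collapse the dataset names within each concept
inverted <- aggregate(dataset ~ concept, data = long, FUN = toString)
names(inverted)[2] <- "datasets"
inverted
```

The first step plays the role of `tidyr::separate_longer_delim()` and the second that of `summarize(..., .by = tags)` in the chunk above.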