---
title: "Introduction to dad"
author: "Pierre Santagostini, Rachid Boumaza"
date: "2023-08-29"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
    # number_sections: true
vignette: >
  %\VignetteIndexEntry{Introduction to dad}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Below is an overview of the data analysis methods provided by the dad package, and a presentation of the type of data manipulated.

For more information on these elements, see:
[https://journal.r-project.org/archive/2021/RJ-2021-071/index.html](https://doi.org/10.32614/RJ-2021-071)

## Data under consideration

The **dad** package provides tools for analysing multi-group data.
Such data consist of variables observed on individuals, these individuals being organised into groups (or occasions).
Hence, there are three types of objects: groups, individuals and variables.

## Implemented methods

For the analysis of such data, a probability density function is associated to each group.
Some methods dealing with these functions are implemented:

* **Multidimensional scaling (MDS) of probability density functions**: function `fmdsd` (continuous data) or `mdsdd` (discrete data)  
<!-- A probability density function is estimated on the data of each group, and the distances between these probability densities are calculated. The MDS is then performed on these distances. -->
* **Hierarchical cluster analysis (HCA) of probability density functions**: `fhclustd` (continuous) or `hclustdd` (discrete)
<!-- In the same way, the distances between the estimated probability density functions are calculated. -->
<!-- Then hierarchical cluster analysis is performed on these distances. -->
* **Discriminant analysis (DA) of probability density functions**:
  + Computation of the misclassification ratio using the one-leave-out method: `fdiscd.misclass` (continuous) or `discdd.misclass` (discrete)
  + Assignment of groups of individuals, one group after another, for which the class is unknown:  `fdiscd.predict` (continuous) or `discdd.predict` (discrete)

## Data organisation

In order to facilitate the work with these multi-group data, the **dad** package uses objects of class `"folder"` or `"folderh"`.
These objects are lists of data frames having particular formats. 

### Objects of class `folder`

Such objects are lists of data frames which have the same column names.
Each data frame matches with an occasion (a group of individuals).

An object of class `"folder"` is created by the functions `folder` or `as.folder` (see their help in R).

**Example:**
Ten rosebushes $A$, $B$, $\dots$, $J$ were evaluated by 14 assessors, at three sessions, according to several descriptors including their shape `Sha`, their foliage thickness `Den` and their symmetry `Sym`.

```{r}
library(dad)
data("roses")
x <- roses[, c("Sha", "Den", "Sym", "rose")]
head(x)
```

Coerce these data into an object of class `"folder"`:
```{r}
rosesf <- as.folder(x, groups = "rose")
print(rosesf, max = 9)
```

### Objects of class `folderh`

Objects of class `"folderh"` can be used to avoid redundancies in the data.

In the most useful case, such objects are hierarchical lists of two data frames `df1` and `df2` related by means of a key which describes the "1 to N" relationship between the data frames.

They are created by the function `folderh` (see its help in R for the case of three data frames or more).

**Example:**
Data about 5 rosebushes (`roseflowers$variety`). For each rosebush, measures on several flowers (`roseflowers$flower`).

```{r}
library(dad)
data(roseflowers)
df1 <- roseflowers$variety
df2 <- roseflowers$flower
```

Build an object of class `"folderh"`:
```{r}
fh1 <- folderh(df1, "rose", df2)
print(fh1)
```