---
title: "How to use FactorAssumptions"
author: "Jose Eduardo Storopoli"
date: "7/16/2019"
references:
- id: hair2018
  title: Multivariate data analysis
  author:
  - family: Hair
    given: Joseph F.
  - family: Black
    given: William C.
  - family: Babin
    given: Barry J.
  - family: Anderson
    given: Rolph E.
  edition: 8th ed.
  publisher: Cengage Learning
  type: book
  issued:
    year: 2018
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to use FactorAssumptions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# FactorAssumptions

Set of Assumptions for Factor and Principal Component Analysis

Description: Tests for the Kaiser-Meyer-Olkin (KMO) measure and communalities in a dataset. It provides a final sample by removing variables iteratively while keeping track of the variables that were removed at each step.

## What are KMO and Communalities?

*Factor Analysis* and *Principal Components Analysis* (PCA) rest on some assumptions that should be checked before the results are interpreted [@hair2018].

The first one is the Kaiser-Meyer-Olkin (KMO) measure, which estimates the proportion of variance among the variables that may be common variance, also called systematic variance. KMO ranges from 0 to 1. Low values (close to 0) indicate that the partial correlations are large relative to the sum of the correlations, that is, the correlation pattern is too diffuse for a factor/principal component analysis. @hair2018 suggest that variables with an individual KMO smaller than 0.5 be removed from the analysis. Consequently, this removal tends to raise the overall KMO of the remaining variables above 0.5.
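
To make the definition concrete, the chunk below is a minimal base R sketch of how KMO values can be computed by hand from the observed and partial correlations. It is not the code used by `kmo_optimal_solution`; the helper name `kmo_by_hand` is hypothetical, and the built-in `attitude` dataset is used only as a stand-in for illustration.

```{r kmo_sketch}
# Illustrative sketch only: KMO computed by hand on the built-in `attitude` data.
# `kmo_by_hand` is a hypothetical helper, not part of FactorAssumptions.
kmo_by_hand <- function(data) {
  r <- cor(data)                 # observed correlations
  r_inv <- solve(r)              # inverse of the correlation matrix
  # Partial correlations between each pair, controlling for all other variables
  p <- -r_inv / sqrt(outer(diag(r_inv), diag(r_inv)))
  diag(r) <- 0                   # ignore each variable's correlation with itself
  diag(p) <- 0
  r2 <- r^2
  p2 <- p^2
  list(
    overall    = sum(r2) / (sum(r2) + sum(p2)),
    individual = colSums(r2) / (colSums(r2) + colSums(p2))
  )
}

kmo_attitude <- kmo_by_hand(attitude)
round(kmo_attitude$overall, 2)
round(kmo_attitude$individual, 2)
```

Variables whose individual value falls below 0.5 would be the candidates for removal under the rule described above.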

The second assumption of a valid factor analysis or PCA concerns the communality of the rotated variables. The communalities indicate the variance that each variable shares with the factors/components. A greater communality indicates that a greater amount of the variable's variance was extracted by the factor/principal component solution. For a better measurement in factor/principal component analysis, communalities should be 0.5 or greater [@hair2018].
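
As a rough illustration of what a communality is, the base R sketch below extracts principal components from a correlation matrix and sums the squared loadings of each variable over the retained components. The helper `communalities_by_hand`, the `attitude` data and the choice of two components are assumptions made for this example only; an orthogonal rotation such as varimax does not change these sums.

```{r communality_sketch}
# Illustrative sketch only: communalities from an unrotated PCA of `attitude`.
# `communalities_by_hand` is a hypothetical helper, not part of FactorAssumptions.
communalities_by_hand <- function(data, n_components) {
  e <- eigen(cor(data), symmetric = TRUE)
  keep <- seq_len(n_components)
  # Loadings: eigenvectors scaled by the square root of their eigenvalues
  loadings <- sweep(e$vectors[, keep, drop = FALSE], 2, sqrt(e$values[keep]), `*`)
  h2 <- rowSums(loadings^2)      # communality = sum of squared loadings per variable
  names(h2) <- colnames(data)
  h2
}

h2_attitude <- communalities_by_hand(attitude, n_components = 2)
round(h2_attitude, 2)
# Variables that would fail the 0.5 rule in this toy example
names(h2_attitude)[h2_attitude < 0.5]
```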

## Loading an example dataset

First we will load the package `FactorAssumptions` and the example dataset `bfi` from `psych`.

```{r bfi, message=FALSE}
library(FactorAssumptions, quietly = TRUE, verbose = FALSE)
bfi_data <- bfi
# Remove rows with missing values and keep only complete cases
bfi_data <- bfi_data[complete.cases(bfi_data), ]
head(bfi_data)
```

## Performing the KMO Assumptions

First we will check the $KMO > 0.5$ assumption for all individual variables in the dataset with the `kmo_optimal_solution` function.

```{r KMO}
kmo_bfi <- kmo_optimal_solution(bfi_data, squared = FALSE)
```

Note that `kmo_optimal_solution` outputs a list:

1. The final solution as `df`
2. The removed variables with individual $KMO < 0.5$ as `removed`
3. The anti-image covariance matrix as `AIS`
4. The anti-image correlation matrix as `AIR`

In our case, none of the variables were removed due to low individual KMO values.

```{r removed_kmo}
kmo_bfi$removed
```

## Performing the Communalities Assumptions

The parallel analysis of the `bfi` data suggests seven factors. We will therefore check the assumption that all individual communalities are greater than 0.5, with the argument `nfactors` set to 7.
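
For readers unfamiliar with parallel analysis, the chunk below is a simplified base R sketch of the idea: eigenvalues of the observed correlation matrix are compared with the average eigenvalues of random normal data of the same size, and only the leading components that beat the random baseline are counted. The function name, the number of simulations, the seed and the use of the `attitude` data are all assumptions for illustration. Dedicated implementations, such as `fa.parallel` in `psych`, offer more options, so this sketch is not expected to reproduce the seven factors reported above.

```{r parallel_sketch}
# Illustrative sketch only: a simplified, PCA-based parallel analysis.
# `parallel_analysis_sketch` is a hypothetical helper, not part of FactorAssumptions.
parallel_analysis_sketch <- function(data, n_sims = 100, seed = 123) {
  set.seed(seed)
  n <- nrow(data)
  p <- ncol(data)
  observed <- eigen(cor(data), symmetric = TRUE, only.values = TRUE)$values
  # Eigenvalues of correlation matrices from uncorrelated random normal data
  simulated <- replicate(n_sims, {
    random_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(random_data), symmetric = TRUE, only.values = TRUE)$values
  })
  baseline <- rowMeans(simulated)
  # Count the leading components whose eigenvalue exceeds the random baseline
  exceeds <- observed > baseline
  if (all(exceeds)) p else which(!exceeds)[1] - 1
}

parallel_analysis_sketch(attitude)
```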

For the argument `type`, we can use either `principal` or `fa`, which correspond to the functions of the same name in the `psych` package:

* `principal` will perform a *Principal Component Analysis* (PCA)
* `fa` will perform a *Factor Analysis*

*Note*: we are using the `df` generated by the `kmo_optimal_solution` function.

*Note 2*: the default rotation employed by `communalities_optimal_solution` is `varimax`. You can change it if you want.

```{r communalities}
comm_bfi <- communalities_optimal_solution(kmo_bfi$df, type = "principal", nfactors = 7, squared = FALSE)
```

Note that `communalities_optimal_solution` outputs a list:

1. The final solution as `df`
2. The removed variables with individual communalities $< 0.5$ as `removed`
3. A table with the communalities and loadings of the variables in the final iteration as `loadings`
4. The results of the final iteration of either the `principal` or the `fa` function from the `psych` package as `results`

In our case, 3 variables were removed iteratively due to low individual communality values. They are listed in the order in which they were removed, starting with the lowest communality, until an optimal solution was reached.

```{r removed_comm}
comm_bfi$removed
```

Finally, we arrive at the rotated matrix of our final principal components analysis. You can export it as a CSV file with `write.csv` or `write.csv2`.

```{r final_solution}
comm_bfi$results
```

# References