---
title: "Short Intro into Gaussian Mixture Models"
author: "Michael C. Thrun"
date: "`r format(Sys.time(), '%d %b %Y')`"
output: 
          html_document:
            theme: united
            highlight: tango 
            toc: true
            number_sections: true
            toc_depth: 2
            toc_float: true
            fig_width: 8
            fig_height: 8
vignette: >
  %\VignetteIndexEntry{Short Intro into Gaussian Mixture Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
# library(rgl)
# #library(rglwidget)
# setupKnitr()
# knitr::opts_chunk$set(echo = TRUE,
#                       fig.align = "center",
#                       warning = FALSE,
#                       webgl = TRUE,
#                       fig.width = 8, 
#                       fig.height = 8,
#                       fig.keep = "all",
#                       fig.ext = "jpeg"
#                       )
```
# Gaussian Mixture Models (GMM)
Examples in which fitting a GMM with the EM algorithm alone is insufficient, but a visual modelling approach is appropriate, can be found in [Ultsch et al., 2015].
In general, a GMM is explainable if the overlap between its Gaussian components remains small. A good example of modelling such a GMM for natural data is given in the ECDA presentation available on ResearchGate [Thrun/Ultsch, 2015].
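
A one-dimensional GMM has the density p(x) = sum_i w_i N(x | mu_i, sigma_i). The following base-R sketch evaluates this density and quantifies how much two weighted components overlap. The helpers `gmm_pdf` and `component_overlap` are hypothetical and not part of AdaptGauss, and measuring overlap as the area under the pointwise minimum of two weighted components is an assumption of this sketch, not necessarily the criterion used in the cited works.

```r
# Hypothetical helper (not part of AdaptGauss): density of a 1-D GMM,
# p(x) = sum_i w_i * N(x | mu_i, sd_i)
gmm_pdf <- function(x, Means, SDs, Weights) {
  stopifnot(length(Means) == length(SDs), length(SDs) == length(Weights))
  dens <- numeric(length(x))
  for (i in seq_along(Means)) {
    dens <- dens + Weights[i] * dnorm(x, mean = Means[i], sd = SDs[i])
  }
  dens
}

# Hypothetical overlap measure (an assumption; other measures exist):
# the area shared by weighted components i and j, i.e. the integral of
# their pointwise minimum. Values near 0 indicate well-separated modes.
component_overlap <- function(i, j, Means, SDs, Weights) {
  shared <- function(x) {
    pmin(Weights[i] * dnorm(x, Means[i], SDs[i]),
         Weights[j] * dnorm(x, Means[j], SDs[j]))
  }
  # integrate over a finite range that covers both components
  lower <- min(Means[c(i, j)] - 10 * SDs[c(i, j)])
  upper <- max(Means[c(i, j)] + 10 * SDs[c(i, j)])
  integrate(shared, lower, upper, subdivisions = 1000L)$value
}

Means   <- c(-2, 2, 7)
SDs     <- c(0.5, 1, 3)
Weights <- rep(1 / 3, 3)

grid <- seq(-6, 20, by = 0.01)
area <- sum(gmm_pdf(grid, Means, SDs, Weights)) * 0.01  # Riemann sum, close to 1
o12  <- component_overlap(1, 2, Means, SDs, Weights)    # small overlap
o23  <- component_overlap(2, 3, Means, SDs, Weights)    # larger overlap
```

The same helpers could be applied to the parameters returned by AdaptGauss (`gmm$Means`, `gmm$SDs`, `gmm$Weights`) to check whether the components of a fitted model stay well separated.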

In the example below, the data is generated specifically so that the resulting GMM is statistically significant.
The interactive approach of AdaptGauss uses shiny. Hence, I do not know how to illustrate these examples in R Markdown, so the code is listed below without its interactive output.

```{}
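# Simulate a trimodal sample from three Gaussians (seed for reproducibility)
set.seed(1234)
data = c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))

# Interactive (shiny) GMM fit, started from initial guesses
gmm = AdaptGauss::AdaptGauss(data,
                             Means = c(-2, 2, 7),
                             SDs = c(0.5, 1, 4),
                             Weights = c(0.3333, 0.3333, 0.3333))

# Chi-square goodness-of-fit test of the fitted GMM
AdaptGauss::Chi2testMixtures(data, gmm$Means, gmm$SDs, gmm$Weights, PlotIt = TRUE)

# QQ plot of the data against the fitted GMM
AdaptGauss::QQplotGMM(data, gmm$Means, gmm$SDs, gmm$Weights)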
```
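
Because the shiny session cannot be shown here, the following base-R sketch illustrates non-interactively the idea behind a chi-square goodness-of-fit check of a GMM. The helpers `gmm_cdf` and `gmm_chi2` are hypothetical and not part of AdaptGauss; binning at empirical quantiles and using `stats::chisq.test` are assumptions of this sketch, which is not the procedure implemented in `AdaptGauss::Chi2testMixtures`.

```r
# Hypothetical helper (not part of AdaptGauss): CDF of a 1-D GMM,
# F(q) = sum_i w_i * Phi((q - mu_i) / sd_i)
gmm_cdf <- function(q, Means, SDs, Weights) {
  cdf <- numeric(length(q))
  for (i in seq_along(Means)) {
    cdf <- cdf + Weights[i] * pnorm(q, mean = Means[i], sd = SDs[i])
  }
  cdf
}

# Simplified Pearson chi-square check of a GMM (assumption: NOT the method of
# AdaptGauss::Chi2testMixtures). Bins are placed at empirical quantiles so
# each bin holds roughly the same number of observations.
gmm_chi2 <- function(data, Means, SDs, Weights, n_bins = 30) {
  probs  <- seq(0, 1, length.out = n_bins + 1)
  breaks <- unique(quantile(data, probs = probs, names = FALSE))
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  observed   <- as.vector(table(cut(data, breaks = breaks)))
  expected_p <- diff(gmm_cdf(breaks, Means, SDs, Weights))
  # rescale.p = TRUE tolerates weights that sum only approximately to 1
  chisq.test(observed, p = expected_p, rescale.p = TRUE)
}

set.seed(1234)
data <- c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))

# the generating three-component GMM versus a single Gaussian
good <- gmm_chi2(data, Means = c(-2, 2, 7), SDs = c(0.5, 1, 3),
                 Weights = rep(1 / 3, 3))
bad  <- gmm_chi2(data, Means = mean(data), SDs = sd(data), Weights = 1)
good$p.value  # should not be tiny: the true model is tested
bad$p.value   # essentially 0: one Gaussian is clearly rejected
```

With a fitted model one would pass `gmm$Means`, `gmm$SDs` and `gmm$Weights` instead; the p-value is then optimistic, because this sketch does not reduce the degrees of freedom for parameters estimated from the same data.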

## Multimodal Natural Dataset not Suitable for a GMM 
Not every multimodal dataset should be modelled with a GMM.
The following is an example of a GMM of a multimodal dataset that is not statistically significant.

```{}
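# Load the example dataset shipped with AdaptGauss
data('LKWFahrzeitSeehafen2010', package = 'AdaptGauss')

# Interactive (shiny) GMM fit with four components, started from initial guesses
gmm = AdaptGauss::AdaptGauss(LKWFahrzeitSeehafen2010,
                             Means = c(52.74, 385.38, 619.46, 162.08),
                             SDs = c(38.22, 93.21, 57.72, 48.36),
                             Weights = c(0.2434, 0.5589, 0.1484, 0.0749))

# Chi-square goodness-of-fit test of the fitted GMM
AdaptGauss::Chi2testMixtures(LKWFahrzeitSeehafen2010,
                             gmm$Means, gmm$SDs, gmm$Weights, PlotIt = TRUE)

# QQ plot of the data against the fitted GMM
AdaptGauss::QQplotGMM(LKWFahrzeitSeehafen2010, gmm$Means, gmm$SDs, gmm$Weights)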
```

# References

Thrun, M. C., & Ultsch, A.: Models of Income Distributions for Knowledge Discovery, Proc. European Conference on Data Analysis (ECDA), DOI: 10.13140/RG.2.1.4463.0244, pp. 136-137, Colchester, 2015.

Ultsch, A., Thrun, M. C., Hansen-Goos, O., & Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, Vol. 16(10), pp. 25897-25911, 2015.