---
# YAML header created by ox-ravel
title: Twin models
author: Klaus Holst & Thomas Scheike
output:
  rmarkdown::html_vignette:
    fig_caption: yes
vignette: >
  %\VignetteIndexEntry{Twin models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This document provides a brief tutorial to analyzing twin data using the
**`mets`** package:

```{r  include=FALSE,echo=FALSE,message=FALSE,warning=FALSE }
options(warn=-1, family="Times")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dpi=50,
  fig.width=7.15, fig.height=5.5,
  out.width="600px",
  fig.retina=1
  ##dev="png",
  ##dpi=72,
  ## out.width = "70%")
)
library("mets")
```

\(
\newcommand{\cov}{\mathbb{C}\text{ov}}
\newcommand{\cor}{\mathbb{C}\text{or}}
\newcommand{\var}{\mathbb{V}\text{ar}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\unitfrac}[2]{#1/#2}
\newcommand{\n}{}
\)

 <!-- 
 The development version may be installed from *github*: 
 -->
```{r  install, eval=FALSE, echo=FALSE}
# install.packages("remotes")
remotes::install_github("kkholst/mets", dependencies="Suggests")
```


# Twin analysis, continuous traits

  In the following we examine the heritability of Body Mass
Index\n{}<sup id="b71edfd9bc946c317f4a732845bcaf93"><a href="#korkeila_bmi_1991" title="Korkeila, Kaprio, Rissanen \&amp; Koskenvuo, {{E}ffects of gender and age on the heritability of body mass index}, {Int J Obes}, v(10), 647--654 (1991).">korkeila_bmi_1991</a></sup> <sup id="718839fcb6ade82ebb2d7de853582b80"><a href="#hjelmborg_bmi_2008" title="Hjelmborg, Fagnani, Silventoinen, McGue, Korkeila, Christensen, Rissanen \&amp; Kaprio, {{G}enetic influences on growth traits of {B}{M}{I}: a longitudinal study of adult twins}, {Obesity (Silver Spring)}, v(4), 847--852 (2008).">hjelmborg_bmi_2008</a></sup>, based on data
on self-reported BMI-values from a random sample of 11,411 same-sex
twins. First, we will load data

```{r  twinbmi }
library(mets)
data("twinbmi")
head(twinbmi)
```

The data is on *long* format with one subject per row.

-   **`tvparnr`:** twin id
-   **`bmi`:** Body Mass Index (\(\mathrm{kg}/{\mathrm{m}^2}\))
-   **`age`:** Age (years)
-   **`gender`:** Gender factor (male,female)
-   **`zyg`:** zygosity (MZ, DZ)

We transpose the data allowing us to do pairwise analyses

```{r  twinwide }
twinwide <- fast.reshape(twinbmi, id="tvparnr",varying=c("bmi"))
head(twinwide)
```

Next we plot the association within each zygosity group

```{r  scatterdens, echo=FALSE,message=FALSE,warning=FALSE }
library("cowplot")

scatterdens <- function(x) {
    require(ggplot2)
    sp <- ggplot(x,
                aes_string(colnames(x)[1], colnames(x)[2])) +
        theme_minimal() +
        geom_point(alpha=0.3) + geom_density_2d()
    xdens <- ggplot(x, aes_string(colnames(x)[1],fill=1)) +
        theme_minimal() +
        geom_density(alpha=.5)+
        theme(axis.text.x = element_blank(),
	      legend.position = "none") + labs(x=NULL)
    ydens <- ggplot(x, aes_string(colnames(x)[2],fill=1)) +
        theme_minimal() +
        geom_density(alpha=.5) +
        theme(axis.text.y = element_blank(),
	      axis.text.x = element_text(angle=90, vjust=0),
	      legend.position = "none") +
        labs(x=NULL) +
        coord_flip()
    g <- plot_grid(xdens,NULL,sp,ydens,
                  ncol=2,nrow=2,
                  rel_widths=c(4,1.4),rel_heights=c(1.4,4))
    return(g)
}
```

We here show the log-transformed data which is slightly more symmetric
and more appropiate for the twin analysis (see Figure \@ref(fig:scatter1) and \@ref(fig:scatter2))

```{r  scatter1, warning=FALSE,message=FALSE,fig.cap="Scatter plot of logarithmic BMI measurements in MZ twins" }
mz <- log(subset(twinwide, zyg=="MZ")[,c("bmi1","bmi2")])
scatterdens(mz)
```

```{r  scatter2, warning=FALSE,message=FALSE,fig.cap="Scatter plot of logarithmic BMI measurements in DZ twins" }
dz <- log(subset(twinwide, zyg=="DZ")[,c("bmi1","bmi2")])
scatterdens(dz)
```

The plots and raw association measures shows considerable stronger
dependence in the MZ twins, thus indicating genetic influence of the
trait

```{r   }
cor.test(mz[,1],mz[,2], method="spearman")
```

```{r   }
cor.test(dz[,1],dz[,2], method="spearman")
```

Ńext we examine the marginal distribution (GEE model with working
independence)

```{r gee  }
l0 <- lm(bmi ~ gender + I(age-40), data=twinbmi)
estimate(l0, id=twinbmi$tvparnr)
```

```{r  }
library("splines")
l1 <- lm(bmi ~ gender*ns(age,3), data=twinbmi)
marg1 <- estimate(l1, id=twinbmi$tvparnr)
```

```{r  marg1, warning=FALSE,message=FALSE,fig.cap="Marginal association between BMI and Age for males and females."}
dm <- lava::Expand(twinbmi,
	    bmi=0,
	    gender=c("male"),
	    age=seq(33,61,length.out=50))
df <- lava::Expand(twinbmi,
	    bmi=0,
	    gender=c("female"),
	    age=seq(33,61,length.out=50))

plot(marg1, function(p) model.matrix(l1,data=dm)%*%p,
     data=dm["age"], ylab="BMI", xlab="Age",
     ylim=c(22,26.5))
plot(marg1, function(p) model.matrix(l1,data=df)%*%p,
     data=df["age"], col="red", add=TRUE)
legend("bottomright", c("Male","Female"),
       col=c("black","red"), lty=1, bty="n")
```


## Polygenic model

We can decompose the trait into the following variance components

\begin{align*}
Y_i = A_i + D_i + C + E_i, \quad i=1,2
 \end{align*}

-   **\(A\):** Additive genetic effects of alleles
-   **\(D\):** Dominante genetic effects of alleles
-   **\(C\):** Shared environmental effects
-   **\(E\):** Unique environmental genetic effects

Dissimilarity of MZ twins arises from unshared environmental effects
only, \(\cor(E_1,E_2)=0\) and

\begin{align*}
\cor(A_1^{MZ},A_2^{MZ}) = 1, \quad
\cor(D_1^{MZ},D_2^{MZ}) = 1,
\end{align*}

\begin{align*}
\cor(A_1^{DZ},A_2^{DZ}) = 0.5, \quad
\cor(D_1^{DZ},D_2^{DZ}) = 0.25,
\end{align*}

\begin{align*}
Y_i = A_i + C_i + D_i + E_i
\end{align*}

\begin{align*}
A_i \sim\mathcal{N}(0,\sigma_A^2), C_i
\sim\mathcal{N}(0,\sigma_C^2), D_i
\sim\mathcal{N}(0,\sigma_D^2),
E_i \sim\mathcal{N}(0,\sigma_E^2)
\end{align*}

\begin{gather*}
    \cov(Y_{1},Y_{2}) = \\
    \begin{pmatrix}
      \sigma_A^2 & 2\Phi\sigma_A^2 \\
      2\Phi\sigma_A^2 & \sigma_A^2
    \end{pmatrix} +
    \begin{pmatrix}
      \sigma_C^2 & \sigma_C^2 \\
      \sigma_C^2 & \sigma_C^2
  \end{pmatrix} +
    \begin{pmatrix}
      \sigma_D^2 & \Delta_{7}\sigma_D^2 \\
      \Delta_{7}\sigma_D^2 & \sigma_D^2
  \end{pmatrix} +
  \begin{pmatrix}
    \sigma_E^2 & 0 \\
    0 & \sigma_E^2
  \end{pmatrix}
\end{gather*}


```{r}
dd <- na.omit(twinbmi)
```

Saturated model (different marginals in MZ and DZ twins and different marginals
for twin 1 and twin 2):
```{r lmsat, eval=FALSE}
l0 <- twinlm(bmi ~ age+gender, data=dd, DZ="DZ", zyg="zyg", id="tvparnr", type="sat")
```

Different marginals for MZ and DZ twins (but same marginals within a pair)
```{r lmflex, eval=FALSE}
lf <- twinlm(bmi ~ age+gender, data=dd,DZ="DZ", zyg="zyg", id="tvparnr", type="flex")
```
 
Same marginals but free correlation with MZ, DZ
```{r lmeqmarg}
lu <- twinlm(bmi ~ age+gender, data=dd, DZ="DZ", zyg="zyg", id="tvparnr", type="eqmarg")
estimate(lu)
```

A formal test of genetic effects can be obtained by comparing the MZ and DZ correlation:
```{r   }
estimate(lu,lava::contr(5:6,6))
```

We also consider the ACE model
```{r  ace}
ace0 <- twinlm(bmi ~ age+gender, data=dd, DZ="DZ", zyg="zyg", id="tvparnr", type="ace")
summary(ace0)
```


# Bibliography
<a id="korkeila_bmi_1991"></a>[korkeila_bmi_1991] Korkeila, Kaprio, Rissanen & Koskenvuo, Effects of gender and age on the heritability of body mass index, <i>Int J Obes</i>, <b>15(10)</b>, 647-654 (1991). [↩](#b71edfd9bc946c317f4a732845bcaf93)

<a id="hjelmborg_bmi_2008"></a>[hjelmborg_bmi_2008] Hjelmborg, Fagnani, Silventoinen, McGue, Korkeila, Christensen, Rissanen & Kaprio, Genetic influences on growth traits of BMI: a longitudinal study of adult twins, <i>Obesity (Silver Spring)</i>, <b>16(4)</b>, 847-852 (2008). [↩](#718839fcb6ade82ebb2d7de853582b80)