%\VignetteIndexEntry{Examples of focused model comparison: skew-normal models}
%\VignetteEngine{knitr::knitr}
%\VignetteDepends{fic,sn}

\documentclass[nojss,nofooter]{jss}

\author{Christopher Jackson\\\email{chris.jackson@mrc-bsu.cam.ac.uk}}
\title{Examples of focused model comparison: skew-normal models}
\Plainauthor{Christopher Jackson, MRC Biostatistics Unit}

\Abstract{
The following example (from \citet{claeskens:hjort:book}) illustrates 
how focused model comparison can be performed using the \pkg{fic}
package in a situation where: 
\begin{itemize}
\item a novel class of models is defined and fitted by custom \proglang{R} functions 
\item a simple model is extended in two different directions to define the models being compared
\end{itemize}
}
\Keywords{models}

\begin{document}

\section{Skew-normal models} 

An outcome $y_i$ and a covariate $x_i$ are observed for individuals
$i=1,\ldots,n$.    Four different models are compared: two normal
models with a constant variance, one without (1) and one with (2) a
linear regression term,  and two ``skew-normal''
models without (3) and with (4) the linear regression term.   
The skew-normal model is
defined by an error term $\sigma\epsilon_i$, where the $\epsilon_i$ 
are independently distributed with a density
$f(u|\lambda) = \lambda \Phi(u) ^{\lambda - 1} \phi(u)$.  
All four models
are nested in the ``wide'' model (4).  
\begin{enumerate}
\item[(1)] $y_i \sim N(\beta_0, \sigma^2) $
\item[(2)] $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2) $
\item[(3)] $y_i = \beta_0 + \sigma\epsilon_i$
\item[(4)] $y_i = \beta_0 + \beta_1 x_i + \sigma\epsilon_i $
\end{enumerate}

%%https://www-sciencedirect-com.ezp.lib.cam.ac.uk/science/article/pii/S0047259X08000341#b3 
%% explains relation to standard skew normal param 
%% dsn also provided with sn package, in different parameterisation

\section{Fitting the models in R} 

To implement this class of models in \proglang{R}, firstly we define the log density function of the general skew-normal model with mean, scale and skewness parameters $\mu,\sigma,\lambda$ indicated by
arguments \code{mean}, \code{sigma} and \code{lambda}. 
This defines the distribution of $y_i$ in model (3) with
mean $\mu_i = \beta_0$,  and in (4) with $\mu_i = \beta_0 +
\beta_1x_i$.  
Models (1) and (2) are defined by setting $\lambda=1$ in models  (3) and (4)
respectively. 
<<>>=
ldsnorm <- function(x, mean, sd, lambda){
    log(lambda) + (lambda-1)*pnorm(x, mean, sd, log.p=TRUE) +
        dnorm(x, mean, sd, log=TRUE)
}
@ 

The models are fitted to data from the Australian Institute of Sports
\citep{cook1994introduction}, 
available as \code{ais} from the \pkg{sn} package \citep{sn}. 
The outcome $y_i$ is haematocrit level \code{Hc}, and the covariate is
body mass index \code{BMI}.   Observe the skewed 
distribution of the outcome and a mild association between the variables. 
<<fig.height=5>>=
if (!require("sn"))
    stop("The `sn` package should be installed to run code in this vignette") 
data(ais)
par(mfrow=c(1,2))
plot(density(ais$Hc), xlab="Haematocrit level", main="")
plot(ais$BMI, ais$Hc, pch=19,
     xlab="Body mass index", ylab="Haematocrit level")
@ 

The following defines the minus log likelihood for these data as a function of four parameters $\beta_0, \beta_1, \sigma$ and $\lambda$.

<<>>=
mloglik <- function(b0, b1, sd, lambda){
    -sum(ldsnorm(ais$Hc, b0 + b1*ais$BMI, sd, lambda))
}
@

Then to obtain the maximum likelihood estimates under models 1 to  4,
\code{mloglik} is rewritten as a function of a single vector \code{par}, containing either 2, 3
or 4 parameters to be minimised over, depending on the model.  Note here that
models 1 and 3 have $\beta_2$ fixed at 0, and models 1 and 2 have $\lambda$
fixed at 1. The positive parameters $\sigma$ and $\lambda$ will be
estimated by unconstrained maximisation on the log scale. 
These functions can be passed to the \code{nlm} function for
optimisation. 
<<>>=
fn1 <- function(par) mloglik(par[1], 0, exp(par[2]), 1)
fn2 <- function(par) mloglik(par[1], par[2], exp(par[3]), 1)
fn3 <- function(par) mloglik(par[1], 0, exp(par[2]), exp(par[3]))
fn4 <- function(par) mloglik(par[1], par[2], exp(par[3]), exp(par[4]))
@ 

\code{nlm} also requires plausible initial values for the parameters. 
A vector of these (\code{ini}) is obtained as follows by fitting model (2) 
with \code{lm} and extracting
the coefficients with \code{coef} (for $\beta_0$ and $\beta_1$) and the
residual standard deviation for $\log(\sigma)$.  An initial value of 0
is used for $\log(\lambda)$.   \footnote{Note these are the exact
  maximum likelihood estimates for model 2.  The added value of
  \code{nlm} in fitting models 1 and 2, compared to simply using
  \code{lm}, is to conveniently provide the covariance matrix at the
  maximum likelihood estimates in the same form as for the skew normal
  models, making focused model comparison more convenient here. }
<<>>=
lm2 <- lm(Hc ~ BMI, data=ais)
cf <- unname(coef(lm2))
ini <- c(beta0=cf[1], beta1=cf[2], 
         logsigma=log(summary(lm2)$sigma), loglambda=0)
@
The appropriate objective function
(\code{fn1}--\code{fn4}) is then optimised for each model, starting
from the given initial values\footnote{A warning message of \code{NA/Inf replaced by maximum positive value}
can be ignored and is the result of
\code{nlm} trying out extreme and implausible values on the way to
finding the maximum likelihood.}. 
<<warning=FALSE>>=
opt1 <- nlm(fn1, ini[c("beta0","logsigma")], hessian=TRUE)
opt2 <- nlm(fn2, ini[c("beta0","beta1","logsigma")], hessian=TRUE)
opt3 <- nlm(fn3, ini[c("beta0","logsigma","loglambda")], hessian=TRUE)
opt4 <- nlm(fn4, ini, hessian=TRUE)
@ 


Finally, the information required by the \code{fic} function (the
estimates and covariance matrices) is
extracted from the \code{nlm} results and arranged into a list, for
each of the four models. 
<<>>=
mod1 <- list(est=opt1$estimate, vcov=solve(opt1$hessian) )
mod2 <- list(est=opt2$estimate, vcov=solve(opt2$hessian) )
mod3 <- list(est=opt3$estimate, vcov=solve(opt3$hessian) )
mod4 <- list(est=opt4$estimate, vcov=solve(opt4$hessian) )
@ 


\section{Focused model comparison}

We now perform a focused comparison of the four models.  Two
alternative focuses are investigated: the mean and median outcome at a
covariate value of interest.   Expressions for the mean and median of 
the skew normal, in the parameterisation used here, are given by
\citet{claeskens:hjort:book} and implemented in the following \proglang{R} functions: 

<<>>=
mean_snorm <- function(mu, sigma, lambda){
    f <- function(u){u*exp(ldsnorm(u, 0, 1, lambda))}
    mu + sigma * integrate(f, -Inf, Inf)$value
}
median_snorm <- function(mu, sigma, lambda){
    mu + sigma * qnorm(0.5^(1/lambda))
}
@ 

As described in the main \code{fic} package vignette, the focus
function supplied to \code{fic} should have arguments defined by 
a vector \code{par} of parameters of the biggest model (in
this case the four-parameter model 4), and optionally also a matrix of covariate values
\code{X}, and should return the corresponding focus quantity.   In this
example,  the two focus functions are 

<<>>=
focus1 <- function(par, X){
    mean_snorm(mu = par[1] + X %*% par[2],
               sigma = exp(par[3]), lambda = exp(par[4]))
}
focus2 <- function(par, X){
    median_snorm(mu = par[1] + X %*% par[2],
                 sigma = exp(par[3]), lambda = exp(par[4]))
}
@

The \code{inds} matrix, required by \code{fic}, is now
constructed. Recall this indicates which parameters (columns) are
included in each of the models (rows) being compared.   The narrow
model is in the first row, and the wide model in the last.    The rows
are given names to describe the models, so the output is easier to
read. 

The functions \code{fns} required to extract the estimates and covariance matrix
from the fitted model objects are then defined. 

Finally \code{fic} is called to compare the four models for focuses
defined by the mean and median for average men and women, with
covariate values defined by \code{med.bmi}.   The \code{sub} argument
is supplied to ensure the focus estimates are returned too -- note
that \code{fic} can only automatically fit the submodels for standard
\proglang{R} model classes such as \code{glm}. 

<<>>=
inds <- rbind("intcpt"     =c(1,0,1,0),
              "cov"        =c(1,1,1,0),
              "intcpt_skew"=c(1,0,1,1),
              "cov_skew"   =c(1,1,1,1))

fns <- list(coef=function(x)x$est,
            vcov=function(x)x$vcov,
            nobs=function(x)nrow(ais))

med.bmi <- rbind(male=23.56,  female=21.82)

library(fic)
fmean <- fic(mod4, inds=inds, fns=fns, focus=focus1, X=med.bmi, FIC=TRUE,
             sub=list(mod1, mod2, mod3, mod4))
fmean 
fmed <- fic(mod4, inds=inds, fns=fns, focus=focus2, X=med.bmi, FIC=TRUE,
            sub=list(mod1, mod2, mod3, mod4))
fmed
@ 

<<fig.width=7,fig.height=4>>=
ggplot_fic(fmean)
ggplot_fic(fmed)
@ 

For both the mean and the median, including the covariate is judged to
be essential.  Including the skewness is pointless for estimating the
mean, since, as discussed by \citet{claeskens:hjort:book}, the skewness term
provides no extra information.  The utility of including the skewness
is also doubtful for estimating the median too --- given that the covariate is
included, the focus estimates and RMSE are very similar whether or not
the skewness is also included.

% for median: 
% can't reproduce calculation in book - looks strange on p165
% delta2 value given as 45.031 in book, but sqrt(n)(lamwide - 1) = 
% sqrt(nrow(ais))*(exp(mod4$est[4]) - 1) = 14.2*(1.95 - 1) = 13.5

\bibliography{fic}

\end{document}