\documentclass{article}
% \VignetteIndexEntry{adehabitatHS: Exploratory Analysis of the Habitat Selection by the Wildlife}

\usepackage{graphicx}
\usepackage[colorlinks=true,urlcolor=blue]{hyperref}

\usepackage{color}
\usepackage{url}

\usepackage{Sweave}
\newcommand{\strong}[1]{{\normalfont\fontseries{b}\selectfont #1}}
\let\pkg=\strong
\DeclareTextSymbol{\degre}{T1}{6}

\title{ Exploratory Analysis of the Habitat Selection by the Wildlife in R:\\
the {\tt adehabitatHS} Package }
\author{Clement Calenge,\\
  Office national de la chasse et de la faune
  sauvage\\
  Saint Benoist -- 78610 Auffargis -- France.}
\date{Feb 2011}

\begin{document}

\maketitle
\tableofcontents


<<echo=FALSE>>=
owidth <- getOption("width")
options("width"=80)
ow <- getOption("warn")
options("warn"=-1)
.PngNo <- 0
wi <- 480
pt <- 30
reso <- 40
@

<<label=afig,echo=FALSE,eval=FALSE>>=
.PngNo <- .PngNo + 1; file <- paste("Fig-bitmap-",
          .PngNo, ".png", sep="")
png(file=file, width = wi, height = wi, pointsize = pt, res=reso)
@


<<label=afigi,echo=FALSE,eval=FALSE>>=
.PngNo <- .PngNo + 1; file <- paste("Fig-bitmap-",
          .PngNo, ".png", sep="")
png(file=file, width = wi, height = wi, pointsize = pt, res=20)
@


<<label=zfig,echo=FALSE,eval=FALSE>>=
dev.null <- dev.off()
cat("\\includegraphics[height=7cm,width=7cm]{", file, "}\n\n", sep="")
@

<<label=zfigg,echo=FALSE,eval=FALSE>>=
dev.null <- dev.off()
cat("\\includegraphics[height=14cm,width=14cm]{", file, "}\n\n", sep="")
@


\section{History of the package adehabitatHS}

The package {\tt adehabitatHS} contains functions dealing with the analysis
of habitat selection by the wildlife that were originally available in
the package {\tt adehabitat} (Calenge, 2006).  The data used for such
analysis are generally relocation data collected on animals monitored
using VHF or GPS collars, as well as habitat data (available as maps).\\

I developped the package {\tt adehabitat} during my PhD (Calenge, 2005)
to make easier the analysis of habitat selection by animals.  The
package {\tt adehabitat} was designed to extend the capabilities of
the package {\tt ade4} concerning studies of habitat selection by
wildlife.\\

Since its first submission to CRAN in September 2004, a lot of
work has been done on the management and analysis of spatial data in
R, and especially with the release of the package {\tt sp} (Pebesma
and Bivand, 2005).  The package {\tt sp} provides classes of data that
are really useful to deal with spatial data...\\

In addition, with the increase of both the number (more than 250
functions in Oct. 2008) and the diversity of the functions in the
package \texttt{adehabitat}, it soon became apparent that a reshaping
of the package was needed, to make its content clearer to the users.  I
decided to ``split'' the package {\tt adehabitat} into four packages:\\

\begin{itemize}
\item {\tt adehabitatHR} package provides classes and methods for dealing
  with home range analysis in R.
\item {\tt adehabitatHS} package provides classes and methods for dealing
  with habitat selection analysis in R.
\item {\tt adehabitatLT} package provides classes and methods for dealing
  with animals trajectory analysis in R.
\item {\tt adehabitatMA} package provides classes and methods for dealing
  with maps in R.\\
\end{itemize}

We consider in this document the use of the package {\tt adehabitatHS} to
deal with habitat selection analysis.  All the methods available in
\texttt{adehabitat} are also available in \texttt{adehabitatHS}. Note
that the classes of data returned by the functions of
\texttt{adehabitatHS} are identical to the classes returned by the
same functions in \texttt{adehabitat}.\\


Package {\tt adehabitatHS} is loaded by

<<echo=TRUE,print=FALSE>>=
library(adehabitatHS)
@

<<echo=FALSE,print=FALSE>>=
set.seed(13431)
@


\section{Basic concepts}

\subsection{Use and availability}

The package \texttt{adehabitatHS} aims at making easier the exploration of
habitat selection by the wildlife.\\

According to a common definition, habitat corresponds to the
resources and conditions present in an area that produce occupancy --
including survival and reproduction -- by a given organism
(Hall et al. 1997). The aim of habitat selection studies is often to
identify the environmental characteristics (e.g., biomass, slope) that
make a place suitable for a species.\\

Theoretically, the distinction between habitat and non-habitat implies
the comparison of the environmental composition of sites where the
species is present with the environmental composition of sites where
the species is absent. However, sites where the species is absent  may
be practically difficult to identify. Indeed, a species may not appear
in a site if sampling failed to identify it (e.g. detection probability
lower than 1), if the species is absent from the site or
historical reasons, or if the environmental characteristics
do not define a habitat. Therefore, the analysis of habitat
selection often relies on the comparison of a
sample of used sites (sites where the species is present) with a
sample of available sites (sites where the species
presence is uncertain, but where we consider that it could be
present). We describe several tools in this vignette to deal with such
data in this context.\\

We will consider that the study area can be discretized into
\textit{resource units} (RU, which may correspond to
pixels of a raster map, or to patches of a vector map, see Manly et
al., 2002, for a deeper discussion on this concept). Each RU is
characterized by several environmental variables (elevation, slope,
biomass, etc.). \textbf{In habitat selection studies, the
  environmental variables should be carefully chosen according to the
  biological issue at hand}.\\

We will suppose that we have either censused, or collected a sample
of, resource units available to the species on the study area. An
important point is that \textit{we consider} that these RUs are
available.  In other words, the definition of the availability is
necessarily subjective and depends on the issue at hand (it is not a
property of the studied system). Each available RU may be
characterized by an availability weight describing how the RU is
available to the species. For example, these weights may be useful
when the RUs correspond to habitat patches and that all patches do not
cover the same surface area (this area is then used as an availability
weight).\\

Moreover, we have measured the use of the available RUs by the
species. This use may be measured by many different ways (see the
concept of ``currency of use'' by Bingham and Brennan, 2004). For
example, it may correspond to the number of animals detected in each
pixel of a raster map. We suppose a uniform sampling effort and a
uniform detection probability.\\

\subsection{Three types of designs}
\label{sec:typdes}

Thomas and Taylor (1993) distinguished three types of design used in
the study of habitat selection.\\

In \textbf{design I} studies, the animals are not identified; the
habitat use and availability are measured at the scale of the
population. For example, a sample of sites (the RUs) is drawn on a
given area and each site is investigated to determine whether the
species used it (e.g. presence of animals, presence of feces,
etc.). That is, the basic data structure for this kind of design is:\\

<<label=strdesI,echo=FALSE,fig=FALSE,eval=FALSE>>=
par(mar=c(0,0,0,0))
plot(0,0, xlim=c(0,1), ylim=c(0,1), ty="n")
polygon(c(0.1, 0.5, 0.5, 0.1, 0.1),
        c(0.2, 0.2, 0.8, 0.8, 0.2), col="grey", lwd=2)

polygon(c(0.6, 0.65, 0.65, 0.6, 0.6),
        c(0.2, 0.2, 0.8, 0.8, 0.2), col="grey", lwd=2)

polygon(c(0.8, 0.85, 0.85, 0.8, 0.8),
        c(0.2, 0.2, 0.8, 0.8, 0.2), col="grey", lwd=2)
text(0.07, 0.2, "N", font=3)
text(0.5, 0.83, "P", font=3)
text(0.3, 0.5, "X", font=2, cex=2)
text(0.625, 0.5, "a", font=2, cex=2)
text(0.825, 0.5, "u", font=2, cex=2)

@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<strdesI>>
<<zfig>>
@
\end{center}

\noindent where $\mathbf{X}$ is the table containing the value of the
$P$ environmental variables for the $N$ RUs, and $\mathbf{a}$ and
$\mathbf{u}$ are vectors containing respectively the
availability weights and the utilization weights of the $N$
RUs.\\

In \textbf{design II} studies, the animals are identified and the use
is measured for each one. For example, a sample of
animals is captured on an area, and each one is fitted with a
radio-collar. The animals are then monitored using radio-tracking, so
that it is possible to estimate the habitat use for each one (e.g. by
the proportion of relocations falling in each RU). However, the
availability is measured at the scale of the population (each RU is
supposed equally available to all monitored animals). The data
structure is therefore the following:

<<label=bastrdesII,echo=FALSE,fig=FALSE,eval=FALSE>>=
par(mar=c(0,0,0,0))
plot(0,0, xlim=c(0,1), ylim=c(0,1), ty="n")
polygon(c(0.1, 0.5, 0.5, 0.1, 0.1),
        c(0.2, 0.2, 0.8, 0.8, 0.2), col="grey", lwd=2)
polygon(c(0.55, 0.6, 0.6, 0.55, 0.55),
        c(0.2, 0.2, 0.8, 0.8, 0.2), col="grey", lwd=2)
polygon(c(0.65, 1, 1, 0.65, 0.65),
        c(0.2, 0.2, 0.8, 0.8, 0.2), col="grey", lwd=2)
text(0.07, 0.2, "N", font=3)
text(0.5, 0.83, "P", font=3)
text(0.3, 0.5, "X", font=2, cex=2)
text(0.575, 0.5, "a", font=2, cex=2)
text(0.825, 0.5, "U", font=2, cex=2)
text(1, 0.83, "K", font=3)
@


\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<bastrdesII>>
<<zfig>>
@
\end{center}

\noindent where $\mathbf{X}$ is the table containing the value of the
$P$ environmental variables for the $N$ RUs, $\mathbf{a}$ is the
vector containing the availability weights associated to the $n$ RUs,
and $\mathbf{U}$ is the matrix containing the utilization weights of
each RU (rows) by the $K$ animals (columns).\\


In \textbf{design III} studies, the animals are identified and both
the use and the availability are measured for each one. That is, all
the RUs are not supposed to be equally available to all animals, so
that the availability weights vary from one animal to the other.
For example, a sample of animals is captured on an area, and each one
is fitted with a radio-collar. The animals are then monitored using
radio-tracking. For each animal, the available RUs correspond to the
pixels falling inside the limits of the minimum convex polygon
enclosing all its relocations (these available RUs are therefore
specific to each animal). The used RUs correspond to the pixels
containing at least one relocation, and the utilization weights
correspond to the the proportion of relocations falling in each RU:\\

<<label=essaidesIII,echo=FALSE,fig=FALSE,eval=FALSE>>=
par(mar=c(0,0,0,0))
plot(0,0, xlim=c(0,1), ylim=c(0,1), ty="n")
polygon(c(0.1, 0.3, 0.3, 0.1, 0.1),
        c(0.1, 0.1, 0.3, 0.3, 0.3), col="grey", lwd=2)
text(0.2, 0.2, expression(X[k]), font=2, cex=2)
text(0.07, 0.1, expression(N[k]), font=3)

polygon(c(0.1, 0.3, 0.3, 0.1, 0.1),
        c(0.3, 0.3, 0.5, 0.5, 0.5), col="grey", lwd=2)
text(0.2, 0.4, "...", font=2, cex=2)

polygon(c(0.1, 0.3, 0.3, 0.1, 0.1),
        c(0.5, 0.5, 0.7, 0.7, 0.7), col="grey", lwd=2)
text(0.2, 0.6, expression(X[3]), font=2, cex=2)
text(0.07, 0.5, expression(N[3]), font=3)

polygon(c(0.1, 0.3, 0.3, 0.1, 0.1),
        c(0.7, 0.7, 0.85, 0.85, 0.85), col="grey", lwd=2)
text(0.2, 0.775, expression(X[2]), font=2, cex=2)
text(0.07, 0.7, expression(N[2]), font=3)

polygon(c(0.1, 0.3, 0.3, 0.1, 0.1),
        c(0.85, 0.85, 0.95, 0.95, 0.95), col="grey", lwd=2)
text(0.2, 0.9, expression(X[1]), font=2, cex=2)
text(0.07, 0.85, expression(N[1]), font=3)
text(0.3, 0.98, "P", font=3)

## Le vecteur
polygon(c(0.48, 0.55, 0.55, 0.48, 0.48),
        c(0.1, 0.1, 0.3, 0.3, 0.3), col="grey", lwd=2)
text(0.52, 0.2, expression(a[k]), font=2, cex=1.5)

polygon(c(0.48, 0.55, 0.55, 0.48, 0.48),
        c(0.3, 0.3, 0.5, 0.5, 0.5), col="grey", lwd=2)
text(0.52, 0.4, "...", font=2, cex=1.5)

polygon(c(0.48, 0.55, 0.55, 0.48, 0.48),
        c(0.5, 0.5, 0.7, 0.7, 0.7), col="grey", lwd=2)
text(0.52, 0.6, expression(a[3]), font=2, cex=1.5)

polygon(c(0.48, 0.55, 0.55, 0.48, 0.48),
        c(0.7, 0.7, 0.85, 0.85, 0.85), col="grey", lwd=2)
text(0.52, 0.775, expression(a[2]), font=2, cex=1.5)

polygon(c(0.48, 0.55, 0.55, 0.48, 0.48),
        c(0.85, 0.85, 0.95, 0.95, 0.95), col="grey", lwd=2)
text(0.52, 0.9, expression(a[1]), font=2, cex=1.5)



## Le vecteur
polygon(c(0.68, 0.75, 0.75, 0.68, 0.68),
        c(0.1, 0.1, 0.3, 0.3, 0.3), col="grey", lwd=2)
text(0.72, 0.2, expression(u[k]), font=2, cex=1.5)

polygon(c(0.68, 0.75, 0.75, 0.68, 0.68),
        c(0.3, 0.3, 0.5, 0.5, 0.5), col="grey", lwd=2)
text(0.72, 0.4, "...", font=2, cex=1.5)

polygon(c(0.68, 0.75, 0.75, 0.68, 0.68),
        c(0.5, 0.5, 0.7, 0.7, 0.7), col="grey", lwd=2)
text(0.72, 0.6, expression(u[3]), font=2, cex=1.5)

polygon(c(0.68, 0.75, 0.75, 0.68, 0.68),
        c(0.7, 0.7, 0.85, 0.85, 0.85), col="grey", lwd=2)
text(0.72, 0.775, expression(u[2]), font=2, cex=1.5)

polygon(c(0.68, 0.75, 0.75, 0.68, 0.68),
        c(0.85, 0.85, 0.95, 0.95, 0.95), col="grey", lwd=2)
text(0.72, 0.9, expression(u[1]), font=2, cex=1.5)

@


\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<essaidesIII>>
<<zfig>>
@
\end{center}

\noindent where $\mathbf{X_i}$ contains the value of the $P$
environment variables in the $N_i$ RUs available to the animal $i$,
$\mathbf{a}_i$ contains the availability weights for the animal $i$,
and $\mathbf{u}_i$ contains the utilization weights for the animal
$i$.


\subsection{The concept of ecological niche}
\label{sec:ecolni}

The ecological niche is a useful model to understand the methods
available in \texttt{adehabitatHS}. The
seminal paper of Hutchinson (1957) defined it as the hypervolume, in the
multidimensional space defined by the
environmental variables, where the species can potentially maintain a
viable population. Although this definition has been debated and
precised by many authors (e.g. Chase and Leibhold, 2003), the
original definition of the niche is useful for us as most ecologists
are already familiar with it. Therefore, we use this definition beyond
the limits of the original conceptual framework in which it was
developed, i.e. to study habitat selection.\\

For example, imagine that we are studying habitat selection of a
species on a given area. Moreover, $P$ environmental variables have
been mapped on this area. Suppose that these maps are raster maps, so
that the $N$ RUs correspond to the pixels of the maps. Suppose that we
have prospected the study area so that we know the position of all the
individuals belonging to a given species. The $P$ environmental
variables define a $P$-dimensional space. We will refer to this space
as the \textit{ecological space}. Because each RU is characterized by
a measure on each environmental variable, it follows that each RU
corresponds to a point in this space:

<<label=ploniche,echo=FALSE,fig=FALSE,eval=FALSE>>=
set.seed(321)
par(mar=c(0,0,0,0))
plot(0,0, xlim=c(-1,1), ylim=c(-1,1), ty="n")
arrows(0,0, 0, 0.9, lwd=2)
arrows(0,0, -0.9, -0.5, lwd=2)
arrows(0,0, 0.9, -0.5, lwd=2)
points(data.frame(rnorm(500,sd=0.25), rnorm(500, sd=0.25)),
       pch=21, bg="grey")
points(data.frame(rnorm(50, mean=0.3, sd=0.1),
                  rnorm(50, mean=0.3, sd=0.1)),
       pch=21, bg="red", cex=2)
text(0,0.96, expression(V[3]), cex=1.5)
text(-0.96, -0.5, expression(V[1]), cex=1.5)
text(0.96, -0.5, expression(V[2]), cex=1.5)

@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<ploniche>>
<<zfig>>
@
\end{center}

On this figure, the ecological space is defined by only three
variables ($V_1,V_2,V_3$), but the mathematical concept is still valid
for a larger number of variables. The grey points correspond to the
RUs available on the study area. A subset of RUs have
been used by the species (i.e. the species is
present in it; red points on the figure). We will refer to the
corresponding subset of points in the ecological space as the
``niche'' of the species on this area.\\

The concept of niche, as defined here, is a useful model to tackle the
study of habitat selection. Indeed, many tools available in the
package \texttt{adehabitatHS} are implementations of methods allowing to
compare the shape of the niche with the shape of the distribution
of available points. Although the methods allow to take into account
unequal utilization weights and unequal availability weights, thinking
in terms of ``niche'' as defined here (same utilization weights for
all used RUs and same availability weights for all available RUs) is
a very useful way to understand the methods provided by
\texttt{adehabitatHS}.

\subsection{Marginality and specialization}
\label{sec:marspe}

Two concepts are useful for the study of the niche: the marginality
and the specialization. Consider the niche model described in the
previous section:\\

\begin{itemize}
\item The \textbf{marginality vector} correspond to the vector
  connecting the centroid (i.e., the mean) of the distribution of
  availability weights to the centroid of the distribution of
  utilization weights. The squared length of this vector is the
  \textbf{marginality} \textit{per se}. For example, consider the data
  structure corresponding to design I studies (section
  \ref{sec:typdes}). The marginality vector is calculated by
  $\mathbf{m} = \mathbf{X}^t\mathbf{u} - \mathbf{X}^t\mathbf{a}$. And
  the marginality is equal to $m^2 = ||\mathbf{m}||^2$.\\

\item The \textbf{specialization} is a measure of habitat selection on
  a particular direction of the ecological space. For example,
  consider the variable $y_i$ giving the values of an environmental
  variable in the RU $i$. The specialization is defined
  as:
  $$
  S^2 = \frac{\sum_{i=1}^N a_i (y_i - \sum_{j=1}^N a_i
    y_i)^2}{\sum_{i=1}^N u_i (y_i - \sum_{j=1}^N u_i y_i)^2}
  $$
  where $a_i$ and $u_i$ are respectively the availability weight and
  the utilization weight of the RU $i$. In other words, the
  specialization is the ratio of the variance of availability weights
  divided by the variance of the utilization weights. A strong
  specialization implies that the variance of available RUs is large
  in comparison with the variance of used RUs. Or, in other words,
  that the niche is narrow in comparison to what is available to the
  species.\\
\end{itemize}


We will make use of these two concepts in this vignette.


\section{Design I studies}

\subsection{Basic approach}

The aim of \texttt{adehabitatHS} is to provide tools for the exploration of
habitat selection. We have seen in section \ref{sec:typdes} that the
basic data structure for design I studies is built by a table
$\mathbf{X}$ containing the value of the environmental variables in
each available RU, and vectors $\mathbf{a}$ and $\mathbf{u}$
containing respectively the availability weights and the utilization
weights for each RU. That is, our data structure is the following:

\begin{center}
\includegraphics[keepaspectratio]{exva.png}
\end{center}

Each RU defines a point in the multidimensional space defined by the
environmental variables (ecological space). To each point is
associated an available weight (in the vector $\mathbf{a}$) and an
utilization weight (in the vector $\mathbf{u}$). The set of points for
which the utilization weights are greater than 0 define the niche of
the species (as defined in section \ref{sec:ecolni}). Our aim is to
identify the differences between the two distributions of weights, and
to relate these differences with particular directions of the
ecological space.\\

To identify these differences, we will use graphical methods. Indeed,
as noted by Cleveland (1993), ``\textit{Visualisation is critical to
  data analysis. It provides a front line of attack, revealing
  intricate structure in data that cannot be absorbed in any other
  way. We discover unimagined effects, and we challenge imagined
  ones}''. However, because the ecological space is generally highly
multidimensional, it is hard to explore it with classical
graphical methods. For this reason, the package \texttt{adehabitatHS}
provides several tools relying on factor analysis, to find the most
``interesting'' directions on which most of the differences between
the two distributions of weights are expressed.


\subsection{The general framework for the statistical exploration of the niche}

\subsubsection{Presentation of the GNESFA}

All the factor analyses available in \texttt{adehabitatHS} to explore the
differences between the habitat use and habitat availability in design
I studies can be viewed as particular cases of a general framework
named \textit{General Niche-Environment System Factor Analysis}
(GNESFA). This framework is described in detail in Calenge and Basille
(2008).\\

The basic principle of this analysis consists in the choice of one of
these two distributions, either the utilization weights or the
availability weights, as the Reference distribution, and the other, as
the Focus distribution.\\

One will then ``distort'' the cloud of points, so that the Reference
distribution takes a standard spherical shape in the multidimensional
space (this is sometimes named ``sphering'' the data). Then, the
GNESFA searches for the directions where the Focus distribution
differs the most from this standard spherical shape (by performing a
noncentred principal component analysis of the ``sphered'' data, using
the Focus distribution as row weight). This analysis finds the
direction on which the following criterion in maximized:

$$
\gamma_1 = \frac{\sum_{i=1}^N f_i (y_i - \bar y_r)^2}
{ \sum_{i=1}^N r_i (y_i - \bar y_r)^2}
$$

\noindent $f_i$ and $r_i$ are respectively the focus and
reference weights (depending on which distribution of weights --
utilization or availability -- has been chosen as a reference). $y_i$
is the score of the $i^{th}$ RU on the first axis of the GNESFA. And,
finally, $\bar y_r$ is the ``reference mean'' of these scores, that
is:

$$
\bar y_r = \sum_{i=1}^N r_i y_i
$$

In other words, the GNESFA searches the values $b_1,b_2,...b_p$ such
that:\\\
\begin{itemize}
\item $y_i = b_1 X_{i1} + b_2 X_{i2} + ... + b_p X_{ip}$
\item $\sum_{i=1}^p b_i^2 = 1$
\item $\gamma_1$ is maximized\\
\end{itemize}

\noindent and such that the successive axes are uncorrelated. The GNESFA
provides a very flexible framework to tackle the exploration of
habitat selection, and the analysis has properties that depends on the
distribution chosen as a reference. Calenge and Basille (2008)
describes more completely the interesting properties of this approach.\\

In this vignette, we will illustrate these properties and the
practical use of the GNESFA with an example. First load the dataset
\texttt{bauges} from the package \texttt{adehabitatHS}:


<<>>=
data(bauges)
names(bauges)
@

This data set contains the map of 7 environmental variables in the
Bauges Mountains (French Alps):

<<label=fig1,echo=FALSE,fig=FALSE,eval=FALSE>>=
map <- bauges$map
mimage(map)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig1>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig1>>
<<zfig>>
@
\end{center}

The maps are stored as an object of class
\texttt{SpatialPixelsDataFrame} from the package
\texttt{sp}. Furthermore, this dataset also contains the relocations
of 198 chamois groups collected by volunteers and professionals
working in various French wildlife and forest management, from 1994 to
2004:

<<label=fig2,echo=FALSE,fig=FALSE,eval=FALSE>>=
image(map)
locs <- bauges$locs
points(locs, pch=3)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig2>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig2>>
<<zfig>>
@
\end{center}

These relocations are stored as an object of class
\texttt{SpatialPointsDataFrame} from the package \texttt{sp}.
We now compute the utilization weights associated to each pixel of the
map, by numbering the locations of chamois groups in each pixel of the
map. This is done using the function \texttt{count.points} from the
package \texttt{ademaps}:

<<>>=
cp <- count.points(locs, map)
@

Because all the pixels cover the same area, we will consider that they
all have the same availability weights (i.e., $\frac{1}{N}$). Now, we
``unspatialize'' the required elements:

<<>>=
tab <- slot(map, "data")
pr <- slot(cp, "data")[,1]
@

These two elements are:\\

\begin{itemize}
\item \texttt{tab}: a data frame containing the value of the
  environmental variables in each pixel of the map;\\
\item \texttt{pr}: a vector containing the utilization weights
  associated to each pixel; \\
\end{itemize}

First, let us have a look at the distribution of the environmental
variables on the area, as well as the distribution of the animals:

<<label=fig3,echo=FALSE,fig=FALSE,eval=FALSE>>=
histniche(tab, pr)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig3>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig3>>
<<zfig>>
@
\end{center}

The white histograms show the distributions of available RUs, whereas
the grey histograms show the distributions of used RUs.
We already highlight a strong selection for high elevation values,
high grass cover, high density of fallen rocks and low deciduous
cover. Let us perform a GNESFA to summarize these structures.


\subsubsection{A preliminary \texttt{dudi.*} analysis}
\label{sec:prelimdudi}

We use for this the function \texttt{gnesfa}:

<<>>=
args(gnesfa)
@

The help page of this function indicates that it takes as
main argument an object of class \texttt{dudi}. This class is defined
in the package \texttt{ade4} (Chessel et al. 2004), and is designed to
store the results of factor analyses provided by this
package. \textbf{We use the functions of the package \texttt{ade4} as
  a preliminary step to prepare the data tables for the
  analysis}. Three main functions are of interest for us:\\

\begin{itemize}
\item \texttt{dudi.pca}: performs a principal component analysis of
  the data frame passed as argument;
\item \texttt{dudi.acm}: performs a multiple correspondence analysis
  of the data frame passed as argument (Tenenhaus and Young, 1985);
\item \texttt{dudi.hillsmith}: performs a Hill-Smith analysis
  of the data frame passed as argument (Hill and Smith, 1976);\\
\end{itemize}

The function \texttt{dudi.pca} is to be used when all the variables
present in the data.frame are numeric. The function \texttt{dudi.acm}
is to be used when all the variables present in the data.frame are
factors. The function \texttt{dudi.hillsmith} (or, equivalently,
\texttt{dudi.mix}) is to be used when the data.frame contains both
types of variables. These functions, used as a preliminary to the
GNESFA, are needed to scale the table suitably (so that all the
variables have the same mean and the same variance), and to compute
the weights of the variables in the analysis. For example, the use of
\texttt{dudi.hillsmith} on a table containing a numeric variable and a
factor with four levels ensures that the factor will have the same
weight in the analysis as the numeric variable. Chessel et al. (2004)
give a full description of all the possible preliminary
\texttt{dudi.*} analysis.\\

The result of the functions \texttt{dudi.*} is a list with a component
\texttt{\$tab} containing the table scaled in a suitable way, and a
component \texttt{\$cw} containing the weights of the variables. The
function \texttt{gnesfa} only uses these components. However, we
will see later that other functions of the package also make use of
the component \texttt{\$lw} for the definition of the row weight of
the analysis.\\

\noindent \textit{Remark}: In many cases, it is extremely useful to
interpret the results of these preliminary analyses (this allows to
identify the main patterns on the study area, even if these patterns
are not necessarily related to habitat selection). The \texttt{dudi.*}
analyses are of interest in themselves. However, their use is already
detailed elsewhere (e.g. Chessel et al. 2004), so that we will not
describe them in this vignette.\\

Back to our example: we will use the function \texttt{dudi.pca} to
prepare the table, because all our variables are numeric (in this
particular case, this allows to center and scale the table, like the
function \texttt{scale}):

<<>>=
pc <- dudi.pca(tab, scannf=FALSE)
@


\subsubsection{The FANTER}

We now perform the GNESFA. We first have to choose one of the two
distribution of weights (utilisation weights or availability weights)
as the reference distribution and the other as the Focus
distribution. Depending on this choice, the result will not be the
same. We will first consider the choice:\\

\begin{itemize}
\item Reference = availability
\item Focus = utilization\\
\end{itemize}

The GNESFA, taking the availability distribution as the reference is
also named FANTER (Factor Analysis of the Niche, Taking the
Environment as the Reference). We perform the analysis:

\begin{verbatim}
> gn <- gnesfa(pc, Focus = pr)
Select the first number of axes (>=1): 1
Select the second number of axes (>=0): 1
\end{verbatim}


<<echo=FALSE>>=
gn <- gnesfa(pc, Focus = pr, scan=FALSE, nfFirst=1, nfLast=1)
@

The function asks the user to choose the number of first and last axes
to choose. Indeed, when the Reference distribution is the availability
weights and the Focus is the utilisation weights, both the first axes
and the last axes of the GNESFA may have a meaning. The first axes
correspond to the directions where the utilization distribution as a
whole is the furthest from the centroid of the availability
distribution (these directions often correspond to directions where
the \textit{marginality} is strong; i.e. the criterion $\gamma_1$
defined previously is maximized). The last axes correspond to the
directions where the \textit{width} of the utilization distribution is
the smallest relative to the width of the availability distribution
(these directions often correspond to directions where
the \textit{specialization} is strong, i.e. the criterion $\gamma_1$
defined previously is minimized). The first barplot showed by the
function correspond to the eigenvalues of the analysis:

<<label=fig5,echo=FALSE,fig=FALSE,eval=FALSE>>=
barplot(gn$eig)
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig5>>
<<zfig>>
@
\end{center}

Note that it is hard to identify a clear break in the decrease of the
eigenvalues. The analysis fails to identify a clear pattern
here... The second barplot corresponds to 1/eigenvalues, and allows to
see more clearly the possible patterns on the last axes:

<<label=fig6,echo=FALSE,fig=FALSE,eval=FALSE>>=
barplot(1/gn$eig)
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig6>>
<<zfig>>
@
\end{center}

No clear pattern appears on the last axes (no clear
break in the increase of 1/eigenvalues)... This analysis does not find
any interesting direction in the ecological space. Have a look at the
results:

<<>>=
gn
@

Just for the sake of illustration, we can have a look at the niche on
the factorial plane built by the first and last axis of the
analysis:

<<label=fig7,echo=FALSE,fig=FALSE,eval=FALSE>>=
scatterniche(gn$li, pr, pts=TRUE)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig7>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig7>>
<<zfig>>
@
\end{center}

The grey points show the distribution of the RUs (here the pixels) on
the axes found by the analysis. The black points correspond to the RUs
used by the chamois. We can see that the used points are distributed
over the whole range of the factorial axes found by the analysis. This
confirms that the analysis fails to identify any intesting direction.


\subsubsection{The MADIFA and Mahalanobis distances}

But now, consider the opposite point of view, that is:\\

\begin{itemize}
\item Reference = utilization
\item Focus = availability\\
\end{itemize}

In this case, the last axes of the analysis do not have any biological
meaning so that we do not consider them (see Calenge and Basille
2008):

\begin{verbatim}
> gn2 <- gnesfa(pc, Reference = pr)
Select the first number of axes (>=1): 2
Select the second number of axes (>=0): 0
\end{verbatim}


<<echo=FALSE>>=
gn2 <- gnesfa(pc, Reference = pr, scan=FALSE, nfFirst=2, nfLast=0)
@

<<label=fig8,echo=FALSE,fig=FALSE,eval=FALSE>>=
barplot(gn2$eig)
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig8>>
<<zfig>>
@
\end{center}

In this case, there is a clear break in the decrease of the
eigenvalues. The first axis of the analysis identifies a strong
pattern. We can have a look at the niche of the chamois on the
first factorial plane found by the analysis:

<<label=fig9,echo=FALSE,fig=FALSE,eval=FALSE>>=
scatterniche(gn2$li, pr)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig9>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig9>>
<<zfig>>
@
\end{center}

Whereas the utilization distribution is centred on the origin of the
space (as we defined it as the reference), the availability
distribution is strongly skewed toward the positive values of the
first axis. We may have a look at the coefficients of the
environmental variables in the definition of this axis to give a
meaning to it:

<<label=fig10,echo=FALSE,fig=FALSE,eval=FALSE>>=
s.arrow(gn2$co)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig10>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig10>>
<<zfig>>
@
\end{center}

The grass cover seems to be the main variable affecting the
distribution of the chamois, and to a lesser extent, the elevation and
presence of Fallen Rocks. However, these variables are correlated on
the study area (grass and fallen rocks occur at high elevation,
whereas deciduous cover occurs at low elevation). This may
render the interpretation of the coefficients difficult in an
exploratory context. For this reason, we prefer to give a meaning to
the axes of the GNESFA with the help of the correlations between the
environmental variables and the axes of the GNESFA:

<<label=fig11,echo=FALSE,fig=FALSE,eval=FALSE>>=
s.arrow(gn2$cor)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig11>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig11>>
<<zfig>>
@
\end{center}

We now see why interpreting the results with the coefficient may pose
problems: the first axis of the GNESFA is negatively correlated with the
grass cover, the elevation, and positively correlated with the
deciduous cover. This did not appeared with the graph showing the
columns coefficients (on which deciduous even had a negative
coefficient!).\\

Therefore, positive values of the first axis correspond to areas at
low elevation with a high deciduous cover and a low grass cover, and,
to a lesser extent, with a low density of fallen rocks. These areas
are rarely used by the chamois, which prefers area at high elevation,
with high grass cover and close to the fallen rocks.\\

\textit{Remark}: in this case, the results are not great biological
discoveries (we demonstrate that the chamois lives in the
mountains!). Indeed, the dataset used to illustrate the methods has
been slightly destroyed to preserve copyright (for the original
analysis of this dataset, see Calenge et al. 2008). However, this
dataset has an interesting property: choosing the availability or the
utilization as the Reference distribution does not return the same
results.\\

\textbf{The GNESFA applied with the availability distribution as the
  focus is also named MADIFA} (Mahalanobis Distances factor
  analysis, Calenge et al. 2008). Note that this function can also be
  applied using the function \texttt{madifa} (see the help page of
  this function):

<<>>=
(mad <- madifa(pc, pr, scan=FALSE))
@

Note that the results returned by this function are identical to the
results of the function \texttt{gnesfa}:

<<>>=
mad$eig
gn2$eig
@

However, more methods are provided by \texttt{adehabitatHS} to deal with the
results of the function \texttt{madifa}. For example, a full summary
of the analysis can be obtained with:


<<echo=FALSE>>=
pt <- 12
@

<<label=fig14,echo=FALSE,fig=FALSE,eval=FALSE>>=
plot(mad, map)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig14>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<fig14>>
<<zfig>>
@
\end{center}

<<echo=FALSE>>=
pt <- 20
@

This figure presents:\\

\begin{itemize}
\item the eigenvalue diagram of the analysis;
\item the coefficients of the variables in the definition of the axis;
\item the plot of the niche on the factorial axes;
\item the map of the Mahalanobis distances (see below);
\item the map of the approximate Mahalanobis distances computed from
  the axes of the analysis;
\item the correlation between the environmental variables and the axes
  of the analysis;
\item the maps of the scores of the pixels of the map on the axes  of
  the analysis.\\
\end{itemize}

\noindent \textit{Remark}: by default, the MADIFA is carried out with
equal availability weights for all RUs when the function
\texttt{madifa} is used. However, unequal availability weights can
also be defined with this function. To proceed, the user should pass
the vector of availability weights as row weights to the function
\texttt{dudi.*} performed as a preliminary step (see previous
section). In our example, if we had a vector named \texttt{av.w}
containing the availability weights of the RUs, the preliminary PCA
could have been performed with the following code:

\begin{verbatim}
> pc <- dudi.pca(tab, row.w = av.w, scannf=FALSE)
\end{verbatim}

\noindent (not executed here).\\

The MADIFA, as its names indicates, is closely related to the
Mahalanobis distances methodology introduced by Clark et al. (1993) in
Ecology. They proposed to use the
Mahalanobis distance between a pixel and the distribution of
utilization weights in the ecological space as a measure of habitat
suitability of this pixel for the species. A map of these Mahalanobis
distances can be computed by:

<<label=fig16,echo=FALSE,fig=FALSE,eval=FALSE>>=
MD <- mahasuhab(map, locs)
image(MD)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<fig16>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afigi>>
<<fig16>>
<<zfig>>
@
\end{center}

As we can see, this map is very noisy. The MADIFA provides a way to
remove a part of this noise. Actually, the MADIFA finds the direction
in the ecological space where the average squared Mahalanobis
distances between the available RUs and the distribution of
utilization weights is the largest (this is the meaning of the
criterion $\gamma_1$ when the utilization weights are chosen as the
reference). It is then possible to compute a
reduced-rank Mahalanobis distance between pixels and this utilization
from the results of the analysis. Practically, this is done by summing
the squared scores of the pixels on the successive axes of the
analysis. For example, for a map based on only one axis:


<<label=figRMD,echo=FALSE,fig=FALSE,eval=FALSE>>=
RMD1 <- data.frame(RMD1=mad$li[,1]^2)
coordinates(RMD1) <- coordinates(map)
gridded(RMD1) <- TRUE
image(RMD1)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<figRMD>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afigi>>
<<figRMD>>
<<zfig>>
@
\end{center}

This map is much clearer than the previous one. For the sake of the
illustration, to compute reduced rank Mahalanobis distances based on
two axes:

<<label=figRMD2,echo=FALSE,fig=FALSE,eval=FALSE>>=
RMD2 <- data.frame(RMD2 = apply(mad$li[,1:2], 1, function(x) sum(x^2)))
coordinates(RMD2) <- coordinates(map)
gridded(RMD2) <- TRUE
image(RMD2)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<figRMD2>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afigi>>
<<figRMD2>>
<<zfig>>
@
\end{center}

This second map is more noisy than the first. We would keep the first
one to describe habitat selection.\\

More details about the MADIFA and the GNESFA can be found on the help
pages of the functions \texttt{madifa} and \texttt{gnesfa}.

\subsubsection{The ENFA}

The Ecological niche factor analysis has been proposed by
Hirzel et al. (2002) for habitat suitability mapping purposes. Basille
et al. (2008) have showed that this analysis could be used efficiently
to explore the ecological niche. The basic principle of this analysis
is the following:\\

\begin{itemize}
\item First compute the marginality vector (connecting the centroid of
  the distribution of availability weights to the centroid of the
  distribution of utilization weights);\\

\item Then project the cloud of RUs on the hyperplane orthogonal to
  the marginality vector;\\

\item Find the directions in this subspace on which the specialization
  (ratio variance of the distribution of availability weights /
  variance of the distribution of utilization weights) is the
  largest. These axes are named ``specialization axes''\\
\end{itemize}

Basille et al. (2008) have showed that the ENFA can also be viewed as
a particular case of the GNESFA. Actually, once the data have been
projected on the hyperplane orthogonal to the marginality vector, the
GNESFA of the data table is identical to the ENFA, whatever the
distribution chosen as a Reference (in the case where the availability
is chosen as the reference, the last axes maximize the
specialization; in the case where the utilization is chosen as the
reference, the first axes maximize the specialization). The ENFA can
be carried out with the function \texttt{enfa} or the function
\texttt{gnesfa}:

<<>>=
en1 <- enfa(pc, pr, scan=FALSE)
gn3 <- gnesfa(pc, Reference=pr, scan=FALSE, centering="twice")
en1$s
gn3$eig
@

The eigenvalues diagram is presented below:

<<label=baploenfa,echo=FALSE,fig=FALSE,eval=FALSE>>=
barplot(en1$s)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<baploenfa>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<baploenfa>>
<<zfig>>
@
\end{center}

The ENFA fails to identify any specialization pattern in the
data. However, remember that the ENFA is an analysis of the table
projected on the hyperplane orthogonal to the marginality
vector. There may be an interesting pattern in the direction of the
marginality vector, which is therefore not expressed in the directions
found by the ENFA. For this reason, it is essential to consider the
projection of the data table on the marginality vector as well as the
projections of the data table on the directions of the specialization
axes (see Basille et al. 2008).\\

Basille et al. (2008) showed that a biplot (Gabriel, 1971) can be used
to show both the variables and the RU scores on the plane formed by
the marginality vector (X direction) and any specialization axis (Y
direction). This biplot can be drawn by:

<<label=scattenfa,echo=FALSE,fig=FALSE,eval=FALSE>>=
scatter(en1)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<scattenfa>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<scattenfa>>
<<zfig>>
@
\end{center}


This biplot is the main result of the ENFA. The dark grey polygon
shows the position of the distribution of used RUs, whereas the light
grey polygon displays the position of the distribution of available
RUs. The abscissa is the marginality axis (the direction where the
centroid of the distribution of utilization weights -- displayed by a
dot -- is the furthest from the centroid of the distribution of
available weights -- the origin of the axes), and the ordinate is the
first specialization axis (the direction where the variance of the
utilization distribution is the smallest relative to the variance of
the availability distribution).\\

The eigenvalue diagram indicated that the ENFA failed to identify a
direction where the specialization is high (it is clear on the
previous figure that the variance of the niche on the y-direction is
not much smaller than the variance of the distribution of available
RUs) . However, we can see that the main pattern in the dataset
(identified by the MADIFA) is expressed by the marginality axis. The
results of the ENFA are therefore consistent with the results returned
by the MADIFA. Indeed the marginality axis is strongly correlated with
the first axis of the MADIFA:

<<label=plotenmad,echo=FALSE,fig=FALSE,eval=FALSE>>=
plot(mad$li[,1], en1$li[,1], xlab="First axis of the MADIFA",
     ylab="Marginality axis")
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<plotenmad>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<plotenmad>>
<<zfig>>
@
\end{center}


\noindent \textit{Remark}: by default, the ENFA is carried out with
equal availability weights for all RUs when the function
\texttt{enfa} is used. However, unequal availability weights can
also be defined with this function. To proceed, the user should pass
the vector of availability weights as row weights to the function
\texttt{dudi.*} performed as a preliminary step (see section
\ref{sec:prelimdudi}). In our example, if we had a vector named
\texttt{av.w} containing the availability weights for each RU, the
preliminary PCA could have been performed with the following code:

\begin{verbatim}
> pc <- dudi.pca(tab, row.w = av.w, scannf=FALSE)
\end{verbatim}

\noindent (not executed here).\\


\subsubsection{Conclusions regarding the GNESFA}

In practice, the three analyses taking place within the framework of
the GNESFA often return the same results. However, this is not always
the case, as demonstrated by our example. Taking the utilization
weights as the Focus distribution (the FANTER) does not
allow to identify any pattern in the data here. Actually, even if
the FANTER tends to maximize the marginality on the first axes and the
specialization on the last axes, it does
not explicitly make a distinction between the marginality and the
specialization (for more formal details, see Calenge and Basille,
2008).\\

The MADIFA identifies a strong pattern in the data, and the ENFA
indicates that this pattern is essentially a marginality axis. So, we
may wonder why it does not appear on the first axis of the
FANTER... Actually, on the first axis of the MADIFA, both the
marginality and the specialization are strong. That is, on the
marginality axis, the ratio (used variance)/(available variance) is
very low. Therefore, because both parameters (specialization and
marginality) are strong, the FANTER fails to disentangle the
information carried by two measures... and does not identify any
pattern. By forcing the extraction of the marginality axis, the ENFA
identifies this direction, and by combining marginality and
specialization into a single measure (the average Mahalanobis
distances), the MADIFA identifies this direction.\\

We should not conclude from this example that the FANTER is
useless. Each method has particular properties that are not shared
by the other methods. Thus, the FANTER is the only analysis of this
framework allowing to identify bimodal niches (as demonstrated by
Calenge and Basille 2008). The MADIFA is the only analysis combining
the marginality and the specialization into a unique measure of
habitat selection. The ENFA is the only analysis distinguishing
formally the marginality and the specialization. The three methods
provide three complementary points of view on the ecological space.

\subsubsection{An alternative analysis proposed by James Dunn}

During an e-mail discussion related to the MADIFA, James Dunn
(formerly University of Arkansas) proposed an alternative analysis
with several interesting properties. Because this analysis is
unpublished, we reproduce the derivation of the analysis in the
appendix (with the authorization of Pr. Dunn). This analysis finds the
direction where the specialization is maximized, \textit{without
  projecting the data orthogonally to the marginality vector}. Thus,
the first axis of the analysis maximizes the ratio:

$$
\frac{\mbox{Variance of the availability distribution}}{\mbox{Variance
    of the utilization distribution}}
$$

\noindent and does not consider the marginality as a separate
parameter (actually, it does not consider the marginality at
all!). Thus, if both the marginality and the specialization are
strong in a direction, this direction will be found by this analysis.
We have programed this approach in the function \texttt{dunnfa} of the
package \texttt{adehabitatHS}. Note that this function does not (yet) allow
to define unequal availability weights for the RUs. We can try this
approach on the \texttt{bauges} dataset:

<<>>=
dun <- dunnfa(pc, pr, scann=FALSE)
@

Have a look at the eigenvalues:

<<label=bpdun,echo=FALSE,fig=FALSE,eval=FALSE>>=
barplot(dun$eig)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<bpdun>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<bpdun>>
<<zfig>>
@
\end{center}

There is a clear break in the decrease of the eigenvalues after the
first one. The first axis therefore expresses a clear pattern. We can
have a look at the niche of the species on this first axis:

<<label=hindun,echo=FALSE,fig=FALSE,eval=FALSE>>=
histniche(data.frame(dun$liA[,1]), pr, main="First Axis")
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<hindun>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<hindun>>
<<zfig>>
@
\end{center}

The white histogram shows the distribution of the available RUs,
whereas the grey histogram corresponds to the distribution of the used
RUs. We can see that the niche of the species is positively skewed on
the first axis: the chamois searches for high values of the first
axis.\\

Similarly to the GNESFA, we can interpret the meaning of the axis with
the help of the coefficients of the variables or with the correlations
between the variables and the scores of the available RUs on the first
axis. We choose the latter:

<<label=scordun,echo=FALSE,fig=FALSE,eval=FALSE>>=
s.arrow(dun$cor)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<scordun>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<scordun>>
<<zfig>>
@
\end{center}

We find again the structure highlighted on the first axis of the
MADIFA and the marginality axis of the ENFA. There is indeed a very
strong correlation between the first axis of the MADIFA and the first
axis of this analysis:

<<label=figdunmad,echo=FALSE,fig=FALSE,eval=FALSE>>=
plot(dun$liA[,1], mad$li[,1], xlab="DUNNFA 1", ylab="MADIFA 1")
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<figdunmad>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<figdunmad>>
<<zfig>>
@
\end{center}

Note that the DUNNFA also allows to compute reduced ranks Mahalanobis
distances (see appendix). Presently, the DUNNFA has been implemented
only for the data.frames containing only numeric variables (the
preliminary \texttt{dudi.*} analysis should be \texttt{dudi.pca}).

\subsection{One word about habitat suitability maps}

In the previous sections, we already have seen two methods designed to
map habitat suitability for a species: the Mahalanobis distances
(Clark et al. 1993, implemented in \texttt{mahasuhab}) and the reduced
ranks Mahalanobis distances (computed with the help of the MADIFA or
the DUNNFA).\\

\textbf{We now stress that the package adehabitatHS is not designed for
  predictive purpose, but rather for exploratory purposes}. There are
many packages available for habitat prediction in R (see the CRAN Task
view:
\texttt{http://cran.r\-project.org/web/views/Environmetrics.html}). The
methods in \texttt{adehabitatHS} allowing habitat suitability mapping are
there as a recognition that visual exploration of such maps may bring
insight into the processes at work on the area. Except for the methods
related to the Mahalanobis distances, the package \texttt{adehabitatHS}
provides only the DOMAIN algorithm for such mapping (Carpenter et
al. 1993, implemented in the function \texttt{domain}).


\subsection{When habitat is defined by several categories}
\label{sec:catdes1}

A very common approach to the study of habitat selection consists in
defining several habitat categories on the study area and comparing
the use and availability of each habitat category by the species. For
example, let us consider the elevation map on the study area. Define
four classes of elevation (using the R function \texttt{cut}):

<<>>=
elev <- map[,1]
av <- factor(cut(slot(elev, "data")[,1], 4),
             labels=c("Low","Medium","High","Very High"))
@

The number of pixels in each class is:

<<>>=
(tav <- table(av))
@

Let us map this variable:

<<label=cartelev,echo=FALSE,fig=FALSE,eval=FALSE>>=
slot(elev, "data")[,1] <- as.numeric(av)
image(elev)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<cartelev>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afigi>>
<<cartelev>>
<<zfig>>
@
\end{center}

Now, compute the percentage of use of each habitat class by the
chamois. That is, we compute the number of chamois detections in each
habitat class:

<<>>=
us <- join(locs, elev)
tus <- table(us)
names(tus) <- names(tav)
tus
@

To study the habitat selection of the chamois, we have to compare the
use and availability of each habitat class. Manly et al. (2002)
provide a methodology for this kind of design, relying on the
calculation of selection ratios:

$$
w_j = \frac{u_j}{a_j}
$$

\noindent where $u_j$ is the proportion of use of the habitat class
$j$ and $a_j$ is the proportion of availability of this habitat class
$j$. Manly et al. (2002) present the use of these selection ratios in
an inferential context, but such ratios are also useful in exploratory
contexts. Note that these ratios may be scaled so that their sum is
equal to 1, that is:\\

$$
B_j = \frac{w_j}{\sum_i w_i}
$$

The function \texttt{widesI} implements the approach of Manly et
al. (2002; more details are presented in the example section of the
help page of this function):

<<>>=
class(tus) <- NULL
tav <- tav/sum(tav)
class(tav) <- NULL
(Wi <- widesI(tus, tav))
@

The ``table of ratios'' presents the selection ratios, together with
the proportion of use and availability (and other measures fully
described in Manly et al. 2002). This table illustrates that higher
elevations are selected by the chamois. Note that a graphical summary
of the results can be obtained by typing:

<<eval=FALSE>>=
plot(Wi)
@

\noindent (not executed in this report).
Selection ratios are a useful approach to the study of habitat
selection, especially in the context of designs II and III.\\

\noindent \textit{Remark}: There are close connections between the
theory of selection ratios and the GNESFA presented in the previous
sections. Indeed, let us consider the table $\mathbf{X}$ containing
binary data, indicating whether each pixel (in row) contains (1) or
not (0) each habitat type (in column). Let $\mathbf{u}$ be the vector
containing the number of chamois detections in each pixel. It can be
shown that when the distribution of utilization weights $\mathbf{u}$
is chosen as the Focus distribution, the analysis finds the direction
where the selection ratios are the largest. Indeed, the sum of the
eigenvalues of the GNESFA (with utilization weights corresponding to
the utilization distribution) corresponds to the sum of the selection
ratios minus one. We chan check it on our example:

<<>>=
dis <- acm.disjonctif(slot(elev, "data"))
pc2 <- dudi.pca(dis, scan=FALSE)
gnf <- gnesfa(pc2, pr, scan=FALSE)
@

We first used the function \texttt{acm.disjonctif} from the package
ade4 to convert the factor variable into a complete disjonctive
table. Then, after a preliminary PCA on this table (see section
\ref{sec:prelimdudi}), we performed the GNESFA on this table, using
the vector \texttt{pr} created in previous sections. Now check that
the sum of the eigenvalues of the GNESFA corresponds to the sum of the
selection ratios plus one:

<<>>=
sum(gnf$eig)
sum(Wi$wi)
@

Thus, in the particular case where there is only one factor variable
describing the habitat, the selection ratios and the GNESFA are
closely connected. Although the former are easier to understand in
this case, when the number of habitat variables increase, the GNESFA
provides a consistent way to explore habitat selection in design I
studies.



\section{Design II studies}

\subsection{Basic data structure}
\label{sec:refstru2}

As explained in section \ref{sec:typdes}, design II studies correspond
to studies for which animals are identified (e.g. using radio-tracking)
and habitat use is measured for each one, but availability is
considered to be the same for all animals of the population. Again,
the model of the ecological niche can be very useful in this context:


<<label=figs,echo=FALSE,fig=FALSE,eval=FALSE>>=
set.seed(321)
par(mar=c(0,0,0,0))
plot(0,0, xlim=c(-1,1), ylim=c(-1,1), ty="n", axes=FALSE)
arrows(0,0, 0, 0.9, lwd=2)
arrows(0,0, -0.9, -0.5, lwd=2)
arrows(0,0, 0.9, -0.5, lwd=2)
points(data.frame(rnorm(500,sd=0.25), rnorm(500, sd=0.25)),
       pch=16, col="grey")
points(data.frame(rnorm(50, mean=0.3, sd=0.1),
                  rnorm(50, mean=0.3, sd=0.1)),
       pch=21, bg="red", cex=2)
points(data.frame(rnorm(50, mean=0, sd=0.1),
                  rnorm(50, mean=0.4, sd=0.1)),
       pch=21, bg="green", cex=2)
points(data.frame(rnorm(50, mean=-0.3, sd=0.1),
                  rnorm(50, mean=0.3, sd=0.1)),
       pch=21, bg="blue", cex=2)
text(0,0.96, expression(V[3]), cex=1.5)
text(-0.96, -0.5, expression(V[1]), cex=1.5)
text(0.96, -0.5, expression(V[2]), cex=1.5)
arrows(0,0,0.3,0.3, lwd=4)
arrows(0,0,0,0.4, lwd=4)
arrows(0,0,-0.3,0.3, lwd=4)
arrows(0,0,0.3,0.3, lwd=2, col="pink")
arrows(0,0,0,0.4, lwd=2, col="lightgreen")
arrows(0,0,-0.3,0.3, lwd=2, col="lightblue")

@


\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<figs>>
<<zfig>>
@
\end{center}


Each RU is characterized by a value for all the environmental
variables, so that to each RU is associated a point in the ecological
space (grey points on the figure). For each animal, the set of used
RUs defines a ``niche'' in the ecological space (as defined in section
\ref{sec:ecolni}). So that there are as many niches in the ecological
space as there are animals. Note that a given RU may possibly be used
by several animals. Although both the availability weights and
utilization weights may vary from one RU to the other, this model is
useful to understand the methods that can be used to study habitat
selection with such designs. \\

Our aim is to identify the directions in the ecological space on which
(i) the niches are the most different from the distribution of
available points, (ii) these differences between the distribution of
available points and the niches are the most similar.\\

The functions of \texttt{adehabitatHS} make an extensive use of the
marginality vectors in this context. These vectors connect the mean of
the distribution of available points (here, at the origin of the
ecological space) to the mean of the niches. These vectors are a rough
summary of the selection (they measure the distance between what is
available in average and what is used in average by an animal). These
vectors are represented by arrows on the figure.\\

In this section, we will use as an example a dataset collected by
Daniel Maillard (Office national de la chasse et de la faune sauvage)
on the wild boar. First load the data:

<<>>=
data(puech)
names(puech)
@

This data set has two components: the component \texttt{maps}
describes the values of 9 environmental variables on the study area:

<<label=imdes2,echo=FALSE,fig=FALSE,eval=FALSE>>=
maps <- puech$maps
mimage(maps)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<imdes2>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<imdes2>>
<<zfig>>
@
\end{center}

\noindent and the component \texttt{relocations} contains the
relocations of 6 wild boar on the study area:

<<label=ploloc,echo=FALSE,fig=FALSE,eval=FALSE>>=
locs <- puech$relocations
image(maps)
points(locs, col=as.numeric(slot(locs, "data")[,1]), pch=16)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<ploloc>>
@


\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<ploloc>>
<<zfig>>
@
\end{center}

We now count the number of relocations in each pixel of the map,
for each animal (see the help page of the function
\texttt{count.points}):

<<label=cplocs,echo=FALSE,fig=FALSE,eval=FALSE>>=
cp <- count.points(locs, maps)
mimage(cp)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<cplocs>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<cplocs>>
<<zfig>>
@
\end{center}

We can now derive the elements required for the analysis of habitat
selection:

<<>>=
X <- slot(maps, "data")
U <- slot(cp, "data")
@

\noindent where $\mathbf{X}$ and $\mathbf{U}$ correspond respectively
to the table containing the values of the environmental variables and
to the utilization weights of each RU and for each animal (see section
\ref{sec:typdes}).


\subsection{The OMI analysis}
\label{sec:omi}

The Outlying Mean Index (OMI) analysis is one possible approach to the
study of habitat selection (Doledec et al. 2000). First the table
$\mathbf{X}$ is centred for the availability weights, so that the
origin of the space corresponds to what is available in average to the
animals. Then, one performs a noncentred principal component analysis
of the coordinates of the marginality vectors in this space. This
allows to find the directions where the marginality is in average the
largest. More formally, this analysis finds the vector $\mathbf{b} =
(b_1,b_2,...,b_p)$ such that:\\

\begin{itemize}
\item $y = b_1 x_1 + ... + b_p x_p$
\item $\sum_{i=1}^K \bar y_{ui}^2 = \sum_{i=1}^K (\bar y_{ui} - \bar
  y_{ai})^2$ is maximum on the first axis (where $\bar y_{ui}$ is the
  mean of the utilization distribution of the $i^{th}$ animal and
  $\bar y_{ai}$ is the mean of the availability distribution for this
  animal)\\
\end{itemize}

The OMI analysis is implemented in the function \texttt{niche} of the
package \texttt{ade4}. Let us try it on our example. As for the
GNESFA, it is required that a \texttt{dudi.*} analysis is
performed as a preliminary step (see section
\ref{sec:prelimdudi}). Because all our variables are numeric, we first
perform a principal component analysis:

<<>>=
pc <- dudi.pca(X, scannf=FALSE)
@

Then, we use the function \texttt{niche}:

<<>>=
(ni <- niche(pc, U, scannf=FALSE))
@

A summary of the analysis is obtained by typing:

<<label=plotnichee,echo=FALSE,fig=FALSE,eval=FALSE>>=
plot(ni)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<plotnichee>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<plotnichee>>
<<zfig>>
@
\end{center}

This figure contains a full summary of the analysis:\\

\noindent The \textbf{eigenvalues diagram} shows the amount of
marginality accounted for by each axis. It is very clear that one axis
accounts for most marginality present in the dataset. Actually,
the first axis accounts for:

<<>>=
ni$eig[1]/sum(ni$eig)
@

\noindent more than 90\% of the marginality in the dataset!! However,
the second axis also expresses a notable amount of the marginality:
there is a clear break in the decrease of the eigenvalues after the
second one. The second axis accounts for:

<<>>=
ni$eig[2]/sum(ni$eig)
@

\noindent 7.5\% of the marginality. Together, the two axes account
for 98\% of the marginality in the dataset.\\

\noindent The main graph (\textbf{Samples and species}) shows the
projection of the RUs (named ``samples'' on the graph) on the first
factorial plane, as well as the position of the mean of the
distribution of utilization weights for each animal (named ``species''
on the graph -- this analysis was originally developed for the
analysis of multiple species distribution). Four animals out of 6 are
characterized by a strong selection of the positive values of the
first axis. Note that two animals are characterized by strong negative
values of the second axis.\\

To give a meaning to these axes, have a look at the graph labelled
``\textbf{Variables}'', which presents the scores of the variables on
the axes of the analysis (i.e., the values of the coefficients
$b_i$ found by the analysis). The positive values of the first axis
(strongly selected by four animals) correspond to areas
characterized by steep slopes (difficult access for human), far from
trails and crops (idem) and low sunshine (this area is Mediterranean,
and therefore very warm [often $>$ 40\degre~C] and the animals are relocated
during the day in summer). The strong selection for the positive
values of the first axis are therefore easily interpretable
biologically. The negative values of the second axis (strongly
selected by two animals) correspond to areas with high grass cover
(it is easier for the animals to hide there). \\

The graph labelled ``\textbf{Axis}'' is often of interest. It
represents the correlation between the first two axes of the PCA of
the available RUs (i.e. the preliminary \texttt{dudi.*} analysis) and
the axes of the OMI analysis. The first axis of the preliminary PCA
identifies the direction where the variance of the availability
distribution is the largest. Thus, the first two axes of the PCA
identifies the main directions structuring the study area. The strong
correlation between the scores of the available RUs on the first axis
of the PCA and the scores of the available RUs on first axis of
the OMI analysis indicates that the direction where the habitat
selection appears the strongest is also the direction where the
environment is the most variable. In some studies, this observation
may be of biological interest (and we will see in the next chapter
that this may also be problematic in some studies).\\

The other graphs are not of interest for us.\\

Another useful display consists in mapping, in the geographical space,
the scores of the RUs on the axes. This is done simply by:

<<label=cartaxeniche,echo=FALSE,fig=FALSE,eval=FALSE>>=
ls <- ni$ls
coordinates(ls) <- coordinates(maps)
gridded(ls) <- TRUE
mimage(ls)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<cartaxeniche>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<cartaxeniche>>
<<zfig>>
@
\end{center}

This kind of maps is also very useful to give a biological meaning to
the axes found by the analysis. In this case, it is very clear that
the four animals selecting the positive values of the first axis are
located at the extreme north of the study area (this area correspond
to the gorges of the Herault river).


\subsection{The canonical OMI analysis}
\label{sec:canomi}

We noted that the strong correlation between the first axis of the PCA
and the first axis of the OMI analysis may sometimes indicate a
problem. We now explain why. Dray et al. (2003) noted that the OMI
analysis is a co-inertia analysis. A full description of the
co-inertia analysis can be found in Dray et al. (2003) and Doledec and
Chessel (1994). We give here a rapid description of this method. The
co-inertia analysis is a ``two-table'' factor analysis, like the
redundancy analysis or the canonical correlation analysis. Its basic
aim is similar to the aim of canonical correlation analysis: to find
similarities between two tables \textbf{A} and \textbf{B} with the
same number of rows. Mathematically, it corresponds to
the principal component analysis of the table
$\mathbf{T} = \mathbf{A}^t\mathbf{B}$, using uniform row weights and
column weights for \textbf{T}. It can be shown that when both
\textbf{A} and \textbf{B} are centred and scaled, so that their
columns all have a mean equal to zero and a standard deviation of 1,
then this analysis finds a direction $\mathbf{u}$ in the space defined
by the columns of $\mathbf{A}$ and a direction $\mathbf{v}$ in the
space defined by the columns of $\mathbf{B}$ so that the covariance
between $\mathbf{Au}$ and \textbf{Bv} is maximized. The main quality
of the co-inertia analysis is that there is no mathematical constraint
on the number of rows of the tables $\mathbf{A}$ and $\mathbf{B}$ (The
number of rows of these tables may even be smaller than the number of
columns).\\

An important point here is that:

$$
\mbox{cov}(\mathbf{Au}, \mathbf{Bv}) = \mbox{cor}(\mathbf{Au},
\mathbf{Bv}) \sqrt{\mbox{var}(\mathbf{Au})}
\sqrt{\mbox{var}(\mathbf{Bv})}
$$

Therefore, if the variance of, say, the table $\mathbf{A}$ is
\textit{very} large in a given direction, there is a risk that the
analysis returns this direction, even if the correlation between this
direction and any direction in the space defined by the columns of
$\mathbf{B}$ is small. This may or may not be a problem, depending on
the context in which the co-inertia analysis is used. However, for the
study of habitat selection, it is problematic.\\

In the case of OMI analysis, the table $\mathbf{A}$ is centred (it
corresponds to the table \textbf{X} defined in section
\ref{sec:typdes}), but the table $\mathbf{B}$ is not (it corresponds
to the table \textbf{U} in section \ref{sec:typdes}), so that it is
not the \textit{covariance} that is maximized by the analysis, but
more generally a \textit{co-inertia} (a covariance is a particular
type of co-inertia). In particular, as we have already seen, the OMI
analysis maximizes the average marginality of the animals. However,
the arguments given above still hold. \textit{When there is a strong
  environmental pattern on the study area, there is a non-negligible
  risk that the first axes of the OMI analysis identifies this
  pattern, more than habitat selection}.\\

This can be illustrated on the following figure, illustrating the
habitat selection of four animals on two variables:\\

<<label=figokcanomi,echo=FALSE,fig=FALSE,eval=FALSE>>=
par(mar=c(0.1,0.1,0.1,0.1))
xx <- cbind(rnorm(1000), rnorm(1000, sd=0.3))
plot(xx, asp=1, bg="grey", pch=21, axes=FALSE)
xx1 <- cbind(rnorm(100, mean=-1.5, sd=0.15),
             rnorm(100, mean=0.2, sd=0.05))
xx2 <- cbind(rnorm(100, mean=-0.5, sd=0.15),
             rnorm(100, mean=0.5, sd=0.05))
xx3 <- cbind(rnorm(100, mean=0.5, sd=0.15),
             rnorm(100, mean=0.5, sd=0.05))
xx4 <- cbind(rnorm(100, mean=1.5, sd=0.15),
             rnorm(100, mean=0.2, sd=0.05))
points(xx1, pch=21, bg="red", cex=1.5)
points(xx2, pch=21, bg="yellow", cex=1.5)
points(xx3, pch=21, bg="green", cex=1.5)
points(xx4, pch=21, bg="blue", cex=1.5)
arrows(0,0,-1.5, 0.2, lwd=5)
arrows(0,0,-1.5, 0.2, lwd=3, col="red")
arrows(0,0,-0.5, 0.5, lwd=5)
arrows(0,0,-0.5, 0.5, lwd=3, col="yellow")
arrows(0,0,0.5, 0.5, lwd=5)
arrows(0,0,0.5, 0.5, lwd=3, col="green")
arrows(0,0,1.5, 0.2, lwd=5)
arrows(0,0,1.5, 0.2, lwd=3, col="blue")
box()
abline(v=0, h=0)

@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<figokcanomi>>
<<zfig>>
@
\end{center}

The grey points correspond to the available RUs and the colored points
correspond to the used RUs of 4 animals. The marginality vectors are
also indicated. It is clear on this figure that the direction that
maximizes the marginality is the abscissa (because this is the
direction that is in average the ``closest'' to the marginality
vectors). However, in this direction, we find animals using both very
positive and very negative values of this axis, so that there is no
strong selection of the animals on this direction. On the other hand,
it is also clear that the direction on which habitat selection occurs
is the ordinate (all the marginality vectors are characterized by a
positive coordinate, no use of the negative values in this
direction). Therefore, maximizing the marginality explained by the
first axis does not necessarily returns the direction where the
habitat selection is the largest. If there is a direction of the
ecological space where the variance of the available RUs is much
larger than in any other direction, there is a risk that the OMI
analysis return this direction, whatever the habitat selection by the
species. This may be a problem with OMI analysis.\\

I already have met this problem, when I was working on the habitat
selection of the mouflon in the Bauges mountains (Darmon et
al., 2012). In this mountainous area, the elevation strongly
structures all the environmental variables. The first axis of
the OMI analysis was mainly determined by the elevation pattern, and
no common direction for the marginality vectors was identified.\\

This is where the \textit{canonical OMI analysis} may be useful. This
analysis first distort the ecological space so that the distribution
of availability weights takes a standard spherical shape:

<<label=canom,echo=FALSE,fig=FALSE,eval=FALSE>>=
par(mar=c(0.1,0.1,0.1,0.1))
plot(xx[,1]*0.3, xx[,2], asp=1, bg="grey", pch=21, axes=FALSE)
xx1 <- cbind(rnorm(100, mean=-1.5, sd=0.15),
             rnorm(100, mean=0.2, sd=0.05))
xx2 <- cbind(rnorm(100, mean=-0.5, sd=0.15),
             rnorm(100, mean=0.5, sd=0.05))
xx3 <- cbind(rnorm(100, mean=0.5, sd=0.15),
             rnorm(100, mean=0.5, sd=0.05))
xx4 <- cbind(rnorm(100, mean=1.5, sd=0.15),
             rnorm(100, mean=0.2, sd=0.05))
points(xx1[,1]*0.3, xx1[,2], pch=21, bg="red", cex=1.5)
points(xx2[,1]*0.3, xx2[,2], pch=21, bg="yellow", cex=1.5)
points(xx3[,1]*0.3, xx3[,2], pch=21, bg="green", cex=1.5)
points(xx4[,1]*0.3, xx4[,2], pch=21, bg="blue", cex=1.5)
arrows(0,0,-1.5*0.3, 0.2, lwd=5)
arrows(0,0,-1.5*0.3, 0.2, lwd=3, col="red")
arrows(0,0,-0.5*0.3, 0.5, lwd=5)
arrows(0,0,-0.5*0.3, 0.5, lwd=3, col="yellow")
arrows(0,0,0.5*0.3, 0.5, lwd=5)
arrows(0,0,0.5*0.3, 0.5, lwd=3, col="green")
arrows(0,0,1.5*0.3, 0.2, lwd=5)
arrows(0,0,1.5*0.3, 0.2, lwd=3, col="blue")
box()
abline(v=0, h=0)
@


\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<canom>>
<<zfig>>
@
\end{center}

It is clear then that the direction where the marginality is the
largest in this distorted space is the direction where habitat
selection will be the clearest. However, there is a counterpart: the
canonical OMI analysis places mathematical constraint on (i) the
number of RUs (it should be larger than the number of environmental
variables to allow the matrix inversion required by the analysis) and
(ii) the environmental variables should be linearly independent (also
to allow this inversion).\\

Additional mathematical details can be found in Darmon et
al. (2012). The canonical OMI analysis is implemented in the function
\texttt{canomi}. We can carry out the analysis:

<<>>=
com <- canomi(pc, U, scannf=FALSE)
@

A full summary of the results is displayed below:

<<label=plotcan,echo=FALSE,fig=FALSE,eval=FALSE>>=
plot(com)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<plotcan>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<plotcan>>
<<zfig>>
@
\end{center}

The \textbf{Eigenvalues} indicates that the first two axes account for
most of the marginality of the dataset, once the ecological space has
been distorted to sphere the distribution of availability weights
(clear break after the second eigenvalue). The main graph (\textbf{RUs
and animals}) indicates that four animals select strongly for negative
values of the first axis, and two animals for negative values of the
second axis. This is confirmed by the \textbf{Marginality vectors}.
Both the \textbf{variable scores} and the \textbf{correlations}
indicate that negative values of the first axis correspond to areas
with steep slopes, far from the crops and from recreational trails,
with low elevation and sunshine (but the two graphs may sometimes
indicate different results (see section \ref{sec:canomi} for further
discussion on this aspect).\\

There is actually a strong correlation between the results of the
classical OMI analysis and the results of the canonical OMI analysis:

<<echo=FALSE>>=
pt <- 8
wi <- 900
@

<<label=plocompcomni,echo=FALSE,fig=FALSE,eval=FALSE>>=
par(mfrow=c(2,1))
plot(ni$ls[,1], com$ls[,1],
     xlab="Scores on the first axis of the OMI",
     ylab="Scores on the first axis of the can. OMI")
plot(ni$ls[,2], com$ls[,2],
     xlab="Scores on the first axis of the OMI",
     ylab="Scores on the first axis of the can. OMI")
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<plocompcomni>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<plocompcomni>>
<<zfig>>
@
\end{center}

<<echo=FALSE>>=
pt <- 20
wi <- 480
@

\noindent The classical and canonical OMI analyses here return the
same patterns, but this may not always be the case.\\

\noindent \textit{Remark}: note that variable availability weights can
be defined for both the OMI and canonical OMI analysis, by passing
them as the \texttt{row.w} argument to the preliminary \texttt{dudi.*}
analysis.


\subsection{When habitat is defined by several categories}

Similarly to design I studies, it is a very common approach to study
habitat selection by several animals by defining several habitat
categories on the study area. For example, let us consider the
slope map on the study area. Define four classes of slope (using the R
function \texttt{cut}):

<<>>=
slope <- maps[,8]
sl <- slot(slope, "data")[,1]
av <- factor(cut(sl, c(-0.1, 2, 5, 12, 50)),
             labels=c("Low","Medium","High","Very High"))

@

The number of pixels in each class is:

<<>>=
(tav <- table(av))
@

Let us map this variable:

<<label=carteslope,echo=FALSE,fig=FALSE,eval=FALSE>>=
slot(slope, "data")[,1] <- as.numeric(av)
image(slope)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<carteslope>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afigi>>
<<carteslope>>
<<zfig>>
@
\end{center}

Now, compute the percentage of use of each habitat class by each
animal. That is, we compute the number of relocations for each animal
in each habitat class:

<<>>=
us <- join(locs, slope)
tus <- table(slot(locs,"data")[,1],us)
class(tus) <- NULL
tus <- as.data.frame(tus)
colnames(tus) <- names(tav)
tus
@

There are several ways to compare use and availability in design
II studies when several habitat types have been defined. A common
approach is the compositional analysis proposed by Aebischer et
al. (1993). This approach is implemented in the function
\texttt{compana}:

<<>>=
tav2 <- matrix(rep(tav, nrow(tus)), nrow=nrow(tus), byrow=TRUE)
colnames(tav2) <- names(tav)
compana(tus, tav2, test = "randomisation",
        rnv = 0.01, nrep = 500, alpha = 0.1)
@

Here, the compositional analysis does not identify any significant
habitat selection. However, this approach relies on the hypothesis
that all the animals are selecting habitat in the same way. And we
have seen in the previous sections that this is not necessarily the
case...\\

Another approach has been proposed by Manly et al. (2002), relying on
selection ratios (see section \ref{sec:catdes1}). For each animal, a
selection ratio can be calculated for each habitat type. Then, after
having tested whether habitat selection is the same for all animals,
it is possible to average selection ratios over all animals:


<<>>=
tav <- as.vector(tav)
names(tav) <- names(tus)
(WiII <- widesII(tus, tav))
@

In this case, we can see that habitat selection is not the same from
one animal to the other. Therefore, it does not make sense to compute
the average selection ratios. Rather, we should investigate what
causes these differences.\\

Again, a factorial analysis will be of help here. The
\textit{eigenanalysis of selection ratios} (Calenge and Dufour, 2006)
has been developed to explore graphically habitat selection by the wildlife
when habitat is defined by several categories. This analysis is
closely related to the theoretical context underlying the selection
ratios. Indeed, let $\mathbf{W}$ be the table containing the selection
ratios for each animal (in row) and each habitat type (in
column). The eigenanalysis consists in a noncentred and nonscaled
principal component analysis of the table $\mathbf{W}-\mathbf{1}$,
using the availability weight of each habitat type as column weight
and the number of relocations of each animal as row weight (see
Calenge and Dufour, 2006, for more mathematical details). This
analysis partition the statistics:

$$
S = \sum_{i=1}^P \sum_{j=1}^K \frac{(u_{ij} - p_i u_{+j})^2}{p_i
  u_{+j}}
$$

\noindent where $u_{ij}$ is the number of relocations of animal $j$ in
habitat $i$, $p_i$ is the proportion of available resource units in
habitat $i$ and $u_{+j}$ is the total number of relocations of animal
$j$. This statistic was proposed by White and Garrott (1990) to test
habitat selection in design II studies. What is interesting is that
this analysis connects two widely used approach for habitat selection
studies into a unified framework (selection ratios and White and
Garrott (1990) statistic). The direction of the ecological space that
maximizes the statistic $S$ -- and therefore habitat selection -- is
found by an analysis of the selection ratios.\\

We now perform the eigenanalysis of selection ratios on our example
dataset, with the function \texttt{eisera}:

<<>>=
(eis <- eisera(tus,tav2, scannf=FALSE))
@

We focus our interpretation on two axes after considering the
eigenvalues diagram:

<<label=ploteiser,echo=FALSE,fig=FALSE,eval=FALSE>>=
barplot(eis$eig)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<ploteiser>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<ploteiser>>
<<zfig>>
@
\end{center}

The results of the analysis are presented below:

<<label=scateis,echo=FALSE,fig=FALSE,eval=FALSE>>=
scatter(eis)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<scateis>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<scateis>>
<<zfig>>
@
\end{center}

Here, we find similar results as in previous sections (a group of four
animals is selecting for steep slopes and a group of three animals is
selecting for low slopes).


\subsection{Concluding remarks regarding design II analyses}

We have seen that the eigenanalysis of selection ratios connects two
widely used approaches for habitat selection studies into a
unified framework (selection ratios and White and Garrott
statistic). Actually, it can be shown that this analysis is a
particular case of canonical OMI analysis. Let us perform a canonical
OMI analysis on our example dataset, to illustrate this point. First
transform the map of slopes into a complete disjonctive table:

<<>>=
avdis <- acm.disjonctif(data.frame(av))
head(avdis)
@

Then, perform a canonical OMI analysis using this dataset:

<<>>=
pc <- dudi.pca(avdis, scannf=FALSE)
com2 <- canomi(pc, U, scannf=FALSE)
@

Now compare the eigenvalues of the canonical OMI analysis with the
eigenvalues of the eigenanalysis of selection ratios:

<<>>=
eis$eig/com2$eig
@

The eigenvalues of the eigenanalysis of selection ratios are equal to
the eigenvalues of the canonical OMI analysis multiplied by a constant
(this constant is equal to $u_{++}-P-1$, where $u_{++}$ is the total number
of relocations for all animals and $P$ is the number of habitat types).
The absolute value of the coordinates of the marginality vectors are
also identical:

<<>>=
eis$li[,1]/com2$li[,1]
@


\section{Design III studies}

\subsection{Basic data structure}

As explained in section \ref{sec:typdes}, design III studies
correspond to studies for which animals are identified (e.g. using
radio-tracking) and both habitat use and availability is measured for
each one. The key aspect is that the availability varies from one
animal to the other. Again, the model of the ecological niche can be
very useful in this context:


<<label=plotnidesIII,echo=FALSE,fig=FALSE,eval=FALSE>>=
par(mar=c(0.1,0.1,0.1,0.1))

set.seed(2351)
plot(1,1, asp=1, bg="grey", pch=21, axes=FALSE, ty="n",
     xlim=c(-3.5,3.5), ylim=c(-3.5,3.5))
xx1 <- cbind(rnorm(100, mean=-1.5, sd=0.25),
             rnorm(100, mean=1.5, sd=0.25))
points(xx1, bg="grey", col="blue", pch=21)
set.seed(2)
xx1b <- cbind(rnorm(20, mean=-1.3, sd=0.1),
              rnorm(20, mean=1.8, sd=0.1))
points(xx1b, bg="blue", col="black", pch=21, cex=1.5)
arrows(-1.5, 1.5, -1.3, 1.8, lwd=5, length=0.1, angle=20)
arrows(-1.5, 1.5, -1.3, 1.8, lwd=3, col="red", length=0.1, angle=20)


set.seed(240)
xx2 <- cbind(rnorm(100, mean=1.5, sd=0.25),
             rnorm(100, mean=1.5, sd=0.25))
points(xx2, bg="grey", col="green", pch=21)
set.seed(2488)
xx2b <- cbind(rnorm(20, mean=1.7, sd=0.1),
              rnorm(20, mean=1.8, sd=0.1))
points(xx2b, bg="green", col="black", pch=21, cex=1.5)
arrows(1.5, 1.5, 1.7, 1.8, lwd=5, length=0.1, angle=20)
arrows(1.5, 1.5, 1.7, 1.8, lwd=3, col="red", length=0.1, angle=20)


set.seed(240987)
xx2 <- cbind(rnorm(100, mean=0, sd=0.25),
             rnorm(100, mean=-1.5, sd=0.25))
points(xx2, bg="grey", col="orange", pch=21)
set.seed(2488980)
xx2b <- cbind(rnorm(20, mean=0.3, sd=0.1),
              rnorm(20, mean=-1.6, sd=0.1))
points(xx2b, bg="orange", col="black", pch=21, cex=1.5)
arrows(0, -1.5, 0.3, -1.6, lwd=5, length=0.1, angle=20)
arrows(0, -1.5, 0.3, -1.6, lwd=3, col="red", length=0.1, angle=20)

arrows(0,0,0,3)
arrows(0,0,-sqrt(4.5),-sqrt(4.5))
arrows(0,0,sqrt(4.5),-sqrt(4.5))

text(0,3.2, "V3")
text(-sqrt(4.5)-0.2,-sqrt(4.5)-0.2, "V1")
text(sqrt(4.5)+0.2,-sqrt(4.5)-0.2, "V2")

@


\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<plotnidesIII>>
<<zfig>>
@
\end{center}

A particular set of RUs is available to each animal, and a subset of
it is used by the animal. Therefore, each animal is characterized by a
niche defined in an ``available space'' particular to it. As in design
II studies, we will work with the marginality vectors (that relate
what is available to the animal in average to what has been used in
average by the animal).\\

We will use as an example the dataset analyzed in the previous
section (see section \ref{sec:refstru2} for a description of this data
set). Let us estimate the home range of the animals, e.g. using a
minimum convex polygon:

<<label=pplomcp,echo=FALSE,fig=FALSE,eval=FALSE>>=
pcc <- mcp(locs)
image(maps)
plot(pcc, col=rainbow(6), add=TRUE)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<pplomcp>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afigi>>
<<pplomcp>>
<<zfig>>
@
\end{center}

We rasterize these polygons

<<label=plohr,echo=FALSE,fig=FALSE,eval=FALSE>>=
hr <- do.call("data.frame", lapply(1:nrow(pcc), function(i) {
    over(maps,geometry(pcc[i,]))
}))
names(hr) <- slot(pcc, "data")[,1]
coordinates(hr) <- coordinates(maps)
gridded(hr) <- TRUE
mimage(hr)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<plohr>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<plohr>>
<<zfig>>
@
\end{center}

We will study habitat selection inside the home-range (third order
habitat selection according to the scale of Johnson, 1980). The black
pixels define what is available to each animal, and pixels containing at
least one relocation define the use by the animal (stored in the
object \texttt{cp} built in section \ref{sec:refstru2}). The key
problem is that the available pixels are not the same from one animal
to the other.


\subsection{The K-select analysis}

The K-select analysis is one possible approach to this kind of
study (Calenge et al. 2005). This analysis can be seen as an extension
of the OMI analysis presented in section \ref{sec:omi}. This analysis
focuses on the marginality vectors of the animals. These marginality
vectors are recentred so that they have a common origin. Then, a
noncentred principal analysis is performed on the table containing the
coordinates of these recentred marginality vectors. This analysis
is therefore similar to the OMI analysis: it finds the direction in
the ecological space where the marginality is the strongest (it is
identical to the OMI analysis when the available RUs are the same for
all animals). More formal details can be found in Calenge et
al. (2005).\\

Let us perform a K-select analysis on our dataset. First prepare the
data for the analysis:

<<>>=
pks <- prepksel(maps, hr, cp)
names(pks)
@

The component \texttt{tab} correspond to the concatenated tables
$\mathbf{X_i}$ containing the values of the environmental variables
(columns) in each pixel (rows) available to the animal $i$. The vector
\texttt{weight} contains the corresponding utilization weights, and
the vector \texttt{factor} is a factor allowing to define the limits
of the $\mathbf{X}_i$ in the component \texttt{tab}. We suppose here
that the availability weights are the same for all pixels.\\

We perform a preliminary PCA of the data (the section \ref{sec:prelimdudi}
explains why this preliminary PCA is required):

<<>>=
pc <- dudi.pca(pks$tab, scannf=FALSE)
@

And we perform the K-select analysis:

<<label=plotksel,echo=FALSE,fig=FALSE,eval=FALSE>>=
ksel <- kselect(pc, pks$factor, pks$weight, scann=FALSE)
plot(ksel)
@

<<results=tex,echo=TRUE,fig=FALSE,eval=FALSE>>=
<<plotksel>>
@

\begin{center}
<<results=tex,echo=FALSE>>=
<<afig>>
<<plotksel>>
<<zfig>>
@
\end{center}

This graph summarizes the results of the analysis. The
\textbf{Eigenvalues} diagram indicate that the first axis explains
most of the marginality present in the dataset (there is a clear break
in the decrease of the eigenvalues after the first one). The graph
labelled \textbf{Animals} shows the recentred marginality vectors
(i.e. marginality vectors shifted so that they have a common
origin). This graph indicates that all animals are characterized by a
positive score on the first axis of the analysis.\\

The graph labelled \textbf{Variables} gives the scores of the
variables. This graph allows to give a biological meaning to the
axes. Here, positive values of the axis correspond to the RUs located
at low elevation, close to water, with low sunshine, far from trails
and crops, with high slopes. This axis therefore opposes the areas
located in the gorges of the Herault river and the areas located on
the plateau.\\

The graph labelled \textbf{Available resource units} shows the
distribution of available RUs on the first two axes of the
analysis. It is clear on this graph that Jean and Chou are
characterized by available RUs all located on the plateau (with a
restricted range of values on the first axis), Schnock and Brock are
characterized by available RUs all located in the gorges of the
Herault river (also with a restricted range of values on the first
axis), and Suzanne and Kinou have the two types of habitat within
their home range (large range of values on the first axis).\\

The main graph, labelled \textbf{Marginality vectors} shows the
original (i.e. non-recentred) marginality vectors. The origin of the
arrow indicates what is available in average to the animal, the end of
the arrow indicates what has been used in average by the animal, and
the direction and length of the arrows indicate respectively the
direction and strength of habitat selection. This graph shows that the
strongest selection is showed by the two animals having the largest
choice of values of the first axis of the analysis. A possible
interpretation would be that the gorges of the Herault river is a
preferred habitat that is selected when available.



\subsection{When habitat are defined by several categories}

As for other designs, studying habitat selection when habitat is
defined by several categories is a common approach. We will not
illustrate the methods available to deal with such designs here, as
the methods available to deal with design III are just extensions of
the methods available for design II. We therefore refer the reader to
the help pages of the following functions:\\

\begin{itemize}
\item \texttt{compana}: the compositional analysis is also a possible
  approach for the analysis of habitat selection in design III studies
  (Aebischer et al. 1993);\\
\item \texttt{widesIII}: Manly et al. (2002) also extended the
  theoretical framework underlying the selection ratios to the
  analysis of habitat selection;\\
\item \texttt{eisera}: the eigenanalysis of selection ratios is also a
  possible approach to the analysis of habitat selection in design III
  studies (Calenge and Dufour, 2006).
\end{itemize}


\section{Conclusion}

I included in the package \texttt{adehabitatHS} several functions allowing
the exploration of habitat selection by the wildlife. The functions
from the other brother packages can be used to explore habitat
selection using a wide variety of approaches.


\section*{References}
\begin{description}
\item Aebischer, N., Robertson, P. and Kenward, R. 1993. Compositional
  analysis of habitat use from animal radio-tracking data. Ecology, 74,
  1313-1325.
\item Bingham, R. and Brennan, L. 2004. Comparison of type I error
  rates for statistical analyses of resource selection Journal of
  Wildlife Management, 68, 206--212.
\item Basille, M., Calenge, C., Marboutin, E., Andersen, R. and
  Gaillard, J.M. 2008. Assessing habitat selection using multivariate
  statistics: Some refinements of the ecological-niche factor
  analysis. Ecological Modelling, 211, 233-240.
\item Calenge, C. 2005. Des outils statistiques pour l'analyse des
  semis de points dans l'espace ecologique. Universite Claude Bernard
  Lyon 1.
\item Calenge, C., Dufour, A. and Maillard, D. 2005. K-select
  analysis: a new method to analyse habitat selection in
  radio-tracking studies Ecological Modelling, 186, 143-153.
\item Calenge, C. 2006. The package adehabitat for the R software: a
  tool for the analysis of space and habitat use by
  animals. Ecological modelling, 197, 516--519.
\item Calenge, C. and Dufour, A. 2006. Eigenanalysis of selection
  ratios from animal radio-tracking data. Ecology, 87, 2349--2355.
\item Calenge, C. and Basille, M. (2008) A general framework for the
  statistical exploration of the ecological niche. Journal of
  Theoretical Biology, 252, 674-685.
\item Calenge, C., Darmon, G., Basille, M., Loison, A. and Jullien, J.
  2008. The factorial decomposition of the Mahalanobis distances in
  habitat selection studies. Ecology, 89, 555-566.
\item Carpenter, G., Gillison, A. and Winter, J. 1993. DOMAIN: a
  flexible modelling procedure for mapping potential distributions of
  plants and animals. Biodiversity and Conservation, 2, 667-680.
\item Chase, J. and Leibold, M. 2003. Ecological niches. Linking class
  and contemporary approaches. The University of Chicago Press.
\item Chessel, D., Dufour, A. and Thioulouse, J. (2004) The ade4
  package. R news.
\item Clark, J., Dunn, J. and Smith, K. 1993. A multivariate model of
  female black bear habitat use for a geographic information system.
  Journal of Wildlife Management, 57, 519-526
\item Cleveland, W. 1993. Visualizing data. Hobart Press.
\item Darmon, G., Calenge, C., Loison, A., Jullien, J.M., Maillard,
  D. and Lopez, J.F. 2012. Spatial distribution and habitat selection
  in coexisting species of mountain ungulates. Ecography, 35, 44-53.
\item Doledec, S., Chessel, D. and Gimaret-Carpentier, C. 2000. Niche
  separation in community analysis: a new method. Ecology, 81,
  2914-2927.
\item Doledec, S. and Chessel, D. 1994. Co-inertia analysis: an
  alternative method for studying species-environment
  relationships. Freshwater Biology, 31, 277-294.
\item Dray, S., Chessel, D. and Thioulouse, J. 2003. Co-inertia
  analysis and the linking of ecological tables. Ecology, 84,
  3078-3089.
\item Gabriel, K. 1971. The biplot graphic display of matrices with
  application to principal component analysis. Biometrika, 58,
  453-467.
\item Hall, L., Krausman, P. and Morrison, M. 1997. The habitat
  concept and a plea for standard terminology. Wildlife Society
  Bulletin, 25, 173--182.
\item Hill, M. and Smith, A. 1976. Principal component analysis of
  taxonomic data with multi-state discrete characters. Taxon, 25,
  249-255.
\item Hirzel, A., Hausser, J., Chessel, D. and Perrin,
  N. 2002. Ecological-niche factor analysis: How to compute habitat
  suitability maps without absence data? Ecology, 83, 2027-2036.
\item Hutchinson, G. 1957. Concluding remarks. Cold Spring Harbour
  Symposium, Quantitative Biology, 22, 415--427.
\item Johnson, D. 1980. The comparison of usage and availability
  measurements for evaluating resource preference. Ecology, 61,
  65-71.
\item Manly, B., McDonald, L., Thomas, D., MacDonald, T. and Erickson,
  W. 2002. Resource selection by animals. Statistical design and
  analysis for field studies. Kluwer Academic Publisher.
\item Pebesma, E. and Bivand, R.S. 2005. Classes and Methods for
  Spatial data in R. R News, 5, 9--13.
\item Tenenhaus, M. and Young, F. (1985). An analysis and synthesis of
  multiple correspondence analysis, optimal scaling, dual scaling,
  homogeneity analysis and other methods for quantifying categorical
  multivariate data. Psychometrika, 5, 91--119.
\item Thomas, D. and Taylor, E. 1990. Study designs and tests for
  comparing resource use and availability. Journal of Wildlife
  Management, 54, 322--330.
\item White, G. and Garrott, R. 1990. Analysis of wildlife
  radio-tracking data Academic press.
\end{description}



\section{Appendix: the derivation of a new factor analysis by James Dunn}

Let $\mathbf{y}$ be a $P$-dimensional vector of random
variables, with successive realizations being the data vectors
corresponding to the z vectors of the manuscript. Let \textbf{a}
denote any $P$-dimensional vector of length one whose extensions
define an axis in $P$-space. The projection of $\mathbf{y}$ on the
axis defined by $\mathbf{a}$ is $x = \mathbf{a}^t
\mathbf{y}$ and immediately:

$$
\mbox{Var}(x_u) = \mathbf{a}^t \Sigma_u \mathbf{a}
$$

\noindent if $\mathbf{y}$ is any ``used'' pixel, and

$$
\mbox{Var}(x_a) = \mathbf{a}^t \Sigma_a \mathbf{a}
$$

\noindent if $\mathbf{y}$ is any ``available'' pixel.
Note that $\Sigma_u$ and $\Sigma_a$ are correlation matrices if the
elements of $\mathbf{y}$ are properly scaled.\\

\noindent The problem: Find $\mathbf{a}$ of length one such that:

$$
\phi = \mbox{var}(x_a)/\mbox{var}(x_u) = \mathbf{a}^t \Sigma_a \mathbf{a} /
\mathbf{a}^t \Sigma_u \mathbf{a}
$$

\noindent is maximized.\\

We arrive at the stationary values of $\phi$ by differentiating with
respect to $\mathbf{a}$ and equating the result to a vector of zeros:

$$
\frac{\partial \phi}{\partial \mathbf{a}} =
\frac{2 \Sigma_a \mathbf{a}}{\mathbf{a}^t\Sigma_u \mathbf{a}} -
\frac{(2 \mathbf{a}^t\Sigma_a \mathbf{a}) \Sigma_u
  \mathbf{a}}{(\mathbf{a}^t\Sigma_u \mathbf{a})^2}
$$

\noindent Equating this to \textbf{0} and performing the obvious cancellations
yields:

$$
\Sigma_a \mathbf{a} - \left (\frac{\mathbf{a}^t \Sigma_a
  \mathbf{a}}{\mathbf{a}^t \Sigma_u \mathbf{a}} \right ) \Sigma_u
\mathbf{a} = \mathbf{0}
$$

\noindent or

$$
(\Sigma_a - \phi \Sigma_u)\mathbf{a} = \mathbf{0}
$$

\noindent by recognizing the form of the criterion function, or

$$
(\Sigma_u^{-1} \Sigma_a  - \phi \mathbf{I}) \mathbf{a} = \mathbf{0}
$$

The latter equation identifies the required solution, \textbf{a}, as
an eigenvector (scaled to length one) corresponding to the largest
eigenvalue of $\Sigma_u^{-1}\Sigma_a$. Clearly smaller stationary
values of $\phi$ correspond to the smaller eigenvalues of the same
matrix and occur at coordinates given by their respective associated
eigenvectors.\\

The apparent computational difficulty associated with
finding eigenvalues and eigenvectors of non-symmetric
$\Sigma_u^{-1}\Sigma_a$ is alleviated by the factorization $\Sigma_a =
\mathbf{T}^t \mathbf{T}$, where \textbf{T} is upper triangular. In
terms of the characteristic root (eigenvalue) operator, ch(.), then\\

$$
\phi_1 = \mbox{Max } ch(\Sigma_u^{-1}\Sigma_a) = \mbox{max }
ch(\Sigma_u^{-1}\mathbf{T}^t\mathbf{T}) = \mbox{max } ch(\mathbf{T}
\Sigma_u^{-1} \mathbf{T}^t)
$$

\noindent where $\mathbf{T} \Sigma_u^{-1} \mathbf{T}^t$ is symmetric. Its
associated eigenvector $\mathbf{a}_1$ must satisfy:

\begin{eqnarray}
  (\Sigma_u^{-1}\Sigma_a - \phi_1 \mathbf{I})\mathbf{a}_1 & = &
  (\Sigma_u^{-1}\mathbf{T}^t\mathbf{T} - \phi_1 \mathbf{I})
  \mathbf{a}_1 = \mathbf{0} \\
  (\Sigma_u^{-1} \mathbf{T}^t - \phi_1 \mathbf{T}^{-1})
  \mathbf{T}\mathbf{a}_1 & = & \mathbf{0} \\
  (\mathbf{T}\Sigma_u^{-1} \mathbf{T}^t - \phi_1 \mathbf{I})
  \mathbf{T}\mathbf{a}_1 & = & (\mathbf{T}\Sigma_u^{-1} \mathbf{T}^t -
  \phi_1 \mathbf{I}) \mathbf{b}_1 = \mathbf{0}
\end{eqnarray}

where $\mathbf{b}_1 = \mathbf{Ta}_1$ is the eigenvector of the
symmetric matrix $\mathbf{T} \Sigma_u^{-1} \mathbf{T}^t$ associated
with its largest eigenvalue $\phi_1$. Clearly then $\mathbf{a}_1 =
\mathbf{T}^{-1} \mathbf{b}_1$ is the required solution to the problem
as posed. Additional eigenvalue-eigenvector pairs are similarly
found. If the $\mathbf{A} = (\mathbf{a}_1, ..., \mathbf{a}_P)$ and
$\mathbf{B} = (\mathbf{b}_1, ..., \mathbf{b}_P)$ represent columnwise
concatenations of the respective eigenvectors, then $\mathbf{B} =
\mathbf{TA}$, or $\mathbf{A} = \mathbf{T}^{-1}\mathbf{B}$ whose first
column defines the axis required to solve the original
problem. Additional orthogonal axes are defined by the remaining
columns of \textbf{A} (as in principal components analysis).\\

The foregoing development relates to Mahalanobis $D^2$ as follows:\\

We shall want $D^2$ to reflect dissimilarity between any pixel,
\textbf{y}, and the mean of the ``used'' pixels, $\mu_u$,
using $\Sigma_u$ as the metric. Our reasoning is that any
``available'' site has the potential of being a
``used'' site until proved otherwise by the magnitude of $D^2$.\\

\begin{eqnarray}
  D^2 & = & (\mathbf{y} - \mu_u)^t \Sigma_u^{-1} (\mathbf{y} -
  \mu_u)\\
  & = &   (\mathbf{y} - \mu_u)^t \mathbf{T}^{-1} \mathbf{T}
  \Sigma_u^{-1} \mathbf{T}^t \mathbf{T}^{-t} (\mathbf{y} -
  \mu_u) \mbox{   (where } \mathbf{T}^{-t} = (\mathbf{T}^t)^{-1})\\
  & = &   (\mathbf{y} - \mu_u)^t \mathbf{T}^{-1}
  \mathbf{BD}_\phi\mathbf{B}^t \mathbf{T}^{-t} (\mathbf{y} -
  \mu_u) \mbox{   (where } \mathbf{D}_\phi = diag(\phi_1, ...,
  \phi_P)\\
  & = &  (\mathbf{y} - \mu_u)^t \mathbf{AD}_\phi\mathbf{A}^t
  (\mathbf{y} - \mu_u)\\
  & = & \sum_{j=1}^P \phi_j ((\mathbf{y} - \mu_u)^t \mathbf{a}_j)^2\\
  & = & \sum_{j=1}^P \left ( \frac{(\mathbf{y} - \mu_u)^t
      \mathbf{a}_j}{1 / \sqrt{\phi_j}} \right )^2
\end{eqnarray}

Clearly any R-dimensional subset of these components also could be
used to reflect the species-suitability of a pixel, e.g,

$$
D_R^2 = \sum_{j=1}^R \left ( \frac{(\mathbf{y} - \mu_u)^t
    \mathbf{a}_j}{1 / \sqrt{\phi_j}}\right )^2
$$

\noindent assuming that the eigenvalues are ordered $\phi_j > \phi_{j+1}$.
Thus, $D^2$ is seen to partition into a weighted sum of squares of
projections of $\mathbf{y} - \mu_u$ on each successive derived axes.




\end{document}

% vim:syntax=tex