\documentclass{article}
\usepackage{url,Sweave,html,alltt}
%\VignetteIndexEntry{Overview and Examples}
\newcommand{\R}{{\normalfont\textsf{R}}{}}
\renewcommand{\vec}{\mathrm{vec}}
\newcommand{\vn}[1]{\mbox{{\em #1}}}
\newcommand{\betahat}{\hat{\beta}}
\newcommand{\diag}{\mathrm{diag}}
\newcommand{\Cov}{\mathrm{Var}}
\newcommand{\arc}{{\em Arc}}
\newcommand{\dcode}[1]{{\small{\tt #1}}}
\newcommand{\dr}{{\tt dr}}
\newcommand{\sir}{{\sffamily sir}}
\newcommand{\ire}{{\sffamily ire}}
\newcommand{\pire}{{\sffamily pire}}
\newcommand{\save}{{\sffamily save}}
\newcommand{\phd}{{\sffamily phd}}
\newcommand{\phdy}{{\sffamily phdy}}
\newcommand{\phdq}{{\sffamily phdq}}
\newcommand{\phdres}{{\sffamily phdres}}
\renewcommand{\span}{{\mathcal S}}
\newcommand{\E}{\mbox{E}}
\newcommand{\N}{\mbox{N}}
\newcommand{\var}{\mbox{Var}}

\newenvironment{references}{%
  \section{References}%
    \begin{list}%
            {}%
            {\setlength{\leftmargin}{.25in}\setlength{\itemindent}{-.25in}}}%
   {\end{list}}

\newcommand{\jasa} {{\it Journal of the American Statistical Association}}


\title{The dr package}
\author{Sanford Weisberg\\
{\small \em School of Statistics, University of Minnesota,
Minneapolis, MN 55455.}}
\date{\today}
\begin{document}
\SweaveOpts{concordance=TRUE}
\maketitle

\begin{abstract}
The \dr\ package for \R\ implements {\em dimension reduction
regression} and was first documented in Weisberg (2002).  This is a
revision of that article, corresponding to version 3.0.0 of \dr,
added to CRAN (cran.r-project.org) in Fall 2007.

Regression is the study of the dependence of a response variable $y$
on a collection of $p$ predictors collected in $x$. In {\em
dimension reduction regression}, we seek to find a few linearly
independent linear combinations $\beta_1'x,\ldots,\beta_d'x$, such
that all the information about the regression is contained in these
$d$ linear combinations.  If $d$ is very small, perhaps one or two,
then the regression problem can be summarized using simple graphics;
for example, for $d=1$, the plot of $y$ versus $\beta_1'x$ contains
all the regression information.  When $d=2$, a 3D plot contains all
the information.  Formal estimation methods can additionally be used
given the lower dimensional problem based on the few linear
combinations.

The primary goals of dimension reduction regression are determining
the dimension $d$, and estimating $\beta_1, \ldots, \beta_d$, or
more precisely the subspace of $\Re^p$ spanned by $\beta_1, \ldots,
\beta_d$.  Several methods for these goals have been suggested in
the literature and some of these are implemented in \dr. Methods
implemented in the package include sliced inverse regression or
\sir, sliced average variance estimates, or \save, principal Hessian
directions, or \phd, and inverse regression estimation, or \ire. The
first three of these methods estimate the span of the $\beta$s by
examining the inverse regression problem of $x|y$, rather than the
forward regression problem of $y|x$, and all estimate the $\beta$s
as the solution of an eigenvalue problem. The method \ire\ is new in
Version 3.0.0 and estimates a basis for the subspace of interest by
minimizing a quadratic objective function.

Determining $d$ is approached as a testing problem. {\em Marginal
dimension tests} are used for a hypothesis of the form $d=d_0$
versus an alternative $d>d_0$, without restriction on the subspaces
involved other than their dimension.  The first of these tests was
proposed by Li (1991) in his paper on \sir.  Added to Version 3.0.0
are tests of {\em coordinate hypotheses}, proposed by Cook (2004),
which effectively provides a test for elimination of predictors to
reduce dimension, although the method is more general than this.
These latter tests provide the basis for stepwise elimination of
predictors.  These tests are available in \dr\ for methods \sir,
\save, and \ire.

Finally, the methods \sir, \save\ and \ire\ are extended to allow
for inclusion of categorical predictors in $x$.
\end{abstract}

\section{Introduction}
In the general regression problem, we have a response $y$ of
dimension $k \ge 1$ and $p$-dimensional predictor $x$, and the goal
is to learn about how the conditional distributions $F(y|x)$ change
as $x$ varies through its sample space.  In parametric regression,
we specify a known functional form for these conditional
distributions, apart from a few parameters to be estimated. In
nonparametric regression no assumptions are made about $F$, but
progress is really only possible if the dimensions $p$ and $k$ are
small.

{\em Dimension reduction regression} is one intermediate possibility
between the parametric and nonparametric extremes.  We assume
without loss of information that the conditional distributions can
be indexed by $d$ linear combinations; that is, for some typically
unknown $p\times d$ matrix $B$,
\begin{equation}
F(y|x) = F(y|B'x) \label{eq:eq1}
\end{equation}
or in words, all the information in $x$ about $y$ is contained in
the $d$ linear combinations $B'x$. This representation always holds
trivially, by setting $B=I$, the $p\times p$ identity matrix, and so
the usual goal is to find the $B$ of lowest possible dimension for
which this representation holds.  If (\ref{eq:eq1}) holds for a
particular $B$, then it also holds for $B^*=BA$, where $A$ is any
full rank matrix, and hence the unique part of the regression
summary is the subspace that is spanned by the columns of $B$, which
we denote $\span(B)$.  The choice of $B$ that gives $\span(B)$ of
smallest dimension is called the {\em central subspace}. Cook (1998)
provides a more complete introduction to these ideas, including
discussion of when the central subspace exists and is unique.

The \dr\ package for \R\ provides methods for this approach to
regression problems.  The basic idea is to estimate both a dimension
$d$, and a set of basis vectors $\betahat_1,\ldots,\betahat_d$ that
are linearly independent and span $\span(B)$.

For readers familiar with earlier versions of \dr, here is a list of
the additions in Version 3.0.0.  If you are new to the package, you
may wish to skip this list.
\begin{description}
\item[New methods]  Inverse regression estimates (Cook and Ni, 2005;
Wen and Cook, 2007) are specified in a call to \dr\ with
\dcode{method="ire"}, as described in Section~\ref{sec:ire}. This
method differs from the other methods in \dr\ in that the
estimated directions minimize an objective function.  The other methods
compute a matrix $M$ and then estimate a subspace by the span of the
eigenvectors corresponding to the largest few eigenvalues of
$M$\footnote{It is possible to view the \sir\ method as minimizing
an objective function as well, although this is not the way it was
originally presented; see Cook (2004).}.

\item[Marginal dimension tests for \save]  Tests for the
dimension hypotheses in \save\ were presented in Shao, Cook and
Weisberg (2007), and are included in \dr.

\item[Coordinate hypotheses]
In previous versions of \dr, the only tests available were marginal
dimension tests about the dimension of the central (mean) subspace,
without actually specifying any characteristics of what that
subspace might be.  Cook (2004) suggested both marginal and
conditional tests, in which specific hypotheses can be tested. For
example, one could test to see if the central (mean) subspace is
orthogonal to a particular predictor. Cook described both marginal
and conditional coordinate tests.  The marginal coordinate tests are
available for \sir, \save\ and \ire, with and without categorical
predictors.  Conditional coordinate tests are available for \sir\
without categorical predictors, and for \ire\ with and without
categorical predictors.  Conditional coordinate tests for other
methods may be added to \dr\ in a later release.

Based on the marginal coordinate tests, we could imagine a stepwise
deletion of potential predictors from a pool of possible predictors.
This is implemented for \sir, \save\ and \ire\ using the standard R
function \dcode{drop1}. Since \dcode{step} in \R\ is not a generic
function, a function called \dcode{dr.step} is included here.

\item[Categorical predictors] Chiaromonte, Cook and Li (2002)
suggested how dimension reduction methods could be extended to
problems with categorical predictors.  This methodology is available
in \dr\ using the \dcode{group} argument.  For example,
\verb+group = ~Sex+
uses a one-sided formula to specify that the levels of the
variable \dcode{Sex} provide a grouping variable. This argument is
recognized for \sir\ (Chiaromonte, Cook and Li, 2002), \save\ (Shao,
Cook and Weisberg, 2009), and \ire\ (Wen and Cook, 2007).
\end{description}

In addition to these major additions, several minor changes have
been made as well.  Many of the methods require approximating
$p$-values for statistics distributed as a linear combination of
$\chi^2$ random variables. The user may select either the
Wood (1989) or the Bentler and Xie (2000) approximation to compute
the significance levels of many of these tests.

The standard \dcode{update} method in \R\ can be used with \dr.
Standard short-cuts for formulas can also be used.  For example, the
function call \dcode{dr(ais)} will automatically select the first column
of the data frame \dcode{ais} as the response and the remaining
columns as predictors, using defaults for all other arguments.
Similarly, \verb+dr(LBM~.,ais)+ will use \vn{LBM} as the response
and all other columns as predictors.

Several of the methods require approximating the response $Y$ by a
discrete version obtained by slicing its range into nonoverlapping
slices.  The function \dcode{dr.slices} is used for this purpose, and
has been improved to give better handling of ties. To correspond to
the program \arc, the old slicing function \dcode{dr.slices.arc} is
also available.


\section{Generalities and notation}
All the methods require a few underlying assumptions.  The following
three are most common (Cook, 2004, Section 3):
\begin{description}
\item[Linearity condition]  This assumption concerns only the
marginal distribution of the predictors $x$.   If we write
$P_{\span(B)}$ as the projection on the central subspace $\span(B)$,
then we must have $\E(x|P_{\span(B)}x) = P_{\span(B)}x$.  A stronger
condition that could be checked in practice is that $\E(a'x|b'x)$ is
linear in $b'x$ for all vectors $a$ and $b$.  If $x$ is normal the
condition is satisfied, but normality is much stronger than needed.
\item[Constant covariance condition]  Some methods also require
a condition on the second moments of the predictors, namely that the
matrix $\var(x|P_{\span(B)}x)$ is nonrandom.  This
condition is also satisfied by $x$ normal, and is satisfied
approximately if $x$ has an elliptically contoured distribution.
\item[Coverage condition] This condition generally requires
    that a method can recover all of the central subspace,
    not just part of it.  Not all methods can satisfy this
    condition, however.  Ordinary least squares regression
    is a dimension reduction method if the linearity
    condition is satisfied, but it can only recover at most
    one basis vector in $\span(B)$.  Sliced inverse
    regression can recover only directions associated with
    the mean $\E(y|x)$ but not of higher moments.  In
    addition, if the number of slices for \sir\ is less
    than $d$, all directions cannot be recovered.
\end{description}

Suppose we have data $(x_i,y_i)$, for $i=1,\ldots,n$, that are
independent and collected into an $n \times p$ matrix $X$ and an
$n$-vector $Y$ if $k=1$ or an $n \times k$ matrix $Y$ if $k>1$.  A
multivariate response is not implemented for \phd\ or \ire. In
addition, suppose we have nonnegative weights $w_1,\ldots,w_n$ whose
sum is $n$; if unspecified, we take all the $w_i=1$.  The weights
are used to improve the elliptical symmetry of the (weighted)
distribution of the $x_i$ and thereby satisfy the linearity
condition at least approximately; see Section~\ref{sec:weights}.

All the methods described here are invariant under location and
scale change, and all start by replacing $X$ by a matrix $Z$ such
that the columns of $Z$ have mean zero and identity covariance
matrix.  One way to do the centering and scaling uses the QR
decomposition.  Define $\bar{x} = \sum w_ix_i/n$ to be the vector of
weighted column means, and let $W = \mathrm{diag}(w_i)$. We use the $QR$
decomposition to find a matrix $Q$ with orthonormal columns and an
upper triangular matrix $R$ such that
\begin{equation}
QR = W^{1/2}(X - 1\bar{x}') \mbox{ and } Z= \sqrt{n}Q \label{eq:z}
\end{equation}
Then the matrix $Z$ has column means zero and sample identity
covariance matrix.  If $\hat{B}$ has columns that estimate a basis
for the central subspace for the regression of $y|z$, then
$R^{-1}\hat{B}$ provides a basis for the central subspace for the
regression of $y|x$.
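
As a check on the notation, here is a minimal sketch of the centering
and scaling in (\ref{eq:z}) using base \R, assuming unit weights and
an arbitrary numeric $n\times p$ predictor matrix \dcode{X}; the
package performs this step internally.
\begin{verbatim}
## sketch of the centering and scaling above; X is any numeric matrix
n <- nrow(X); w <- rep(1, n)
xbar <- colSums(w * X)/n                  # weighted column means
Xc <- sqrt(w) * sweep(X, 2, xbar)         # W^{1/2} (X - 1 xbar')
Z <- sqrt(n) * qr.Q(qr(Xc))               # Z = sqrt(n) Q
round(colMeans(Z), 6)                     # zero column means
round(var(Z), 6)                          # approximately the identity
\end{verbatim}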

With the exception of the \dcode{ire} method described in
Section~\ref{sec:ire}, the methods implemented in \dr\ have the
following general pattern.
\begin{enumerate}
\item
The methods \sir, \save\ and \phd\ are based on study of the inverse
regression problem of $x|y$ or $z|y$ rather than the forward
regression problem of $y|x$ or $y|z$.  All three methods require the
linearity condition, and possibly the constant covariance condition.
They use the scaled and centered data $Z$ to find a matrix $\hat{M}$
that is a consistent estimate of a population matrix $M$ with the
property that $\span_z(M) \subseteq \span_z(B)$. For \sir, we
estimate only vectors in the {\em central mean subspace}, Cook and
Li (2002), which guarantees only that $\E(y|B'z)=\E(y|z)$, ignoring
any information beyond the first moment; \save\ and \phd\ are more
general.  If the coverage condition is assumed, then all of
$\span_z(B)$ is recovered.

Assuming the dimension of $\span_z(B)$ is $d$, a basis for
$\span(B)$ in $X$-coordinates is $R^{-1}\hat{B}$, which can be used
to form an orthonormal basis for this space in the original
$p$-dimensional coordinate system of the predictors.

\item
The tests concerning the dimension $d$ are called {\em marginal
dimension tests} and are available for \sir, \save, \ire\ and one
version of \phd; details are given in the following sections or the
references therein.  It is usual to do testing in a sequential
fashion, first testing $d=0$ versus $d>0$, then testing $d\leq 1$
versus $d>1$, and continue until a nonsignificant result is
obtained.

Also available are {\em coordinate tests}, in which we test to see
if the central subspace is orthogonal to a prespecified subspace.
These tests can be done marginally, with no assumptions about the
dimension $d$, or for some methods conditionally, assuming $d$ is
known. This is particularly useful for deciding if a predictor,
which represents a one-dimensional subspace, can be removed from a
model. This allows the use of stepwise deletion of variables, as
will be discussed in Section~\ref{sec:mct}.

\item
If the matrix $\hat{B}$ has columns with basis vectors for the
estimated central subspace, then $(X - 1\bar{x}')\hat{B}$ gives the
$d$ linear combinations of the variables in the $n$-dimensional
observation space; these are computed using \dcode{dr.directions},
and can be used in graphical procedures or in other fitting methods.
\end{enumerate}

\section{The {\tt dr} function}
The main function in the \dr\ package is also called \dr. Here are
the arguments, with their defaults:
\begin{verbatim}
dr(formula, data, subset, group=NULL, na.action = na.fail,
    weights, method = "sir", chi2approx = "bx", ...)
\end{verbatim}
The arguments \dcode{formula}, \dcode{data}, \dcode{subset},
\dcode{na.action}, and \dcode{weights} are the same as the arguments
of the same name in the standard \R\ function \dcode{lm}.  The
other arguments are unique to \dr.

The required \dcode{formula} is a two-sided formula with the
response on the left side, or a matrix if the response is
multivariate, and a set of predictors on the right side. Although
any formula that is valid for \dcode{lm} can be used, the predictors
are expected to all be continuous without any nesting or crossing. A
typical formula might be \verb|y~x1+x2+x3| for a univariate
response, or \verb|cbind(y1,y2)~x1+x2+x3| for a bivariate response.

The argument \dcode{data} specifies an optional data frame for the
variables.  The \dcode{subset} argument specifies which rows of the
data are to be used in the analysis, as in \dcode{lm}.
\dcode{na.action} is the missing value action.  The default \dcode{na.fail} stops with an error if any missing values are present, so no computing is done.  Setting \dcode{na.action=na.omit} will omit any rows with missing data and then proceed.

The \dcode{group} argument, if used, requires a one-sided formula
that evaluates to a list of a small number of levels.  Dimension
reduction is done for each combination of the grouping variables.
For example, \verb+group=~Sex+ would fit separately for each
\dcode{Sex}, and \verb+group=~Sex:AgeClass+ would fit separately for
each combination of \dcode{Sex} and \dcode{AgeClass}.  For this
latter case, both \dcode{Sex} and \dcode{AgeClass} {\em must} be
declared as factors.

The methods generally require linearly related predictors, and this
can often be achieved by transformation of predictors (see Cook and
Weisberg, 1999, and the function \dcode{bctran} in the \R\ package
\verb+alr3+), or by weighting the observations.  Weights can be
computed using \dcode{dr.weights}, described in
Section~\ref{sec:weights}.  The argument \dcode{weights} would
usually either be \dcode{NULL} or the result of calling
\dcode{dr.weights}.

The \dcode{method} argument is used to select the particular method
of computing dimension reduction. The default is \dcode{"sir"}, but
see Table~\ref{tab:1} for other choices.  Each of these methods is
described in later sections.

\begin{table}
\caption{Methods implemented in \dr.  ``Multivariate'' is yes if the
response can be multivariate.  ``Partial'' is yes if conditioning on
categorical predictors is allowed using the \dcode{group} argument.
``Coord'' is yes if coordinate tests are available.}\label{tab:1}
\begin{tabular}{|lp{2.0in}|ccc|} \hline
Name & Reference & Multivariate & Partial & Coord \\ \hline
\sir\ & Li (1991), Chiaromonte, Cook and Li (2002) & Yes & Yes & Yes \\
\save\ & Cook and Weisberg (1991); Shao, Cook and Weisberg (2007)
& Yes & Yes & Yes \\
\phdy\ & Li (1992), Cook (1998) & No & No & No \\
\phdres\ & Cook (1998) & No & No & No \\
\phdq\ & Li (1992) & No & No & No \\
\ire\ & Cook and Ni (2005), Wen and Cook (2007) & No & Yes & Yes \\ \hline
\end{tabular}
\end{table}

The ``\ldots'' allows for additional arguments to be passed to
particular dimension reduction methods. For example, \sir\ is a
slicing method and so an argument {\tt nslices} may be used. Another
useful argument is \dcode{numdir}, which sets a maximum value for
$d$.  The default is \dcode{numdir=4}.  These additional arguments
are described in later sections where the methods are described.

The argument \dcode{slice.info} used in earlier versions of \dr\ has
been removed. A new argument \dcode{slice.function} has been added.
If \dcode{slice.function=dr.slices.arc}, then the algorithm used by
Arc for computing slices is used.  Otherwise, a newer algorithm
given in the function \dcode{dr.slices} is used.

An additional new argument is \dcode{chi2approx = c("bx","wood")}.
The default value of \dcode{"bx"} will use the Bentler and Xie
(2000) approximation to a linear combination of Chi-square random
variables when needed for computing significance levels.  If
\dcode{chi2approx="wood"}, then the Wood (1989) method, which was
previously the default in \dr, is used.

\section{SIR}\label{sec:sir}
\subsection{Basics}

Sliced inverse regression or \sir\ was proposed by Li (1991). We
assume that the data consist of $n$ rows $(y_i,x_i)$ for
$i=1,\ldots,n$.  We write $z_i$ for the $i$-th row of the centered
and scaled version of $x_i$, from (\ref{eq:z}).  We suppose that the
response variable $y$ has $h$ distinct levels or values; if it does
not have $h$ levels but $y$ is univariate, then the range of $y$ is
sliced into $h$ non-overlapping {\em slices} so that the number of
observations in each slice is approximately equal.  If $y$ is
multivariate, $h$ slices are formed in a somewhat more complex way;
see Section~\ref{sec:slice}.  The only difference between univariate
and multivariate \sir\ is the way that slices are formed.

Suppose that $\bar{z}_j$, for $j=1,\ldots,h$, is the mean of the
$z_i$ for $i$ in the $j$th slice. The kernel matrix for \sir\ is
\begin{equation}
\hat{M} = \frac{1}{n} \sum g_j\bar{z}_j\bar{z}_j'
\label{eq:sir}
\end{equation}
where $g_j$ is the sum of the weights in slice $j$, and the weights
are always scaled so that $\sum w_i = n$.  If the dimension of the
central (mean) subspace is $d$, then the estimated central mean
subspace is the span of the $d$ eigenvectors corresponding to the
$d$ largest eigenvalues of $\hat{M}$, translated back to
$x$-coordinates.
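
The kernel matrix (\ref{eq:sir}) is easy to compute directly.  The
following sketch assumes unit weights, the scaled matrix \dcode{Z}
from (\ref{eq:z}) and a univariate response \dcode{y}; it uses a crude
equal-count slicer, whereas \dr\ handles ties and weights more
carefully.
\begin{verbatim}
## sketch of the SIR kernel matrix with unit weights
h <- 8
slice <- cut(rank(y, ties.method = "first"), h, labels = FALSE)
M <- matrix(0, ncol(Z), ncol(Z))
for (j in 1:h) {
  nj <- sum(slice == j)                        # g_j with unit weights
  zbar <- colMeans(Z[slice == j, , drop = FALSE])
  M <- M + nj * tcrossprod(zbar)               # g_j zbar_j zbar_j'
}
M <- M/nrow(Z)
eigen(M)$vectors[, 1:2]    # leading eigenvectors span the estimate in z-scale
\end{verbatim}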

The default computational method using the \dr\ function is \sir.
Here is an example of its use.
<<a1,echo=FALSE>>=
library(dr)
opt <- options(width=66)
@
<<a11,echo=TRUE>>=
summary(s0 <- dr(LBM~log(SSF)+log(Wt)+log(Hg)+log(Ht)+log(WCC)+log(RCC)+
  log(Hc)+log(Ferr),data=ais,slice.function=dr.slices.arc,nslices=8,
  chi2approx="wood",numdir=4,method="sir"))
@
This output is not identical to the output for \sir\ obtained from
previous versions of \dr\ because the defaults for forming slices
have changed, and the labels of sections of the output have changed
as well. We have set the arguments
\dcode{slice.function=dr.slices.arc}, \dcode{nslices=8} and
\dcode{chi2approx="wood"} to get results that agree with Table~5 in
Cook (2004), apart from multiplication of some basis vectors by
$-1$.

The \dcode{summary} output is similar for all \dr\ methods. Basic
information, like the fitting method, the formula, and the slice
information, is printed.  The vectors collected into a matrix
$\hat{B}$ in the section called \dcode{Estimated Basis Vectors for
Central Subspace} are computed by taking the eigenvectors in the
$z$-scale, backtransforming to the original $x$-scale as previously
discussed, and then normalizing so each column has length one. These
vectors are an estimated basis for $\span(B)$, and they are
orthogonal relative to the inner product determined by
$\widehat{\Cov(X)}$, so $\hat{B}'\widehat{\Cov(X)}\hat{B}$ is a
diagonal matrix.
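
These claims are easy to check numerically with the accessor
functions described in Section~\ref{sec:access}; the sketch below
assumes that \dcode{dr.basis} returns the same normalized vectors
that are printed in the summary.
\begin{verbatim}
B <- dr.basis(s0)                       # estimated basis, x-coordinates
round(t(B) %*% var(dr.x(s0)) %*% B, 4)  # off-diagonal entries near zero
round(colSums(B^2), 4)                  # columns have length one
\end{verbatim}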


Next, the corresponding eigenvalues are given, ordered largest to
smallest.  The line marked \verb+R^2*(OLS|dr)+ is the square of the
correlation between the linear combination obtained by the ols
regression of the response on the predictors and the subspace
spanned by the first direction, the first two directions, and so on.
In this example, the correlation with \dcode{Dir1} is almost 1, so
this direction is almost the same as ols.  Under the assumption of
linearly related predictors, ols estimates a direction in the
central subspace, so the large correlation is to be generally
expected.

For \sir, the marginal dimension tests have asymptotic $\chi^2$
distributions, as suggested by Li (1991), who assumed that the
marginal distribution of $X$ was normal.  Bura and Cook (2001) show
that the same asymptotic distribution holds under the linearity and
constant covariance conditions. The tests are customarily viewed
sequentially, first testing $d=0$ versus $d>0$, then $d=1$ versus
$d>1$, and continue until we get a nonsignificant result. For the
example, we would tentatively conclude $d=2$ from these tests.

\subsection{Coordinate hypothesis tests}\label{sec:mct}
\subsubsection{Marginal coordinate tests}\label{sec:mctm} These tests
were proposed by Cook (2004). Suppose that $\span(B)$ represents the
central subspace with basis given by the columns of $B$. Coordinate
tests allow testing hypotheses that the space $\span(B)$ is
orthogonal to one (or more linear combinations) of the predictors;
were this so, we could achieve dimension reduction simply by
dropping these predictors.

Define a {\em hypothesis matrix} $H$, and under the null hypothesis
we have that all vectors in $\span(B)$ can be represented as linear
combinations of the columns of $H$, or if $P_H$ is the projection on
$H$, the null and alternative hypotheses are $(I-P_H)B=0$ versus
$(I-P_H)B \neq 0$. The method \dcode{dr.coordinate.test} will
compute the appropriate test.  Cook (2004) proposed two approaches
to testing this hypothesis, the {\em marginal} approach in which we
assume nothing about the dimension $d$, and the {\em conditional}
approach in which we assume that $d$ is known.

To do a test the user must specify $H$, but \dr\ includes automatic
tools to make this easier for the most common coordinate hypotheses.
For example,
<<a2,echo=TRUE>>=
dr.coordinate.test(s0,hypothesis=~.-log(RCC))
@
The null hypothesis is specified by the formula given in the
argument \dcode{hypothesis}.  In this case, the null hypothesis
is that $\span(B)$ can be expressed as a linear combination of
all the predictors (determined by the ``\verb+~.+'') except
for $\log(\vn{RCC})$.  The function recognizes that
\dcode{hypothesis} is a formula and calls the function
\dcode{coord.hyp.basis} to generate the $H$ matrix.
Alternatively, the user can provide a matrix with $p$ rows and
$p^*$ columns, where $p$ is the number of predictors, and
$p^*<p$ is the dimension of the hypothesis space. See the
documentation for \dcode{coord.hyp.basis} for details. Cook
provided several ways of estimating the $p$-value, depending on
the assumptions made.  We present the most general
approximation that requires the fewest assumptions.
%Adding the argument
%\dcode{pval="restricted"} will use more assumptions to get the
%$p$-value.
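
Equivalently, $H$ can be given directly as a matrix.  In the sketch
below the matrix is built from columns of the identity, dropping the
column for $\log(\vn{RCC})$, which is the sixth term in the formula
used to fit \dcode{s0}; this should reproduce the marginal test
above.
\begin{verbatim}
H <- diag(8)[, -6]            # 8 predictors; drop the log(RCC) column
dr.coordinate.test(s0, hypothesis = H)
\end{verbatim}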

\subsubsection{Conditional coordinate tests}\label{sec:mctc}
These tests are of the same hypothesis as in Section~\ref{sec:mctm},
except we condition on a known value $d$ for the overall dimension.
The same function is used for the computation, except that the
argument \dcode{d} is set to the hypothesized dimension.
<<a2,echo=TRUE>>=
dr.coordinate.test(s0,hypothesis=~.-log(RCC),d=2)
@
We see that the outcome of the test can be quite different when we
condition on the dimension.  Conditional coordinate tests are
available only for \sir, \ire, and \ire\ with categorical
predictors.

\subsection{Backward elimination}\label{sec:be}
The function \dcode{dr.coordinate.test} is used in the standard \R\
function \dcode{drop1} to examine dropping each predictor in turn.
<<a2,echo=TRUE>>=
m0 <- drop1(s0,update=TRUE)
@
The result of using this function is another \dr\ object obtained by
dropping the predictor with the largest $p$-value.    This is the
basis for stepwise fitting.  If \dcode{update=FALSE}, then refitting
\dr\ is skipped.

The function \dcode{dr.step(object,scope,d,stop)} calls
\dcode{drop1} repeatedly, each time dropping the variable with the
largest $p$-value according to the marginal coordinate test, or the
conditional coordinate test (for \ire\ with or without categorical
predictors, or for \sir\ without categorical predictors) if a
non-null value of \dcode{d} is
specified. The documentation for these functions gives more
details\footnote{The standard function \dcode{step} in \R\ is not a
generic function and so it can't be used with \dr.}. The argument
\dcode{scope} is a one-sided formula listing predictors that are
never deleted.  The argument \dcode{d} is the dimension of the
central subspace for \ire\ with or without categorical predictors,
or for \sir\ without categorical variables. The algorithm continues
until the number of variables remaining is equal to the maximum
dimension set by the argument \dcode{numdir} used in the call to
\dr, which is 4 by default, or until the $p$-value for the next
variable to be removed is less than \dcode{stop}.  The default for
\dcode{stop} is 0.  As an example, the output below will delete
predictors one at a time as long as the largest $p$-value exceeds
0.20.  The variable \dcode{log(Wt)} will never be deleted.
<<echo=TRUE>>=
s1a <- dr.step(s0,scope=~log(Wt),stop=0.20)
@

\subsection{Categorical predictors and partial SIR}\label{sec:catpred}
Suppose we have a categorical predictor or grouping variable $G$
with $g$ levels.  Chiaromonte, Cook and Li (2002) described how such
a categorical predictor could be included in a dimension reduction
regression problem.  The basic idea is to divide the problem into
$g$ regression problems, and to define the central subspace to be the
union of the subspaces for the regression problems $(y|x,G=1),
(y|x,G=2),\ldots,(y|x,G=g)$.  Chiaromonte {\em et al.} called this
{\em partial sir}.

In the context of \sir, we can estimate a matrix $\hat{M}_g$ of the
form (\ref{eq:sir}) separately for each of the $g$ levels of $G$, combine
them, and then estimate the central subspace from the eigenvectors
corresponding to the largest eigenvalues of the combined $\hat{M}$. Using
the ideas in Cook (2004), Shao, Cook and Weisberg (2009) provide marginal and
conditional coordinate hypothesis tests for partial \sir.

Partial \sir\ is fit in \dr\ by adding an argument
\dcode{group} to the call.  The value of \dcode{group} should be a
one-sided formula. For example, if \dcode{Sex} was a factor with two
levels and \dcode{AgeClass} a factor with three levels, then
\verb+group=~Sex+ would condition on \dcode{Sex}, and
\verb+~Sex:AgeClass+ would condition on all possible non-empty
combinations of \dcode{Sex} and \dcode{AgeClass}.  An additional
argument \dcode{pool} is also available on the call to \dcode{dr}.
If \dcode{pool=TRUE}, then a pooled estimate of the variance of $X$
is used in each of the $g$ groups to compute the standardized
predictors.  The default is \dcode{pool=FALSE}, in which case the
variance is estimated separately in each group.  Chiaromonte {\em et
al.} used the former choice to get the marginal dimension tests,
while Shao {\em et al.} use the latter choice to get the coordinate hypothesis
tests.
<<echo=TRUE>>=
summary(s1 <- update(s0, group=~Sex))
@

\subsection{Accessor functions}\label{sec:access}
Several accessor functions can be applied to \dr\
objects\footnote{If you type a command like \dcode{m1 <-
dr(arguments)}, then \dcode{m1} is an object.} to get additional
information.  These functions apply for \sir, \save, \phd, or \ire.
\begin{verbatim}
dr.x(object)
dr.y(object)
dr.z(object)
dr.wts(object)
\end{verbatim}
These functions return, respectively, the $X$ matrix, the response
$Y$, the scaled $Z$ matrix (but not for partial \sir\ or partial
\save), and the weights.
\begin{verbatim}
dr.test(object,numdir)
\end{verbatim}
Returns the marginal dimension tests for all methods for which these
are available.
\begin{verbatim}
dr.coordinate.test(object,hypothesis,d)
\end{verbatim}
Returns the coordinate test for the hypothesis given by the
one-sided formula \dcode{hypothesis}.  If $d$ is set and the method
is either \sir, \ire, or \ire\ with categorical predictors, the test
is conditional on the dimension of the central subspace; otherwise,
the test is marginal.
\begin{verbatim}
drop1(object,d=NULL,update=FALSE)
\end{verbatim}
Calls \dcode{dr.coordinate.test} repeatedly to fit models that drop
each predictor in turn.
\begin{verbatim}
dr.basis(object,numdir=4)
dr.evalues(object)
\end{verbatim}
This function returns the first \dcode{numdir} basis vectors for the
estimated central subspace (orthogonalized using the QR
decomposition if \dcode{orth=TRUE}).  The function
\dcode{dr.evalues} returns the eigenvalues for \sir, \save, and
\phd.
\begin{verbatim}
dr.directions(object, which=c(1,2,3,4), x=dr.x(object))
\end{verbatim}
Let $\hat{B}$ be the columns of the estimated basis for the central
subspace selected by the list \dcode{which}, and $X_1$ be the matrix
specified by the argument \dcode{x}, centered to have mean zero in
each column.  Then this function returns $X_1\hat{B}$.  If \dcode{x}
has its default value, then $\hat{B}'X_1'X_1\hat{B}$ is proportional
to $\hat{B}'\widehat{\Cov(X)}\hat{B}$ and hence diagonal, so the
columns of $X_1\hat{B}$ are mutually orthogonal.

These values
are very useful in plotting.
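For example, if $d=1$ were appropriate, the summary plot of the
response against the first estimated direction could be drawn with
\begin{verbatim}
dirs <- dr.directions(s0)
plot(dirs[, 1], dr.y(s0), xlab = "Dir1", ylab = "LBM")
\end{verbatim}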
\begin{verbatim}
plot(object,which=1:object$numdir,mark.by.y=FALSE,
     plot.method=pairs,...)
\end{verbatim}
This is the plot method for \dr, which extracts by default the first
\dcode{numdir} direction vectors and the response, and plots them
using the \dcode{plot.method}, which by default is \dcode{pairs}.  Additional arguments are passed to the plotting method.


\section{SAVE}\label{sec:save}
The method \save, Cook and Weisberg (1991), is more comprehensive
than \sir\ as it can estimate vectors that depend on both mean and
variance changes between slices.

As with \sir, we slice the range of $y$ into $h$ slices, but rather
than compute the within-slice means we compute within-slice
covariance matrices.  If $C_j$ is the weighted within-slice sample
covariance matrix in slice $j$ in $z$-scale, then the matrix
$\hat{M}$ is given by
\[
\hat{M} = \frac{1}{n}\sum g_j(I-C_j)^2
\]
where $g_j$ is the sum of the weights in slice $j$; if all weights
are equal, then the $g_j$ are just the number of observations in
each slice. \save\ looks at second-moment information and may miss
first-moment information; in particular, it may miss linear trends.
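A sketch of this kernel matrix, with unit weights and using the same
\dcode{Z} and slice labels as in the \sir\ sketch of
Section~\ref{sec:sir}, is
\begin{verbatim}
p <- ncol(Z)
M <- matrix(0, p, p)
for (j in 1:h) {
  Zj <- Z[slice == j, , drop = FALSE]
  Cj <- var(Zj) * (nrow(Zj) - 1)/nrow(Zj)   # within-slice covariance
  M <- M + nrow(Zj) * (diag(p) - Cj) %*% (diag(p) - Cj)
}
M <- M/nrow(Z)
\end{verbatim}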
<<a2,echo=T>>=
s2 <- update(s0,method="save")
summary(s2)
@
As demonstrated here, the \dcode{update} function can be used with
\dr\ objects. Shao, Cook and Weisberg (2007) have provided marginal
dimension tests based on \save. Two $p$-values are available: if we
assume predictors are normally distributed, then the test is
asymptotically Chi-squared with df given in the column
\dcode{df(Nor)}, and $p$-value in the column \dcode{p.val(Nor)}. The
column \dcode{p.val(Gen)} is based on the weaker assumption that the
predictors are linearly related but not necessarily normal.

As with \sir, marginal coordinate tests can be computed with the
function \dcode{dr.coordinate.test}. The functions \dcode{drop1} and
\dcode{dr.step}, described in Section~\ref{sec:be}, can be used as
well.
<<save>>=
drop1(s2,update=FALSE)
@
As with \sir, categorical predictors can be used with \save; see
Section~\ref{sec:catpred}.
<<echo=TRUE>>=
summary(s3 <- update(s2,group=~Sex))
@

\section{Principal Hessian direction}\label{sec:phd}
Principal Hessian direction methods were originally proposed by Li
(1992).  These methods are much more complicated to use and to
interpret responses than the other methods in the \dr\ package. The
\phd\ methods are unchanged from earlier versions of \dr.

The \phd\ methods are not based on slicing, but rather on the sample
Hessian matrix.
\begin{equation}
\hat{M} = \frac{1}{n}\sum_{i=1}^n w_i f_i z_iz_i' \label{eq2}
\end{equation}
In this equation the $z_i$ are from (\ref{eq:z}) and the $w_i$ are the
weights.  Different choices of the $f_i$ correspond to variations of
the \phd\ method.  The central subspace in $z$-scale is estimated by
the eigenvectors corresponding to the larger eigenvalues of
$\hat{M}$ in (\ref{eq2}).  Tests are based on sums of the smaller
eigenvalues of $\hat{M}$.

\subsection{Response based pHd}
Setting $f_i = y_i$ gives response-based pHd, obtained in \dr\ using
\dcode{method="phdy"}.  This is the method proposed by Li, and is
the most straightforward to use.  However, Cook (1998, Chapter 12)
shows that tests based on this choice for $f$ proposed by Li are not
correct, and no tests for dimension are available using this method.

\subsection{Residual based pHd}
Cook (1998, Chapter 12) suggests setting $f_i$ to be the residuals
from the ols regression of $y$ on $x$.  This method is called
\dcode{method="phdres"} in \dr.  Cook shows that tests of dimension
based on this choice are correct, but the hypotheses concern a
subspace other than the central subspace for $y|x$; see Cook (1998,
Chapter 12) for details.
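
A sketch of the kernel matrix (\ref{eq2}) with this residual-based
choice of $f_i$, assuming unit weights, the scaled matrix \dcode{Z}
and a univariate response \dcode{y}, is
\begin{verbatim}
f <- residuals(lm(y ~ Z))            # f_i: ols residuals
M <- crossprod(Z * f, Z)/nrow(Z)     # (1/n) sum f_i z_i z_i'
eigen(M)$values                      # tests use the smaller eigenvalues
\end{verbatim}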

Output for \phd\ is again similar to \sir, except for the tests.  Here is
the output for the same setup as before, but for method \phdres:
<<echo=TRUE>>=
summary(s2 <- update(s0,method="phdres"))
@
The eigenvectors and eigenvalues are as for \sir\ and \save. The
test statistics are based on the eigenvalues.  The tests in the
column labeled ``Normal theory'' were proposed by Li (1992) and
require that the predictors are normally distributed. These statistics are
asymptotically distributed as Chi-square, with the degrees of
freedom shown.

When the method is \phdres, additional tests are provided. Since
this method is based on residuals, it gives tests concerning the
central subspace for the regression of the residuals on $X$ rather
than the response on $X$.  The subspace for this residual regression
may be, but need not be, smaller than the subspace for the original
regression. For example, the column marked ``Indep. test'' is
essentially a test of $d=0$ versus $d>0$ described by Cook (1998)
for the residual regression.  Should the significance level for this
test be large, we might conclude that the residual regression
subspace is of dimension zero.  From this we have two possible
conclusions: (1) the dimension of the response regression may be 1
if using the residuals removed a linear trend, or (2) the dimension
may be 0 if the residuals did not remove a linear trend.

Similarly, if the significance level for the independence test is small, then
we can conclude that the dimension is at least 1.  It could be one if the
method is picking up a nonlinear trend in the OLS direction, but it will be
2 if the nonlinearity is in some other direction.

The independence test and the final column, also from Cook (1998),
use the same test statistic, but different distributions based on
different assumptions. Significance levels are obtained by comparing
the statistic to the distribution of a random linear combination of
Chi-square statistics, each with one df.  These statistics do not
require normality of the predictors.

\section{Inverse regression estimation}\label{sec:ire}
This discussion of inverse regression estimation is more or less
self-contained, and uses somewhat different notation from earlier
sections.
\subsection{Estimation}
All methods fit by \dr\ are {\em location and scale invariant},
and so for the purpose of testing, we can use any linear
transformation of the predictors we like.  In particular if $X$ is
the $n\times p$ matrix of predictors, and $1_n$ is an $n$-vector
of ones, we can compute the QR factorization,
\[
(1_n, X) = (Q_0,Q)\left(\begin{array}{cc}
R_0&R_{01}\\0&R\end{array}\right)\label{eq:qr}
\]
and replace $X$ by $\sqrt{n}Q$ in all computations concerning
objective functions and tests.  For quantities that are coordinate
dependent, like coordinate tests, and subspace bases,
back-transformation to $X$-scale is done as indicated below.  All
formulas given in this article are in the simpler $QR$ scale.

The data we use consist of the matrix $\sqrt{n}Q$, with rows $q_i'$,
and the response vector with elements $Y_i$; we have not implemented
a multivariate response version. We assume that $Y$ is discrete with
$h$ distinct values $1,\ldots,h$; if $Y$ is continuous, we
discretize it into these $h$ distinct values by slicing and number
these from 1 to $h$; see Cook and Ni (2005, Section 2.1). \dr\
provides two functions for slicing: \dcode{dr.slices.arc} uses the
same algorithm that is used in Arc, while the default
\dcode{dr.slices} uses an algorithm that is better at handling ties.
The former should be used for comparison to Arc, but the latter is
recommended in general.

If $\delta(Y_i=y)$ is the indicator function equal to one if $Y_i=y$
and zero otherwise, then define
\begin{eqnarray}
n_y &=& \sum_i \delta(Y_i=y) \\
f_y &=& n_y/n \\
\hat{\xi}_y &=& (1/n_y) \sum_i \delta(Y_i=y) q_i\\
\hat{\xi} &=& (\hat{\xi}_1 ,\ldots, \hat{\xi}_h)
\end{eqnarray}
The vector $\hat{\xi}_y$ is the mean of the $q_i$ for which
$\delta(Y_i=y)=1$, the ``slice mean'', and $\hat{\xi}$ is the $p\times
h$ matrix of slice means.
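
A sketch of these quantities, assuming the sliced response is coded
$1,\ldots,h$ in a vector \dcode{ylab} and that \dcode{Q} holds the
matrix $\sqrt{n}Q$ with rows $q_i'$, is
\begin{verbatim}
h  <- max(ylab)
ny <- tabulate(ylab, h)                 # slice counts n_y
fy <- ny/length(ylab)                   # slice proportions f_y
xi <- sapply(1:h, function(y)           # p x h matrix of slice means
        colMeans(Q[ylab == y, , drop = FALSE]))
\end{verbatim}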

Under assumptions discussed by Cook and Ni (2005, Section 2.1), the
columns of $\hat{\xi}$ estimate vectors in the central subspace.
Sliced inverse regression, for example, estimates the central
subspace by the span of the eigenvectors of $\hat{\xi}\hat{\xi}'$
corresponding to the $d$ largest eigenvalues. In contrast, inverse
regression estimation will estimate the span of the central subspace
using the columns of the $p\times d$ matrix $B$ determined by
minimizing the following objective function (see Cook and Ni, 2005,
equation 2):
\begin{equation}
F_d(B,C) = \left(\vec(\hat{\xi}\diag(f_y)A_n)-\vec(BC)\right)' V_n
\left(\vec(\hat{\xi}\diag(f_y)A_n)-\vec(BC)\right) \label{eq:crit}
\end{equation}
The matrix $\diag(f_y)A_n$ is required to convert vectors in
$\hat{\xi}$ to a fixed coordinate system; in \dr, following Cook
and Ni (2005), $A_n$ is obtained from the \R\ command
\begin{verbatim}
An <- qr.Q(qr(contr.helmert(h)))
\end{verbatim}
where $h$ is the number of slices. Thus $A_n$ is an $h \times
(h-1)$ matrix whose columns are orthogonal to each other and to
$1_h$. Alternatively, $A_n$ could be defined as any $h \times
(h-1)$ matrix with orthonormal columns that spans the
same space; this would simply change coordinates and $C$, but
leave $B$ unchanged.  The matrix $V_n$ is a positive definite
symmetric inner product matrix. The optimal choice for $V_n$ is
the inverse of the asymptotic variance of
$\vec(\hat{\xi}\diag(f_y)A_n)$, and an estimate of this is used
in the \ire\ method in \dr.
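
The stated properties of $A_n$ are easy to verify numerically; for
example, with $h=8$ slices,
\begin{verbatim}
h <- 8
An <- qr.Q(qr(contr.helmert(h)))
round(crossprod(An), 6)    # orthonormal columns: (h-1) x (h-1) identity
round(colSums(An), 6)      # each column is orthogonal to 1_h
\end{verbatim}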

Minimization of (\ref{eq:crit}) is done over both $B$ and $C$.  The
minimizing value of $C$ is of no interest because it depends on the
coordinates determined by $A_n$.  The column space of the $p \times
d$ matrix $B$ provides the estimate of the central subspace assuming
it is of dimension $d$.

Computations use an alternating algorithm, which is likely to
converge at only a linear rate, so good starting values can be
very important. Assuming $V_n$ is fixed, if $B$ is fixed, then $C$
can be estimated from ordinary least squares fit of
$V_n^{1/2}\vec(\hat{\xi}A_n)$ on $V_n^{1/2}\vec(I_{h-1} \otimes
B)$. Given the new value of $C$, Cook and Ni (2005, Section 3.3)
describe a method of updating $B$, column by column.  For starting
values in this algorithm, \dr\ sets $B$ equal to the estimated $d$
dimensional space obtained using \sir.

To complete the description of the computational method, we describe
the computation of $V_n$.  Cook and Ni (2005, Section 3.3) write
\[
V_n^{-1} = (A_n' \otimes I)\Gamma(A_n \otimes I)
\]
A consistent sample version of $\Gamma = \Cov(\vec(q\epsilon'))$ is
required, where $q'$ is a typical row of $Q$, and $\epsilon$ has
typical elements $\epsilon_y = \delta(Y=y) - \E(\delta(Y=y)) -
q'\E(q\delta(Y=y))$, the population residuals from the regression of
$\delta(Y=y)$ on $q$.  The sample version of these residuals are
given by
\[
\hat{\epsilon}_{yi} = \delta(Y_i=y) - f_y - f_yq_i'\hat{\psi}_y
\]
where $\hat{\psi}_y$ depends on the value of the dimension $d$ and
on the estimated central subspace.  We use the following procedure
for setting $\hat{\psi}_y$:
\begin{enumerate}
\item For computations under the assumption $d=0$, set $\hat{\psi}_y = 0$.
\item For computations under the assumption $d>0$,
initially set $\hat{\psi}_y = \hat{\xi}_y$, and obtain an estimator
$B$.
\item If $d>0$ and we set the argument \dcode{steps} to equal a
positive integer, first compute step 2, and then repeat with
$\hat{\psi} = P_B{\hat{\xi}}$.  If \dcode{steps} is greater than
one, repeat the number of times indicated by the argument; one step
seems to be adequate.
\end{enumerate}

At completion of the computations the columns of $B$ provide a
basis for the central subspace, assuming that it is of dimension
$d$, in the coordinates determined by the $QR$ factorization.  We
could replace $B$ by any matrix $TB$ where $T$ is any $p\times p$
{\em restriction} matrix.  Indeed, the function
\dcode{dr.iteration.ire} minimizes the function
\begin{equation}
F_d(B,C,T) = \left(\vec(\hat{\xi}\diag(f_y)A_n)-\vec(TBC)\right)'
V_n \left(\vec(\hat{\xi}\diag(f_y)A_n)-\vec(TBC)\right)
\label{eq:crit1}
\end{equation}
rather than (\ref{eq:crit}), minimizing over $B$ and $C$ with $T$
fixed.

Using this extension, \dr\ uses the algorithm outlined in Cook and Ni
(2005, Section 3.3) to replace $B$ by another basis matrix $\hat{B}$
such that the first column of $\hat{B}$ minimizes (\ref{eq:crit})
for $d=1$, with the minimization over all vectors in the span of
$B$, the second column of $\hat{B}$ is orthogonal to the first
column such that $(\hat{b}_1,\hat{b}_2)$ minimizes (\ref{eq:crit})
for $d=2$ with $\hat{b}_1$ fixed and $\hat{b}_2$ in the span of $B$,
and so on.


To duplicate the results for the Australian Athletes data in Cook
and Ni (2005), the following can be used:
<<one,include=FALSE>>=
(m1 <- dr(LBM~log(Ht)+log(Wt)+log(SSF)+log(RCC)+log(WCC)+log(Ferr)+
           log(Hc)+log(Hg),data=ais,method="ire",nslices=8,numdir=4,
           slice.function=dr.slices.arc,itmax=200,steps=1,eps=1.e-6))
@
The call to the \dr\ function is the same as for the other dimension
reduction methods, except for the argument \dcode{method="ire"}. We
have specified \dcode{nslices=8} to match Cook and Ni, rather than
use the default of $\max(8,p+3)$, and have used
\dcode{dr.slices.arc} to get slices, again to match Cook and Ni. The
remaining three arguments are set to their default values:
\dcode{itmax} and \dcode{eps} control the iteration, and
\dcode{steps} controls the number of re-estimates of the covariance
matrix.

The default printed output for \ire\ is a little different from that
for \save\ and \sir.  Only the large-sample marginal dimension tests
are printed.  Cook and Ni (2005, Theorem~2) show that the appropriate
statistic for these hypotheses is $nF_d(\hat{B},\hat{C})$, which is
asymptotically distributed as $\chi^2$ with $(p-d)(h-d-1)$ df.  These
tests are produced whenever \ire\ is fit.

\subsection{Solution in original coordinates}
In \sir\ and \save, the estimated $d$-dimensional basis for
$\span(B)$ is given by the $d$ eigenvectors corresponding to the $d$
largest eigenvalues.  In \ire, we will get a different basis for
each value of $d$, and so the subspaces need not be nested.  As with
\sir\ and \save, the basis is found by multiplying $R^{-1}\hat{B}$,
where $R$ is the Cholesky factor from the QR decomposition.

To see the two-dimensional solution, use the function
\dcode{dr.basis},
<<two>>=
dr.basis(m1,numdir=2)
@
For this problem, the first two vectors of the three-dimensional
solution are slightly different from the two-dimensional solution:
<<three>>=
dr.basis(m1,3)
@


\subsection{Coordinate hypothesis}
Suppose that $H$ is a {\em hypothesis matrix}; typically, $H$ could
restrict the subspace of interest to be orthogonal to one or more of
the predictors, but other choices are possible as well.  The {\em
marginal predictor hypothesis} has null hypothesis $(I-P_H)S=0$
versus $(I-P_H)S \neq 0$.  Cook and Ni (2005, eq.~12) provide the
appropriate test statistic. (Cook and Ni use a slightly different
definition of $H$.)  Both marginal and conditional coordinate tests
are available for \ire.

For example, a marginal predictor hypothesis test for dropping a
variable and a conditional test assuming $d=2$ are given by
<<mct>>=
dr.coordinate.test(m1,~.-log(Hg))
dr.coordinate.test(m1,~.-log(Hg),d=2)
@
The general syntax of the second argument is a formula that
corresponds to $H$.  Here, $H$ is obtained from the original
specification \verb+~.+ by removing \dcode{log(Hg)}.  The large
$p$-value suggests that \dcode{log(Hg)} is likely to be unimportant
if we assume nothing about $d$, but if $d=2$ dropping
\dcode{log(Hg)} is not supportable.

As with \sir\ and \save, \dcode{drop1} can be used to consider
dropping each predictor in turn, and \dcode{dr.step} can be used for
stepwise selection of predictors.
<<drop1>>=
drop1(m1,update=FALSE)
@

\subsection{Joint predictor test}
One can simultaneously test $d=m$ and $(I-P_H)S=0$ by minimizing
(\ref{eq:crit1}) with the restriction matrix $T=P_H$.  This test is
not available for \sir\ or \save, and is used in the computation of
the conditional coordinate test.  See the documentation for
\dcode{dr.joint.test}.

\subsection{Accessor functions}
See Section~\ref{sec:access}.

\section{Partial inverse regression with a categorical predictor}
Wen and Cook (2007) have extended the \ire\ method to allow for
inclusion of a categorical predictor.
<<pire>>=
m2 <- dr(LBM~log(Ht)+log(Wt)+log(SSF)+log(RCC)+log(WCC)+log(Ferr)+
           log(Hc)+log(Hg),group=~Sex,data=ais,method="ire",nslices=8,
           numdir=4,slice.function=dr.slices.arc,itmax=200,steps=1,
           eps=1.e-6)
@
In this case, we have conditioned on the two levels of
\dcode{Sex}.  The output is identical to that for \ire, except for degrees of
freedom:
<<pire1>>=
m2
@
The argument \dcode{group} requires a one-sided formula, like
\verb|~Sex|, where the right-side variable can be a factor or a
variate. If you have more than one grouping variable, for example
{\tt Sex} and {\tt AgeClass}, then use \verb|~ Sex:AgeClass|.  For
this form, all the variables on the right-side of the formula must
be factors.

Here is the basic idea.  We first replace the $X$ matrix by the
$\sqrt{n}Q$ factor from the $QR$ decomposition. Suppose the
conditioning or grouping variable $G$ has $g$ levels.  Subdivide the
data into $g$ groups, where $Q^{(k)}$ is the part of the $Q$ matrix
for group $k$. The submatrix $Q^{(k)}$ no longer has orthonormal
columns, and so we replace each $Q^{(k)}$ by the $Q$ factor of its
own $QR$ decomposition.  This essentially allows each group to have
its own within-group covariance structure. Obtain slice means
$\hat{\xi}^{(k)}$, for $k=1,\ldots,g$.  If $h_k$ is the number of
slices in the $k$-th group, the columns of the $p \times \sum h_k$
matrix $(\hat{\xi}^{(1)},\ldots,\hat{\xi}^{(g)})$ all estimate
vectors in the central (mean) subspace.  An estimate of $B$ is
obtained by
minimizing the objective function
\begin{equation} G_d(B,C,T) = \sum_{k=1}^g F_d^{(k)}(B,C,T)
\label{eq:critpire} \end{equation} where by $F_d^{(k)}(B,C,T)$ we
mean (\ref{eq:crit1}) evaluated only for the $k$-th group, with
$\hat{\xi} = \hat{\xi}^{(k)}, V_n=V_n^{(k)}$, and $\diag(f_y) =
\diag(f_y^{(k)})$.  Theory similar to that for \ire\ permits test
statistics similar to \ire, and these are implemented in \dr.

Output for partial \ire\ parallels that for \ire.  All the functions
that can be applied to \ire\ objects can also be applied to \pire\
objects.

\section{Auxiliary functions}
\subsection{Weights}\label{sec:weights} Weights are generally used
in dimension reduction methods to make the resulting weighted
sample closer to a normal distribution than the original sample.
Cook (1998, Section 8.4) discusses the method that is implemented
here. When weights are present, they are used in centering the
data and computing the covariance matrix, and they are used in
computing the objective matrix $M$ for \phd.
%For consistency with \arc, they are not used in computing $M$
%for \sir\ and \save\ unless the argument \dcode{use.weights} is set to
%\dcode{T}.
Weights may be provided by the user with the \dcode{weights}
argument.  If \dcode{weights=NULL}, the default, no weighting is
used.

The function \dcode{dr.weights} is used to estimate weights using
the algorithm described by Cook (1998, Sec. 8.4).  There are
several other arguments that control how the weights are computed,
as described below, and on the help page for the function
\dcode{dr.weights}.  The algorithm, a minimal \R\ sketch of which
follows this list, works as follows:
\begin{enumerate}
\item
For an $n\times p$ data matrix $X$, find estimates $m$ and $S$ of
the mean and covariance matrix.  For this purpose, in \R\ the
function \dcode{cov.rob} in the \dcode{MASS} package is used. See
the documentation for \dcode{cov.rob} for a description of these
additional parameters. All the defaults are sensible.
\item
Compute the matrix $Z = (X - 1m')S^{-1/2}$.  If the data were
normally distributed $\N(m,S)$, the rows of $Z$ would be like a
sample from $\N(0,I)$.
\item
Obtain a random vector $b$ from the $\N(0,\sigma^2I)$ distribution.
The parameter \dcode{sigma=1} is a tuning parameter that can be set
in the call to \dr, and values near 1 or slightly smaller seem
appropriate.  Find the row of $Z$ that is closest to $b$ (the code
uses Euclidean distance), and increase a counter for that row by 1.
\item
The argument \dcode{nsamples} determines the number of times this
last step is repeated; the default is
\dcode{nsamples=10*dim(x)[1]} where $X$ is the $n\times p$ data
matrix; this number may be too small.
\item
Return a vector of weights given by the value of the counter
divided by \dcode{nsamples} and multiplied by $n$, so the sum of
the weights will be $n$.
\end{enumerate}
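
The following is a minimal sketch of this algorithm; the function
name \dcode{compute.weights} is only for illustration, and
\dcode{dr.weights} itself has additional arguments that are passed
to \dcode{cov.rob}.
\begin{verbatim}
library(MASS)   # for cov.rob
compute.weights <- function(X, sigma = 1, nsamples = 10 * nrow(X)) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  rob <- cov.rob(X)                            # robust m and S
  R <- chol(rob$cov)                           # S = R'R
  Z <- sweep(X, 2, rob$center) %*% solve(R)    # standardized data
  counts <- numeric(n)
  for (i in 1:nsamples) {
    b <- rnorm(p, sd = sigma)                  # b ~ N(0, sigma^2 I)
    d2 <- rowSums(sweep(Z, 2, b)^2)            # squared distances to b
    j <- which.min(d2)                         # closest row of Z
    counts[j] <- counts[j] + 1
  }
  n * counts/nsamples                          # weights sum to n
}
\end{verbatim}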

An example of the use of weights is:
<<echo=TRUE>>=
wts <- dr.weights(LBM~Ht+Wt+RCC+WCC,data=ais)
i1 <- dr(LBM~Ht+Wt+RCC+WCC,weights=wts,method="phdres",data=ais)
@
\subsection{Slices}\label{sec:slice}
\dr\ provides two functions for producing slices,
\dcode{dr.slices}, the recommended default, and
\dcode{dr.slices.arc}, which is the older routine used before
version 3.0.0.  The newer version does a better job of handling
ties.  Both functions have the same calling sequence.  For
example,
<<echo=TRUE>>=
y1 <- c(1,1,1,2,3,4,5,6,7,8,8,8)
dr.slices(y1,3)
@
produces a structure with three arguments, giving the indices of
the observations in each slice, the number of slices and the slice
sizes.

If the response has $k$ columns, then the argument \dcode{nslices}
should be a vector of $k$ elements.  If it is specified as a
number rather than a vector, then that number will give the total
number of cells, approximately. For example, if $k=2$ and
\dcode{nslices=8}, the program will use $\sqrt{8} \approx 3$
slices along each of the two response variables for a total of
$3\times 3=9$ two-dimensional slices.
<<echo=TRUE>>=
y2 <- c(1,2,3,4,1,2,3,4,1,2,3,4)
dr.slices(cbind(y1,y2),5)
@
This produced 6 slices that are not all exactly the same size.

There are two optional arguments to the \dr\ function that concern
slicing.  The argument \dcode{slice.function} has values either
\dcode{dr.slices}, the default, or \dcode{dr.slices.arc}.  The
argument \dcode{nslices} to \dr\ is passed to the slice function.
The default for \dcode{nslices} is $\max(8,p+3)$.

\subsection{Permutation tests}\label{sec:perm} Cook (1998) and Yin
in his unpublished 2000 Ph.D. dissertation discuss using permutation
tests to get significance levels for the marginal dimension test.
These are implemented in the function \dcode{dr.permutation.test}.
Typical use of this function is
<<echo=TRUE>>=
dr.permutation.test(s0,npermute=99,numdir=4)
@
The number of permutations defaults to 50 and the number of
directions defaults to 4.  Increasing either can increase the
computation time required to obtain the solution. The permutation
test results for the example are very similar to the asymptotic
results given earlier.  The permutation tests are not implemented
for \ire.

\section{Acknowledgements}
Cindy Yu and Jorge de la Vega worked on earlier versions of the
package. Yongwu Shao wrote most of the new dimension tests for \sir\
and \save.  The code for \ire\ is adapted freely from Liqiang Ni's
program written for Arc.  The code for \pire\ is essentially a
modification of Ni's program.  Kurt Hornik was very helpful in
getting the package to pass all the tests in \R.

\begin{references}

\item Bentler, P. M. and Xie, J. (2000).  Corrections to test
statistics in principal Hessian directions.  {\em Statistics and
Probability Letters}, 47, 381--389.

\item Bura, E. and Cook, R. D. (2001).  Extending sliced inverse
regression: The weighted chi-squared test.  \jasa, 96, 996--1003.

\item Chiaromonte, F., Cook, R. D. and Li, B. (2002).  Sufficient
dimension reduction in regressions with categorical predictors.
{\em Annals of Statistics}, 30, 475--497.

\item Cook, R. D. (1998). {\em Regression Graphics}.  New York:  Wiley.

\item Cook, R. D. (2004).  Testing predictor contributions in
sufficient dimension reduction.  {\em Annals of Statistics}, 32,
1062--1092.

\item Cook, R. D. and Li, B. (2002).  Dimension reduction for
conditional mean in regression.  {\em Annals of Statistics}, 30,
455--474.

\item Cook, R. D. and Ni, L. (2005).  Sufficient dimension
reduction via inverse regression:  A minimum discrepancy approach.
\jasa, 100, 410--428.

\item Li, K. C. (1991).  Sliced inverse regression for dimension
reduction (with discussion).  \jasa, 86, 316--342.

\item Li, K. C. (1992).  On principal Hessian directions for data
visualization and dimension reduction:  Another application of
Stein's lemma.  \jasa, 87, 1025--1034.

\item Shao, Y., Cook, R. D. and Weisberg, S. (2007).  Marginal tests
with sliced average variance estimation.  {\em Biometrika}, 94,
285--296.

\item Shao, Y., Cook, R. D. and Weisberg, S. (2009, to appear,
available on line).  Partial central subspace and sliced average
variance estimation.  {\em Journal of Statistical Planning and
Inference}.

\item Weisberg, S. (2002).  Dimension reduction regression in \R.
{\em Journal of Statistical Software}, 7, available from
\url{http://www.jstatsoft.org}.

\item Wen, X. and Cook, R. D. (2007).  Optimal sufficient dimension
reduction in regressions with categorical predictors.  {\em Journal
of Statistical Planning and Inference}, 137, 1961--79.

\item Wood, A. T. A. (1989).  An $F$ approximation to the
distribution of a linear combination of chi-squared variables.
{\em Communications in Statistics: Simulation and Computation}, 18,
1439--1456.
\end{references}

\end{document}