% $Id: identification.Rnw 1796 2015-11-12 13:10:20Z sgaure $
\documentclass{amsproc}
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Multicollinearity, identification, and estimable functions}
%\VignetteKeyword{Multiple Fixed Effects}

\usepackage[utf8]{inputenc}

\newcommand{\strong}[1]{{\normalfont\fontseries{b}\selectfont #1}}
\let\pkg=\strong
\newcommand\code{\bgroup\@codex}
\def\@codex#1{{\normalfont\ttfamily\hyphenchar\font=-1 #1}\egroup}

\newcommand{\bma}{\begin{bmatrix}}
\newcommand{\ema}{\end{bmatrix}}


\title{Multicollinearity, identification, and estimable functions}

\author{Simen Gaure}
\address{Ragnar Frisch Centre for Economic Research, Oslo, Norway}
\date{March 26, 2013}

\begin{document}
\begin{abstract}
  Since there is quite a lot of confusion here and there about what
  happens when factors are collinear; here is a walkthrough of the
  identification problems which may arise in models with many dummies,
  and how \pkg{lfe} handles them. (Or, at the very least, attempts to
  handle them).
\end{abstract}
\maketitle

\section{Context}
The \pkg{lfe} package is used for ordinary least squares estimation, i.e. models
which conceptually may be estimated by \code{lm} as

<<eval=FALSE>>=
lm(y ~ x1 + x2 + ... + xm + f1 + f2 + ... + fn)
@ 

where \code{f1,f2,...,fn} are factors.
The standard method is to introduce a dummy variable for each level of each factor.
This is too much as it introduces multicollinearities in the system.
Conceptually, the system may still be solved, but there are many different solutions.
In all of them, the difference between the coefficients for each factor will be the same.

The ambiguity
is typically solved by removing a single dummy variable for each factor, this is 
termed a reference. This is like forcing the coefficient for this dummy variable to zero,
and the other levels are then seen as relative to this zero.  Other ways to solve the
problem is to force the sum of the coefficients to be zero, or one may enforce
some other constraint, typically via the \code{contrasts} argument to \code{lm}.
The default in \code{lm} is to have a reference level in each factor, and a common
intercept term.

In \pkg{lfe} the same estimation can be performed by
<< eval=FALSE>>=
felm(y ~ x1 + x2 + ... + xm | f1 + f2 + ... + fn)
@ 

Since \code{felm} conceptually does exactly the same as \code{lm}, the contrasts approach may work there
too. Or rather, it is actually not necessary that \code{felm} handles it at all, it is only necessary
if one needs to fetch the coefficients for the factor levels with \code{getfe}.

\pkg{lfe} is intended for very large datasets, with factors with many levels.
Then the approach with a single constraint for each factor may sometimes not be sufficient.
The standard example in the econometrics literature (see e.g.\ \cite{akm99}) is the case with two factors, one for individuals,
and one for firms these individuals work for, changing jobs now and then.  What happens in practice
is that the labour market may be disconnected, so that one set of individuals move between
one set of firms, and another (disjoint) set of individuals move between some other firms.
This happens for no obvious reason, and is data dependent, not intrinsic to the model.
There may be several such components.  I.e. there are more multicollinearities in the system
than the obvious ones.
In such a case, there is no way to compare coefficients from different connected components,
it is not sufficient with a single individual reference.  The problem may be phrased in
graph theoretic terms (see e.g.\ \cite{ack02,EH74,GS13}), and it can be shown that it is sufficient with one reference level in each
of the connected components.  This is what \pkg{lfe} does, in the case with two factors
it identifies these components, and force one level to zero in one of the factors.

In the examples below, rather small randomly generated datasets are
used. \pkg{lfe} is hardly the best solution for these problems, they
are solely used to illustrate some concepts.  I can assure the reader that no CPUs, sleeping
patterns, romantic relationships, trees or cats, nor animals in general, were harmed during
data collection and analysis. 

\section{Identification with two factors}
In the case with two factors, identification
is well-known. \code{getfe} will partition the dataset into connected components, and
introduce a reference level in each component:
<<echo=FALSE>>=
cores <- as.integer(Sys.getenv("SG_RUN"))
if (is.na(cores)) options(lfe.threads = 1)
@ 
<<>>=
library(lfe)
set.seed(42)
x1 <- rnorm(20)
f1 <- sample(8, length(x1), replace = TRUE) / 10
f2 <- sample(8, length(x1), replace = TRUE) / 10
e1 <- sin(f1) + 0.02 * f2^2 + rnorm(length(x1))
y <- 2.5 * x1 + (e1 - mean(e1))
summary(est <- felm(y ~ x1 | f1 + f2))
@ 

We examine the estimable function produced by \code{efactory}.
<<>>=
ef <- efactory(est)
is.estimable(ef, est$fe)
getfe(est)
@ 

As we can see from the \code{comp} entry, there are two components, 
the second one with \code{f1=0.1}, \code{f1=0.3}, \code{f1=0.5}, and
\code{f2=0.5} and \code{f2=0.7}. 
A reference is introduced in each of the components, i.e. \code{f2.0.2=0} and \code{f2.0.5=0}.
If we look at the dataset, the component structure becomes clearer:
<<>>=
data.frame(f1, f2, comp = est$cfactor)
@ 

Observations 1, 10, 12, 17, and 20 belong to component 2; no other
observation has \code{f1 \%in\% c(0.1,0.3,0.5)} or
\code{f2 \%in\% c(c0.5,0.7)}, thus it is clear that coefficients for these can not be
compared to other coefficients.  \code{lm} is silent about this
component structure, hence coefficients are hard to interpret. Though,
predictive properties and residuals are the same:
<<>>=
f1 <- factor(f1)
f2 <- factor(f2)
summary(lm(y ~ x1 + f1 + f2))
@ 
\section{Identification with three or more factors}
In the case with three or more factors, there is no general intuitive theory (yet) for handling identification problems.
\pkg{lfe} resorts to the simple-minded approach that non-obvious multicollinearities
arise among the first two factors, and assumes it is sufficient with a single reference level
for each of the remaining factors, i.e. that they in principle could be specified as
ordinary dummies.  In other words, the order of the factors in the model
specification is important.  A typical example would be 3 factors; individuals, firms and education:

<<eval=FALSE>>=
est <- felm(logwage ~ x1 + x2 | id + firm + edu)
getfe(est)
@ 

This will result in the same number of references as if using the model
<<eval=FALSE>>=
logwage ~ x1 + x2 + edu | id + firm
@ 
though it may run faster (or slower).

Alternatively, one could specify the model as
<<eval=FALSE>>=
logwage ~ x1 + x2 | firm + edu + id
@ 

This would not account for a partitioning of the labour market along individual/firm, but
along firm/education, using a single reference level for the individuals.  
In this example, there is some reason to suspect that it is
not sufficient, depending on how \code{edu} is specified. 
There exists no general scheme that sets up suitable reference groups
when there are more than two factors.  It may happen that the default is sufficient.
The function \code{getfe} will check whether this is so, and it will yield a warning
about 'non-estimable function' if not.  With some luck it may be possible to rearrange the
order of the factors to avoid this situation.

There is nothing special with \pkg{lfe} in this respect. You will meet
the same problem with \code{lm}, it will remove a reference level (or
dummy-variable) in each factor, but the system will still contain
multicollinearities.  You may remove reference levels until all
the multicollinearities are gone, but there is no obvious way to
interpret the resulting coefficients.

To illustrate, the classical example is when you include a factor for age (in years), 
a factor for observation year, and a factor for year of birth.  You pick a reference individual,
e.g. \code{age=50}, \code{year=2013} and \code{birth=1963}, but this is not sufficient to remove
all the multicollinearities. If you analyze this problem (see e.g. \cite{Kup83}) you will find that the coefficients
are only identified up to linear trends. You may force the linear trend between \code{birth=1963} and
\code{birth=1990} to zero, by removing the reference level \code{birth=1990}, and the
system will be free of multicollinearities. In this case the \code{birth} coefficients have the
interpretation as being deviations from a linear trend between 1963 and 1990, though you do not know which linear trend.
The \code{age} and \code{year} coefficients are also relative to this same unknown trend.

In the above case, the multicollinearity is obviously built into the model, and it is
possible to remove it and find some intuitive interpretation of the coefficients. In
the general case, when either \code{lm} or \code{getfe} reports a
handful of non-obvious spurious multicollinearites between factors with many levels, you
probably will not be able to find any reasonable way to interpret coefficients.  Of
course, certain linear combinations of coefficients will be unique, 
i.e. estimable, and these may be found by e.g. the procedures in
\cite{GG01,WW64}, but the general picture is muddy.

\pkg{lfe} does not provide a solution to this problem, however, \code{getfe} will still
provide a vector of coefficients which results from finding a non-unique solution to a certain
set of equations.  To get any sense from this, an estimable function must be applied. The simplest
one is to pick a reference for each factor and subtract this coefficient from each of the other coefficients
in the same factor, and add it to a common intercept, however in the case this does not result in
an estimable function, you are out of luck.
If you for some reason believe that you know of an estimable function, you may
provide this to \code{getfe} via the \code{ef}-argument. There is an example in the
\code{getfe} documentation.  You may also test it for estimability with the function 
\code{is.estimable}, this is a probabilistic test which almost never fails (see \cite[Remark 6.2]{GS13}).

\section{Specifying an estimable function}
A model of the type
<<eval=FALSE>>=
y ~ x1 + x2 + f1 + f2 + f3
@ 
may be written in matrix notation as
\begin{equation}\label{matmod}
 y = X\beta + D\alpha + \epsilon,
\end{equation}
where \(X\) is a matrix with columns \code{x1} and \code{x2} and \(D\)
is matrix of dummies constructed from the levels of the factors
\code{f1,f2,f3}.  Formally, an estimable function in our context is a
matrix operator whose row space is contained in the row space of
\(D\).  That is, an estimable function may be written as a matrix.
Like the \code{contrasts} argument to \code{lm}. However, the
\pkg{lfe} package uses an R-function instead.  That is, \code{felm} is
called first, it uses the Frisch-Waugh-Lovell theorem to project out
the \(D\alpha\) term from \eqref{matmod} (see \cite[Remark 3.2]{GS13}):
<<eval=FALSE>>=
est <- felm(y ~ x1 + x2 | f1 + f2 + f3)
@ 

This yields the parameters for \code{x1} and \code{x2}, i.e. \(\hat\beta\).
To find \(\hat\alpha\), the parameters for the levels of \code{f1,f2,f3}, \code{getfe} solves a certain 
linear system (see \cite[eq. (14)]{GS13}):
\begin{equation}\label{alphaeq}
D\gamma = \rho
\end{equation}
where the vector \(\rho\) can be computed when we have \(\hat\beta\).
This does not identify \(\gamma\) uniquely, we have to apply an estimable
function to \(\gamma\).  The estimable function \(F\) is characterized by the
property that \(F\gamma_1 = F\gamma_2\) whenever \(\gamma_1\) and \(\gamma_2\) are
solutions to equation~\eqref{alphaeq}.  Rather than coding \(F\) as a matrix, \pkg{lfe}
codes it as a function. It is of course possible to let the function apply a matrix, so this
is not a material distinction.
So, let's look at an example of how an estimable function may be made:
<<>>=
library(lfe)
x1 <- rnorm(100)
f1 <- sample(7, 100, replace = TRUE)
f2 <- sample(8, 100, replace = TRUE) / 8
f3 <- sample(10, 100, replace = TRUE) / 10
e1 <- sin(f1) + 0.02 * f2^2 + 0.17 * f3^3 + rnorm(100)
y <- 2.5 * x1 + (e1 - mean(e1))
summary(est <- felm(y ~ x1 | f1 + f2 + f3))
@ 

In this case, with 3 factors we can not be certain that it is sufficient with a single
reference in two of the factors, but we try it as an exercise. (\pkg{lfe} does not include an intercept, it is subsumed
in one of the factors, so it should tentatively be sufficient with a reference for the two
others).

The input to our estimable function is a solution \(\gamma\) of equation~\eqref{alphaeq}. The
argument \code{addnames} is a logical, set to \code{TRUE} when the function should add names to
the resulting vector.  The coefficients is ordered the same way as the levels in the factors.
We should pick a single reference in factors \code{f2,f3}, subtract these, and add the sum
to the first factor:
<<>>=
ef <- function(gamma, addnames) {
  ref2 <- gamma[[8]]
  ref3 <- gamma[[16]]
  gamma[1:7] <- gamma[1:7] + ref2 + ref3
  gamma[8:15] <- gamma[8:15] - ref2
  gamma[16:25] <- gamma[16:25] - ref3
  if (addnames) {
    names(gamma) <- c(
      paste("f1", 1:7, sep = "."),
      paste("f2", 1:8, sep = "."),
      paste("f3", 1:10, sep = ".")
    )
  }
  gamma
}
is.estimable(ef, fe = est$fe)
getfe(est, ef = ef)
@ 

We may compare this to the default estimable function, which picks a reference in each connected
component as defined by the two first factors.
<<>>=
getfe(est)
@ 
We see that the default has some more information.  It uses the level names, and some more information, added like
this:
<<>>=
efactory(est)
@ 
I.e. when asked to provide level names, it is also possible to add additional
information as a \code{list} (or \code{data.frame}) as an
attribute \code{'extra'}. The vectors \code{extrarefs,refsubs,refsuba} etc.\ are precomputed
by \code{efactory} for speed efficiency.

Here is the above example, but we create an intercept instead, and don't report the zero-coefficients, so that
it closely resembles the output from \code{lm}

<<>>=
f1 <- factor(f1)
f2 <- factor(f2)
f3 <- factor(f3)
ef <- function(gamma, addnames) {
  ref1 <- gamma[[1]]
  ref2 <- gamma[[8]]
  ref3 <- gamma[[16]]
  # put the intercept in the first coordinate
  gamma[[1]] <- ref1 + ref2 + ref3
  gamma[2:7] <- gamma[2:7] - ref1
  gamma[8:14] <- gamma[9:15] - ref2
  gamma[15:23] <- gamma[17:25] - ref3
  length(gamma) <- 23
  if (addnames) {
    names(gamma) <- c(
      "(Intercept)", paste("f1", levels(f1)[2:7], sep = ""),
      paste("f2", levels(f2)[2:8], sep = ""),
      paste("f3", levels(f3)[2:10], sep = "")
    )
  }
  gamma
}
getfe(est, ef = ef, bN = 1000, se = TRUE)
# compare with lm
summary(lm(y ~ x1 + f1 + f2 + f3))
@ 


\section{Non-estimability}
We consider another example. To ensure spurious relations there are
almost as many factor levels as there are observations, and it will be
hard to find enough estimable function to interpret all the
coefficients.  The coefficient for \code{x1} is still estimated, but
with a large standard error.  Note that this is an illustration of non-obvious
non-estimability which may occur in much larger datasets, the author
does not endorse using this kind of model for the kind of data you find below.

<<>>=
set.seed(55)
x1 <- rnorm(25)
f1 <- sample(9, length(x1), replace = TRUE)
f2 <- sample(8, length(x1), replace = TRUE)
f3 <- sample(8, length(x1), replace = TRUE)
e1 <- sin(f1) + 0.02 * f2^2 + 0.17 * f3^3 + rnorm(length(x1))
y <- 2.5 * x1 + (e1 - mean(e1))
summary(est <- felm(y ~ x1 | f1 + f2 + f3))
@ 

The default estimable function fails, and the coefficients
from \code{getfe} are not useable.  \code{getfe} yields a warning in
this case.
<<>>=
ef <- efactory(est)
is.estimable(ef, est$fe)
@ 

Indeed, the rank-deficiency is larger than expected. There are more spurious relations between
the factors than what can be accounted for by looking at components in the two first factors. 
In this low-dimensional example we
may find the matrix \(D\) of equation~\eqref{alphaeq}, and its (column) rank deficiency
is larger than 2.
<<>>=
f1 <- factor(f1)
f2 <- factor(f2)
f3 <- factor(f3)
D <- makeDmatrix(list(f1, f2, f3))
dim(D)
ncol(D) - as.integer(rankMatrix(D))
@ 
Alternatively we can use an internal function
in lfe for finding the rank deficiency directly.
<<>>=
lfe:::rankDefic(list(f1, f2, f3))
@ 

This rank-deficiency also has an impact on the standard errors
computed by \code{felm}.  If the rank-deficiency is small relative to
the degrees of freedom the standard errors are scaled slightly upwards
if we ignore the rank deficiency, but if it is large,
the impact on the standard errors can be substantial.  The
above mentioned rank-computation procedure can be activated by specifying
\code{exactDOF=TRUE} in the call to \code{felm}, but it may be
time-consuming if the factors have many levels. Computing the rank
does not in itself help us find estimable functions for \code{getfe}.
<<>>=
summary(est <- felm(y ~ x1 | f1 + f2 + f3, exactDOF = TRUE))
@ 

We can get an idea what happens if we keep the dummies for \code{f3}. In this case, with
2 factors, \pkg{lfe} will partition the dataset into connected components and account
for all the multicollinearities among the factors \code{f1} and \code{f2} just as above,
but this is not sufficient. The interpretation of the
resulting coefficients is not straightforward.
<<>>=
summary(est <- felm(y ~ x1 + f3 | f1 + f2, exactDOF = TRUE))
getfe(est)
@ 

In this particular example, we may use a different order of the factors, and we see
that by partitioning the dataset on the factors \code{f1,f3} instead of \code{f1,f2},
there are 2 connected components (the factor \code{f2} gets its own \code{comp}-code,
but this is not a graph theoretic component number, it merely indicates that there is
a separate reference among these).
<<>>=
summary(est <- felm(y ~ x1 | f1 + f3 + f2, exactDOF = TRUE))
is.estimable(efactory(est), est$fe)
getfe(est)
@ 

Below is the same estimation in \code{lm}. We see that the coefficient
for \code{x1} is identical to the one from \code{felm}, but there is no obvious
relation between e.g. the coefficients for \code{f1}; the difference \code{f14-f15} is
not the same for \code{lm} and \code{felm}.  Since these are in different components, they
are not comparable.
But of course, if we compare in the same component, e.g. \code{f16-f17} or take
a combination which actually occurs in the dataset, it is unique (estimable):
<<>>=
data.frame(f1, f2, f3)[1, ]
@ 
I.e.\ if we add the coefficients \code{f1.2 + f2.6 + f3.3}
and include the intercept for \code{lm}, we will get the same number
for both \code{lm} and \code{felm}. That is, for predicting the actual dataset, 
estimability plays no role, we obtain the same residuals anyway. 
It is only for predicting outside of the dataset estimability
is important.
<<>>=
summary(est <- lm(y ~ x1 + f1 + f2 + f3))
@ 

\section{Weeks-Williams partitions}
There is a partial solution to the non-estimability problem in
\cite{WW64}.  Their idea is to partition the dataset into components
in which all differences between factor levels are estimable.  The
components are connected components of a subgraph of an
\(e\)-dimensional grid graph where \(e\) is the number of factors.
That is, a graph is constructed with the observations as vertices, two
observations are adjacent (in a graph theoretic sense) if they differ
in at most one of the factors.  The dataset is then partitioned
into (graph theoretic) connected components.  It's a finer
partitioning than the above, and consequently introduces more reference levels than
is necessary for identification. I.e. it does not find all estimable
functions, but in some cases (e.g. in \cite{TPAG13}) the largest
component will be sufficiently large for proper analysis.  It is of
course always a question whether such an endogenous selection of
observations will yield a dataset which results in unbiased
coefficients.  This partitioning can be done by the
\code{compfactor} function with argument \code{WW=TRUE}:
<<>>=
fe <- list(f1, f2, f3)
wwcomp <- compfactor(fe, WW = TRUE)
@ 
It has more levels than the rank deficiency
<<>>=
lfe:::rankDefic(fe)
nlevels(wwcomp)
@ 
and each of its components are contained in 
a component of the previously considered components,
no matter which two factors we consider.
For the case of two factors, the concepts coincide.
<<>>=
nlevels(interaction(compfactor(fe), wwcomp))
# pick the largest component:
wwdata <- data.frame(y, x1, f1, f2, f3)[wwcomp == 1, ]
print(wwdata)
@ 
That is, we can start in one of the observations and travel through all of them
by changing just one of \code{f1,f2,f3} at a time.
Though, in this particular example, there are more parameters than there are observations,
so an estimation would not be feasible.

\code{efactory} cannot easily be modified to produce an estimable
function corresponding to WW components.  The reason is that
\code{efactory}, and the logic in \code{getfe}, work on partitions of
factor levels, not on partitions of the dataset, these are the same
for the two-factor case.

WW partitions have the property that if
you pick any two of the factors and partition a WW-component into the
previously mentioned non-WW partitions, there will be only one
component, hence you may use any of the estimable functions from \code{efactory}
on each partition.  That is, a way to use WW partitions with \pkg{lfe} is
to do the whole analysis on the largest WW-component.  \code{felm}
may still be used on the whole dataset, and it may yield different results
than what you get by analysing the largest WW-component.

Here is a larger example:
<<>>=
set.seed(135)
x <- rnorm(10000)
f1 <- sample(1000, length(x), replace = TRUE)
f2 <- (f1 + sample(18, length(x), replace = TRUE)) %% 500
f3 <- (f2 + sample(9, length(x), replace = TRUE)) %% 500
y <- x + 1e-4 * f1 + sin(f2^2) +
  cos(f3)^3 + 0.5 * rnorm(length(x))
dataset <- data.frame(y, x, f1, f2, f3)
summary(est <- felm(y ~ x | f1 + f2 + f3,
  data = dataset, exactDOF = TRUE
))
@ 

We count the number of connected components in \code{f1,f2}, and see
that this is sufficient to ensure estimability
<<>>=
nlevels(est$cfactor)
is.estimable(efactory(est), est$fe)
nrow(alpha <- getfe(est))
@ 
It has rank deficiency one less than the number of factors :
<<>>=
lfe:::rankDefic(est$fe)
@ 

Then we analyse the largest WW-component 
<<>>=
wwcomp <- compfactor(est$fe, WW = TRUE)
nlevels(wwcomp)
wwset <- wwcomp == 1
sum(wwset)
summary(wwest <- felm(y ~ x | f1 + f2 + f3,
  data = dataset, subset = wwset, exactDOF = TRUE
))
@ 
We see that we get the same coefficient for \code{x} in this case.
This is not surprising, there is no obvious reason to believe that our
selection of observations is skewed in this randomly created dataset.

This one has the same rank deficiency:
<<>>=
lfe:::rankDefic(wwest$fe)
@ 
but a smaller number of identifiable coefficients.
<<>>=
nrow(wwalpha <- getfe(wwest))
@ 

We may compare effects which are common to the two methods:
<<>>=
head(wwalpha)
alpha[c(35, 38, 40:43), ]
@ 
but there is no obvious relation between e.g. \code{f1.35 - f1.36}, they are very different
in the two estimations. The coefficients are from different datasets, and the
standard errors are large  (\(\approx 0.7\)) with this few observations for each factor level.
The number of identified coefficients for each factor varies (these figures contain the two references):
<<>>=
table(wwalpha[, "fe"])
@ 
\iffalse
\bibliographystyle{amsplain}
\bibliography{biblio}
\else
\providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
\providecommand{\MR}{\relax\ifhmode\unskip\space\fi MR }
% \MRhref is called by the amsart/book/proc definition of \MR.
\providecommand{\MRhref}[2]{%
  \href{http://www.ams.org/mathscinet-getitem?mr=#1}{#2}
}
\providecommand{\href}[2]{#2}
\begin{thebibliography}{1}

\bibitem{ack02}
J.~M. Abowd, R.~H. Creecy, and F.~Kramarz, \emph{{Computing Person and Firm
  Effects Using Linked Longitudinal Employer-Employee Data}}, Tech. Report
  TP-2002-06, U.S. Census Bureau, 2002.

\bibitem{akm99}
J.~M. Abowd, F.~Kramarz, and D.~N. Margolis, \emph{{High Wage Workers and High
  Wage Firms}}, Econometrica \textbf{67} (1999), no.~2, 251--333.

\bibitem{EH74}
J.~A. Eccleston and A.~Hedayat, \emph{{On the Theory of Connected Designs:
  Characterization and Optimality}}, Annals of Statistics \textbf{2} (1974),
  1238--1255.

\bibitem{GS13}
S.~Gaure, \emph{{OLS with Multiple High Dimensional Category Variables}},
  Computational Statistics and Data Analysis \textbf{66} (2013), 8--18.

\bibitem{GG01}
J.~D. Godolphin and E.~J. Godolphin, \emph{On the connectivity of row-column
  designs}, Util. Math. \textbf{60} (2001), 51--65.

\bibitem{Kup83}
L.L. Kupper, J.M. Janis, I.A. Salama, C.N. Yoshizawa, and B.G. Greenberg,
  \emph{{Age-Period-Cohort Analysis: An Illustration of the Problems in
  Assessing Interaction in One Observation Per Cell Data}}, Commun.
  Statist.-Theor. Meth. \textbf{12} (1983), no.~23, {2779--2807}.

\bibitem{TPAG13}
S.M. Torres, P.~Portugal, J.T. Addison, and P.~Guimar{\~a}es, \emph{{The
  Sources of Wage Variation: A Three-Way High-Dimensional Fixed Effects
  Regression Model.}}, IZA Discussion Paper 7276, Institute for the Study of
  Labor (IZA), March 2013.

\bibitem{WW64}
D.L. Weeks and D.R. Williams, \emph{{A Note on the Determination of
  Connectedness in an N-Way Cross Classification}}, Technometrics \textbf{6}
  (1964), no.~3, 319--324.

\end{thebibliography}


\fi

\end{document}