\documentclass{article}
\usepackage{url}
\usepackage{Sweave}
\author{A. Arcos, M. Rueda, M. G. Ranalli and D. Molina}
\title{Splitting and formatting data in a dual frame context}

% \VignetteIndexEntry{Splitting and formatting data in a dual frame context}
\begin{document}
\SweaveOpts{concordance=TRUE}
\maketitle

\tableofcontents

\section{Data description}

To illustrate how to split and format a file containing the information collected in a dual frame survey, we will use the data set \texttt{Dat} (included in the package). This data set contains some of the variables collected in a real dual frame opinion survey about immigration. The survey was conducted by telephone using two sampling frames: one for landlines and another for cell phones. From the landline frame, a stratified sample of size 1919 was drawn, while from the cell phone frame, a sample of size 483 was drawn using simple random sampling without replacement. The variables included in the data set are: \texttt{Drawnby}, which takes the value 1 if the unit comes from the landline sample and the value 2 if it comes from the cell phone sample; \texttt{Stratum}, which indicates the stratum each unit belongs to (for individuals in the cell phone frame, the value of this variable is \texttt{NA}); \texttt{Opinion}, the response to the opinion question, with value 1 representing a favorable opinion about immigration and value 0 representing an unfavorable one; and \texttt{Landline} and \texttt{Cell}, which record whether the unit possesses a landline or a cell phone, respectively. First order inclusion probabilities (\texttt{ProbLandline} and \texttt{ProbCell}) are also included in the data set.

Let us look at the first three rows of the data set:

<<>>=
library(Frames2)
data(Dat)

head(Dat, 3)
@

\section{Formatting data}

From the data of this survey we wish to estimate the number of people with a favorable opinion regarding immigration. To use the functions of \texttt{Frames2}, we need to split this data set. The variables we will use to do so are \texttt{Drawnby}, \texttt{Landline} and \texttt{Cell}. The first step is to split the original data set into four new data sets, each one corresponding to one domain.

<<>>=
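# Columns are referenced through Dat$ rather than attach(Dat), so that
# no variables are left masked on the search path.

# Units with a landline only
DomainOnlyLandline <- Dat[Dat$Landline == 1 & Dat$Cell == 0, ]
# Units with both phones, drawn from the landline sample
DomainBothLandline <- Dat[Dat$Drawnby == 1 & Dat$Landline == 1 &
                            Dat$Cell == 1, ]
# Units with a cell phone only
DomainOnlyCell <- Dat[Dat$Landline == 0 & Dat$Cell == 1, ]
# Units with both phones, drawn from the cell phone sample
DomainBothCell <- Dat[Dat$Drawnby == 2 & Dat$Landline == 1 &
                        Dat$Cell == 1, ]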
@ 

Then, from the domain data sets, we can easily build the frame data sets:

<<>>=
FrameLandline <- rbind(DomainOnlyLandline, DomainBothLandline)
FrameCell <- rbind(DomainOnlyCell, DomainBothCell)
@

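Finally, we only need to label the domain of each unit using \texttt{"a"}, \texttt{"b"}, \texttt{"ab"} or \texttt{"ba"}. Here \texttt{"a"} and \texttt{"b"} denote landline-only and cell-only units, while \texttt{"ab"} and \texttt{"ba"} denote units with both phones drawn from the landline sample and from the cell phone sample, respectively: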

<<>>=
Domain <- c(rep("a", nrow(DomainOnlyLandline)), rep("ab", 
            nrow(DomainBothLandline)))
FrameLandline <- cbind(FrameLandline, Domain)

Domain <- c(rep("b", nrow(DomainOnlyCell)), rep("ba", 
            nrow(DomainBothCell)))
FrameCell <- cbind(FrameCell, Domain)
@
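
The same labelling can also be obtained in a single pass over the data. The chunk below is only a sketch on a small invented data set: \texttt{LabelDomain} is a hypothetical helper that is not part of \texttt{Frames2}, the object \texttt{Toy} and its values are made up for illustration, and the helper assumes that every unit owns the kind of phone through which it was sampled.

<<>>=
# Hypothetical helper (not part of Frames2): derive the domain label of
# each unit from the sample it was drawn from and the phones it has.
# Assumes a unit drawn from a frame always owns that kind of phone.
LabelDomain <- function(drawnby, landline, cell) {
  both <- landline == 1 & cell == 1
  ifelse(drawnby == 1,
         ifelse(both, "ab", "a"),
         ifelse(both, "ba", "b"))
}

# Toy data invented for illustration: three units per sample
Toy <- data.frame(Drawnby  = c(1, 1, 1, 2, 2, 2),
                  Landline = c(1, 1, 1, 0, 1, 0),
                  Cell     = c(0, 1, 1, 1, 1, 1))
Toy$Domain <- LabelDomain(Toy$Drawnby, Toy$Landline, Toy$Cell)

# One data set per frame, split by the sample each unit comes from
ToyFrames <- split(Toy, Toy$Drawnby)
ToyFrameLandline <- ToyFrames[["1"]]
ToyFrameCell <- ToyFrames[["2"]]

# Cross-tabulation as a check: labels "a" and "ab" should appear only
# in the landline sample, labels "b" and "ba" only in the cell sample
table(Toy$Drawnby, Toy$Domain)
@

Unlike the step-by-step construction, this variant keeps the units in their original order within each frame, which makes no difference as long as the variable of interest, the inclusion probabilities and the domain labels are taken from the same data set.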

Now dual frame estimators, such as the Hartley (1962, 1974) estimator, can be computed:

<<>>=
Hartley(FrameLandline$Opinion, FrameCell$Opinion, 
        FrameLandline$ProbLandline, FrameCell$ProbCell, 
        FrameLandline$Domain, FrameCell$Domain)
@ 
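
For reference, and as a sketch in notation that is assumed here rather than defined elsewhere in this document, the Hartley estimator of a total combines the estimated totals of the four domains as
\[
\hat{Y}_H = \hat{Y}_a + \theta\,\hat{Y}_{ab} + (1-\theta)\,\hat{Y}_{ba} + \hat{Y}_b, \qquad \theta \in [0,1],
\]
where $\hat{Y}_a$ and $\hat{Y}_{ab}$ are expansion estimators computed from the landline sample units labelled \texttt{"a"} and \texttt{"ab"}, $\hat{Y}_b$ and $\hat{Y}_{ba}$ are their counterparts from the cell phone sample, and each observation is weighted by the inverse of its first order inclusion probability. The constant $\theta$ distributes the weight of the overlap domain between the two samples; Hartley proposed choosing it so as to minimize the variance of the estimator. Since \texttt{Opinion} is a 0/1 indicator, the estimated total is an estimate of the number of people with a favorable opinion.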

\begin{thebibliography}{99}

\bibitem{Arcos2015} Arcos, A., Molina, D., Rueda, M. and Ranalli, M. G. (2015). \emph{Frames2: A Package for Estimation in Dual Frame Surveys}. The R Journal, 7(1), 52--72.

\bibitem{Hartley1962} Hartley, H. O. (1962). \emph{Multiple Frame Surveys}. Proceedings of the American Statistical Association, Social Statistics Section, 203--206.

\bibitem{Hartley1974} Hartley, H. O. (1974). \emph{Multiple frame methodology and selected applications}. Sankhy\={a}, Series C, 36, 99--118.

\end{thebibliography}

\end{document}