---
title: "Working with multiple files"
author: "Tristan Mahr"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with multiple files}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE}
library("rprime")
library("knitr")
opts_chunk$set(
  comment = "#>",
  error = FALSE,
  tidy = FALSE,
  collapse = TRUE)
```

Suppose we want to load all of the Eprime files in a directory and combine the 
results into a single dataframe. 

My strategy in this scenario is to figure out what I need to do for a single 
file and then wrap those steps in a function that takes a filepath to a txt 
file and returns a dataframe. After some exploration and interactive 
programming, I come up with the following function.

```{r}
library("plyr")
reduce_sails <- function(sails_path) {
  sails_lines <- read_eprime(sails_path)
  sails_frames <- FrameList(sails_lines)
  
  # Trials occur at level 3
  sails_frames <- keep_levels(sails_frames, 3)
  sails <- to_data_frame(sails_frames)
  
  # Tidy up
  to_pick <- c("Eprime.Basename", "Running", "Module", "Sound", 
               "Sample", "Correct", "Response")
  sails <- sails[to_pick]
  running_map <- c(TrialLists = "Trial", PracticeBlock = "Practice")
  sails$Running <- running_map[sails$Running]
  
  # Renumber trials in the practice and experimental blocks separately.
  # Numerically code correct response.
  sails <- ddply(sails, .(Running), mutate, 
                 TrialNumber = seq(from = 1, to = length(Running)),
                 CorrectResponse = ifelse(Correct == Response, 1, 0))
  sails$Sample <- NULL
  
  # Optionally, one might save the processed data alongside the original via: 
  # csv <- paste0(tools::file_path_sans_ext(sails_path), ".csv")
  # write.csv(sails, csv, row.names = FALSE)
  sails
}
```

Here's a preview of what the function returns when given a filepath.

```{r}
head(reduce_sails("data/SAILS/SAILS_001X00XS1.txt"))
```

Now that the function works on one file, I can use `ldply` to apply the 
function to several files and return the results in a single dataframe. (With 
`dplyr`, I would `lapply` the function over the paths to get a list of 
dataframes and then combine them with `bind_rows`; a sketch follows the next 
chunk.)

```{r}
sails_paths <- list.files("data/SAILS/", pattern = "\\.txt$", full.names = TRUE)
sails_paths
ensemble <- ldply(sails_paths, reduce_sails)
```
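
For comparison, here is a minimal, unevaluated sketch of the `dplyr` route 
mentioned above. It assumes `dplyr` is installed and should produce the same 
rows as `ensemble`.

```{r, eval = FALSE}
# Sketch of the dplyr alternative (not run): lapply over the paths,
# then bind the resulting list of dataframes into one.
library("dplyr")
ensemble_dplyr <- bind_rows(lapply(sails_paths, reduce_sails))
```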

Finally, with all of the subjects' data contained in a single dataframe, I can 
use `ddply` plus `summarise` to compute summary scores at different levels of 
aggregation within each subject. 

```{r}
# Score trials within subjects
overall <- ddply(ensemble, .(Eprime.Basename, Running), summarise, 
                 Score = sum(CorrectResponse),
                 PropCorrect = Score / length(CorrectResponse))
overall

# Score modules within subjects
modules <- ddply(ensemble, .(Eprime.Basename, Running, Module), summarise, 
                 Score = sum(CorrectResponse),
                 PropCorrect = mean(CorrectResponse))
modules
```
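
The same summaries could also be computed with `dplyr`. A rough, unevaluated 
sketch of the module-level summary, with explicit `dplyr::` prefixes to avoid 
clashing with `plyr`'s `summarise`:

```{r, eval = FALSE}
# Unevaluated sketch: dplyr version of the module-level summary
library("dplyr")
ensemble %>%
  dplyr::group_by(Eprime.Basename, Running, Module) %>%
  dplyr::summarise(
    Score = sum(CorrectResponse),
    PropCorrect = mean(CorrectResponse))
```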