---
title: "mpathr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mpathr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

  
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(mpathr)
```

The main goal of `mpathr` is to provide functions for importing data from the m-Path
platform, as well as functions for common manipulations of Experience Sampling
Methodology (ESM) data.

## Importing m-Path data

To show how to import data using `mpathr`, we provide example data within 
the package:

```{r show m-Path example data}
mpath_example()
```

As shown above, the package comes with an example of the `basic.csv` file that can be
exported from the m-Path platform.

To read this data into R, we can use the `read_mpath()` function. In addition to
the data file, it needs a path to the meta data, a file that describes the data
type of each column as well as the possible responses for categorical columns.

The main advantage of using `read_mpath()`, as opposed to other functions like 
`read.csv()`, is that `read_mpath()` uses the meta data to correctly interpret the
data types. Furthermore, it automatically converts columns that store 
multiple responses into lists. For a response with multiple options like `1,4,6`,
`read_mpath()` stores a list containing each number, which facilitates further 
preprocessing of these responses.
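
To illustrate what such a list column looks like, the following sketch splits
comma-separated responses by hand in base R. It uses made-up values, the names
`toy_responses`, `toy_choices` and `toy_df` are hypothetical, and it is not
meant to reflect how `read_mpath()` is implemented:

```{r sketch list column}
# Made-up multiple-choice responses, stored as comma-separated strings
toy_responses <- c("1,4,6", "2", "3,4")

# Split each string on commas and convert the pieces to integers
toy_choices <- lapply(strsplit(toy_responses, ",", fixed = TRUE), as.integer)

# Store the result as a list column in a data frame
toy_df <- data.frame(beep = 1:3)
toy_df$choices <- toy_choices

# Check, per row, whether option 4 was among the selected options
toy_df$chose_4 <- vapply(toy_df$choices, function(x) 4L %in% x, logical(1))

toy_df
```

With the responses stored as vectors inside a list, questions such as "was
option 4 selected?" no longer require string matching.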

We can obtain the paths to the example basic data and meta data 
using the `mpath_example()` function: 

```{r use read_mpath}
# find paths to example basic and meta data:
basic_path <- mpath_example(file = "example_basic.csv")
meta_path <- mpath_example(file = "example_meta.csv")

# read the data
data <- read_mpath(
  file = basic_path,
  meta_data = meta_path
)

data
```

### Saving m-Path data

The resulting data frame will contain list columns,
which can be problematic when saving the data to a flat file such as CSV. To save
the data, we suggest the following two options:

If you want to save the data as a comma-separated values (CSV) file to use it in another program, 
use `write_mpath()`. This function collapses most list columns to a single string and converts
all character columns to JSON strings, essentially reversing the operations performed by 
`read_mpath()`. Note that this does not guarantee that the data can be read back using
`read_mpath()`, because the data may have been modified and thus no longer match the meta data.

```{r write data as csv, eval = FALSE}
write_mpath(
  x = data,
  file = "data.csv"
)
```
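
To make the idea of collapsing a list column more concrete, here is a minimal
base R sketch with made-up data. The names `toy_multi`, `toy_flat` and
`toy_csv` are hypothetical. This is not how `write_mpath()` works internally,
and it does not produce the JSON strings mentioned above:

```{r sketch collapse list column}
# Made-up data frame with one list column
toy_multi <- data.frame(beep = 1:2)
toy_multi$choices <- list(c(1L, 4L, 6L), 2L)

# Collapse each list element into a single comma-separated string
toy_flat <- toy_multi
toy_flat$choices <- vapply(toy_multi$choices, paste, character(1), collapse = ",")

# The flattened data frame can now be written to a temporary CSV file
toy_csv <- tempfile(fileext = ".csv")
write.csv(toy_flat, toy_csv, row.names = FALSE)
readLines(toy_csv)
```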

Otherwise, if the data will be used exclusively in R, we suggest saving it as an R object
(`.RData` or `.RDS`):

```{r write data as an R object, eval = FALSE}
# As an .RData file. `load()` restores the object under its original name (`data`)
# in the environment from which it is called.
save(
  data, 
  file = 'data.RData'
)

# As an RDS file. `readRDS()` returns the object, so it can be assigned to any name.
saveRDS(
  data, 
  file = 'data.RDS'
)
```
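
The following sketch, using a made-up data frame and a temporary file, shows
why an RDS file is convenient for data with list columns: the object comes back
unchanged. The names `toy_saved`, `toy_rds` and `toy_restored` are hypothetical:

```{r sketch rds round trip}
# Made-up data frame with one list column
toy_saved <- data.frame(beep = 1:2)
toy_saved$choices <- list(c(1L, 4L, 6L), 2L)

# Save to a temporary RDS file and read it back
toy_rds <- tempfile(fileext = ".RDS")
saveRDS(toy_saved, file = toy_rds)
toy_restored <- readRDS(toy_rds)

# Compare the restored object with the original
identical(toy_saved, toy_restored)

# The list column is still a list
is.list(toy_restored$choices)
```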

## Obtaining response rates

### response_rate function

Many common operations on Experience Sampling Methodology (ESM) data involve 
the participants' response rate. The `response_rate()` function 
calculates the response rate per participant, either for the entire duration of the 
study or for a specific time frame.

The function takes a `valid_col` argument, a logical column that 
stores whether each beep was answered by the participant, and a 
`participant_col` argument, a column that identifies each distinct participant.

We will show how to use this function with `example_data`, which contains data from the same 
study as the `example_basic.csv` file, but after some cleaning.

```{r calculate response rate}
example_data

response_rates <- response_rate(
  data = example_data,
  valid_col = answered,
  participant_col = participant
)

response_rates
```

The function returns a data frame with:

* The `participant` column, as specified in `participant_col`.
* The `number_of_beeps` used to calculate the response rate.
* The `response_rate` column, which is the proportion of valid responses 
  (specified in `valid_col`) per participant.
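
To make these columns concrete, here is a small base R sketch that computes the
same kind of summary by hand on made-up data. The names `toy_beeps`, `toy_n`,
`toy_rate` and `toy_rates` are hypothetical, and the code is only an
illustration of the definition, not the implementation of `response_rate()`:

```{r sketch manual response rate}
# Made-up beeps: participant 1 received 4 beeps, participant 2 received 2
toy_beeps <- data.frame(
  participant = c(1, 1, 1, 1, 2, 2),
  answered = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Number of beeps per participant
toy_n <- tapply(toy_beeps$answered, toy_beeps$participant, length)

# Proportion of answered beeps per participant (the mean of a logical vector)
toy_rate <- tapply(toy_beeps$answered, toy_beeps$participant, mean)

toy_rates <- data.frame(
  participant = as.numeric(names(toy_rate)),
  number_of_beeps = as.vector(toy_n),
  response_rate = as.vector(toy_rate)
)

toy_rates
```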

The output of this function can further be used to identify participants with 
low response rates:

```{r show low response rates}
response_rates[response_rates$response_rate < 0.5,]
```

We could also be interested in the participants' response rate during
a specific period of time (for example, if we think a participant's compliance
dropped significantly after a certain date). In this case, we should supply the 
function with the (otherwise optional) argument `time_col`, which should contain
times stored as `POSIXct` objects, and specify the start of the period that we are
interested in (in the format `yyyy-mm-dd` or `yyyy/mm/dd`):

```{r calculate response rate after 15th of May 2024}
response_rates_after_15 <- response_rate(
  data = example_data,
  valid_col = answered,
  participant_col = participant,
  time_col = sent,
  period_start = '2024-05-15'
)
```

This returns each participant's response rate after the 15th of May 2024:

```{r show low response rates after 15th of May 2024}
response_rates_after_15
```
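
If the timestamps in your own data are stored as character strings rather than
`POSIXct` objects, a conversion along the following lines may be needed before
they can serve as a time column. This is a base R sketch with made-up
timestamps. The names `toy_times` and `toy_after`, the timestamp format and the
`"UTC"` time zone are assumptions for illustration, and the manual filter is
not necessarily how `response_rate()` selects the period:

```{r sketch posixct filter}
# Made-up timestamps stored as character strings
toy_times <- data.frame(
  sent = c("2024-05-14 09:00:00", "2024-05-15 09:00:00", "2024-05-16 18:30:00"),
  stringsAsFactors = FALSE
)

# Convert the strings to POSIXct (format and time zone are assumptions)
toy_times$sent <- as.POSIXct(
  toy_times$sent,
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

# Keep only the rows sent on or after the 15th of May 2024
toy_after <- toy_times[toy_times$sent >= as.POSIXct("2024-05-15", tz = "UTC"), , drop = FALSE]

toy_after
```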

### plot_response_rate function

To help identify patterns such as response rates dropping over time, we also
provide the `plot_response_rate()` function, which plots the participants'
response rates:

```{r plot response rate, fig.width=7, fig.height=5}
plot_response_rate(
  data = example_data,
  time_col = sent,
  participant_col = participant,
  valid_col = answered
)
```

Note that the resulting plot can be further customized using the `ggplot2`
package:

```{r customize plot response rate plot, fig.width=7, fig.height=5}
library(ggplot2)

plot_response_rate(
  data = example_data,
  time_col = sent,
  participant_col = participant,
  valid_col = answered
) +
  theme_minimal() +
  ggtitle('Response rate over time') +
  xlab('Day in study')
```