---
title: "Extending srvyr"
author: "Greg Freedman Ellis"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Extending srvyr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

if (!require(convey) | !require(laeken)) {
  knitr::opts_chunk$set(eval = FALSE)
  message("Missing convey and laeken packages. Install them to run vignette.")
}

data("eusilc", package = "laeken")
if (!exists("eusilc")) {
  knitr::opts_chunk$set(eval = FALSE)
  message("Did not find 'eusilc' data in laeken package, cannot continue.")
}
```
I don't expect this vignette to be helpful for most srvyr users; it is instead
intended for other package developers. Now that I have reworked srvyr's
non-standard evaluation to match dplyr 0.7+, non-srvyr functions can be called
from within `summarize`. This vignette describes some of the inner workings of
`summarize` so that others can extend srvyr. This is a fiddly part of srvyr, so
the guide assumes you already understand how survey objects work. If you'd like
more explanation, please let me know on
[GitHub](https://github.com/gergness/srvyr)!

This guide has also been rewritten for srvyr 1.0, as I had to rework summarize and
was unable to maintain backwards compatibility.

## Translating from survey to srvyr
srvyr implements the "survey statistics" functions from the survey package. Some examples
are `svymean`, `svytotal`, `svyciprop`, `svyquantile` and `svyratio`. In the survey
package, these all return a `svystat` object, which usually prints out the estimate and
its standard error; other estimates of the variance can be calculated from it. In srvyr,
these estimates are created inside of a `summarize` call and the variance estimates are
specified at the same time.
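
As a rough side-by-side sketch of that difference (not taken from the srvyr
sources), the chunk below computes the same mean both ways using the `apistrat`
data that ships with the survey package. The object names `des_api`, `svy_api`,
`mean_survey` and `mean_srvyr` are placeholders chosen for this illustration.

```{r}
suppressPackageStartupMessages({
  library(survey)
  library(srvyr)
})
data(api, package = "survey")

# survey: the statistic comes back as a svystat object, and other
# variance estimates are requested in follow-up calls
des_api <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, data = apistrat)
mean_survey <- svymean(~api00, des_api)
mean_survey
confint(mean_survey)

# srvyr: the estimate and its variance estimates are requested together
svy_api <- as_survey_design(apistrat, strata = stype, weights = pw)
mean_srvyr <- svy_api %>%
  summarize(api00 = survey_mean(api00, vartype = c("se", "ci")))
mean_srvyr
```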

The combination of srvyr's `group_by` and `summarize` is analogous to the `svyby`
function, which applies one of the survey statistic functions to multiple
groups. However, as of srvyr 1.0, srvyr no longer uses `svyby`. Instead,
the survey object is split into each group's survey data before the statistic is calculated.
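
A small sketch of that analogy, again on the `apistrat` data from the survey
package (the object names are placeholders for this illustration, and the
grouping variable `stype` is just a convenient choice):

```{r}
suppressPackageStartupMessages({
  library(survey)
  library(srvyr)
})
data(api, package = "survey")

# survey: svyby() applies svymean() within each level of stype
des_api <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, data = apistrat)
by_survey <- svyby(~api00, ~stype, des_api, svymean)
by_survey

# srvyr: group_by() plus summarize() expresses the same calculation
svy_api <- as_survey_design(apistrat, strata = stype, weights = pw)
by_srvyr <- svy_api %>%
  group_by(stype) %>%
  summarize(api00 = survey_mean(api00))
by_srvyr
```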


## What summarize expects
srvyr's summarize expects that the survey statistics functions will return
objects that are formatted in a particular way. Below, I'll explain some of the
functions that will help create these objects for you in most cases, but the return
value should be:

- A `srvyr_result_df` object (which is just a wrapper around a `data.frame`).
- Generally 1 row when ungrouped, or 1 row per group, though this is no longer
  required.
- Named by convention. The final names are based on the argument name from the
  summarize call, but this name can't be provided to your function. Instead, summarize
  does the renaming: your function should name the estimate "coef" (which is renamed to
  the argument name) and give every other column a suffix (such as `_se`) that will be
  appended to the argument name.
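
For illustration, an object in that shape can be built by hand. This is only a
sketch: the numbers are made-up placeholders rather than real estimates, and in
practice the helper functions described in this vignette would normally build
the data.frame for you.

```{r}
# Placeholder values only; a real function would compute these from the survey
res <- data.frame(coef = 0.5, `_se` = 0.1, check.names = FALSE)
res <- srvyr::as_srvyr_result_df(res)

# "coef" is renamed to the argument name used in summarize();
# "_se" is appended to that argument name
class(res)
names(res)
```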

## Helper functions exported by srvyr
srvyr now exports several functions that can help convert functions designed
for the survey package to this format.

- `cur_svy()` - This function, modeled after `dplyr::current_vars()`,
  is a hidden way to send the survey object to your function (by hidden, I mean
  that the user doesn't have to specify the survey in the arguments of their
  function call). To use it, call `cur_svy()` directly from
  inside your function. The survey it returns includes only the current group's
  data.
- `cur_svy_full()` - Like `cur_svy()`, but includes the full survey data instead
  of just the current group's data.
- `cur_svy_wts()` - This helper function provides access to the full-sample
  weights for the current group's data.
- `set_survey_vars()` - Many survey functions accept the variables to calculate
  a statistic on either as a formula or as a vector, but the vector version is
  often less well supported than the formula version. Since srvyr uses dplyr
  semantics, your function ends up receiving the values as vectors. This function
  adds the vector to the survey as a variable, named "\_\_SRVYR_TEMP_VAR\_\_" by
  default, so that the formula version can be used.
- `get_var_est()` - A helper function that calculates variance estimates
  like standard error (se), confidence interval (ci), variance (var),
  or coefficient of variation (cv). For functions that support it,
  there is a separate argument for design effects (to match survey's conventions).
- `as_srvyr_result_df()` - A helper function that adds the `srvyr_result_df` class
  to a `data.frame`.

Note that these functions may not work in all cases. In srvyr, I've actually had to 
write multiple versions of `get_var_est()` because of minor differences in the way survey
objects are returned. Hopefully they will help in most situations, or at least give 
you a good place to start.
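
To make the roles of these helpers more concrete, here is a sketch that runs two
of them by hand, outside of `summarize`, on the `apistrat` data from the survey
package. Inside a real extension function the survey would come from `cur_svy()`
rather than being built explicitly, and the object names here (`svy_api`,
`svy_tmp`, `stat`, `est`) are placeholders for this illustration.

```{r}
suppressPackageStartupMessages({
  library(survey)
  library(srvyr)
})
data(api, package = "survey")
svy_api <- as_survey_design(apistrat, strata = stype, weights = pw)

# Attach a plain vector to the survey under the default temporary name
svy_tmp <- set_survey_vars(svy_api, apistrat$api00)

# Use the formula interface of a survey function on the temporary variable
stat <- svymean(~`__SRVYR_TEMP_VAR__`, svy_tmp)

# Convert the svystat object into "coef" plus suffixed variance columns
est <- get_var_est(stat, c("se", "ci"))
est
```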

## Miscellaneous conventions
Two less important conventions that srvyr functions follow are:

1) snake_case function names (to better match the tidyverse)
2) Multiple choice arguments that default to the first choice (so for `vartype`, if
  nothing is specified, use only "se", not all of the options).
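
The second convention needs a little care, because `match.arg()` with
`several.ok = TRUE` would otherwise return every choice when the argument is
left at its default. A minimal base R sketch of the idiom, using a hypothetical
function name `pick_vartype` that is not part of srvyr:

```{r}
pick_vartype <- function(vartype = c("se", "ci", "var", "cv")) {
  # Fall back to the first choice only when the caller supplied nothing
  if (missing(vartype)) vartype <- "se"
  # Validate the request against the choices listed in the signature
  match.arg(vartype, several.ok = TRUE)
}

pick_vartype()
pick_vartype(c("ci", "cv"))
```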

## Example: convey::svygini -> survey_gini
That was a lot of text, so it's probably easiest to walk through an
example. The convey package provides several methods for analyzing inequality
using survey data. Its `svygini` function calculates the Gini coefficient. Here,
we'll write a srvyr version called `survey_gini`.

```{r}
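# srvyr-style wrapper around convey::svygini, to be called inside summarize()
survey_gini <- function(
  x, na.rm = FALSE, vartype = c("se", "ci", "var", "cv"), ...
) {
  # Default to only the first choice ("se") when vartype isn't specified
  if (missing(vartype)) vartype <- "se"
  vartype <- match.arg(vartype, several.ok = TRUE)
  # Attach x to the current group's survey as `__SRVYR_TEMP_VAR__`
  .svy <- srvyr::set_survey_vars(srvyr::cur_svy(), x)

  out <- convey::svygini(~`__SRVYR_TEMP_VAR__`, na.rm = na.rm, design = .svy)
  # Build the "coef" and suffixed variance columns, then add the result class
  out <- srvyr::get_var_est(out, vartype)
  srvyr::as_srvyr_result_df(out)
}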
```

And here's what this function looks like in practice:
```{r}
# Example from ?convey::svygini
suppressPackageStartupMessages({
  library(srvyr)
  library(survey)
  library(convey)
  library(laeken)
})
data(eusilc)
names(eusilc) <- tolower(names(eusilc))

# Setup for survey package
des_eusilc <- svydesign(
  ids = ~rb030, 
  strata = ~db040,  
  weights = ~rb050, 
  data = eusilc
)
des_eusilc <- convey_prep(des_eusilc)

# Setup for srvyr package
srvyr_eusilc <- eusilc %>% 
  as_survey(
    ids = rb030,
    strata = db040,
    weights = rb050
  ) %>%
  convey_prep()

## Ungrouped
# Calculate ungrouped for survey package
svygini(~eqincome, design = des_eusilc)

# Use new function from summarize
srvyr_eusilc %>% 
  summarize(eqincome = survey_gini(eqincome))

## Groups
# Calculate by groups for survey
survey::svyby(~eqincome, ~rb090, des_eusilc, convey::svygini)

# Use new function from summarize
srvyr_eusilc %>% 
  group_by(rb090) %>%
  summarize(eqincome = survey_gini(eqincome))

```