---
title: "ae_attendances dataset"
author:
- '[Tom Jemmett][tj_email], [The Strategy Unit][su_web]'
date: "15/08/2019"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ae_attendances dataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(tidyverse.quiet = TRUE)
```

This vignette explains how to use the `ae_attendances` dataset in R, and also
details where it comes from and how it is generated.

The data is sourced from [NHS England Statistical Work Areas][nhs_swa]
and is available under the [Open Government Licence v3.0][ogl_v3].

The data contains all reported A&E attendances for the period April 2016 through
March 2019

The dataset contains:

- __period__: the month that this activity relates to, stored as a date (1st of each month)
- __org_code__: the [ODS][ods]
code for the organisation that this activity relates to
- __type__: the [Department Type][ae_types]
for this activity, either 1, 2, or other
- __attendances__: the number of attendances for this department type at this organisation
for this month
- __breaches__: the number of attendances that breached the 4 hour target
- __admissions__: the number of attendances that resulted in an admission to the
hospital

First let's load some packages and the dataset and show the first 10 rows of data.

```{r load data, message=TRUE, warning=TRUE}
library(knitr)
library(scales)
library(ggrepel)
library(lubridate)
library(dplyr)
library(forcats)
library(tidyr)

library(NHSRdatasets)

data("ae_attendances")

# format for display
ae_attendances %>%
  # set the period column to show in Month-Year format
  mutate_at(vars(period), format, "%b-%y") %>% 
  # set the numeric columns to have a comma at the 1000's place
  mutate_at(vars(attendances, breaches, admissions), comma) %>%
  # show the first 10 rows
  head(10) %>%
  # format as a table
  kable()
```

We can calculate the 4 hours performance for England as a whole like so:

```{r england performance}
england_performance <- ae_attendances %>%
  group_by(period) %>%
  summarise_at(vars(attendances, breaches), sum) %>%
  mutate(performance = 1 - breaches / attendances)

# format for display
england_performance %>% 
  # same format options as above
  mutate_at(vars(period), format, "%b-%y") %>% 
  mutate_at(vars(attendances, breaches), comma) %>%
  # this time show the performance column as a percentage
  mutate_at(vars(performance), percent) %>%
  # show the first 10 rows and format as a table
  head(10) %>%
  kable()
```

We can now plot the monthly performance

```{r england performance plot}
ggplot(england_performance, aes(period, performance)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = percent) +
  labs(x = "Month of attendance",
       y = "% of attendances that met the 4 hour standard",
       title = "NHS England A&E 4 Hour Performance",
       caption = "Source: NHS England Statistical Work Areas (OGL v3.0)")
```

We can clearly see the "Winter Pressures" where performance drops.

We can also inspect performance for the [different types of department][ae_types]:

```{r england performance by type}
ae_attendances %>%
  group_by(period, type) %>%
  summarise_if(is.numeric, sum) %>%
  mutate(performance = 1 - breaches / attendances) %>%
  ggplot(aes(period, performance, colour = type)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = percent) +
  #facet_wrap(vars(type), nrow = 1) +
  theme(legend.position = "bottom") +
  labs(x = "Month of attendance",
       y = "% of attendances that met the 4 hour standard",
       title = "NHS England A&E 4 Hour Performance",
       subtitle = "By Department Type",
       caption = "Source: NHS England Statistical Work Areas (OGL v3.0)")
```

From this it appears as if only the type 1 departments have the seasonal
drops, type 2 and "other" departments remain pretty consistent.

## What are the best and worst trusts for performance?

We could create a similar table of data for performance by each individual trust,
but it would be useful to only look at trusts that have a type 1 department as
it appears from the chart above that these departments have the largest
variation.

```{r performance_by_trust}
performance_by_trust <- ae_attendances %>%
  group_by(org_code, period) %>%
  # make sure that this trust has a type 1 department
  filter(any(type == 1)) %>%
  summarise_at(vars(attendances, breaches), sum) %>%
  mutate(performance = 1 - breaches / attendances)

# format for display
performance_by_trust %>%
  mutate_at(vars(period), format, "%b-%y") %>% 
  mutate_at(vars(attendances, breaches), comma) %>%
  mutate_at(vars(performance), percent) %>%
  head(10) %>%
  kable()
```

From this table we can calculate the overall performance by each trust and then
organise the trusts by their overall performance.

```{r performance_by_trust_ranking}
performance_by_trust_ranking <- performance_by_trust %>%
  summarise(performance = 1 - sum(breaches) / sum(attendances)) %>%
  arrange(performance) %>%
  pull(org_code) %>%
  as.character()

print("Bottom 5")
head(performance_by_trust_ranking, 5)

print("Top 5")
tail(performance_by_trust_ranking, 5)
```

```{r performance_by_trust top 5 bottom 5 plot}
performance_by_trust %>%
  ungroup() %>%
  mutate_at(vars(org_code), fct_relevel, performance_by_trust_ranking) %>%
  filter(org_code %in% c(head(performance_by_trust_ranking, 5),
                         tail(performance_by_trust_ranking, 5))) %>%
  ggplot(aes(period, performance)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = percent) +
  facet_wrap(vars(org_code), nrow = 2) +
  theme(legend.position = "bottom") +
  labs(x = "Month of attendance",
       y = "% of attendances that met the 4 hour standard",
       title = "NHS England A&E 4 Hour Performance",
       subtitle = "Bottom 5/Top 5 over the whole 3 years",
       caption = "Source: NHS England Statistical Work Areas (OGL v3.0)")
```

## Benchmarking

It is sometimes useful to see how an organisation stacks up against all of the
other organisations. Below we create a chart where each organisation is shown as
a point, ordered by performance from left (highest performance) to right (lowest)
performance.

It's useful to indicate certain organisations on the chart, below I am showing
the 3 organisations that are at the lower quartile, median and upper quartile,
however you could change this to instead pick out specific organisations (using
a reference table and `left_join` or hard coding with `case_when`).

```{r bencmarking plot}
ae_attendances %>%
  filter(period == last(period)) %>%
  group_by(org_code) %>%
  filter(any(type == 1)) %>%
  summarise_at(vars(attendances, breaches), sum) %>%
  mutate(performance = 1 - breaches/attendances,
         overall_performance = 1 - sum(breaches)/sum(attendances),
         org_code = fct_reorder(org_code, -performance)) %>%
  #
  arrange(performance) %>%
  # lets highlight the organsiations that are at the lower and upper quartile
  # and at the median. First "tile" the data into 4 groups, then we use the
  # lag function to check to see if the value changes between rows. We will get
  # NA for the first row, so replace this with FALSE
  mutate(highlight = ntile(n = 4),
         highlight = replace_na(highlight != lag(highlight), FALSE)) %>%

  ggplot(aes(org_code, performance)) +
  geom_hline(aes(yintercept = overall_performance)) +
  geom_point(aes(fill = highlight), show.legend = FALSE, pch = 21) +
  geom_text_repel(aes(label = ifelse(highlight, as.character(org_code), NA)),
                  na.rm = TRUE) +
  scale_fill_manual(values = c("TRUE" = "black",
                               "FALSE" = NA)) +
  scale_y_continuous(labels = percent) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.line = element_line(),
        axis.ticks.y = element_line())
```

[//]: <> (URL's / References --------------------------------------------------)
[tj_email]: mailto:thomas.jemmett@nhs.net
[su_web]:   https://www.strategyunitwm.nhs.uk/ 
[nhs_swa]:  https://www.england.nhs.uk/statistics/statistical-work-areas/ae-waiting-times-and-activity/
[ogl_v3]:   https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
[ods]:      https://digital.nhs.uk/services/organisation-data-service
[ae_types]: https://web.archive.org/web/20200128111444/https://www.datadictionary.nhs.uk/data_dictionary/attributes/a/acc/accident_and_emergency_department_type_de.asp