---
title: "Getting Started"
author: "Dyfan Jones"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[UTF-8]{inputenc}
---

The `RAthena` package aims to make it easier to work with data stored in [`AWS Athena`](https://aws.amazon.com/athena/). `RAthena` package attempts to provide three levels of interacting with AWS Athena:

* Low - level API: This provides more finer tuning of `AWS Athena` backend utilising the AWS SDK [`paws`](https://github.com/paws-r/paws). This includes configuring [`AWS Athena Work Groups`](https://aws.amazon.com/about-aws/whats-new/2019/02/athena_workgroups/) to assuming different roles within `AWS` when connecting to `AWS Athena`.
* [DBI interface](https://dbi.r-dbi.org/): This is the primary goal of `RAthena`, by providing a `DBI` interface to `AWS Athena`. Users are able to interact with `AWS Athena` utilising familiar functions and methods they have used for other Databases from R.
* [dplyr interface](https://dbplyr.tidyverse.org/): As `dplyr` is coming more popular, `RAthena` aims to give `dplyr` a seamless interface into `AWS Athena`.

# Installing `RAthena`:

As `RAthena` utilising the python AWS SDK `boto3`, Python 3+ is required. Please install Python 3+ either by [Python](https://www.python.org/downloads/) or [Python Anaconda](https://www.anaconda.com/products/distribution). To install `RAthena`:

```r
# cran version
install.packages("RAthena")

# Dev version
remotes::install_github("dyfanjones/RAthena")
```

Next is to install Python `boto3`. This can be done either by `RAthena`'s installation method:

```r
RAthena::install_boto()
```
Or pip method:

```
pip install boto3
```

## Python Environments:

If `RAthena` doesn't pick up `boto3` after using `install_boto()`, please consider specifying the python environment.`install_boto()` creates `RAthena` environment. This is either a Python virtual environment or a conda environment depending on your system. 

```r
library(DBI)

# Specify python conda environment and force reticulate to use it
reticulate::use_condaenv("RAthena", required = TRUE)

# Or specify python virtual environment and force reticulate to use it
reticulate::use_virtualenv("RAthena", required = TRUE)

con <- dbConnect(RAthena::athena())
```

**Note:** Python environments are not required if `boto3` is either in the root Python or if R and Python are in their own environment (for example conda environment).

## Docker Example:

To help with users wishing to run `RAthena` in a [docker](https://hub.docker.com/), a simple docker file has been created [here](https://github.com/DyfanJones/RAthena/blob/master/docker/Dockerfile). To set up the docker please refer to [link](https://aws.amazon.com/premiumsupport/knowledge-center/codebuild-temporary-credentials-docker/). For demo purposes we will use the [example docker](https://github.com/DyfanJones/RAthena/blob/master/docker/Dockerfile) and run it locally:

```console
# build docker image
docker build . -t rathena

# start container with aws credentials passed from local
docker run \
      -e AWS_ACCESS_KEY_ID="$(aws configure get aws_access_key_id)" \
      -e AWS_SECRET_ACCESS_KEY="$(aws configure get aws_secret_access_key)" \
      -e AWS_SESSION_TOKEN="$(aws configure get aws_session_token)" \
      -e AWS_DEFAULT_REGION="$(aws configure get region)" \
      -it rathena
```

When running `RAthena` in the docker environment you might be required to let `reticulate` know what python you are using.

```r
reticulate::use_python("/usr/bin/python3")

library(DBI)

con <- dbConnect(RAthena::athena(), s3_staging_dir = "s3://mybucket/")
```

# Usage:

## Low - Level API:
```r
library(DBI)
library(RAthena)

con <- dbConnect(athena())

# list all current work groups in AWS Athena
list_work_groups(con)

# Create a new work group
create_work_group(con, "demo_work_group", description = "This is a demo work group",
                  tags = tag_options(key= "demo_work_group", value = "demo_01"))
```         

## DBI:
```r
library(DBI)

con <- dbConnect(RAthena::athena())

# Get metadata 
dbGetInfo(con)

# $profile_name
# [1] "default"
# 
# $s3_staging
# [1] ######## NOTE: Please don't share your S3 bucket to the public
# 
# $dbms.name
# [1] "default"
# 
# $work_group
# [1] "primary"
# 
# $poll_interval
# NULL
# 
# $encryption_option
# NULL
# 
# $kms_key
# NULL
# 
# $expiration
# NULL
# 
# $region_name
# [1] "eu-west-1"
# 
# $boto3
# [1] "1.11.5"
# 
# $RAthena
# [1] "1.7.1"

# create table to AWS Athena
dbWriteTable(con, "iris", iris)

dbGetQuery(con, "select * from iris limit 10")
# Info: (Data scanned: 860 Bytes)
#  sepal_length sepal_width petal_length petal_width species
# 1:           5.1         3.5          1.4         0.2  setosa
# 2:           4.9         3.0          1.4         0.2  setosa
# 3:           4.7         3.2          1.3         0.2  setosa
# 4:           4.6         3.1          1.5         0.2  setosa
# 5:           5.0         3.6          1.4         0.2  setosa
# 6:           5.4         3.9          1.7         0.4  setosa
# 7:           4.6         3.4          1.4         0.3  setosa
# 8:           5.0         3.4          1.5         0.2  setosa
# 9:           4.4         2.9          1.4         0.2  setosa
# 10:          4.9         3.1          1.5         0.1  setosa
```

## dplyr:
```r
library(dplyr)

athena_iris <- tbl(con, "iris")

athena_iris %>%
  select(species, sepal_length, sepal_width) %>% 
  head(10) %>%
  collect()
# Info: (Data scanned: 860 Bytes)
# # A tibble: 10 x 3
# species  sepal_length sepal_width
# <chr>           <dbl>       <dbl>
# 1 setosa            5.1         3.5
# 2 setosa            4.9         3  
# 3 setosa            4.7         3.2
# 4 setosa            4.6         3.1
# 5 setosa            5           3.6
# 6 setosa            5.4         3.9
# 7 setosa            4.6         3.4
# 8 setosa            5           3.4
# 9 setosa            4.4         2.9
# 10 setosa           4.9         3.1
```
# Useful Links:

* [SQL](https://docs.aws.amazon.com/athena/latest/ug/functions-operators-reference-section.html)
* [AWS Athena performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/)
* [AWS Athena User Guide](https://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf)