---
title: "Download data"
author: "Martin Westgate & Dax Kellie"
date: '2024-11-19'
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Download data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
The `atlas_` functions are used to return data from the atlas chosen using 
`galah_config()`. They are:

-   `atlas_counts()`
-   `atlas_occurrences()`
-   `atlas_species()`
-   `atlas_media()`
-   `atlas_taxonomy()`

The final `atlas_` function---`atlas_citation()`---is unusual: It does not
return any new data, but instead provides a citation for an existing dataset 
(downloaded using `atlas_occurrences()`) with an associated DOI. The other 
functions are described below.

It is equally permissable to use the `type` argument of `galah_call()`
to specify the kind of data you want, and then retrieve the data using `collect()`. 
Here we use the `atlas_` prefix for consistency with earlier versions of galah,
and because many `atlas_` functions sometimes include shortcuts to make life
easier.


# Record counts
`atlas_counts()` provides summary counts of records in the specified atlas 
without needing to download all the records first. 


``` r
galah_config(atlas = "Australia")
# Total number of records in the ALA
atlas_counts()
```

```
## # A tibble: 1 × 1
##       count
##       <int>
## 1 146185520
```

Group and summarise record counts by specific fields using `galah_group_by()`.


``` r
galah_call() |>
  galah_group_by(kingdom) |>
  atlas_counts()
```

```
## # A tibble: 12 × 2
##    kingdom           count
##    <chr>             <int>
##  1 Animalia      113408280
##  2 Plantae        27572183
##  3 Fungi           2448600
##  4 Chromista       1057157
##  5 Protista         316541
##  6 Bacteria         113480
##  7 Archaea            4120
##  8 Virus              2382
##  9 Bamfordvirae        210
## 10 Orthornavirae       138
## 11 Viroid              104
## 12 Shotokuvirae         41
```


# Species lists
A common use case of atlas data is to identify which species occur in a specified
region, time period, or taxonomic group. `atlas_species()` is similar to 
`search_taxa()`, in that it returns taxonomic information and unique identifiers, 
but differs by returning information only on species and is far more flexible by 
supporting filtering.


``` r
species <- galah_call() |>
  galah_identify("Rodentia") |>
  galah_filter(stateProvince == "Northern Territory") |>
  atlas_species()
  
species |> head()
```

```
## # A tibble: 6 × 11
##   taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom phylum class order family genus vernacular_name
##   <chr>            <chr>        <chr>                  <chr>      <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>          
## 1 https://biodive… Pseudomys d… (Gould, 1842)          species    Animal… Chord… Mamm… Rode… Murid… Pseu… Delicate Mouse 
## 2 https://biodive… Mesembriomy… (J.E. Gray, 1843)      species    Animal… Chord… Mamm… Rode… Murid… Mese… Black-footed T…
## 3 https://biodive… Zyzomys arg… (Thomas, 1889)         species    Animal… Chord… Mamm… Rode… Murid… Zyzo… Common Rock-rat
## 4 https://biodive… Pseudomys h… (Waite, 1896)          species    Animal… Chord… Mamm… Rode… Murid… Pseu… Sandy Inland M…
## 5 https://biodive… Melomys bur… (Ramsay, 1887)         species    Animal… Chord… Mamm… Rode… Murid… Melo… Grassland Melo…
## 6 https://biodive… Notomys ale… Thomas, 1922           species    Animal… Chord… Mamm… Rode… Murid… Noto… Spinifex Hoppi…
## # ℹ abbreviated name: ¹​scientific_name_authorship
```


# Occurrence data
To download occurrence data you will need to specify an email in
`galah_config()` that has been registered to an account with your selected GBIF node. 
See more information in the [config section](#config).


``` r
galah_config(email = "your_email@email.com", atlas = "Australia")
```

Download occurrence records for *Eolophus roseicapilla*.


``` r
occ <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010,
    profile = "ALA"
  ) |>
  galah_select(institutionID, group = "basic") |>
  atlas_occurrences()
```

```
## Retrying in 1 seconds.
## Retrying in 2 seconds.
## Retrying in 4 seconds.
```

``` r
occ |> head()
```

```
## # A tibble: 6 × 9
##   recordID            scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>               <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>           
## 1 0000a928-d756-42eb… Eolophus rose… https://biodi…           -35.6             149. 2017-04-19 09:11:00 PRESENT         
## 2 0001bc78-d2e9-48aa… Eolophus rose… https://biodi…           -35.2             149. 2019-08-13 15:13:00 PRESENT         
## 3 0002064f-08ea-425b… Eolophus rose… https://biodi…           -35.3             149. 2014-03-16 06:48:00 PRESENT         
## 4 00022dd2-9f85-4802… Eolophus rose… https://biodi…           -35.3             149. 2022-05-08 08:20:00 PRESENT         
## 5 0002cc35-8d5a-4d20… Eolophus rose… https://biodi…           -35.3             149. 2015-11-01 08:00:00 PRESENT         
## 6 00030a8c-082f-44f0… Eolophus rose… https://biodi…           -35.3             149. 2022-01-06 11:47:00 PRESENT         
## # ℹ 2 more variables: dataResourceName <chr>, institutionID <lgl>
```


# Media metadata
In addition to text data describing individual occurrences and their attributes, 
ALA stores images, sounds and videos associated with a given record. Metadata on
these records can be downloaded using `atlas_media()`.


``` r
media_data <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    year == 2020,
    cl22 == "Australian Capital Territory") |>
  atlas_media()
  
media_data |> head()
```

```
## # A tibble: 6 × 19
##   media_id   recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>      <chr>    <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>           
## 1 ff8322d0-… 003a192… Eolophus rose… https://biodi…           -35.3             149. 2020-09-12 16:11:00 PRESENT         
## 2 c66fc819-… 015ee7c… Eolophus rose… https://biodi…           -35.4             149. 2020-08-09 15:11:00 PRESENT         
## 3 fe6d7b94-… 05e86b7… Eolophus rose… https://biodi…           -35.4             149. 2020-11-13 22:29:00 PRESENT         
## 4 2f4d32c0-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT         
## 5 73407414-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT         
## 6 89171c49-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT         
## # ℹ 11 more variables: dataResourceName <chr>, multimedia <chr>, images <chr>, sounds <lgl>, videos <lgl>,
## #   creator <chr>, license <chr>, mimetype <chr>, width <int>, height <int>, image_url <chr>
```

To actually download the media files to your computer, use [collect_media()].


``` r
media_data |>
  collect_media()
```

# Taxonomic trees
`atlas_taxonomy()` provides a way to build taxonomic trees from one clade down to 
another using each GBIF node's internal taxonomy. Specify which taxonomic level 
your tree will go down to with `galah_filter()` using the `rank` argument.


``` r
galah_call() |>
  galah_identify("chordata") |>
  galah_filter(rank == class) |>
  atlas_taxonomy()
```

```
## # A tibble: 19 × 4
##    name            rank      parent_taxon_concept_id                                                   taxon_concept_id  
##    <chr>           <chr>     <chr>                                                                     <chr>             
##  1 Chordata        phylum    <NA>                                                                      https://biodivers…
##  2 Cephalochordata subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  3 Tunicata        subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  4 Appendicularia  class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  5 Ascidiacea      class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  6 Thaliacea       class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  7 Vertebrata      subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  8 Agnatha         informal  https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
##  9 Myxini          informal  https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 10 Petromyzontida  informal  https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 11 Gnathostomata   informal  https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
## 12 Amphibia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 13 Aves            class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 14 Mammalia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 15 Reptilia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 16 Pisces          informal  https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 17 Actinopterygii  class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 18 Chondrichthyes  class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 19 Sarcopterygii   class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
```

# Configuring galah
Various aspects of the galah package can be customized. 

## Email
To download occurrence records, species lists or media, you will need to 
provide an email address registered with the service that you want to use 
(e.g. for the ALA you can create an account 
[here](https://auth.ala.org.au/userdetails/registration/createAccount)).

Once an email is registered, it should be stored in the config:

``` r
galah_config(email = "myemail@gmail.com")
```

## Setting your directory
By default, galah stores downloads in a temporary folder, meaning that the 
local files are automatically deleted when the R session is ended. This 
behaviour can be altered so that downloaded files are preserved by setting the 
directory to a non-temporary location.


``` r
galah_config(directory = "example/dir")
```

## Setting the download reason
ALA requires that you provide a reason when downloading occurrence data 
(via the galah `atlas_occurrences()` function). `reason` is set as 
"scientific research" by default, but you can change this using `galah_config()`. 
See `show_all(reasons)` for valid download reasons.


``` r
galah_config(download_reason_id = your_reason_id)
```

## Debugging
If things aren't working as expected, more detail (particularly about web 
requests and caching behaviour) can be obtained by setting `verbose = TRUE`.


``` r
galah_config(verbose = TRUE)
```