--- title: "A Guide to hdar package" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{A Guide to hdar package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ### Important Notice ::: {style="border: 1px solid orange; padding: 10px; background-color: #fff3cd; color: #856404;"} Notice: We are currently performing maintenance and improvements on the Backend service. You may experience intermittent slow responses or minor issues. Rest assured, our team is working hard to enhance your experience. Thank you for your patience! ::: # Introduction The `hdar` R package provides seamless access to the WEkEO Harmonised Data Access (HDA) API, enabling users to efficiently query, download, and process data from the HDA platform. # Accessing the HDA Service To utilize the HDA service and library, you must first register for a WEkEO account. The HDA service is available at no cost to all WEkEO users. Creating an account allows you full access to our services, ensuring you can leverage the full capabilities of HDA seamlessly. Registration is straightforward and can be completed through the following link: [Register for WEkEO](https://www.wekeo.eu/register). Once your account is set up, you will be able to access the HDA services immediately. # Setup To start using the `hdar` package, you first need to install and load it in your R environment. ```{r, eval = FALSE} # Install hdar from CRAN (if available) or GitHub # install.packages("hdar") # or # devtools::install_github("eea/hdar@develop") # Load the hdar package library(hdar) ``` # Authentication To interact with the HDA service, you need to authenticate by providing your username and password. The `Client` class allows you to pass these credentials directly and optionally save them to a configuration file for future use. If credentials are not specified as parameters, the client will read them from the `~/.hdarc` file. ## Creating and Authenticating the Client You can create an instance of the `Client` class by passing your username and password directly. TThe initialization method has an optional parameter `save_credentials` that specifies whether the provided credentials should be saved in the `~/.hdarc` configuration file. By default, `save_credential`s is set to `FALSE`. ### Example: Pass User/Password at Initialization Here is an example of how to authenticate by passing the user and password, and optionally saving these credentials: ```{r, eval=FALSE} # Define your username and password username <- "your_username" password <- "your_password" # Create an instance of the Client class and save credentials to a config file # The save_credentials parameter is optional and defaults to FALSE client <- Client$new(username, password, save_credentials = TRUE) ``` If the `save_credentials` parameter is set to `TRUE`, the credentials will be saved in the `~/.hdarc` file, making it easier to authenticate in future sessions without passing the credentials again. ### Example: Using Saved Credentials Once the credentials are saved, you can initialize the Client class without passing the credentials. The client will read the credentials from the `~/.hdarc` file: ```{r, eval=FALSE} # Create an instance of the Client class without passing credentials client <- Client$new() ``` ## Checking Authentication Once the client is created, you can check if it has been authenticated properly by calling a method `token()` that verifies authentication. For example: ```{r, eval=FALSE} # Example method to check authentication by getting the auth token client$get_token() ``` By using one of these methods, you can securely authenticate with the HDA service and start making requests. # Finding Datasets To interact with the HDA service, you will often need to find datasets available on WEkEO. The Client class provides a method called `datasets` that lists available datasets, optionally filtered by a text pattern. ## Basic Usage The basic usage of the datasets method is straightforward. You can retrieve a list of all datasets available on WEkEO by calling the `datasets` method on an instance of the `Client` class. ```{r, eval=FALSE} # Assuming 'client' is already created and authenticated # client <- Client$new() all_datasets <- client$datasets() ``` ## Filtering Datasets You can also filter the datasets by providing a text pattern. This is useful when you are looking for datasets that match a specific keyword or phrase. ```{r, eval=FALSE} # Assuming 'client' is already created and authenticated # client <- Client$new() pattern <- "Seasonal Trajectories" # client <- Client$new() pattern <- "Seasonal Trajectories" filtered_datasets <- client$datasets(pattern) # list dataset IDs sapply(filtered_datasets,FUN = function(x){x$dataset_id}) [1] "EO:EEA:DAT:CLMS_HRVPP_VPP-LAEA" "EO:EEA:DAT:CLMS_HRVPP_ST" [3] "EO:EEA:DAT:CLMS_HRVPP_ST-LAEA" "EO:EEA:DAT:CLMS_HRVPP_VPP" ``` ## Understanding the Results The datasets method returns a list containing datasets and associated information. This information may include dataset names, descriptions, and other metadata. # Creating a Query Template To search for data in the HDA service, you need to create a query template. Manually creating a query template can be tedious as it involves reading documentation and learning about possible parameters. To simplify this process, the `generate_query_template` function is provided to automate the creation of query templates for a given dataset. ## Basic Usage The `generate_query_template` function generates a template of a query for a specified dataset. This function fetches information about existing parameters, default values, etc., from the `/queryable` endpoint of the HDA service. The `generate_query_template` function generates a template of a query for a specified dataset. This function fetches information about existing parameters, default values, etc., from the `/queryable` endpoint of the HDA service. #### Example: Generating a Query Here is an example of how to generate a query template for the dataset with the ID "EO:EEA:DAT:CLMS_HRVPP_ST": ```{r, eval = FALSE} # client <- Client$new() query_template <- client$generate_query_template("EO:EEA:DAT:CLMS_HRVPP_ST") query_template { "dataset_id": "EO:EEA:DAT:CLMS_HRVPP_ST", "uid": "__### Value of string type with pattern: [\\w-]+", "productType": "PPI", "platformSerialIdentifier": "S2A, S2B", "tileId": "__### Value of string type with pattern: [\\w-]+", "productVersion": "__### Value of string type with pattern: [\\w-]+", "resolution": "10", "processingDate": "__### Value of string", "start": "__### Value of string", "end": "__### Value of string", "bbox": [ -180, -90, 180, 90 ] } ``` ## Modify and use the generated Query Template You can and should customize the generated query template to fit your specific needs. Fields starting with `__###` are placeholders indicating possible values. If these placeholders are left unchanged, they will be automatically removed before sending the query to the HDA service. To modify the query, it is often easier to transform the JSON into an R list using the `jsonlite::fromJSON()` function: ```{r, eval = FALSE} # convert to list for easier manipulation in R library(jsonlite) query_template <- fromJSON(query_template) query_template $dataset_id [1] "EO:EEA:DAT:CLMS_HRVPP_ST" $uid [1] "__### Value of string type with pattern: [\\w-]+" $productType [1] "PPI" $platformSerialIdentifier [1] "S2A, S2B" $tileId [1] "__### Value of string type with pattern: [\\w-]+" $productVersion [1] "__### Value of string type with pattern: [\\w-]+" $resolution [1] "10" $processingDate [1] "__### Value of string" $start [1] "__### Value of string" $end [1] "__### Value of string" $bbox [1] -180 -90 180 90 ``` Here is an example of how to use the query template in a search: ```{r, eval = FALSE} # set a new bbox query_template$bbox <- c(11.1090, 46.6210, 11.2090, 46.7210) # limit the time range query_template$start <- "2018-03-01T00:00:00.000Z" query_template$end <- "2018-05-31T00:00:00.000Z" query_template $dataset_id [1] "EO:EEA:DAT:CLMS_HRVPP_ST" $uid [1] "__### Value of string type with pattern: [\\w-]+" $productType [1] "PPI" $platformSerialIdentifier [1] "S2A, S2B" $tileId [1] "__### Value of string type with pattern: [\\w-]+" $productVersion [1] "__### Value of string type with pattern: [\\w-]+" $resolution [1] "10" $processingDate [1] "__### Value of string" $start [1] "2018-03-01T00:00:00.000Z" $end [1] "2018-05-31T00:00:00.000Z" $bbox [1] 11.109 46.621 11.209 46.721 ``` Once you have made the necessary modifications, you can convert the list back to JSON format with the `jsonlite::toJSON()` function. It's crucial to use the `auto_unbox = TRUE` flag when converting back to JSON. This ensures that the JSON is correctly formatted, particularly for arrays with a single element, due to the way `jsonlite` handles serialization. ```{r, eval = FALSE} # convert to JSON format query_template <- toJSON(query_template, auto_unbox = TRUE) ``` # Searching and Downloading Data To search for data in the HDA service, you can use the `search` function provided by the Client class. This function allows you to search for datasets based on a query and optionally limit the number of results. The search results can then be downloaded using the download method of the `SearchResults` class. ## Searching for Data The `search` function takes a query and an optional limit parameter, which specifies the maximum number of results you want to retrieve. The function only searches for data and does not download it. The output of this function is an instance of the `SearchResults` class. Here is an example of how to search for data using a query and limit the results to 5: ```{r, eval = FALSE} # Assuming 'client' is already created and authenticated, 'query' is defined matches <- client$search(query_template) [1] "Found 9 files" [1] "Total Size 1.8 GB" sapply(matches$results,FUN = function(x){x$id}) [1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI" [3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI" [5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI" [7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI" [9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI" ``` ## Downloading the files The `SearchResults` class has a public field `results` and a method called `download` that is responsible for downloading the found data. The `download()` function takes an output directory (which is created if it doesn't already exist) and includes an optional `force` parameter. When `force` is set to `TRUE`, the function will re-download the files even if they already exist in the output directory, overwriting the existing files. If `force` is set to `FALSE` (the default), the function will skip downloading files that already exist, saving time and bandwidth. ```{r, eval = FALSE} # Assuming 'matches' is an instance of SearchResults obtained from the search odir <- tempdir() matches$download(odir) The total size is 1.8 GB . Do you want to proceed? (Y/N): y [1] "[Download] Start" [1] "[Download] Downloading file 1/9" [1] "[Download] Downloading file 2/9" [1] "[Download] Downloading file 3/9" [1] "[Download] Downloading file 4/9" [1] "[Download] Downloading file 5/9" [1] "[Download] Downloading file 6/9" [1] "[Download] Downloading file 7/9" [1] "[Download] Downloading file 8/9" [1] "[Download] Downloading file 9/9" [1] "[Download] DONE" # Assuming 'matches' is an instance of SearchResults obtained from the search list.files(odir) [1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI.tif" [3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI.tif" [5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI.tif" [7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI.tif" [9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI.tif" unlink(odir,recursive = TRUE) ```