--- title: "Quickstart" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quickstart} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` *Last updated 10 February 2024* \ ### Set up a Google Cloud Services account Follow the instructions [here](https://dair.info/articles/configuration.html) for the GUI method or [here](https://dair.info/articles/gcs_cli.html) for the command line method. See also the [GCS concept cheatsheet](https://dair.info/articles/cheatsheet.html) for an overview of recommended environment variables. ### Process synchronously Pass a single-page pdf or image file to Document AI and get the output immediately: ```{r, eval=FALSE} library(daiR) ## Not run: myfile <- "sample.pdf" text <- get_text(dai_sync(myfile)) ``` ### Process asynchronously Requires [configuration of `googleCloudStorageR`](https://dair.info/articles/gcs_storage.html). Send larger batches for offline processing in three steps: #### 1. Upload files to your Google Cloud Storage bucket ```{r, eval=FALSE} ## Not run: library(googleCloudStorageR) library(purrr) my_pdfs <- c("sample1.pdf", "sample2.pdf") map(my_pdfs, ~ gcs_upload(.x, name = basename(.x))) ``` #### 2. Tell Document AI to process them: ```{r, eval=FALSE} ## Not run: resp <- dai_async(my_pdfs) dai_status(resp) # to check the progress ``` The output will be delivered to the same bucket as JSON files. #### 3. Download the JSON output and extract the text: ```{r, eval=FALSE} ## Not run: # Get a dataframe with the bucket contents contents <- gcs_list_objects() # Get the names of the JSON output files jsons <- grep("*.json", contents$name, value = TRUE) # Download them map(jsons, ~ gcs_get_object(.x, saveToDisk = basename(.x))) # Extract the text from the JSON files and save it as .txt files local_jsons <- basename(jsons) map(local_jsons, ~ get_text(.x, type = "async", save_to_file = TRUE)) ``` Assuming your pdfs were named `sample1.pdf` and `sample2.pdf`, there will now be two files named `sample1-0.txt` and `sample2-0.txt` in your working directory.