---
title: "Project Setup and Data Extraction"
author: "Pattawee Puangchit"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    css: mystyle.css
    number_sections: true
    code_folding: show
    self_contained: false
vignette: >
  %\VignetteIndexEntry{Project Setup and Data Extraction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE, eval = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  eval = requireNamespace("GTAPViz", quietly = TRUE)
)

```


```{r Dev Period, include = FALSE, eval = FALSE}
try(devtools::load_all(".."), silent = TRUE)  # go up one level from /vignettes/

input_path <- system.file("extdata/in", package = "GTAPViz")
sl4.plot.data <- readRDS(file.path(input_path, "sl4.plot.data.rds"))
har.plot.data <- readRDS(file.path(input_path, "har.plot.data.rds"))
macro.data    <- readRDS(file.path(input_path, "macro.data.rds"))
```

This vignette outlines the complete project setup process and demonstrates the use of `auto_gtap_data`.

# Package Overview {#sec:package-overview}

This package streamlines the creation of figures and tables from **.HAR** and **.sl4** results, making academic presentations effortless. Some key features are:

- **Effortless Multi-Plot Generation** – Automatically adjusts dimensions, facets, and layout with minimal input.  

- **Smart Plot Adjustments** – Fine-tune visuals easily without manual sizing or layout tweaks.  

- **Dual Export Plot Formats** – Instantly save high-resolution PNG and PDF outputs for slides, papers, and LaTeX. 

- **Publication-Ready Pivot Tables** – Generate clean tables alongside figures—ideal for academic papers.  

- **Streamlined Styling** – Customize colors, fonts, and legends through simple, flexible options.  

- **Powerful Yet Simple** – Built on `ggplot2`, with intuitive `TRUE/FALSE` switches—no advanced coding needed.  

- **Self-Contained Help** – Includes a detailed vignette and internal help—no need to search online.

<details>
  <summary class="toggle-summary">💡 Tip for Non-GTAP Users</summary>
  
<div class="tip-box">
If you're an advanced R user or not working directly with GTAP data, this package still works for **you**.  

However, data preparation may require custom manipulation before using `GTAPViz` plotting and table functions.  

For working with `.sl4` and `.har` files, refer to my companion package: [**HARplus**](https://github.com/Bodysbobb/HARplus).

</div>

</details>

<br>
Before proceeding, ensure that `GTAPViz` is installed and loaded:

```{r package, eval = FALSE}
if (!requireNamespace("GTAPViz", quietly = TRUE)) {
  devtools::install_github("Bodysbobb/GTAPViz")
}

library(GTAPViz)
```



## Plot Types {#sec:figure-type}

`GTAPViz` provides the following main plotting functions, which together generate over 10 plot types:

1. [`comparison_plot`](#sec:comparisonplot): Compares variables across multiple experiments for selected observations from a specific dimension (e.g., region, sector).  
   Examples: `qgdp`, `ppriv`, `EV`, `GTAP Macros`, and more.

2. [`detail_plot`](#sec:detailplot): Shows a variable across one or two dimensions for each experiment.  
   Examples: `qo`, `qxw`, `qmw`, etc.

3. [`stack_plot`](#sec:stackplot): Visualizes the composition of a variable for decomposition analysis—e.g., `EV` decomposition.

<details>
  <summary class="toggle-summary">💡 Tip: Plot Catalog</summary>
  
<div class="tip-box">
  Not sure which plot to use or what it looks like?  
  Browse the full <a href="https://pattawee.shinyapps.io/gtapviz-advanced-plot-configs/" target="_blank" style="font-weight: bold; text-decoration: underline;">Plot Catalog</a> for examples and use cases.
</div>
  
</details>

---

## Tables {#sec:report-table}

`GTAPViz` includes powerful functions for generating structured pivot tables and Excel-ready tables with **interactive filters**, ideal for academic presentations and reports.

Explore the full [Table Catalog](https://pattawee.shinyapps.io/gtapviz-advanced-table-configs/) for examples.


# Project Directory {#sec:project-setting}

To use `GTAPViz` efficiently—with little to no manual adjustment—I highly recommend setting up your project directory as follows:

```{r Project Folder, eval = FALSE}
project.folder <- "your/folder/path"

# Optional: You might not need to adjust this
input.folder  <- file.path(project.folder, "in")
map.folder    <- file.path(project.folder, "map")
output.folder <- file.path(project.folder, "out")
```


<figure class="center-figure">

<div class="code-box">
```{.monospace}
📂 project.folder/ 
├── 📂 in/  # Stores all input files
├── 📂 map/ # Stores the mapping file
└── 📂 out/ # Stores all output files
```
</div>

<figcaption class="auto-fig">Example of Project Folder</figcaption>

</figure>
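
If these folders do not exist yet, they can be created from R. The snippet below is a minimal sketch in base R only; the `tempdir()` location is a placeholder assumption so the example runs anywhere, and you would substitute your own project path.

```r
# Placeholder project location (assumption): replace with your own path
project.folder <- file.path(tempdir(), "gtapviz-project")

# The three subfolders described above
subfolders <- file.path(project.folder, c("in", "map", "out"))

# Create each folder; folders that already exist are left untouched
for (folder in subfolders) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that all three folders now exist
dir.exists(subfolders)
```
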


## Input Folder {#sec:input-folder}

All `.sl4` and `.har` input files must be stored in the same folder—by default, `<input.folder>`, which refers to the `/in` directory within your project folder.

Below is an example of the expected input folder structure:

<figure class="center-figure">

<div class="code-box">
```{.monospace}
📂 in/ 
├── 📄 TAR10.sl4 
├── 📄 TAR10-WEL.har 
├── 📄 SUBT10.sl4 
├── 📄 SUBT10-WEL.har 
```
</div>

<figcaption class="auto-fig">Example of Input Folder</figcaption>

</figure>


## Mapping Folder {#sec:mapping-folder}

The mapping XLSX template is available here: [OutputMapping.xlsx](https://github.com/Bodysbobb/GTAPViz/tree/main/inst/extdata/map). It contains three main sheets: 

- [SL4File](#sec:main-sheets)

- [HARFile](#sec:main-sheets)

- [FilterData](#sec:filterdata-sheets)


### SL4File and HARFile Sheets {#sec:main-sheets}

• **"Variable"** specifies the required variable from each file.

• **"Description"** is optional for defining variables and plot titles.

• **"Unit"** is required for all figure commands.

Below is an example of a mapping sheet:

<figure class="center-figure">

<div class="table-box">
  <table>
    <tr>
      <th>Variable</th>
      <th>Description</th>
      <th>Unit</th>
    </tr>
    <tr>
      <td>qgdp</td>
      <td>Real GDP Index  </td>
      <td>percent</td>
    </tr>
    <tr>
      <td>EV</td>
      <td>Welfare Equivalents</td>
      <td>million USD</td>
    </tr>
    <tr>
      <td>ppriv</td>
      <td>Consumer Price Index</td>
      <td>percent</td>
    </tr>
    <tr>
      <td>qo</td>
      <td>Output</td>
      <td></td>
    </tr>
    <tr>
      <td>qxw</td>
      <td></td>
      <td></td>
    </tr>
  </table>
</div>

<figcaption class="auto-fig">Example of "SL4File" and "HARFile" Sheet</figcaption>

 <p class="figure-note">
    <strong>Note:</strong> The <strong>Description</strong> and <strong>Unit</strong> columns can be left empty if using 
    <a href="#sec:mapping_info">GTAP defaults</a>.
  </p>
</figure>

<div class="info-box">
<strong>Important:</strong> You must manually define both the description and unit for all <strong>additional variables</strong>, that is, variables not included in GTAPv7.
</div>


### FilterData Sheet {#sec:filterdata-sheets}

The `FilterData` sheet is **optional** and can alternatively be defined directly in R. It contains two columns used to filter data from all loaded data frames:

- **"Region"** – Keeps only the listed regions by filtering the `"REG"` column.  

- **"Sector"** – Keeps only the listed sectors by filtering the `"COMM"` and `"ACTS"` columns.

**Caveat:** This option only works with the default column names `"REG"`, `"COMM"`, and `"ACTS"`.  
For filtering by other columns, you must manually apply filters in R after importing the data.
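
As a rough illustration of such a manual filter, the sketch below subsets a toy data frame on a column other than the three defaults. The data frame, its values, and the `ENDW` column are hypothetical stand-ins for one imported result table; only base R is used.

```r
# Toy data frame standing in for one imported result table (hypothetical values)
results <- data.frame(
  REG   = c("EastAsia", "EastAsia", "SEAsia", "SEAsia"),
  ENDW  = c("Land", "Capital", "Land", "Capital"),
  Value = c(1.2, 0.8, -0.4, 0.3),
  stringsAsFactors = FALSE
)

# Keep only the rows whose ENDW entry is in the chosen set
selected_endowments <- c("Capital")
filtered <- results[results$ENDW %in% selected_endowments, , drop = FALSE]
filtered
```

The same pattern applies to any column name: replace `ENDW` and the selection vector with the column and entries you need.
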


Below is an example of the filter data sheet:

<figure class="center-figure">

<div class="table-box">
  <table>
    <tr>
      <th>Region</th>
      <th>Sector</th>
    </tr>
    <tr>
      <td>EastAsia</td>
      <td></td>
    </tr>
    <tr>
      <td>SEAsia</td>
      <td></td>
    </tr>
    <tr>
      <td>Oceania</td>
      <td></td>
    </tr>
  </table>
</div>

<figcaption class="auto-fig">Example of "FilterData" Sheet</figcaption>

 <p class="figure-note">
    <strong>Note:</strong> You may leave it empty to include all entries or redefine the order for sorting the output format.
  </p>
</figure>



# R Environment Configuration {#sec:r-setup}

This section configures experiment names, description and unit handling, and output formats for processing GTAP model results.

## Experiment Names {#sec:experiment-name}

Define `<experiment>` to specify **input file names**. The experiment name:  
  
  • Appears in plots and is added to the **"Experiment"** column. 
  
  • Sorts figures and tables based on the order of `<experiment>`. 

The following command processes the files and sorts the output with `TAR10` before `SUBT10`:

```{r Experiment Name, eval = FALSE}
experiment <- c("TAR10", "SUBT10")

# Automatically Processing These Inputs in the Input Folder 
# - TAR10.sl4 and TAR10-WEL.har  
# - SUBT10.sl4 and SUBT10-WEL.har  
```

<details>
  <summary class="toggle-summary">Note</summary>
  
<div class="tip-box">
You can include as many experiments (inputs) as needed, but a higher
number will increase processing time.

</div>

</details>
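
Before running the extraction, it can help to confirm that each experiment has its input files. The sketch below assumes the naming pattern from the example above (`<experiment>.sl4` and `<experiment>-WEL.har`); the demo folder under `tempdir()` and the names `expected` and `missing_files` are illustrative and not part of `GTAPViz`.

```r
# Placeholder input folder with demo files (assumption): use your own /in folder
input.folder <- file.path(tempdir(), "gtapviz-in")
dir.create(input.folder, showWarnings = FALSE)
invisible(file.create(
  file.path(input.folder, c("TAR10.sl4", "TAR10-WEL.har", "SUBT10.sl4"))
))

experiment <- c("TAR10", "SUBT10")

# Expected file names, following the naming pattern of the example above
expected <- c(paste0(experiment, ".sl4"), paste0(experiment, "-WEL.har"))

# Report any expected file that is not in the input folder
missing_files <- expected[!file.exists(file.path(input.folder, expected))]
missing_files
```

In this demo, `SUBT10-WEL.har` was deliberately not created, so it is the only file reported as missing.
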


## Description and Unit {#sec:mapping_info}

`<mapping_info>` controls how the **"Description"** and **"Unit"** columns are included in the output:

- `"Yes"`   → Uses descriptions and units from the mapping file.  

- `"No"`    → Excludes `"Description"` and `"Unit"`.  

- `"GTAPv7"`→ Applies default definitions and units from *GTAP Model Version 7*.  

- `"Mix"`   → Uses manual values when available; otherwise, applies GTAP defaults.  

```{r Information Structure, eval = FALSE}
mapping_info <- "Mix"
```

**Note:** GTAP defaults apply only to variables included in the GTAPv7 model. Any additional variables must be manually defined.
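
The idea behind `"Mix"` can be sketched in base R as "use the manual value if present, otherwise fall back to the default". This is only an illustration of the rule described above, not the package's internal code; the `defaults` table holds placeholder text rather than the real GTAPv7 definitions.

```r
# Manual entries, as they might appear in the mapping file (empty cells read as NA)
manual <- data.frame(
  Variable    = c("qgdp", "qo", "qxw"),
  Description = c("Real GDP Index", "Output", NA),
  Unit        = c("percent", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the built-in defaults (placeholder text only)
defaults <- data.frame(
  Variable    = c("qgdp", "qo", "qxw"),
  Description = paste("default description:", c("qgdp", "qo", "qxw")),
  Unit        = rep("default unit", 3),
  stringsAsFactors = FALSE
)

# Align defaults to the manual rows, then fill only the missing cells
idx   <- match(manual$Variable, defaults$Variable)
mixed <- manual
mixed$Description <- ifelse(is.na(manual$Description),
                            defaults$Description[idx], manual$Description)
mixed$Unit        <- ifelse(is.na(manual$Unit),
                            defaults$Unit[idx], manual$Unit)
mixed
```
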


## Output Formats {#sec:output-formats}

Select the required output formats (`"Yes"` = export, `"No"` = skip):  

- **CSV (`csv.output`)** → `"No"`  

- **STATA (`stata.output`)** → `"No"`  

- **R (`r.output`)** → `"Yes"`  

- **Text (`txt.output`)** → `"Yes"`  

Set `plot_data` to `TRUE` to also generate the data used for plotting, or to `FALSE` to export only the organized raw data.  

The following command exports all formats and generates data for plotting:  

```{r Output Formats, eval = FALSE}
csv.output <- "Yes"
stata.output <- "Yes"
r.output <- "Yes"
txt.output <- "Yes"

plot_data <- TRUE

# Convert units (optional)
# Options: "mil2bil", "bil2mil", "pct2frac", "frac2pct" — see details in `?convert_units`
sl4_convert_unit <- c("mil2bil")
har_convert_unit <- c("mil2bil")
```

<details>
  <summary class="toggle-summary">💡 Tip: Unit Conversion</summary>
  
<div class="tip-box">
You can convert result units for `.sl4` and `.har` data independently using the following options (see `?convert_units`):

- `"mil2bil"`: converts million USD to billion USD.
- `"bil2mil"`: converts billion USD to million USD.
- `"pct2frac"`: converts percentages to fractions.
- `"frac2pct"`: converts fractions to percentages.
- `NULL`: No conversion
</div>

</details>
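
For reference, the arithmetic behind the four options is shown below in plain base R. This sketch does not call `convert_units` (see its help page for the actual arguments); the input values are hypothetical.

```r
# Illustrative values (hypothetical)
ev_million_usd <- 2500
qgdp_percent   <- 1.5

ev_billion_usd <- ev_million_usd / 1000   # "mil2bil": million USD to billion USD
ev_back_to_mil <- ev_billion_usd * 1000   # "bil2mil": billion USD to million USD
qgdp_fraction  <- qgdp_percent / 100      # "pct2frac": percent to fraction
qgdp_back_pct  <- qgdp_fraction * 100     # "frac2pct": fraction to percent

c(ev_billion_usd, qgdp_fraction)
```
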


# R Configuration Summary {#sec:r-config-summary}

In summary, the entire R setup is captured in the following chunk:  

```{r Config Summary, eval = FALSE}
# 1. Project Directory
project.folder <- "your/project/folder"

# 2. Define the Input Names 
experiment <- c("TAR10", "SUBT10")

# 3. Adding Description / Unit (Yes/No/GTAPv7/Mix)
mapping_info <- "Mix"

# 4. Choosing Output: (CSV, STATA, R, TEXT) 
csv.output <- "No"    
stata.output <- "No"  
r.output <- "No"
txt.output <- "No"   

# 5. For Plotting: (TRUE/FALSE)
plot_data <- TRUE
```

<details>
  <summary class="toggle-summary">💡 Tip</summary>
  
<div class="tip-box">
 Once this process is complete, you can use the same format for future documents—saving you time and ensuring consistency.
</div>

</details>

If you followed the previous instructions, you can run the following code without modification to set up the subdirectories:

```{r Default Input, eval = FALSE}
# Default Subdirectories:
input.folder  <- file.path(project.folder, "in")
map.folder    <- file.path(project.folder, "map")
output.folder <- file.path(project.folder, "out")

# Default Mapping File:
mapping.file <- file.path(map.folder, "OutputMapping.xlsx")
sl4map     <- readxl::read_xlsx(mapping.file, sheet = "SL4File")
harmap     <- readxl::read_xlsx(mapping.file, sheet = "HARFile")
filter.map <- readxl::read_xlsx(mapping.file, sheet = "FilterData")

# Filtering Data (empty cells are read as NA, so drop them first):
selected_regions <- filter.map$Region[!is.na(filter.map$Region)]
selected_sector  <- filter.map$Sector[!is.na(filter.map$Sector)]
if (length(selected_regions) == 0) selected_regions <- NULL
if (length(selected_sector) == 0) selected_sector <- NULL
```

**Note:** If you're familiar with R and prefer a more flexible directory structure, you can customize any of these. However, you must then pass your custom values to the function explicitly.


# Automated Data Extraction {#sec:automategtap}

To streamline figure generation from GTAP results, I developed an automated data extraction method using the following command:  

```{r Preparing Data for Plot, eval = FALSE}
auto_gtap_data(
  experiment = experiment,
  process_sl4_vars = sl4map, 
  process_har_vars = harmap,
  mapping_info = mapping_info,
  sl4_mapping_info = sl4map,
  har_mapping_info = harmap,
  sl4_convert_unit = "mil2bil",
  har_convert_unit = "mil2bil",
  region_select = selected_regions,
  sector_select = selected_sector,
  subtotal_level = FALSE,
  rename_columns = TRUE,
  decimals = 4,
  project_path = project.folder,
  plot_data = plot_data,
  output_formats = list(
    "csv" = csv.output,
    "stata" = stata.output,
    "rds" = r.output,
    "txt" = txt.output))
```

# Tips

## Manual Mapping Files

Instead of using the `FilterData` sheet, you can define the filters directly in R:

```{r Manual FilterData, eval=FALSE}
selected_regions <- c("EastAsia", "SEAsia", "Oceania")
selected_sector  <- NULL
```

You can also manually create a mapping file that replicates the structure of the `SL4File` and `HARFile` sheets using the following code:

```{r Manual Mapping File Create, eval=FALSE}
mapping_df <- data.frame(
  Variable = c("qgdp", "EV", "ppriv"),
  Description = c("Real GDP Index", "Welfare Equivalents", "Consumer Price Index"),
  Unit = c("percent", "million USD", "percent"),
  stringsAsFactors = FALSE
)
```
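
A quick structural check can catch mistakes in a hand-built mapping before it is passed on. The sketch below is an optional, base-R sanity check; the names `required_cols`, `missing_cols`, and `duplicated_vars` are illustrative and not part of `GTAPViz`.

```r
# Same manual mapping as above
mapping_df <- data.frame(
  Variable = c("qgdp", "EV", "ppriv"),
  Description = c("Real GDP Index", "Welfare Equivalents", "Consumer Price Index"),
  Unit = c("percent", "million USD", "percent"),
  stringsAsFactors = FALSE
)

# The three columns used by the SL4File and HARFile sheets
required_cols <- c("Variable", "Description", "Unit")
missing_cols  <- setdiff(required_cols, names(mapping_df))
if (length(missing_cols) > 0) {
  stop("Mapping is missing column(s): ", paste(missing_cols, collapse = ", "))
}

# Each variable should appear only once
duplicated_vars <- mapping_df$Variable[duplicated(mapping_df$Variable)]
if (length(duplicated_vars) > 0) {
  warning("Duplicated variable(s): ", paste(duplicated_vars, collapse = ", "))
}
```
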


## Sorting Rules {#sec:sorting-output}

These predefined lists determine the display order of outputs, i.e., figures and tables:  

- `<experiment>` sorts the `"Experiment"` column, i.e., your input files.  

- `<selected_regions>` sorts the region column defined by GTAP; the default is `"REG"`.  

- `<selected_sector>` sorts the sector columns defined by GTAP; the defaults are `"COMM"` and `"ACTS"`.  

<details>
  <summary class="toggle-summary">💡 Tip </summary>
  
<div class="tip-box">
 To customize sorting for additional columns, see `Sorting Data` in the `Utilities` manuscript. 
</div>
</details>

For example, the following setup displays experiments in the order `TAR10`, `SUBT10` and regions in the order `EastAsia`, `SEAsia`, `Oceania`:  

- `experiment <- c("TAR10", "SUBT10")`  

- `selected_regions <- c("EastAsia", "SEAsia", "Oceania")`  
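
The general technique behind this kind of ordering is to turn the columns into factors whose levels follow the predefined lists. The sketch below shows that technique on a toy data frame in base R; it illustrates the idea only and is not taken from the package's internals, and the values are hypothetical.

```r
experiment       <- c("TAR10", "SUBT10")
selected_regions <- c("EastAsia", "SEAsia", "Oceania")

# Toy results in arbitrary order (hypothetical values)
results <- data.frame(
  Experiment = c("SUBT10", "TAR10", "SUBT10", "TAR10"),
  REG        = c("Oceania", "SEAsia", "EastAsia", "EastAsia"),
  Value      = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Factor levels carry the desired display order
results$Experiment <- factor(results$Experiment, levels = experiment)
results$REG        <- factor(results$REG, levels = selected_regions)

# Sort by experiment first, then by region
sorted <- results[order(results$Experiment, results$REG), ]
sorted
```

Because `ggplot2` also respects factor levels, the same ordering carries over to axes, facets, and legends.
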


# Sample Data

The sample data used in this vignette were generated with the GTAPv7 model
using the GTAP 11 database. For
more details, refer to the [GTAP Database Archive](https://www.gtap.agecon.purdue.edu/databases/archives.asp).