---
title: "Introduction to teal.data"
author: "NEST CoreDev"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to teal.data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

The `teal.data` package specifies the data format used in `teal` applications. 


A `teal_data` is meant to be used for reproducibility purposes. The class inherits from [`qenv`](https://insightsengineering.github.io/teal.code/latest-tag/articles/qenv.html) and we encourage to get familiar with [`teal.code`](https://insightsengineering.github.io/teal.code/latest-tag/) first. `teal_data` has following characteristics:

- It inherits from the environment and methods such as `$`, `get()`, `ls()`, `as.list()` work out of the box.
- `teal_data` is a locked environment, and data modification is only possible through the `teal.code::eval_code()` and `within.qenv()` functions.
- It stores metadata about the code used to create the data (see [reproducibility](#reproducibility)).
- It supports slicing by `[`.
- It is immutable which means that each code evaluation does not modify the original `teal_data` environment directly.
- It maintains information about relationships between datasets (see [Join Keys](#relational-data-models)).

## Quick Start

To create an object of class `teal_data`, use the `teal_data` function.
`teal_data` has a number of methods to interact with the object.

```{r, results = 'hide', message = FALSE}
library(teal.data)

# create teal_data object
my_data <- teal_data()

# run code within teal_data to create data objects
my_data <- within(
  my_data,
  {
    data1 <- data.frame(id = 1:10, x = 11:20)
    data2 <- data.frame(id = 1:10, x = 21:30)
    data3 <- data.frame(id = 1:10, x = 31:40)
  }
)

# get objects stored in teal_data
my_data[["data1"]]
my_data[["data2"]]

# limit objects stored in teal_data
my_data[c("data1", "data3")]

# get reproducible code
get_code(my_data)

# get code just for specific object
get_code(my_data, names = "data2")

# get datanames
names(my_data)

# print
print(my_data)
```

### Reproducibility

The primary function of `teal_data` is to provide reproducibility of data.
We recommend to initialize empty `teal_data`, which marks object as _verified_, and create datasets by evaluating code in the object, using `within` or `eval_code`.
Read more in [teal_data Reproducibility](teal-data-reproducibility.html).

```{r}
my_data <- teal_data()
my_data <- within(my_data, data <- data.frame(x = 11:20))
my_data <- within(my_data, data$id <- seq_len(nrow(data)))
my_data # is verified
```

### Relational data models

The `teal_data` class supports relational data.
Relationships between datasets can be described by joining keys and stored in a `teal_data` object.
These relationships can be read or set with the `join_keys` function.
See more in [join_keys](join-keys.html).

```{r}
my_data <- teal_data()
my_data <- within(my_data, {
  data <- data.frame(id = 1:10, x = 11:20)
  child <- data.frame(id = 1:20, data_id = c(1:10, 1:10), y = 21:30)
})

join_keys(my_data) <- join_keys(
  join_key("data", "data", key = "id"),
  join_key("child", "child", key = "id"),
  join_key("child", "data", key = c("data_id" = "id"))
)

join_keys(my_data)

# join_keys for limited object
join_keys(my_data["child"])
```

### Hidden objects

An object is hidden in `teal_data` if its name starts with a dot (`.`). This can be used to pass auxiliary objects in
the `teal_data` instance, without being visible in the `teal` summary and filter panel.

```{r}
my_data <- teal_data()
my_data <- within(my_data, {
  data <- data.frame(id = 1:10, x = 11:20)
  .data2 <- data.frame(id = 1:20, data_id = c(1:10, 1:10), y = 21:30)
})

ls(my_data)
names(my_data)
```