---
title: "defined"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{defined}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(dataset)
```

`defined()` creates a vector subclass of labelled. A labelled vector improves on the semantic capacity of a base R factor: in addition to value labels, it carries a long-form, human-readable label for the variable itself.

```{r}
gdp_1 <- defined(
  c(3897, 7365),
  label = "Gross Domestic Product",
  unit = "million dollars",
  definition = "http://data.europa.eu/83i/aa/GDP"
)
```

The `defined()` class extends the attributes of a labelled vector with a unit (of measure), a definition and a namespace.

```{r}
attributes(gdp_1)
cat("Get the label only: ")
var_label(gdp_1)
cat("Get the unit only: ")
var_unit(gdp_1)
cat("Get the definition only: ")
var_definition(gdp_1)
```

What happens if we try to concatenate a semantically under-specified new vector to the GDP vector?

```{r}
gdp_2 <- defined(2034, label = "Gross Domestic Product")
```

You will get an intentional error message saying that some attributes are not compatible. This guards against, for example, accidentally concatenating figures in euros with figures in dollars.

```{r, eval=FALSE}
c(gdp_1, gdp_2)
#> Error in `vec_c()`:
#> ! Can't combine `..1` <haven_labelled_defined> and `..2` <haven_labelled_defined>.
#> ✖ Some attributes are incompatible.
```

Let's define the GDP of San Marino more precisely:

```{r gdp2}
var_unit(gdp_2) <- "million dollars"
```

```{r vardef2}
var_definition(gdp_2) <- "http://data.europa.eu/83i/aa/GDP"
```

With matching unit and definition, the two vectors can now be combined:

```{r c}
summary(c(gdp_1, gdp_2))
```

```{r country}
country <- defined(
  c("AD", "LI", "SM"),
  label = "Country name",
  definition = "http://data.europa.eu/bna/c_6c2bb82d",
  namespace = "https://www.geonames.org/countries/$1/"
)
```

The point of using a namespace is that it can point to a definition of the ID column, or of any attribute column in the dataset, that is both human- and machine-readable. (Attributes in a statistical dataset are characteristics of the observations or the measured variables.)

For example, the namespace definition above points to <https://www.geonames.org/countries/AD/> in the case of Andorra, <https://www.geonames.org/countries/LI/> for Liechtenstein, and <https://www.geonames.org/countries/SM/> for San Marino.

And <http://publications.europa.eu/resource/authority/bna/c_6c2bb82d> resolves to a machine-readable definition of geographical names.

## Coerce to base R types

Coerce the labelled country vector back to a character vector:

```{r coerce-char}
as_character(country)
as_character(c(gdp_1, gdp_2))
```

And to numeric:

```{r coerce-num}
as_numeric(c(gdp_1, gdp_2))
```
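
Returning to the namespace attribute described earlier, the way the `$1` placeholder maps each country code to a URL can be illustrated with a few lines of base R. This is only a sketch: `expand_namespace()` is a hypothetical helper written for this illustration, not a function of the dataset package, and it assumes that the placeholder is simply replaced by the value of each element.

```{r namespace-sketch}
# Hypothetical helper (not part of the dataset package): substitute each
# identifier for the literal "$1" placeholder in a namespace template.
expand_namespace <- function(ids, namespace) {
  vapply(
    ids,
    function(id) sub("$1", id, namespace, fixed = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
}

# The same codes and namespace template as in the country vector above.
country_codes <- c("AD", "LI", "SM")
country_urls <- expand_namespace(
  country_codes,
  namespace = "https://www.geonames.org/countries/$1/"
)
country_urls
```

The resulting character vector holds one URL per country code, matching the three geonames.org links listed in the namespace discussion.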