Get started with istat package

Alissa Lelli, Elena Gradi

library(istat)

R Introduction

istat package allows you to obtain data from Istat databases within R environment. As of September 2024, there are 2 sources of data: I.Stat and IstatData. Istat is replacing I.Stat with IstatData platform, but I.Stat can still be used as a source. Searching and downloading data sets from the new platform allows you to have access to more data sets (they can be found at https://esploradati.istat.it/databrowser/). This package allows you to search, get, filter and plot data sets. It will follow the explanation of provider related functions, then the explanation of filter and plot functions and lastly shinyIstat will be introduced.

Note that, when using get_i_stat or get_istatdata, the function may take some time to download datasets.

I.Stat

I.Stat is the old Istat data warehouse that is still accessible. Functions that retrieve data from I.Stat end with i_stat. Available functions are:

list_i_stat

This function allows you to obtain the complete list of available I.Stat data, with their ID and name. Default language is Italian (“ita”), but you can also select English as follows:

head(list_i_stat(lang = "eng"))

#> [rsdmx][INFO] Fetching 'http://sdmx.istat.it/SDMXWS/rest/dataflow/all/all/latest/' 
#>        ID                                                        Name
#> 1 101_1015                                                       Crops
#> 2 101_1030                           PDO, PGI and TSG quality products
#> 3 101_1033                                                slaughtering
#> 4 101_1039                                Agritourism - municipalities
#> 5 101_1077 PDO, PGI and TSG products:  operators - municipalities data
#> 6   101_12                                         Agricoltural prices

If you find the data that you were looking for, take note of its ID: you will need it to download it through get_i_stat.

search_i_stat

If you are looking for a specific data set, you can search it by keywords. Let’s suppose you are looking for data about ‘water’. You can search it as follows (as before, default language is Italian) as follows:

search_i_stat("water", lang = "eng")

#>[rsdmx][INFO] Fetching 'http://sdmx.istat.it/SDMXWS/rest/dataflow/all/all/latest/' 
#>      id       name                                 
#> [1,] "12_323" "Urban wastewater treatment plants"  
#> [2,] "12_340" "Water abstraction for drinkable use"
#> [3,] "12_60"  "Public water supply use" 

You decide that you want to download “Public water supply use” data set. You will need its id, which is “12_60”, and will be used as an example.

get_i_stat

get_i_stat(id_dataset = "12_60",
           start_period = NULL,
           end_period = NULL,
           recent = FALSE,
           csv = FALSE,
           xlsx = FALSE,
           lang = "both")

This code downloads the entire data set, without any filter, but you can customize it through the parameters of the function:

Note that if recent is TRUE, then both start_period and end_period has to be NULL, and viceversa.

IstatData

IstatData is the new Istat data warehouse. Functions that retrieve data from IstatData end with _istatdata. Available functions are:

Notice that in this first version data set are retrieved through URL. For this reason, list_istatdata and search_istatdata will provide agencyId and version in addition to ID and name, that will be needed to download a data set through get_istatdata.

list_istatdata

This function allows you to obtain the complete list of available IstatData data, with their agencyID, ID, version and name. Default language is Italian (“ita”), but you can also select English as follows:

head(list_istatdata(lang = "eng"))

#>   agencyId                               ID version                                   Name
#> 1      IT1                         101_1015     1.0                                  Crops
#> 2      IT1  101_1015_DF_DCSP_COLTIVAZIONI_1     1.0              Areas and production -...
#> 3      IT1 101_1015_DF_DCSP_COLTIVAZIONI_10     1.0                        Sowing forecast
#> 4      IT1  101_1015_DF_DCSP_COLTIVAZIONI_2     1.0             Areas and production - ...
#> 5      IT1                         101_1030     1.0      PDO, PGI and TSG quality products
#> 6      IT1        101_1030_DF_DCSP_DOPIGP_1     1.0                    Operators by sector

If you find the data that you were looking for, take note of its agencyId, ID and version: you will need them to download it through get_istatdata.

search_istatdata

If you are looking for a specific data set, you can search it by keywords. Let’s suppose you are looking for data about ‘water’. You can search it as follows (as before, default language is Italian) as follows:

search_istatdata("water", lang = "eng")

#>      agencyId       id                   version   name
#> [1,] "IT1" "12_323_DF_DCCV_IMPDEP_1"        "1.0" "Urban wastewater treatment plants(...)"
#> [2,] "IT1" "12_323_DF_DCCV_IMPDEP_2"        "1.0" "Urban wastewater treatment plants(...)" 
#> [3,] "IT1" "12_340_DF_DCCV_PRELACQ_1"       "1.0" "Water abstraction for drinkable use"
#> [4,] "IT1" "12_60_DF_DCCV_CONSACQUA_2"      "1.0" "Public water supply use(...)"
#> [5,] "IT1" "18_635_DF_DCCV_CENERG_8"        "1.0" "Water system-availability (...)"
#> [6,] "IT1" "18_635_DF_DCCV_CENERG_9"        "1.0" "Water system-Type (...)"
#> [7,] "IT1" "609_1_DF_DCCV_URBANENV_1"       "1.0" "Water - consumption" 
#> [8,] "IT1" "609_1_DF_DCCV_URBANENV_2"       "1.0" "Water - rationing"
#> [9,] "IT1" "82_87_DF_DCCV_AVQ_FAMIGLIE_19"  "1.0" "House costs, water and other (...)" 
#>[10,] "IT1" "83_85_DF_DCCV_AVQ_PERSONE1_211" "1.0" "Water and carbonate beverages (...)"
#>[11,] "IT1" "83_85_DF_DCCV_AVQ_PERSONE1_212" "1.0" "Water and carbonate beverages (...)"
#>[12,] "IT1" "83_85_DF_DCCV_AVQ_PERSONE1_213" "1.0" "Water and carbonate beverages (...)"
#>[13,] "IT1" "83_85_DF_DCCV_AVQ_PERSONE1_214" "1.0" "Water and carbonate beverages (...)"
#>[14,] "IT1" "9_951_DF_DCCV_CAVE_MIN_4"       "1.0" "Natural mineral waters extracted (...)"

You decide that you want to download “Public water supply use - municipalities” data set. You will need its agencyId, id and version, which are, respectively, “IT1”, “12_60_DF_DCCV_CONSACQUA_2”, and “1.0” (this data set will be used as an example).

get_istatdata

get_istatdata(agencyId = "IT1",
              dataset_id = "12_60_DF_DCCV_CONSACQUA_2",
              version = "1.0",
              start = NULL,
              end = NULL,
              recent = FALSE,
              csv = FALSE,
              xlsx = FALSE)

This code downloads the entire data set, without any filter, but you can customize it through the parameters of the function:

Note that if recent is TRUE, then both start_period and end_period has to be NULL, and viceversa. Moreover, “lang” parameter can’t be specified, since this version of the package can’t retrieve IstatData data sets’ labels.

Filter your data

The package offers you the possibility to filter data set through the function filter_istat; filter_istat_interactive is the same function but interactive. To show how they work, we will use ‘iris’ data.

filter_istat

You can filter a data set by selecting the column(s) to filter, and then selecting for which value of the column to filter the data set through datatype. In this example, we filtered for one column:

data(iris)
filter_istat(iris, columns = "Species", datatype = "setosa")

#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1           5.1         3.5          1.4         0.2  setosa
#> 2           4.9         3.0          1.4         0.2  setosa
#> 3           4.7         3.2          1.3         0.2  setosa
#> 4           4.6         3.1          1.5         0.2  setosa
#> 5           5.0         3.6          1.4         0.2  setosa
#> 6           5.4         3.9          1.7         0.4  setosa
#>  ... 

Now, let’s filter for more than one column:

data(iris)
filter_istat(iris, columns = c("Species", "Petal.Length"), datatype = c("setosa", "1.5"))

#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 4           4.6         3.1          1.5         0.2  setosa
#> 8           5.0         3.4          1.5         0.2  setosa
#> 10          4.9         3.1          1.5         0.1  setosa
#> 11          5.4         3.7          1.5         0.2  setosa
#> 16          5.7         4.4          1.5         0.4  setosa
#> 20          5.1         3.8          1.5         0.3  setosa
#> 22          5.1         3.7          1.5         0.4  setosa
#> 28          5.2         3.5          1.5         0.2  setosa
#> 32          5.4         3.4          1.5         0.4  setosa
#> 33          5.2         4.1          1.5         0.1  setosa
#> 35          4.9         3.1          1.5         0.2  setosa
#> 40          5.1         3.4          1.5         0.2  setosa
#> 49          5.3         3.7          1.5         0.2  setosa

And for more than one value per column:

filter_istat(iris, columns = c("Species","Petal.Width"),  datatype = list(c("virginica","setosa"), c("0.1","1.9")))

#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 10           4.9         3.1          1.5         0.1    setosa
#> 13           4.8         3.0          1.4         0.1    setosa
#> 14           4.3         3.0          1.1         0.1    setosa
#> 33           5.2         4.1          1.5         0.1    setosa
#> 38           4.9         3.6          1.4         0.1    setosa
#> 102          5.8         2.7          5.1         1.9 virginica
#> 112          6.4         2.7          5.3         1.9 virginica
#> 131          7.4         2.8          6.1         1.9 virginica
#> 143          5.8         2.7          5.1         1.9 virginica
#> 147          6.3         2.5          5.0         1.9 virginica

Here, the function filtered the data set ‘iris’ for the values ‘virginica’ and ‘setosa’ of the column ‘Species’ and for the values ‘0.1’ and ‘1.9’ of the column ‘Petal.Width’.

filter_istat_interactive

This function works the same as the previous one, with the difference that in this case you will be guided through the filtering process. An example:

filter_istat_interactive(iris, lang = "eng")

#> Available columns:
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> Enter the column(s) (separated by comma): Petal.Width, Species
#> Available values for column Petal.Width :
#>  [1] 0.2 0.4 0.3 0.1 0.5 0.6 1.4 1.5 1.3 1.6 1.0 1.1 1.8 1.2 1.7 2.5 1.9 2.1 2.2 2.0 2.4 2.3
#> Enter the chosen values for column Petal.Width (separated by comma): 0.1
#> Available values for column Species :
#> [1] setosa     versicolor virginica 
#> Levels: setosa versicolor virginica
#> Enter the chosen values for column Species (separated by comma): setosa
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 10          4.9         3.1          1.5         0.1  setosa
#> 13          4.8         3.0          1.4         0.1  setosa
#> 14          4.3         3.0          1.1         0.1  setosa
#> 33          5.2         4.1          1.5         0.1  setosa
#> 38          4.9         3.6          1.4         0.1  setosa

Plot your data

The function plot_interactive allows you to graphically visualize your data, and it is intended to be use with exploratory purposes only. The available plots are:

shinyIstat

shinyIstat is a shiny application which integrates the functions of the istat package in a user friendly interface. This app aims to provide a useful tool to search, get, filter and plot those data sets. Here are the main features:

Use the menu on the left to navigate through the app. Inside each panel you will find further help by simply clicking on the green question marks ?.