---
title: "Getting Started With care4cmodel"
output: rmarkdown::html_vignette
bibliography: refs_vignette.bib
vignette: >
  %\VignetteIndexEntry{Getting Started With care4cmodel}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7.15,
  fig.height = 4.5
)

set.seed(12) # Initialize random generator for constant output 
```

This vignette's purpose is to serve as a guide for getting quickly started with
the R package *care4cmodel*. While we give a short introduction the concept of
the package below, we assume that you already know why you want to use the 
package, and that you are not completely unfamiliar with the idea behind it.
Here, we will only provide scientific details if they are necessary for 
understanding how to work with the package. For a complete and detailed 
scientific presentation see 
[@biber_et_al_2024](https://doi.org/10.1016/j.compag.2024.109091).


## 1 Concept of the package

The R package *care4cmodel* has been designed in the context of quantifying 
carbon flows related to the growth and management of forests. It is, in essence,
a dynamic simulation model that is surrounded with tools to enable users to 
quickly set up and simulate scenarios and evaluate them. The current version is 
made for evaluating silvicultural concepts with a focus on opposing the CO~2~ 
uptake from wood growth to the CO~2~ emissions caused by forest operations. 

The basic idea of the model is that a forest area managed under a
given concept can be adequately represented by a set of forest development
stages, each of which covers a certain share of the total area. These area 
shares change dynamically over time due to continuous forest dynamics, but they 
can also change abruptly due to hazard events. Any further simulation results 
are basically obtained by upscaling phase-wise information to these areas. Note,
that *care4cmodel* is not a growth and yield simulator in the usual sense, i.e. 
the required basic growth and yield information relating to the concept of 
interest must be provided by the user (see 
[Section 3.1](#section_concept_definition)). While the current implementation is
constrained to providing the carbon related information as mentioned above, the 
concept of the model is generic and lends itself to a broad variety of 
applications.


## 2 Quickstart {#quickstart}

If you simply want to run and evaluate a simulation without further reading,
here's your code:

```{r quickstart, message = FALSE}
library(care4cmodel) # Attach the package

# Run a simulation and store its base results in a variable sim_base_out
# call ?simulate_single_concept for details
sim_base_out <- simulate_single_concept(
    concept_def = pine_thinning_from_above_1, # use pre-defined concept
    init_areas  = c(800, 0, 0, 0, 0, 200),
    time_span   = 200,
    risk_level  = 3
)

# Evaluate the base results for carbon related information
# call ?fuel_and_co2_evaluation for details
carbon_out <- fuel_and_co2_evaluation(sim_base_out, road_density_m_ha = 35)
```

Both result variables, `sim_base_out`, and `carbon_out` contain large objects
which are, in essence, named lists. You can easily explore them manually, but
there exist convenient functions for visualization:

```{r quickstart_plot_base}
# Plot base results. Without further specifications, this will generate a plot
# of the areas covered by the stand development phases over time

# Check the documentation with ?plot.c4c_base_result in order to see all 
# options for plotting, especially growth and yield variables
plot(sim_base_out) 

```


```{r quickstart_plot_carbon}
# Plot carbon related results. The selected option for plot_type generates a
# phase diagram where the total CO2 emissions due to forest operations are
# plotted over the CO2 sequestered by wood growth.

# Check the documentation with ?plot.c4c_co2_result in order to see all options
# for plotting
plot(carbon_out, plot_type = "em_vs_inc")

```

Great, if this got you sufficiently started. If you want more explanations, read
on.


## 3 Using care4cmodel

After installing the package on your computer, the easiest (and standard) way 
to access the functionality of the package is to attach it with
```{r setup}
library(care4cmodel)
```

### 3.1 Silvicultural concept definition {#section_concept_definition}

The fundamental requirement for running simulations with *care4cmodel* is an 
appropriate definition of the silvicultural concept of interest. Such 
definitions are objects of class *c4c_concept*, and the most convienent way to
generate one of your own is to use the function `c4c_concept()`. The main 
argument to that function is a data frame with growth and yield information, as
defined below. Many users will find it convenient to assemble this information
in a spreadsheet software like MS EXCEL, and import it into R as a data frame.
With `?c4c_concept` you can call the documentation of `c4c_concept()`. Using 
this function includes a series of automatic checks in order to ensure that the 
definition is technically correct. The package contains two pre-defined concept 
definitions, `pine_thinning_from_above_1`, and, 
`pine_no_thinning_and_clearcut_1`. We use the former for explaining the main 
content of such a definition. Simply type the name of the concept to display its
contents:

```{r example_concept}
pine_thinning_from_above_1
```

Basically, such an object of class `c4c_concept` (as created with the function 
of the same name) is a list with three elements. The first
element, `concept_name` is to be defined by the user. The second element, 
`units` lists the most important units to be used when simulating this concept.
This feature is not actively used in the current version (and is not required
to be defined by the user), i.e. all numbers connected to any silvicultural 
concept are understood in these units if not explicitly stated otherwise.
However in future versions, it is planned to be more flexible in that regard. 
The third element, `growth_and_yield` is actually the most important. It is a 
data frame that defines the stand development phases to be considered, their 
duration, and a set of fundamental growth and yield variables. Typically this 
information comes from observational plots, silvicultural guidelines, forest 
growth simulators, yield tables, and similar sources, or a mixture of these.

The number of stand development phases is totally up to the user; for most 
applications, however, four to seven phases should be sufficient. Each line of
the data frame completely defines one such phase. The phases are taken to be 
subsequent to each other in the order of the column `phase_no`. The last phase 
is followed by the first phase again, which typically means a final harvest 
followed by an initial stand. The columns of the data frame are defined as 
follows (note that, depending on your system, the contents of some columns, and
some columns themselves might be cut off in the display above):

+ **phase_no** Integer numbers that indicate the sequence of how the stand 
development phases follow after each other
+ **phase_name** User-defined names for the stand development phases, must be
unique
+ **duration** The (average) duration of each phase in years
+ **n_subphases** Integer numbers >= 1, indicating the number of subphases to 
each phase to be considered internally. This is useful and required for 
adjusting the "blur" when forest areas move through a sequence of phases. I.e.
when a stand development phase has a duration of $D$ years, and $n$ subphases,
a unit area will remain on average $D$ years in that phase, but with a variance 
of $\sigma^2=\frac{D^2}{n}$. In many cases, the rule of thumb to have at least 
three subphases and otherwise $\frac{D}{n}\approx 5$, will be adequate.
+ **vol_standing** The average standing wood volume (m³/ha) of a stand in the
phase of interest
+ **vol_remove** The wood volume (m³/ha/a) that is regularly removed (harvested)
per year and hectare from a stand in the phase of interest. This does not 
include wood that is harvested due (partial) stand destruction by hazard events
+ **vol_mort** The wood volume (m³/ha/a) that is regularly dying per year and
hectare in the phase of interest. Like vol_remove, this does not include 
mortality due to stand destruction by hazard events
+ **n_standing** The average number of standing live trees per hectare in a 
stand of the given phase
+ **n_remove** The number of trees that are removed per hectare and year due to
regular harvest in the phase of interest
+ **dbh_standing** The average mean diameter at breast height, dbh (cm), of a 
stand in the phase of interest
+ **dbh_remove** The average mean dbh (cm) of the trees being regularly 
harvested in a stand of the given phase
+ **harvest_interval** While vol_remove, and n_remove are annual numbers, the
harvest interval is the actual time between two harvest operations in the phase
of interest
+ **survival_cum** Probability (i.e. number between 0 and 1) of a stand to 
survive all occuring hazard events *since the beginning of the first phase* to 
the end of the phase of interest. Therefore, the sequence of these numbers must 
be decreasing along the sequence of phases. The sequence of probabilities given
in the concept definition is considered the baseline risk. We demonstrate below
how that baseline can be conveniently adjusted for covering other risk scenarios
+ **vol_increment** The average annual wood volume incrment (m³/ha/a) in the
phase of interest. Unlike the previous variables, vol_increment is not required
to be provided by the user when generating a concept definition with 
`c4c_concept()`. It results directly from the phase durations, the standing
volumes, the removal, and the mortality volume


### 3.2 Running a simulation {#section_running_simulation}

The workhorse function for conducting simulation runs with *care4cmodel* is
`simulate_single_concept()`. For simulating an example with the concept 
`pine_thinning_from_above_1` we use it as follows:

```{r base_simulation}
sim_base_out <- simulate_single_concept(
    concept_def = pine_thinning_from_above_1,
    init_areas  = c(1000, 0, 0, 0, 0, 0),
    time_span   = 200,
    risk_level  = 3
)
```

The first argument `concept_def` is the definition of the silvicultural 
concept to be used, it must be a valid object of class `c4c_concept` 
([Section 3.1](#section_concept_definition)). The second argument, `init_areas`,
is a vector that sets the initial areas covered by the stand development phases 
as defined in the silvicultural concept (in ascending order). In the example,
`init_areas` indicates 1000 ha in the first stand development phase, and no 
areas covered by any other phase. This can be seen as an afforestation of a
1000 ha forest area. During the simulation, the total area (1000 ha in our
example) will remain constant, while the shares of the phases will be changing.
The size of the areas and their initial distribution is fully up to the user.
Note that the initial areas can also be specified on the level of the internal
sub-phases ([Section 3.1](#section_concept_definition)). Call the function's 
documentation with `?simulate_single_concept` for this and other details.
The argument `time_span` is the number of years to cover with the simulation.
200 years, as defined here, is certainly longer than required for typical 
applications. With the argument `risk_level`, you can adjust the occurence of 
stochastic hazard events relative to the baseline risk settings made in 
`survival_cum` in the concept definition 
([Section 3.1](#section_concept_definition)). In the example, the setting 
`risk_level = 3` means that the actual damage probabilities are the same as if 
the events leading to the baseline risk would happen three times in sequence.
Consequently, `risk_level = 1` would use exactly the baseline risk; numbers 
smaller than 1 would indicate a lower risk, e.g. `risk_level = 1/2` uses 
damage probabilities as if only half of the baseline risk events happened. 
Setting `risk_level = 0` will simulate the development without any stochastic
hazard events. In our example, we store the output in a variable, we call
`sim_base_out`. Internally, the simulation is based on functionality provided by
the R package 
[deSolve](https://CRAN.R-project.org/package=deSolve) 
[@Soetart_et_al_2010].


### 3.3 Exploring the base simulation output {#explore_base_sim_out}

As its output, a simulation with `simulate_single_concept()` generates an 
object of class `c4c_base_result`. This large object comprises different
categories of information, and it has an own `plot()` function for convenient 
result visualization as we will show below. In essence, such an object is a 
named list containing several vectors, matrices, and data frames. In the example
above ([Section 3.2](#section_running_simulation)), we stored such an output 
object in the variable `sim_base_out`. For the sake of clarity, we do not 
display the entire object here. In order to get an overview, let us first 
display the names of the top level list elements:

```{r show_base_output}
names(sim_base_out)
```

The first six list elements, `concept`, `time_span`, `detailed_init`, 
`detailed_out`, `init_areas`, and `risk_level` document the settings of  
`simulate_single_concept()` that were used for the simulation. You can access 
them most conveniently by their name, e.g.

```{r base_access_init_areas}
sim_base_out$init_areas
```

The list element `parameters` contains interal simulation parameters derived 
from the user settings; most important sub-phase wise dwell times and detailed 
risk related information.

The two remaining list items `sim_areas`, and `sim_growth_and_yield` contain 
actual simulation output. Both are named lists themselves. `sim_areas` contains
three matrices named as follows:

```{r names_sim_areas}
names(sim_base_out$sim_areas)

```

All three matrices have the same structure; each line represents a point in
time, i.e. the end of a simulation year. The first column denotes these times
in years. The other columns represent the subsequent stand development phases
from left to right. The matrix `areas` simply contains each phase's area in ha
at the end of the respective simulation year. The matrix `area_inflows_regular`
represents the areas in ha that flowed into a phase from the previous phase
during the respective simulation year. Analogously, the matrix 
`area_outflows_events` contains the areas that flowed out of a given phase into
the (first subphase of the) initial phase due to hazard events. As both of the
latter matrices represent flows during a year, they contain NA values for the
initial situation, i.e. time 0.

```{r area_matrices}
head(sim_base_out$sim_areas$areas)
head(sim_base_out$sim_areas$area_inflows_regular)
head(sim_base_out$sim_areas$area_outflows_events)
```

The other remaining list element `sim_growth_and_yield` is a named list of data
frames that contain growth and yield information which was, in essence, obtained
by upscaling the growth and yield information provided in the silvicultural 
concept definition ([Section 3.1](#section_concept_definition)). We obtain an
overview with:

```{r sim_growth_and_yield}
sim_base_out$sim_growth_and_yield
```

As these data frames are mostly self explaining, let us just point out a few
details. If not stated otherwise all numbers given there represent cubic meters
wood. In the first data frame `gyield_summary`, the column `vol_rmv_cont` stands
for wood volume that is continually removed (harvested) as planned according
to the concept definition ([Section 3.1](#section_concept_definition)). 
`vol_rmv_damage` in contrast, is all wood volume of the stands that were lost
due to hazard events, assuming that this wood is harvested in its entirety. 
Consequently, `vol_rmv_total` is the sum of both, regular and damage-induced
harvest. Note that there are two variables related to volume increment, 
`vol_inc_ups`, and `vol_inc`. The former is upscaled from the concept 
definition, and this the volume increment which should be solely used for 
subsequent calculations. The latter results from a differently defined post-hoc 
calculation and is provided for internal reasons only. The data frames that make
up the sub-list `gyield_phases` give a more detailed view on the variables 
provided with `gyield_summary`, i.e. they split them among the stand development 
phases.

The last data frame, `gyield_harvest_detail` is of special importance, as it 
contains detailed growth and yield information that is required for calculating 
fuel consumption and CO~2~ emissions due to harvest operations. That is, the 
average dbh of a tree to be harvested, `dbh_cm`, the average tree volume, 
`vol_tree_m3`, the average neighbor distance of the harvest trees, 
`tree_dist_m`, and the volume to be harvested, `volume`. Note that `volume` can
be 0 even though there are non-zero values for the other variables.

As mentioned in the [quickstart section](#quickstart), there is an own plot 
function for the output objects of the function `simulate_single_concept()`. Its
documentation is accessible by calling `?plot.c4c_base_result`. The plot shown
in the quickstart section shows the default, the development of the areas
covered by the different stand development phases over time. Due to the CRAN
policies we can only provide very few other example plots; the reader is 
encouraged to try out all options listed in the documentation. First, we 
visualize the standing volume:

```{r plot_standing_volume}
plot(sim_base_out, variable = "vol_standing") # standing volume

```

Second, we plot the total harvested volume. The sharp peaks result from wood 
that is removed due to hazard events.

```{r plot_vol_rmv_total}
plot(sim_base_out, variable = "vol_rmv_total") # total harvested volume

```



### 3.4 Calculating carbon related results

In order to obtain the fuel consumption, and subsequently the CO~2~ emissions
caused by forest operations, the package provides the function 
`fuel_and_co2_evaluation`. The forest operations taken into account comprise
cutting, moving the timber to the forest road (forwarding, skidding), and 
maintenance of the forest road network. A typical call of 
`fuel_and_co2_evaluation` in the context of our example is shown below. Note 
that in contrast to the [quickstart section](#quickstart), we explicitly call 
parameters with default values for demonstration.

```{r fuel_and_co2, message = FALSE}
# Calculate information about fuel consumption
# call documentation with ?fuel_and_co2_evaluation for detail information
carbon_out <- fuel_and_co2_evaluation(
  sim_base_out,             # object obtained from simulate_single_concept
  road_density_m_ha = 35,   # forest road density
  raw_density_kg_m3 = 520,  # default setting, wood density
  harvest_loss      = 0.1,  # default, share of volume lost at harvest
  bark_share        = 0.12, # default, bark share of volume
  mode  = "standard" # default, choice is between "standard" and "nordic"
)
```

The only parameters, the user must provide in any case, are an object of class
`c4c_base_result` as obtained from `simulate_single_concept()` and the forest 
road density (m/ha) in the area of interest (relating to truck roads). The 
parameter `mode` allows to choose the assumptions for the harvest operations.
Currently, there are only two options, "standard", and "nordic". In both cases,
it is assumed that the wood is felled and cut by a harvester, and transported
to the forest road with a forwarder. With the option "standard", the model uses
functions estimated after @Bacescu_et_al_2022 and @grigolato_et_al_2022. With 
the option "nordic", we apply functions provided by @karha_et_al_2023, instead. 
As the cutting under the option "standard" does only take into account harvest 
operations with stem diameters at breast height of at least 15 cm, it uses the 
cutting function by @karha_et_al_2023 in case of smaller lots. The estimate for 
fuel consumption due to forest road maintenance is estimated after 
@enache_stampfer_2015 with both options. For details about the other parameters 
call `?fuel_and_co2_evaluation`. 

The output of the function `fuel_and_co2_evaluation()` is an object of class
`c4c_co2_result`. It is, in essence, a named list containing the following
elements:

```{r names_carbon_results}
names(carbon_out)
```

Similar as for the base simulation results described in 
[Section 3.3](#explore_base_sim_out), the first six list elements document the
settings used for the calculations. The three remaining items, `co2_agg_high`,
`co2_agg_medium`, `co2_by_phases` are all data frames that contain the actual 
outcomes, each on a different level of aggregation. We show them below, but
note, that depending on your machine, some columns will be probably cut off and
only mentioned under *more variables* underneath the data frame. The variable
names should be self-explaining, note that for all operations we provide both,
the fuel consumption and the CO~2~ emissions.

```{r dataframes_carbon_results}
carbon_out$co2_agg_high
carbon_out$co2_agg_medium
carbon_out$co2_by_phases
```

As a tool for facilitating visualization, we also provide a plot functions for 
the output objects of the function `fuel_and_co2_evaluation()`. Call
`?plot.c4c_co2_result` for a full documentation. In the 
[quickstart section](#quickstart), we have provided an example plot, where the
CO~2~ emissions were plotted against the wood increment in CO~2~ equivalents.
Due to the CRAN policies, we cannot show the full set of possible diagrams in
this vignette; the user is encouraged to check the documentation and try out
the available options. First, we show for our example the total CO~2~ emissions
split into the components "cutting", "moving", and "road maintenance:

```{r plot_em_by_type}
plot(carbon_out, plot_type = "em_by_type")
```

Second, plotting the ratio of all CO~2~ emissions and the wood increment (in 
CO~2~ equivalents). Note that, in general, the emissions are between three and 
two magnitudes smaller compared to the increment.

```{r plot_em_inc_ratio}
plot(carbon_out, plot_type = "em_inc_ratio")
```


## 4 Further functions

When calling the documentation of any function contained in the package 
*care4cmodel*, e.g. with `?simulate_single_concept` and scrolling down to the 
bottom of the page, there is a link called "Index". Following this link leads 
to a complete list of documented functions the package provides for users. 
Typically, most of these functions are internally used in the workflow described 
above. Advanced users, however, might find some useful tools among them.


## 5 Acknowledgment

This R package has been developed as a part of the 
[CARE4C](http://www.care4c.eu/) project that has received funding from the 
European Union’s HORIZON 2020 research and innovation programme under the Marie 
Skłodowska-Curie grant agreement # 778322.


## 6 References