\documentclass[article,nojss]{jss}

\DeclareGraphicsExtensions{.pdf}

\usepackage{amsmath,amssymb}
\usepackage{textcomp}%has degree celsius

\renewcommand{\textfraction}{0.2}
\renewcommand{\bottomfraction}{0.7}
\renewcommand{\topfraction}{0.7}
\renewcommand{\floatpagefraction}{0.8}

%Numbering of sections
\setcounter{secnumdepth}{3}
%Numbering within TOC
\setcounter{tocdepth}{3}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\author{Thomas Petzoldt\\ Technische Universit\"at Dresden}
\Plainauthor{Thomas Petzoldt} %% comma-separated

\title{\pkg{simecol}-Howto: Tips, Tricks and Building Blocks}
\Plaintitle{simecol-Howto: Tips, Tricks and Building Blocks}
% \VignetteIndexEntry{Simecol-Howto: Tips, Tricks and Building Blocks}

\Shorttitle{simecol-Howto} %% a short title (if necessary)


%% an abstract and keywords
\Abstract{
  \flushleft %% workaround for the report style

  This document is a loose collection of chapters that describe
  different aspects of modelling and model implementation in
  \proglang{R} with the \pkg{simecol} package.  It supplements the
  original publication of \citet{Petzoldt2007}, an updated version of
  which, \textbf{simecol-introduction}, is also part of this
  package. Please refer to this publication when citing this work.
}

\Keywords{\proglang{R}, \pkg{simecol}, ecological modeling,
  object-oriented programming (OOP), compiled code, debugging}

\Plainkeywords{R, simecol, ecological modeling, compiled code,
  debugging}

\Address{
  Thomas Petzoldt\\
  Institut f\"ur Hydrobiologie\\
  Technische Universit\"at Dresden\\
  01062 Dresden, Germany\\
  E-mail: \email{thomas.petzoldt@tu-dresden.de}\\
  URL: \url{https://tu-dresden.de/Members/thomas.petzoldt/}\\
}

%% need no \usepackage{Sweave.sty}

\SweaveOpts{engine=R, eps=FALSE, fig=FALSE, echo=TRUE, include=FALSE, prefix.string=howto, width=6, height=4}


\begin{document}

<<init,echo=FALSE, include=FALSE, fig=FALSE>>=
library("simecol")
data(lv, package = "simecol")
options("width"=72)
options("prompt" = "R> ", "continue" = "+  ")
@


\tableofcontents

%% ======================================================================
\section{The Basics}

\subsection[Building simecol objects]{Building \pkg{simecol} objects}

The intention behind \pkg{simecol} is the construction of
``all-in-one'' model objects. That is, everything that defines one
particular model, equations and data alike, is stored together in one
\code{simObj} (spoken: sim-Object), and only some general algorithms
(e.g. differential equation solvers or interpolation routines) remain
external, preferably as package functions, e.g. function \code{lsoda}
in the package \pkg{deSolve} \citep{deSolve_jss}, or as functions in the
user workspace.

This strategy has three main advantages:

\begin{enumerate}
\item We can have several independent versions of one model in the
  computer memory at the same time. These instances may have different
  settings, parameters and data, or even use different formulas, but
  they do not interfere with each other.  Moreover, if all data and
  functions are \emph{encapsulated} in their \code{simObj}ects,
  identifiers can be re-used, and it is, for example, not necessary to
  keep track of a large number of variable names or to invent new
  identifiers for the parameter sets of different scenarios.
\item We can give \code{simObj}ects away, either in binary form or as
  source code objects. Everything essential to run such a model is
  included, not only the formulas but also defaults for parameters and
  data. You or your users need only \proglang{R}, some packages and
  your model object. It is also possible to start model objects
  directly from the internet or, on the other hand, to distribute
  model collections as \proglang{R} packages.
\item All \code{simObj}ects can be handled, simulated and modified
  with the same generic functions, e.g. \code{sim}, \code{plot} or
  \code{parms} (see the small sketch after this list).  Our users can
  start playing with the models without the need to understand all the
  internals.
\end{enumerate}
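
As a small sketch of this uniform interface, consider the
Lotka-Volterra example object \code{lv} that comes with the package
(not evaluated here):

<<eval=FALSE>>=
## two independent instances of the same model with different parameters
lv1 <- lv2 <- lv
parms(lv2)["k1"] <- 0.4
lv1 <- sim(lv1)
lv2 <- sim(lv2)
plot(lv1)
plot(lv2)
@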

While it is, of course, preferable to have all parts of a model
encapsulated in one object, it is not mandatory to have the complete
working model object before starting to use \pkg{simecol}.

\pkg{simecol} models (in the following called \code{simObj}ects) can
be built step by step, starting with mixed applications composed of
rudimentary \code{simObj}ects and ordinary user-space functions. When
everything works, you should encapsulate all the main parts of your
model in the \code{simObj}ect to get a clean object that does not
interfere with others.

\subsubsection{An example}

We start with the example given in the simecol-introduction
\citep{Petzoldt2007}, an implementation of the UPCA model of
\citet{Blasius1999}, but we write it in the usual \pkg{deSolve} style,
i.e. without using \pkg{simecol}:

<<upca1>>=
f <- function(x, y, k){x*y / (1+k*x)}  # Holling II

func <- function(time, y, parms) {
  with(as.list(c(parms, y)), {
    du <-  a * u           - alpha1 * f(u, v, k1)
    dv <- -b * v           + alpha1 * f(u, v, k1) +
                           - alpha2 * f(v, w, k2)
    dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
    list(c(du, dv, dw))
  })
}

times  <- seq(0, 100, 0.1)
parms  <- c(a=1, b=1, c=10, alpha1=0.2, alpha2=1, k1=0.05, k2=0, wstar=0.006)
y      <- c(u=10, v=5, w=0.1)
@

The model is defined by five objects in the \proglang{R} user
workspace, namely \code{f}, \code{func}, \code{times}, \code{parms}
and \code{y}.  The implementation is similar to the help page examples
of package \pkg{deSolve}, and we can solve it in exactly the same
manner:

<<upca2, fig=TRUE>>=
library(deSolve)
out <- lsoda(y, times, func, parms)
matplot(out[,1], out[,-1], type="l")
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca2}
\caption{\label{howto-upca2} Output of the UPCA model, solved with \code{lsoda}
from package \pkg{deSolve}.}
\end{figure}


\subsubsection{Transition to simecol} \label{s4definition}

If we compare this example with the \pkg{simecol} structure, we see
that they are quite similar. This is no coincidence, because the
notation of both \pkg{deSolve} and \pkg{simecol} is based on the
state-space notation of control theory\footnote{see
  \url{https://en.wikipedia.org/wiki/State_space_(controls)}, last access
  2018-05-07}.

Due to this, only a little restructuring and renaming is needed to form a
\code{simObj}:

<<upca3, keep.source=TRUE>>=
library("simecol")
f <- function(x, y, k){x*y / (1+k*x)}  # Holling II

upca <- new("odeModel",
  main = function(time, y, parms) {
    with(as.list(c(parms, y)), {
      du <-  a * u           - alpha1 * f(u, v, k1)
      dv <- -b * v           + alpha1 * f(u, v, k1) +
                             - alpha2 * f(v, w, k2)
      dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
      list(c(du, dv, dw))
    })
  },
  times  = seq(0, 100, 0.1),
  parms  = c(a=1, b=1, c=10, alpha1=0.2, alpha2=1,
    k1=0.05, k2=0, wstar=0.006),
  init   = c(u=10, v=5, w=0.1),
  solver = "lsoda"
)
@

We may notice that the assignment operators ``\code{<-}'' changed to
declarative equal signs ``\code{=}'' for the slot definitions, that
some of the names (\code{y}, \code{func}) were changed to the
pre-defined slot names of \pkg{simecol}, and that all the slot
definitions are now comma-separated arguments of the \code{new}
function that creates the \code{upca} object. The solver method
\code{lsoda} is given as a character string pointing to the
original \code{lsoda} function in package \pkg{deSolve}.
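
The slots of the new object can later be queried and replaced with
accessor functions of the same names. A small sketch (not evaluated
here):

<<eval=FALSE>>=
solver(upca)             # which solver is currently assigned?
solver(upca) <- "rk4"    # e.g. switch to the classical Runge-Kutta scheme
parms(upca)["wstar"]     # slots behave like ordinary R data structures
@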

The new object can now be simulated very easily with the \code{sim}
function of \pkg{simecol} that returns the object with all original
slots and one additional slot \code{out} holding the output values.  A
generic \code{plot} function is also available for basic plotting of
the output:

<<upca4>>=
upca <- sim(upca)
plot(upca)
@

It is now also possible to extract the results from \code{upca} with
the so-called accessor function \code{out}, and to use arbitrary,
user-defined plot functions:

<<upca5, fig=TRUE>>=
plotupca <- function(obj, ...) {
  o <- out(obj)
  matplot(o[,1], o[,-1], type="l", ...)
  legend("topright", legend = c("u", "v", "w"), lty = 1:3, bg = "white", col = 1:3)
}
plotupca(upca)
@

%\begin{figure}
%\centering
%\includegraphics[width=\textwidth]{howto-upca5}
%\caption{\label{howto-upca5} A user-defined plot function.}
%\end{figure}


Okay, that's it, but note that function \code{f} is not yet part of
the \pkg{simecol} object; that is why we call this a ``mixed
implementation''. The function \code{f} is rather simple here, but it
would also be possible to call functions of arbitrary complexity from
\code{main}.

\subsubsection{Creating scenarios}

After defining one \code{simecol} object (that we can call a parent
object or a \textbf{prototype}), we may create derived objects, simply
by copying (cloning) and modification.  As an example, we create two
scenarios with different parameter sets:

<<upca6, fig=TRUE>>=
sc1 <- sc2 <- upca
parms(sc1)["wstar"] <- 0
parms(sc2)["wstar"] <- 0.1
sc1 <- sim(sc1)
sc2 <- sim(sc2)
par(mfrow=c(1,2))
plotupca(sc1, ylim=c(0, 250))
plotupca(sc2, ylim=c(0, 250))
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca6}
\caption{\label{howto-upca6} Two scenarios of the UPCA model (left:
  wstar=0, right: wstar=0.1; functional response f is Holling II).}
\end{figure}

If we simulate and plot these scenarios, we see an exponentially
growing $u$ in both cases, with cycles of $v$ and $w$ in the first
scenario and an equilibrium state in the second (figure \ref{howto-upca6}).

If we now also change the functional response function $f$ from Holling II
to Lotka-Volterra:

<<upca7>>=
f <- function(x, y, k){x * y}
@

both model scenarios, \code{sc1} and \code{sc2}, are affected by this
new definition:

<<upca8,fig=TRUE>>=
sc1 <- sim(sc1)
sc2 <- sim(sc2)
par(mfrow=c(1,2))
plotupca(sc1, ylim=c(0, 20))
plotupca(sc2, ylim=c(0, 20))
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca8}
\caption{\label{howto-upca8} Two scenarios of the UPCA model (left:
  wstar=0, right: wstar=0.1; functional response f is Lotka-Volterra).}
\end{figure}

Now we get a stable cycle for $u$ and $v$ in scenario 1 and an
equilibrium for all state variables in scenario 2 (figure
\ref{howto-upca8}). You may also note that the new function \code{f}
has exactly the same arguments as above, including the parameter
\code{k}, which is obsolete in this second case.

In the examples above, function \code{f} was an ordinary function in
the user workspace, but it is also possible to implement such
functions (or sub-models) directly as part of the model object. One
possibility would be to define local functions within \code{main},
but this has the disadvantage that such functions are not easily
accessible from outside.
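
For illustration only, a sketch of such a local definition (the
sub-model \code{f} is then hidden inside \code{main}):

<<eval=FALSE>>=
upca_local <- upca
main(upca_local) <- function(time, y, parms) {
  f <- function(x, y, k) x*y / (1+k*x)  # local sub-model, invisible from outside
  with(as.list(c(parms, y)), {
    du <-  a * u           - alpha1 * f(u, v, k1)
    dv <- -b * v           + alpha1 * f(u, v, k1) - alpha2 * f(v, w, k2)
    dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
    list(c(du, dv, dw))
  })
}
@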

To make sub-models accessible from outside, \pkg{simecol} has an
optional slot \code{equations} that can hold a list of sub-models.
Such an equations slot can be defined either during object creation,
or functions may be added afterwards. In the following, we derive two
new clones with default parameter settings from the original
\code{upca} object, and then assign one version (the Holling II
functional response) to scenario 1 and the other version (the simple
multiplicative Lotka-Volterra functional response) to scenario 2
(figure \ref{howto-upca9}):

<<upca9,fig=TRUE>>=
sc1 <- sc2 <- upca
equations(sc1)$f <- function(x, y, k){x*y / (1+k*x)}
equations(sc2)$f <- function(x, y, k){x * y}
sc1 <- sim(sc1)
sc2 <- sim(sc2)
par(mfrow=c(1,2))
plotupca(sc1, ylim=c(0, 20))
plotupca(sc2, ylim=c(0, 20))
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca9}
\caption{\label{howto-upca9} Two scenarios of the UPCA model (left:
  functional response \code{f} is Holling II, right functional
  response is Lotka-Volterra).}
\end{figure}

This method makes it possible to compare models with different
structures in the same way as scenarios with different parameter
values. In addition, it is also possible to define model objects with
several \textbf{built-in} versions of sub-models, between which one
can switch:

<<upca10,keep.source=TRUE>>=
upca <- new("odeModel",
  main = function(time, y, parms) {
    with(as.list(c(parms, y)), {
      du <-  a * u           - alpha1 * f(u, v, k1)
      dv <- -b * v           + alpha1 * f(u, v, k1) +
                             - alpha2 * f(v, w, k2)
      dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
      list(c(du, dv, dw))
    })
  },
  equations  = list(
    f1 = function(x, y, k){x*y},           # Lotka-Volterra
    f2 = function(x, y, k){x*y / (1+k*x)}  # Holling II
  ),
  times  = seq(0, 100, 0.1),
  parms  = c(a=1, b=1, c=10, alpha1=0.2, alpha2=1, k1=0.05, k2=0, wstar=0.006),
  init   = c(u=10, v=5, w=0.1),
  solver = "lsoda"
)

equations(upca)$f <- equations(upca)$f1
@
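
Switching to the other built-in variant later is a single assignment;
a short sketch:

<<eval=FALSE>>=
equations(upca)$f <- equations(upca)$f2  # enable the Holling II variant
plot(sim(upca))
@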

\subsubsection{Debugging}

As stated before, the all-in-one encapsulation of functions and data
in \code{simObj}ects has many advantages, but there is also one
disadvantage, namely debugging.  Debugging of \proglang{S4} objects is
sometimes cumbersome, especially if slot functions (e.g. \code{main},
\code{equations}, \code{initfunc}) come into play. These difficulties
matter little for well-functioning ready-made model objects, but they
are an additional burden during model building, in particular if the
models are technically less simple than in our example.

%\subsubsection{Method 1: Debug functions in the user workspace}

Fortunately, there are easy workarounds. One of them is to implement
the technically challenging parts in the user workspace first, using
the above-mentioned mixed style. Then, after developing and debugging
the model, and when everything works satisfactorily, integrating the
parts into the object is straightforward, given that you keep the
general structure in mind. In the example below, we implement the main
model as a workspace function \code{fmain}\footnote{Note that this
  function must never be named ``func'', for some rather esoteric
  internal reasons which we shall not discuss further here.} with the
same interface (parameters and return values) as above, which is then
called by the \code{main} function of the \code{simObj}:

<<upca11>>=
f <- function(x, y, k){x*y / (1+k*x)}  # Holling II

fmain <-  function(time, y, parms) {
  with(as.list(c(parms, y)), {
    du <-  a * u           - alpha1 * f(u, v, k1)
    dv <- -b * v           + alpha1 * f(u, v, k1) +
                           - alpha2 * f(v, w, k2)
    dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
    list(c(du, dv, dw))
  })
}

upca <- new("odeModel",
  main = function(time, y, parms) fmain(time, y, parms),
  times  = seq(0, 100, 0.1),
  parms  = c(a=1, b=1, c=10, alpha1=0.2, alpha2=1, k1=0.05, k2=0, wstar=0.006),
  init   = c(u=10, v=5, w=0.1),
  solver = "lsoda"
)
@

This function \code{fmain} as well as any other submodels like
\code{f} can now be debugged with the usual \proglang{R} tools,
e.g. \code{debug}:

<<eval=FALSE>>=
debug(fmain)
upca <- sim(upca)
@

Debugging can be stopped with \code{undebug(fmain)}. If everything
works, you can add the body of \code{fmain} to \code{upca} manually,
and it is even possible to do this in the formalized \pkg{simecol}
way of object modification:

<<upca12, keep.source=TRUE>>=
main(upca)        <- fmain   # assign workspace function to main slot
equations(upca)$f <- f       # assign workspace function to equations
rm(fmain, f)   # optional, for saving memory and avoiding confusion
str(upca)      # show the object
@

After this assignment, deleting \code{f} and \code{fmain} (as done
with \code{rm} above) leaves a clean workspace with only the
necessary objects.

%\subsubsection{Method 2: Extract functions temporarily}


\subsection{Different ways to store simObjects}

One of the main advantages of \pkg{simecol} is that model objects can
be made persistent and that it is easy to distribute and share
\code{simObj}ects over the internet.

The most obvious and simplest form is, of course, the original source
code of the objects, i.e. the call to \code{new} with all the slots,
which creates the \proglang{S4} object (see section
\ref{s4definition}), but there are also other possibilities.

\pkg{simecol} objects can be saved in machine readable form as
\proglang{S4}-object binaries with the \code{save} method of
\proglang{R}, which stores the whole object with all its equations,
initial values, parameters etc. and also the simulation output if the
model was simulated before saving.

<<upca13, keep.source=TRUE>>=
save(upca, file="upca.Rdata")  # persistent storage of the model object
load("upca.Rdata")             # load the model
@

Conversion of the \proglang{S4} object to a list is another
possibility that yields a representation readable by humans
\textbf{and} by \proglang{R}:

<<>>=
l.upca <- as.list(upca)
@

This yields an alternative text representation of the \code{simObj}
that can be manipulated by code-parsing programs or dumped to the
hard disk:

<<>>=
dput(l.upca, file="upca_list.R")
@

and this is completely reversible via:

<<>>=
l.upca <- dget("upca_list.R")
upca <- as.simObj(l.upca)
@

Sometimes it may be useful to store \code{simObj}ects in an
uninitialized form, in particular if they are to be distributed in
packages.

Let's demonstrate this again with a simple Lotka-Volterra model. In
the first step, we define a function that returns a \pkg{simecol}
object:

<<lvgen,fig=FALSE>>=
genLV <- function() {
  new("odeModel",
    main = function (time, init, parms) {
      x <- init
      p <- parms
      dx1 <-   p["k1"] * x[1] - p["k2"] * x[1] * x[2]
      dx2 <- - p["k3"] * x[2] + p["k2"] * x[1] * x[2]
      list(c(dx1, dx2))
    },
    parms  = c(k1=0.2, k2=0.2, k3=0.2),
    times  = c(from=0, to=100, by=0.5),
    init   = c(prey=0.5, predator=1),
    solver = "lsoda"
  )
}
@

Now the function contains the instructions for how \proglang{R} can
create a new instance of such a model. The \pkg{simecol} object is
not created yet, but a call to the creator function brings it to
life:

<<lvgen2>>=
lv1 <- genLV()
plot(sim(lv1))
@

This style is used in package
\pkg{simecolModels}\footnote{\pkg{simecolModels} can be downloaded
  from the R-Forge server,
  \url{https://simecol.r-forge.r-project.org/}.}, a collection of
(mostly) published ecological models.


%\subsection{The initfunc}
% basics already explained in the manual
% avoiding re-initialization

%\subsection{Using alternative solver packages}

%\pkg{PBSddeSolve}-example  \dots
% use a time delayed differential model as example

%user-defined solvers and assignment of functions to the solver-slot \dots

%\subsection{Using vectorized equations}

\subsection{Methods to work with S4 objects}

The \proglang{S4} scheme includes several utility functions that can
be used to inspect objects and their methods. As an example,
\code{showMethods} lists all methods that are available for a generic
function (here \code{sim}), depending on the classes of the objects
involved:

<<showMethods,fig=FALSE>>=
showMethods("sim")
@

Based on this information, it is now possible to inspect the source
code of a particular method, e.g. the \code{sim} function for
differential equation models (class \code{odeModel}):

<<getMethod,fig=FALSE>>=
getMethod("sim", "odeModel")
@

In addition, \proglang{R} has several other functions to inspect or
manipulate objects and methods, e.g. \code{hasMethod},
\code{findMethod}, or \code{setMethod}; please see the documentation
of these functions for details.
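
A short sketch of two of these:

<<eval=FALSE>>=
hasMethod("sim", "odeModel")   # is a specific method defined?
findMethod("sim", "odeModel")  # in which environment is it defined?
@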


%% ======================================================================
\section{Fitting Parameters}

The preferred way to obtain model parameters is direct measurement of
process rates, either in dedicated controlled experiments or by taking
results from the literature. However, not all processes are accessible
to direct measurement. In such cases, process parameters can be
identified indirectly, by calibration against observed data.

\subsection{Example model and data}

In order to demonstrate parameter fitting we use a simple ODE model
with two coupled differential equations:

\begin{align}
\frac{dX}{dt} &= \mu  X - D X                          \\
\frac{dS}{dt} &=  D (S_0 - S) - \frac{1}{Y} \mu  X     \\
\intertext{with Monod functional response:}
\mu  &= v_m \frac{S}{k_m + S}                           \\
\intertext{and:}
X &= \text{abundance or biomass concentration} \nonumber\\
S &= \text{substrate concentration} \nonumber\\
S_0 &= \text{substrate concentration in the inflow} \nonumber\\
D &= \text{dilution rate} \nonumber\\
Y &= \text{yield constant (i.e. conversion factor between $S$ and $X$)} \nonumber\\
\mu  &= \text{growth rate} \nonumber\\
v_m &= \text{maximum growth rate} \nonumber\\
k_m  &= \text{half saturation constant} \nonumber
\end{align}

\subsection{Load the model and assign a solver}

The chemostat model is one of the standard examples in package
\pkg{simecol} and can be loaded like a data set.

In the following, instead of the default solver \code{lsoda}, the
explicit solver \code{adams} is used, because the problem is not
stiff. Note that package \pkg{deSolve} contains dedicated functions
only for the most common solvers like \code{lsoda} or \code{rk4};
other solvers like \code{adams} (for non-stiff systems) or \code{bdf}
(for stiff systems) are specified by calling \code{ode} with an
appropriate \code{method} option.


<<fig=FALSE, keep.source=TRUE>>=
data(chemostat)
solver(chemostat) # shows which solver we have
## assign an alternative solver
solver(chemostat) <- function(y, times, func, parms, ...) {
  ode(y, times, func, parms, method = "adams", ...)
}
@

Now, we create two working copies \code{cs1} and \code{cs2} for further usage:

<<fig=FALSE>>=
cs1 <- cs2 <- chemostat
@


Let's also assume we have the following test data:

<<fig=FALSE>>=
obstime <- seq(0, 20, 2)
yobs <- data.frame(
  X    = c(10, 26, 120, 197, 354, 577, 628, 661, 654, 608, 642),
  S    = c(9.6, 10.2, 9.5, 8.2, 6.4, 4.9, 4.2, 3.8, 2.5, 3.8, 3.9)
)
@

Note that in this example, \code{yobs} contains two columns (and only
two) with exactly the same column names as the state variables.

\subsection{Parameter estimation in simecol}

Package \pkg{simecol} has a basic function for fitting parameters
of ODE models built in (\code{fitOdeModel}). This function is a wrapper
around existing optimization functions of \proglang{R}, currently
\code{nlminb} with the ``PORT'' algorithm, \code{newuoa} and \code{bobyqa}
from package \pkg{minqa} \citep{Powell2009,minqa}, and \code{optim} with
``Nelder-Mead'', ``BFGS'', ``CG'', ``L-BFGS-B'' and ``SANN''
\cite[cf.][and others]{Rcore2015,Nash1990}. Among these, only ``PORT'',
``bobyqa'' and ``L-BFGS-B'' support constrained optimization natively,
whereas for the others \code{fitOdeModel} tries to emulate box constraints
(more or less successfully) using an arctan transformation (see \code{p.constrain}).
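
As a minimal sketch of this transformation (see the help page of
\code{p.constrain} for the exact signature), a round trip between the
two scales should recover the original value:

<<eval=FALSE>>=
p <- c(vm = 5)
p.unconstrain(p.constrain(p, lower = 0, upper = 100),
              lower = 0, upper = 100)   # approximately c(vm = 5) again
@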

In order to save computation time, it is suggested to use an efficient
variable time step algorithm (e.g. \code{adams} or \code{bdf},
depending on the system) and to set the external time steps of the
model to the time steps contained in the observational data:

<<fig=FALSE>>=
times(cs1)  <- obstime
@

For the most basic call to the parameter fitting function we need only
the model object, the time steps and the observational data:

<<fig=FALSE,eval=FALSE>>=
res <- fitOdeModel(cs1, obstime = obstime, yobs=yobs)
@

In this case, \textbf{all} parameters are fitted by least squares
between \textbf{all} state variables and their corresponding
observations. The start values for parameter estimation are taken from
the \code{simObj}ect. The sum of squares of the individual state
variables is weighted by the inverse standard deviation of the
observations, and the Nelder-Mead algorithm is used by default.

Many options are available to control the parameter fitting, e.g. to
fit only a subset of parameters (\code{whichpar}), to control the
amount of information displayed (\code{debuglevel}, \code{trace}), to
weight state variables (\code{sd.yobs}) or individual data points
(\code{weights}), or to select the algorithm (\code{method}).

In the following, we fit only $v_m$, $k_m$ and $Y$ with the PORT
algorithm, which is in many cases faster than the other methods,
especially if the range of parameters is known (constrained
optimization, e.g. to avoid negative values for $k_m$). It is also a
good idea to assign reasonable start values to the
\code{simObj}ect. For interactive use it is recommended to set
\code{trace=TRUE}.

<<fig=FALSE>>=
whichpar <- c("vm", "km", "Y")
lower    <- c(vm=0, km=0, Y=0)
upper    <- c(vm=100, km=500, Y=200)
parms(cs1)[whichpar] <- c(vm=5, km=10, Y=100)

res <- fitOdeModel(cs1, whichpar = whichpar,
  lower = lower, upper=upper,
  obstime = obstime, yobs = yobs, method = "PORT",
  control=list(trace = FALSE))
@

The results are now stored in a list structure \code{res}, from which
the parameters, the objective (sum of squares) and information about
convergence can be extracted:

<<fig=FALSE>>=
res
@

Future versions of \pkg{simecol} will probably contain specific
functions to extract this information in a more user-friendly style.

The success of the parameter estimation can now be checked numerically
and graphically. First, we assign the new parameters to a copy of the
model object:

<<fig=FALSE>>=
parms(cs2)[whichpar] <- res$par
@

We get $r^2$ for both state variables with:

<<fig=FALSE>>=
times(cs2) <- obstime
ysim <- out(sim(cs2))
1 - var(ysim$X - yobs$X) / var(yobs$X)
1 - var(ysim$S - yobs$S) / var(yobs$S)
@

Note that the time steps in data and model must be the same. However,
for producing a smooth figure, it is recommended to use a smaller time
step. We will need this figure for further examples, so it makes
sense to define a function:

<<fit1,fig=TRUE>>=
plotFit <- function() {
  times(cs2) <- c(from=0, to=20, by=.1)
  ysim <- out(sim(cs2))
  par(mfrow=c(1,2))
  plot(obstime, yobs$X, ylim = range(yobs$X, ysim$X))
  lines(ysim$time, ysim$X, col="red")
  plot(obstime, yobs$S, ylim= range(yobs$S, ysim$S))
  lines(ysim$time, ysim$S, col="red")
}
plotFit()
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-fit1}
\caption{\label{howto-fit1} Observed data and fitted model.}
\end{figure}


\subsection{Estimation of initial values}

In the example above, we assumed that the initial values were known,
and in fact we used the built-in initial values from the default
example. In many cases, however, initial values must be estimated
from observed data, because they are either completely unknown or
known only with a certain error.

In order to estimate initial values from data, we have to add them,
in a technical sense, to the list of parameters:

<<>>=
parms(cs1) <- c(parms(cs1), init(cs1))
parms(cs1)
@

The second step is to assign these parameters back to the vector of
initial values at the beginning of each new simulation, i.e. to
initialize the model object with the new start values determined by
the optimization algorithm before simulation and calculation of the
sum of squares. For this purpose, \pkg{simecol} objects have a special
function slot \code{initfunc}, and the only thing we have to do is to
assign an appropriate function that copies the respective values from
\code{parms} to \code{init}:

<<>>=
initfunc(cs1) <- function(obj) {
  init(obj) <- parms(obj)[c("X", "S")]
  obj
}
@

Note that \code{initfunc} gets an object as input, which is then
modified and returned. Note also that the number and order of initial
values must be consistent with \code{main}.

<<fig=FALSE>>=
whichpar <- c("vm", "km", "X", "S")
lower    <- c(vm=0, km=0, X=0, S=0)
upper    <- c(vm=100, km=500, X=100, S=100)
parms(cs1)[whichpar] <- c(vm=10, km=10, X=10, S=10)

res <- fitOdeModel(cs1, whichpar = whichpar,
  lower = lower, upper=upper,
  obstime = obstime, yobs = yobs, method = "Nelder",
  control=list(trace = FALSE))
@

Assigning the fitted parameters and running a new simulation now results in:

<<fit2,fig=TRUE>>=
initfunc(cs2) <- initfunc(cs1)
parms(cs2)[whichpar] <- res$par

plotFit()
@

Note that the model object \code{cs2} needs a copy of the
\code{initfunc} too; otherwise it would be necessary to assign the
initial values manually.

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-fit2}
\caption{\label{howto-fit2} Observed data and fitted model ($k_m$,
  $v_m$ and initial values were estimated).}
\end{figure}

\subsection{Scaling and weighting}

Appropriate scaling of \textbf{observational variables} is crucial. On
the one hand, scaling is necessary because variables may differ in
order of magnitude or measurement units (comparing apples and
oranges); on the other hand, the estimated parameters depend heavily
on this decision.

Weighting is related to scaling. It may be required to weight
\textbf{individual data points}, depending on their accuracy or the
number of measurements they rely on.

A third type of scaling is sometimes necessary for numerical reasons,
if \textbf{parameters} have very different orders of magnitude. Some
algorithms (e.g. PORT) try to do this automatically, but sometimes an
optional scaling argument is required even for these.

\subsection{Scaling of variables}

The default scaling method used in \pkg{simecol} is division by the
standard deviation of the observational values (per variable, i.e. per
column), but sometimes it is necessary to provide user-specified
scaling. Depending on the question and the kind of data, different
choices may be appropriate, for example the mean, median, range, an
appropriate conversion constant or a kind of ``expert judgement'' that
helps to make the state variables comparable.

The following example uses default scaling (standard deviations of
observations):

<<fit3,fig=TRUE>>=
whichpar <- c("vm", "km")
parms(cs1)[whichpar] <- c(vm = 5, km = 2)

res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "Nelder")
res$value
res$par
@

and in the second example, we set scaling for both variables to one
(i.e. divide both variables by 1). This means that no scaling is
applied at all and the original dimensions remain:

<<>>=
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "Nelder",
  sd.yobs = c(1, 1))

res$value
res$par
@

It is obvious that the resulting sums of squares and also the
estimated parameters of these two approaches differ.
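
As a further variant, one could, for example, scale by the column
means of the observations; this is merely a sketch of one possible
user-defined choice:

<<eval=FALSE>>=
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "Nelder",
  sd.yobs = colMeans(yobs))
@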

%% ToDo:
%% - make assignment the scaling factors to variables more robust.
%%   (e.g. named vector)
%% - rename sd.yobs ???

\subsection{Weighting of observations}

Weighting of observations (i.e. per row or per individual value) can
be useful in different circumstances, e.g. if measurements have
different precision, if variance is not constant over the range of
observations (violation of homoscdasticity) or if data points
represent multiple measurements.

Let's assume that we want, for whatever reason, to downweight the
last three values (9\dots 11) to one third. We simply define a data
frame with the same structure as the observational data (\code{yobs})
and assign appropriate weights:


<<>>=
weights <- data.frame(X = rep(1, nrow(yobs)),
                      S = rep(1, nrow(yobs)))
weights[9:11,] <- 1/3
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "Nelder",
  weights = weights)

res$value
res$par
@


\subsection{Scaling of parameters}

If the parameters to be fitted have very different orders of magnitude
(e.g. one is 0.1 and another is $10^7$), optimization may fail due to
numerical problems. One possibility to avoid this is to reformulate
the problem in such a way that the sizes of the parameters do not
differ too much. Depending on the algorithm used, it may also be
possible to let the optimizer do this rescaling of parameters,
e.g. via argument \code{scale} for the PORT algorithm or with
\code{control = list(parscale = ...)} for the algorithms in
\code{optim}. Please consult the original help pages for details.
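
For the \code{optim}-based methods, a corresponding call might look
like the following sketch (the scaling magnitudes are hypothetical):

<<eval=FALSE>>=
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "BFGS",
  control = list(parscale = c(2, 5)))
@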

Normally, the PORT algorithm (in function \code{nlminb}) does
automatic rescaling, but if required, scaling by $\mathit{scale} =
1/\mathit{par}_{\max}$ may help (see
\url{http://netlib.bell-labs.com/cm/cs/cstr/153.pdf}):
% == thpe 2021-09-22 bell-labs link outdated ==

<<>>=
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "PORT",
  scale = 1/c(vm = 2, km = 5))

res$value
res$par
@


\subsection{Parameter estimation with FME}

\pkg{FME} \cite[Flexible Modeling Environment,][]{Soetaert2010} is a
package that contains functions for complex applications of models
consisting of differential equations, especially fitting models to
data, sensitivity analysis and Markov chain Monte Carlo.

The essential principles of model fitting with \pkg{FME} are quite
similar to the above, including fine-tuning of the underlying
optimization algorithms. However, in contrast to \code{fitOdeModel},
\pkg{FME}'s \code{modFit} is \textbf{much more powerful}. It allows the
use of several additional optimizers and gives more detailed output,
especially standard errors and covariances (correlations) of the
parameter estimates. Other advantages are the flexible way to define
one's own minimization criteria (cost functions), the existence of
summary functions to extract the outputs and the availability of an
advanced Markov chain Monte Carlo (MCMC) algorithm.

In the following, we give a full example to demonstrate its use with
the same model and data as above.  We use the chemostat model again,
together with the test data set. The user-defined cost function
(\code{Cost}) contains the simulation and the comparison between
simulated and observed data (function \code{modCost}) as its last
statement. Note that \code{modCost} requires that both simulated and
observed data contain a \code{time} column. This is already available
in \code{ysim} (returned by the solver), but we have to add it to
\code{yobs}. As above, appropriate scaling (resp. weighting) of the
variables is an important point.  Here, setting \code{weight="std"}
does the same as the default \code{sd.yobs} in the former example;
other possibilities are described in the respective online help pages.


%% workaround to omit minpack.lm start messages
<<echo=FALSE,quiet=TRUE,split=TRUE>>=
fme <- require(FME)
@


<<>>=
library(FME)
library(simecol)
data(chemostat)

cs1 <- chemostat

obstime <- seq(0, 20, 2)
yobs <- data.frame(
  X    = c(10, 26, 120, 197, 354, 577, 628, 661, 654, 608, 642),
  S    = c(9.6, 10.2, 9.5, 8.2, 6.4, 4.9, 4.2, 3.8, 2.5, 3.8, 3.9)
)

Cost <- function(p, simObj, obstime, yobs) {
  whichpar <- names(p)
  parms(simObj)[whichpar] <- p
  times(simObj) <- obstime
  ysim <- out(sim(simObj))
  modCost(ysim, yobs, weight="std")
}

yobs <- cbind(time=obstime, yobs)

Fit <- modFit(p = c(vm=10, km=10), f = Cost, simObj=cs1,
  obstime=obstime, yobs=yobs, method="Nelder", control=list(trace=FALSE))
@

Setting \code{trace = TRUE} in interactive sessions helps to observe
how the minimization proceeds.  Moreover, it is of course also possible
to use other optimization algorithms or to constrain the parameters
within sensible ranges.
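
A sketch of such a constrained fit, using the \code{lower} and
\code{upper} arguments of \code{modFit} together with the PORT
algorithm:

<<eval=FALSE>>=
Fit <- modFit(p = c(vm=10, km=10), f = Cost, simObj=cs1,
  obstime=obstime, yobs=yobs, method="Port",
  lower=c(vm=0, km=0), upper=c(vm=100, km=500))
@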

The output can now be extracted with appropriate methods:

<<>>=
summary(Fit)
deviance(Fit)
coef(Fit)
@

You may also test the following, but we omit its output here to save space:

<<eval=FALSE>>=
residuals(Fit)
df.residual(Fit)
plot(Fit)
@


%% ======================================================================
\section{Implementing Models in Compiled Languages}

Compiling model code can speed up simulations considerably, and there
are several ways to call compiled code from \proglang{R}. Thus, it is
possible to use functions written in \proglang{C/C++} or
\proglang{Fortran} in the ordinary way described in the ``Writing R
Extensions'' manual \citep{Rexts2006}. This speeds up computations,
but a certain amount of communication overhead remains because
control is given back to \proglang{R} in every simulation step.

In addition to this basic method, it is also possible to enable direct
communication between integration routines and the model code if both
are available in compiled languages and if the direct call of a
compiled model is supported by the integrator. All integrators of
package \pkg{deSolve} support this, see the \pkg{deSolve}
documentation for details.

\subsection{An Example in C}

Now let's inspect an example. We first provide our model as
described in the \pkg{deSolve} vignette ``Writing Code in Compiled
Languages''; here again the Lotka-Volterra model:

\begin{verbatim}
/* file: clotka.c */
#include <R.h>

static double parms[3];

#define k1 parms[0]
#define k2 parms[1]
#define k3 parms[2]

/* It is possible to define global variables here */
static double aGlobalVar = 99.99;   // for testing only

/* initializer: same name as the dll (without extension) */
void clotka(void (* odeparms)(int *, double *)) {
  int N = 3;
  odeparms(&N, parms);
  Rprintf("model parameters successfully initialized\n");
}

/* Derivatives */
void dlotka(int *neq, double *t, double *y,
  double *ydot, double *yout, int *ip) {

  // sanity check for the 2 'additional outputs'
  if (ip[0] < 2) error("nout should be at least 2");

  // derivatives
  ydot[0] = k1 * y[0]        - k2 * y[0] * y[1];
  ydot[1] = k2 * y[0] * y[1] - k3 * y[1];

  // the 2 additional outputs, here for demo purposes only
  yout[0] = aGlobalVar;
  yout[1] = ydot[0];
}
\end{verbatim}


Using \verb!#define! macros is a typical \proglang{C} trick to get
readable names for the parameters. This method is simple and
efficient, and of course there are more elaborate possibilities. One
alternative is using dynamic variables, another is doing call-backs
to \proglang{R}.

The \proglang{C} code can now be compiled into a so-called shared
library (on Linux) or a DLL (on Windows) that can be linked to
\proglang{R}.

Compilation requires an installed \proglang{C} compiler (\pkg{gcc})
and some other tools that are quite standard on Linux, and which are
also available for the Macintosh or, in the form of the \pkg{Rtools}
collection\footnote{\url{https://cran.r-project.org/bin/windows/Rtools/}}
provided by Duncan Murdoch, for Windows.

If the tools are installed, compilation can be done directly from
\proglang{R} with:

<<compile.clotka,fig=FALSE,eval=FALSE>>=
system("R CMD SHLIB clotka.c")
@

The result, a shared library or DLL, can now be linked to the current
\proglang{R} session with \code{dyn.load}. We show this here for
Windows; it is quite similar for Linux \cite[see][for
details]{Rexts2006}.  Note that you must set the working directory of
\proglang{R} to the path where the DLL resides or use the full path
in the call to \code{dyn.load}.


<<clotka_R,eval=FALSE>>=
modeldll <- dyn.load("clotka.dll")
@

You can now call the derivative function \code{dlotka} of the model
from the \code{main} function of a \pkg{simecol} object.
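
As a quick check outside of \pkg{simecol}, the compiled model can
also be integrated directly with \code{lsoda}; a sketch (note the
\code{dllname}, \code{initfunc} and \code{nout} arguments):

<<eval=FALSE>>=
out <- lsoda(y = c(prey=0.5, predator=1), times = seq(0, 100, 0.5),
  func = "dlotka", parms = c(k1=0.2, k2=0.2, k3=0.2),
  dllname = "clotka", initfunc = "clotka", nout = 2)
matplot(out[,1], out[,2:3], type = "l")
@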


\subsection{Enabling Direct Communication Between Model and Solver}

Another, more efficient way is to tell the solver (e.g. \code{lsoda})
directly where to find the model in the DLL. This method circumvents
the communication overhead that normally occurs for every call from
the solver to the model DLL and is especially effective if small
models are called many times, e.g. in the case of small time steps or
if a model is embedded in an optimization routine.

The trick consists of two parts:

\begin{enumerate}
\item We write an almost empty \code{main} function that returns all
  the information that the ODE solver needs in the form of a list,
\item Instead of putting a character reference to an existing solver
  function into the \code{solver} slot (e.g. \code{"lsoda"}) we write
  a user-defined interface to the solver and assign it to the
  solver-slot as shown in the example.
\end{enumerate}

Now we can simulate our model as usual, but we avoid the
interpretation and communication overhead of \proglang{R} during the
integration.

\begin{Schunk}
\begin{Sinput}
clotka <- new("odeModel",
  ## note that 'main' does not contain any equations directly
  ## but returns information where these can be found
  ## 'nout' is the number of 'additional outputs'
  main = function(time, init, parms) {
     # a list with: dllname, func, [jacfunc], nout
     list(lib     = "clotka",
          func    = "dlotka",
          jacfunc = NULL,
          nout    = 2)
  },
  ## parms, times, init are provided as usual, enabling
  ## scenario control like for 'ordinary' simecol models
  parms  = c(k1=0.2, k2=0.2, k3=0.2),
  times  = c(from=0, to=100, by=0.5),
  init   = c(prey=0.5, predator=1),
  ## special solver function that evaluates funclist
  ## and passes its contents directly to the lsoda
  ## in the 'compiled function' mode
  solver = function(init, times, funclist, parms, ...) {
    f <- funclist()
    as.data.frame(lsoda(init, times, func=f$func,
      parms = parms, dllname = f$lib, jacfunc=f$jacfunc, nout = f$nout, ...)
    )
  }
)

clotka <- sim(clotka)

## the two graphics on top are the states
## the other are additional variables returned by the C code
## (for demonstration purposes here)
plot(clotka)

## Another simulation with more time steps
times(clotka)["to"] <- 1000
plot(sim(clotka))

## another simulation with intentionally reduced accuracy
## for testing
plot(sim(clotka, atol=1))

dyn.unload(as.character(modeldll[2]))
\end{Sinput}
\end{Schunk}


You should note a considerable speed-up. You may ask whether this is
still a \pkg{simecol} object, given that the main parts are now in
\proglang{C}, and you may also ask why one should still write models
in \proglang{R} at all if \proglang{C} or \proglang{FORTRAN} are so
much faster.

The answer is that speed of computation is not the only factor.  What
counts is a good compromise between execution speed and programming
effort. Programming in scripting languages like \proglang{R} is much
more convenient than programming in compiled languages like
\proglang{C} or \proglang{FORTRAN}. Also, programming in compiled
languages only pays off if models are quite large or if a large
number of model runs is performed. Even in such cases, a mixed
\proglang{R} \textbf{and} \proglang{C} approach can be efficient,
because it is only necessary to implement the core functionality of
the model in \proglang{C}, while most of the data manipulation and
scenario control can be done in \proglang{R}.

\pkg{simecol} follows exactly this philosophy. Implementing everything
in \proglang{R} is highly productive if speed is of minor importance,
but you \textbf{may} use \proglang{C} etc. whenever necessary, and
even in that case you still have the scenario management and data
manipulation features of \pkg{simecol}.


%%=======================================================================
\section{Troubleshooting}

This chapter lists reasons for the most common problems and error
messages. You can contribute to these sections by submitting bug
reports.

\subsection{Error messages when creating simObjects}

\begin{description}
\item[Message:]
  \begin{verbatim}
Error in getClass(Class, where = topenv(parent.frame())) :
"odeModel" is not a defined class
  \end{verbatim}
  Instead of \code{odeModel}, other class names may appear here.
\item[Reasons:] The most common reason is forgetting to load the
  required \pkg{simecol} package. Another reason may be that you use a
  model class that is not contained in \pkg{simecol} (e.g. mistakenly
  \code{odemodel} instead of \code{odeModel}).
\item[Solution:] Load the \pkg{simecol} package by
  \code{library(simecol)}. If this does not help, check spelling of
  the class name in \code{new}.
\end{description}


\begin{description}
\item[Message:]
  \begin{verbatim}
Error in assign(nm, L[[nm]], p) : attempt to use
zero-length variable name
  \end{verbatim}
\item[Solution:] Use ``\code{=}'' instead of ``\code{<-}'' in lists
  (e.g. function definitions in \code{equations}) and in function
  arguments; a sketch contrasting the two patterns follows below.
\end{description}
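
For illustration (a sketch, not evaluated):

<<eval=FALSE>>=
## wrong: '<-' passes an unnamed argument instead of setting the slot
lv_bad <- new("odeModel", parms <- c(k1=0.2, k2=0.2, k3=0.2))
## correct: '=' binds the value to the 'parms' slot
lv_ok  <- new("odeModel", parms  = c(k1=0.2, k2=0.2, k3=0.2))
@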

\begin{description}
\item[Message:] \texttt{"Warning: a final empty element has been omitted"}
\item[Solution:] Look for obsolete commas after the last element of
  lists (e.g. \code{parms}).
\end{description}


\subsection{Error messages during model simulations}

\begin{description}
\item[Message:]
  \begin{verbatim}
Error in lsoda(...... :
The number of derivatives returned by func() (3)
must equal the length of the initial conditions vector (2)
  \end{verbatim}
\item[Reasons:] The number of state variables of an ODE system must be
  consistent between the input and the output of the derivative
  function, which is called \code{func} in package \pkg{deSolve}, from
  which this error message is printed. In \pkg{simecol} this function
  is called \code{main}. The other solvers of \pkg{deSolve}
  (e.g. \code{rk4}) and other compatible solver packages may issue
  similar messages.
\item[Solution:] Check the number of state variables and also the naming of:
  \begin{itemize}
    \item the \code{init} slot of the model definition,
    \item the usage of the second argument (commonly named
      \code{init}, \code{x} or \code{y}) in the \code{main} function,
    \item the return value of \code{main}. The return value is a list
      whose first element must be a vector of the same length as
      \code{init}. Note that the order of \code{init} and of the
      return value must also be identical.
  \end{itemize}
\end{description}


%%=======================================================================


%\bibliographystyle{jss}
\phantomsection
\addcontentsline{toc}{section}{References}
\bibliography{simecol}


% clean up
<<cleanup,echo=FALSE,include=FALSE>>=
options("prompt" = "> ", "continue" = "+ ")
@
\end{document}