---
title: "Quick Tour of Package **optimCheck**"
author: "Martin Lysy"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: true
bibliography: references.bib
csl: taylor-and-francis-harvard-x.csl
link-citations: true
vignette: >
  %\VignetteIndexEntry{optimCheck: A Quick Tour}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, eval = FALSE, echo = FALSE}

rmarkdown::render("optimCheck-quicktut.Rmd") # R code to render vignette

```

<!-- latex macros -->
\newcommand{\xx}{\boldsymbol{x}}
\newcommand{\hxx}{\hat{\boldsymbol{x}}}
\newcommand{\txx}{\tilde{\boldsymbol{x}}}
\newcommand{\bb}{\boldsymbol{b}}
\renewcommand{\AA}{\boldsymbol{A}}

## Introduction

The **optimCheck** package provides a set of tools to check that output of an optimization algorithm is indeed at a local mode of the objective function.  The tools include both visual and numerical checks, the latter serving to automate formalized unit tests with e.g., the **R** packages [**testthat**](https://CRAN.R-project.org/package=testthat) or [**RUnit**](https://CRAN.R-project.org/package=RUnit).  

## Example: Quadratic Objective Function

A brief overview of the package functionality is illustrated with the following example.  Let
$$
Q(\xx) = \xx'\AA\xx - 2 \bb'\xx
$$
denote a quadratic objective function in $\xx \in \mathbb R^d$.  If $\AA_{d \times d}$ is a positive-definite matrix, then the unique minimum of $Q(\xx)$ is $\hxx = \AA^{-1}\bb$.  Let us now ignore this information and try to minimize $Q(\xx)$ using **R**'s simplest built-in mode-finding routine, provided by the **R** function `stats::optim()`.

In its simplest configuration, `stats::optim()` requires only the objective function and a starting value $\xx_0$ to initialize the mode-finding procedure.  Let's consider a difficult setting for `stats::optim()`, with a relatively large $d = 12$ and a starting value $\xx_0$ which is far from the optimal value $\hxx$.
```{r, echo = -1}
set.seed(2608) # set seed to fix output
d <- 12 # dimension of optimization problem

# create the objective function: Q(x) = x'Ax - 2b'x
A <- crossprod(matrix(rnorm(d^2), d, d)) # positive definite matrix
b <- rnorm(d)
objfun <- function(x) crossprod(x, A %*% x)[1] - 2 * crossprod(b, x)[1]

xhat <- solve(A, b) # analytic solution

# numerical mode-finding using optim
xfit <- optim(fn = objfun,                    # objective function
              par = xhat * 5,                 # initial value is far from the solution
              control = list(maxit = 1e5))    # very large max. number of iterations

```

### Visual Checks with `optim_proj()`

Like most solvers, `stats::optim()` utilizes various criteria to determine whether its algorithm has converged, which can be assess with the following command:
```{r}
# any value other than 0 means optim failed to converge
xfit$convergence 
```
Here `stats::optim()` reports that its algorithm has converged.  Now let's check this visually with **optimCheck** using *projection plots*.  That is, let $\txx$ denote the potential optimum of $Q(\xx)$.  Then for each $i = 1,\ldots,d$, we plot 
$$
Q_i(x_i) = Q(x_i, \txx_{-i}), \qquad \txx_{-i} = (\tilde x_1, \ldots, \tilde x_{i-1}, \tilde x_{i+1}, \ldots, \tilde x_d).
$$
In other words, projection plot $i$ varies only $x_i$, while holding all other elements of $\xx$ fixed at the value of the potential solution $\txx$.  These plots are produced with the **optimCheck** function `optim_proj()`:
```{r, fig.width = 10, fig.height = 6, out.width = "97%"}
require(optimCheck) # load package

# projection plots
xnames <- parse(text = paste0("x[", 1:d, "]")) # variable names
oproj <- optim_proj(fun = objfun,              # objective function
                    xsol = xfit$par,           # potential solution
                    maximize = FALSE,          # indicates that a local minimum is sought
                    xrng = .5,                 # range of projection plot: x_i +/- .5*|x_i|
                    xnames = xnames)

```
In each of the projection plots, the potential solution $\tilde x_i$ is plotted in red.  The `xrng` argument to `optim_proj()` specifies the plotting range.  Among various ways of doing this, perhaps the simplest is a single scalar value indicating that each plot should span $x_i \pm$ `xrng` $\cdot |x_i|$.  Thus we can see from these plots that `stats::optim()` was sometimes up to 10% away from the local mode of the projection plots.

### Quantification of Projection Plots

Projection plots are a powerful method of assessing the convergence of mode-finding routines to a local mode.  While great for interactive testing, plots are not well-suited to automated unit testing as part of an **R** package development process.  To this end, **optimCheck** provides a means of quantifying the result of a call to `optim_proj()`.  Indeed, a call to `optim_proj()` returns an object of class `optproj` with the following elements:
```{r}
sapply(oproj, function(x) dim(as.matrix(x)))
```
As described in the function documentation, `xproj` and `yproj` are matrices of which each column contains the $x$-axis and $y$-axis coordinates of the points contained in each projection plot.  The `summary()` method for `optproj` objects coverts these to absolute and relative errors in both the potential solution and the objective function.  The `print()` method conveniently displays these results:
```{r}
oproj # same print method as summary(oproj)
```
The documentation for `summary.optproj()` describes the various calculations it provides.  Perhaps the most useful of these are the elementwise absolute and relative differences between the potential solution $\tilde{\xx}$ and $\hxx_\mathrm{proj}$, the vector of optimal 1D solutions in each projection plot. For convenience, these can be extracted with the `diff()` method:
```{r}
diff(oproj) # equivalent to summary(oproj)$xdiff

# here's exactly what these are
xsol <- summary(oproj)$xsol # candidate solution
xopt <- summary(oproj)$xopt # optimal solution in each projection plot
xdiff <- cbind(abs = xopt-xsol, rel = (xopt-xsol)/abs(xsol))
range(xdiff - diff(oproj))
```
Thus it is proposed that a combination of `summary()` and `diff()` methods for projection plots can be useful for constructing automated unit tests.  In this case, plotting itself can be disabled by passing `optim_proj()` the argument `plot = FALSE`.  See the `optimCheck/tests` folder for **testthat** examples featuring: 

- Logistic Regression (`stats::glm()` function).
- Quantile Regression (`quantreg::rq()` function in [**quantreg**](https://CRAN.R-project.org/package=quantreg))
- *Multivariate normal mixtures* (`mclust::emEEE()` in [**mclust**](https://CRAN.R-project.org/package=mclust)).

You can run these tests with the command

```{r eval = FALSE}
testthat::test_package("optimCheck", reporter = "progress")
```

<!-- ```{r, echo = -1, cache = FALSE} -->
<!-- set.seed(151) -->
<!-- # run tests and output timings -->
<!-- tresults <-  testthat::test_package("optimCheck", reporter = "list") -->
<!-- invisible(sapply(tresults, function(tres) { -->
<!--   message("Context: ", tres$context, " [", length(tres$results), " tests -- ", -->
<!--           round(tres$real, 1), " s]") -->
<!-- })) -->
<!-- ``` -->

## `optim_refit()`: A Numerical Alternative to Projection Plots

There are some situations in which numerical quantification of projection plots leaves to be desired:

Generating all projection plots requires `N = 2 * npts * length(xsol)` evaluations of the objective function (where the default value is `npts = 100`), which can belabor the process of automated unit testing.  A different test for mode-finding routines is to recalculate the optimal solution with an "very good" starting point: the current potential solution.  This is the so-called "**refi**ne op**t**izimation" -- or `refit` -- strategy.  

The `optim_refit()` function refines the optimization with a call to **R**'s built-in general-purpose optimizer: the function `stats::optim()`.  In particular, it selects the default Nelder-Mead simplex method with a simplified parameter interface.  As seen in the unit tests above, the `refit` checks are 2-3 times faster than their projection plot counterparts.  Consider now the example of refining the original `stats::optim()` solution to the quadratic objective function:
```{r}
orefit <- optim_refit(fun = objfun,        # objective function
                      xsol = xfit$par,     # potential solution
                      maximize = FALSE)    # indicates that a local minimum is sought
summary(orefit) # same print method as orefit
```
Thus we can see that the first and second run of `stats::optim()` are quite different. 

Of course, this does not mean that the refit solution produced by `stats::optim()` is a local mode:
```{r, fig.width = 10, fig.height = 6, out.width = "97%"}
# projection plots with refined solution
optim_proj(xsol = orefit$xopt, fun = objfun,
           xrng = .5, maximize = FALSE)
```
Indeed, the default `stats::optim()` method is only accurate when initialized close to the optimal solution.  Therefore, one may wish to run the refit test with a different optimizer.  This can be done externally to `optim_refit`, prior to passing the refit solution to the function via its argument `xopt`.  This is illustrated below using `stats::optim()`'s gradient-based quasi-Newton method:
```{r}
# gradient of the objective function
objgrad <- function(x) 2 * drop(A %*% x - b)

# mode-finding using quasi-Newton method
xfit2 <- optim(fn = objfun,                    # objective function
               gr = objgrad,                   # gradient
               par = xfit$par,                 # initial value (first optim fit)
               method = "BFGS")

# external refit test with optimizer of choice
orefit2 <- optim_refit(fun = objfun,
                       xsol = xfit$par,        # initial value (first optim fit)
                       xopt = xfit2$par,       # refit value (2nd fit with quasi-Newton method
                       maximize = FALSE)

# project plot test on refit solution
optim_proj(xsol = orefit2$xopt, fun = objfun,
           xrng = .5, maximize = FALSE, plot = FALSE)
```

<!-- the elementwise optima  -->
<!-- - It is difficult to combine local optima of different projection plots into a global optimal solution -->
<!-- for package development. (as with (http://r-pkgs.had.co.nz/tests.html)[**`testthat`**]),  -->

## Future Work: Constrained Optimization

Many constrained statistical optimization problems, seek a "sparse" solution, i.e., one for which some of the elements of the optimal solution are equal to zero.  In such cases, the relative difference between potential and optimal solution is an unreliable metric.  A working proposal is to flag these "true zeros" in `optim_proj()` and `optim_refit()`, so as to add a 1 to the relative difference denominators.  Other suggestions on this and **optimCheck** in general are [welcome](mailto:mlysy@uwaterloo.ca).