---
title: "Conventions for MLModels Implementation"
author: "Brian J Smith"
date: "2021-07-23"
output: rmarkdown::html_vignette
bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{Conventions for MLModels Implementation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


## Model Constructor Components

- `MLModel` is a function supplied by the **MachineShop** package.  It allows for the integration of statistical and machine learning models supplied by other R packages with the **MachineShop** model fitting, prediction, and performance assessment tools.

- The following are guidelines for writing model constructor functions that are wrappers around the `MLModel` function.

- In this context, the term "constructor" refers to the wrapper function and "source package" to the package supplying the original model implementation.


### Constructor Arguments

- The constructor should produce a valid model if called without any arguments; i.e., not have any required arguments.

- The source package defaults will be used for parameters with `NULL` values.

- Model formula, data, and weights are separate from model parameters and should not be defined as constructor arguments.
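
As a minimal sketch of the first two guidelines, the hypothetical function below gives every argument a default and drops `NULL` values so that the source package defaults apply. The function name, its arguments, and the use of `Filter` are illustrative assumptions; inside the package this step is typically handled by `new_params(environment())`, whose implementation may differ.

```{r eval = FALSE}
# Hypothetical parameter collector; names and defaults are illustrative only
sketch_params <- function(ntree = 100, mtry = NULL, nodesize = NULL) {
  # Capture the constructor arguments as a named list
  args <- as.list(environment())
  # Drop NULL values so that the source package supplies its own defaults
  Filter(Negate(is.null), args)
}

sketch_params()          # callable without any arguments
sketch_params(mtry = 3)  # an explicitly set value is retained
```

Formula, data, and weights do not appear among the arguments because they are supplied later, at fit time.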


### name Slot

- Use the same name as the constructor.


### packages Slot

- Include all external packages whose functions are called directly from within the constructor.

- Use `::` to reference source package functions.


### response_types Slot

- Include all response variable types (`"binary"`, `"factor"`, `"matrix"`, `"numeric"`, `"ordered"`, and/or `"Surv"`) that can be analyzed with the model.


### weights Slot

- Logical indicating whether the model supports case weights.


### params Slot

- List of parameter values set by the constructor, typically obtained internally with `new_params(environment())` if all arguments are to be passed to the source package fit function as supplied.  Additional steps may be needed to pass the constructor arguments to the source package in a different format; e.g., when some model parameters must be passed in a control structure, as in `C50Model` and `CForestModel`.
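
The slots described above might come together as in the following sketch of a constructor written outside the package. `SketchLogitModel` and its arguments are hypothetical, the `Filter` call is only a stand-in for the internal `new_params(environment())`, and the `MLModel` argument names are assumed to match the slot names used in this document.

```{r eval = FALSE}
# Hypothetical constructor wrapping stats::glm(); not part of MachineShop
SketchLogitModel <- function(epsilon = NULL, maxit = NULL) {
  # Assumed stand-in for new_params(environment()): keep non-NULL arguments
  params <- Filter(Negate(is.null), as.list(environment()))

  MachineShop::MLModel(
    name = "SketchLogitModel",   # same name as the constructor
    packages = "stats",          # package whose functions are called below
    response_types = "binary",
    weights = TRUE,              # glm() supports case weights
    params = params,
    fit = function(formula, data, weights, ...) {
      # Let glm() find the local `weights` variable
      environment(formula) <- environment()
      stats::glm(formula, data = as.data.frame(data), weights = weights,
                 family = stats::binomial, ...)
    },
    predict = function(object, newdata, ...) {
      stats::predict(object, newdata = as.data.frame(newdata),
                     type = "response")
    }
  )
}

# Usage, assuming MachineShop is installed:
# model <- SketchLogitModel(maxit = 50)
```

Here `epsilon` and `maxit` reach `glm` through the ellipsis and are passed on to its control settings, so no separate control structure is needed in this case.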


### fit Function

- The first three arguments should be `formula`, `data`, and `weights` followed by an ellipsis (`...`).

- If weights are not supported, the following, or equivalent, should be included in the function:

```{r eval = FALSE}
if(!all(weights == 1)) warning("weights are not supported and will be ignored")
```

- Only add elements to the resulting fit object if they are needed and will be used in the `predict` or `varimp` functions.

- Return the fit object.
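
A fit function that follows these guidelines could look like the sketch below, which wraps `stats::lm` as an example of a source function that supports weights. The name `fit_sketch` and the choice of `lm` are assumptions made for illustration only.

```{r eval = FALSE}
# Hypothetical fit function for a source model that supports case weights
fit_sketch <- function(formula, data, weights, ...) {
  # Evaluate the formula in this environment so lm() can find `weights`
  environment(formula) <- environment()
  # Return the fit object unchanged; nothing extra is needed for prediction
  stats::lm(formula, data = as.data.frame(data), weights = weights, ...)
}

lm_fit <- fit_sketch(mpg ~ wt + hp, data = mtcars,
                     weights = rep(1, nrow(mtcars)))
```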


### predict Function

- The arguments are a model fit `object`, `newdata` frame, optionally `times` for prediction at survival time points, and an ellipsis.

- The predict function should return one of the following, depending on the response type:
    - binary factor: a vector or column matrix of probabilities for the second level;
    - factor with more than two levels: a matrix whose columns contain the probabilities of the levels;
    - matrix: a matrix of predicted responses;
    - numeric: a vector or column matrix of predicted responses;
    - survival: a matrix whose columns contain survival probabilities at `times` if supplied, or a vector of predicted survival means if `times` are not supplied.
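
For the binary case, a predict function might be sketched as follows. The name `predict_sketch`, the binomial `glm` fit, and the recoded `mtcars` data are illustrative assumptions; the optional `times` argument is omitted because it only applies to survival responses.

```{r eval = FALSE}
# Hypothetical predict function for a binary response fit with glm()
predict_sketch <- function(object, newdata, ...) {
  # For a binomial glm, type = "response" gives the probability of the
  # second level of the factor response
  stats::predict(object, newdata = as.data.frame(newdata), type = "response")
}

cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))
glm_fit <- stats::glm(am ~ wt, data = cars, family = stats::binomial)

# Vector of probabilities for the second level, "manual"
probs <- predict_sketch(glm_fit, newdata = cars)
```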


### varimp Function

- The function should have a single model fit `object` argument followed by an ellipsis.

- Variable importance results should generally be returned as a vector with elements named after the corresponding predictor variables.  The package will handle conversions to a data frame and `VariableImportance` object.  If there is more than one set of relevant variable importance measures, they can be returned as a matrix or data frame with predictor variable names as the row names.
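
One way to return a named vector is sketched below for a linear model, using absolute t-statistics as the importance measure. Both the function name and the choice of measure are assumptions for illustration. With numeric predictors the coefficient names coincide with the predictor variable names, which would not hold for factor predictors.

```{r eval = FALSE}
# Hypothetical varimp function for an lm fit
varimp_sketch <- function(object, ...) {
  # Absolute t-statistics, named after the model coefficients
  tvals <- abs(stats::coef(summary(object))[, "t value"])
  # Drop the intercept, which is not a predictor variable
  tvals[names(tvals) != "(Intercept)"]
}

vi <- varimp_sketch(stats::lm(mpg ~ wt + hp, data = mtcars))
```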


## Documenting an MLModel


### Model Parameters

- Include the first sentence of each parameter description from the source package documentation.

- Start sentences with the parameter value type (logical, numeric, character, etc.).

- Start sentences with a lowercase letter.

- Omit indefinite articles (a, an) from the starting sentences.


### Details Section

- Include response types (binary, factor, matrix, numeric, ordered, and/or Surv).

- Include the following sentence:
  
> Default values for the \code{NULL} arguments and further model details can be
> found in the source link below.


### Return (Value) Section

- Include the following sentence:

> MLModel class object.


### See Also Section

- Include a link to the source package function and the other method functions shown below.

```
\code{\link[<source package>]{<fit function>}}, \code{\link{fit}},
\code{\link{resample}}
```
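
Taken together, the documentation conventions in this section might produce a roxygen header along the lines of the sketch below. The model name, the parameter descriptions (paraphrased from the `stats::glm.control` help page), and the layout of the details block are assumptions; the stub function stands in for a real constructor.

```{r eval = FALSE}
#' Sketch Logistic Model
#'
#' Hypothetical wrapper used only to illustrate the documentation conventions.
#'
#' @param epsilon numeric positive convergence tolerance.
#' @param maxit integer giving the maximal number of iterations.
#'
#' @details
#' \describe{
#'   \item{Response types:}{\code{binary}}
#' }
#'
#' Default values for the \code{NULL} arguments and further model details can be
#' found in the source link below.
#'
#' @return \code{MLModel} class object.
#'
#' @seealso \code{\link[stats]{glm}}, \code{\link{fit}},
#' \code{\link{resample}}
#'
SketchLogitModel <- function(epsilon = NULL, maxit = NULL) {
  # Constructor body omitted in this documentation sketch
  NULL
}
```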

## Package Extensions

- If adding a new model to the package, save its source code in a file whose name begins with `ML_`, followed by the model name, and ends with a `.R` extension; e.g., `"R/ML_CustomModel.R"`.

- Export the model in `NAMESPACE`.

- Add any required packages to the `Suggests` field of `DESCRIPTION`.

- Add the model to `R/models.R`.

- Add the model to `R/modelinfo.R`.

- Add a unit testing file to `tests/testthat`.