Conventions for MLModels Implementation

Model Constructor Components

  • MLModel is a function supplied by the MachineShop package. It allows for the integration of statistical and machine learning models supplied by other R packages with the MachineShop model fitting, prediction, and performance assessment tools.

  • The following are guidelines for writing model constructor functions that are wrappers around the MLModel function.

  • In this context, the term “constructor” refers to the wrapper function and “source package” to the package supplying the original model implementation.

Constructor Arguments

  • The constructor should produce a valid model if called without any arguments; i.e., it should not have any required arguments.

  • The source package defaults will be used for parameters with NULL values.

  • Model formula, data, and weights are separate from model parameters and should not be defined as constructor arguments.
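As a rough sketch of these guidelines, a constructor might give every argument a NULL default and pass on only the values the caller actually set. The constructor name GLMLogitModel and its arguments are hypothetical. The example assumes that MachineShop is installed and that MLModel accepts the slot names described in this document as arguments.

```r
library(MachineShop)

# Hypothetical constructor wrapping stats::glm for binary responses.
# Every argument defaults to NULL, so GLMLogitModel() is a valid call and
# glm's own defaults apply to anything left unset.
GLMLogitModel <- function(epsilon = NULL, maxit = NULL) {
  # Keep only the parameters the caller actually set.
  params <- Filter(Negate(is.null), list(epsilon = epsilon, maxit = maxit))

  MLModel(
    name = "GLMLogitModel",      # same name as the constructor
    packages = "stats",          # packages whose functions are called
    response_types = "binary",
    weights = TRUE,              # glm supports case weights
    params = params,
    # formula, data, and weights arrive at fit time, not as constructor
    # arguments
    fit = function(formula, data, weights, ...) {
      stats::glm(formula, data = as.data.frame(data), weights = weights,
                 family = stats::binomial, ...)
    },
    predict = function(object, newdata, ...) {
      predict(object, newdata = as.data.frame(newdata), type = "response")
    }
  )
}

model <- GLMLogitModel()            # valid without any arguments
tuned <- GLMLogitModel(maxit = 50)  # only maxit overrides the glm default
```

Filtering out NULL values here is one simple way to let the source package defaults take effect; it is not necessarily how the package handles them internally.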

name Slot

  • Use the same name as the constructor.

packages Slot

  • Include all external packages whose functions are called directly from within the constructor.

  • Use :: to reference source package functions.

response_types Slot

  • Include all response variable types ("binary", "factor", "matrix", "numeric", "ordered", and/or "Surv") that can be analyzed with the model.

weights Slot

  • Logical indicating whether the model supports case weights.

params Slot

  • List of parameter values set by the constructor, typically obtained internally with new_params(environment()) if all arguments are to be passed to the source package fit function as supplied. Additional steps may be needed to pass the constructor arguments to the source package in a different format; e.g., when some model parameters must be passed in a control structure, as in C50Model and CForestModel.
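The two cases described above can be sketched in base R. The helper collect_params is a hypothetical stand-in that only imitates the behavior described here (gathering the constructor arguments and dropping NULL values); it is not claimed to match the actual implementation of new_params. The use of stats::glm.control as the control structure is likewise just an illustration.

```r
# Hypothetical stand-in: collect the constructor's arguments from its
# environment and drop those left as NULL.
collect_params <- function(envir) {
  args <- as.list(envir)
  args[!vapply(args, is.null, logical(1))]
}

# Hypothetical constructor fragment in which maxit and epsilon must reach
# the source package inside a control structure rather than as plain
# arguments.
example_params <- function(family = NULL, maxit = NULL, epsilon = NULL) {
  params <- collect_params(environment())
  in_control <- names(params) %in% c("maxit", "epsilon")
  control_args <- params[in_control]
  params <- params[!in_control]
  if (length(control_args)) {
    # Repackage the control-related arguments in the format the source
    # package expects.
    params$control <- do.call(stats::glm.control, control_args)
  }
  params
}

str(example_params())            # no parameters set
str(example_params(maxit = 50))  # maxit is moved into a control element
```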

fit Function

  • The first three arguments should be formula, data, and weights followed by an ellipsis (...).

  • If weights are not supported, the following, or equivalent, should be included in the function:

if(!all(weights == 1)) warning("weights are not supported and will be ignored")

  • Only add elements to the resulting fit object if they are needed and will be used in the predict or varimp functions.

  • Return the fit object.
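A minimal sketch of a fit function that follows these points is given below. MASS::lda is used purely as an illustration of a source function without a case weights argument, and the name lda_fit is hypothetical.

```r
# Fit function for a source function without case-weight support.
lda_fit <- function(formula, data, weights, ...) {
  if (!all(weights == 1)) {
    warning("weights are not supported and will be ignored")
  }
  # Return the source package fit object unchanged; nothing extra is
  # attached because prediction needs only what lda already stores.
  MASS::lda(formula, data = as.data.frame(data), ...)
}

fit <- lda_fit(Species ~ ., data = iris, weights = rep(1, nrow(iris)))
class(fit)
```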

predict Function

  • The arguments are a model fit object, a newdata data frame, optionally times for prediction at survival time points, and an ellipsis.

  • The predict function should return one of the following, depending on the response type: a vector or column matrix of probabilities for the second level of a binary factor; a matrix whose columns contain the class probabilities for a factor with more than two levels; a matrix of predicted responses for a matrix response; a vector or column matrix of predicted responses for a numeric response; or, for a Surv response, a matrix whose columns contain survival probabilities at times if supplied, or a vector of predicted survival means if times are not supplied.
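The factor cases can be illustrated with a short sketch. It again uses MASS::lda as a stand-in source model, and lda_predict is a hypothetical name; survival and numeric responses are not covered.

```r
# Self-contained setup: fit an illustrative three-class model.
fit <- MASS::lda(Species ~ ., data = iris)

lda_predict <- function(object, newdata, ...) {
  posterior <- predict(object, newdata = as.data.frame(newdata))$posterior
  # Binary response: vector of probabilities for the second factor level.
  # More than two levels: matrix with one probability column per level.
  if (ncol(posterior) == 2) posterior[, 2] else posterior
}

probs <- lda_predict(fit, newdata = head(iris))
dim(probs)  # one row per observation, one column per factor level
```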

varimp Function

  • The function should take a single model fit object argument followed by an ellipsis.

  • Variable importance results should generally be returned as a vector with elements named after the corresponding predictor variables. The package will handle conversions to a data frame and VariableImportance object. If there is more than one set of relevant variable importance measures, they can be returned as a matrix or data frame with predictor variable names as the row names.
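As an illustration of the vector form, the sketch below returns absolute t statistics from a linear model fitted with stats::lm. The choice of measure and the name lm_varimp are assumptions made for the example. Coefficient names coincide with predictor names here only because all predictors are numeric.

```r
lm_varimp <- function(object, ...) {
  tstat <- summary(object)$coefficients[, "t value"]
  # Drop the intercept so that every element is named after a predictor.
  abs(tstat[names(tstat) != "(Intercept)"])
}

fit <- stats::lm(mpg ~ wt + hp, data = mtcars)
vi <- lm_varimp(fit)
names(vi)
```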

Documenting an MLModel

Model Parameters

  • Include the first sentence of each parameter description from the source package documentation.

  • Start sentences with the parameter value type (logical, numeric, character, etc.).

  • Start sentences with a lowercase letter.

  • Omit indefinite articles (a, an) from the starting sentences.

Details Section

  • Include response types (binary, factor, matrix, numeric, ordered, and/or Surv).

  • Include the following sentence:

Default values for the arguments and further model details can be found in the source link below.

Return (Value) Section

  • Include the following sentence:

MLModel class object.

See Also Section

  • Include a link to the source package function and the other method functions shown below.

\code{\link[<source package>]{<fit function>}}, \code{\link{fit}},
\code{\link{resample}}
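Taken together, the documentation conventions might look like the following roxygen2 skeleton. This assumes the package is documented with roxygen2 comments. The constructor name, its parameters, and the parameter descriptions are hypothetical and written only to show the style.

```r
#' GLM Logit Model
#'
#' Hypothetical constructor documented according to the conventions above.
#'
#' @param epsilon numeric convergence tolerance.
#' @param maxit integer giving maximal number of fitting iterations.
#'
#' @details
#' Response types: \code{binary}
#'
#' Default values for the arguments and further model details can be found
#' in the source link below.
#'
#' @return \code{MLModel} class object.
#'
#' @seealso \code{\link[stats]{glm}}, \code{\link{fit}},
#' \code{\link{resample}}
#'
GLMLogitModel <- function(epsilon = NULL, maxit = NULL) {
  # Constructor body omitted in this documentation sketch.
  invisible(NULL)
}
```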

Package Extensions

  • If adding a new model to the package, save its source code in a file whose name begins with “ML_”, followed by the model name, and ends with a .R extension; e.g., "R/ML_CustomModel.R".

  • Export the model in NAMESPACE.

  • Add any required packages to the “Suggests” field of DESCRIPTION.

  • Add the model to R/models.R.

  • Add the model to R/modelinfo.R.

  • Add a unit testing file to tests/testthat.