rule_fit() generates a specification of a model before fitting. The main arguments for the model are:

  • mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.

  • trees: The number of trees contained in the ensemble.

  • min_n: The minimum number of data points in a node that are required for the node to be split further.

  • tree_depth: The maximum depth of the tree (i.e. number of splits).

  • learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.

  • loss_reduction: The reduction in the loss function required to split further.

  • sample_size: The amount of data exposed to the fitting routine.

These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using parsnip::set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.

rule_fit(
  mode = "unknown",
  mtry = NULL,
  trees = NULL,
  min_n = NULL,
  tree_depth = NULL,
  learn_rate = NULL,
  loss_reduction = NULL,
  sample_size = NULL,
  penalty = NULL
)

# S3 method for rule_fit
update(
  object,
  parameters = NULL,
  mtry = NULL,
  trees = NULL,
  min_n = NULL,
  tree_depth = NULL,
  learn_rate = NULL,
  loss_reduction = NULL,
  sample_size = NULL,
  penalty = NULL,
  fresh = FALSE,
  ...
)
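As a sketch of the points above: engine options go through parsnip::set_engine(), and update() modifies an existing specification rather than recreating it. The `deoverlap` engine argument is an assumption drawn from xrf::xrf(); check that function's documentation for the options your installed version accepts.

```r
library(rules)  # parsnip extension package that provides rule_fit()

# Build a specification, then revise it with update() rather than
# recreating it from scratch:
spec <- rule_fit(trees = 10) %>% set_mode("regression")
spec <- update(spec, trees = 50, penalty = 0.01)

# Engine-specific arguments are passed through set_engine(); `deoverlap`
# is an xrf::xrf() argument used here for illustration only.
spec <- spec %>% set_engine("xrf", deoverlap = TRUE)
```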

Arguments

mode

A single character string for the type of model. Possible values for this model are "unknown", "regression", or "classification".

mtry

A number for the number (or proportion) of predictors that will be randomly sampled at each split when creating the tree models.

trees

An integer for the number of trees contained in the ensemble.

min_n

An integer for the minimum number of data points in a node that are required for the node to be split further.

tree_depth

An integer for the maximum depth of the tree (i.e. number of splits).

learn_rate

A number for the rate at which the boosting algorithm adapts from iteration-to-iteration.

loss_reduction

A number for the reduction in the loss function required to split further.

sample_size

A number for the number (or proportion) of data that is exposed to the fitting routine.

penalty

L1 regularization parameter.

object

A rule_fit model specification.

parameters

A 1-row tibble or named list with main parameters to update. If the individual arguments are used, these will supersede the values in parameters. Also, using engine arguments in this object will result in an error.

fresh

A logical for whether the arguments should be modified in-place or replaced wholesale.

...

Not used for update().
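The interplay between `parameters` and the individual arguments can be sketched as follows; the use of a 1-row tibble assumes the tibble package is available.

```r
library(rules)  # parsnip extension package that provides rule_fit()

spec <- rule_fit(trees = 10, min_n = 2)

# A 1-row tibble of main parameters; the individual `min_n` argument
# supersedes any value `parameters` might carry for it:
spec2 <- update(spec, parameters = tibble::tibble(trees = 25), min_n = 5)
spec2
```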

Value

An updated parsnip model specification.

Details

The RuleFit model creates a regression model of rules in two stages. The first stage uses a tree-based model to generate a set of rules that can be filtered, modified, and simplified. These rules are then added as predictors to a regularized generalized linear model that can also conduct feature selection during model training.

For the xrf engine, the xgboost package is used to create the rule set that is then added to a glmnet model.

The only available engine is "xrf". Note that, per the documentation in ?xrf, transformations of the response variable are not supported. To use these with rule_fit(), we recommend using a recipe instead of the formula method.
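A minimal sketch of the recipe-based recommendation, assuming the recipes and workflows packages; `mtcars` and `step_log()` are illustrative choices, not part of rule_fit() itself.

```r
library(rules)      # provides rule_fit()
library(recipes)
library(workflows)

# xrf does not support in-formula outcome transformations such as
# fit(log(mpg) ~ ., ...). With a recipe, the transformation happens in a
# preprocessing step and the model formula stays simple. `skip = TRUE`
# applies the step when training but not when predicting on new data.
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_log(mpg, skip = TRUE)

rule_wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(rule_fit(trees = 5) %>% set_mode("regression"))
```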

References

Friedman, J. H., and Popescu, B. E. (2008). "Predictive learning via rule ensembles." The Annals of Applied Statistics, 2(3), 916-954.

See also

Examples

rule_fit()
#> RuleFit Model Specification (unknown)
#> 
#> Computational engine: xrf 
#> 

# Parameters can be represented by a placeholder:
rule_fit(trees = 7)
#> RuleFit Model Specification (unknown)
#> 
#> Main Arguments:
#>   trees = 7
#> 
#> Computational engine: xrf 
#> 

# ------------------------------------------------------------------------------

set.seed(6907)
rule_fit_rules <-
  rule_fit(trees = 3, penalty = 0.1) %>%
  set_mode("classification") %>%
  fit(Species ~ ., data = iris)
#> New names:
#> * . -> ....1
#> * . -> ....2
#> * . -> ....3
#> * . -> ....4
#> * . -> ....5
#> * ...

# ------------------------------------------------------------------------------

model <- rule_fit(trees = 10, min_n = 2)
model
#> RuleFit Model Specification (unknown)
#> 
#> Main Arguments:
#>   trees = 10
#>   min_n = 2
#> 
#> Computational engine: xrf 
#> 
update(model, trees = 1)
#> RuleFit Model Specification (unknown)
#> 
#> Main Arguments:
#>   trees = 1
#>   min_n = 2
#> 
#> Computational engine: xrf 
#> 
update(model, trees = 1, fresh = TRUE)
#> RuleFit Model Specification (unknown)
#> 
#> Main Arguments:
#>   trees = 1
#> 
#> Computational engine: xrf 
#> 