`rule_fit()` is a way to generate a *specification* of a model
before fitting. The main arguments for the model are:

`mtry`
: The number of predictors that will be randomly sampled at each split when creating the tree models.

`trees`
: The number of trees contained in the ensemble.

`min_n`
: The minimum number of data points in a node that are required for the node to be split further.

`tree_depth`
: The maximum depth of the tree (i.e. number of splits).

`learn_rate`
: The rate at which the boosting algorithm adapts from iteration-to-iteration.

`loss_reduction`
: The reduction in the loss function required to split further.

`sample_size`
: The amount of data exposed to the fitting routine.

These arguments are converted to their specific names at the
time that the model is fit. Other options and arguments can be
set using `parsnip::set_engine()`. If left to their defaults
here (`NULL`), the values are taken from the underlying model
functions. If parameters need to be modified, `update()` can be used
in lieu of recreating the object from scratch.
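As a minimal sketch of that workflow, the code below builds a specification, sets the engine and mode explicitly, and then modifies it with `update()`. It assumes the `parsnip` and `rules` packages are installed; the particular argument values are arbitrary illustrations, and no model is fit here.

```r
# Assumption: parsnip and rules are installed; values are illustrative only.
library(parsnip)
library(rules)

# Build a specification; nothing is fit at this point.
spec <- rule_fit(trees = 20, tree_depth = 3, penalty = 0.01) %>%
  set_engine("xrf") %>%
  set_mode("regression")

# Change one main argument and keep the others.
spec_more_trees <- update(spec, trees = 50)

# With fresh = TRUE the previous main arguments are discarded,
# so only `trees` remains set.
spec_fresh <- update(spec, trees = 50, fresh = TRUE)

spec_more_trees
spec_fresh
```

Printing the two updated objects shows which main arguments survived each call.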

    rule_fit(
      mode = "unknown",
      mtry = NULL,
      trees = NULL,
      min_n = NULL,
      tree_depth = NULL,
      learn_rate = NULL,
      loss_reduction = NULL,
      sample_size = NULL,
      penalty = NULL
    )

    # S3 method for rule_fit
    update(
      object,
      parameters = NULL,
      mtry = NULL,
      trees = NULL,
      min_n = NULL,
      tree_depth = NULL,
      learn_rate = NULL,
      loss_reduction = NULL,
      sample_size = NULL,
      penalty = NULL,
      fresh = FALSE,
      ...
    )

Argument | Description |
---|---|
mode | A single character string for the type of model. Possible values for this model are "unknown", "regression", or "classification". |
mtry | A number for the number (or proportion) of predictors that will be randomly sampled at each split when creating the tree models. |
trees | An integer for the number of trees contained in the ensemble. |
min_n | An integer for the minimum number of data points in a node that are required for the node to be split further. |
tree_depth | An integer for the maximum depth of the tree (i.e. number of splits). |
learn_rate | A number for the rate at which the boosting algorithm adapts from iteration-to-iteration. |
loss_reduction | A number for the reduction in the loss function required to split further. |
sample_size | A number for the number (or proportion) of data that is exposed to the fitting routine. |
penalty | L1 regularization parameter. |
object | A |
parameters | A 1-row tibble or named list with |
fresh | A logical for whether the arguments should be modified in-place or replaced wholesale. |
... | Not used for |

An updated `parsnip` model specification.

The RuleFit model creates a regression model of rules in two stages. The first stage uses a tree-based model to generate a set of rules that can be filtered, modified, and simplified. These rules are then added as predictors to a regularized generalized linear model that can also conduct feature selection during model training.
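The two stages can be illustrated with a toy sketch in base R. This is not how any engine implements RuleFit: the rules below are hand-written stand-ins for rules that would normally be extracted from a tree ensemble, and an unregularized `lm()` stands in for the regularized generalized linear model. The object names (`rule_set`, `rule_features`, `linear_fit`) are hypothetical.

```r
# Stage 1 (stand-in): hand-written rules on the built-in mtcars data.
# A real RuleFit fit would derive rules like these from tree splits.
rule_set <- list(
  heavy        = function(d) d$wt > 3.2,
  many_cyl     = function(d) d$cyl >= 6,
  heavy_low_hp = function(d) d$wt > 3.2 & d$hp < 180
)

# Each rule becomes a 0/1 indicator column.
rule_features <- as.data.frame(
  lapply(rule_set, function(rule) as.integer(rule(mtcars)))
)

# Stage 2 (stand-in): a linear model on the original predictors plus
# the rule indicators. RuleFit would use a penalized model here, which
# can shrink the coefficients of unhelpful rules to zero.
design <- cbind(mtcars[, c("mpg", "wt", "hp")], rule_features)
linear_fit <- lm(mpg ~ ., data = design)

coef(linear_fit)
```

The coefficient vector has one entry per original predictor and one per rule, which is the structure the regularized second stage operates on.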

For the `xrf` engine, the `xgboost` package is used to create the rule set
that is then added to a `glmnet` model.

The only available engine is `"xrf"`. Note that, per the documentation in
`?xrf`, transformations of the response variable are not supported. To
use these with `rule_fit()`, we recommend using a recipe instead of the
formula method.
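One way this recommendation might look in practice is sketched below. It assumes the `recipes` and `workflows` packages are installed alongside `parsnip` and `rules`; the data set, the log transformation, and the argument values are illustrative choices, and the workflow is only assembled, not fit.

```r
# Assumption: parsnip, rules, recipes, and workflows are installed.
library(parsnip)
library(rules)
library(recipes)
library(workflows)

# Transform the outcome in a recipe rather than in the formula.
# skip = TRUE applies the step when the recipe is prepared on the
# training data but not when new data is processed for prediction,
# so predictions would be on the log scale.
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_log(mpg, skip = TRUE)

spec <- rule_fit(trees = 5, penalty = 0.01) %>%
  set_engine("xrf") %>%
  set_mode("regression")

# Bundle the preprocessing and the model specification together.
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

wf
```

Calling `fit(wf, data = ...)` on a suitable training set would then run the recipe and the model together.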

Friedman, J. H., and Popescu, B. E. (2008). "Predictive learning
via rule ensembles." *The Annals of Applied Statistics*, 2(3), 916-954.

    # Assumes the rules package (and parsnip) is installed
    library(rules)

    rule_fit()
    #> RuleFit Model Specification (unknown)
    #>
    #> Computational engine: xrf
    #>

    # Parameters can be represented by a placeholder:
    rule_fit(trees = 7)
    #> RuleFit Model Specification (unknown)
    #>
    #> Main Arguments:
    #>   trees = 7
    #>
    #> Computational engine: xrf
    #>

    # ------------------------------------------------------------------------------

    set.seed(6907)
    rule_fit_rules <-
      rule_fit(trees = 3, penalty = 0.1) %>%
      set_mode("classification") %>%
      fit(Species ~ ., data = iris)

    # ------------------------------------------------------------------------------

    model <- rule_fit(trees = 10, min_n = 2)
    model
    #> RuleFit Model Specification (unknown)
    #>
    #> Main Arguments:
    #>   trees = 10
    #>   min_n = 2
    #>
    #> Computational engine: xrf
    #>

    # Modify one argument and keep the rest
    update(model, trees = 1)
    #> RuleFit Model Specification (unknown)
    #>
    #> Main Arguments:
    #>   trees = 1
    #>   min_n = 2
    #>
    #> Computational engine: xrf
    #>

    # Replace the arguments wholesale
    update(model, trees = 1, fresh = TRUE)
    #> RuleFit Model Specification (unknown)
    #>
    #> Main Arguments:
    #>   trees = 1
    #>
    #> Computational engine: xrf
    #>