`C5_rules()` is a way to generate a *specification* of a model
before fitting. The main arguments for the model are:

`trees`

: The number of sequential models included in the ensemble (rules are derived from an initial set of boosted trees).

`min_n`

: The minimum number of data points in a node that are required for the node to be split further.

These arguments are converted to their specific names at the
time that the model is fit. Other options and arguments can be
set using `parsnip::set_engine()`. If left to their defaults
here (`NULL`), the values are taken from the underlying model
functions. If parameters need to be modified, `update()` can be used
in lieu of recreating the object from scratch.

```r
C5_rules(mode = "classification", trees = NULL, min_n = NULL)

# S3 method for C5_rules
update(
  object,
  parameters = NULL,
  trees = NULL,
  min_n = NULL,
  fresh = FALSE,
  ...
)
```

| Argument | Description |
|---|---|
| `mode` | A single character string for the type of model. The only possible value for this model is "classification". |
| `trees` | A non-negative integer (no greater than 100) for the number of members of the ensemble. |
| `min_n` | An integer greater than one for the minimum number of data points in a node that are required for the node to be split further. |
| `object` | A `C5_rules` model specification. |
| `parameters` | A 1-row tibble or named list with main parameters to update. |
| `fresh` | A logical for whether the arguments should be modified in-place or replaced wholesale. |
| `...` | Not used for `update()`. |

An updated `parsnip` model specification.

C5.0 is a classification model that is an extension of the C4.5
model of Quinlan (1993). It has tree- and rule-based versions that also
include boosting capabilities. `C5_rules()` enables the version of the model
that uses a series of rules (see the examples below). To make a set of
rules, an initial C5.0 tree is created and flattened into rules. The rules
are pruned, simplified, and ordered. Rule sets are created within each
iteration of boosting.

The two main tuning parameters are the number of trees in the boosting
ensemble (`trees`) and the number of samples required to continue splitting
when creating a tree (`min_n`). There are no arguments to control the total
number of rules in the ensemble.
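As a brief sketch (assuming the `tune` package is available alongside `rules`), both main arguments can be marked for later optimization with `tune()` placeholders rather than fixed values:

```r
library(rules)
library(tune)

# Both tuning parameters flagged as placeholders; concrete values are
# filled in later by a tuning function such as tune_grid():
spec <- C5_rules(trees = tune(), min_n = tune())
spec
```

Printing the specification shows `trees = tune()` and `min_n = tune()` as its main arguments.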

Note that `C5_rules()` does not require that categorical predictors be
converted to numeric indicator values. However, `parsnip::fit()` will
*always* create dummy variables, so if there is interest in keeping the
categorical predictors in their original format, `parsnip::fit_xy()` would
be a better choice. When using the `tune` package, using a recipe for
pre-processing enables more control over how such predictors are encoded,
since recipes do not automatically create dummy variables.
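As a sketch of the difference between the two interfaces (assuming the `modeldata::ad_data` set used in the examples below, in which `Genotype` is a factor predictor):

```r
library(rules)
data(ad_data, package = "modeldata")

spec <- C5_rules(trees = 1)

# The formula interface builds a model matrix, so the factor predictor
# Genotype is expanded into numeric indicator columns before C5.0 sees it:
f_fit <- fit(spec, Class ~ ., data = ad_data)

# fit_xy() passes the predictors through as-is, so Genotype reaches
# C5.0 in its original factor format:
xy_fit <- fit_xy(
  spec,
  x = ad_data[, setdiff(names(ad_data), "Class")],
  y = ad_data$Class
)
```

Because C5.0 handles categorical splits natively, the `fit_xy()` version can produce rules that test the factor levels directly rather than indicator columns.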

Note that C5.0 has a tool for *early stopping* during boosting, where fewer
iterations of boosting are performed than the number requested. `C5_rules()`
turns this feature off (although it can be re-enabled using
`C50::C5.0Control()`).
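For comparison, a sketch of calling the `C50` package directly (outside of `parsnip`), where `C5.0Control()`'s `earlyStopping` option is left at its default of `TRUE`:

```r
library(C50)
data(ad_data, package = "modeldata")

# With earlyStopping = TRUE, C5.0 may perform fewer boosting iterations
# than the 10 requested in `trials` if it decides further boosting is
# ineffective; C5_rules() instead fixes earlyStopping = FALSE internally.
ctrl <- C5.0Control(minCases = 10, earlyStopping = TRUE)
direct_fit <- C5.0(Class ~ ., data = ad_data, trials = 10, rules = TRUE,
                   control = ctrl)
```

The fitted object's `trials` element reports both the number of boosting iterations requested and the number actually used.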

Quinlan R (1993). *C4.5: Programs for Machine Learning*. Morgan
Kaufmann Publishers.

```r
C5_rules()
#> C5.0 Model Specification (classification)
#> 
#> Computational engine: C5.0 

# Parameters can be represented by a placeholder:
C5_rules(trees = 7)
#> C5.0 Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 7
#> 
#> Computational engine: C5.0 

# ------------------------------------------------------------------------------

data(ad_data, package = "modeldata")

set.seed(282782)
class_rules <-
  C5_rules(trees = 1, min_n = 10) %>%
  fit(Class ~ ., data = ad_data)

summary(class_rules$fit)
#> 
#> Call:
#> C5.0.default(x = x, y = y, trials = trials, rules = TRUE, control
#>  = C50::C5.0Control(minCases = minCases, seed = sample.int(10^5,
#>  1), earlyStopping = FALSE))
#> 
#> 
#> C5.0 [Release 2.07 GPL Edition]    Wed Jun 10 19:51:03 2020
#> -------------------------------
#> 
#> Class specified by attribute `outcome'
#> 
#> Read 333 cases (135 attributes) from undefined.data
#> 
#> Rules:
#> 
#> Rule 1: (17, lift 3.5)
#>     Creatine_Kinase_MB <= -1.610128
#>     Fas_Ligand <= 2.407182
#>     Ab_42 <= 11.28649
#>     ->  class Impaired  [0.947]
#> 
#> Rule 2: (45/6, lift 3.1)
#>     Creatine_Kinase_MB <= -1.610128
#>     Eotaxin_3 > 54
#>     Ab_42 <= 11.28649
#>     ->  class Impaired  [0.851]
#> 
#> Rule 3: (31/9, lift 2.6)
#>     PAI_1 > 0.58384
#>     ->  class Impaired  [0.697]
#> 
#> Rule 4: (87/8, lift 1.2)
#>     Creatine_Kinase_MB > -1.610128
#>     ->  class Control  [0.899]
#> 
#> Rule 5: (219/23, lift 1.2)
#>     PAI_1 <= 0.58384
#>     Ab_42 > 11.28649
#>     ->  class Control  [0.891]
#> 
#> Rule 6: (164/24, lift 1.2)
#>     Eotaxin_3 <= 54
#>     ->  class Control  [0.849]
#> 
#> Default class: Control
#> 
#> 
#> Evaluation on training data (333 cases):
#> 
#>             Rules
#>       ----------------
#>         No      Errors
#> 
#>          6   47(14.1%)   <<
#> 
#> 
#>        (a)   (b)    <-classified as
#>       ----  ----
#>         52    39    (a): class Impaired
#>          8   234    (b): class Control
#> 
#> 
#>     Attribute usage:
#> 
#>      81.08% Ab_42
#>      75.08% PAI_1
#>      62.76% Eotaxin_3
#>      41.44% Creatine_Kinase_MB
#>       5.11% Fas_Ligand
#> 
#> 
#> Time: 0.0 secs
```
```r
# ------------------------------------------------------------------------------

model <- C5_rules(trees = 10, min_n = 2)
model
#> C5.0 Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 10
#>   min_n = 2
#> 
#> Computational engine: C5.0 

update(model, trees = 1)
#> C5.0 Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1
#>   min_n = 2
#> 
#> Computational engine: C5.0 

update(model, trees = 1, fresh = TRUE)
#> C5.0 Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1
#> 
#> Computational engine: C5.0 
```