Cubist Regression Models

Cubist is an R port of the Cubist GPL C code released by RuleQuest at http://rulequest.com/cubist-info.html. See the last section of this document for information on the porting. The other sections describe the functionality of the R package.

Model Trees

Cubist is a rule–based model that is an extension of Quinlan’s M5 model tree. A tree is grown where the terminal leaves contain linear regression models. These models are based on the predictors used in previous splits. Also, there are intermediate linear models at each step of the tree. A prediction is made using the linear regression model at the terminal node of the tree, but is “smoothed” by taking into account the prediction from the linear model in the previous node of the tree (which also occurs recursively up the tree). The tree is reduced to a set of rules, which initially are paths from the top of the tree to the bottom. Rules are eliminated via pruning and/or combined for simplification.

This is explained better in Quinlan (1992). Wang and Witten (1997) attempted to recreate this model using a “rational reconstruction” of Quinlan (1992) that is the basis for the M5P model in Weka (and the R package RWeka).

An example of a model tree can be illustrated using the Boston Housing data in the mlbench package.

library(Cubist)
library(mlbench)

data(BostonHousing)
BostonHousing$chas <- as.numeric(BostonHousing$chas) - 1

set.seed(1)
inTrain <- sample(1:nrow(BostonHousing), floor(.8*nrow(BostonHousing)))

train_pred <- BostonHousing[ inTrain, -14]
test_pred  <- BostonHousing[-inTrain, -14]

train_resp <- BostonHousing$medv[ inTrain]
test_resp  <- BostonHousing$medv[-inTrain]

model_tree <- cubist(x = train_pred, y = train_resp)
model_tree
## 
## Call:
## cubist.default(x = train_pred, y = train_resp)
## 
## Number of samples: 404 
## Number of predictors: 13 
## 
## Number of committees: 1 
## Number of rules: 5
summary(model_tree)
## 
## Call:
## cubist.default(x = train_pred, y = train_resp)
## 
## 
## Cubist [Release 2.07 GPL Edition]  Thu Jan  9 14:49:22 2020
## ---------------------------------
## 
##     Target attribute `outcome'
## 
## Read 404 cases (14 attributes) from undefined.data
## 
## Model:
## 
##   Rule 1: [77 cases, mean 14.04, range 5 to 27.5, est err 2.14]
## 
##     if
##  nox > 0.668
##     then
##  outcome = -2.18 + 3.47 dis + 21.6 nox - 0.32 lstat + 0.0089 b
##            - 0.12 ptratio - 0.02 crim - 0.005 age
## 
##   Rule 2: [167 cases, mean 19.30, range 7 to 31, est err 2.15]
## 
##     if
##  nox <= 0.668
##  lstat > 9.53
##     then
##  outcome = 43.02 - 0.94 ptratio - 0.27 lstat - 0.84 dis - 0.034 age
##            + 0.0082 b - 0.1 indus + 0.4 rm
## 
##   Rule 3: [29 cases, mean 25.08, range 18.2 to 50, est err 2.64]
## 
##     if
##  rm <= 6.226
##  lstat <= 9.53
##     then
##  outcome = -18.07 + 3.91 crim - 1.67 lstat + 0.0834 b + 3.3 rm
## 
##   Rule 4: [143 cases, mean 29.49, range 16.5 to 50, est err 2.39]
## 
##     if
##  dis > 2.3999
##  lstat <= 9.53
##     then
##  outcome = -28.26 + 11.1 rm + 0.62 crim - 0.58 lstat - 0.017 tax
##            - 0.059 age + 11.3 nox - 0.58 dis - 0.52 ptratio
## 
##   Rule 5: [14 cases, mean 40.54, range 22 to 50, est err 5.28]
## 
##     if
##  rm > 6.226
##  dis <= 2.3999
##  lstat <= 9.53
##     then
##  outcome = -5.27 + 3.14 crim - 5.18 dis - 1.22 lstat + 9.6 rm
##            - 0.0141 tax - 0.031 age - 0.39 ptratio
## 
## 
## Evaluation on training data (404 cases):
## 
##     Average  |error|               2.13
##     Relative |error|               0.31
##     Correlation coefficient        0.96
## 
## 
##  Attribute usage:
##    Conds  Model
## 
##     82%   100%    lstat
##     57%    51%    nox
##     37%    93%    dis
##     10%    82%    rm
##            93%    age
##            93%    ptratio
##            63%    b
##            61%    crim
##            39%    indus
##            37%    tax
## 
## 
## Time: 0.0 secs

There is no formula method for cubist; the predictors are specified as a matrix or data frame and the outcome is a numeric vector.

There is a predict method for the model:

model_tree_pred <- predict(model_tree, test_pred)
## Test set RMSE
sqrt(mean((model_tree_pred - test_resp)^2))
## [1] 3.72
## Test set R^2
cor(model_tree_pred, test_resp)^2
## [1] 0.773

Ensembles By Committees

The Cubist model can also use a boosting–like scheme called committees where iterative model trees are created in sequence. The first tree follows the procedure described in the last section. Subsequent trees are created using adjusted versions of the training set outcome: if the model over–predicted a value, the response is adjusted downward for the next model (and so on). Unlike traditional boosting, stage weights for each committee are not used to average the predictions from each model tree; the final prediction is a simple average of the predictions from each model tree.
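
The adjustment can be sketched in a few lines. This is only a conceptual illustration of the idea described above, not the internal Cubist code, and the exact pseudo-response used by the C implementation may differ:

first_fit  <- cubist(x = train_pred, y = train_resp)
first_pred <- predict(first_fit, train_pred)

# if the first model over-predicts a sample, shift the pseudo-response for
# the next committee member downward by the size of the error (and vice versa)
pseudo_resp <- train_resp - (first_pred - train_resp)
second_fit  <- cubist(x = train_pred, y = pseudo_resp)

# the final prediction is a simple average over the committee members
ens_pred <- (predict(first_fit, test_pred) + predict(second_fit, test_pred)) / 2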

The committees argument can be used to control the number of model trees:

set.seed(1)
com_model <- cubist(x = train_pred, y = train_resp, committees = 5)
summary(com_model)
## 
## Call:
## cubist.default(x = train_pred, y = train_resp, committees = 5)
## 
## 
## Cubist [Release 2.07 GPL Edition]  Thu Jan  9 14:49:23 2020
## ---------------------------------
## 
##     Target attribute `outcome'
## 
## Read 404 cases (14 attributes) from undefined.data
## 
## Model 1:
## 
##   Rule 1/1: [77 cases, mean 14.04, range 5 to 27.5, est err 2.14]
## 
##     if
##  nox > 0.668
##     then
##  outcome = -2.18 + 3.47 dis + 21.6 nox - 0.32 lstat + 0.0089 b
##            - 0.12 ptratio - 0.02 crim - 0.005 age
## 
##   Rule 1/2: [167 cases, mean 19.30, range 7 to 31, est err 2.15]
## 
##     if
##  nox <= 0.668
##  lstat > 9.53
##     then
##  outcome = 43.02 - 0.94 ptratio - 0.27 lstat - 0.84 dis - 0.034 age
##            + 0.0082 b - 0.1 indus + 0.4 rm
## 
##   Rule 1/3: [29 cases, mean 25.08, range 18.2 to 50, est err 2.64]
## 
##     if
##  rm <= 6.226
##  lstat <= 9.53
##     then
##  outcome = -18.07 + 3.91 crim - 1.67 lstat + 0.0834 b + 3.3 rm
## 
##   Rule 1/4: [143 cases, mean 29.49, range 16.5 to 50, est err 2.39]
## 
##     if
##  dis > 2.3999
##  lstat <= 9.53
##     then
##  outcome = -28.26 + 11.1 rm + 0.62 crim - 0.58 lstat - 0.017 tax
##            - 0.059 age + 11.3 nox - 0.58 dis - 0.52 ptratio
## 
##   Rule 1/5: [14 cases, mean 40.54, range 22 to 50, est err 5.28]
## 
##     if
##  rm > 6.226
##  dis <= 2.3999
##  lstat <= 9.53
##     then
##  outcome = -5.27 + 3.14 crim - 5.18 dis - 1.22 lstat + 9.6 rm
##            - 0.0141 tax - 0.031 age - 0.39 ptratio
## 
## Model 2:
## 
##   Rule 2/1: [58 cases, mean 14.03, range 5 to 36, est err 4.06]
## 
##     if
##  dis > 1.4254
##  dis <= 1.8956
##  lstat > 5.91
##     then
##  outcome = 86.06 - 20.56 dis - 0.47 lstat - 3 rm - 0.0201 b - 0.17 crim
##            - 4.1 nox
## 
##   Rule 2/2: [159 cases, mean 19.56, range 8.7 to 27.1, est err 1.84]
## 
##     if
##  rm <= 6.251
##  dis > 1.8956
##     then
##  outcome = -0.22 + 3.8 rm - 0.076 age + 0.021 b + 11.6 nox - 0.56 ptratio
##            - 0.04 lstat - 0.06 dis
## 
##   Rule 2/3: [116 cases, mean 22.43, range 7.2 to 50, est err 2.23]
## 
##     if
##  rm > 6.251
##  lstat > 5.91
##     then
##  outcome = -23.67 + 8.5 rm - 0.46 lstat - 0.3 crim - 0.0088 tax + 6.5 nox
##            - 0.2 dis - 0.19 ptratio + 0.0028 b - 0.008 age
## 
##   Rule 2/4: [10 cases, mean 25.07, range 5 to 50, est err 8.00]
## 
##     if
##  rm <= 6.251
##  dis <= 1.4254
##     then
##  outcome = 174.32 - 119.61 dis - 1.31 lstat + 38.6 nox
## 
##   Rule 2/5: [8 cases, mean 27.56, range 22.5 to 50, est err 3.33]
## 
##     if
##  rm <= 6.461
##  lstat <= 5.91
##     then
##  outcome = 22.34 + 6.98 crim
## 
##   Rule 2/6: [37 cases, mean 36.09, range 22.8 to 50, est err 3.40]
## 
##     if
##  rm > 6.461
##  tax > 265
##  lstat <= 5.91
##     then
##  outcome = 53.88 - 0.2698 b + 0.0588 tax + 11.5 rm + 0.52 crim - 1.83 dis
##            - 0.129 age - 0.3 lstat - 0.12 ptratio
## 
##   Rule 2/7: [47 cases, mean 36.19, range 24.4 to 50, est err 3.56]
## 
##     if
##  rm > 6.461
##  tax <= 265
##     then
##  outcome = -23.33 - 0.0943 tax + 13.6 rm + 0.84 crim - 0.0276 b
##            - 0.21 lstat - 0.09 ptratio - 0.09 dis - 0.005 age + 0.006 zn
## 
## Model 3:
## 
##   Rule 3/1: [33 cases, mean 13.98, range 5 to 23.2, est err 2.95]
## 
##     if
##  nox > 0.668
##  b > 375.52
##     then
##  outcome = 197.68 - 0.3916 b - 0.5 age + 36.1 nox - 0.36 lstat
##            - 0.08 crim
## 
##   Rule 3/2: [44 cases, mean 14.09, range 7 to 27.5, est err 2.58]
## 
##     if
##  nox > 0.668
##  b <= 375.52
##     then
##  outcome = 14.76 - 0.3 lstat + 0.0196 b
## 
##   Rule 3/3: [241 cases, mean 17.54, range 5 to 31, est err 2.50]
## 
##     if
##  lstat > 9.53
##     then
##  outcome = 49.71 - 1.16 dis - 0.32 lstat - 1.06 ptratio - 14.8 nox
##            + 0.0091 b + 0.9 rm + 0.07 rad - 0.017 age - 0.0023 tax
##            - 0.03 crim
## 
##   Rule 3/4: [29 cases, mean 25.08, range 18.2 to 50, est err 3.71]
## 
##     if
##  rm <= 6.226
##  lstat <= 9.53
##     then
##  outcome = -71.35 + 3.01 crim + 0.1693 b - 1.96 lstat + 5.5 rm
##            + 0.47 indus + 0.0173 tax
## 
##   Rule 3/5: [134 cases, mean 31.94, range 16.5 to 50, est err 2.79]
## 
##     if
##  rm > 6.226
##  lstat <= 9.53
##     then
##  outcome = -13.33 + 1.72 crim + 9.9 rm - 0.87 lstat - 0.86 dis
##            - 0.73 ptratio - 0.045 age - 0.0033 tax
## 
## Model 4:
## 
##   Rule 4/1: [17 cases, mean 11.90, range 8.4 to 17.8, est err 1.86]
## 
##     if
##  nox > 0.693
##  lstat > 19.52
##     then
##  outcome = 12.1 - 0.64 crim + 2 rm - 0.18 ptratio - 3.1 nox
## 
##   Rule 4/2: [38 cases, mean 12.67, range 5 to 23.7, est err 3.61]
## 
##     if
##  nox <= 0.693
##  dis > 1.4254
##  lstat > 19.52
##     then
##  outcome = 112.54 - 116.7 nox - 4.67 dis - 0.109 age - 0.0155 b
##            - 0.008 tax
## 
##   Rule 4/3: [70 cases, mean 18.02, range 9.6 to 27.5, est err 1.96]
## 
##     if
##  crim > 1.42502
##  dis > 1.4254
##  lstat > 5.91
##  lstat <= 19.52
##     then
##  outcome = 28.61 - 0.9 lstat + 0.0105 b - 0.027 age - 0.06 crim + 3 nox
## 
##   Rule 4/4: [61 cases, mean 18.76, range 12.7 to 25, est err 1.92]
## 
##     if
##  crim > 0.21977
##  crim <= 1.42502
##  rm <= 6.546
##  lstat > 5.91
##     then
##  outcome = -8.96 + 6.4 rm - 0.108 age - 0.09 lstat + 0.0017 b
##            - 0.07 ptratio - 0.07 dis - 0.0008 tax - 0.01 crim
## 
##   Rule 4/5: [112 cases, mean 21.60, range 13.6 to 29.4, est err 1.67]
## 
##     if
##  crim <= 0.21977
##  rm <= 6.546
##  lstat <= 19.52
##     then
##  outcome = -37.71 + 18.25 crim + 7.3 rm + 0.0451 b - 0.0149 tax
##            + 0.32 lstat - 0.068 age
## 
##   Rule 4/6: [9 cases, mean 22.30, range 5 to 50, est err 11.24]
## 
##     if
##  rm <= 6.546
##  dis <= 1.4254
##  lstat > 5.91
##     then
##  outcome = 216.66 - 123.39 dis - 1.52 lstat + 31.4 nox - 5.1 rm
## 
##   Rule 4/7: [8 cases, mean 27.56, range 22.5 to 50, est err 5.10]
## 
##     if
##  rm <= 6.461
##  lstat <= 5.91
##     then
##  outcome = 93.41 - 2.79 lstat - 11.5 rm + 37.5 nox + 0.49 crim
##            - 0.0023 tax - 0.05 ptratio
## 
##   Rule 4/8: [118 cases, mean 32.26, range 7.5 to 50, est err 4.39]
## 
##     if
##  rm > 6.546
##     then
##  outcome = -12.45 - 0.0364 tax + 8.1 rm - 0.05 lstat - 0.12 dis
##            - 0.007 age - 0.08 ptratio - 0.02 indus - 0.01 crim
## 
##   Rule 4/9: [37 cases, mean 36.09, range 22.8 to 50, est err 4.24]
## 
##     if
##  rm > 6.461
##  tax > 265
##  lstat <= 5.91
##     then
##  outcome = 52.87 - 0.2742 b + 2.81 crim + 12 rm + 0.0478 tax - 0.151 age
##            - 1.73 dis - 0.06 lstat - 0.9 nox
## 
##   Rule 4/10: [33 cases, mean 37.05, range 25 to 50, est err 3.64]
## 
##     if
##  tax <= 265
##  lstat <= 5.91
##     then
##  outcome = -21.76 - 0.1186 tax + 1.59 crim + 13.2 rm - 0.33 lstat
##            - 4.6 nox - 0.16 ptratio
## 
## Model 5:
## 
##   Rule 5/1: [324 cases, mean 19.78, range 5 to 50, est err 2.75]
## 
##     if
##  rm <= 6.781
##     then
##  outcome = 46.87 - 0.47 lstat - 1.05 dis - 14.9 nox - 0.78 ptratio
##            + 0.0104 b + 0.4 rm + 0.02 rad - 0.0006 tax
## 
##   Rule 5/2: [80 cases, mean 35.32, range 7.5 to 50, est err 4.52]
## 
##     if
##  rm > 6.781
##     then
##  outcome = -49.93 + 13 rm - 0.83 crim - 0.19 lstat - 0.52 dis
##            - 0.45 ptratio - 6.3 nox + 0.0049 b + 0.04 rad - 0.0017 tax
##            + 0.007 zn
## 
##   Rule 5/3: [77 cases, mean 35.75, range 22.5 to 50, est err 3.93]
## 
##     if
##  lstat <= 5.91
##     then
##  outcome = -10.9 + 4.64 crim - 2.66 lstat + 9.2 rm - 0.26 indus
##            - 0.84 dis - 0.12 ptratio + 0.02 rad - 1.5 nox - 0.0008 tax
##            + 0.0014 b
## 
## 
## Evaluation on training data (404 cases):
## 
##     Average  |error|               1.82
##     Relative |error|               0.26
##     Correlation coefficient        0.97
## 
## 
##  Attribute usage:
##    Conds  Model
## 
##     62%    97%    lstat
##     57%    88%    rm
##     22%    84%    dis
##     16%    66%    nox
##     10%    68%    crim
##      7%    71%    tax
##      3%    79%    b
##            80%    ptratio
##            69%    age
##            31%    rad
##            17%    indus
##             5%    zn
## 
## 
## Time: 0.0 secs

For this model:

com_pred <- predict(com_model, test_pred)
## RMSE
sqrt(mean((com_pred - test_resp)^2))
## [1] 4.5
## R^2
cor(com_pred, test_resp)^2
## [1] 0.688

Instance–Based Corrections

Another innovation in Cubist is the use of nearest neighbors to adjust the predictions from the rule–based model. First, a model tree (with or without committees) is created. Once a sample is predicted by this model, Cubist can find its nearest neighbors in the training set and determine the average of these training set points. See Quinlan (1993a) for the details of the adjustment.

The development of rules and committees is independent of the choice to use instances. The original C code allowed the user to choose whether instances should be used, not used, or whether the program should decide. Our approach is to build a model with the cubist function that is agnostic to the decision about instances. When samples are predicted, the neighbors argument can be used to adjust the rule–based model predictions (or not).

We can add instances to the previously fit committee model:

inst_pred <- predict(com_model, test_pred, neighbors = 5)
## RMSE
sqrt(mean((inst_pred - test_resp)^2))
## [1] 4.64
## R^2
cor(inst_pred, test_resp)^2
## [1] 0.688

Note that the previous models used the implicit default of neighbors = 0 for their predictions.
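
Because the instance–based correction is applied only at prediction time, different settings can be compared without refitting the model. A short sketch that computes the test set RMSE for values of neighbors from 0 to 9:

# same fitted committee model, varying only the prediction-time correction
nn_rmse <- sapply(0:9, function(k) {
  preds <- predict(com_model, test_pred, neighbors = k)
  sqrt(mean((preds - test_resp)^2))
})
names(nn_rmse) <- paste0("neighbors_", 0:9)
nn_rmse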

To tune the model over different values of neighbors and committees, the train function in the caret package can be used to optimize these parameters. For example:

library(caret)

grid <- expand.grid(committees = c(1, 10, 50, 100),
                    neighbors = c(0, 1, 5, 9))
set.seed(1)
boston_tuned <- train(
  x = train_pred,
  y = train_resp,
  method = "cubist",
  tuneGrid = grid,
  trControl = trainControl(method = "cv")
  )
boston_tuned
## Cubist 
## 
## 404 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 364, 364, 364, 363, 364, 363, ... 
## Resampling results across tuning parameters:
## 
##   committees  neighbors  RMSE  Rsquared  MAE 
##     1         0          3.49  0.870     2.41
##     1         1          3.48  0.877     2.35
##     1         5          3.23  0.891     2.17
##     1         9          3.27  0.886     2.20
##    10         0          2.98  0.903     2.11
##    10         1          2.88  0.912     2.07
##    10         5          2.73  0.919     1.91
##    10         9          2.76  0.917     1.91
##    50         0          3.01  0.900     2.11
##    50         1          2.90  0.911     2.06
##    50         5          2.71  0.920     1.88
##    50         9          2.74  0.917     1.90
##   100         0          2.97  0.903     2.09
##   100         1          2.87  0.913     2.04
##   100         5          2.67  0.922     1.86
##   100         9          2.71  0.919     1.88
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 100 and neighbors = 5.

The next figure shows the profiles of the tuning parameters produced using ggplot(boston_tuned).
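
The call itself is short (this assumes the ggplot2 package is attached; caret supplies a ggplot method for train objects):

library(ggplot2)
ggplot(boston_tuned)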

It may also be useful to see how the different models fit a single predictor:

lstat_df <- train_pred[, "lstat", drop = FALSE]
rules_only <- cubist(x = lstat_df, y = train_resp)
rules_and_com <- cubist(x = lstat_df, y = train_resp, committees = 100)

predictions <- lstat_df
predictions$medv <- train_resp
predictions$rules_neigh <- predict(rules_only, lstat_df, neighbors = 5)
predictions$committees <- predict(rules_and_com, lstat_df)

The figure below shows the model fits for the test data. For these data, there doesn't appear to be much of an improvement when committees or instances are added to the base rules.
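
A figure along these lines can be drawn from the predictions data frame built above; the sketch below assumes ggplot2 and uses arbitrary colors and labels:

library(ggplot2)
ggplot(predictions, aes(x = lstat)) +
  geom_point(aes(y = medv), alpha = 0.3) +
  geom_line(aes(y = rules_neigh, color = "rules + neighbors")) +
  geom_line(aes(y = committees, color = "committees")) +
  labs(y = "medv", color = NULL)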

Variable Importance

The summary output for a Cubist model shows the usage of each variable in either the rule conditions or the (terminal) linear models. In actuality, many more linear models are used in prediction than are shown in the output. Because of this, the variable usage statistics shown at the end of the summary output will probably be inconsistent with the rules also shown there. At each split of the tree, Cubist saves a linear model (after feature selection) that is allowed to have terms for each variable used in the current split or any split above it. Quinlan (1992) discusses a smoothing algorithm where each model prediction is a linear combination of the parent and child model along the tree. As such, the final prediction is a function of all the linear models from the initial node to the terminal node. The percentages shown in the Cubist output reflect all of the models involved in prediction (as opposed to the terminal models shown in the output).
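
The recursive combination can be sketched as follows. The mixing weight here is a placeholder; Cubist determines the actual proportions internally, so this only illustrates how every model from the root to the terminal node contributes to the final prediction:

# conceptual sketch: blend each node's prediction with the smoothed
# prediction carried down from its parent (placeholder weight of 0.5)
smooth_prediction <- function(node_preds, weight = 0.5) {
  smoothed <- node_preds[1]                  # start with the root model
  for (p in node_preds[-1]) {
    smoothed <- weight * p + (1 - weight) * smoothed
  }
  smoothed
}

# e.g. predictions from the root, an intermediate, and the terminal model
smooth_prediction(c(22.1, 24.3, 25.0))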

The raw usage statistics are contained in a data frame called usage in the cubist object.

The caret package has a general variable importance method, varImp. When this function is applied to a cubist model, the variable importance is a linear combination of the usage in the rule conditions and the usage in the linear models.

For example:

summary(model_tree)
## 
## Call:
## cubist.default(x = train_pred, y = train_resp)
## 
## 
## Cubist [Release 2.07 GPL Edition]  Thu Jan  9 14:49:22 2020
## ---------------------------------
## 
##     Target attribute `outcome'
## 
## Read 404 cases (14 attributes) from undefined.data
## 
## Model:
## 
##   Rule 1: [77 cases, mean 14.04, range 5 to 27.5, est err 2.14]
## 
##     if
##  nox > 0.668
##     then
##  outcome = -2.18 + 3.47 dis + 21.6 nox - 0.32 lstat + 0.0089 b
##            - 0.12 ptratio - 0.02 crim - 0.005 age
## 
##   Rule 2: [167 cases, mean 19.30, range 7 to 31, est err 2.15]
## 
##     if
##  nox <= 0.668
##  lstat > 9.53
##     then
##  outcome = 43.02 - 0.94 ptratio - 0.27 lstat - 0.84 dis - 0.034 age
##            + 0.0082 b - 0.1 indus + 0.4 rm
## 
##   Rule 3: [29 cases, mean 25.08, range 18.2 to 50, est err 2.64]
## 
##     if
##  rm <= 6.226
##  lstat <= 9.53
##     then
##  outcome = -18.07 + 3.91 crim - 1.67 lstat + 0.0834 b + 3.3 rm
## 
##   Rule 4: [143 cases, mean 29.49, range 16.5 to 50, est err 2.39]
## 
##     if
##  dis > 2.3999
##  lstat <= 9.53
##     then
##  outcome = -28.26 + 11.1 rm + 0.62 crim - 0.58 lstat - 0.017 tax
##            - 0.059 age + 11.3 nox - 0.58 dis - 0.52 ptratio
## 
##   Rule 5: [14 cases, mean 40.54, range 22 to 50, est err 5.28]
## 
##     if
##  rm > 6.226
##  dis <= 2.3999
##  lstat <= 9.53
##     then
##  outcome = -5.27 + 3.14 crim - 5.18 dis - 1.22 lstat + 9.6 rm
##            - 0.0141 tax - 0.031 age - 0.39 ptratio
## 
## 
## Evaluation on training data (404 cases):
## 
##     Average  |error|               2.13
##     Relative |error|               0.31
##     Correlation coefficient        0.96
## 
## 
##  Attribute usage:
##    Conds  Model
## 
##     82%   100%    lstat
##     57%    51%    nox
##     37%    93%    dis
##     10%    82%    rm
##            93%    age
##            93%    ptratio
##            63%    b
##            61%    crim
##            39%    indus
##            37%    tax
## 
## 
## Time: 0.0 secs
model_tree$usage
##    Conditions Model Variable
## 1          82   100    lstat
## 2          57    51      nox
## 3          37    93      dis
## 4          10    82       rm
## 5           0    93      age
## 6           0    93  ptratio
## 7           0    63        b
## 8           0    61     crim
## 9           0    39    indus
## 10          0    37      tax
## 11          0     0       zn
## 12          0     0     chas
## 13          0     0      rad
library(caret)
varImp(model_tree)
##         Overall
## lstat      91.0
## nox        54.0
## dis        65.0
## rm         46.0
## age        46.5
## ptratio    46.5
## b          31.5
## crim       30.5
## indus      19.5
## tax        18.5
## zn          0.0
## chas        0.0
## rad         0.0
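
The values above appear to be the simple average of the two usage columns. A quick check, assuming equal weighting of condition and model usage (which matches the numbers shown):

imp <- model_tree$usage
data.frame(Variable = imp$Variable, Overall = (imp$Conditions + imp$Model) / 2)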

It should be noted that this variable importance measure does not capture the influence of the predictors when using the instance–based correction.

Tidy rules

Rules from a Cubist model can be viewed using summary as follows:

summary(model_tree)
## 
## Call:
## cubist.default(x = train_pred, y = train_resp)
## 
## 
## Cubist [Release 2.07 GPL Edition]  Thu Jan  9 14:49:22 2020
## ---------------------------------
## 
##     Target attribute `outcome'
## 
## Read 404 cases (14 attributes) from undefined.data
## 
## Model:
## 
##   Rule 1: [77 cases, mean 14.04, range 5 to 27.5, est err 2.14]
## 
##     if
##  nox > 0.668
##     then
##  outcome = -2.18 + 3.47 dis + 21.6 nox - 0.32 lstat + 0.0089 b
##            - 0.12 ptratio - 0.02 crim - 0.005 age
## 
##   Rule 2: [167 cases, mean 19.30, range 7 to 31, est err 2.15]
## 
##     if
##  nox <= 0.668
##  lstat > 9.53
##     then
##  outcome = 43.02 - 0.94 ptratio - 0.27 lstat - 0.84 dis - 0.034 age
##            + 0.0082 b - 0.1 indus + 0.4 rm
## 
##   Rule 3: [29 cases, mean 25.08, range 18.2 to 50, est err 2.64]
## 
##     if
##  rm <= 6.226
##  lstat <= 9.53
##     then
##  outcome = -18.07 + 3.91 crim - 1.67 lstat + 0.0834 b + 3.3 rm
## 
##   Rule 4: [143 cases, mean 29.49, range 16.5 to 50, est err 2.39]
## 
##     if
##  dis > 2.3999
##  lstat <= 9.53
##     then
##  outcome = -28.26 + 11.1 rm + 0.62 crim - 0.58 lstat - 0.017 tax
##            - 0.059 age + 11.3 nox - 0.58 dis - 0.52 ptratio
## 
##   Rule 5: [14 cases, mean 40.54, range 22 to 50, est err 5.28]
## 
##     if
##  rm > 6.226
##  dis <= 2.3999
##  lstat <= 9.53
##     then
##  outcome = -5.27 + 3.14 crim - 5.18 dis - 1.22 lstat + 9.6 rm
##            - 0.0141 tax - 0.031 age - 0.39 ptratio
## 
## 
## Evaluation on training data (404 cases):
## 
##     Average  |error|               2.13
##     Relative |error|               0.31
##     Correlation coefficient        0.96
## 
## 
##  Attribute usage:
##    Conds  Model
## 
##     82%   100%    lstat
##     57%    51%    nox
##     37%    93%    dis
##     10%    82%    rm
##            93%    age
##            93%    ptratio
##            63%    b
##            61%    crim
##            39%    indus
##            37%    tax
## 
## 
## Time: 0.0 secs

The tidyRules function in the tidyrules package returns the rules in a tibble (an extension of a data frame) with one row per rule. The tibble provides the following information about each rule: support, mean, min, max, error, LHS, RHS, and committee. The values in the LHS and RHS columns are strings that can be parsed as R expressions. These can be placed inside dplyr::filter() to obtain the rows of the data corresponding to a rule and then used to evaluate the estimated response for those rows.

library(dplyr)
library(modeldata)
data("attrition")
attrition <- as_tibble(attrition)

# let's predict monthly income
attrition_x <- attrition %>% dplyr::select(-MonthlyIncome, -Attrition)
attrition_y <- attrition %>% dplyr::select(MonthlyIncome) %>% unlist()

model_tree_attrition <- cubist(x = attrition_x, y = attrition_y)

library(tidyrules)

tr <- tidyRules(model_tree_attrition)
tr
## # A tibble: 8 x 9
##      id LHS            RHS            support   mean   min   max error committee
##   <int> <chr>          <chr>            <int>  <dbl> <dbl> <dbl> <dbl>     <int>
## 1     1 JobLevel <= 1  (2265) + (35 …     543  2787.  1009  4968  559.         1
## 2     2 JobLevel > 1 … (-2092) + (32…      57  4459   2272  5301  328.         1
## 3     3 JobLevel > 1 … (-39) + (2333…     124  4672.  2042  9724 1013.         1
## 4     4 JobLevel > 1 … (1859) + (260…     387  6261.  2176  9998  995.         1
## 5     5 JobRole %in% … (-1136) + (35…     245  8469.  2592 13973  932.         1
## 6     6 JobRole %in% … (-1384.7) + (…      26 12857. 11031 17603  517.         1
## 7     7 JobLevel <= 4… (4166.4) + (3…      87 15824  12061 17924  694.         1
## 8     8 JobLevel > 4   (13633) + (10…      69 19192. 18041 19999  416          1
tr[, c("LHS", "RHS")]
## # A tibble: 8 x 2
##   LHS                                     RHS                                   
##   <chr>                                   <chr>                                 
## 1 JobLevel <= 1                           (2265) + (35 * TotalWorkingYears) + (…
## 2 JobLevel > 1 & TotalWorkingYears <= 5   (-2092) + (3279 * JobLevel) + (27 * T…
## 3 JobLevel > 1 & JobRole %in% c('Laborat… (-39) + (2333 * JobLevel)             
## 4 JobLevel > 1 & JobRole %in% c('Healthc… (1859) + (2603 * JobLevel) - (98 * Ye…
## 5 JobRole %in% c('Healthcare_Representat… (-1136) + (3586 * JobLevel) + (19 * T…
## 6 JobRole %in% c('Manager', 'Research_Di… (-1384.7) + (4476 * JobLevel) - (0.01…
## 7 JobLevel <= 4 & JobRole %in% c('Manage… (4166.4) + (3467 * JobLevel) - (23 * …
## 8 JobLevel > 4                            (13633) + (1076 * JobLevel) + (8 * To…
# let's look at the 7th rule
tr$LHS[7]
## [1] "JobLevel <= 4 & JobRole %in% c('Manager', 'Research_Director') & TotalWorkingYears > 14"
tr$RHS[7]
## [1] "(4166.4) + (3467 * JobLevel) - (23 * Age) - (0.011 * MonthlyRate)"

These results can be used to look at specific parts of the data. For example, the 7th rule predictions are:

library(dplyr)
library(rlang)

char_to_expr <- function(x, index = 1, model = TRUE) {
  x <- x %>% dplyr::slice(index) 
  if (model) {
    x <- x %>% dplyr::pull(RHS) %>% rlang::parse_expr()
  } else {
    x <- x %>% dplyr::pull(LHS) %>% rlang::parse_expr()
  }
  x
}

rule_expr  <- char_to_expr(tr, 7, model = FALSE)
model_expr <- char_to_expr(tr, 7, model = TRUE)

# filter the data corresponding to rule 7
attrition %>% 
  dplyr::filter(eval_tidy(rule_expr, attrition)) %>%
  # evaluate the estimated MonthlyIncome
  dplyr::mutate(MonthlyIncome_est = eval_tidy(model_expr, .)) %>%
  dplyr::select(MonthlyIncome, MonthlyIncome_est)
## # A tibble: 87 x 2
##    MonthlyIncome MonthlyIncome_est
##            <int>             <dbl>
##  1         15427            16573.
##  2         13458            13642.
##  3         14756            16552.
##  4         13245            13367.
##  5         13664            13439.
##  6         13549            13061.
##  7         17328            16640.
##  8         16959            16831.
##  9         17181            16950.
## 10         16792            16774.
## # … with 77 more rows

Exporting the Model Using the RuleQuest file format

As previously mentioned, this code is a port of the command–line C code. To run the C code, the training set data must be converted to a specific file format, as detailed on the RuleQuest website. Two files are created. The file.data file is a header–less, comma–delimited version of the data (the file part is a name given by the user). The file.names file provides information about the columns (e.g., levels for categorical data and so on). After running the C program, another text file called file.model is created, which contains the information needed for prediction.

Once a model has been built with the R cubist package, the exportCubistFiles function can be used to create the .data, .names, and .model files so that the same model can be run at the command line.
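
For example, a call along the lines of the one below writes the files to a directory. The argument names used here (neighbors, path, prefix) are taken from the package documentation and should be checked against ?exportCubistFiles:

# write BostonHousing.data, BostonHousing.names, and BostonHousing.model
# to a temporary directory
exportCubistFiles(model_tree, neighbors = 0, path = tempdir(),
                  prefix = "BostonHousing")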

Current Limitations

There are a few features in the C code that are not yet operational in the R package: