What is boostr?

In brief, boostr was designed to be a software “laboratory” of sorts. This package is primarily meant to help you tinker with and evaluate your (boosting) algorithms. In a sense, boostr is here to let you explore and refine.
What is boostr not?

boostr is not here to design algorithms or boosting procedures for you. As far as I know, no software can do that (yet). If you don’t have an algorithm to play with but are still interested in this package, don’t worry! In addition to letting you bag your favorite estimators, boostr implements three classical boosting algorithms, with the freedom to mix and match aggregators and reweighters, provided the pair are compatible. For a more thorough look at the various user inputs in the boostr framework, check out this vignette.
Since this is meant to be a “dive right in” kind of vignette, I’m going to assume you are cursorily familiar with the principle behind boosting. In particular, I’m assuming you’ve seen one of the classic boosting algorithms, like “AdaBoost”, and have a feel for how boosting might be generalized. If you haven’t, check out the paper behind boostr. The paper may feel a bit math-y, but I promise it’s a pretty easy read.
Let’s say you wanted to boost an SVM according to the arc-x4 boosting algorithm. Well, good news: boostr implements this algorithm for you with the boostWithArcX4 function.
library(mlbench)
data(Glass)
set.seed(1234)
boostedSVM1 <-
  boostr::boostWithArcX4(x = list(train = e1071::svm),
                         B = 3,
                         data = Glass,
                         .procArgs = list(
                           .trainArgs = list(
                             formula = formula(Type ~ .),
                             cost = 100)))
## Warning: Walker's alias method used: results are different from R < 2.2.0
boostedSVM1
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.00806 0.00491 0.00319 0.00403 0.00491 ...
## $ m : num [1:3, 1:214] 1 1 1 0 1 1 1 2 2 1 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.08879
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 63 5 1 0 0 1
## 2 6 70 2 0 1 0
## 3 1 1 14 0 0 0
## 5 0 0 0 13 1 0
## 6 0 0 0 0 7 0
## 7 0 0 0 0 0 28
##
## $errVec
## [1] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## [36] 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
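As a quick sanity check on these metrics: errVec flags each misclassified out-of-bag prediction, and oobErr is the overall out-of-bag misclassification rate, so both can be recovered by hand from the confusion matrix above. (The arithmetic below is mine, not package output.)

# The 19 off-diagonal counts of oobConfMat are the out-of-bag
# misclassifications, out of 214 observations:
(5 + 1 + 1 + 6 + 2 + 1 + 1 + 1 + 1) / 214  # = 0.08879, matching oobErr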
In boostr, lists are the de facto data handlers. So to make sure the boostr interface, boostr::boost, passes the right information to other functions, encapsulate things in named lists. In the example above, we wanted our svm to receive the arguments formula=formula(Type~.) and cost=100, so we put them in a named list called .trainArgs, and put that inside a named list called .procArgs. The naming convention in boostr may seem a bit odd, but the rationale is that a list named .xyzArgs passes its named entries as arguments to the xyz variable in the encapsulating list or function. Hence, because our procedure x is a list with the named entry train, we use .trainArgs inside .procArgs to pass arguments to the train component of x.
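The convention scales to richer procedures. For instance, if our estimation procedure also carried an explicit predict component, the same pattern would call for a .predictArgs list alongside .trainArgs. A hedged sketch (the decision.values entry is just an illustrative predict.svm option, not something this example needs):

# Hypothetical .procArgs for x = list(train = ..., predict = ...):
# .trainArgs feeds x$train, .predictArgs feeds x$predict.
.procArgs <- list(
  .trainArgs   = list(formula = formula(Type ~ .), cost = 100),
  .predictArgs = list(decision.values = TRUE)  # illustrative only
)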
If this still feels a bit weird, let’s look at the exact same situation, but without the convenience function:
set.seed(1234)
boostedSVM2 <-
  boostr::boost(x = list(train = e1071::svm),
                B = 3,
                reweighter = boostr::arcx4Reweighter,
                aggregator = boostr::arcx4Aggregator,
                data = Glass,
                .procArgs = list(
                  .trainArgs = list(
                    formula = formula(Type ~ .),
                    cost = 100)),
                .boostBackendArgs = list(
                  .reweighterArgs = list(m = 0)))
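# Note: arcx4Reweighter keeps a running per-observation
# misclassification count, m, which is why we initialize it via the
# .reweighterArgs entry of .boostBackendArgs (a hedged reading of the
# API; see ?boostr::arcx4Reweighter).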
boostedSVM2
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.00806 0.00491 0.00319 0.00403 0.00491 ...
## $ m : num [1:3, 1:214] 1 1 1 0 1 1 1 2 2 1 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.08879
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 63 5 1 0 0 1
## 2 6 70 2 0 1 0
## 3 1 1 14 0 0 0
## 5 0 0 0 13 1 0
## 6 0 0 0 0 7 0
## 7 0 0 0 0 0 28
##
## $errVec
## [1] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## [36] 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
identical(boostr::reweighterOutput(boostedSVM1),
          boostr::reweighterOutput(boostedSVM2))
## [1] TRUE
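Mixing and matching components works the same way. As one hedged illustration, we can keep the arc-x4 reweighter but swap in boostr’s vanilla (majority-vote) aggregator; since arc-x4 aggregates by unweighted majority vote anyway, this pairing should behave much like the examples above.

# Hedged sketch: arc-x4 reweighter paired with the vanilla aggregator.
set.seed(1234)
mixedSVM <-
  boostr::boost(x = list(train = e1071::svm),
                B = 3,
                reweighter = boostr::arcx4Reweighter,
                aggregator = boostr::vanillaAggregator,
                data = Glass,
                .procArgs = list(
                  .trainArgs = list(
                    formula = formula(Type ~ .),
                    cost = 100)),
                .boostBackendArgs = list(
                  .reweighterArgs = list(m = 0)))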
But this was Mickey Mouse-type stuff: boostr already implements this algorithm for you. What’s really cool about boostr isn’t the implemented algorithms, it’s the total modularity. Check out the documentation for boostr::boost (the package interface) and the extended vignette for more information.
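To give a flavor of that modularity, here is a hedged sketch of a hand-rolled reweighter. Per the package documentation, a reweighter should accept prediction, response, and weights and return a named list containing the updated weights; functions with other signatures can be adapted with boostr::wrapReweighter. The upweighting rule below is made up purely for illustration.

# Hypothetical reweighter: naively double the weight of
# misclassified observations, then renormalize.
myReweighter <- function(prediction, response, weights, ...) {
  wrong <- prediction != response
  w <- weights * ifelse(wrong, 2, 1)
  list(weights = w / sum(w))
}

Something like myReweighter could then be passed as the reweighter argument to boostr::boost, paired with any compatible aggregator.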