An unfortunate reality about the boostr framework is that it's a bit jargon-heavy. To take full advantage of the modularity behind boostr, you'll want to understand the following terms: "estimation procedure", "reweighter", "aggregator", and "performance analyzer". This document defines each term and gives examples. While the definitions stand on their own, certain examples build off each other, so be warned!
At a high level, an estimation procedure is any black-box algorithm that learns from some data and spits out an estimator: some function that can take data and return estimates. This may seem a bit convoluted, so let's look at two prototypical examples with \(k\)-NN and SVM.
kNN_EstimationProcedure <- function(k, learningSet) {
  # the response is assumed to be in the first column of learningSet
  learningInput <- learningSet[, -1]
  learningResponse <- learningSet[, 1]

  # the returned closure is the estimator
  function(newdata) {
    class::knn(train = learningInput,
               cl = learningResponse,
               k = k,
               test = newdata)
  }
}
library(mlbench)
data(Glass)
# train estimator on Glass dataset
kNN_Estimator <- kNN_EstimationProcedure(5, Glass[,10:1])
# predict first 10 observations of Glass data
kNN_Estimator(Glass[1:10, 9:1])
## [1] 1 1 2 1 1 2 1 1 2 1
## Levels: 1 2 3 5 6 7
table(kNN_Estimator(Glass[1:10, 9:1]), Glass[1:10,10])
##
## 1 2 3 5 6 7
## 1 6 0 0 0 0 0
## 2 3 0 0 0 0 0
## 3 1 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## 7 0 0 0 0 0 0
svm_EstimationProcedure <- function(formula, cost, data) {
  # "training" step: fit the SVM model
  model <- e1071::svm(formula, cost = cost, data = data)

  # the returned closure (the estimator) has access to the trained model
  function(newdata) {
    predict(model, newdata = newdata)
  }
}
# train estimator on Glass dataset
svm_Estimator <- svm_EstimationProcedure(formula(Type ~ .), 100, Glass)
# predict first 10 observations of Glass data
svm_Estimator(Glass[1:10,-10])
## 1 2 3 4 5 6 7 8 9 10
## 1 1 2 2 1 1 1 1 1 1
## Levels: 1 2 3 5 6 7
table(svm_Estimator(Glass[1:10,-10]), Glass[1:10,10])
##
## 1 2 3 5 6 7
## 1 8 0 0 0 0 0
## 2 2 0 0 0 0 0
## 3 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## 7 0 0 0 0 0 0
Now you may be thinking, "What's the big deal here? kNN_EstimationProcedure is just a wrapper around class::knn." To that I would say, "keen eye, my friend"; I'll address that in a moment. Things get a bit more interesting with svm_EstimationProcedure, where our function involves a "training" step (the call to e1071::svm) and then returns a function (a closure) that has access to the trained model and performs prediction. Since an estimation procedure is supposed to be the thing that trains the model we make estimates from, it's very reasonable to consider e1071::svm an estimation procedure. However, it would be incorrect to consider predict, by itself, an estimator. Really, the wrapper around predict that gives it access to the model built by e1071::svm is the estimator, since this is the object we use to generate estimates.
Now, back to the \(k\)-NN example. How does this demonstrate the estimation procedure-estimator setup we're trying to cultivate? Well, in this particular instance, the \(k\)-NN algorithm doesn't have a dedicated "training" step: the model built by the \(k\)-NN algorithm is the learning set itself. Thus, \(k\)-NN can skip the model-building step we saw in the SVM example and go straight to the prediction step. Hence, our estimation procedure is just a wrapper around class::knn that makes sure we're using the learning set.
For those of you who are more mathematically inclined, you can think of estimation procedures in the following way: suppose you had a learning set \(\mathcal{L}_n = \left\{(x_1, y_1), \ldots, (x_n, y_n)\right\}\) of \(n\) observations \((x_i, y_i)\), where \(x_i \in \mathcal{X} \subseteq \mathbb{R}^{J}\) and \(y_i \in \mathcal{Y} \subseteq \mathbb{R}\), and a mapping \(\widehat{\Psi} : \mathcal{L}_n \to \left\{f \mid f : \mathcal{X} \to \mathcal{Y}\right\}\). We call \(\widehat{\Psi}\) an estimation procedure and the function \(\psi_n = \widehat{\Psi}(\mathcal{L}_n)\) an estimator. Note that since we're in the world of probability and statistics, the \(x\)'s and \(y\)'s are realizations of random variables, so for a fixed \(n\), your learning set \(\mathcal{L}_n\) is also a realization of a random object. Hence, the estimation procedure is actually a function on the space of learning sets.
Technicalities aside, the most profitable way of thinking about estimation procedures (\(\widehat{\Psi}\)) is that they are black-box algorithms that spit out functions (\(\psi_n\)) which can take predictors like \(x_i\) and spit out predictions, \(\hat{y}_i\).
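If the notation feels abstract, the following toy sketch shows the same relationship in R. The names are purely illustrative (they are not part of boostr): Psi_hat plays the role of the estimation procedure \(\widehat{\Psi}\), and psi_n is the estimator \(\psi_n\) it returns.
Psi_hat <- function(learningSet) {
  # "training": just memorize the mean response
  yBar <- mean(learningSet$y)
  # psi_n: a function that maps predictors to predictions
  function(newX) rep(yBar, length(newX))
}

psi_n <- Psi_hat(data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)))
psi_n(c(0.5, 1.5))  # both predictions equal the mean response, 6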
boostr-compatible estimation procedures
This is all well and good, but how does this apply to you, the boostr user? Well, boostr lets you use your own estimation procedures in boostr::boost. However, to do so, boostr::boost needs to make sure the object you're claiming to be an estimation procedure is, in fact, an estimation procedure.
A priori, boostr assumes that all estimation procedures:
1. have signature (data, ...), where data represents the learning set \(\mathcal{L}_n\),
2. return estimators whose signature is (newdata, ...), where newdata represents the \(x\)'s whose \(y\)'s are to be predicted, and
3. are of class estimationProcedure.
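To make these requirements concrete, here is a rough sketch of the shape such a procedure takes. It is hand-written for this document (not something boostr generates, and not checked against boostr::boost itself); it repackages the \(k\)-NN example from above, assuming the response sits in the first column of data.
# Hypothetical, hand-rolled estimation procedure matching boostr's assumptions:
# signature (data, ...), returns a function of (newdata, ...), and is classed
# as "estimationProcedure". The response is assumed to be column 1 of data.
myKNNProcedure <- function(data, k = 5, ...) {
  learningInput    <- data[, -1]
  learningResponse <- data[, 1]

  estimator <- function(newdata, ...) {
    class::knn(train = learningInput,
               cl    = learningResponse,
               k     = k,
               test  = newdata)
  }
  estimator
}
class(myKNNProcedure) <- c("estimationProcedure", class(myKNNProcedure))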
The last requirement is just a minor detail; the first two are more important. Basically, if you can rewrite your estimation procedure's signature to match (data, ...), and its output's signature to match (newdata, ...), boostr::boost can Boost it. However, boostr::boost doesn't do this with black magic; it needs to know a few things about your estimation procedure. Specifically, boostr::boost has an argument, metadata, which is a named list of arguments to pass to boostr's Wrapper Generators: functions written for the express purpose of taking things like your estimation procedure and creating objects whose signatures and output are compatible inside boostr.
Wrapper Generators
------------------
wrapProcedure
buildEstimationProcedure
wrapReweighter
wrapAggregator
wrapPerformanceAnalyzer
For estimation procedures, the relevant Wrapper Generators are boostr::wrapProcedure and boostr::buildEstimationProcedure; which of the two boostr::boost calls depends entirely on the x argument to boostr::boost. Ignoring this caveat for a moment, let's consider what we would have to do to turn kNN_EstimationProcedure from the \(k\)-NN example into a boostr-compatible estimation procedure. First, its signature is (k, learningSet), so we'd want a wrapper with signature (data, ...) in which data corresponds to learningSet and ... takes care of k. boostr can build this wrapper for you if you include the entry learningSet="learningSet" in the metadata argument of boostr::boost and pass the value of k in as a named entry of .procArgs; see the example below, where kNN_EstimationProcedure is boosted according to the arc-x4 algorithm. Since we're wrapping a whole procedure, and not a closure that combines the train-predict pattern (as in the SVM example), the metadata arguments we'll want to use are the ones corresponding to boostr::wrapProcedure. See the help page for details on boostr::wrapProcedure's signature.
boostr::boostWithArcX4(x = kNN_EstimationProcedure,
B = 3,
data = Glass,
metadata = list(learningSet="learningSet"),
.procArgs = list(k=5),
.boostBackendArgs = list(
.subsetFormula=formula(Type~.))
)
## Warning: Walker's alias method used: results are different from R < 2.2.0
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.003663 0.001497 0.000634 0.007326 0.002994 ...
## $ m : num [1:3, 1:214] 0 0 0 1 1 2 1 1 1 0 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.271
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 51 14 8 0 0 0
## 2 11 56 1 2 2 2
## 3 8 2 8 0 0 1
## 5 0 3 0 9 0 0
## 6 0 1 0 0 7 1
## 7 0 0 0 2 0 25
##
## $errVec
## [1] 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0
## [36] 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 0
## [106] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0
## [141] 0 1 1 0 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0
## [176] 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
Estimation procedures like svm_EstimationProcedure above are so common in R that boostr implements a Wrapper Generator, boostr::buildEstimationProcedure, explicitly for this design pattern. Hence, you can skip passing a function to the x argument of boostr::boost and just pass in a list of the form list(train=someFun, predict=someOtherFun). If you do this, the structure of the .procArgs argument changes to a list of lists. See the example below, where an SVM is boosted according to arc-x4 and the list-style argument to x is used. Note that the structure of .procArgs is now list(.trainArgs=list(...), .predictArgs=list(...)), where .trainArgs are named arguments to pass to the train component of x and .predictArgs are named arguments to pass to the predict component of x. See the help documentation for boostr::buildEstimationProcedure for more information.
boostr::boostWithArcX4(x = list(train = e1071::svm),
B = 3,
data = Glass,
.procArgs = list(
.trainArgs=list(
formula=formula(Type~.),
cost=100
)
)
)
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.00803 0.00509 0.00294 0.00402 0.00509 ...
## $ m : num [1:3, 1:214] 1 1 1 0 1 1 1 2 2 1 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.1121
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 62 8 3 0 0 1
## 2 5 67 2 0 1 0
## 3 3 1 12 0 0 0
## 5 0 0 0 13 0 0
## 6 0 0 0 0 8 0
## 7 0 0 0 0 0 28
##
## $errVec
## [1] 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## [36] 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
The whole idea behind Boosting is to adaptively resample observations from the learning set and train estimators on these (weighted) samples of learning-set observations. Specifically, we want to be able to take the performance of a particular estimator, and the weights we used to draw the set it was trained on, and come up with new weights. The formal mechanism for doing this is a "reweighter". That is, a reweighter looks at the weights an estimator was trained on and its performance on the original learning set, and spits out a new set of weights, suggesting where we may want to focus more attention during the training of our next estimator. (It may return additional output, but let's not get ahead of ourselves.)
boostr implements a few classic reweighters out of the box: boostr::arcfsReweighter, boostr::arcx4Reweighter, boostr::adaboostReweighter, and boostr::vanillaBagger.
boostr::arcx4Reweighter
## function(prediction, response, weights, m, ...) {
## d <- as.numeric(prediction != response)
##
## new_m <- m + d
## weights <- (1 + new_m^4) / sum( 1 + new_m^4 )
##
## list(weights=weights, m=new_m)
## }
## <environment: namespace:boostr>
## attr(,"class")
## [1] "reweighter" "function"
boostr-compatible reweighters
You'll notice that all the implemented reweighters in boostr have the following in common:
1. their signature is (prediction, response, weights, ...); in this signature, prediction represents an estimator's predictions (a vector), response represents the true responses (from the learning set), and weights holds the weights associated with the observations in response. Hence, all three arguments are meant to be vectors of the same length,
2. they return a named list that includes an entry named weights, and
3. they are of class reweighter.
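For illustration, a reweighter written from scratch to meet these requirements might look like the following sketch (hypothetical, not part of boostr); it simply doubles the weight of misclassified observations and renormalizes.
# Hypothetical reweighter: upweight misclassified observations, then renormalize.
# Its signature and output follow the three requirements listed above.
doublingReweighter <- function(prediction, response, weights, ...) {
  misclassified <- as.numeric(prediction != response)
  newWeights <- weights * (1 + misclassified)
  list(weights = newWeights / sum(newWeights))
}
class(doublingReweighter) <- c("reweighter", class(doublingReweighter))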
These are the requirements any function must satisfy to be a compatible reweighter inside boostr. Hence, to use your own reweighter in boostr::boost you can either write a function from scratch that satisfies these requirements, or, if you already have one implemented, let boostr::boost build a wrapper around it using boostr::wrapReweighter. This is done by passing the appropriately named arguments to boostr::wrapReweighter through boostr::boost's metadata argument. See the example below, where we Boost an SVM with a (rather silly) reweighter that permutes weights.
exoticReweighter <- function(wts, truth, preds) {
  permutedWts <- sample(wts)
  list(wts=permutedWts)
}
boostr::boost(x = list(train=e1071::svm), B = 3,
initialWeights = seq.int(nrow(Glass)),
reweighter = exoticReweighter,
aggregator = boostr::vanillaAggregator,
data = Glass,
.procArgs = list(
.trainArgs=list(
formula=formula(Type~.),
cost=100)),
metadata = list(
reweighterInputPreds="preds",
reweighterInputResponse="truth",
reweighterInputWts="wts",
reweighterOutputWts="wts")
)
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 1
## $ weights: int [1:3, 1:214] 200 80 72 137 56 60 149 92 2 102 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.1449
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 54 5 1 0 0 0
## 2 14 67 2 2 0 0
## 3 2 2 14 0 0 0
## 5 0 0 0 11 1 0
## 6 0 1 0 0 8 0
## 7 0 1 0 0 0 29
##
## $errVec
## [1] 0 0 1 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
## [36] 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0
## [106] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
## [176] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
Once we're done building all these estimators, we're going to want to get a single estimate out of them. After all, you wouldn't have gone through all the trouble of downloading this package if all you wanted was a cacophony of estimates. This is where aggregators come in: aggregators take your ensemble of estimators and return a single, aggregated, estimator.
boostr implements a few classic aggregators out of the box: boostr::arcfsAggregator, boostr::arcx4Aggregator, boostr::adaboostAggregator, boostr::weightedAggregator, and boostr::vanillaAggregator.
boostr::weightedAggregator
## function(estimators, weights, ..., .parallelPredict=FALSE,
## .parallelTally=FALSE, .rngSeed=1234) {
##
## weights <- as.numeric(weights)
##
## function(newdata) {
## preds <- makePredictions(estimators, newdata, .parallelPredict)
##
## if (typeof(preds) == "character") {
## out <- do.call(predictClassFromWeightedVote,
## list(preds=preds, weights=weights, .parallel=.parallelTally,
## .rngSeed=.rngSeed))
##
## as.factor(out)
## } else {
## do.call(predictResponseFromWeightedAverage,
## list(preds=preds, weights=weights,
## .parralel=.parallelTally, list(...)))
## }
## }
## }
## <environment: namespace:boostr>
## attr(,"class")
## [1] "aggregator" "function"
boostr-compatible aggregators
You'll notice that all the implemented aggregators in boostr have the following in common:
1. their signature is (estimators, ...), where estimators represents an ensemble of estimators,
2. they return a function (the aggregated estimator) whose signature is (newdata), and
3. they are of class aggregator.
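Similarly, an aggregator written from scratch to meet these requirements might look like this sketch (hypothetical, not part of boostr); it takes an unweighted plurality vote across the ensemble.
# Hypothetical aggregator: unweighted plurality vote over the ensemble.
# The "..." absorbs any reweighter output piped in by boostr::boostBackend.
pluralityAggregator <- function(estimators, ...) {
  function(newdata) {
    # one column of predictions per estimator in the ensemble
    preds <- sapply(estimators, function(est) as.character(est(newdata)))
    preds <- matrix(preds, nrow = nrow(newdata))
    # the most frequent prediction wins, row by row
    factor(apply(preds, 1, function(row) names(which.max(table(row)))))
  }
}
class(pluralityAggregator) <- c("aggregator", class(pluralityAggregator))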
These are the requirements any function must satisfy to be a compatible aggregator inside boostr. Note that the ...'s are necessary for an aggregator since boostr::boostBackend pipes the (named) reweighter output to the aggregator; this lets aggregators ignore irrelevant reweighter output. As with reweighters, you can use your own aggregator by letting boostr::boost build a wrapper using boostr::wrapAggregator. See the example below, where we Boost an SVM with a contrived aggregator that only considers the second estimator. Consult boostr::wrapAggregator's help documentation for details on the arguments you need to pass to metadata to properly wrap your aggregator.
exoticAggr <- function(ensemble, estimator) {
  f <- ensemble[[estimator]]
  function(newdata) f(newdata)
}
boostr::boost(x = list(train = e1071::svm), B = 3,
aggregator = exoticAggr,
reweighter = boostr::arcfsReweighter,
data = Glass,
.procArgs = list(
.trainArgs=list(
formula=formula(Type~.),
cost=100)),
metadata = list(.inputEnsemble = "ensemble"),
.boostBackendArgs = list(
.aggregatorArgs = list(estimator = 2))
)
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.01429 0.00854 0.00505 0.00279 0.00167 ...
## $ beta : num [1:3, 1] 5.11 5.1 5.49
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.1589
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 56 7 3 0 0 0
## 2 10 63 0 1 1 2
## 3 4 2 14 0 0 0
## 5 0 3 0 12 0 0
## 6 0 1 0 0 8 0
## 7 0 0 0 0 0 27
##
## $errVec
## [1] 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
## [36] 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
## [106] 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0
## [141] 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
The idea of a performance analyzer isn't really specific to boosting, or even estimation, for that matter. These functions are just routines called after each new estimator is trained, to calculate some performance statistics for that estimator. The default performance analyzer is boostr::defaultOOBPerformanceAnalysis, which calculates the out-of-bag performance of an estimator.
boostr::defaultOOBPerformanceAnalysis
## function(prediction, response, oobObs) {
##
## n <- nrow(prediction)
## oobPreds <- prediction[oobObs]
## oobResponse <- response[oobObs]
##
## if (class(oobResponse) %in% c("factor", "character")) {
## oobPreds <- as.character(oobPreds)
## oobResponse <- as.character(oobResponse)
##
## errVec <- rep.int(NA, length(prediction))
## errVec[oobObs] <- as.numeric(oobPreds != oobResponse)
##
## oobConfMat <- table(oobPreds, oobResponse)
## oobErr <- mean(errVec, na.rm=TRUE)
##
## list(oobErr=oobErr, oobConfMat=oobConfMat, errVec=errVec)
##
## } else {
##
## resVec <- rep.int(NA, n)
## resVec[oobObs] <- oobPreds - oobResponse
##
## oobMSE <- mean(resVec^2, na.rm=TRUE)
##
## list(oobMSE=oobMSE, resVec=resVec)
## }
## }
## <environment: namespace:boostr>
## attr(,"class")
## [1] "performanceAnalyzer" "function"
The only requirements a boostr-compatible performance analyzer must meet are that:
1. its signature contains prediction, response, and oobObs, and
2. it is of the performanceAnalyzer class.
Whatever it returns is (appropriately) organized in the estimatorPerformance attribute of the boostr object returned from boostr::boost. To pass any additional arguments to a performance analyzer, put .analyzePerformanceArgs = list(...) inside the .boostBackendArgs argument of boostr::boost.
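To round things out, here is a sketch of a custom performance analyzer meeting these two requirements. It is hypothetical and untested with boostr (consult boostr::boost's help page for the exact argument used to supply a non-default analyzer); it simply reports out-of-bag accuracy for a classification problem.
# Hypothetical performance analyzer: out-of-bag accuracy for classification.
oobAccuracyAnalyzer <- function(prediction, response, oobObs, ...) {
  oobPreds    <- as.character(prediction[oobObs])
  oobResponse <- as.character(response[oobObs])
  list(oobAcc = mean(oobPreds == oobResponse))
}
class(oobAccuracyAnalyzer) <- c("performanceAnalyzer", class(oobAccuracyAnalyzer))
Any extra arguments such an analyzer needs would be supplied through .analyzePerformanceArgs inside .boostBackendArgs, as described above.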