An unfortunate reality about the boostr framework is that it's a bit jargon-heavy. To take full advantage of the modularity behind boostr, you'll want to understand the following terms: "estimation procedure", "reweighter", "aggregator", and "performance analyzer". This document defines each term and gives examples. While the definitions stand on their own, certain examples build off each other, so be warned!
At a high level, an estimation procedure is any black-box algorithm that learns from some data and spits out an estimator: some function that can take data and return estimates. This may seem a bit convoluted, so let's look at two prototypical examples with \(k\)-NN and SVM.
kNN_EstimationProcedure <- function(k, learningSet) {
  # the response is assumed to be in the first column of learningSet
  learningInput <- learningSet[, -1]
  learningResponse <- learningSet[, 1]

  # the returned closure is the estimator
  function(newdata) {
    class::knn(train = learningInput,
               cl = learningResponse,
               k = k,
               test = newdata)
  }
}
library(mlbench)
data(Glass)
# train estimator on Glass dataset
kNN_Estimator <- kNN_EstimationProcedure(5, Glass[,10:1])
# predict first 10 observations of Glass data
kNN_Estimator(Glass[1:10, 9:1])
## [1] 1 1 2 1 1 2 1 1 2 1
## Levels: 1 2 3 5 6 7
table(kNN_Estimator(Glass[1:10, 9:1]), Glass[1:10,10])
##
## 1 2 3 5 6 7
## 1 6 0 0 0 0 0
## 2 3 0 0 0 0 0
## 3 1 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## 7 0 0 0 0 0 0
svm_EstimationProcedure <- function(formula, cost, data) {
  # "training" step: fit the SVM model
  model <- e1071::svm(formula, cost = cost, data = data)

  # the returned closure (the estimator) has access to the trained model
  function(newdata) {
    predict(model, newdata = newdata)
  }
}
# train estimator on Glass dataset
svm_Estimator <- svm_EstimationProcedure(formula(Type ~ .), 100, Glass)
# predict first 10 observations of Glass data
svm_Estimator(Glass[1:10,-10])
## 1 2 3 4 5 6 7 8 9 10
## 1 1 2 2 1 1 1 1 1 1
## Levels: 1 2 3 5 6 7
table(svm_Estimator(Glass[1:10,-10]), Glass[1:10,10])
##
## 1 2 3 5 6 7
## 1 8 0 0 0 0 0
## 2 2 0 0 0 0 0
## 3 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## 7 0 0 0 0 0 0
Now you may be thinking, "What's the big deal here? kNN_EstimationProcedure is just a wrapper around class::knn." To that I would say, "keen eye, my friend"; I'll address that in a moment. Things get a bit more interesting with svm_EstimationProcedure, where our function involves a "training" step (the call to e1071::svm) and then returns a function (a closure) that has access to the trained model and performs prediction. Since an estimation procedure is supposed to be the thing that trains the model we make estimates from, it's very reasonable to consider e1071::svm an estimation procedure. However, it would be incorrect to consider predict, by itself, an estimator. Really, the wrapper around predict that gives it access to the model built by e1071::svm is the estimator, since this is the object we use to generate estimates.
Now, back to the \(k\)-NN example. How does this demonstrate the estimation procedure-estimator setup we're trying to cultivate? Well, in this particular instance, the \(k\)-NN algorithm doesn't have a dedicated "training" step: the model built by the \(k\)-NN algorithm is the learning set itself. Thus, \(k\)-NN can skip the model-building step we saw in the SVM example and go straight to the prediction step. Hence, our estimation procedure is just a wrapper around class::knn that makes sure we're using the learning set.
For those of you who are more mathematically inclined, you can think of estimation procedures in the following way: suppose you had a learning set \(\mathcal{L}_n = \left\{(x_1, y_1), \ldots, (x_n, y_n)\right\}\) of \(n\) observations \((x_i, y_i)\), where \(x_i \in \mathcal{X} \subseteq \mathbb{R}^{J}\) and \(y_i \in \mathcal{Y} \subseteq \mathbb{R}\), and a mapping \(\widehat{\Psi} : \mathcal{L}_n \to \left\{f \mid f : \mathcal{X} \to \mathcal{Y}\right\}\). We call \(\widehat{\Psi}\) an estimation procedure and the function \(\psi_n = \widehat{\Psi}(\mathcal{L}_n)\) an estimator. Note that since we're in the world of probability and statistics, the \(x\)'s and \(y\)'s are realizations of random variables, so for a fixed \(n\), your learning set \(\mathcal{L}_n\) is also a realization of a random object. Hence, the estimation procedure is actually a function on the space of learning sets.
Technicalities aside, the most profitable way of thinking about estimation procedures (\(\widehat{\Psi}\)) is that they are black-box algorithms that spit out functions (\(\psi_n\)) which can take predictors like \(x_i\) and spit out predictions, \(\hat{y}_i\).
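If the notation feels abstract, the following toy sketch shows the same relationship in R. The names are purely illustrative (they are not part of boostr): Psi_hat plays the role of the estimation procedure \(\widehat{\Psi}\), and psi_n is the estimator \(\psi_n\) it returns.
Psi_hat <- function(learningSet) {
  # "training": just memorize the mean response
  yBar <- mean(learningSet$y)
  # psi_n: a function that maps predictors to predictions
  function(newX) rep(yBar, length(newX))
}

psi_n <- Psi_hat(data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)))
psi_n(c(0.5, 1.5))  # both predictions equal the mean response, 6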
boostr-compatible estimation procedures
This is all well and good, but how does this apply to you, the boostr user? Well, boostr lets you use your own estimation procedures in boostr::boost. However, to do so, boostr::boost needs to make sure the object you're claiming to be an estimation procedure is, in fact, an estimation procedure.
A priori, boostr assumes that all estimation procedures:
1. have signature (data, ...), where data represents the learning set \(\mathcal{L}_n\),
2. return estimators whose signature is (newdata, ...), where newdata represents the \(x\)'s whose \(y\)'s are to be predicted, and
3. are of class estimationProcedure.
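To make these requirements concrete, here is a rough sketch of the shape such a procedure takes. It is hand-written for this document (not something boostr generates, and not checked against boostr::boost itself); it repackages the \(k\)-NN example from above, assuming the response sits in the first column of data.
# Hypothetical, hand-rolled estimation procedure matching boostr's assumptions:
# signature (data, ...), returns a function of (newdata, ...), and is classed
# as "estimationProcedure". The response is assumed to be column 1 of data.
myKNNProcedure <- function(data, k = 5, ...) {
  learningInput    <- data[, -1]
  learningResponse <- data[, 1]

  estimator <- function(newdata, ...) {
    class::knn(train = learningInput,
               cl    = learningResponse,
               k     = k,
               test  = newdata)
  }
  estimator
}
class(myKNNProcedure) <- c("estimationProcedure", class(myKNNProcedure))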
The last requirement is just a minor detail; the first two are more important. Basically, if you can rewrite your estimation procedure's signature to match (data, ...), and its output's signature to match (newdata, ...), boostr::boost can Boost it. However, boostr::boost doesn't do this with black magic; it needs to know a few things about your estimation procedure. Specifically, boostr::boost has an argument, metadata, which is a named list of arguments to pass to boostr's Wrapper Generators: functions written for the express purpose of taking things like your estimation procedure and creating objects whose signatures and output are compatible inside boostr.
Wrapper Generators
------------------
wrapProcedure
buildEstimationProcedure
wrapReweighter
wrapAggregator
wrapPerformanceAnalyzer
For estimation procedures, the relevant Wrapper Generators are boostr::wrapProcedure and boostr::buildEstimationProcedure; which of the two boostr::boost calls depends entirely on the x argument to boostr::boost. Ignoring this caveat for a moment, let's consider what we would have to do to turn kNN_EstimationProcedure from the \(k\)-NN example into a boostr-compatible estimation procedure. First, its signature is (k, learningSet), so we'd want a wrapper with signature (data, ...) in which data corresponds to learningSet and ... takes care of k. boostr can build this wrapper for you if you include the entry learningSet="learningSet" in the metadata argument of boostr::boost and pass the value of k in as a named entry of .procArgs; see the example below, where kNN_EstimationProcedure is boosted according to the arc-x4 algorithm. Since we're wrapping a whole procedure, and not a closure that combines the train-predict pattern (as in the SVM example), the metadata arguments we'll want to use are the ones corresponding to boostr::wrapProcedure. See the help page for details on boostr::wrapProcedure's signature.
boostr::boostWithArcX4(x = kNN_EstimationProcedure,
B = 3,
data = Glass,
metadata = list(learningSet="learningSet"),
.procArgs = list(k=5),
.boostBackendArgs = list(
.subsetFormula=formula(Type~.))
)
## Warning: Walker's alias method used: results are different from R < 2.2.0
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.003663 0.001497 0.000634 0.007326 0.002994 ...
## $ m : num [1:3, 1:214] 0 0 0 1 1 2 1 1 1 0 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.271
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 51 14 8 0 0 0
## 2 11 56 1 2 2 2
## 3 8 2 8 0 0 1
## 5 0 3 0 9 0 0
## 6 0 1 0 0 7 1
## 7 0 0 0 2 0 25
##
## $errVec
## [1] 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0
## [36] 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 0
## [106] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0
## [141] 0 1 1 0 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0
## [176] 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
Estimation procedures like svm_EstimationProcedure above are so common in R that boostr implements a Wrapper Generator, boostr::buildEstimationProcedure, explicitly for this design pattern. Hence, you can skip passing a function to the x argument of boostr::boost and just pass in a list of the form list(train=someFun, predict=someOtherFun). If you do this, the structure of the .procArgs argument changes to a list of lists. See the example below, where an SVM is boosted according to arc-x4 and the list-style argument to x is used. Note that the structure of .procArgs is now list(.trainArgs=list(...), .predictArgs=list(...)), where .trainArgs are named arguments to pass to the train component of x and .predictArgs are named arguments to pass to the predict component of x. See the help documentation for boostr::buildEstimationProcedure for more information.
boostr::boostWithArcX4(x = list(train = e1071::svm),
B = 3,
data = Glass,
.procArgs = list(
.trainArgs=list(
formula=formula(Type~.),
cost=100
)
)
)
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.00803 0.00509 0.00294 0.00402 0.00509 ...
## $ m : num [1:3, 1:214] 1 1 1 0 1 1 1 2 2 1 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.1121
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 62 8 3 0 0 1
## 2 5 67 2 0 1 0
## 3 3 1 12 0 0 0
## 5 0 0 0 13 0 0
## 6 0 0 0 0 8 0
## 7 0 0 0 0 0 28
##
## $errVec
## [1] 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## [36] 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
The whole idea behind Boosting is to adaptively resample observations from the learning set and train estimators on these (weighted) samples of learning-set observations. Specifically, we want to be able to take the performance of a particular estimator, and the weights we used to draw the set it was trained on, and come up with new weights. The formal mechanism for doing this is a "reweighter". That is, a reweighter looks at the weights an estimator was trained on and its performance on the original learning set, and spits out a new set of weights, suggesting where we may want to focus more attention during the training of our next estimator. (It may return additional output, but let's not get ahead of ourselves.)
boostr implements a few classic reweighters out of the box: boostr::arcfsReweighter, boostr::arcx4Reweighter, boostr::adaboostReweighter, and boostr::vanillaBagger.
boostr::arcx4Reweighter
## function(prediction, response, weights, m, ...) {
## d <- as.numeric(prediction != response)
##
## new_m <- m + d
## weights <- (1 + new_m^4) / sum( 1 + new_m^4 )
##
## list(weights=weights, m=new_m)
## }
## <environment: namespace:boostr>
## attr(,"class")
## [1] "reweighter" "function"
boostr-compatible reweighters
You'll notice that all the implemented reweighters in boostr have the following in common:
1. their signature is (prediction, response, weights, ...); in this signature, prediction represents an estimator's predictions (a vector), response represents the true responses (from the learning set), and weights holds the weights associated with the observations in response. Hence, all three arguments are meant to be vectors of the same length,
2. they return a named list that includes an entry named weights, and
3. they are of class reweighter.
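For illustration, a reweighter written from scratch to meet these requirements might look like the following sketch (hypothetical, not part of boostr); it simply doubles the weight of misclassified observations and renormalizes.
# Hypothetical reweighter: upweight misclassified observations, then renormalize.
# Its signature and output follow the three requirements listed above.
doublingReweighter <- function(prediction, response, weights, ...) {
  misclassified <- as.numeric(prediction != response)
  newWeights <- weights * (1 + misclassified)
  list(weights = newWeights / sum(newWeights))
}
class(doublingReweighter) <- c("reweighter", class(doublingReweighter))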
These are the requirements any function must satisfy to be a compatible reweighter inside boostr. Hence, to use your own reweighter in boostr::boost you can either write a function from scratch that satisfies these requirements, or, if you already have one implemented, let boostr::boost build a wrapper around it using boostr::wrapReweighter. This is done by passing the appropriately named arguments to boostr::wrapReweighter through boostr::boost's metadata argument. See the example below, where we Boost an SVM with a (rather silly) reweighter that permutes weights.
exoticReweighter <- function(wts, truth, preds) {
  permutedWts <- sample(wts)
  list(wts=permutedWts)
}
boostr::boost(x = list(train=e1071::svm), B = 3,
initialWeights = seq.int(nrow(Glass)),
reweighter = exoticReweighter,
aggregator = boostr::vanillaAggregator,
data = Glass,
.procArgs = list(
.trainArgs=list(
formula=formula(Type~.),
cost=100)),
metadata = list(
reweighterInputPreds="preds",
reweighterInputResponse="truth",
reweighterInputWts="wts",
reweighterOutputWts="wts")
)
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 1
## $ weights: int [1:3, 1:214] 200 80 72 137 56 60 149 92 2 102 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.1449
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 54 5 1 0 0 0
## 2 14 67 2 2 0 0
## 3 2 2 14 0 0 0
## 5 0 0 0 11 1 0
## 6 0 1 0 0 8 0
## 7 0 1 0 0 0 29
##
## $errVec
## [1] 0 0 1 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
## [36] 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0
## [106] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
## [176] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
Once we're done building all these estimators, we're going to want to get a single estimate out of them. After all, you wouldn't have gone through all the trouble of downloading this package if all you wanted was a cacophony of estimates. This is where aggregators come in: aggregators take your ensemble of estimators and return a single, aggregated, estimator.
boostr implements a few classic aggregators out of the box: boostr::arcfsAggregator, boostr::arcx4Aggregator, boostr::adaboostAggregator, boostr::weightedAggregator, and boostr::vanillaAggregator.
boostr::weightedAggregator
## function(estimators, weights, ..., .parallelPredict=FALSE,
## .parallelTally=FALSE, .rngSeed=1234) {
##
## weights <- as.numeric(weights)
##
## function(newdata) {
## preds <- makePredictions(estimators, newdata, .parallelPredict)
##
## if (typeof(preds) == "character") {
## out <- do.call(predictClassFromWeightedVote,
## list(preds=preds, weights=weights, .parallel=.parallelTally,
## .rngSeed=.rngSeed))
##
## as.factor(out)
## } else {
## do.call(predictResponseFromWeightedAverage,
## list(preds=preds, weights=weights,
## .parralel=.parallelTally, list(...)))
## }
## }
## }
## <environment: namespace:boostr>
## attr(,"class")
## [1] "aggregator" "function"
boostr-compatible aggregators
You'll notice that all the implemented aggregators in boostr have the following in common:
1. their signature is (estimators, ...), where estimators represents an ensemble of estimators,
2. they return a function (the aggregated estimator) whose signature is (newdata), and
3. they are of class aggregator.
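Similarly, an aggregator written from scratch to meet these requirements might look like this sketch (hypothetical, not part of boostr); it takes an unweighted plurality vote across the ensemble.
# Hypothetical aggregator: unweighted plurality vote over the ensemble.
# The "..." absorbs any reweighter output piped in by boostr::boostBackend.
pluralityAggregator <- function(estimators, ...) {
  function(newdata) {
    # one column of predictions per estimator in the ensemble
    preds <- sapply(estimators, function(est) as.character(est(newdata)))
    preds <- matrix(preds, nrow = nrow(newdata))
    # the most frequent prediction wins, row by row
    factor(apply(preds, 1, function(row) names(which.max(table(row)))))
  }
}
class(pluralityAggregator) <- c("aggregator", class(pluralityAggregator))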
These are the requirements any function must satisfy to be a compatible aggregator inside boostr. Note that the ...'s are necessary for an aggregator since boostr::boostBackend pipes the (named) reweighter output to the aggregator; this lets aggregators ignore irrelevant reweighter output. As with reweighters, you can use your own aggregator by letting boostr::boost build a wrapper using boostr::wrapAggregator. See the example below, where we Boost an SVM with a contrived aggregator that only considers the second estimator. Consult boostr::wrapAggregator's help documentation for details on the arguments you need to pass to metadata to properly wrap your aggregator.
exoticAggr <- function(ensemble, estimator) {
  f <- ensemble[[estimator]]
  function(newdata) f(newdata)
}
boostr::boost(x = list(train = e1071::svm), B = 3,
aggregator = exoticAggr,
reweighter = boostr::arcfsReweighter,
data = Glass,
.procArgs = list(
.trainArgs=list(
formula=formula(Type~.),
cost=100)),
metadata = list(.inputEnsemble = "ensemble"),
.boostBackendArgs = list(
.aggregatorArgs = list(estimator = 2))
)
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
## $ weights: num [1:3, 1:214] 0.01429 0.00854 0.00505 0.00279 0.00167 ...
## $ beta : num [1:3, 1] 5.11 5.1 5.49
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.1589
##
## $oobConfMat
## oobResponse
## oobPreds 1 2 3 5 6 7
## 1 56 7 3 0 0 0
## 2 10 63 0 1 1 2
## 3 4 2 14 0 0 0
## 5 0 3 0 12 0 0
## 6 0 1 0 0 8 0
## 7 0 0 0 0 0 27
##
## $errVec
## [1] 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
## [36] 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
## [106] 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0
## [141] 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
The idea of a performance analyzer isn't really specific to boosting, or even estimation, for that matter. These functions are just routines called after each new estimator is trained, to calculate some performance statistics for that estimator. The default performance analyzer is boostr::defaultOOBPerformanceAnalysis, which calculates the out-of-bag performance of an estimator.
boostr::defaultOOBPerformanceAnalysis
## function(prediction, response, oobObs) {
##
## n <- nrow(prediction)
## oobPreds <- prediction[oobObs]
## oobResponse <- response[oobObs]
##
## if (class(oobResponse) %in% c("factor", "character")) {
## oobPreds <- as.character(oobPreds)
## oobResponse <- as.character(oobResponse)
##
## errVec <- rep.int(NA, length(prediction))
## errVec[oobObs] <- as.numeric(oobPreds != oobResponse)
##
## oobConfMat <- table(oobPreds, oobResponse)
## oobErr <- mean(errVec, na.rm=TRUE)
##
## list(oobErr=oobErr, oobConfMat=oobConfMat, errVec=errVec)
##
## } else {
##
## resVec <- rep.int(NA, n)
## resVec[oobObs] <- oobPreds - oobResponse
##
## oobMSE <- mean(resVec^2, na.rm=TRUE)
##
## list(oobMSE=oobMSE, resVec=resVec)
## }
## }
## <environment: namespace:boostr>
## attr(,"class")
## [1] "performanceAnalyzer" "function"
The only requirements a boostr-compatible performance analyzer must meet are that:
1. its signature contains prediction, response, and oobObs, and
2. it is of the performanceAnalyzer class.
Whatever it returns is (appropriately) organized in the estimatorPerformance attribute of the boostr object returned from boostr::boost. To pass any additional arguments to a performance analyzer, put .analyzePerformanceArgs = list(...) inside the .boostBackendArgs argument of boostr::boost.
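To round things out, here is a sketch of a custom performance analyzer meeting these two requirements. It is hypothetical and untested with boostr (consult boostr::boost's help page for the exact argument used to supply a non-default analyzer); it simply reports out-of-bag accuracy for a classification problem.
# Hypothetical performance analyzer: out-of-bag accuracy for classification.
oobAccuracyAnalyzer <- function(prediction, response, oobObs, ...) {
  oobPreds    <- as.character(prediction[oobObs])
  oobResponse <- as.character(response[oobObs])
  list(oobAcc = mean(oobPreds == oobResponse))
}
class(oobAccuracyAnalyzer) <- c("performanceAnalyzer", class(oobAccuracyAnalyzer))
Any extra arguments such an analyzer needs would be supplied through .analyzePerformanceArgs inside .boostBackendArgs, as described above.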