mlr3proba

Package website: release | dev

Probabilistic Supervised Learning for mlr3.

What is mlr3proba?

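mlr3proba is a machine learning toolkit for making probabilistic predictions within the mlr3 ecosystem. It covers three tasks: survival analysis, density estimation, and probabilistic supervised regression. The maturity of each task is described under Roadmap.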

Key features of mlr3proba are

mlr3proba makes use of the distr6 probability distribution interface as its probabilistic predictive return type.
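As a minimal sketch of what a distr6 return type looks like in practice, the snippet below trains a Kaplan-Meier learner and inspects the predicted distribution. It assumes that mlr3, mlr3proba, and distr6 are installed and that the bundled `rats` task is available via `tsk()`; the concrete distr6 class of `prediction$distr` may differ between package versions, so only its class is printed.

```r
library(mlr3)
library(mlr3proba)

# "rats" is assumed to be a survival task bundled with mlr3proba
task <- tsk("rats")

# Kaplan-Meier estimator: a simple baseline that predicts a distribution
learner <- lrn("surv.kaplan")
learner$train(task)
prediction <- learner$predict(task)

# The predicted survival distributions are exposed as a distr6 object
print(class(prediction$distr))
```

The same `$distr` field is the place to look for any survival learner that supports the distribution predict type.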

Feature Overview

The current mlr3proba release focuses on survival analysis, and contains:

Roadmap

The vision of mlr3proba is to provide comprehensive machine learning functionality to the mlr3 ecosystem for continuous probabilistic return types.

The survival task and its features are considered maturing, and major changes are unlikely.

The density and probabilistic supervised regression tasks are currently in the early stages of development. Task frameworks have been drawn up, but may not be stable; learners need to be interfaced, and contributions are very welcome (see issues).

Installation

Install the latest release from CRAN:

install.packages("mlr3proba")

Install the development version from GitHub:

remotes::install_github("mlr-org/mlr3proba")

Survival Analysis

Survival Learners

Learners are located either in mlr3proba, the mlr3learners repository, or the mlr3learners organisation. See here for instructions on how to install learners from the mlr3learners organisation.

| ID | Learner | Package |
| --- | --- | --- |
| surv.akritas | Akritas Conditional Non-Parametric Estimator | mlr3learners.proba |
| surv.blackboost | Gradient Boosting with Regression Trees | mboost |
| surv.coxboost | Cox Model with Likelihood Based Boosting | CoxBoost |
| surv.coxph | Cox Proportional Hazards | survival |
| surv.cvcoxboost | Cox Model with Cross-Validation Likelihood Based Boosting | CoxBoost |
| surv.cvglmnet | Cross-Validated GLM with Elastic Net Regularization | glmnet |
| surv.flexible | Flexible Parametric Spline Models | flexsurv |
| surv.gamboost | Gradient Boosting for Additive Models | mboost |
| surv.gbm | Generalized Boosting Regression Modeling | gbm |
| surv.glmboost | Gradient Boosting with Component-wise Linear Models | mboost |
| surv.glmnet | GLM with Elastic Net Regularization | glmnet |
| surv.kaplan | Kaplan-Meier Estimator | survival |
| surv.mboost | Gradient Boosting for Generalized Additive Models | mboost |
| surv.nelson | Nelson-Aalen Estimator | survival |
| surv.parametric | Fully Parametric Survival Models | survival |
| surv.penalized | L1 and L2 Penalized Estimation in GLMs | penalized |
| surv.randomForestSRC | RandomForestSRC Survival Forest | randomForestSRC |
| surv.ranger | Ranger Survival Forest | ranger |
| surv.rpart | Rpart Survival Tree | rpart |
| surv.svm | Regression, Ranking and Hybrid Support Vector Machines | survivalsvm |
| surv.xgboost | Cox Model with Gradient Boosting Trees | xgboost |
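Any learner ID from the table can be passed to `lrn()`. The following is a rough sketch, assuming mlr3, mlr3proba, and the survival package are installed and that the `rats` task ships with mlr3proba. It fits a Cox proportional hazards model on a random 80% of the rows and predicts on the rest; the split and the seed are illustrative choices.

```r
library(mlr3)
library(mlr3proba)

set.seed(1)
task <- tsk("rats")

# List the survival learner IDs registered in the current session
print(mlr_learners$keys("^surv"))

# Hold out 20% of the rows for testing
train_ids <- sample(task$row_ids, size = floor(0.8 * task$nrow))
test_ids <- setdiff(task$row_ids, train_ids)

# Cox proportional hazards model (wraps the survival package)
learner <- lrn("surv.coxph")
learner$train(task, row_ids = train_ids)
prediction <- learner$predict(task, row_ids = test_ids)

# crank is a continuous risk ranking; lp is the linear predictor
print(head(prediction$crank))
print(head(prediction$lp))
```

Learners that live outside mlr3proba need their own package loaded before `lrn()` can find them.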

Survival Measures

| ID | Measure | Package |
| --- | --- | --- |
| surv.calib_alpha | van Houwelingen’s Alpha Calibration | mlr3proba |
| surv.calib_beta | van Houwelingen’s Beta Calibration | mlr3proba |
| surv.chambless_auc | Chambless and Diao’s AUC | survAUC |
| surv.graf | Integrated Graf Score | mlr3proba |
| surv.hungAUC | Hung and Chiang’s AUC | survAUC |
| surv.intlogloss | Integrated Log Loss | mlr3proba |
| surv.logloss | Log Loss | mlr3proba |
| surv.nagelk_r2 | Nagelkerke’s R2 | survAUC |
| surv.oquigley_r2 | O’Quigley, Xu, and Stare’s R2 | survAUC |
| surv.song_auc | Song and Zhou’s AUC | survAUC |
| surv.song_tnr | Song and Zhou’s TNR | survAUC |
| surv.song_tpr | Song and Zhou’s TPR | survAUC |
| surv.uno_auc | Uno’s AUC | survAUC |
| surv.uno_tnr | Uno’s TNR | survAUC |
| surv.uno_tpr | Uno’s TPR | survAUC |
| surv.xu_r2 | Xu and O’Quigley’s R2 | survAUC |
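Measure IDs from the table are passed to `msr()`. As a hedged sketch, assuming mlr3 and mlr3proba are installed and the `rats` task is available, the snippet below cross-validates a Kaplan-Meier baseline and aggregates the integrated Graf score over the folds. The choice of three folds and the seed are illustrative only.

```r
library(mlr3)
library(mlr3proba)

set.seed(1)
task <- tsk("rats")
learner <- lrn("surv.kaplan")

# 3-fold cross-validation
rr <- resample(task, learner, rsmp("cv", folds = 3))

# Aggregate the integrated Graf score across the folds
score <- rr$aggregate(msr("surv.graf"))
print(score)
```

Measures backed by survAUC need that package installed, and some of them may need access to the training data when scoring.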

Density Estimation

Density Learners

Learners are located either in mlr3proba, the mlr3learners repository, or the mlr3learners organisation. See here for instructions on how to install learners from the mlr3learners organisation.

| ID | Learner | Package |
| --- | --- | --- |
| dens.hist | Univariate Histogram Density Estimator | graphics |
| dens.kde | Univariate KDE for Different Kernels | distr6 |
| dens.kdeKD | Nonparametric KDE Using Plug-in Method of Polansky and Baker | kerdiest |
| dens.kdeKS | Nonparametric Gaussian KDE | ks |
| dens.locfit | Nonparametric KDE Using Gaussian Kernel | locfit |
| dens.logspline | Logspline Method for Density Estimation | logspline |
| dens.mixed | KDE Using Li and Racine Bandwidth Specification | np |
| dens.nonpar | Nonparametric KDE Using Normal Optimal Smoothing Parameter | sm |
| dens.pen | Density Estimation with a Penalized Mixture | pendensity |
| dens.plug | Density Estimation with Iterative Plug-in Bandwidth Selection | plugdensity |
| dens.spline | Density Estimation Using Smoothing Spline ANOVA | gss |

Density Measures

| ID | Measure | Package |
| --- | --- | --- |
| dens.logloss | Log Loss | mlr3proba |
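Density estimation follows the same train/predict/score pattern. The sketch below assumes that mlr3 and mlr3proba are installed and that a density task named `precip` is bundled with mlr3proba. Since the density task is described as being in early development, the exact fields of the prediction object may change between versions.

```r
library(mlr3)
library(mlr3proba)

# "precip" is assumed to be a univariate density task bundled with mlr3proba
task <- tsk("precip")

# Histogram density estimator
learner <- lrn("dens.hist")
learner$train(task)
prediction <- learner$predict(task)

# Estimated density evaluated at the observed points
print(head(prediction$pdf))

# Log loss of the estimated density
print(prediction$score(msr("dens.logloss")))
```

Here the learner is scored on the data it was trained on, which is only for illustration; a resampling strategy gives a more honest estimate.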

Near-Future Plans

Bugs, Questions, Feedback

mlr3proba is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Similar Projects

Predecessors to this package are previous instances of survival modelling in mlr. The skpro package in the Python/scikit-learn ecosystem follows a similar interface for probabilistic supervised learning and is an architectural predecessor. Several packages, such as jags and stan, allow probabilistic predictive modelling through a general interface that is specific to Bayesian models. The survival package is central to the implementation of several survival models and measures. There does not appear to be a package that provides an architectural framework for distribution/density estimation; see this list for a review of density estimation packages in R.

Acknowledgements

Several people contributed to the building of mlr3proba. Firstly, thanks to Michel Lang for writing mlr3survival. Several learners and measures implemented in mlr3proba, as well as the prediction, task, and measure surv objects, were initially written in mlr3survival before being absorbed into mlr3proba. Secondly, thanks to Franz Kiraly for major contributions towards the design of the proba-specific parts of the package, including compositors and predict types, and for mathematical contributions towards the scoring rules implemented in the package. Finally, thanks to Bernd Bischl and the rest of the mlr core team for building mlr3 and for many conversations about the design of mlr3proba.