mlr3proba

Package website: release | dev

Probabilistic Supervised Learning for mlr3.

What is mlr3proba?

mlr3proba is a machine learning toolkit for making probabilistic predictions within the mlr3 ecosystem. It currently supports the following tasks:

Probabilistic supervised regression - Supervised regression with a predictive distribution as the return type.
Predictive survival analysis - Survival analysis where individual predictive hazards can be queried. This is equivalent to probabilistic supervised regression with censored observations.
Unconditional distribution estimation - Estimation of a distribution without covariates, with the estimated distribution as the return type. Sub-cases are density estimation and unconditional survival estimation.
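The link between predictive hazards, predictive distributions, and censoring can be written out with the standard survival quantities. The notation below (T for the survival time, C for the censoring time, f and F for the density and distribution function of T) is introduced here for illustration only and is not part of the package interface:

```latex
\begin{aligned}
S(t) &= P(T > t) = 1 - F(t), \\
h(t) &= \frac{f(t)}{S(t)}, \\
H(t) &= \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\}, \\
(Y, \Delta) &= \bigl(\min(T, C),\ \mathbb{1}\{T \le C\}\bigr).
\end{aligned}
```

For a continuous survival time, a predicted distribution therefore determines the hazard and vice versa. Under right-censoring only the pair (Y, Δ) is observed instead of T, which is what separates survival analysis from ordinary probabilistic regression.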

Key features of mlr3proba are:

A unified fit/predict model interface to any probabilistic predictive model (frequentist, Bayesian, or other)
Pipeline/model composition
Task reduction strategies
Domain-agnostic evaluation workflows using task-specific algorithmic performance measures

mlr3proba makes use of the distr6 probability distribution interface as its probabilistic predictive return type.

Feature Overview

The current mlr3proba release focuses on survival analysis, and contains:

Task frameworks for survival analysis (TaskSurv)
A comprehensive selection of predictive survival learners (see Survival Learners below)
A comprehensive selection of performance measures for predictive survival learners, covering prognostic index (continuous rank) predictions and probabilistic (distribution) predictions (see Survival Measures below)
PipeOps integrated with mlr3pipelines, for basic pipeline building, and reduction/composition strategies using linear predictors and baseline hazards.
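The components listed above follow the usual mlr3 train/predict/score workflow. The following is a minimal sketch, not a definitive recipe. It assumes that mlr3, mlr3proba, and the survival package are installed, and that the built-in task key "rats" is available in your installed version (check `mlr_tasks` if it is not):

```r
library(mlr3)
library(mlr3proba)

# Built-in survival task (assumed key "rats") and a Cox model learner
task = tsk("rats")
learner = lrn("surv.coxph")

# Hold out roughly 20% of the rows for testing
set.seed(42)
train_ids = sample(task$row_ids, floor(0.8 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)

# Fit on the training rows, predict on the held-out rows
learner$train(task, row_ids = train_ids)
prediction = learner$predict(task, row_ids = test_ids)

# Continuous ranking predictions, one value per test observation
head(prediction$crank)

# Score with the default measure for survival predictions
prediction$score()
```

Other learners can be swapped in by changing the key passed to `lrn()`, provided the package that implements them is installed.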

Roadmap

The vision of mlr3proba is to provide comprehensive machine learning functionality to the mlr3 ecosystem for continuous probabilistic return types.

The survival task and its features are considered maturing, and major changes are unlikely.

The density and probabilistic supervised regression tasks are currently in the early stages of development. Task frameworks have been drawn up, but may not be stable; learners need to be interfaced, and contributions are very welcome (see issues).

Installation

Install the latest release from CRAN:

install.packages("mlr3proba")

Install the development version from GitHub:

remotes::install_github("mlr-org/mlr3proba")

Survival Analysis

Survival Learners

Learners are located in mlr3proba, in the mlr3learners repository, or in the mlr3learners organisation. See here for instructions on how to install learners from the mlr3learners organisation.

ID	Learner	Package
surv.akritas	Akritas Conditional Non-Parametric Estimator	mlr3learners.proba
surv.blackboost	Gradient Boosting with Regression Trees	mboost
surv.coxboost	Cox Model with Likelihood Based Boosting	CoxBoost
surv.coxph	Cox Proportional Hazards	survival
surv.cvcoxboost	Cox Model with Cross-Validation Likelihood Based Boosting	CoxBoost
surv.cvglmnet	Cross-Validated GLM with Elastic Net Regularization	glmnet
surv.flexible	Flexible Parametric Spline Models	flexsurv
surv.gamboost	Gradient Boosting for Additive Models	mboost
surv.gbm	Generalized Boosting Regression Modeling	gbm
surv.glmboost	Gradient Boosting with Component-wise Linear Models	mboost
surv.glmnet	GLM with Elastic Net Regularization	glmnet
surv.kaplan	Kaplan-Meier Estimator	survival
surv.mboost	Gradient Boosting for Generalized Additive Models	mboost
surv.nelson	Nelson-Aalen Estimator	survival
surv.parametric	Fully Parametric Survival Models	survival
surv.penalized	L1 and L2 Penalized Estimation in GLMs	penalized
surv.randomForestSRC	RandomForestSRC Survival Forest	randomForestSRC
surv.ranger	Ranger Survival Forest	ranger
surv.rpart	Rpart Survival Tree	rpart
surv.svm	Regression, Ranking and Hybrid Support Vector Machines	survivalsvm
surv.xgboost	Cox Model with Gradient Boosting Trees	xgboost

Survival Measures

ID	Measure	Package
surv.calib_alpha	van Houwelingen’s Alpha Calibration	mlr3proba
surv.calib_beta	van Houwelingen’s Beta Calibration	mlr3proba
surv.chambless_auc	Chambless and Diao’s AUC	survAUC
surv.graf	Integrated Graf Score	mlr3proba
surv.hungAUC	Hung and Chiang’s AUC	survAUC
surv.intlogloss	Integrated Log Loss	mlr3proba
surv.logloss	Log Loss	mlr3proba
surv.nagelk_r2	Nagelkerke’s R2	survAUC
surv.oquigley_r2	O’Quigley, Xu, and Stare’s R2	survAUC
surv.song_auc	Song and Zhou’s AUC	survAUC
surv.song_tnr	Song and Zhou’s TNR	survAUC
surv.song_tpr	Song and Zhou’s TPR	survAUC
surv.uno_auc	Uno’s AUC	survAUC
surv.uno_tnr	Uno’s TNR	survAUC
surv.uno_tpr	Uno’s TPR	survAUC
surv.xu_r2	Xu and O’Quigley’s R2	survAUC

Density Estimation

Density Learners

Learners are located in mlr3proba, in the mlr3learners repository, or in the mlr3learners organisation. See here for instructions on how to install learners from the mlr3learners organisation.

ID	Learner	Package
dens.hist	Univariate Histogram Density Estimator	graphics
dens.kde	Univariate KDE for Different Kernels	distr6
dens.kdeKD	Nonparametric KDE Using Plug-in Method of Polansky and Baker	kerdiest
dens.kdeKS	Nonparametric Gaussian KDE	ks
dens.locfit	Nonparametric KDE Using Gaussian Kernel	locfit
dens.logspline	Logspline Method for Density Estimation	logspline
dens.mixed	KDE Using Li and Racine Bandwidth Specification	np
dens.nonpar	Nonparametric KDE Using Normal Optimal Smoothing Parameter	sm
dens.pen	Density Estimation with a Penalized Mixture	pendensity
dens.plug	Density Estimation with Iterative Plug-in Bandwidth Selection	plugdensity
dens.spline	Density Estimation Using Smoothing Spline ANOVA	gss

Density Measures

ID	Measure	Package
dens.logloss	Log Loss	mlr3proba
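Density estimation uses the same train/predict/score interface. The following is a rough sketch under stated assumptions: the task key "precip" and the learner key "dens.hist" are assumed to be available in your installed version of mlr3proba, and because the density task framework is still in development, the exact keys and behaviour may differ:

```r
library(mlr3)
library(mlr3proba)

# Built-in density task (assumed key "precip") and a histogram estimator
task = tsk("precip")
learner = lrn("dens.hist")

# Fit the estimator and evaluate the estimated density on the same data
learner$train(task)
prediction = learner$predict(task)

# Log loss of the predicted density, using the measure listed above
prediction$score(msr("dens.logloss"))
```

Scoring on the training data, as done here for brevity, gives an optimistic estimate. A held-out set or resampling is preferable for any real comparison.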

Near-Future Plans

Add prob predict type to TaskRegr, and associated learners/measures
Allow MeasureSurv to return measures at multiple time-points simultaneously
Continue to add survival measures and learners

Bugs, Questions, Feedback

mlr3proba is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Similar Projects

Predecessors to this package are previous instances of survival modelling in mlr. The skpro package in the Python/scikit-learn ecosystem follows a similar interface for probabilistic supervised learning and is an architectural predecessor. Several packages, such as JAGS and Stan, allow probabilistic predictive modelling through a general interface that is specific to Bayesian models. For implementations of a few survival models and measures, a central package is survival. There does not appear to be a package that provides an architectural framework for distribution/density estimation; see this list for a review of density estimation packages in R.

Acknowledgements

Several people contributed to the building of mlr3proba. Firstly, thanks to Michel Lang for writing mlr3survival. Several learners and measures implemented in mlr3proba, as well as the prediction, task, and measure surv objects, were initially written in mlr3survival before being absorbed into mlr3proba. Secondly, thanks to Franz Kiraly for major contributions to the design of the proba-specific parts of the package, including compositors and predict types, and for mathematical contributions to the scoring rules implemented in the package. Finally, thanks to Bernd Bischl and the rest of the mlr core team for building mlr3 and for many conversations about the design of mlr3proba.