The modelStudio() function computes various instance-level and dataset-level model explanations and produces an interactive, customisable dashboard. The dashboard consists of multiple panels for plots with short descriptions, and it can be saved as HTML and shared with others. Tools for model exploration are combined with tools for EDA (Exploratory Data Analysis) to give a broad overview of the model behavior.
Let’s use the HR dataset to explore the modelStudio parameters:
train <- DALEX::HR
train$fired <- as.factor(ifelse(train$status == "fired", 1, 0))
train$status <- NULL
head(train)
gender | age | hours | evaluation | salary | fired |
---|---|---|---|---|---|
male | 32.58 | 41.89 | 3 | 1 | 1 |
female | 41.21 | 36.34 | 2 | 5 | 1 |
male | 37.71 | 36.82 | 3 | 0 | 1 |
female | 30.06 | 38.96 | 3 | 2 | 1 |
male | 21.10 | 62.15 | 5 | 3 | 0 |
male | 40.12 | 69.54 | 2 | 0 | 1 |
Prepare the HR_test data and a ranger model for the explainer:
# fit a ranger model
library("ranger")
model <- ranger(fired ~., data = train, probability = TRUE)
# prepare validation dataset
test <- DALEX::HR_test[1:1000,]
test$fired <- ifelse(test$status == "fired", 1, 0)
test$status <- NULL
# create an explainer for the model
explainer <- DALEX::explain(model,
                            data = test,
                            y = test$fired)
# start modelStudio
library("modelStudio")
Pass data points to the new_observation parameter for instance explanations such as Break Down, Shapley Values and Ceteris Paribus Profiles. Use new_observation_y to show their true labels.
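As a minimal sketch, the call below reuses the explainer and test objects created above and explains the first three rows of test. The row names are made-up labels added only to tell the observations apart; the choice of rows is arbitrary.

```r
library("modelStudio")

# pick a few observations to explain (first three rows, chosen arbitrarily)
new_observation <- test[1:3, ]
# hypothetical labels, used only to make the observations easy to identify
rownames(new_observation) <- c("employee_1", "employee_2", "employee_3")
# true labels of the chosen observations
true_labels <- test[1:3, ]$fired

modelStudio(explainer,
            new_observation = new_observation,
            new_observation_y = true_labels)
```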
Make the modelStudio grid bigger or smaller with the facet_dim parameter.
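For example, a two-panel layout could be requested as sketched below, reusing the explainer from above. The vector is read here as rows by columns, which is an assumption worth checking against the function documentation.

```r
library("modelStudio")

# one row with two panels (assumed order: rows, columns)
ms <- modelStudio(explainer,
                  facet_dim = c(1, 2))
ms
```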
Use the time parameter to set the animation length. A value of 0 makes the animations invisible.
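A short sketch of slowing the animations down, again with the explainer from above. The value 1000 is arbitrary, and the unit is taken to be milliseconds, as described in the package documentation.

```r
library("modelStudio")

# longer animations; the value is assumed to be in milliseconds
ms <- modelStudio(explainer,
                  time = 1000)
ms
```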
N is the number of observations used to calculate Partial Dependence and Accumulated Dependence Profiles.
10*N is the number of observations used to calculate Feature Importance.
B is the number of permutation rounds used to calculate Shapley Values and Feature Importance.
Decrease the N and B parameters to lower the computation time, or increase them to get more accurate empirical results.
Don’t compute the EDA plots if they are not needed: set the eda parameter to FALSE.
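A minimal call without the EDA panels, using the explainer defined earlier:

```r
library("modelStudio")

# skip the exploratory data analysis plots
ms_no_eda <- modelStudio(explainer,
                         eda = FALSE)
ms_no_eda
```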
Hide the computation progress bar and messages by setting the show_info parameter to FALSE.
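This can be handy in scripts or reports where console output is unwanted. A sketch with the same explainer:

```r
library("modelStudio")

# run the calculations without printing progress information
ms_quiet <- modelStudio(explainer,
                        show_info = FALSE)
ms_quiet
```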
Change the viewer parameter to set where modelStudio is displayed. The available values are described in the r2d3 documentation.
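As an illustration, the dashboard could be sent to the default web browser. The value "browser" is one of the viewer options listed by r2d3 (alongside "internal" and "external"); which one works best depends on the environment in which R is running.

```r
library("modelStudio")

# open the dashboard in the system web browser instead of the IDE viewer
ms_browser <- modelStudio(explainer,
                          viewer = "browser")
ms_browser
```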
Speed up the modelStudio computation by setting the parallel parameter to TRUE. It uses the parallelMap package to calculate local explainers faster. This is especially useful when using modelStudio with complicated models, vast datasets, or many observations.
All options can be set outside of the function call; see the parallelMap documentation for how to use it.
# set up the cluster
options(
  parallelMap.default.mode        = "socket",
  parallelMap.default.cpus        = 4,
  parallelMap.default.show.info   = FALSE
)
# calculations of local explanations will be distributed into 4 cores
modelStudio(explainer,
            new_observation = test[1:16,],
            parallel = TRUE)
Customize some of the modelStudio looks by overwriting the default options returned by the ms_options() function. See its documentation for the full list of options.
# set additional graphical parameters
new_options <- ms_options(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange"
)
modelStudio(explainer,
            options = new_options)
All visual options can be changed after the calculations using ms_update_options().
old_ms <- modelStudio(explainer)
old_ms
# update the options
new_ms <- ms_update_options(old_ms,
                            time = 0,
                            facet_dim = c(1,2),
                            margin_left = 150)
new_ms
Use ms_update_observations() to add more observations with their local explanations to the modelStudio.
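A sketch of how this could look, assuming the explainer and test objects from above. The rows 101 and 102 are picked arbitrarily; any rows of test would do.

```r
library("modelStudio")

old_ms <- modelStudio(explainer)
old_ms

# add two more observations (rows chosen arbitrarily) to the existing studio
plus_ms <- ms_update_observations(old_ms,
                                  explainer,
                                  new_observation = test[101:102, ])
plus_ms
```

Only the local explanations for the added observations need to be calculated, so this avoids rebuilding the whole studio from scratch.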
Use explain_*() functions from the DALEXtra package to explain various models.
Below is a basic example of making a modelStudio for an mlr model using explain_mlr().
library(DALEXtra)
library(mlr)
# fit a model
task <- makeClassifTask(id = "task", data = train, target = "fired")
learner <- makeLearner("classif.ranger", predict.type = "prob")
model <- train(learner, task)
# create an explainer for the model
explainer_mlr <- explain_mlr(model,
                             data = test,
                             y = test$fired,
                             label = "mlr")
# make a studio for the model
modelStudio(explainer_mlr)
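The introduction mentions that the dashboard can be saved and shared as HTML. One way to do this, sketched here under the assumption that the r2d3 package is installed, is r2d3::save_d3_html(). The file name is a placeholder, and explainer_mlr is the explainer created above.

```r
library("modelStudio")
library("r2d3")

ms <- modelStudio(explainer_mlr,
                  show_info = FALSE)

# write the dashboard to an HTML file (the file name is an arbitrary example)
r2d3::save_d3_html(ms, file = "modelStudio.html")
```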