Getting started with the shinyML package

Jean Bertin

2019-10-29

Description

shinyML is a Shiny application that helps you to easily compare supervised machine learning regression models. The two main fuctions of this package are shiny_h2o and shiny_spark which leaves the choice for the user to train and test models on H2O or Spark framework.

Introduction of shinyML

Once you get your data stored on a data.table or data.frame object, you can just use one line of code to run shiny_h2o and shiny_spark functions and deploy a the Shiny App as below. This app can be shared your colleagues if you put share_app argument to TRUE and select a port that is free on your server.

library(shinyML)
longley2 <- longley %>% mutate(Year = as.Date(as.character(Year),format = "%Y"))
shiny_h2o(data =longley2,x = c("GNP_deflator","Unemployed" ,"Armed_Forces","Employed"),y = "GNP",date_column = "Year",share_app = TRUE,port = 3951)

Explore input dataset before running the models…

Before running machine learning models, it can be useful to inspect the distribution of each variable and to have an insight of dependencies between explicative variables. Bothshiny_h2o and shiny_spark functions allows to check classes of explicative variables, plot histograms of each distribution and show correlation matrix between all variables. This tabs can be used to determine if some variable are strongly correlated to another and eventually removed from the training phase.You can also plot variation of every variable as a function of another using the “Explore dataset” tab.

An example of output of shinyML

Runing the app…

When the shiny has been launched, you can manually adjust main parameters of supervised models (such as generalized linear regression, Random forest, Neural Network, Gradient Boosting …) by moving the coresponding cursors. In addition to hyper-parameters setting for each model, you can adjust train and test period and choose which variables you want to keep during the training phase.

An example of output of shinyML

You can then run each model separately or run all models simultaneously clicking the corresponding button to each box.

Run all models at the same time with your custom configuration

You will see a validation message box once all models have been trained: at that point, you can have an overview of your results comparing variables importances and error metrics like MAPE or RMSE.

Run autoML alogrithm to find automatically configure the best machine learning regression model associated to your dataset

AutoML algorithm will automatically find the best algorithm to suit your regression task: as soon as the maximum time for searching will be reached, you get a message box indicating which machine learning model has been choosed to suit you regression task and specifying all corresponding hyper-parameter values.

The only setting that must be adjusted by the user is the maximum time authorized for searching. Please notice that this functionality is for the moment only available on spark_h2o function.

Run autoML algorithm to compare machine learning techniques