shinyML
is a Shiny application that helps you to easily compare supervised machine learning regression models. The two main fuctions of this package are shiny_h2o
and shiny_spark
which leaves the choice for the user to train and test models on H2O or Spark framework.
Once you get your data stored on a data.table or data.frame object, you can just use one line of code to run shiny_h2o
and shiny_spark
functions and deploy a the Shiny App as below. This app can be shared your colleagues if you put share_app
argument to TRUE
and select a port that is free on your server.
library(shinyML)
longley2 <- longley %>% mutate(Year = as.Date(as.character(Year),format = "%Y"))
shiny_h2o(data =longley2,x = c("GNP_deflator","Unemployed" ,"Armed_Forces","Employed"),y = "GNP",date_column = "Year",share_app = TRUE,port = 3951)
Before running machine learning models, it can be useful to inspect the distribution of each variable and to have an insight of dependencies between explicative variables. Bothshiny_h2o
and shiny_spark
functions allows to check classes of explicative variables, plot histograms of each distribution and show correlation matrix between all variables. This tabs can be used to determine if some variable are strongly correlated to another and eventually removed from the training phase.You can also plot variation of every variable as a function of another using the “Explore dataset” tab.
When the shiny has been launched, you can manually adjust main parameters of supervised models (such as generalized linear regression, Random forest, Neural Network, Gradient Boosting …) by moving the coresponding cursors. In addition to hyper-parameters setting for each model, you can adjust train and test period and choose which variables you want to keep during the training phase.
You can then run each model separately or run all models simultaneously clicking the corresponding button to each box.
You will see a validation message box once all models have been trained: at that point, you can have an overview of your results comparing variables importances and error metrics like MAPE or RMSE.
AutoML algorithm will automatically find the best algorithm to suit your regression task: as soon as the maximum time for searching will be reached, you get a message box indicating which machine learning model has been choosed to suit you regression task and specifying all corresponding hyper-parameter values.
The only setting that must be adjusted by the user is the maximum time authorized for searching. Please notice that this functionality is for the moment only available on spark_h2o
function.