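`binomial` family models. That is akin to running `predict(model, type = "response")`.
- Treatment contrasts (`contr.treatment`) are supported.
- `offset` is supported.
- In-line functions in the formula are not supported:
    - OK: `wt ~ mpg + am`
    - OK: `mutate(mtcars, newam = paste0(am))` and then `wt ~ mpg + newam`
    - Not OK: `wt ~ mpg + as.factor(am)`
    - Not OK: `wt ~ mpg + as.character(am)`
- Interval functions: `tidypredict_interval()` & `tidypredict_sql_interval()`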
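The in-line function rule can be illustrated with a small base R sketch. It fits the same gaussian `glm()` twice, once converting `am` inside the formula and once using a character column created beforehand. Under treatment contrasts the workaround changes only the coefficient names, not their values. The names `cars`, `m_inline`, and `m_pre` are purely illustrative.

```r
# Not OK for tidypredict: the conversion happens inside the formula
m_inline <- glm(wt ~ mpg + as.factor(am), data = mtcars)

# OK: create the character column first, then refer to it by name
cars <- transform(mtcars, newam = paste0(am))
m_pre <- glm(wt ~ mpg + newam, data = cars)

# Coefficient names differ: "as.factor(am)1" versus "newam1"
names(coef(m_inline))
names(coef(m_pre))

# The fitted coefficient values are the same
same_fit <- isTRUE(all.equal(unname(coef(m_inline)), unname(coef(m_pre))))
same_fit
```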
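```r
library(tidypredict)
library(dplyr)

# Build a small data set with a character (categorical) predictor
df <- mtcars %>%
  mutate(char_cyl = paste0("cyl", cyl)) %>%
  select(wt, char_cyl, am)

# Logistic regression: binomial family with the default logit link
model <- glm(am ~ wt + char_cyl, data = df, family = "binomial")
```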
`tidypredict_sql()` returns a SQL query in which each coefficient (`model$coefficients`) is multiplied by its matching variable or categorical value. Numeric variables are used directly, while each non-baseline level of a categorical variable becomes a short `CASE WHEN` statement that evaluates to 1.0 or 0.0. If the model has an offset, its field or value is appended.
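To see where each `CASE WHEN` comes from, this base R sketch rebuilds the article's model and inspects its design matrix. Under treatment contrasts, `char_cyl` becomes 0/1 indicator columns for `cyl6` and `cyl8`, with `cyl4` as the baseline, which is what the `CASE WHEN ... THEN (1.0) ... (0.0)` expressions recreate. It illustrates the idea only; it is not how tidypredict builds its SQL.

```r
df <- transform(mtcars, char_cyl = paste0("cyl", cyl))[, c("wt", "char_cyl", "am")]
model <- glm(am ~ wt + char_cyl, data = df, family = binomial)

X <- model.matrix(model)
colnames(X)  # "(Intercept)" "wt" "char_cylcyl6" "char_cylcyl8"

# Each indicator column matches the CASE WHEN test on the raw value
stopifnot(all(X[, "char_cylcyl6"] == (df$char_cyl == "cyl6")))
stopifnot(all(X[, "char_cylcyl8"] == (df$char_cyl == "cyl8")))

# Linear predictor: one "column * coefficient" term per coefficient
eta <- drop(X %*% coef(model))
all.equal(unname(eta), unname(predict(model, type = "link")))
```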
For `binomial` models, the sigmoid (inverse-logit) function is applied to the result. This means that the target database must support an exponential function, such as `EXP()`.
```r
library(tidypredict)
tidypredict_sql(model, dbplyr::simulate_mssql())
#> <SQL> 1.0 - 1.0 / (1.0 + EXP(20.8527831345691 + (`wt` * -7.85934263583835) + (CASE WHEN (`char_cyl` = 'cyl6') THEN (1.0) WHEN NOT(`char_cyl` = 'cyl6') THEN (0.0) END * 3.10462643177453) + (CASE WHEN (`char_cyl` = 'cyl8') THEN (1.0) WHEN NOT(`char_cyl` = 'cyl8') THEN (0.0) END * 5.37942092366097)))
```
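The query wraps the linear predictor in `1.0 - 1.0 / (1.0 + EXP(...))`. A quick base R check on the same model confirms that this form equals the logistic function `plogis()` and the values returned by `predict(model, type = "response")`. The variable names are illustrative.

```r
df <- transform(mtcars, char_cyl = paste0("cyl", cyl))[, c("wt", "char_cyl", "am")]
model <- glm(am ~ wt + char_cyl, data = df, family = binomial)

eta <- predict(model, type = "link")             # log-odds for each row
p_sql_form <- 1 - 1 / (1 + exp(eta))             # same shape as the SQL
p_logistic <- plogis(eta)                        # exp(eta) / (1 + exp(eta))
p_response <- predict(model, type = "response")  # base R probabilities

all.equal(unname(p_sql_form), unname(p_logistic))
all.equal(unname(p_sql_form), unname(p_response))
```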
Alternatively, use `tidypredict_to_column()` if the results are to be used or previewed in `dplyr`.
```r
df %>%
  tidypredict_to_column(model) %>%
  head(10)
#>       wt char_cyl am        fit
#> 1  2.620     cyl6  1 0.96662269
#> 2  2.875     cyl6  1 0.79605201
#> 3  2.320     cyl4  1 0.93208127
#> 4  3.215     cyl6  0 0.21242376
#> 5  3.440     cyl8  0 0.30918450
#> 6  3.460     cyl6  0 0.03783629
#> 7  3.570     cyl8  0 0.13875740
#> 8  3.190     cyl4  0 0.01450687
#> 9  3.150     cyl4  0 0.01975984
#> 10 3.440     cyl6  0 0.04399324
```
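For a local data frame, the `fit` column should agree with base R's own predictions. A minimal sketch of that comparison, assuming the same data and model, adds the column with `predict()` and checks the first row against the value printed above (0.96662269).

```r
df <- transform(mtcars, char_cyl = paste0("cyl", cyl))[, c("wt", "char_cyl", "am")]
model <- glm(am ~ wt + char_cyl, data = df, family = binomial)

# Base R counterpart of the column added by tidypredict_to_column()
df$fit <- predict(model, newdata = df, type = "response")
head(df, 3)

# Agrees with the printed tidypredict value to the displayed precision
abs(df$fit[1] - 0.96662269) < 1e-6
```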
The parser reads several parts of the `glm` object to collect everything needed for the prediction. Each coefficient becomes one entry in `terms`, and model-level details, such as the family and link, go into `general`. Some entries appear only for certain models: an offset, for example, is recorded only when it is part of the model's formula (call). This model has no offset, so no such entry exists.
```r
pm <- parse_model(model)
str(pm, 2)
#> List of 2
#>  $ general:List of 7
#>   ..$ model   : chr "glm"
#>   ..$ version : num 2
#>   ..$ type    : chr "regression"
#>   ..$ residual: int 28
#>   ..$ family  : chr "binomial"
#>   ..$ link    : chr "logit"
#>   ..$ is_glm  : num 1
#>  $ terms  :List of 4
#>   ..$ :List of 5
#>   ..$ :List of 5
#>   ..$ :List of 5
#>   ..$ :List of 5
#>  - attr(*, "class")= chr [1:3] "parsed_model" "pm_regression" "list"
```
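Entries such as `family`, `link`, and `residual` correspond to standard components of a fitted `glm` object. The sketch below gathers the same kind of information with base R accessors. `mini_parse()` is a hypothetical helper written for illustration, not tidypredict's `parse_model()`, and its layout only loosely mirrors the output above.

```r
df <- transform(mtcars, char_cyl = paste0("cyl", cyl))[, c("wt", "char_cyl", "am")]
model <- glm(am ~ wt + char_cyl, data = df, family = binomial)

# Hypothetical helper: collects parser-style inputs from a glm object
mini_parse <- function(model) {
  list(
    general = list(
      family     = model$family$family,    # "binomial" here
      link       = model$family$link,      # "logit" here
      residual   = model$df.residual,      # residual df: 32 rows - 4 coefficients
      has_offset = !is.null(model$offset)  # FALSE unless the call has an offset
    ),
    terms = as.list(coef(model))           # one entry per coefficient
  )
}

pm_sketch <- mini_parse(model)
str(pm_sketch, 2)
```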
The output from `parse_model()` is transformed into a `dplyr`-ready formula, also known as a Tidy Eval formula. All categorical variables are handled with `ifelse()`.
```r
tidypredict_fit(model)
#> 1 - 1/(1 + exp(20.8527831345691 + (wt * -7.85934263583835) + 
#>     (ifelse(char_cyl == "cyl6", 1, 0) * 3.10462643177453) + (ifelse(char_cyl == 
#>     "cyl8", 1, 0) * 5.37942092366097)))
```
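Because the formula is an ordinary R expression, an equivalent one can be rebuilt and evaluated with base R alone. This sketch uses `bquote()` to splice the fitted coefficients into a call with the same structure as the output above, evaluates it with the columns of `df` as variables, and compares the result with `predict()`. The name `fit_expr` is illustrative.

```r
df <- transform(mtcars, char_cyl = paste0("cyl", cyl))[, c("wt", "char_cyl", "am")]
model <- glm(am ~ wt + char_cyl, data = df, family = binomial)
b <- coef(model)

# Splice coefficient values into an expression shaped like tidypredict_fit()
fit_expr <- bquote(
  1 - 1 / (1 + exp(.(b[["(Intercept)"]]) + (wt * .(b[["wt"]])) +
    (ifelse(char_cyl == "cyl6", 1, 0) * .(b[["char_cylcyl6"]])) +
    (ifelse(char_cyl == "cyl8", 1, 0) * .(b[["char_cylcyl8"]]))))
)

# Evaluate with the data frame columns acting as variables
fit <- eval(fit_expr, envir = df)
all.equal(fit, unname(predict(model, type = "response")))
```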
From there, the Tidy Eval formula can be used anywhere an R expression can be evaluated. `tidypredict` provides three paths:
- Directly inside `dplyr`: `mutate(df, !! tidypredict_fit(model))`
- Use `tidypredict_to_column(model)` in a piped command set
- Use `tidypredict_sql(model, con)` to retrieve the SQL statement

The same applies to the prediction interval functions.
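A sketch of the three paths side by side, assuming `tidypredict`, `dplyr`, and `dbplyr` are installed. `dbplyr::simulate_postgres()` stands in for a real database connection, and the final check only confirms that the first two paths return the same fitted values.

```r
library(tidypredict)
library(dplyr)

df <- mtcars %>%
  mutate(char_cyl = paste0("cyl", cyl)) %>%
  select(wt, char_cyl, am)
model <- glm(am ~ wt + char_cyl, data = df, family = "binomial")

# 1. Splice the Tidy Eval formula directly into mutate()
via_mutate <- df %>% mutate(fit = !! tidypredict_fit(model))

# 2. Let tidypredict add the `fit` column in a pipeline
via_column <- df %>% tidypredict_to_column(model)

# 3. Translate the formula to SQL for a simulated back end
sql <- tidypredict_sql(model, dbplyr::simulate_postgres())

all.equal(via_mutate$fit, via_column$fit)
```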
Testing the `tidypredict` results is easy. The `tidypredict_test()` function automatically uses the data frame stored in the `glm` model object to compare the results of `tidypredict_fit()` and `tidypredict_interval()` with those given by `predict()`.
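The core of that comparison can be spelled out by hand. This sketch, assuming `tidypredict` is installed, evaluates the `tidypredict_fit()` expression on the data and measures the largest absolute gap to `predict()`. The helper `fit_diff()` and the `1e-12` tolerance are illustrative choices, not the package's internals.

```r
library(tidypredict)

df <- transform(mtcars, char_cyl = paste0("cyl", cyl))[, c("wt", "char_cyl", "am")]
model <- glm(am ~ wt + char_cyl, data = df, family = binomial)

# Hypothetical helper: largest absolute gap between the formula and predict()
fit_diff <- function(model, data) {
  formula_fit <- eval(tidypredict_fit(model), envir = data)
  r_fit <- predict(model, newdata = data, type = "response")
  max(abs(formula_fit - r_fit))
}

d <- fit_diff(model, df)
d < 1e-12
```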