Let’s start off with a simple example. We will simulate 100 observations of a normally distributed outcome variable (mean = 0, SD = 1) and a grouping variable made of zeros and ones. Importantly, the mean difference between the two groups (zeros vs. ones) is 1.
data <- bayestestR::simulate_difference(n = 100,
                                        d = 1,
                                        names = c("Group", "Outcome"))
summary(data)

>  Group     Outcome     
>  0:50   Min.   :-2.55  
>  1:50   1st Qu.:-0.74  
>         Median : 0.00  
>         Mean   : 0.00  
>         3rd Qu.: 0.74  
>         Max.   : 2.55
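As a quick sanity check, we can compare the two group means directly (a minimal sketch using base R’s aggregate(); given the simulation, the means should be -0.5 and 0.5):

aggregate(Outcome ~ Group, data = data, FUN = mean)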
Since we are interested in the difference between these two groups, we can first investigate it using a t-test.
library(ggplot2)

ggplot(data, aes(x = Group, y = Outcome, fill = Group)) +
  geom_boxplot() +
  see::theme_modern()

ttest <- t.test(Outcome ~ Group, data = data, var.equal = TRUE)
ttest_pars <- parameters::parameters(ttest)
ttest_pars

> Parameter | Group | Mean_Group1 | Mean_Group2 | Difference |     t | df |      p |         95% CI |             Method
> ----------------------------------------------------------------------------------------------------------------------
> Outcome   | Group |       -0.50 |        0.50 |      -1.00 | -5.32 | 98 | < .001 | [-1.37, -0.63] |  Two Sample t-test
As we can see, this confirms our simulation specifications: the difference is indeed 1 (negative in the output above, as it is computed as the mean of the first group minus that of the second).
Let’s now compute a traditional Cohen’s d using the effectsize package. While this d should be close to 1, it should theoretically be a tiny bit larger, because it takes into account the SD of the whole Outcome variable (across both groups), which, because of the group difference, is a bit larger than 1:

sd(data$Outcome)

> [1] 1.1
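This value matches what we would expect for two equal-sized groups with within-group SDs of 1 separated by a difference of 1 (assuming that parametrization of the simulation): the total variance is the within-group variance plus the between-group variance.

sqrt(1 + 1^2 / 4)  # within variance (1) + between variance (d/2)^2 = 1.25, so SD ~ 1.118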
We can compute Cohen’s d as follows:

effectsize::cohens_d(Outcome ~ Group, data = data)
> Cohen's d |         95% CI
> --------------------------
>     -1.06 | [-1.48, -0.64]
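To make the computation transparent, here is a minimal by-hand version (assuming equal-sized groups, for which the pooled SD is the square root of the average of the two group variances):

m <- tapply(data$Outcome, data$Group, mean)  # group means
s <- tapply(data$Outcome, data$Group, sd)    # group SDs
(m["0"] - m["1"]) / sqrt(mean(s^2))          # difference of means / pooled SD, ~ -1.06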
As expected, it is pretty close to 1 times the SD of the sample. Interestingly, one can also estimate Cohen’s d directly from the result of the t-test, using its t statistic. We can convert it to a d with the effectsize package:

effectsize::t_to_d(t = ttest$statistic, df_error = ttest$parameter)
>     d |         95% CI
> ----------------------
> -1.07 | [-1.50, -0.65]
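For two groups of equal size, this conversion boils down to d = 2t / sqrt(df), which we can verify by hand with the values from the t-test above:

2 * -5.32 / sqrt(98)  # ~ -1.07, matching the conversion above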
Fortunately, they are quite close.
Another way of investigating this difference is through the lens of a logistic regression, in which the roles are reversed: the group variable (Group) becomes the response and the continuous variable (Outcome) becomes the predictor. Let’s fit such a model and investigate its parameters:

model <- glm(Group ~ Outcome, data = data, family = "binomial")

parameters::parameters(model)
> Parameter   | Coefficient |   SE |        95% CI |        z |      p
> --------------------------------------------------------------------
> (Intercept) |    1.23e-16 | 0.23 | [-0.45, 0.45] | 5.41e-16 | > .999
> Outcome     |        1.13 | 0.27 | [ 0.65, 1.70] |     4.25 | < .001
How should we interpret this output? The coefficients of a logistic model are expressed in log-odds, a (nonlinear) transformation of probabilities.
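One way to make them more tangible is to exponentiate them into odds ratios (a quick sketch using the Outcome coefficient from the table above):

exp(1.13)  # ~ 3.10: one more unit of Outcome multiplies the odds of being in group 1 by about 3

Using the modelbased package, one can also easily visualize this model: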
data_grid <- modelbased::estimate_link(model)

ggplot(data_grid, aes(x = Outcome, y = Predicted)) +
  geom_ribbon(aes(ymin = CI_low, ymax = CI_high), alpha = 0.2) +
  geom_line(color = "red", linewidth = 1) +
  see::theme_modern()

We can see that the probability of belonging to group 1 (vs. group 0) increases as Outcome increases. This is another way of saying that there is a difference in Outcome between the two groups. We can visualize all of this together as follows:
ggplot(data, aes(x = Group, y = Outcome, fill = Group)) +
  geom_boxplot() +
  geom_jitter(width = 0.1) +
  # overlay the predicted probability curve as a "vertical" regression line
  geom_line(data = data_grid,
            aes(x = Predicted + 1, y = Outcome),
            inherit.aes = FALSE,
            color = "red", linewidth = 1) +
  see::theme_modern()

You can notice that the red predicted-probability line crosses Outcome = 0 exactly halfway between the two groups, i.e., at a predicted probability of 0.5. This means that when Outcome = 0, an observation is equally likely to belong to either group: it is the “middle” of the difference between them.
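We can confirm this numerically: the predicted probability at Outcome = 0 is the inverse-logit of the model’s intercept which, being essentially zero, yields 0.5 (a quick check using base R’s plogis()):

plogis(coef(model)["(Intercept)"])  # inverse-logit of ~0 is 0.5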