Analysis of the Stroop experiment data using the Bayesian t-test

Jure Demšar, Erik Štrumbelj and Grega Repovš

2020-02-19

Introduction

The Stroop test showed that when the stimulus is incongruent – the name of a color is printed in an ink of a different color than the one the word denotes – naming the ink color takes longer and is more error-prone than naming the color of a rectangle or of a string of characters that does not form a word.

In our version of the Stroop test participants were faced with four types of conditions, the combinations of the task (reading or naming) and the stimulus type (neutral or incongruent):

- reading neutral,
- reading incongruent,
- naming neutral,
- naming incongruent.

We are primarily interested in the expected task completion times. Every participant saw the same number of stimuli in every condition, so we opt for a Bayesian t-test. The data are already split into the four conditions described above, so we only need to specify the priors. We based them on our previous experience with similar tasks: participants finish the task in approximately 1 minute and the typical standard deviation for a participant is less than 2 minutes. Note that, due to vignette build limitations, all fits below use only one chain; running several chains in parallel is usually more efficient. To speed up the vignette build we also greatly reduced the number of iterations; use an appropriate number of iterations when executing actual analyses (see the sketch after the fits below).

# libs
library(bayes4psy)
library(ggplot2)

# load data
data <- stroop_simple

# priors (completion times are measured in seconds):
# the mean has a normal prior with mean 60 s and standard deviation 30 s,
# the standard deviation has a uniform prior between 0 s and 120 s
mu_prior <- b_prior(family="normal", pars=c(60, 30))
sigma_prior <- b_prior(family="uniform", pars=c(0, 120))
priors <- list(c("mu", mu_prior),
               c("sigma", sigma_prior))

# fit
fit_reading_neutral <- b_ttest(data$reading_neutral,
                               priors=priors,
                               iter=500, warmup=100, chains=1)
## 
## SAMPLING FOR MODEL 'ttest' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 14
## Chain 1:            adapt_window = 76
## Chain 1:            term_buffer = 10
## Chain 1: 
## Chain 1: Iteration:   1 / 500 [  0%]  (Warmup)
## Chain 1: Iteration:  50 / 500 [ 10%]  (Warmup)
## Chain 1: Iteration: 100 / 500 [ 20%]  (Warmup)
## Chain 1: Iteration: 101 / 500 [ 20%]  (Sampling)
## Chain 1: Iteration: 150 / 500 [ 30%]  (Sampling)
## Chain 1: Iteration: 200 / 500 [ 40%]  (Sampling)
## Chain 1: Iteration: 250 / 500 [ 50%]  (Sampling)
## Chain 1: Iteration: 300 / 500 [ 60%]  (Sampling)
## Chain 1: Iteration: 350 / 500 [ 70%]  (Sampling)
## Chain 1: Iteration: 400 / 500 [ 80%]  (Sampling)
## Chain 1: Iteration: 450 / 500 [ 90%]  (Sampling)
## Chain 1: Iteration: 500 / 500 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.012 seconds (Warm-up)
## Chain 1:                0.045 seconds (Sampling)
## Chain 1:                0.057 seconds (Total)
## Chain 1:
fit_reading_incongruent <- b_ttest(data$reading_incongruent,
                                   priors=priors,
                                   iter=500, warmup=100, chains=1)
## 
## SAMPLING FOR MODEL 'ttest' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 14
## Chain 1:            adapt_window = 76
## Chain 1:            term_buffer = 10
## Chain 1: 
## Chain 1: Iteration:   1 / 500 [  0%]  (Warmup)
## Chain 1: Iteration:  50 / 500 [ 10%]  (Warmup)
## Chain 1: Iteration: 100 / 500 [ 20%]  (Warmup)
## Chain 1: Iteration: 101 / 500 [ 20%]  (Sampling)
## Chain 1: Iteration: 150 / 500 [ 30%]  (Sampling)
## Chain 1: Iteration: 200 / 500 [ 40%]  (Sampling)
## Chain 1: Iteration: 250 / 500 [ 50%]  (Sampling)
## Chain 1: Iteration: 300 / 500 [ 60%]  (Sampling)
## Chain 1: Iteration: 350 / 500 [ 70%]  (Sampling)
## Chain 1: Iteration: 400 / 500 [ 80%]  (Sampling)
## Chain 1: Iteration: 450 / 500 [ 90%]  (Sampling)
## Chain 1: Iteration: 500 / 500 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.011 seconds (Warm-up)
## Chain 1:                0.028 seconds (Sampling)
## Chain 1:                0.039 seconds (Total)
## Chain 1:
fit_naming_neutral <- b_ttest(data$naming_neutral,
                              priors=priors,
                              iter=500, warmup=100, chains=1)
## 
## SAMPLING FOR MODEL 'ttest' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 14
## Chain 1:            adapt_window = 76
## Chain 1:            term_buffer = 10
## Chain 1: 
## Chain 1: Iteration:   1 / 500 [  0%]  (Warmup)
## Chain 1: Iteration:  50 / 500 [ 10%]  (Warmup)
## Chain 1: Iteration: 100 / 500 [ 20%]  (Warmup)
## Chain 1: Iteration: 101 / 500 [ 20%]  (Sampling)
## Chain 1: Iteration: 150 / 500 [ 30%]  (Sampling)
## Chain 1: Iteration: 200 / 500 [ 40%]  (Sampling)
## Chain 1: Iteration: 250 / 500 [ 50%]  (Sampling)
## Chain 1: Iteration: 300 / 500 [ 60%]  (Sampling)
## Chain 1: Iteration: 350 / 500 [ 70%]  (Sampling)
## Chain 1: Iteration: 400 / 500 [ 80%]  (Sampling)
## Chain 1: Iteration: 450 / 500 [ 90%]  (Sampling)
## Chain 1: Iteration: 500 / 500 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.015 seconds (Warm-up)
## Chain 1:                0.057 seconds (Sampling)
## Chain 1:                0.072 seconds (Total)
## Chain 1:
fit_naming_incongruent <- b_ttest(data$naming_incongruent,
                                  priors=priors,
                                  iter=500, warmup=100, chains=1)
## 
## SAMPLING FOR MODEL 'ttest' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 14
## Chain 1:            adapt_window = 76
## Chain 1:            term_buffer = 10
## Chain 1: 
## Chain 1: Iteration:   1 / 500 [  0%]  (Warmup)
## Chain 1: Iteration:  50 / 500 [ 10%]  (Warmup)
## Chain 1: Iteration: 100 / 500 [ 20%]  (Warmup)
## Chain 1: Iteration: 101 / 500 [ 20%]  (Sampling)
## Chain 1: Iteration: 150 / 500 [ 30%]  (Sampling)
## Chain 1: Iteration: 200 / 500 [ 40%]  (Sampling)
## Chain 1: Iteration: 250 / 500 [ 50%]  (Sampling)
## Chain 1: Iteration: 300 / 500 [ 60%]  (Sampling)
## Chain 1: Iteration: 350 / 500 [ 70%]  (Sampling)
## Chain 1: Iteration: 400 / 500 [ 80%]  (Sampling)
## Chain 1: Iteration: 450 / 500 [ 90%]  (Sampling)
## Chain 1: Iteration: 500 / 500 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.018 seconds (Warm-up)
## Chain 1:                0.081 seconds (Sampling)
## Chain 1:                0.099 seconds (Total)
## Chain 1:
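
For an actual analysis we would use more iterations and several chains. Below is a minimal sketch of such a configuration; the use of the mc.cores option assumes that bayes4psy delegates the sampling to rstan, which reads this option to run chains in parallel.

# sketch: settings for an actual analysis (not a vignette build)
options(mc.cores=parallel::detectCores())  # assumed: rstan reads this option to parallelize chains

fit_reading_neutral <- b_ttest(data$reading_neutral,
                               priors=priors,
                               iter=4000, warmup=1000, chains=4)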

Next we perform MCMC diagnostics and visual checks of model fits.

# trace plots
#plot_trace(fit_reading_neutral)
#plot_trace(fit_reading_incongruent)
#plot_trace(fit_naming_neutral)
plot_trace(fit_naming_incongruent)

# check fit
#print(fit_reading_neutral)
#print(fit_reading_incongruent)
#print(fit_naming_neutral)
print(fit_naming_incongruent)
## Inference for Stan model: ttest.
## 1 chains, each with iter=500; warmup=100; thin=1; 
## post-warmup draws per chain=400, total post-warmup draws=400.
## 
##          mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## mu      73.09    0.24  1.93   69.15   71.92   73.26   74.27   77.01    62 1.01
## sigma   15.50    0.09  1.69   12.70   14.29   15.33   16.51   19.11   339 1.00
## nu      36.42    1.97 27.43    6.22   16.43   29.58   48.62  105.12   194 1.00
## lp__  -216.11    0.08  1.05 -218.49 -216.65 -215.84 -215.38 -214.85   163 1.00
## 
## Samples were drawn using NUTS(diag_e) at Wed Feb 19 11:06:01 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
# visual inspection
#plot(fit_reading_neutral)
#plot(fit_reading_incongruent)
#plot(fit_naming_neutral)
plot(fit_naming_incongruent)

The diagnostics gave no reason for concern. Several commands in the example above are commented out for brevity; in practice, we should of course always perform these steps for every fit. We proceed by cross-comparing all four fits with a single line of code.

# join all fits but the first in a list
fit_list <- c(fit_reading_incongruent,
              fit_naming_neutral,
              fit_naming_incongruent)

# compare all fits simultaneously
multiple_comparison <- compare_means(fit_reading_neutral,
                                     fits=fit_list)
## 
## ---------- Group 1 vs Group 2 ----------
## Probabilities:
##   - Group 1 < Group 2: 1.00 +/- 0.00000
##   - Group 1 > Group 2: 0.00 +/- 0.00000
## 95% HDI:
##   - Group 1 - Group 2: [-4.27, -0.78]
## 
## 
## ---------- Group 1 vs Group 3 ----------
## Probabilities:
##   - Group 1 < Group 3: 1.00 +/- 0.00000
##   - Group 1 > Group 3: 0.00 +/- 0.00000
## 95% HDI:
##   - Group 1 - Group 3: [-15.54, -10.10]
## 
## 
## ---------- Group 1 vs Group 4 ----------
## Probabilities:
##   - Group 1 < Group 4: 1.00 +/- 0.00000
##   - Group 1 > Group 4: 0.00 +/- 0.00000
## 95% HDI:
##   - Group 1 - Group 4: [-36.13, -27.77]
## 
## 
## ---------- Group 2 vs Group 3 ----------
## Probabilities:
##   - Group 2 < Group 3: 1.00 +/- 0.00000
##   - Group 2 > Group 3: 0.00 +/- 0.00000
## 95% HDI:
##   - Group 2 - Group 3: [-12.36, -6.78]
## 
## 
## ---------- Group 2 vs Group 4 ----------
## Probabilities:
##   - Group 2 < Group 4: 1.00 +/- 0.00000
##   - Group 2 > Group 4: 0.00 +/- 0.00000
## 95% HDI:
##   - Group 2 - Group 4: [-33.57, -26.12]
## 
## 
## ---------- Group 3 vs Group 4 ----------
## Probabilities:
##   - Group 3 < Group 4: 1.00 +/- 0.00000
##   - Group 3 > Group 4: 0.00 +/- 0.00000
## 95% HDI:
##   - Group 3 - Group 4: [-24.30, -14.85]
## 
## -----------------------------------------
## Probabilities that a certain group is
## smallest/largest or equal to all others:
## 
##   largest smallest equal
## 1       0        1     0
## 2       0        0     0
## 3       0        0     0
## 4       1        0     0

When we compare more than two fits, we also get an estimate of the probability that each group has the largest or the smallest expected value. Based on the above output, participants are fastest at the reading neutral task (Group 1), followed by the reading incongruent task (Group 2) and the naming neutral task (Group 3). They are slowest at the naming incongruent task (Group 4). This ordering holds with very high probability, so we can conclude that both naming and incongruence of stimuli increase participants' response times, with naming having the bigger effect. We can visualize this in various ways, either as distributions of the mean times needed to complete the given tasks (with the plot_means function) or as differences between these means (with the plot_means_difference function).

plot_means(fit_reading_neutral, fits=fit_list) +
  scale_fill_hue(labels=c("Reading neutral",
                          "Reading incongruent",
                          "Naming neutral",
                          "Naming incongruent")) +
  theme(legend.title=element_blank())

Above is a visualization of the means for all four types of Stroop tasks; the x-axis (value) denotes task completion time. Naming and incongruence make the task more difficult, with naming having the bigger effect.

plot_means_difference(fit_reading_neutral, fits=fit_list) 

The figure above visualizes the differences in mean task completion times between the four conditions. Row and column 1 represent the reading neutral task, row and column 2 the reading incongruent task, row and column 3 the naming neutral task and row and column 4 the naming incongruent task. Since the 95% HDI intervals exclude 0 in all cases, we are confident that the mean task completion times differ between all pairs of conditions.
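
If we are interested in only one of these comparisons, for example naming neutral versus naming incongruent, we can compare two fits directly. A minimal sketch, assuming the two-fit forms of compare_means and plot_means_difference via the fit2 parameter:

# sketch: direct comparison of two conditions
naming_comparison <- compare_means(fit_naming_neutral,
                                   fit2=fit_naming_incongruent)

# visualize the difference between the two means
plot_means_difference(fit_naming_neutral, fit2=fit_naming_incongruent)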