Introduction

Since the DDM is widely used in parameter estimation usually involving numerical optimization, significant effort has been put into making the evaluation of its density as fast as possible. However, the density function for the DDM is notorious for containing an unavoidable infinite sum; hence, the literature has produced a few different methods of approximating the density. This vignette is designed to compare the speed of the currently available algorithms that calculate the lower probability density of the DDM. To this aim, we present benchmark data of currently available methods from the literature, our streamlined implementations of those methods, and our own novel methods. The first section will record the benchmark data of the density function approximations themselves and show the results through a series of visualizations. The second section will record the benchmark data of parameter estimation that uses the density function approximations in the optimization process for fitting to real-world data; this section will also include visualizations to illustrate the differences among the density function approximations. Please note that we will not be testing the functions for accuracy or consistency in this vignette, as that is covered in the Validity Vignette.

Benchmarking the Density Functions

We want to determine the performance of each method across a wide variety of parameters in order to identify any slow areas for individual methods. To achieve this rigor, we define an extensive parameter space (in a code chunk below) and loop through each combination of parameters. For the preliminary benchmark testing, we will input the response times as a vector to each function instead of inputting individual response times for two reasons: first, this is the most common way to input the response time data; and second, this allows for the R-based method to exploit vectorization to make that method as fast as possible. For each combination of parameters, we run the microbenchmark function from the package microbenchmark 100 times for each method and only save the median benchmark time of these 100. We will refer to these as the “median benchmark times” for the remainder of this vignette. Running the benchmark tests this many times for each set of parameters generates a sufficient amount of benchmark data so that the median of these times is unlikely to be an outlier, either faster or slower than a typical run time.

Upon collecting the benchmark data, we perform a preliminary analysis of the approximations’ performance by showing the distributions of median benchmark times as side-by-side violin plots for each approximation. Then we cull the methods and only test further on the fastest and most widely used approximations. To elucidate the areas of the parameter space where the different methods excel and struggle, we plot the median benchmark times as a function of individual parameter values. These plots will reveal which model parameters affect the efficiency of each approximation and how those parameters can slow down the approximations.

Generating Benchmark Data

Before analyzing how the different density function approximations perform, we must first generate the benchmark data. In addition to testing all of the methods available in dfddm, we also include three functions that are considered to be current and widely used. First, we include the function ddiffusion from the package rtdists as it is well known for being not only user friendly and feature-rich but also designed specifically for handling data with regard to distributions for response time models. Second, we also test the dwiener function from the RWiener package, which is mainly aimed at providing an R language interface to calculate various functions from the Wiener process. Third, we include some raw R code that is bundled with the paper written by Gondan, Blurton, and Kesselmeier (2014) about improving the approximation to the density function. As this code was not yet available in an R package, it is part of fddm and can be accessed after running source on the corresponding file as shown below.

To capture the behavior of the density function approximations, we test each one in a granular parameter space that includes the parameter values most likely used in practice. This parameter space can be found fully defined in the code chunk later in this section. Note that we are only testing the “lower” density function as the conversion from the “lower” density function to the “upper” density function involves only the mappings \(v \to -v\) and \(w \to 1-w\). Since our parameter space includes the negatives of all positive values of \(v\) and the complements of all values of \(w\), testing both the “lower” and “upper” density functions would be redundant.

In addition, we include two functions for benchmarking the performance of the approximations: one where the response times are input as a vector, and one where the response times are input one at a time to the density function. Inputting the response times as a vector is the most common way of inputting data to the density function and allows the R code from Gondan, Blurton, and Kesselmeier (2014) to exploit R’s powerful vectorization, operating at its most efficient. We also include a benchmark function that inputs the response times individually as this allows us to see the impact that the size of the response time has on the evaluation time of the density function approximations.

At each point in this parameter space, we run the microbenchmark function \(100\) times for each approximation and save the median of these 100 benchmark times. Using the median of these data should reduce the number of outliers in the data while maintaining a truthful representation of the approximations’ performance. The following code chunk defines the functions that we will use to benchmark the algorithms; we will be using the microbenchmark package to time the evaluations.

library("fddm")
library("rtdists")
library("RWiener")
source(system.file("extdata", "Gondan_et_al_density.R", package = "fddm", mustWork = TRUE))
library("microbenchmark")

rt_benchmark_vec <- function(RT, resp, V, A, t0 = 1e-4, W = 0.5, SV = 0.0,
                             err_tol = 1e-6, times = 100, unit = "ns") {

  fnames <- c("fs_SWSE_17", "fs_SWSE_14",
              "fs_Gon_17", "fs_Gon_14", "fs_Nav_17", "fs_Nav_14",
              "fb_Gon_17", "fb_Gon_14", "fb_Nav_17", "fb_Nav_14",
              "fl_Nav_09", "RWiener", "Gondan", "rtdists")
  nf <- length(fnames) # number of functions being benchmarked
  nV <- length(V)
  nA <- length(A)
  nW <- length(W)
  nSV <- length(SV)
  resp <- rep(resp, length(RT)) # for RWiener

  # Initialize the dataframe to contain the microbenchmark results
  mbm_res <- data.frame(matrix(ncol = 4+nf, nrow = nV*nA*nW*nSV))
  colnames(mbm_res) <- c('V', 'A', 'W', 'SV', fnames)
  row_idx <- 1

  # Loop through each combination of parameters and record microbenchmark results
  for (v in 1:nV) {
    for (a in 1:nA) {
      for (w in 1:nW) {
        for (sv in 1:nSV) {
          mbm <- microbenchmark(
          fs_SWSE_17 = dfddm(rt = RT, response = resp, a = A[a],
                             v = V[v], t0 = t0, w = W[w],
                             log = FALSE, n_terms_small = "SWSE",
                             summation_small = "2017", scale = "small",
                             err_tol = err_tol),
          fs_SWSE_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                             log = FALSE, n_terms_small = "SWSE",
                             summation_small = "2014", scale = "small",
                             err_tol = err_tol),
          fs_Gon_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2017", scale = "small",
                            err_tol = err_tol),
          fs_Gon_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2014", scale = "small",
                            err_tol = err_tol),
          fs_Nav_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2017", scale = "small",
                            err_tol = err_tol),
          fs_Nav_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2014", scale = "small",
                            err_tol = err_tol),
          fb_Gon_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2017", scale = "both",
                            err_tol = err_tol),
          fb_Gon_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2014", scale = "both",
                            err_tol = err_tol),
          fb_Nav_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2017", scale = "both",
                            err_tol = err_tol),
          fb_Nav_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2014", scale = "both",
                            err_tol = err_tol),
          fl_Nav_09 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "",
                            summation_small = "", scale = "large",
                            err_tol = err_tol),
          RWiener = dwiener(RT, resp = resp, alpha = A[a],
                            delta = V[v], tau = t0, beta = W[w],
                            give_log = FALSE),
          Gondan = fs(t = RT-t0, a = A[a], v = V[v],
                      w = W[w], eps = err_tol), # only "lower" resp
          rtdists = ddiffusion(RT, resp, a = A[a], v = V[v],
                               t0 = t0, z = W[w]*A[a]),
          times = times, unit = unit)
        # add the v, a, and w values to the dataframe
        mbm_res[row_idx, 1] <- V[v]
        mbm_res[row_idx, 2] <- A[a]
        mbm_res[row_idx, 3] <- W[w]
        mbm_res[row_idx, 4] <- SV[sv]
        # add the median microbenchmark results to the dataframe
        for (i in 1:nf) {
          mbm_res[row_idx, 4+i] <- median(mbm[mbm[,1] == fnames[i],2])
        }
        # iterate start value
        row_idx = row_idx + 1
        }
      }
    }
  }
  return(mbm_res)
}

rt_benchmark_ind <- function(RT, resp, V, A, t0 = 1e-4, W = 0.5, SV = 0.0,
                             err_tol = 1e-6, times = 100, unit = "ns") {

  fnames <- c("fs_SWSE_17", "fs_SWSE_14",
              "fs_Gon_17", "fs_Gon_14", "fs_Nav_17", "fs_Nav_14",
              "fb_Gon_17", "fb_Gon_14", "fb_Nav_17", "fb_Nav_14",
              "fl_Nav_09", "RWiener", "Gondan", "rtdists")
  nf <- length(fnames) # number of functions being benchmarked
  nRT <- length(RT)
  nV <- length(V)
  nA <- length(A)
  nW <- length(W)
  nSV <- length(SV)

  # Initialize the dataframe to contain the microbenchmark results
  mbm_res <- data.frame(matrix(ncol = 5+nf, nrow = nRT*nV*nA*nW*nSV))
  colnames(mbm_res) <- c('RT', 'V', 'A', 'W', 'SV', fnames)
  row_idx <- 1

  # Loop through each combination of parameters and record microbenchmark results
  for (rt in 1:nRT) {
    for (v in 1:nV) {
      for (a in 1:nA) {
        for (w in 1:nW) {
          for (sv in 1:nSV) {
            mbm <- microbenchmark(
            fs_SWSE_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "SWSE",
                              summation_small = "2017", scale = "small",
                              err_tol = err_tol),
            fs_SWSE_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "SWSE",
                              summation_small = "2014", scale = "small",
                              err_tol = err_tol),
            fs_Gon_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2017", scale = "small",
                              err_tol = err_tol),
            fs_Gon_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2014", scale = "small",
                              err_tol = err_tol),
            fs_Nav_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2017", scale = "small",
                              err_tol = err_tol),
            fs_Nav_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2014", scale = "small",
                              err_tol = err_tol),
            fb_Gon_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2017", scale = "both",
                              err_tol = err_tol),
            fb_Gon_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2014", scale = "both",
                              err_tol = err_tol),
            fb_Nav_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2017", scale = "both",
                              err_tol = err_tol),
            fb_Nav_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2014", scale = "both",
                              err_tol = err_tol),
            fl_Nav_09 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "",
                              summation_small = "", scale = "large",
                              err_tol = err_tol),
            RWiener = dwiener(RT[rt], resp = resp, alpha = A[a],
                              delta = V[v], tau = t0, beta = W[w],
                              give_log = FALSE),
            Gondan = fs(t = RT[rt]-t0, a = A[a], v = V[v],
                        w = W[w], eps = err_tol), # only "lower" resp
            rtdists = ddiffusion(RT[rt], resp, a = A[a], v = V[v],
                                 t0 = t0, z = W[w]*A[a]),
            times = times, unit = unit)
          # add the v, a, and w values to the dataframe
          mbm_res[row_idx, 1] <- RT[rt]
          mbm_res[row_idx, 2] <- V[v]
          mbm_res[row_idx, 3] <- A[a]
          mbm_res[row_idx, 4] <- W[w]
          mbm_res[row_idx, 5] <- SV[sv]
          # add the median microbenchmark results to the dataframe
          for (i in 1:nf) {
            mbm_res[row_idx, 5+i] <- median(mbm[mbm[,1] == fnames[i],2])
          }
          # iterate start value
          row_idx = row_idx + 1
          }
        }
      }
    }
  }
  return(mbm_res)
}

Control loops in R have a lot of overhead and are comparatively slow compared to loops in C++; thus, we will only run rt_benchmark_vec() in this vignette as this benchmark function inputs the response times as a vector, avoiding an additional loop in R. To show comparisons across the algorithms with respect to the input response time, we will load pre-run benchmark data obtain through using rt_benchmark_ind() as this takes too long to evaluate in this vignette. The following code chunk defines the parameter space to be explored throughout the benchmark testing and runs the vectorized benchmark function defined in the previous code chunk.

# Define parameter space
RT <- c(0.001, 0.1, 1, 2, 3, 4, 5, 10, 30)
A <- c(0.25, 0.5, 1, 2.5, 5)
V <- c(-5, -2, 0, 2, 5)
t0 <- 1e-4 # must be nonzero for RWiener
W <- c(0.2, 0.5, 0.8)
SV <- c(0, 0.5, 1, 1.5)
err_tol <- 1e-6 # this is the setting from rtdists

# Run benchmark tests
bm_vec <- rt_benchmark_vec(RT = RT, resp = "lower", V = V, A = A, t0 = t0,
                           W = W, SV = SV, err_tol = err_tol,
                           times = 100, unit = "ns")
## results of rt_benchmark_ind are pre-loaded and not run here:
# bm_ind <- rt_benchmark_ind(RT = RT, resp = "lower", V = V, A = A, t0 = t0,
#                            W = W, SV = SV, err_tol = err_tol,
#                            times = 100, unit = "ns")

Analysis of Benchmark Results

The first step in analyzing the benchmark data is a rough visualization of the results in side-by-side violin plots that show the distribution of microbenchmark timings. We use the data that we just generated in the previous code chunk to plot a density violin for each method. Each density violin is overlaid with a boxplot showing the median and the first and third quartiles, in addition to a dashed line representing the mean. We also constrain the vertical axis of the plot to focus on the area where most of the fddm methods lie; depending on your system, this will likely hide the results from rtdists as they are typically much higher than that of fddm.

For this visualization we will use the benchmark data where the response times were handed over as a vector to each fddm call. As discussed above, running the benchmark on individual response times is considerably slower as the loop over the response times resides in R and not in C++.

library("reshape2")
library("ggplot2")

t_idx <- match("SV", colnames(bm_vec))
bm_vec[, -seq_len(t_idx)] <- bm_vec[, -seq_len(t_idx)]/1000 # convert to microseconds
mbm_vec <- melt(bm_vec, measure.vars = -seq_len(t_idx),
                variable.name = "FuncName", value.name = "time")

Names_vec <- c("fs_SWSE_17", "fs_SWSE_14",
               "fs_Gon_17", "fs_Gon_14", "fs_Nav_17", "fs_Nav_14",
               "fb_Gon_17", "fb_Gon_14", "fb_Nav_17", "fb_Nav_14",
               "fl_Nav_09", "RWiener", "Gondan", "rtdists")
Color_vec <- c("#9900cc", "#cc99ff",
               "#006699", "#66ccff", "#336600", "#33cc33",
               "#c2a500", "#d7db42", "#e68a00", "#ffb366",
               "#996633", "#ff9999", "#ff5050", "#990000")

mi <- min(bm_vec[, -seq_len(t_idx)])
ma <- max(bm_vec[, (t_idx+1):(ncol(bm_vec)-4)])

ggplot(mbm_vec, aes(x = FuncName, y = time,
                    color = factor(FuncName, levels = Names_vec),
                    fill = factor(FuncName, levels = Names_vec))) +
       geom_violin(trim = TRUE, alpha = 0.5) +
       scale_color_manual(values = Color_vec) +
       scale_fill_manual(values = Color_vec) +
       geom_boxplot(width = 0.15, fill = "white", alpha = 0.5) +
       stat_summary(fun = mean, geom = "errorbar",
                    aes(ymax = ..y.., ymin = ..y..),
                    width = .35, linetype = "dashed") +
       coord_cartesian(ylim = c(mi, ma)) +
       labs(title = "Distribution of median benchmark times",
            subtitle = "Dashed lines represent mean benchmark times",
            x = "Method", y = "Time (microseconds)",
            color = "Method") +
       theme_bw() +
       theme(panel.border = element_blank(),
             plot.title = element_text(size = 23),
             plot.subtitle = element_text(size = 16),
             axis.text.x = element_text(size = 16, angle = 90,
                                        vjust = 0.5, hjust = 1),
             axis.text.y = element_text(size = 16),
             axis.title.x = element_text(size = 20),
             axis.title.y = element_text(size = 20),
             legend.position = "none")

Depending on your system, the results might differ from the ones shown here, but they should replicate the general pattern. The small-time methods (“fs_”) have long tails due to very few outliers, but their central tendencies are rather fast. The methods combining small-time and large-time (“fb_”) show an almost normal distribution with only a short tail; they also have the fastest central tendencies. Of the fddm methods, the large-time approach (“fl_”) is the slowest with a long tail. The function from the Rwiener package combines small-time and large-time approach and also shows an almost normal distribution, but its distribution of median benchmark times is shifted higher compared to those of fddm. The other two methods are clearly slower; the function provided by Gondan, Blurton, and Kesselmeier (2014) is purely in R so it is constrained by the speed of R’s vectorization. Similarly constrained by R, the function from the rtdists package performs quite a few computations in R that make it very user-friendly, but at the cost of having the slowest central tendencies.

To visualize the problem areas for each method, we plot that method’s median benchmark times as a function of each individual parameter value; this will allow us to see where in the parameter space each method is efficient and where it is inefficient. In each of the following plots, we will vary one model parameter and show the mean of the median benchmark times, the 10% quantiles and 90% quantiles of the median benchmark times, as well as the minimum and maximum median benchmark times. For each value of that parameter, we aggregate the benchmark data for all values of the other parameter to generate the statistics plotted. For example, in a plot illustrating the effect of varying \(v\), the statistics plotted for the parameter value \(v = 0\) will include the benchmark data for the full ranges of \(rt\), \(a\), \(w\), and \(sv\). Note that because we also vary the response time in these plots, all of the following plots show the results based on the individual response times (i.e., rt_benchmark_ind). As this benchmark function records the evaluation time of approximating the density for only one response time in contrast to a vector of response times, we expect the typical benchmark times produced by rt_benchmark_ind to be faster than that of rt_benchmark_vec.

To avoid too many repetitious plots, we will only include nine of the methods from the previous plot that illustrate the different variations and implementations of the density functions. We include: all three small-time implementations in dfddm (“fs_”), the one large-time implementation of the large-time in dfddm (“fl_”), the two implementations in dfddm that combine the small-time and large-time (“fb_”), and the three methods from R packages and the literature as described above.

The following code chunk loads the pre-run benchmark data and prepares it for the subsequent plots that will be generated by the later code chunks.

# load data, will be in the variable 'bm_ind'
load(system.file("extdata", "bm_ind.Rds", package = "fddm", mustWork = TRUE))
bm_ind$RTAA <- bm_ind$RT / bm_ind$A / bm_ind$A
bm_ind <- bm_ind[, c(1, 2, ncol(bm_ind), 3:(ncol(bm_ind)-1)) ]

t_idx <- match("SV", colnames(bm_ind))
bm_ind[,-seq_len(t_idx)] <- bm_ind[, -seq_len(t_idx)]/1000 # convert to microseconds
mbm_ind <- melt(bm_ind, measure.vars = -seq_len(t_idx),
                variable.name = "FuncName", value.name = "time")

Names_meq <- c("fs_SWSE_17", "fs_Gon_17", "fs_Nav_17",
               "fl_Nav_09", "fb_Gon_17", "fb_Nav_17",
               "RWiener", "Gondan", "rtdists")
Color_meq <- c("#9900cc", "#006699", "#336600",
               "#996633", "#c2a500", "#e68a00",
               "#ff9999", "#ff5050", "#990000")
mbm_meq <- subset(mbm_ind, FuncName %in% Names_meq)

We first investigate how the response times affect the performance of each method. When calculating the density, the main culprit of longer evaluation times is the approximation of the infinite sum. Since the response time, \(rt\), is always scaled by the threshold separation, \(a\), in the infinite sum, we combine these model parameters and instead examine the “effective response time”, defined as \(\frac{rt}{a^2}\). This code chunk will generate a plot showing how varying the “effective response time” can affect the median benchmark time of the approximations.

ggplot(mbm_meq, aes(x = RTAA, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_x_log10() +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = "The shaded regions represent the 10% and 90% quantiles;\nThe lighter shaded regions represent the min and max times",
       x = bquote(frac(rt, a^2) ~ ", effective response time, " ~ log[10]),
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5,5,15,5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

This plot confirms that the aptly named small-time and large-time density functions are generally more efficient at calculating the density for small response times and large response times, respectively. Of the three methods used from the literature, the RWiener and rtdists functions use a switching mechanism between the small-time and large-time density functions, and the algorithm from Gondan, Blurton, and Kesselmeier (2014) only uses the small-time density function. Importantly, the methods that utilize a switching mechanism between the small-time and large-time density functions have much flatter plots and generally perform well across the \(\frac{rt}{a^2}\) parameter space.

Similarly to the previous plot, this code chunk plots the effect on the median benchmark time; but this time we will instead vary the model parameter \(w\), the relative starting point.

ggplot(mbm_meq, aes(x = W, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = "The shaded regions represent the 10% and 90% quantiles;\nThe lighter shaded regions represent the min and max times",
       x = "w, relative starting point (a priori bias)",
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5,5,15,5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

The relative starting point \(w\) has very little impact on the infinite sum, and thus we see that the means and quantiles are relatively flat across the range of \(w\). However, we see a rather clear dip around 0.5 for the maximum times of many methods.

Here we plot the effects of varying the parameter \(v\), the drift rate, on the median benchmark time.

ggplot(mbm_meq, aes(x = V, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = "The shaded regions represent the 10% and 90% quantiles;\nThe lighter shaded regions represent the min and max times",
       x = "v, drift rate",
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5,5,15,5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

The drift rate \(v\) has nothing to do with the infinite sum, which is why we see overall flat means across the range of \(v\). The only pattern we see is a small bump for \(v = 0\) of the small-time methods.

Lastly we plot the effects of varying \(sv\), the inter-trial variability in the drift rate, on the median benchmark time.

ggplot(mbm_meq, aes(x = SV, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = "The shaded regions represent the 10% and 90% quantiles;\nThe lighter shaded regions represent the min and max times",
       x = "sv, inter-trial variability in the drift rate",
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5,5,15,5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

The inter-trial variability in the drift rate, \(sv\), affects the evaluation times similarly to its related parameter \(v\). Neither one of these parameters is integral to the approximation of the infinite sum, and that insignificance is shown by the flat means across the range of \(sv\).

Overall, these plots show the unsurprising conclusion that the small-time densities generally struggled for large “effective response times” and the large-time density is very inefficient for small “effective response times”. We see that the methods that combined the two density functions were noticeably more consistent across the entire range of “effective response times,” confirming that using a combination of both methods results in a very stable algorithm.

The other three figures vary \(w\), \(v\), and \(sv\), and do not reveal any meaningful patterns. Both \(v\) and \(sv\) have nothing to do with the infinite sum, and \(w\) only has a minimal role in the infinite sum; thus we do not expect any great contributions to slowdowns in evaluation times of either the small-time or large-time density functions.

Benchmarking Model Fitting to Real-World Data

Having shown the performance of each approximation to the Ratcliff DDM density functions, we now examine how these methods compare in practice. To determine how each method behaves in a practical setting, we will use each density approximation as the basis for fitting the DDM to real-world data. The primary analysis of the methods’ performance will come from benchmarking the time taken for the parameter estimation using the different methods. In addition, a secondary analysis will compare the performance of the different methods by using the number of function evaluations used by the optimizer during the optimization process.

The parameters that we will fit are: \(a\) (threshold separation), \(v\) (drift rate), \(t_0\) (non-decision time/response time constant), \(w\) (relative starting point), and \(sv\) (inter-trial-variability of drift). Since the example data we are using consists of two different item types that each require a different correct response (i.e., for one item type the correct response is mapped to the lower boundary and for the other item type the correct response is mapped to the upper boundary), the model includes two different versions of \(v\) (drift rate): \(v_\ell\) for fitting to the truthful lower boundary, and \(v_u\) for fitting to the truthful upper boundary. As many of the approximations in dfddm have different styles, we only test a subset of all the available methods in dfddm to avoid unnecessary testing. After using the various density function approximations in fitting the DDM to real-world data, we then validate that the produced parameter estimates are consistent across the various methods within a given error tolerance.

We will not test for accuracy or consistency in this vignette because that has already been covered in the Validity Vignette and the optimization is deterministic so it will return the same result given the same inital setup.

Generating Benchmark Data for Parameter Estimation

The following subsections will define all of the functions used to generate the fittings and store the benchmark results and provide the code to run the full fitting. Since running the full fitting for all of the individuals in the data takes a long time, we will forgo running the fitting in this vignette and instead read pre-fit parameter estimates that used the provided code.

To avoid too many repetitious plots, we will only include six of the methods from the previous section that illustrate the different variations and implementations of the density functions. We include: all three small-time implementations in dfddm, the two implementations in dfddm that combine the small-time and large-time, and the one method available in the rtdists package. We only use a selection of the methods available in dfddm to avoid redundancy in our testing since most of the methods have multiple sibling approximations that are extremely similar. Moreover, we do not require the RWiener package nor the Gondan raw R code because the associated density function approximations do not include an option for variability in drift rate. While we can convert the constant drift rate density to the variable drift rate density using a multiplicative factor, the densities are still potentially calculated incorrectly. For more details on the differences between the constant drift rate density functions and their variable drift rate counterparts, see the Math Vignette.

First we load the necessary packages.

library("fddm")
library("rtdists")
library("microbenchmark")

Log-Likelihood Functions

This code chunk defines the log-likelihood functions used in the optimization algorithm; all of the log-likelihood functions are the same as in the Validity Vignette. The log-likelihood functions are fairly straightforward and split the responses and associated response times by the true item status (i.e., the correct response) to enable fitting distinct drift rates (\(v_\ell\) for the items in which the correct response is the lower boundary and \(v_u\) for the items for which the correct response is the upper boundary). In addition, the log-likelihood functions heavily penalize any combination of parameters that returns a log-density of \(-\infty\) (equivalent to a regular density of \(0\)).

ll_fs_SWSE_17 <- function(pars, rt, resp, truth, err_tol) {
  rtu <- rt[truth == "upper"]
  rtl <- rt[truth == "lower"]
  respu <- resp[truth == "upper"]
  respl <- resp[truth == "lower"]

  # the truth is "upper" so use vu
  densu <- dfddm(rt = rtu, response = respu, a = pars[[3]], v = pars[[1]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "SWSE", summation_small = "2017",
                 scale = "small", err_tol = err_tol)
  # the truth is "lower" so use vl
  densl <- dfddm(rt = rtl, response = respl, a = pars[[3]], v = pars[[2]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "SWSE", summation_small = "2017",
                 scale = "small", err_tol = err_tol)

  densities <- c(densu, densl)
  if (any(!is.finite(densities))) return(1e6)
  return(-sum(densities))
}

ll_fs_Gon_17 <- function(pars, rt, resp, truth, err_tol) {
  rtu <- rt[truth == "upper"]
  rtl <- rt[truth == "lower"]
  respu <- resp[truth == "upper"]
  respl <- resp[truth == "lower"]

  # the truth is "upper" so use vu
  densu <- dfddm(rt = rtu, response = respu, a = pars[[3]], v = pars[[1]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Gondan", summation_small = "2017",
                 scale = "small", err_tol = err_tol)
  # the truth is "lower" so use vl
  densl <- dfddm(rt = rtl, response = respl, a = pars[[3]], v = pars[[2]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Gondan", summation_small = "2017",
                 scale = "small", err_tol = err_tol)

  densities <- c(densu, densl)
  if (any(!is.finite(densities))) return(1e6)
  return(-sum(densities))
}

ll_fs_Nav_17 <- function(pars, rt, resp, truth, err_tol) {
  rtu <- rt[truth == "upper"]
  rtl <- rt[truth == "lower"]
  respu <- resp[truth == "upper"]
  respl <- resp[truth == "lower"]

  # the truth is "upper" so use vu
  densu <- dfddm(rt = rtu, response = respu, a = pars[[3]], v = pars[[1]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Navarro", summation_small = "2017",
                 scale = "small", err_tol = err_tol)
  # the truth is "lower" so use vl
  densl <- dfddm(rt = rtl, response = respl, a = pars[[3]], v = pars[[2]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Navarro", summation_small = "2017",
                 scale = "small", err_tol = err_tol)

  densities <- c(densu, densl)
  if (any(!is.finite(densities))) return(1e6)
  return(-sum(densities))
}

ll_fb_Gon_17 <- function(pars, rt, resp, truth, err_tol) {
  rtu <- rt[truth == "upper"]
  rtl <- rt[truth == "lower"]
  respu <- resp[truth == "upper"]
  respl <- resp[truth == "lower"]

  # the truth is "upper" so use vu
  densu <- dfddm(rt = rtu, response = respu, a = pars[[3]], v = pars[[1]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Gondan", summation_small = "2017",
                 scale = "both", err_tol = err_tol)
  # the truth is "lower" so use vl
  densl <- dfddm(rt = rtl, response = respl, a = pars[[3]], v = pars[[2]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Gondan", summation_small = "2017",
                 scale = "both", err_tol = err_tol)

  densities <- c(densu, densl)
  if (any(!is.finite(densities))) return(1e6)
  return(-sum(densities))
}

ll_fb_Nav_17 <- function(pars, rt, resp, truth, err_tol) {
  rtu <- rt[truth == "upper"]
  rtl <- rt[truth == "lower"]
  respu <- resp[truth == "upper"]
  respl <- resp[truth == "lower"]

  # the truth is "upper" so use vu
  densu <- dfddm(rt = rtu, response = respu, a = pars[[3]], v = pars[[1]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Navarro", summation_small = "2017",
                 scale = "both", err_tol = err_tol)
  # the truth is "lower" so use vl
  densl <- dfddm(rt = rtl, response = respl, a = pars[[3]], v = pars[[2]],
                 t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                 n_terms_small = "Navarro", summation_small = "2017",
                 scale = "both", err_tol = err_tol)

  densities <- c(densu, densl)
  if (any(!is.finite(densities))) return(1e6)
  return(-sum(densities))
}

ll_RTDists <- function(pars, rt, resp, truth) {
  rtu <- rt[truth == "upper"]
  rtl <- rt[truth == "lower"]
  respu <- resp[truth == "upper"]
  respl <- resp[truth == "lower"]

  # the truth is "upper" so use vu
  densu <- ddiffusion(rtu, respu, a = pars[[3]], v = pars[[1]],
                      z = pars[[5]]*pars[[3]], t0 = pars[[4]], sv = pars[[6]])
  # the truth is "lower" so use vl
  densl <- ddiffusion(rtl, respl, a = pars[[3]], v = pars[[2]],
                      z = pars[[5]]*pars[[3]], t0 = pars[[4]], sv = pars[[6]])

  densities <- c(densu, densl)
  if (any(densities <= 0)) return(1e6)
  return(-sum(log(densities)))
}

Fitting Function

We use a different set of initial parameter values for each combination of data set and method. In a real-life situation we would probably do so using random initial parameter values to avoid local minima. Here we do this with a fixed set of initial parameter values to ensure that the different method produce the same results for different starting points. However, there are a couple of restrictions to these initial values that we must address (we would also have to do so if we picked the initial values randomly). The parameter \(t_0\) must be less than the response time, so we set the initial values for \(t_0\) to be strictly less than the minimum response time according to each individual in the input data.frame.

Furthermore, we must place a lower bound on \(a\) because the optimization algorithm occasionally evaluates the log-likelihood functions (and thus the underlying density function approximations) using values of \(a\) that are very close to zero in an effort to fully explore the parameter space. In these cases, the small-time approximations struggle because \(a\) inversely scales the input response time and effectively changes the input response time into a very large one. In particular, an extremely small value of \(a\) can cause the “SWSE” small-time density approximation to run for a very long time; to prevent this, we bound \(a\) below by \(0.05\) as it is a very small value such that anything less would be of no psychological significance to the model. In common use this is not an issue because very small values of \(a\) do not make any sense with regard to the psychological interpretation of the parameter, but this issue can arise in an exploratory optimization environment.

The following code chunk defines the function that will run the optimization and produce the fitted parameter estimates for \(v_u\), \(v_\ell\), \(a\), \(t_0\), \(w\), and \(sv\). As discussed above, we only use a selection of methods from those available in dfddm. This fitting function will run the optimization for each set of initial parameter values and store the following results from the optimization: 1) resulting convergence code (either \(0\) indicating successful convergence or \(1\) indicating unsuccessful convergence); 2) minimized value of the objective log-likelihood function; 3) the number of iterations used by the optimizer during the optimization process; 4) the number of calls to the log-likelihood function during the optimization process; and 5) the median benchmark time of the optimization using each method.

As in the previous section, we run the microbenchmark function multiple times for each approximation at each point in this parameter space and save the median of these benchmark times. However, we only use a smaller number of repetitions here because a full optimization run entails several calls to the density function. Using the median of these data should reduce the number of outliers in the data while maintaining a truthful representation of the approximations’ performance. We will be using the microbenchmark package to time the evaluations.

rt_fit <- function(data, id_idx = NULL, rt_idx = NULL, response_idx = NULL,
                   truth_idx = NULL, response_upper = NULL, err_tol = 1e-6,
                   times = 100, unit = "ns") {

  # Format data for fitting
  if (all(is.null(id_idx), is.null(rt_idx), is.null(response_idx),
      is.null(truth_idx), is.null(response_upper))) {
    df <- data # assume input data is already formatted
  } else {
    if(any(data[,rt_idx] < 0)) {
      stop("Input data contains negative response times; fit will not be run.")
    }
    if(any(is.na(data[,response_idx]))) {
      stop("Input data contains invalid responses (NA); fit will not be run.")
    }

    nr <- nrow(data)
    df <- data.frame(id = character(nr),
                     rt = double(nr),
                     response = character(nr),
                     truth = character(nr),
                     stringsAsFactors = FALSE)

    if (!is.null(id_idx)) { # relabel identification tags
      for (i in 1:length(id_idx)) {
        idi <- unique(data[,id_idx[i]])
        for (j in 1:length(idi)) {
          df$id[data[,id_idx[i]] == idi[j]] <- paste(df$id[data[,id_idx[i]] == idi[j]], idi[j], sep = " ")
        }
      }
      df$id <- trimws(df$id, which = "left")
    }

    df$rt <- as.double(data[,rt_idx])

    df$response <- "lower"
    df$response[data[,response_idx] == response_upper] <- "upper"

    df$truth <- "lower"
    df$truth[data[,truth_idx] == response_upper] <- "upper"
  }

  # Preliminaries
  ids <- unique(df$id)
  nids <- max(length(ids), 1) # if inds is null, there is only one individual

  init_vals <- data.frame(v1 = c( 0,  10, -.5,  0,  0,  0,  0,  0,  0,   0,  0),
                          v0 = c( 0, -10,  .5,  0,  0,  0,  0,  0,  0,   0,  0),
                          a  = c( 1,   1,   1, .5,  5,  1,  1,  1,  1,   1,  1),
                          t0 = c( 0,   0,   0,  0,  0,  0,  0,  0,  0,   0,  0),
                          w  = c(.5,  .5,  .5, .5, .5, .5, .5, .2, .8,  .5, .5),
                          sv = c( 1,   1,   1,  1,  1,  1,  1,  1,  1, .05,  5))
  ninit_vals <- nrow(init_vals)

  algo_names <- c("fs_SWSE_17", "fs_Gon_17", "fs_Nav_17",
                  "fb_Gon_17", "fb_Nav_17","rtdists")
  nalgos <- length(algo_names)
  ni <- nalgos*ninit_vals

  # Initilize the result dataframe
  cnames <- c("ID", "Algorithm", "Convergence", "Objective", "Iterations",
              "FuncEvals", "BmTime")
  res <- data.frame(matrix(ncol = length(cnames), nrow = nids*ninit_vals*nalgos))
  colnames(res) <- cnames

  # label the result dataframe
  algo_col <- c()
  for (n in 1:nalgos) {
    algo_col <- c(algo_col, rep(algo_names[n], ninit_vals))
  }
  res$Algorithm <- algo_col # label algorithms

  # Loop through each individual
  for (i in 1:nids) {
    # extract data for id i
    dfi <- df[df$id == ids[i],]
    rti <- dfi$rt
    respi <- dfi$response
    truthi <- dfi$truth

    # starting value for t0 must be smaller than the smallest rt
    min_rti <- min(rti)
    t0_lo <- 0.01*min_rti
    t0_me <- 0.50*min_rti
    t0_hi <- 0.99*min_rti

    # label the result dataframe
    res$ID[((i-1)*ni+1):(i*ni)] <- ids[i] # label individuals

    # loop through all of the starting values
    for (j in 1:ninit_vals) {
      # get number of evaluations
      temp <- nlminb(init_vals[j,], ll_fs_SWSE_17, err_tol = err_tol,
                     rt = rti, resp = respi, truth = truthi,
                     # limits:   vu,   vl,   a,  t0, w,  sv
                     lower = c(-Inf, -Inf, .05,   0, 0,   0),
                     upper = c( Inf,  Inf, Inf, Inf, 1, Inf))
      res$Convergence[(i-1)*ni+0*ninit_vals+j] <- temp$convergence
      res$Objective[(i-1)*ni+0*ninit_vals+j] <- temp$objective
      res$Iterations[(i-1)*ni+0*ninit_vals+j] <- temp$iterations
      res$FuncEvals[(i-1)*ni+0*ninit_vals+j] <- temp$evaluations[[1]]

      temp <- nlminb(init_vals[j,], ll_fs_Gon_17, err_tol = err_tol,
                     rt = rti, resp = respi, truth = truthi,
                     # limits:   vu,   vl,   a,  t0, w,  sv
                     lower = c(-Inf, -Inf, .05,   0, 0,   0),
                     upper = c( Inf,  Inf, Inf, Inf, 1, Inf))
      res$Convergence[(i-1)*ni+1*ninit_vals+j] <- temp$convergence
      res$Objective[(i-1)*ni+1*ninit_vals+j] <- temp$objective
      res$Iterations[(i-1)*ni+1*ninit_vals+j] <- temp$iterations
      res$FuncEvals[(i-1)*ni+1*ninit_vals+j] <- temp$evaluations[[1]]

      temp <- nlminb(init_vals[j,], ll_fs_Nav_17, err_tol = err_tol,
                     rt = rti, resp = respi, truth = truthi,
                     # limits:   vu,   vl,   a,  t0, w,  sv
                     lower = c(-Inf, -Inf, .05,   0, 0,   0),
                     upper = c( Inf,  Inf, Inf, Inf, 1, Inf))
      res$Convergence[(i-1)*ni+2*ninit_vals+j] <- temp$convergence
      res$Objective[(i-1)*ni+2*ninit_vals+j] <- temp$objective
      res$Iterations[(i-1)*ni+2*ninit_vals+j] <- temp$iterations
      res$FuncEvals[(i-1)*ni+2*ninit_vals+j] <- temp$evaluations[[1]]

      temp <- nlminb(init_vals[j,], ll_fb_Gon_17, err_tol = err_tol,
                     rt = rti, resp = respi, truth = truthi,
                     # limits:   vu,   vl,   a,  t0, w,  sv
                     lower = c(-Inf, -Inf, .05,   0, 0,   0),
                     upper = c( Inf,  Inf, Inf, Inf, 1, Inf))
      res$Convergence[(i-1)*ni+3*ninit_vals+j] <- temp$convergence
      res$Objective[(i-1)*ni+3*ninit_vals+j] <- temp$objective
      res$Iterations[(i-1)*ni+3*ninit_vals+j] <- temp$iterations
      res$FuncEvals[(i-1)*ni+3*ninit_vals+j] <- temp$evaluations[[1]]

      temp <- nlminb(init_vals[j,], ll_fb_Nav_17, err_tol = err_tol,
                     rt = rti, resp = respi, truth = truthi,
                     # limits:   vu,   vl,   a,  t0, w,  sv
                     lower = c(-Inf, -Inf, .05,   0, 0,   0),
                     upper = c( Inf,  Inf, Inf, Inf, 1, Inf))
      res$Convergence[(i-1)*ni+4*ninit_vals+j] <- temp$convergence
      res$Objective[(i-1)*ni+4*ninit_vals+j] <- temp$objective
      res$Iterations[(i-1)*ni+4*ninit_vals+j] <- temp$iterations
      res$FuncEvals[(i-1)*ni+4*ninit_vals+j] <- temp$evaluations[[1]]

      temp <- nlminb(init_vals[j,], ll_RTDists,
                     rt = rti, resp = respi, truth = truthi,
                     # limits:   vu,   vl,   a,  t0, w,  sv
                     lower = c(-Inf, -Inf, .05,   0, 0,   0),
                     upper = c( Inf,  Inf, Inf, Inf, 1, Inf))
      res$Convergence[(i-1)*ni+5*ninit_vals+j] <- temp$convergence
      res$Objective[(i-1)*ni+5*ninit_vals+j] <- temp$objective
      res$Iterations[(i-1)*ni+5*ninit_vals+j] <- temp$iterations
      res$FuncEvals[(i-1)*ni+5*ninit_vals+j] <- temp$evaluations[[1]]

      # microbenchmark
      mbm <- microbenchmark(
        fs_SWSE_17 = nlminb(init_vals[j,], ll_fs_SWSE_17, err_tol = err_tol,
                            rt = rti, resp = respi, truth = truthi,
                            # limits:   vu,   vl,   a,  t0, w,  sv
                            lower = c(-Inf, -Inf, .05,   0, 0,   0),
                            upper = c( Inf,  Inf, Inf, Inf, 1, Inf)),
        fs_Gon_17 = nlminb(init_vals[j,], ll_fs_Gon_17, err_tol = err_tol,
                           rt = rti, resp = respi, truth = truthi,
                           # limits:   vu,   vl,   a,  t0, w,  sv
                           lower = c(-Inf, -Inf, .05,   0, 0,   0),
                           upper = c( Inf,  Inf, Inf, Inf, 1, Inf)),
        fs_Nav_17 = nlminb(init_vals[j,], ll_fs_Nav_17, err_tol = err_tol,
                           rt = rti, resp = respi, truth = truthi,
                           # limits:   vu,   vl,   a,  t0, w,  sv
                           lower = c(-Inf, -Inf, .05,   0, 0,   0),
                           upper = c( Inf,  Inf, Inf, Inf, 1, Inf)),
        fb_Gon_17 = nlminb(init_vals[j,], ll_fb_Gon_17, err_tol = err_tol,
                           rt = rti, resp = respi, truth = truthi,
                           # limits:   vu,   vl,   a,  t0, w,  sv
                           lower = c(-Inf, -Inf, .05,   0, 0,   0),
                           upper = c( Inf,  Inf, Inf, Inf, 1, Inf)),
        fb_Nav_17 = nlminb(init_vals[j,], ll_fs_Nav_17, err_tol = err_tol,
                           rt = rti, resp = respi, truth = truthi,
                           # limits:   vu,   vl,   a,  t0, w,  sv
                           lower = c(-Inf, -Inf, .05,   0, 0,   0),
                           upper = c( Inf,  Inf, Inf, Inf, 1, Inf)),
        rtdists = nlminb(init_vals[j,], ll_RTDists,
                         rt = rti, resp = respi, truth = truthi,
                         # limits:   vu,   vl,   a,  t0, w,  sv
                         lower = c(-Inf, -Inf, .05,   0, 0,   0),
                         upper = c( Inf,  Inf, Inf, Inf, 1, Inf)),
        times = times, unit = unit
      )
      for (k in 1:nalgos) {
        res$BmTime[(i-1)*ni+(k-1)*ninit_vals+j] <- median(mbm[mbm$expr == algo_names[k], 2])
      }
    }
  }
  return(res)
}

Running the Fitting

As an example data set, we use the med_dec data that comes with fddm. This data contains the accuracy condition reported in Trueblood et al. (2018) investigating medical decision making among medical professionals (pathologists) and novices (i.e., undergraduate students). The task of participants was to judge whether pictures of blood cells show cancerous cells (i.e., blast cells) or non-cancerous cells (i.e., non-blast cells). The data set contains 200 decision per participant, based on pictures of 100 true cancerous cells and pictures of 100 true non-cancerous cells. We remove the trials without response (indicated by rt < 0 in the data) before fitting.

Having set up the fitting functions in the above chunks of code, we could pass the med_dec data to this function to get the parameter estimates. However, as this takes a relatively long time (around 30 minutes) we skip the fitting in this vignette and instead read the pre-fit parameter estimates in the next section.

data(med_dec, package = "fddm")
med_dec <- med_dec[which(med_dec$rt >= 0),]
fit <- rt_fit(med_dec, id_idx = c(2,1), rt_idx = 8, response_idx = 7,
              truth_idx = 5, response_upper = "blast", err_tol = 1e-6,
              times = 6, unit = "ns")

Analysis of Benchmark Results

We begin by loading the required packages for plotting the results.

library("reshape2")
library("ggplot2")

Before visualizing the results of the data fitting benchmark tests, we need to provide some context for each set of parameter estimates since each run of the optimization does not necessarily converge. To add this context, we first find the minimum value of the objective function (i.e., the log-likelihood function) attained for each set of initial parameter values for each individual. Then we calculate how the objective value of each optimization run compares to that minimum; we will refer to this quantity as the objective difference. The following code chunk defines a function to perform this calculation and some other minor operations that will be helpful when plotting the results. This code chunk also applies this prep function to the pre-run benchmark data and defines some constants that we will use throughout the visualizations.

fit_prep <- function(fit) {
  nr <- nrow(fit)
  fit$Obj_diff <- rep(0, nr)

  ids <- unique(fit$ID)
  nids <- length(ids)
  algos <- unique(fit$Algorithm)
  nalgos <- length(algos)

  ninit <- nrow(subset(fit, ID == ids[1] & Algorithm == algos[1]))
  for (i in 1:nids) {
    for (j in 1:ninit) {
      min_obj <- min(fit[seq((i-1)*ninit*nalgos+j, i*ninit*nalgos, by = ninit), 4])
      for (k in 0:(nalgos-1)) {
        fit[(i-1)*(ninit*nalgos) + k*ninit + j, 8] <-
          fit[(i-1)*(ninit*nalgos) + k*ninit + j, 4] - min_obj

      }
    }
  }

  fit$BmTime <- fit$BmTime/1e6 # convert to milliseconds

  eps <- 1e-4
  
  fit[fit$Obj_diff < eps, "Obj_diff"] <- 0
  fit[(fit$Obj_diff >= eps & fit$Obj_diff <= 2), "Obj_diff"] <- 1
  fit[fit$Obj_diff > 2, "Obj_diff"] <- 3
  
  fit$Convergence <- ifelse(fit$Convergence < 1, 0, 1)

  return(fit)
}

# load data, will be in the variable 'fit'
load(system.file("extdata", "bm_fit.Rds", package = "fddm", mustWork = TRUE))
fit <- fit_prep(fit)

Names <- c("fs_SWSE_17", "fs_Gon_17", "fs_Nav_17",
           "fb_Gon_17", "fb_Nav_17", "rtdists")
Color <- c("#9900cc", "#006699", "#336600",
           "#c2a500", "#e68a00", "#990000")
Shape <- c(21, 25)
Sizes <- c(0, 3, 3)
Stroke <- c(0, 1, 1)
Fills <- c("#ffffff00", "#ffffff00", "#80808099")

We will now present two plots to illustrate how the various methods perform when used in fitting Ratcliff’s DDM to real-world data. The measures of interest for each method are the median microbenchmark timings and the number of function evaluations in the optimization process. Each one of the two plots will focus on one of these measures by showing a distribution of values of this measure in terms of violin plots and box plots. The distribution shown is the distribution across participants and starting values for each of the different methods of approximating the diffusion model; that is, for each of the 55 participants, we present results based on the \(11\) different starting values for each method (the starting values are constant across methods).

To help explain problematic fits, the plots will show any data point that represents an optimization run for which the objective difference is greater than our predefined tolerance of \(\epsilon = 0.0001\). More specifically, the objective difference is comparatively small (i.e., smaller than 2 on the log-likelihood scale)for the points shown with a completely transparent fill color; contrastingly, points shown with a gray fill color have a difference from the optimum of larger than 2 points on the log-likelihood scale (2 points on the log-likelihood scale corresponds to 1 point on the deviance scale). Data points with objective difference smaller than \(\epsilon\) will not be shown on the plot. In addition, each of these visible data points will have a shape to indicate its convergence code: a round point for successful convergence, and a triangle for failed convergence.

The first plot will show the densities of the microbenchmark timings for each method. Note that we have zoomed in and have removed one outlier time from the “fs_SWSE_17” method which takes around 330 milliseconds to finish and did not converge successfully on the maximum-likelihood estimate.

fit_mbm <- melt(fit, id.vars = c("Algorithm", "Convergence", "Obj_diff"),
                measure.vars = "BmTime", value.name = "BmTime")[,-4]
ma <- max(fit_mbm[fit_mbm$Algorithm != "fs_SWSE_17", "BmTime"]) # weird point

ggplot(fit_mbm, aes(x = factor(Algorithm, levels = Names),
                    y = BmTime)) +
  geom_violin(trim = TRUE, alpha = 0.5,
              aes(color = factor(Algorithm, levels = Names),
                  fill = factor(Algorithm, levels = Names))) +
  geom_boxplot(width = 0.2, outlier.shape = NA,
               fill = "white", alpha = 0.4,
               aes(color = factor(Algorithm, levels = Names))) +
  stat_summary(fun = mean, geom = "errorbar",
               aes(ymax = ..y.., ymin = ..y..),
               width = .5, linetype = "dashed",
               color = Color) +
  scale_color_manual(values = Color, guide = FALSE) +
  scale_shape_manual(values = Shape,
                     name = "Convergence Code",
                     breaks = c(0, 1),
                     labels = c("Success", "Failure")) +
  scale_size_manual(values = Sizes, guide = FALSE) +
  scale_discrete_manual(aesthetics = "stroke", values = Stroke, guide = FALSE) +
  scale_fill_manual(values = Color, guide = FALSE) +
  ggnewscale::new_scale_fill() +
  scale_fill_manual(values = Fills,
                    name = "Objective Difference",
                    breaks = c(1, 2, 3),
                    labels = c("< 2", "NA", "> 2")) +
  geom_point(aes(color = factor(Algorithm, levels = Names),
                 shape = factor(Convergence, levels = c(0, 1)),
                 size = factor(Obj_diff, levels = c(0, 1, 3)),
                 stroke = factor(Obj_diff, levels = c(0, 1, 3)),
                 fill = factor(Obj_diff, levels = c(0, 1, 3)))) +
  coord_cartesian(ylim = c(0, ma)) +
  labs(title = "Median microbenchmark times for data fitting",
       subtitle = paste("Dashed lines represent mean benchmark times;",
                        "Visible points have an objective difference greater than 1e-4",
                        sep = "\n"),
       x = "Method", y = "Time (milliseconds)") +
  guides(shape = guide_legend(order = 1,
                              override.aes = list(size = Sizes[c(2, 3)])),
         fill = guide_legend(order = 2,
                             override.aes = list(size = Sizes[c(2, 3)],
                                                 shape = c(21, 21),
                                                 fill = Fills[c(2, 3)]))) +
  theme_bw() +
  theme(panel.border = element_blank(),
        plot.title = element_text(size = 23),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 10, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20,
                                    margin = margin(5, 10, 5, 5, "pt")),
        legend.position = "right",
        legend.box = "vertical",
        legend.direction = "vertical",
        legend.background = element_rect(fill = "transparent"),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 13))

This plot is something of an exaggerated version of the microbenchmark violin plot from the previous section. Immediately we observe that fitting using the rtdists function is noticeably slower than fitting using the functions provided in fddm. Upon closer inspection, the means and medians of the benchmark times are shorter for the two small-time algorithms “fs_SWSE_17” and “fs_Gon_17” than for the other small-time algorithm or either of the algorithms that use both the small-time and large-time density functions. However, these two seemingly faster algorithms not only have much longer tails in their fitting times but also have more problematic data points in the optimization process. These longer tails and misfit points are likely due to the shortcomings of the approximation to the small-time density function, namely the poor support for large “effective response times” that we discussed in a previous section of this vignette. We also see that the two combined algorithms “fb_” both only show one case where the algorithm did not reach the optimum for one participant and starting value, but are still quite fast. One fitting run takes on average less than 50 milliseconds on the system used for running the tests.

The second plot will show the distribution of the number of function evaluations in the optimization process for each method.

fit_fev <- melt(fit, id.vars = c("Algorithm", "Convergence", "Obj_diff"),
                measure.vars = "FuncEvals", value.name = "FuncEvals")[,-4]

ggplot(fit_fev, aes(x = factor(Algorithm, levels = Names),
                    y = FuncEvals)) +
  geom_violin(trim = TRUE, alpha = 0.5,
              aes(color = factor(Algorithm, levels = Names),
                  fill = factor(Algorithm, levels = Names))) +
  geom_boxplot(width = 0.2, outlier.shape = NA,
               fill = "white", alpha = 0.4,
               aes(color = factor(Algorithm, levels = Names))) +
  stat_summary(fun = mean, geom = "errorbar",
               aes(ymax = ..y.., ymin = ..y..),
               width = .5, linetype = "dashed",
               color = Color) +
  scale_color_manual(values = Color, guide = FALSE) +
  scale_shape_manual(values = Shape,
                     name = "Convergence Code",
                     breaks = c(0, 1),
                     labels = c("Success", "Failure")) +
  scale_size_manual(values = Sizes, guide = FALSE) +
  scale_discrete_manual(aesthetics = "stroke", values = Stroke, guide = FALSE) +
  scale_fill_manual(values = Color, guide = FALSE) +
  ggnewscale::new_scale_fill() +
  scale_fill_manual(values = Fills,
                    name = "Objective Difference",
                    breaks = c(1, 2, 3),
                    labels = c("< 2", "NA", "> 2")) +
  geom_point(aes(color = factor(Algorithm, levels = Names),
                 shape = factor(Convergence, levels = c(0, 1)),
                 size = factor(Obj_diff, levels = c(0, 1, 3)),
                 stroke = factor(Obj_diff, levels = c(0, 1, 3)),
                 fill = factor(Obj_diff, levels = c(0, 1, 3)))) +
  coord_cartesian(ylim = c(0, ma)) +
  labs(title = "Function evaluations for data fitting",
       subtitle = paste("Dashed lines represent mean benchmark times;",
                        "Visible points have an objective difference greater than 1e-4",
                        sep = "\n"),
       x = "Method", y = "Number of function evaluations") +
  guides(shape = guide_legend(order = 1,
                              override.aes = list(size = Sizes[c(2, 3)])),
         fill = guide_legend(order = 2,
                             override.aes = list(size = Sizes[c(2, 3)],
                                                 shape = c(21, 21),
                                                 fill = Fills[c(2, 3)]))) +
  theme_bw() +
  theme(panel.border = element_blank(),
        plot.title = element_text(size = 23),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 10, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20,
                                    margin = margin(5, 10, 5, 5, "pt")),
        legend.position = "right",
        legend.box = "vertical",
        legend.direction = "vertical",
        legend.background = element_rect(fill = "transparent"),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 13))

Here we see that the algorithms that implement both the small-time and large-time density functions appear to be the most stable. They have relatively short tails, suggesting that the optimization routine has little trouble in finding the optimum parameter estimates. Moreover, these algorithms have few data points where the objective difference is high or the convergence code indicates unsuccessful convergence.

Overall the default method for fddm, “fb_Gon_17”, appears to be the most generally stable method for fitting the DDM to real-world data. The optimization routine handles this algorithm consistently well in that there are no troublesome areas of the parameter space as there are when using the algorithms that only implement the small-time density function. The default method also outperforms its counterpart, “fb_Nav_17”, because it is less reliant on the rather unstable approximation to the large-time density function. When comparing the algorithms for benchmark time, the two small-time algorithms have a faster mean and median than the default method, but their lengthy tails indicate potential stability issues in certain areas of the parameter space. On balance, the default method, “fb_Gon_17”, provides the best compromise between speed and stability of the currently available algorithms for approximating the probability density functions of the diffusion decision model. For more information about the default method, see its section in the Math Vignette.

References

Gondan, Matthias, Steven P Blurton, and Miriam Kesselmeier. 2014. “Even Faster and Even More Accurate First-Passage Time Densities and Distributions for the Wiener Diffusion Model.” Journal of Mathematical Psychology 60: 20–22.

Trueblood, Jennifer S., William R. Holmes, Adam C. Seegmiller, Jonathan Douds, Margaret Compton, Eszter Szentirmai, Megan Woodruff, Wenrui Huang, Charles Stratton, and Quentin Eichbaum. 2018. “The Impact of Speed and Bias on the Cognitive Processes of Experts and Novices in Medical Image Decision-Making.” Cognitive Research: Principles and Implications 3 (1): 28. https://doi.org/10.1186/s41235-018-0119-2.

Benchmark Testing

Kendal Foster and Henrik Singmann

July 08, 2020

Introduction

Benchmarking the Density Functions

Generating Benchmark Data

Analysis of Benchmark Results

Benchmarking Model Fitting to Real-World Data

Generating Benchmark Data for Parameter Estimation

Log-Likelihood Functions

Fitting Function

Running the Fitting

Analysis of Benchmark Results

R Session Info

References