jrt
This package provides user-friendly functions designed for the easy implementation of Item-Response Theory (IRT) models and scoring with judgment data. Although it can be used in a variety of contexts, the original motivation for implementation is to facilitate use for creativity researchers.
jrt
is not an estimation package, it provides wrapper functions that call estimation packages and extract/report/plot information from them. At this stage, jrt
uses the (excellent) package mirt
(Chalmers, 2012) as its only IRT engine. Thus, if you use jrt
for your research, please ensure to cite mirt
(https://www.jstatsoft.org/article/view/v048i06) as the estimation package/engine.
We also encourage that you cite jrt
– especially if you use the plots or the automatic model selection. Currently, this would be done with:
Ok now let’s get started…
Then, a judgment data.frame
would be provided to the function jrt
. Here we’ll use the simulated one in jrt::ratings
.
data <- jrt::ratings
It looks like this:
head(data)
#> Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6
#> 1 5 4 3 4 4 4
#> 2 3 3 2 3 2 2
#> 3 3 3 3 3 3 2
#> 4 3 2 2 3 4 2
#> 5 2 3 1 2 2 1
#> 6 3 2 2 3 2 1
jrt
is in development and these features will hopefully appear soon (check back !), but in this release:
I know, that’s a lot that you can’t do…but this covers the typical cases, at least for the Consensual Assessment Technique – which is why it was originally created.
jrt()
You will first want to first load the library.
library(jrt)
#> Loading required package: directlabels
The main function of the jrt
package is jrt()
. By default, this function will:
@factor.scores
(or @output.data
) slot of the jrt
object.Let’s do it!
fit
) to do more after. Note: There’s a progress bar by default, but it takes space in the vignette, so I’ll remove it here with progress.bar = F
.fit <- jrt(data, progress.bar = F)
#> The possible responses detected are: 1-2-3-4-5
#>
#> -== Model Selection (6 judges) ==-
#> AICc for Rating Scale Model: 4414.924 | Model weight: 0.000
#> AICc for Generalized Rating Scale Model: 4370.699 | Model weight: 0.000
#> AICc for Partial Credit Model: 4027.701 | Model weight: 0.000
#> AICc for Generalized Partial Credit Model: 4021.567 | Model weight: 0.000
#> AICc for Constrained Graded Rating Scale Model: 4400.553 | Model weight: 0.000
#> AICc for Graded Rating Scale Model: 4310.307 | Model weight: 0.000
#> AICc for Constrained Graded Response Model: 4003.993 | Model weight: 0.859
#> AICc for Graded Response Model: 4007.604 | Model weight: 0.141
#> -> The best fitting model is the Constrained Graded Response Model.
#>
#> -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#>
#> -== IRT Summary ==-
#> - Model: Constrained (equal slopes) Graded Response Model (Samejima, 1969) | doi: 10.1007/BF03372160
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Method of factor scoring: Expected A Posteriori (EAP)
#> - AIC = 3999.249 | AICc = 4003.993 | BIC = 4091.843 | SABIC = 3999.249
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .893
#> - Expected reliability | Assumes a Normal(0,1) prior density: .894
Of course there’s more available here than one would report. If using IRT scoring (which is the main purpose of this package), we recommend reporting what IRT model was selected, along with IRT indices primarily, since the scoring is based on the estimation of the \(\theta\) abilities. In this case typically what is reported in the empirical reliability (here 0.893), which is the estimate of the reliability of the observations in the sample. It can be interpreted similarily as other more traditionnal indices of reliability (like Cronbach’s \(\alpha\)).
fit <- jrt(data, silent = T)
One may of course select a model based on assumptions on the data rather than on model fit comparisons. This is done through using the name of a model as an imput of the argument irt.model
of the jrt()
function. This bypasses the automatic model selection stage.
fit <- jrt(data, "PCM")
#> The possible responses detected are: 1-2-3-4-5
#>
#> -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#>
#> -== IRT Summary ==-
#> - Model: Partial Credit Model (Masters, 1982) | doi: 10.1007/BF02296272
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Method of factor scoring: Expected A Posteriori (EAP)
#> - AIC = 4022.957 | AICc = 4027.701 | BIC = 4115.551 | SABIC = 4022.957
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .889
#> - Expected reliability | Assumes a Normal(0,1) prior density: .759
See the documentation for a list of available models. Most models are directly those of mirt
. Others are versions of the Graded Response Model or Generalized Partial Credit Model that are constrained in various ways (equal discriminations and/or equal category structures) through the mirt.model()
function of mirt
.
Note that they can also be called by their full names (e.g. jrt(data, "Graded Response Model")
).
@factor.scores
.head(fit@factor.scores)
#> Judgments.Factor.Score Judgments.Standard.Error Judgments.Mean.Score
#> 1 1.7072898 0.5824312 4.000000
#> 2 -0.7214498 0.5581569 2.500000
#> 3 -0.1529127 0.5119362 2.833333
#> 4 -0.4247967 0.5319672 2.666667
#> 5 -2.2557524 0.6720093 1.833333
#> 6 -1.4155774 0.6202465 2.166667
Note : If you want a more complete output with the original data, use @output.data
. If there were missing data, @output.data
also appends imputed data.
head(fit@output.data)
#> Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6 Judgments.Factor.Score
#> 1 5 4 3 4 4 4 1.7072898
#> 2 3 3 2 3 2 2 -0.7214498
#> 3 3 3 3 3 3 2 -0.1529127
#> 4 3 2 2 3 4 2 -0.4247967
#> 5 2 3 1 2 2 1 -2.2557524
#> 6 3 2 2 3 2 1 -1.4155774
#> Judgments.Standard.Error Judgments.Mean.Score
#> 1 0.5824312 4.000000
#> 2 0.5581569 2.500000
#> 3 0.5119362 2.833333
#> 4 0.5319672 2.666667
#> 5 0.6720093 1.833333
#> 6 0.6202465 2.166667
Judge characteristics can be inspected with Judge Category Curve (JCC) plots. They are computed with the function jcc.plot()
.
A basic example for Judge 3…
jcc.plot(fit, judge = 3)
Now of course, there are many options, but a few things that you could try:
judge = "all"
or simply removing the judge
argument (note that you can change the number of columns or rows, see the documentation for these advanced options).jcc.plot(fit)
jcc.plot(fit, judge = c(1,6))
jcc.plot(fit, facet.cols = 2)
greyscale = TRUE
…jcc.plot(fit, 1, greyscale = T)
overlay.reliability = TRUE
(reliability is scaled from \(0\) to \(1\), making it easier to read with probabilities than information)jcc.plot(fit, 1, overlay.reliability = TRUE)
labelled = FALSE
.jcc.plot(fit, overlay.reliability = T, labelled = F)
jcc.plot(fit, overlay.reliability = T, labelled = F, legend.position = "bottom")
column.names = "Expert"
(or whatever you want)jcc.plot(fit, 2, column.names = "Expert")
title
jcc.plot(fit, 1, title = "")
theta.span = 5
(sets the maximum, the minimum is automatically adjusted)jcc.plot(fit, 1, theta.span = 5)
color.palette
(uses the RColorBrewer palettes in ggplot2
), the background colors with theme
(uses the ggplot2
themes, use bw
, light
, grey
, etc.), and the line size with line.width
.jcc.plot(fit, 1, color.palette = "Dark2", theme = "classic", line.width = 1.5, font.family = "serif", overlay.reliability = T, name.for.reliability = "Reliability")
or
jcc.plot(fit, 1, color.palette = "Blues", theme = "grey", line.width = 3, labelled = F)
I’ve also integrated the colors of the ggsci package (npg
, aaas
, nejm
, lancet
, jama
, jco
, D3
, locuszoom
, igv
, uchicago
, startrek
, tron
, futurama
), but be careful, not all may have sufficient color values!
jcc.plot(fit, 1, color.palette = "aaas", overlay.reliability = T)
The jrt()
function already plots an information plot, but information plots can be called (as well as variants of information, like standard error and reliability), with the info.plot()
function.
info.plot(fit, 1)
judge
argument.info.plot(fit)
type
argument.(type = "reliability"
also works)
info.plot(fit, type = "r")
info.plot(fit, type = "se")
(type = "Standard Error"
also works)
info.plot(fit, type = "r", y.limits = c(0,1))
y.line
to add a horizontal line, for example for a .70 threshold, usual (though rarely used in IRT) for reliability.info.plot(fit, type = "r", y.line = .70)
type = ir
) or with standard error (type = ise
).info.plot(fit, type = "ise")
With a threshold value
info.plot(fit, type = "ir", y.line = .7)
And here again, themes are available.
info.plot(fit, type = "ir", y.line = .7, color.palette = "Dark2")
Similar customizing options than jcc.plot()
are available, here is an example:
info.plot(fit, 1, "ir",
column.names = "Rater",
theta.span = 5,
theme = "classic",
line.width = 2,
greyscale = T,
font.family = "serif")
Some polytomous IRT models (namely, the Rating Scale models) assume that judges all have the same response category structure, and so they cannot be estimated if all judges do not have the same observed categories. So, if your data includes judges with unobserved categories, how does jrt
deal with that?
For the automatic model selection stage, jrt
will by default keep all judges but, if there are judges with unobserved categories, it will not fit the Rating Scale and Generalized Rating Scale models. You will be notified in the output.
Note : The possible values are automatically detected, but it can be bypassed with the possible.values
argument.
Now, if you want instead to remove the incomplete judges to compare the models, set remove.judges.with.unobserved.categories = TRUE
(it’s a long name for an argument, so if you have a better idea of a clear but shorter name shoot me an email!). All models will be compared with only the complete judges.
After this stage:
An example with the same data as above but with remove.judges.with.unobserved.categories = TRUE
.
fit <- jrt(data,
irt.model = "RSM",
remove.judges.with.unobserved.categories = T,
additional.stats = F, #making the output cleaner for the example
progress.bar = F) #removing the progress bar for the example
#> The possible responses detected are: 1-2-3-4-5
#>
#> - Note : The Rating Scale Model requires to use only the judges with all categories observed. 1 judge(s) out of 8 removed.
#>
#> -== General Summary ==-
#> - 7 Judges | 1 Judge(s) with unobserved categories removed.
#> - 100 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.841 | SD = 0.832
#>
#> -== IRT Summary ==-
#> - Model: Rating Scale Model (Andrich, 1978) | doi: 10.1007/BF02293814
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Method of factor scoring: Expected A Posteriori (EAP)
#> - AIC = 1723.349 | AICc = 1726.349 | BIC = 1752.005 | SABIC = 1723.349
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .892
#> - Expected reliability | Assumes a Normal(0,1) prior density: .970
Additionnal statistics may be computed with additional.stats = TRUE
.
fit <- jrt(data,
additional.stats = T,
progress.bar = F) #removing the progress bar for the example
#> The possible responses detected are: 1-2-3-4-5
#> 12.5% Judges (1 out of 8) did not have all categories (1-2-3-4-5 observed). Since you set remove.judges.with.unobserved.categories = FALSE, we kept all Judges but did not estimate the Rating Scale and Generalized Rating Scale Models. Set remove.judges.with.unobserved.categories = TRUE if you want to remove the incomplete Judges and run all models instead. We also suggest to check your data for impossible values.
#>
#> -== Model Selection (8 judges) ==-
#> AICc for Graded Response Model: 1704.608 | Model weight: 0.000
#> AICc for Constrained Graded Response Model: 1685.57 | Model weight: 1.000
#> AICc for Partial Credit Model: 1707.878 | Model weight: 0.000
#> AICc for Generalized Partial Credit Model: 1717.336 | Model weight: 0.000
#> -> The best fitting model is the Constrained Graded Response Model.
#>
#> -== General Summary ==-
#> - 8 Judges
#> - 100 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.841 | SD = 0.785
#>
#> -== IRT Summary ==-
#> - Model: Constrained (equal slopes) Graded Response Model (Samejima, 1969) | doi: 10.1007/BF03372160
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Method of factor scoring: Expected A Posteriori (EAP)
#> - AIC = 1656.393 | AICc = 1685.57 | BIC = 1737.154 | SABIC = 1656.393
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .916
#> - Expected reliability | Assumes a Normal(0,1) prior density: .915
#> -== Other reliability statistics (packages "irr" and "psych") ==-
#> - Cronbach's Alpha: .903
#> - Standardized Cronbach's Alpha : .913
#> - Guttman's Lambda 4 :.939
#> - Guttman's Lambda 6 :.908
#> - Fleiss' Kappa : .153
#> - Fleiss-Conger's Exact Kappa : .164
#> - Intraclass Correlation Coefficient (One-Way Consistency model): .495
#> - Intraclass Correlation Coefficient (Two-Way Consistency model): .538
#> - Intraclass Correlation Coefficient (One-Way Agreement model): .495
#> - Intraclass Correlation Coefficient (Two-Way Agreement model): .500
The fitted model is stored in the slot @mirt.object
, so additionnal functions from mirt
can be easily used.
For example:
# Get more fit indices and compare models
mirt::anova(fit@mirt.object, verbose = F)
#> AIC AICc SABIC HQ BIC logLik
#> 1 1656.393 1685.57 1639.248 1689.079 1737.154 -797.197
# Get total information for a given vector of attributes
mirt::testinfo(fit@mirt.object, Theta = seq(from = -3, to = 3, by = 1))
#> [1] 2.800904 7.627556 11.167833 11.158276 11.904198 9.778967 4.306891
# Get the test information for case 1
mirt::testinfo(fit@mirt.object, Theta = fit@factor.scores.vector[1])
#> [1] 12.32303
# Get marginal reliability for high abilities – using a Normal(1,1) prior
mirt::marginal_rxx(fit@mirt.object,
density = function(x) {dnorm(x, mean = 1, sd = 1)})
#> [1] 0.9088782
For this example, let’s add partially missing data…
data[1, 1] <- NA # Add missing data for case 1 / judge 1 (partial missing)
…and completely missing case data.
data[3,] <- NA # All case 3 is missing (complete missing)
The data now looks like this:
head(data)
#> Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6 Judge_7 Judge_8
#> [1,] NA 1 2 3 2 3 1 2
#> [2,] 2 1 2 1 2 2 1 2
#> [3,] NA NA NA NA NA NA NA NA
#> [4,] 3 4 3 3 3 3 3 3
#> [5,] 2 1 2 2 2 1 3 3
#> [6,] 4 3 3 4 3 3 4 3
jrt
will be default impute missing data for partially missing data, but can be easily retrieved.
fit <- jrt(data, irt.model = "PCM", silent = T) #fit model
#> 1 case(s) with completely missing data detected! They were skipped for analysis.
#>
#> There are 1 (1.0%) cases with partially missing observations! Inputing...
#> - Note : Person fit statistics based on imputed data! Use with caution!
The fit@output.data
contains both the original data and the data with imputation (variable names are tagged “original”" and “imputed”), as well as the factor scores.
To retrieved them separately, the imputed data can be retrieved with fit@imputed.data
, the original data is in fit@input.data
and the factor scores can be retrieved like described previously.
jrt
for plotting?You may want to use jrt
as a plotting device only. That’s ok, because jrt
plotting functions will accept mirt
objects as input. They should be detected automatically as such.
Let’s fit a Generalized Partial Credit Model with mirt
for this example.
fit <- mirt::mirt(data = mirt::Science,
model = 1,
itemtype = "gpcm",
verbose = F)
Now jcc.plot()
can plot the category curves. Note that the default colmun names is now “Item”.
jcc.plot(fit)
For the information plot:
info.plot(fit)
For convenience the argumemnt item
can be used instead of judge
in both plotting functions:
jcc.plot(fit, item = 3)
Even though it isn’t its primary purpose, jrt
can also plot binary item response functions. They will be automatically detected and the plot will be named accordingly.
# SAT data from mirt
## Convert to binary
data <- mirt::key2binary(mirt::SAT12,
key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5))
## Fit 2PL model in mirt
fit <- mirt::mirt(data = data, model = 1, itemtype = "2PL", verbose = F)
## Plotting an item response function
jcc.plot(fit, item = 2)
## Plotting the item response functions of the first 12 items with a larger theta range
jcc.plot(fit, facet.cols = 4, item = 1:12, theta.span = 5)