rsimsum
supports custom input values for the true value of the estimand and for confidence intervals limits (used to calculate coverage probability).
To illustrate this feature, we can use the tt
dataset (bundled with rsimsum
):
library(rsimsum)
data("tt", package = "rsimsum")
head(tt)
#> diff se lower upper df repno dgm method
#> 1 -2.185467 1.130916 -4.432925 0.06199072 88.00000 1 1 1
#> 2 -3.359683 1.572366 -6.484430 -0.23493506 88.00000 1 2 1
#> 3 -2.185467 1.285290 -4.778318 0.40738411 42.53603 1 1 2
#> 4 -3.359683 2.016465 -7.458611 0.73924596 33.78117 1 2 2
#> 5 -2.989333 1.150093 -5.274900 -0.70376532 88.00000 1 3 1
#> 6 -1.152852 1.368553 -3.872563 1.56685875 88.00000 1 4 1
This includes the results of a simulation study assessing robustness of the t-test when estimating the difference between means. The t-test assumes a t distribution, hence confidence intervals for the estimated mean are generally based on the t distribution. See for instance the example from the t-test documentation (?t.test
):
t.test(extra ~ group, data = sleep)
#>
#> Welch Two Sample t-test
#>
#> data: extra by group
#> t = -1.8608, df = 17.776, p-value = 0.07939
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -3.3654832 0.2054832
#> sample estimates:
#> mean in group 1 mean in group 2
#> 0.75 2.33
We can incorporate custom confidence intervals by passing the name of two columns in data
as the ci.limits
argument:
s1 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", ci.limits = c("lower", "upper"), methodvar = "method", by = "dgm")
#> 'ref' method was not specified, 1 set as the reference
summary(s1, stats = "cover")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#> Coverage of nominal 95% confidence interval:
#> dgm 1 2
#> 1 0.9400 (0.0106) 0.9400 (0.0106)
#> 2 0.8780 (0.0146) 0.9420 (0.0105)
#> 3 0.9380 (0.0108) 0.9500 (0.0097)
#> 4 0.9020 (0.0133) 0.9420 (0.0105)
By doing so, we can incorporate different types of confidence intervals in the analysis of Monte Carlo simulation studies. Compare with the default setting:
s2 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", methodvar = "method", by = "dgm")
#> 'ref' method was not specified, 1 set as the reference
summary(s2, stats = "cover")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#> Coverage of nominal 95% confidence interval:
#> dgm 1 2
#> 1 0.9400 (0.0106) 0.9360 (0.0109)
#> 2 0.8680 (0.0151) 0.9320 (0.0113)
#> 3 0.9380 (0.0108) 0.9420 (0.0105)
#> 4 0.8940 (0.0138) 0.9360 (0.0109)
The ci.limits
is also useful when using non-symmetrical confidence intervals, e.g. when using bootstrapped confidence intervals.
A pair of values can also be passed to rsimsum
as the ci.limits
argument:
s3 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", ci.limits = c(-1.5, -0.5), methodvar = "method", by = "dgm")
#> 'ref' method was not specified, 1 set as the reference
summary(s3, stats = "cover")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#> Coverage of nominal 95% confidence interval:
#> dgm 1 2
#> 1 1.0000 (0.0000) 1.0000 (0.0000)
#> 2 1.0000 (0.0000) 1.0000 (0.0000)
#> 3 1.0000 (0.0000) 1.0000 (0.0000)
#> 4 1.0000 (0.0000) 1.0000 (0.0000)
If you have a better example of the utility of this method please get in touch - I’d love to hear from you!
By default, simsum
will calculate confidence intervals using normal-theory, Wald-type intervals. It is possible to use t-based critical values by providing a column for the (replication-specific) degrees of freedom (analogously as passing confidence bounds to ci.limits
):
s4 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", df = "df", methodvar = "method", by = "dgm")
#> 'ref' method was not specified, 1 set as the reference
Given that the confidence intervals in (lower
, upper
) are obtained by using critical values from a t distribution, the results of s4
will be equivalent to the results of s1
:
Finally, we can pass a column of values for true
as well:
tt$true <- -1
s5 <- simsum(data = tt, estvarname = "diff", true = "true", se = "se", ci.limits = c("lower", "upper"), methodvar = "method", by = "dgm")
#> 'ref' method was not specified, 1 set as the reference
summary(s5, stats = "cover")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#> Coverage of nominal 95% confidence interval:
#> dgm 1 2
#> 1 0.9400 (0.0106) 0.9400 (0.0106)
#> 2 0.8780 (0.0146) 0.9420 (0.0105)
#> 3 0.9380 (0.0108) 0.9500 (0.0097)
#> 4 0.9020 (0.0133) 0.9420 (0.0105)
Compare with the default settings:
summary(s2, stats = "cover")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#> Coverage of nominal 95% confidence interval:
#> dgm 1 2
#> 1 0.9400 (0.0106) 0.9360 (0.0109)
#> 2 0.8680 (0.0151) 0.9320 (0.0113)
#> 3 0.9380 (0.0108) 0.9420 (0.0105)
#> 4 0.8940 (0.0138) 0.9360 (0.0109)
multisimsum
can be as flexible as simsum
. Remember the default behaviour:
data("frailty", package = "rsimsum")
ms1 <- multisimsum(
data = frailty,
par = "par", true = c(trt = -0.50, fv = 0.75),
estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist"
)
#> 'ref' method was not specified, Cox, Gamma set as the reference
summary(ms1, stats = "bias")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#>
#> Parameter: fv
#>
#> Bias in point estimate:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma -0.0124 (0.0045) 0.2299 (0.0076) -0.0179 (0.0044) 0.2347 (0.0077)
#> Log-Normal -0.1064 (0.0043) -0.0175 (0.0049) -0.1066 (0.0041) -0.0152 (0.0050)
#>
#> ------------------------------------------------------------------------------------------------------------------------------------------------------
#>
#> Parameter: trt
#>
#> Bias in point estimate:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma -0.0006 (0.0016) -0.0013 (0.0016) -0.0003 (0.0016) -0.0015 (0.0016)
#> Log-Normal -0.0006 (0.0015) -0.0014 (0.0015) -0.0006 (0.0015) -0.0016 (0.0015)
In this example, we pass the true values of each estimand as the named vector c(trt = -0.50, fv = 0.75)
.
Say instead we stored the true value of each estimand as a column in our dataset:
frailty$true <- ifelse(frailty$par == "trt", -0.50, 0.75)
head(frailty)
#> i b se par fv_dist model true
#> 1 1 0.6569546 0.1256964 fv Gamma Cox, Gamma 0.75
#> 2 1 0.8396248 0.1663368 fv Gamma Cox, Log-Normal 0.75
#> 3 1 0.6583130 0.1260354 fv Gamma RP(P), Gamma 0.75
#> 4 1 0.8410503 0.1804898 fv Gamma RP(P), Log-Normal 0.75
#> 5 1 0.6394722 0.1223808 fv Log-Normal Cox, Gamma 0.75
#> 6 1 0.7573628 0.1235062 fv Log-Normal Cox, Log-Normal 0.75
With this data structure, we can pass a string value to multisimsum
that will identify the true
column in our dataset:
ms2 <- multisimsum(
data = frailty,
par = "par", true = "true",
estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist"
)
#> 'ref' method was not specified, Cox, Gamma set as the reference
summary(ms2, stats = "bias")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#>
#> Parameter: fv
#>
#> Bias in point estimate:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma -0.0124 (0.0045) 0.2299 (0.0076) -0.0179 (0.0044) 0.2347 (0.0077)
#> Log-Normal -0.1064 (0.0043) -0.0175 (0.0049) -0.1066 (0.0041) -0.0152 (0.0050)
#>
#> ------------------------------------------------------------------------------------------------------------------------------------------------------
#>
#> Parameter: trt
#>
#> Bias in point estimate:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma -0.0006 (0.0016) -0.0013 (0.0016) -0.0003 (0.0016) -0.0015 (0.0016)
#> Log-Normal -0.0006 (0.0015) -0.0014 (0.0015) -0.0006 (0.0015) -0.0016 (0.0015)
We can confirm that we obtain the same results with the two approaches:
This approach is particularly useful when the true value might vary across replications (e.g. when it depends on the simulated dataset).
Of course, it can be combined with custom confidence interval limits for coverage as well:
frailty$lower <- frailty$b - qt(1 - 0.05 / 2, df = 10) * frailty$se
frailty$upper <- frailty$b + qt(1 - 0.05 / 2, df = 10) * frailty$se
ms3 <- multisimsum(
data = frailty,
par = "par", true = "true",
estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist",
ci.limits = c("lower", "upper")
)
#> 'ref' method was not specified, Cox, Gamma set as the reference
summary(ms3, stats = "cover")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#>
#> Parameter: fv
#>
#> Coverage of nominal 95% confidence interval:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma 0.9477 (0.0071) 0.9680 (0.0056) 0.9516 (0.0069) 0.9640 (0.0059)
#> Log-Normal 0.8046 (0.0128) 0.9330 (0.0079) 0.8235 (0.0121) 0.9460 (0.0071)
#>
#> ------------------------------------------------------------------------------------------------------------------------------------------------------
#>
#> Parameter: trt
#>
#> Coverage of nominal 95% confidence interval:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma 0.9730 (0.0051) 0.9710 (0.0053) 0.9732 (0.0052) 0.9710 (0.0053)
#> Log-Normal 0.9710 (0.0053) 0.9690 (0.0055) 0.9719 (0.0052) 0.9690 (0.0055)
This will be completely different than before:
summary(ms2, stats = "cover")
#> Values are:
#> Point Estimate (Monte Carlo Standard Error)
#>
#>
#> Parameter: fv
#>
#> Coverage of nominal 95% confidence interval:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma 0.9201 (0.0087) 0.9220 (0.0085) 0.9300 (0.0082) 0.9030 (0.0094)
#> Log-Normal 0.7503 (0.0140) 0.9020 (0.0094) 0.7683 (0.0134) 0.9280 (0.0082)
#>
#> ------------------------------------------------------------------------------------------------------------------------------------------------------
#>
#> Parameter: trt
#>
#> Coverage of nominal 95% confidence interval:
#> fv_dist Cox, Gamma Cox, Log-Normal RP(P), Gamma RP(P), Log-Normal
#> Gamma 0.9500 (0.0069) 0.9490 (0.0070) 0.9506 (0.0070) 0.9500 (0.0069)
#> Log-Normal 0.9410 (0.0075) 0.9420 (0.0074) 0.9428 (0.0074) 0.9430 (0.0073)