The package contains functions to calculate power and estimate sample size for various study designs used in (not only bio-) equivalence studies.
Version 1.4.9.9999 built 2020-08-04 with R 4.0.2.
#    design                        name    df
#  parallel           2 parallel groups   n-2
#       2x2               2x2 crossover   n-2
#     2x2x2             2x2x2 crossover   n-2
#       3x3               3x3 crossover 2*n-4
#     3x6x3             3x6x3 crossover 2*n-4
#       4x4               4x4 crossover 3*n-6
#     2x2x3   2x2x3 replicate crossover 2*n-3
#     2x2x4   2x2x4 replicate crossover 3*n-4
#     2x4x4   2x4x4 replicate crossover 3*n-4
#     2x3x3   partial replicate (2x3x3) 2*n-3
#     2x4x2            Balaam's (2x4x2)   n-2
#    2x2x2r Liu's 2x2x2 repeated x-over 3*n-2
#    paired                paired means   n-1
Codes of designs follow this pattern: treatments x sequences x periods.
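As a small illustration of the naming pattern, a three-part design code can be split into its components in base R. The helper `parse_design` is hypothetical and not part of the package; two-part codes such as “2x2” and suffixed codes such as “2x2x2r” are deliberately rejected in this sketch.

```r
# Hypothetical helper (not part of PowerTOST): split a three-part design
# code of the form treatments x sequences x periods into its components.
parse_design <- function(code) {
  parts <- suppressWarnings(
    as.integer(strsplit(code, "x", fixed = TRUE)[[1]])
  )
  if (length(parts) != 3L || anyNA(parts)) {
    stop("expected a three-part code like '2x3x3'")
  }
  setNames(parts, c("treatments", "sequences", "periods"))
}

parse_design("2x3x3")  # partial replicate: 2 treatments, 3 sequences, 3 periods
parse_design("2x2x4")  # full replicate: 2 treatments, 2 sequences, 4 periods
```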
Although some replicate designs are more ‘popular’ than others, sample size estimations are valid for all of the following designs:
design | type | sequences | periods |
---|---|---|---|
2x2x4 | full | 2 TRTR\|RTRT | 4 |
2x2x4 | full | 2 TRRT\|RTTR | 4 |
2x2x4 | full | 2 TTRR\|RRTT | 4 |
2x2x3 | full | 2 TRT\|RTR | 3 |
2x2x3 | full | 2 TRR\|RTT | 3 |
2x3x3 | partial | 3 TRR\|RTR\|RRT | 3 |
Whilst “2x4x4” four period full replicate designs with four sequences (TRTR|RTRT|TRRT|RTTR or TRRT|RTTR|TTRR|RRTT) are supported, they should be avoided due to confounded effects.
For various methods power can be calculated based on
For all methods the sample size can be estimated based on
Power covers balanced as well as unbalanced sequences in crossover or replicate designs and equal/unequal group sizes in two-group parallel designs. Sample sizes are always rounded up to achieve balanced sequences or equal group sizes.
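The rounding rule can be sketched in base R as rounding a total sample size up to the next multiple of the number of sequences. The helper `balance_up` is a hypothetical name used only for illustration; it is not a package function.

```r
# Hypothetical helper: round a total sample size up to the next multiple
# of the number of sequences so that all sequences are equally sized.
balance_up <- function(n, sequences) {
  as.integer(ceiling(n / sequences) * sequences)
}

balance_up(37, sequences = 3)  # 39 (three-sequence partial replicate)
balance_up(19, sequences = 2)  # 20 (two-sequence crossover)
balance_up(36, sequences = 3)  # 36 (already balanced)
```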
θ0 0.95, target power 0.80, design “2x2” (TR|RT), exact method (Owen’s Q).
α 0.05, point estimate constraint (0.80, 1.25), homoscedasticity (CVwT = CVwR), scaling is based on CVwR, target power 0.80, design “2x3x3” (TRR|RTR|RRT), approximation by the non-central t-distribution, 100,000 simulations.
θ0 0.90 as recommended by Tóthfalusi and Endrényi (2011).
Regulatory constant 0.76, upper cap of scaling at CVwR 50%, evaluation by ANOVA.
Regulatory constant 0.76, upper cap of scaling at CVwR ~57.4%, evaluation by intra-subject contrasts.
Regulatory constant log(1.25)/0.25, linearized scaled ABE (Howe’s approximation).
θ0 0.975, regulatory constant log(1.11111)/0.1, upper cap of scaling at CVwR ~21.4%, design “2x2x4” (TRTR|RTRT), linearized scaled ABE (Howe’s approximation), upper limit of the confidence interval of swT/swR ≤2.5.
β0 (slope) 1+log(0.95)/log(rd) where rd is the ratio of the highest and lowest dose, target power 0.80, crossover design, details of the sample size search suppressed.
Minimum acceptable power 0.70. θ0, design, conditions, and sample size method depend on defaults of the respective approaches (ABE, ABEL, RSABE, NTID).
Before running the examples attach the library.
library(PowerTOST)
If not noted otherwise, defaults are employed.
Power for total CV 0.35, θ0 0.95, group sizes 52 and 49, design “parallel”.
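With the package attached, this case would typically be run as `power.TOST(CV = 0.35, n = c(52, 49), design = "parallel")`. The base-R sketch below is only an illustration of the underlying idea, assuming the noncentral t approximation rather than the package's exact method; the function name `power_parallel` is hypothetical.

```r
# Base-R sketch (an assumption, not the package's exact method): power of
# TOST in a two-group parallel design via the noncentral t approximation.
power_parallel <- function(CV, n1, n2, theta0 = 0.95,
                           theta1 = 0.80, theta2 = 1.25, alpha = 0.05) {
  s    <- sqrt(log(CV^2 + 1))        # SD on the log scale from the total CV
  se   <- s * sqrt(1 / n1 + 1 / n2)  # standard error of the difference
  df   <- n1 + n2 - 2                # degrees of freedom (n - 2)
  tval <- qt(1 - alpha, df)
  ncp1 <- (log(theta0) - log(theta1)) / se
  ncp2 <- (log(theta0) - log(theta2)) / se
  max(0, pt(-tval, df, ncp = ncp2) - pt(tval, df, ncp = ncp1))
}

pwr <- power_parallel(CV = 0.35, n1 = 52, n2 = 49)
round(pwr, 3)
```

Unequal group sizes enter only through the standard error, which is why unbalanced designs need no special handling in this approximation.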
Sample size for assumed intra-subject CV 0.20.
sampleN.TOST(CV = 0.20)
#
# +++++++++++ Equivalence test - TOST +++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2 crossover
# log-transformed data (multiplicative model)
#
# alpha = 0.05, target power = 0.8
# BE margins = 0.8 ... 1.25
# True ratio = 0.95, CV = 0.2
#
# Sample size (total)
# n power
# 20 0.834680
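The search behind such an estimate can be sketched in base R: starting from a small balanced sample size, increase n in steps of two until the target power is reached. This sketch assumes the noncentral t approximation instead of the exact method (Owen’s Q) used by default, and `power_2x2` is a hypothetical helper, so the power may differ slightly in the last digits from the output above.

```r
# Base-R sketch (noncentral t approximation, not Owen's Q): power of TOST
# in a 2x2 crossover with balanced sequences and total sample size n.
power_2x2 <- function(CV, n, theta0 = 0.95, theta1 = 0.80,
                      theta2 = 1.25, alpha = 0.05) {
  s    <- sqrt(log(CV^2 + 1))  # within-subject SD on the log scale
  se   <- s * sqrt(2 / n)      # standard error of the T - R difference
  df   <- n - 2
  tval <- qt(1 - alpha, df)
  ncp1 <- (log(theta0) - log(theta1)) / se
  ncp2 <- (log(theta0) - log(theta2)) / se
  max(0, pt(-tval, df, ncp = ncp2) - pt(tval, df, ncp = ncp1))
}

# Step through even n (balanced sequences) until the target is reached.
n <- 4
while (power_2x2(CV = 0.20, n = n) < 0.80) n <- n + 2
c(n = n, power = round(power_2x2(CV = 0.20, n = n), 4))
```

For CV 0.20 the loop should stop at n = 20, in line with the package output.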
Sample size for equivalence of the ratio of two means with normality on original scale based on Fieller’s (‘fiducial’) confidence interval. CVw 0.20, CVb 0.40.
Note the default α 0.025 (95% CI) of this function because it is intended for studies with clinical endpoints.
sampleN.RatioF(CV = 0.20, CVb = 0.40)
#
# +++++++++++ Equivalence test - TOST +++++++++++
# based on Fieller's confidence interval
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2 crossover
# Ratio of means with normality on original scale
# alpha = 0.025, target power = 0.8
# BE margins = 0.8 ... 1.25
# True ratio = 0.95, CVw = 0.2, CVb = 0.4
#
# Sample size
# n power
# 28 0.807774
Sample size for assumed intra-subject CV 0.45, θ0 0.90, three period full replicate study “2x2x3” (TRT|RTR or TRR|RTT).
sampleN.TOST(CV = 0.45, theta0 = 0.90, design = "2x2x3")
#
# +++++++++++ Equivalence test - TOST +++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2x3 (3 period full replicate)
# log-transformed data (multiplicative model)
#
# alpha = 0.05, target power = 0.8
# BE margins = 0.8 ... 1.25
# True ratio = 0.9, CV = 0.45
#
# Sample size (total)
# n power
# 124 0.800125
Note that the conventional model assumes homoscedasticity. For heteroscedasticity we can ‘switch off’ all conditions of one of the methods for reference-scaled ABE. We assume a σ2 ratio of ⅔ (i.e., T has a lower variability than R). Only relevant columns of the data.frame shown.
reg <- reg_const("USER", r_const = NA, CVswitch = Inf,
CVcap = Inf, pe_constr = FALSE)
CV <- CVp2CV(CV = 0.45, ratio = 2/3)
res <- sampleN.scABEL(CV=CV, design = "2x2x3", regulator = reg,
details = FALSE, print = FALSE)
print(res[c(3:4, 8:9)], digits = 5, row.names = FALSE)
# CVwT CVwR Sample size Achieved power
# 0.3987 0.49767 126 0.8052
Similar sample size because the pooled CV is still 0.45.
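The split of the pooled CV into CVwT and CVwR can be reproduced by hand. The sketch below assumes that the pooled variance is the mean of the two within-subject variances and that the ratio is defined as σ²wT/σ²wR; this assumption was chosen because it reproduces the values printed above. The helper `split_cv` is hypothetical and not the package's implementation.

```r
# Sketch of the arithmetic (assumed, consistent with the values printed
# above): split a pooled CV into CVwT and CVwR for a given ratio of
# within-subject variances s2wT / s2wR.
split_cv <- function(CV, ratio) {
  s2p  <- log(CV^2 + 1)          # pooled within-subject variance
  s2wR <- 2 * s2p / (1 + ratio)  # because s2p = (s2wT + s2wR) / 2
  s2wT <- ratio * s2wR
  sqrt(exp(c(CVwT = s2wT, CVwR = s2wR)) - 1)
}

cv_split <- split_cv(CV = 0.45, ratio = 2/3)
signif(cv_split, 5)  # compare with CVwT 0.3987 and CVwR 0.49767 above
```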
Sample size assuming homoscedasticity (CVw = 0.45).
sampleN.scABEL(CV = 0.45, details = TRUE)
#
# +++++++++++ scaled (widened) ABEL +++++++++++
# Sample size estimation
# (simulation based on ANOVA evaluation)
# ---------------------------------------------
# Study design: 2x3x3 (partial replicate)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.45; CVw(R) = 0.45
# True ratio = 0.9
# ABE limits / PE constraint = 0.8 ... 1.25
# EMA regulatory settings
# - CVswitch = 0.3
# - cap on scABEL if CVw(R) > 0.5
# - regulatory constant = 0.76
# - pe constraint applied
#
#
# Sample size search
# n power
# 36 0.7755
# 39 0.8059
Iteratively adjust α to control the Type I Error (Labes, Schütz). Slight heteroscedasticity (CVwT 0.30, CVwR 0.35), four period full replicate “2x2x4” study, 30 subjects, balanced sequences.
scABEL.ad(CV = c(0.30, 0.35), design = "2x2x4", n = 30)
# +++++++++++ scaled (widened) ABEL ++++++++++++
# iteratively adjusted alpha
# (simulations based on ANOVA evaluation)
# ----------------------------------------------
# Study design: 2x2x4 (4 period full replicate)
# log-transformed data (multiplicative model)
# 1,000,000 studies in each iteration simulated.
#
# CVwR 0.35, CVwT 0.3, n(i) 15|15 (N 30)
# Nominal alpha : 0.05
# True ratio : 0.9000
# Regulatory settings : EMA (ABEL)
# Switching CVwR : 0.3
# Regulatory constant : 0.76
# Expanded limits : 0.7723 ... 1.2948
# Upper scaling cap : CVwR > 0.5
# PE constraints : 0.8000 ... 1.2500
# Empiric TIE for alpha 0.0500 : 0.06651
# Power for theta0 0.9000 : 0.814
# Iteratively adjusted alpha : 0.03540
# Empiric TIE for adjusted alpha: 0.05000
# Power for theta0 0.9000 : 0.771
With the nominal α 0.05 the Type I Error will be inflated (0.0665). With the adjusted α 0.0354 (i.e., the 92.92% confidence interval) the TIE will be controlled, although with a slight loss in power (decreases from 0.814 to 0.771).
Consider sampleN.scABEL.ad(CV = c(0.30, 0.35), design = "2x2x4") to estimate the sample size which both controls the TIE and maintains the target power. In this example 34 subjects will be required.
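The expanded limits and the confidence level quoted in this example follow from simple arithmetic. The sketch below assumes the EMA settings listed in the output (switching CVwR 0.30, regulatory constant 0.76, cap at CVwR 0.50); `abel_limits` is a hypothetical helper, not a package function.

```r
# Sketch: expanded limits for ABEL under the EMA settings shown above.
abel_limits <- function(CVwR, k = 0.76, CVswitch = 0.30, CVcap = 0.50) {
  CVeff <- min(CVwR, CVcap)  # scaling is capped at CVcap
  if (CVeff <= CVswitch) {
    return(c(lower = 0.80, upper = 1.25))  # conventional limits, no scaling
  }
  swR <- sqrt(log(CVeff^2 + 1))  # within-subject SD of the reference
  c(lower = exp(-k * swR), upper = exp(k * swR))
}

round(abel_limits(0.35), 4)  # 0.7723 1.2948, as in the output above

# Confidence level corresponding to the adjusted alpha of 0.0354.
ci_level <- 100 * (1 - 2 * 0.0354)
ci_level                     # 92.92
```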
Sample size for a four period full replicate “2x2x4” study (any of TRTR|RTRT, TRRT|RTTR, TTRR|RRTT) assuming heteroscedasticity (CVwT 0.40, CVwR 0.50). Details of the sample size search suppressed.
sampleN.RSABE(CV = c(0.40, 0.50), design = "2x2x4", details = FALSE)
#
# ++++++++ Reference scaled ABE crit. +++++++++
# Sample size estimation
# ---------------------------------------------
# Study design: 2x2x4 (4 period full replicate)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.4; CVw(R) = 0.5
# True ratio = 0.9
# ABE limits / PE constraints = 0.8 ... 1.25
# Regulatory settings: FDA
#
# Sample size
# n power
# 20 0.81509
Sample size assuming heteroscedasticity (CVw 0.125, σ2 ratio 2.5, i.e., T has a substantially higher variability than R). TRTR|RTRT according to the FDA’s guidance. Assess additionally which one of the three components (scaled, ABE, swT/swR ratio) drives the sample size.
CV <- signif(CVp2CV(CV = 0.125, ratio = 2.5), 4)
n <- sampleN.NTIDFDA(CV = CV)[["Sample size"]]
#
# +++++++++++ FDA method for NTIDs ++++++++++++
# Sample size estimation
# ---------------------------------------------
# Study design: 2x2x4 (TRTR|RTRT)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.1497, CVw(R) = 0.09433
# True ratio = 0.975
# ABE limits = 0.8 ... 1.25
# Implied scABEL = 0.9056 ... 1.1043
# Regulatory settings: FDA
# - Regulatory const. = 1.053605
# - 'CVcap' = 0.2142
#
# Sample size search
# n power
# 28 0.665530
# 30 0.701440
# 32 0.734240
# 34 0.764500
# 36 0.792880
# 38 0.816080
suppressMessages(power.NTIDFDA(CV = CV, n = n, details = TRUE))
# p(BE) p(BE-sABEc) p(BE-ABE) p(BE-sratio)
# 0.81608 0.93848 1.00000 0.85794
The swT/swR component shows the lowest power and hence, drives the sample size.
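The regulatory constant, the cap, and the implied limits reported in the output can be checked with a few lines of base R. One assumption in this sketch: the constant is computed as log(1/0.9)/0.1, which reproduces the printed 1.053605; the rounded form log(1.11111)/0.1 agrees only to about five significant digits. The helper `implied_limits` is hypothetical.

```r
# Sketch: FDA regulatory constant for NTIDs and the quantities derived
# from it. log(1 / 0.9) is assumed here in place of the rounded 1.11111.
r_const <- log(1 / 0.9) / 0.1

# CVwR at which the scaled limits would reach the conventional 0.80 ... 1.25.
swR_cap <- log(1.25) / r_const
CVcap   <- sqrt(exp(swR_cap^2) - 1)

# Implied scaled limits for a given CVwR.
implied_limits <- function(CVwR) {
  swR <- sqrt(log(CVwR^2 + 1))
  exp(c(lower = -1, upper = 1) * r_const * swR)
}

round(r_const, 6)          # 1.053605
round(CVcap, 4)            # 0.2142
implied_limits(0.09433)    # close to the 0.9056 ... 1.1043 printed above
```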
Compare that with homoscedasticity (CVwT = CVwR = 0.125):
CV <- 0.125
n <- sampleN.NTIDFDA(CV = CV, details = FALSE)[["Sample size"]]
#
# +++++++++++ FDA method for NTIDs ++++++++++++
# Sample size estimation
# ---------------------------------------------
# Study design: 2x2x4 (TRTR|RTRT)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.125, CVw(R) = 0.125
# True ratio = 0.975
# ABE limits = 0.8 ... 1.25
# Regulatory settings: FDA
#
# Sample size
# n power
# 16 0.822780
suppressMessages(power.NTIDFDA(CV = CV, n = n, details = TRUE))
# p(BE) p(BE-sABEc) p(BE-ABE) p(BE-sratio)
# 0.82278 0.84869 1.00000 0.95128
Here the scaled ABE component shows the lowest power and drives the sample size, which is much lower than in the previous example.
CV 0.20, Doses 1, 2, and 8 units, β0 1, target power 0.90.
sampleN.dp(CV = 0.20, doses = c(1, 2, 8), beta0 = 1, targetpower = 0.90)
#
# ++++ Dose proportionality study, power model ++++
# Sample size estimation
# -------------------------------------------------
# Study design: crossover (3x3 Latin square)
# alpha = 0.05, target power = 0.9
# Equivalence margins of R(dnm) = 0.8 ... 1.25
# Doses = 1 2 8
# True slope = 1, CV = 0.2
# Slope acceptance range = 0.89269 ... 1.1073
#
# Sample size (total)
# n power
# 18 0.915574
Note that the acceptance range of the slope depends on the ratio of the highest and lowest doses (i.e., it gets tighter for wider dose ranges and therefore, higher sample sizes will be required).
In an exploratory setting wider equivalence margins {θ1, θ2} (0.50, 2.00) are recommended, which would translate in this example to an acceptance range of 0.66667 ... 1.3333 and a sample size of only six subjects.
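The dependence of the slope’s acceptance range on the dose ratio can be made explicit. The sketch below assumes the range is 1 + log(θ)/log(rd) for each margin θ, which reproduces both ranges quoted in this example; `slope_range` is a hypothetical helper, not a package function.

```r
# Sketch: acceptance range of the slope in the power model, where rd is
# the ratio of the highest to the lowest dose.
slope_range <- function(doses, theta1 = 0.80, theta2 = 1.25) {
  rd <- max(doses) / min(doses)
  1 + log(c(lower = theta1, upper = theta2)) / log(rd)
}

signif(slope_range(c(1, 2, 8)), 5)              # 0.89269 1.1073
signif(slope_range(c(1, 2, 8), 0.50, 2.00), 5)  # 0.66667 1.3333

# A wider dose range tightens the acceptance range.
signif(slope_range(c(1, 2, 8, 64)), 5)
```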
Explore impact of deviations from assumptions (higher CV, higher deviation of θ0 from 1, dropouts) on power. Assumed intra-subject CV 0.20, target power 0.90. Suppress the plot.
res <- pa.ABE(CV = 0.20, targetpower = 0.90)
print(res, plotit = FALSE)
# Sample size plan ABE
# Design alpha CV theta0 theta1 theta2 Sample size Achieved power
# 2x2 0.05 0.2 0.95 0.8 1.25 26 0.9176333
#
# Power analysis
# CV, theta0 and number of subjects which lead to min. acceptable power of at least 0.7:
# CV= 0.2729, theta0= 0.9044
# n = 16 (power= 0.7354)
If the study starts with 26 subjects (power ~0.92), the CV can increase to ~0.27 or θ0 decrease to ~0.90 or the sample size decrease to 16 whilst power will still be ≥0.70.
However, this is not a substitute for the “Sensitivity Analysis” recommended in ICH-E9, since in a real study a combination of all effects occurs simultaneously. It is up to you to decide on reasonable combinations and analyze their respective power.
Performed on a Xeon E3-1245v3 3.4 GHz, 8 MB cache, 16 GB RAM, R 4.0.2 64 bit on Windows 7.
“2x2” crossover design, CV 0.17. Sample sizes and achieved power for the supported methods (the 1st one is the default).
# method n power seconds
# owenq 14 0.805683 0.0015
# mvt 14 0.805690 0.1220
# noncentral 14 0.805683 0.0010
# shifted 16 0.852301 0.0005
The 2nd exact method is substantially slower than the 1st. The approximation based on the noncentral t-distribution is slightly faster but matches the 1st exact method closely. The approximation based on the shifted central t-distribution is the fastest but might estimate a sample size higher than necessary. Hence, it should be used only for comparative purposes.
Four period full replicate study, homoscedasticity (CVwT = CVwR = 0.45). Sample sizes and achieved power for the supported methods (‘key’ statistics or subject simulations).
# method n power seconds
# ‘key’ statistics 28 0.81116 0.16
# subject simulations 28 0.81196 2.32
Simulating via the ‘key’ statistics is the method of choice for speed reasons.
However, subject simulations are recommended if
You can install the released version of PowerTOST from CRAN with
package <- "PowerTOST"
# TRUE if the package is already installed
inst    <- package %in% rownames(installed.packages())
# install only what is missing
if (length(package[!inst]) > 0) install.packages(package[!inst])
… and the development version from GitHub with
# install.packages("remotes")
remotes::install_github("Detlew/PowerTOST")
Skips installation from a GitHub remote if the SHA-1 has not changed since the last install. Use force = TRUE to force installation.