Functions

`method.A()`

A linear model of log-transformed PK responses and effects
sequence, subject(sequence), period, treatment
where all effects are fixed (i.e., ANOVA). Estimated via function lm() of library stats.

modA <- lm(log(PK) ~ sequence + subject%in%sequence + period + treatment,
                     data = data)
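A minimal sketch of how a 90% confidence interval could be extracted from such a fit, using base R only. The simulated TRTR | RTRT data frame `sim`, the effect sizes, and the seed are hypothetical; this illustrates the model above and is not the package's internal code.

```r
# Simulate a balanced TRTR | RTRT study (hypothetical values throughout)
set.seed(123)
n_seq <- 12                                   # subjects per sequence
sim   <- expand.grid(period = 1:4, subject = 1:(2 * n_seq))
sim$sequence  <- ifelse(sim$subject <= n_seq, "TRTR", "RTRT")
sim$treatment <- substr(sim$sequence, sim$period, sim$period)
subj_eff <- rnorm(2 * n_seq, sd = 0.5)        # between-subject variability
sim$PK   <- exp(log(100) + subj_eff[sim$subject] +
                log(0.95) * (sim$treatment == "T") +
                rnorm(nrow(sim), sd = 0.3))   # within-subject variability
# All effects are fixed: code them as factors, with R as the reference level
sim$subject   <- factor(sim$subject)
sim$period    <- factor(sim$period)
sim$sequence  <- factor(sim$sequence)
sim$treatment <- factor(sim$treatment, levels = c("R", "T"))
modA <- lm(log(PK) ~ sequence + subject %in% sequence + period + treatment,
           data = sim)
# Treatment effect (T - R) in log-scale and its standard error
est <- coef(summary(modA))["treatmentT", c("Estimate", "Std. Error")]
# Point estimate and 90% CI, back-transformed to percent of the reference
pe  <- 100 * exp(est[["Estimate"]])
ci  <- 100 * exp(est[["Estimate"]] +
                 c(-1, 1) * qt(0.95, df.residual(modA)) * est[["Std. Error"]])
round(c(lower = ci[1], PE = pe, upper = ci[2]), 2)
```

With 24 subjects and complete data, the residual degrees of freedom are 3 × 24 − 4 = 68.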

`method.B()`

A linear model of log-transformed PK responses and effects
sequence, subject(sequence), period, treatment
where subject(sequence) is a random effect and all others are fixed.

Three options are provided:

Estimated via function lmer() of library lmerTest.

modB <- lmer(log(PK) ~ sequence + period + treatment + (1|subject),
                       data = data)

Employs Satterthwaite’s approximation⁵ of the degrees of freedom, i.e., method.B(..., option = 1), which is equivalent to SAS’ DDFM=SATTERTHWAITE, Phoenix WinNonlin’s Degrees of Freedom Satterthwaite, and Stata’s dfm=Satterthwaite.
Note that this is the only available approximation in SPSS.

Estimated via function lme() of library nlme.

modB <- lme(log(PK) ~ sequence +  period + treatment, random = ~1|subject,
                      data = data)

Employs degrees of freedom equivalent to SAS’ DDFM=CONTAIN (implicitly preferred by the EMA), Phoenix WinNonlin’s Degrees of Freedom Residual, STATISTICA’s GLM containment, and Stata’s dfm=anova.
To comply with the EMA’s Q&A document, method.B(..., option = 2) is the default (i.e., if the argument option is missing).

Estimated via function lmer() of library lmerTest.

modB <- lmer(log(PK) ~ sequence + period + treatment + (1|subject),
                       data = data)

Employs the Kenward-Roger approximation⁶ of the degrees of freedom, i.e., method.B(..., option = 3), which is equivalent to Stata’s dfm=Kenward Roger (EIM) and SAS’ DDFM=KENWARDROGER(FIRSTORDER), both based on the expected information matrix.
Note that SAS with DDFM=KENWARDROGER and JMP calculate Satterthwaite’s [sic] degrees of freedom and apply the Kackar-Harville correction⁷, i.e., they are based on the observed information matrix.

`ABE()`

Average Bioequivalence, where the model is identical to Method A. By default the conventional acceptance range of 80.00 – 125.00% is used. Tighter limits (90.00 – 111.11%) for narrow therapeutic index drugs (EMA) or wider limits (75.00 – 133.33% for C_max according to the guidelines of the GCC and South Africa) can be specified by the arguments theta1 (lower limit) and/or theta2 (upper limit).

Hypotheses

The hypotheses are \[\small{H_{0}:\theta_0\notin [L,U]\:vs\:H_{1}:L<\theta_0<U}\] where $\small{\theta_0=\mu_T/\mu_R}$ and the Null hypothesis is inequivalence. In Average Bioequivalence the limits $\small{[L,U]}$ are fixed, whereas in ABEL they can be expanded based on the variability of the reference treatment.
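These hypotheses are commonly assessed by two one-sided tests (TOST) or, equivalently, by checking whether the 100(1 − 2α)% confidence interval lies within the limits. The sketch below shows that both routes give the same decision. The function `tost()` and all input values are hypothetical, and the strict inequalities follow the alternative hypothesis as written above.

```r
# Two one-sided tests (TOST) and the equivalent CI inclusion rule.
# pe_log and se are in log-scale; L and U are ratios (not percent).
tost <- function(pe_log, se, df, alpha = 0.05, L = 0.80, U = 1.25) {
  p_lower <- pt((pe_log - log(L)) / se, df, lower.tail = FALSE) # H01: theta0 <= L
  p_upper <- pt((pe_log - log(U)) / se, df, lower.tail = TRUE)  # H02: theta0 >= U
  ci      <- exp(pe_log + c(-1, 1) * qt(1 - alpha, df) * se)    # 100(1-2alpha)% CI
  list(p     = c(lower = p_lower, upper = p_upper),
       ci    = ci,
       by_p  = max(p_lower, p_upper) < alpha,   # both null hypotheses rejected
       by_ci = ci[1] > L && ci[2] < U)          # CI entirely within the limits
}
res  <- tost(pe_log = log(0.95), se = 0.06, df = 68) # passes
res2 <- tost(pe_log = log(0.85), se = 0.06, df = 68) # lower limit not cleared
c(res$by_p, res$by_ci, res2$by_p, res2$by_ci)
```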

Tested designs

Details about the reference datasets and their designs:

help("data", package = "replicateBE")
?replicateBE::data

Four period (full) replicates

Both the test and the reference treatments are administered at least once.

Two sequences

TRTR | RTRT
TRRT | RTTR
TTRR | RRTT

Four sequences

Although supported, these designs are not recommended due to confounded effects.

Three period (full) replicates

The test treatment is administered at least once to ½ of the subjects and the reference treatment at least once to the respective other ½ of the subjects.

TRT | RTR
TRR | RTT

Two period (full) replicate

The test and reference treatments are administered once to ½ of the subjects (for the estimation of the CI), i.e., the first group of subjects follows a conventional 2×2×2 trial. In the second group the test and reference treatments are administered at least once to ¼ of the subjects, respectively (for the estimation of CV_wT and CV_wR).

TR | RT | TT | RR

Although supported, Balaam’s design⁸ is not recommended due to its poor power characteristics.

Three period (partial) replicates

The test treatment is administered once and the reference treatment at least once.

TRR | RTR | RRT
TRR | RTR

The latter is the so-called extra-reference design⁹ which is not recommended since it is biased in the presence of period effects.

Data structure

Columns must have the headers subject, period, sequence, treatment, PK, and/or logPK.¹⁰ Any order of columns is acceptable. Uppercase and mixed case headers will be internally converted to lowercase headers.

Format

Variable	Format
`subject`	Integer numbers or any combination of alphanumerics (A-Z, a-z, -, _, #, 0-9)
`period`	Integer numbers
`sequence`	Numbers or literal sequences not listed in the tested designs are not accepted (e.g., `ABAB`).
`treatment`	The Test treatment must be coded `T` and the Reference `R` (both uppercase).
`PK`	Real positive numbers of PK responses.
`logPK`	Real numbers of already log_e-transformed PK responses (optional and rarely needed).
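The format rules can be checked before an analysis. The helper `check_be_data()` below is hypothetical (it is not part of the package and may differ from its internal checks); it only illustrates the rules in the table above.

```r
# Hypothetical helper: basic checks of a data frame against the format
# described above. Stops with an error message on the first violation.
check_be_data <- function(x) {
  names(x) <- tolower(names(x))                 # headers are case-insensitive
  required <- c("subject", "period", "sequence", "treatment")
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0)
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  if (!any(c("pk", "logpk") %in% names(x)))
    stop("neither PK nor logPK found")
  if (!all(x$treatment %in% c("T", "R")))
    stop("treatments must be coded 'T' and 'R' (uppercase)")
  if ("pk" %in% names(x) && any(x$pk <= 0, na.rm = TRUE))
    stop("PK responses must be positive")
  invisible(x)
}
ok <- data.frame(Subject = c(1, 1), Period = 1:2, Sequence = "TRTR",
                 Treatment = c("T", "R"), PK = c(12.3, 10.8))
names(check_be_data(ok))   # lowercase headers
bad <- ok
bad$Treatment <- c("t", "R")                    # lowercase code is rejected
inherits(try(check_be_data(bad), silent = TRUE), "try-error")
```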

Relevant data are used for the estimation of CV_wR (and CV_wT in full replicate designs) and BE, i.e., the datasets might be different (see the example below). It is good practice to state that in the Statistical Analysis Plan (SAP).

Incomplete data

Estimation of CV_w

If a subject drops out from the study in a higher period, data of repeated administrations will still be used for the estimation of CV_w, although data of the other treatment might be missing. Examples for the estimation of CV_wR (missings denoted by ·):

RTRT | RTR·
TRRT | TRR·
RRTT | RRT· or RR··
RRT | RR··

Assessment of BE

If a subject drops out from the study in a higher period, data with at least one administration of the Test and Reference will be used in the assessment of BE. Examples (missings denoted by ·):

Example of different datasets

Sixteen subjects were enrolled in the study. In sequence RTRT there was one dropout in the 2nd period and one in the 4th period. In sequence TRTR there was one dropout in the 3rd period and one in the 4th.

1 RTR· 5 RTRT 9 TRTR 13 RTRT
2 RTRT 6 TR·· 10 TRTR 14 TRT·
3 RTRT 7 RTRT 11 RTRT 15 TRTR
4 TRTR 8 R··· 12 TRTR 16 TRTR

We obtain these datasets:

Dataset	Purpose	included	excluded
#1	Estimation of CV_wR	13 who received 2 treatments `R`	6, 8, 14
#2	Assessment of BE	15 who received ≥1 treatment `T` and ≥1 treatment `R`	8
#3	Estimation of CV_wT	13 who received 2 treatments `T`	1, 6, 8

Datasets #1 and #2 are required for ABEL and all three for the WHO’s reference-scaling of AUC (see below). For ABE only dataset #2 is required.
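The exclusions in the table can be reproduced by counting the administrations of each treatment per subject. This is a sketch in base R; the vector `adm` simply transcribes the example above (dropouts truncated) and the helper `count()` is hypothetical.

```r
# Administered treatments of subjects 1-16 as in the example
adm <- c("RTR",  "RTRT", "RTRT", "TRTR", "RTRT", "TR",  "RTRT", "R",
         "TRTR", "TRTR", "RTRT", "TRTR", "RTRT", "TRT", "TRTR", "TRTR")
names(adm) <- 1:16
# Number of administrations of treatment trt ("T" or "R") per subject
count <- function(x, trt) nchar(gsub(paste0("[^", trt, "]"), "", x))
n_T <- count(adm, "T")
n_R <- count(adm, "R")
excluded <- list(CVwR = names(adm)[n_R < 2],            # dataset #1
                 BE   = names(adm)[n_T < 1 | n_R < 1],  # dataset #2
                 CVwT = names(adm)[n_T < 2])            # dataset #3
excluded
# Number of subjects included in each dataset
length(adm) - lengths(excluded)
```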

Notes on the methods

Estimation of intra-subject variability

The EMA proposed a linear model of log-transformed PK responses of the reference treatment
sequence, subject(sequence), period
where all effects are fixed. Estimated via function lm() of library stats:

modCV <- lm(log(PK) ~ sequence + subject%in%sequence + period,
                      data = data[data$treatment == "R", ])

For informational purposes in full replicate designs (required by the WHO for reference-scaling of AUC; see below) the same model is run with data = data[data$treatment == "T", ].
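The residual variance of this model (in log-scale) is converted to a CV as sketched below. The helper names `mse2cv()` and `cv2mse()` are hypothetical. For a fitted model the residual variance would be `summary(modCV)$sigma^2`. Here the s_wR reported for reference dataset 01 in the outlier example further down is used instead.

```r
# Conversions between the within-subject variance in log-scale and the CV
mse2cv <- function(mse) sqrt(exp(mse) - 1)
cv2mse <- function(cv)  log(cv^2 + 1)
swR <- 0.44645                 # s_wR of reference dataset 01 (see below)
round(100 * mse2cv(swR^2), 2)  # CV_wR in percent
# 46.96
sqrt(cv2mse(0.50))             # s_wR at the upper cap of scaling (CV 50%)
```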

Special conditions for the sample size in three period full replicate designs:

The question raised asks if it is possible to use a design where subjects are randomised to receive treatments in the order of TRT or RTR.

The CHMP bioequivalence guideline requires that at least 12 patients are needed to provide data for a bioequivalence study to be considered valid, and to estimate all the key parameters. Therefore, if a 3-period replicate design, where treatments are given in the order TRT or RTR, is to be used to justify widening of a confidence interval for C_max then it is considered that at least 12 patients would need to provide data from the RTR arm. This implies a study with at least 24 patients in total would be required if equal number of subjects are allocated to the 2 treatment sequences.

— Q&A document¹¹

If less than twelve subjects are eligible in sequence RTR of a TRT | RTR design (and in analogy in sequence TRR of a TRR | RTT design), the user is notified about the ‘uncertain’ estimate of CV_wR. However, in a sufficiently powered study such a case is extremely unlikely. Let us explore the confidence interval of the CV:

# Estimate sample sizes of full replicate designs (theta0 0.90, target
# power 0.80) and CI of the CV with library PowerTOST
CV <- 0.30
n4 <- PowerTOST::sampleN.scABEL(CV = CV, design = "2x2x4", details = FALSE,
                                print = FALSE)[["Sample size"]] # 4-period
n3 <- PowerTOST::sampleN.scABEL(CV = CV, design = "2x2x3", details = FALSE,
                                print = FALSE)[["Sample size"]] # 3-period
# 95% CI of CVs in %
round(100*PowerTOST::CVCL(CV = CV, df = 3*n4-4, "2-sided"), 2) # 4-period
# lower CL upper CL 
#    26.19    35.15
round(100*PowerTOST::CVCL(CV = CV, df = 2*n3-3, "2-sided"), 2) # 3-period
# lower CL upper CL 
#    26.18    35.18
# As above but assume that only 12 subjects remain in each sequence
round(100*PowerTOST::CVCL(CV = CV, df = 3*24-4, "2-sided"), 2) # 4-period
# lower CL upper CL 
#    25.55    36.40
round(100*PowerTOST::CVCL(CV = CV, df = 2*24-3, "2-sided"), 2) # 3-period
# lower CL upper CL 
#    24.71    38.28

It is unclear why the four period replicate is considered by the EMA to give a more ‘reliable’ estimate than the three period replicate.

Model structure

The EMA’s models assume equal [sic] intra-subject variances of Test and Reference (like in 2×2×2 trials) – even if proven false in one of the full replicate designs (where both CV_wT and CV_wR can be estimated). Hence, amongst biostatisticians they are called ‘crippled models’ because the replicative nature of the study is ignored.

The nested structure subject(sequence) of the methods leads to an over-specified model.¹² The simple model
sequence, subject, period, treatment
gives identical estimates of the residual variance and the treatment effect and hence, its confidence interval.

The same holds true for the EMA’s model to estimate CV_wR. The simple model
subject, period
gives an identical estimate of the residual variance.
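Both statements can be checked numerically. The sketch below uses simulated TRTR | RTRT data (all values hypothetical) and compares the nested with the simple specification, first for the model of the treatment effect and then for the model of the reference data only.

```r
# Simulated TRTR | RTRT data in log-scale (hypothetical values)
set.seed(42)
d <- expand.grid(period = 1:4, subject = 1:16)
d$sequence  <- ifelse(d$subject <= 8, "TRTR", "RTRT")
d$treatment <- factor(substr(d$sequence, d$period, d$period),
                      levels = c("R", "T"))
d$logPK     <- rnorm(16, sd = 0.5)[d$subject] + rnorm(nrow(d), sd = 0.3)
fct    <- c("period", "subject", "sequence")
d[fct] <- lapply(d[fct], factor)
# Model of the treatment effect: nested vs. simple specification
nested <- lm(logPK ~ sequence + subject %in% sequence + period + treatment,
             data = d)
simple <- lm(logPK ~ sequence + subject + period + treatment, data = d)
c(nested = sigma(nested)^2, simple = sigma(simple)^2)   # residual variance
c(nested = coef(nested)[["treatmentT"]],
  simple = coef(simple)[["treatmentT"]])                # treatment effect
# Model of the reference data only
ref <- droplevels(d[d$treatment == "R", ])
cv_nested <- lm(logPK ~ sequence + subject %in% sequence + period, data = ref)
cv_simple <- lm(logPK ~ subject + period, data = ref)
c(nested = sigma(cv_nested)^2, simple = sigma(cv_simple)^2)
```

The specifications span the same column space, so the aliased coefficients reported as `NA` by `lm()` in the nested models do not affect the estimates of interest.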

Reference-scaling is acceptable for C_max (immediate release products: BE-Guideline) and C_max, C_max,ss, C_τ,ss, partial AUC (modified release products¹³). The intention to widen the limits has to be stated in the protocol and – contrary to the FDA’s RSABE – a clinical justification has to be provided.

Those HVDP for which a wider difference in C_max is considered clinically irrelevant based on a sound clinical justification can be assessed with a widened acceptance range. The request for widened interval must be prospectively specified in the protocol.

— BE Guideline

BE limits, PE restriction, rounding issues

The limits can be expanded based on CV_wR.

CV_wR ≤ 30%
Lower cap, i.e., no scaling, conventional limits:
$\small{\left[{L,U}\right] = {80.00 - 125.00\%}}$
30% < CV_wR ≤ 50%
Expanded limits based on $s_{wR}$:
$\small{\left[{L,U}\right] = 100\,{\text{e}^{\mp 0.760 \cdot {s_{wR}}}}}$
CV_wR > 50%
Upper cap, i.e., applying $\small{s^*_{wR}=\sqrt{\text{log}(0.50^2+1)}}$ in the expansion formula or
$\small{\left[ {L,U} \right] = {69.84 - 143.19\%}}$.
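The three cases can be written as one small function in base R. This is a sketch, not the package's code; the name `abel_limits()` and its arguments are hypothetical, and CV_wR is passed as a ratio (not in percent).

```r
# Expanded limits (in percent) from CV_wR.
# k = 0.760 is the regulatory constant, cap_cv the upper cap of scaling.
abel_limits <- function(cv_wr, k = 0.760, cap_cv = 0.50) {
  if (cv_wr <= 0.30) return(c(L = 80, U = 125))   # lower cap: no scaling
  s_wr <- sqrt(log(min(cv_wr, cap_cv)^2 + 1))     # s_wR, capped if necessary
  100 * exp(c(L = -1, U = 1) * k * s_wr)
}
round(abel_limits(0.25), 2)  # L  80.00, U 125.00
round(abel_limits(0.40), 2)  # L  74.62, U 134.02
round(abel_limits(0.55), 2)  # L  69.84, U 143.19 (upper cap)
```

With `cap_cv = 0.574` the function returns approximately 66.7 – 150.0%, i.e., the maximum expansion quoted for Health Canada further below.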

In reference-scaling a so-called mixed (a.k.a. aggregate) criterion is applied. In order to pass BE,

the 90% confidence interval has to lie entirely within the acceptance range $\small{\left[ {L,U} \right]}$ and
the point estimate has to lie within 80.00 – 125.00%.
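A sketch of the mixed criterion as a function, assuming that ‘within’ includes the boundaries. The name `mixed_criterion()` is hypothetical. The two calls use values printed elsewhere in this document (reference dataset 01 and dataset 14 by Method B); the limits are entered as printed, i.e., rounded.

```r
# Mixed criterion: CI within the (expanded) limits and PE within 80.00-125.00%.
# All arguments in percent; only the CI is rounded to two decimals.
mixed_criterion <- function(ci, pe, limits) {
  ci    <- round(ci, 2)
  ci_ok <- ci[1] >= limits[1] && ci[2] <= limits[2]
  pe_ok <- pe >= 80 && pe <= 125
  c(CI = ci_ok, PE = pe_ok, BE = ci_ok && pe_ok)
}
mixed_criterion(ci = c(107.11, 124.89), pe = 115.66, limits = c(71.23, 140.40))
mixed_criterion(ci = c(69.21, 121.28),  pe = 91.62,  limits = c(69.84, 143.19))
```

The first call passes both parts. In the second call the point estimate passes, but the lower confidence limit lies below the expanded limit, so the study fails.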

To avoid discontinuities due to double rounding, expanded limits are calculated in full numeric precision and only the confidence interval is rounded according to the guideline.

# Calculate limits with library PowerTOST
CV <- c(30, 40, 49.6, 50, 50.4)
df <- data.frame(CV = CV, L = NA, U = NA, cap = "",
                 stringsAsFactors = FALSE)
for (i in seq_along(CV)) {
  df[i, 2:3] <- sprintf("%.8f", PowerTOST::scABEL(CV[i]/100)*100)
}
df$cap[df$CV <= 30] <- "lower"
df$cap[df$CV >= 50] <- "upper"
names(df)[1:3] <- c("CV(%)", "L(%)", "U(%)")
print(df, row.names = FALSE)
#  CV(%)        L(%)         U(%)   cap
#   30.0 80.00000000 125.00000000 lower
#   40.0 74.61770240 134.01645559      
#   49.6 70.01700049 142.82245641      
#   50.0 69.83678198 143.19101936 upper
#   50.4 69.83678198 143.19101936 upper

Discrete limits resulting from rounding

Degrees of freedom, comparison of methods

The SAS code provided by the EMA in the Q&A document does not specify how the degrees of freedom should be calculated in Method B. Hence, the default in PROC MIXED, namely DDFM=CONTAIN is applied, i.e., method.B(..., option = 2). For incomplete data (missing periods) Satterthwaite’s approximation of the degrees of freedom, i.e., method.B(..., option = 1) or Kenward-Roger method.B(..., option = 3) might be a better choice – if stated as such in the SAP.
For background about approximations in different software packages see the electronic Supplementary Material of Schütz et al.

The EMA seemingly prefers Method A:

A simple linear mixed model, which assumes identical within-subject variability (Method B), may be acceptable as long as results obtained with the two methods do not lead to different regulatory decisions. However, in borderline cases […] additional analysis using Method A might be required.

— Q&A document (January 2011 and later revisions)

The half-width of the confidence interval in log-scale allows a comparison of methods (B vs. A) where a higher value might point towards a more conservative decision.¹⁴ In the provided example datasets – with one exception – the conclusion of BE (based on the mixed criterion) agrees between Method A and Method B.
However, for the highly incomplete dataset 14 Method A was liberal (passing by ANOVA but failing by the random effects model):

# Compare Method B acc. to the GL with Method A for all reference datasets.
ds <- substr(grep("rds", unname(unlist(data(package = "replicateBE"))),
                  value = TRUE), start = 1, stop = 5)
for (i in seq_along(ds)) {
  A <- method.A(print = FALSE, details = TRUE, data = eval(parse(text = ds[i])))$BE
  B <- method.B(print = FALSE, details = TRUE, data = eval(parse(text = ds[i])))$BE
  r <- paste0("A ", A, ", B ", B, " \u2013 ")
  cat(paste0(ds[i], ":"), r)
  if (A == B) {
    cat("Methods agree.\n")
  } else {
    if (A == "fail" & B == "pass") {
      cat("Method A is conservative.\n")
    } else {
      cat("Method B is conservative.\n")
    }
  }
}
# rds01: A pass, B pass – Methods agree.
# rds02: A pass, B pass – Methods agree.
# rds03: A pass, B pass – Methods agree.
# rds04: A fail, B fail – Methods agree.
# rds05: A pass, B pass – Methods agree.
# rds06: A pass, B pass – Methods agree.
# rds07: A pass, B pass – Methods agree.
# rds08: A pass, B pass – Methods agree.
# rds09: A pass, B pass – Methods agree.
# rds10: A pass, B pass – Methods agree.
# rds11: A pass, B pass – Methods agree.
# rds12: A fail, B fail – Methods agree.
# rds13: A fail, B fail – Methods agree.
# rds14: A pass, B fail – Method B is conservative.
# rds15: A fail, B fail – Methods agree.
# rds16: A fail, B fail – Methods agree.
# rds17: A fail, B fail – Methods agree.
# rds18: A fail, B fail – Methods agree.
# rds19: A fail, B fail – Methods agree.
# rds20: A fail, B fail – Methods agree.
# rds21: A fail, B fail – Methods agree.
# rds22: A pass, B pass – Methods agree.
# rds23: A pass, B pass – Methods agree.
# rds24: A pass, B pass – Methods agree.
# rds25: A pass, B pass – Methods agree.
# rds26: A fail, B fail – Methods agree.
# rds27: A pass, B pass – Methods agree.
# rds28: A pass, B pass – Methods agree.
# rds29: A pass, B pass – Methods agree.
# rds30: A fail, B fail – Methods agree.

Exploring dataset 14:

A  <- method.A(print = FALSE, details = TRUE, data = rds14)
B1 <- method.B(print = FALSE, details = TRUE, data = rds14, option = 1)
B2 <- method.B(print = FALSE, details = TRUE, data = rds14) # apply default option
B3 <- method.B(print = FALSE, details = TRUE, data = rds14, option = 3)
# Rounding of CI according to the GL
A[15:19]  <- round(A[15:19],  2) # all effects fixed
B1[15:19] <- round(B1[15:19], 2) # Satterthwaite's df
B2[15:19] <- round(B2[15:19], 2) # df acc. to Q&A
B3[15:19] <- round(B3[15:19], 2) # Kenward-Roger df
cs <- c(2, 10, 15:23)
df <- rbind(A[cs], B1[cs], B2[cs], B3[cs])
names(df)[c(1, 3:6, 11)] <- c("Meth.", "L(%)", "U(%)",
                              "CL.lo(%)", "CL.hi(%)", "hw")
df[, c(2, 11)] <- signif(df[, c(2, 11)], 5)
print(df[order(df$BE, df$hw, decreasing = c(FALSE, TRUE)), ],
      row.names = FALSE)
#  Meth.     DF  L(%)   U(%) CL.lo(%) CL.hi(%) PE(%)   CI  GMR   BE      hw
#    B-1 197.44 69.84 143.19    69.21   121.27 91.62 fail pass fail 0.28043
#    B-2 192.00 69.84 143.19    69.21   121.28 91.62 fail pass fail 0.28046
#    B-3 195.99 69.84 143.19    69.21   121.28 91.62 fail pass fail 0.28052
#      A 192.00 69.84 143.19    69.99   123.17 92.85 pass pass pass 0.28261

All variants of Method B are more conservative than Method A. Before rounding the confidence interval, option = 2 with 192 degrees of freedom would be more conservative (lower CL 69.21029) than option = 1 with 197.44 degrees of freedom (lower CL 69.21286). Given the incompleteness of this dataset (four missings in period 2, twelve in period 3, and 19 in period 4), Satterthwaite’s or Kenward-Roger degrees of freedom are probably the better choice.
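The column hw can be approximated from the printed confidence limits. Since the limits in the table are rounded to two decimals, the sketch below (with the hypothetical helper `half_width()`) agrees with the reported values only to about three to four decimals.

```r
# Half-width of the CI in log-scale from confidence limits given in percent
half_width <- function(lower, upper) (log(upper) - log(lower)) / 2
hw_A  <- half_width(69.99, 123.17)  # Method A
hw_B1 <- half_width(69.21, 121.27)  # Method B, option = 1
round(c(A = hw_A, B1 = hw_B1), 4)
```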

For detailed comparisons between methods based on simulations see the electronic Supplementary Material of Schütz et al.

Outlier analysis

It is an open issue how outliers should be handled.

The applicant should justify that the calculated intra-subject variability is a reliable estimate and that it is not the result of outliers.

— BE-Guideline

Box plots were ‘suggested’ by the author as a mere joke [sic] at the EGA/EMA symposium, being aware of their nonparametric nature and the EMA’s reluctance towards robust methods. Alas, this joke was included in the Q&A document.

[…] a study could be acceptable if the bioequivalence requirements are met both including the outlier subject (using the scaled average bioequivalence approach and the within-subject CV with this subject) and after exclusion of the outlier (using the within-subject CV without this subject).

An outlier test is not an expectation of the medicines agencies but outliers could be shown by a box plot. This would allow the medicines agencies to compare the data between them.

— EGA/EMA Q&A-document

With the additional argument ola = TRUE in method.A() and method.B() an outlier analysis is performed, where the default fence = 2.¹⁵

Results differ slightly depending on software’s algorithms to calculate the median and quartiles. Example with the ‘types’ implemented in R (note the differences even in the medians):

### Compare different types with some random data
x <- rnorm(48)
p <- c(25, 50, 75)/100
q <- matrix(data = "", nrow = 9, ncol = 4,
            dimnames = list(paste("type =", 1:9),
                            c("1st quart.", "median", "3rd quart.",
                              "software / default")))
for (i in 1:9) {
  q[i, 1:3] <- sprintf("%.5f", quantile(x, prob = p, type = i))
}
q[c(2, 4, 6:8), 4] <- c("SAS, Stata", "SciPy", "Phoenix, Minitab, SPSS",
                        "R, S, MATLAB, Octave, Excel", "Maple")
print(as.data.frame(q))
#          1st quart.   median 3rd quart.          software / default
# type = 1   -0.86817 -0.40487    0.29546                            
# type = 2   -0.85617 -0.38854    0.30042                  SAS, Stata
# type = 3   -0.86817 -0.40487    0.29546                            
# type = 4   -0.86817 -0.40487    0.29546                       SciPy
# type = 5   -0.85617 -0.38854    0.30042                            
# type = 6   -0.86217 -0.38854    0.30290      Phoenix, Minitab, SPSS
# type = 7   -0.85017 -0.38854    0.29794 R, S, MATLAB, Octave, Excel
# type = 8   -0.85817 -0.38854    0.30124                       Maple
# type = 9   -0.85767 -0.38854    0.30104

Box plots of studentized¹⁶ and standardized¹⁷ model residuals are constructed.¹⁸
Potential outliers are flagged based on the argument fence provided by the user.
With the additional argument verbose = TRUE detailed information is shown in the console.
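The fence rule can be sketched in base R as follows. The helper `flag_outliers()` is hypothetical and uses `quantile()` with R's default type 7; the package's own implementation may compute the quartiles differently. In practice `res` would hold externally studentized residuals from `rstudent()` or standardized ones from `rstandard()`; here a small made-up vector is used.

```r
# Hypothetical helper: flag values beyond m x IQR from the quartiles
flag_outliers <- function(res, fence = 2, type = 7) {
  q   <- quantile(res, probs = c(0.25, 0.75), type = type, names = FALSE)
  lim <- q + c(-1, 1) * fence * diff(q)       # quartiles -/+ m x IQR
  inside <- res >= lim[1] & res <= lim[2]
  list(whiskers = range(res[inside]),         # most extreme data within limits
       outliers = res[!inside])
}
res <- c(-1.2, -0.8, -0.5, -0.1, 0, 0.2, 0.4, 0.9, 1.1, 6.5)  # made-up residuals
out <- flag_outliers(res)
out$whiskers  # -1.2  1.1
out$outliers  # 6.5
```

Here the quartiles are −0.4 and 0.775 (IQR 1.175), which gives limits of −2.75 and 3.125 with the default fence of 2; only the value 6.5 lies outside.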

Example for the reference dataset 01:

Outlier analysis
 (externally) studentized residuals
 Limits (2×IQR whiskers): -1.717435, 1.877877
 Outliers:
 subject sequence  stud.res
      45     RTRT -6.656940
      52     RTRT  3.453122

 standarized (internally studentized) residuals
 Limits (2×IQR whiskers): -1.69433, 1.845333
 Outliers:
 subject sequence stand.res
      45     RTRT -5.246293
      52     RTRT  3.214663

If based on studentized residuals outliers are detected, additionally to the expanded limits based on the complete reference data, tighter limits are calculated based on CV_wR after exclusion of outliers and BE assessed with the new limits. Note that standardized residuals are given for informational purposes only and not used for exclusion of outliers.

Output for the reference dataset 01 (re-ordered for clarity):

CVwR               :  46.96% (reference-scaling applicable)
swR                :   0.44645
Expanded limits    :  71.23% ... 140.40% [100exp(±0.760·swR)]
Assessment based on original CVwR 46.96%
────────────────────────────────────────
Confidence interval: 107.11% ... 124.89%  pass
Point estimate     : 115.66%              pass
Mixed (CI & PE)    :                      pass
 ╟────────┼─────────────────────┼───────■────────◊─────────■───────────────╢

Outlier fence      :  2×IQR of studentized residuals.
Recalculation due to presence of 2 outliers (subj. 45|52)
─────────────────────────────────────────────────────────
CVwR (outl. excl.) :  32.16% (reference-scaling applicable)
swR (recalculated) :   0.31374
Expanded limits    :  78.79% ... 126.93% [100exp(±0.760·swR)]
Assessment based on recalculated CVwR 32.16%
────────────────────────────────────────────
Confidence interval: pass
Point estimate     : pass
Mixed (CI & PE)    : pass
         ╟┼─────────────────────┼───────■────────◊─────────■─╢

Note that the PE and its CI are not affected since the entire data are used. Therefore, these values are not reported in the second analysis (only the conclusion of the assessment).
The ‘line plot’ is given for informational purposes since its resolution is only ~0.5%. The filled squares ■ are the lower and upper 90% confidence limits, the rhombus ◊ the point estimate, the vertical lines │ at 100% and the PE restriction (80.00 – 125.00%), and the double vertical lines ║ the expanded limits. The PE and CI take precedence over other symbols. In this case the upper limit of the PE restriction is not visible.
Since both analyses arrive at the same conclusion, the study should be acceptable according to the Q&A document.

Applicability, caveats, outlook

The EMA’s approach of reference-scaling for highly variable drugs / drug products is currently recommended in other jurisdictions as well (e.g., the WHO; the ASEAN, Australia, Brazil, the East African Community, Egypt, the EEU, New Zealand, and the Russian Federation). Health Canada accepts ABEL only for AUC with an upper cap of scaling at ~57.4% (maximum expansion to 66.7 – 150.0%) and might require a true mixed-effects model. Whether Method B is acceptable is unclear.

If degrees of freedom are approximated (Satterthwaite, Kenward-Roger), the SAP and statistical report should always specify which method will be / was used (see above) in order to allow recalculation in other software. This package uses the expected information matrix.¹⁹

The estimated CV_wR is always uncertain (the degree of uncertainty depends on the CV_wR itself, the design, and – to a minor degree – the sample size), which might lead to an inflation of the type I error (i.e., if ABEL is falsely applied although the true – but unknown – CV_wR is lower than its estimate).²⁰, ²¹
Use the optional argument method.A(..., adjust = TRUE) to iteratively adjust α to control the type I error.²²
If you want to apply the most conservative approach of Molins et al.²³ (which corrects for CV_wR 30% instead of the observed one), get the data.frame of results with
  x <- method.A(..., details = TRUE, print = FALSE).
Adjust α in library PowerTOST and call method.A() again:
  design <- "2x2x4" # your design
  n <- as.integer(strsplit(x[[6]], "|", fixed = TRUE)[[1]]) # subjects / sequence
  y <- PowerTOST::scABEL.ad(CV = 0.3, n = n, design = design, print = FALSE)
  method.A(..., alpha = y$alpha.adj)

The WHO accepts reference-scaling for AUC (four period full replicate studies are mandatory in order to assess the variability associated with each product). It is not evident how this assessment should be done.
In Population Bioequivalence (PBE) and Individual Bioequivalence (IBE) the s_wT/s_wR ratio was assessed and ‘similar’ variability was concluded for a ratio within 0.667 – 1.500. However, the power of comparing variabilities in a study designed to compare means is low. This was one of the reasons why PBE and IBE were not implemented in regulatory practice. An alternative approach is given in the FDA’s guidance on warfarin where variabilities are considered ‘comparable’ if the upper confidence limit of σ_wT/σ_wR is ≤2.5.
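A sketch of both assessments, assuming that the two variance estimates are independent and follow scaled χ² distributions, so that their ratio can be assessed with the F distribution. The function `sratio_ci()`, the 90% confidence level (α = 0.10), and the input values are assumptions for illustration, not taken from a guidance or from the package.

```r
# Confidence interval of sigma_wT / sigma_wR from the estimates s_wT, s_wR
# and their degrees of freedom (independent variance estimates assumed)
sratio_ci <- function(s_wT, s_wR, df_T, df_R, alpha = 0.10) {
  ratio <- s_wT / s_wR
  c(lower = ratio / sqrt(qf(1 - alpha / 2, df_T, df_R)),
    ratio = ratio,
    upper = ratio / sqrt(qf(alpha / 2, df_T, df_R)))
}
x <- sratio_ci(s_wT = 0.35, s_wR = 0.30, df_T = 22, df_R = 22) # hypothetical
round(x, 4)
x[["upper"]] <= 2.5                             # 'comparable' variabilities?
x[["ratio"]] >= 0.667 && x[["ratio"]] <= 1.500  # 'similar' by the ratio alone?
```

The interval is wide even for a moderate sample size, which illustrates the low power of comparing variabilities mentioned above.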

Cross-validation

Results of all reference datasets agree with ones obtained in SAS (9.4), Phoenix WinNonlin (6.4–8.1), STATISTICA (13), SPSS (22.0), Stata (15.0), and JMP (10.0.2).

Contributors

Helmut Schütz (Author)
Michael Tomashevskiy (Contributor)
Detlew Labes (Contributor)

License

Helmut Schütz 2020-07-24

GPL-2 | GPL-3

Disclaimer

Program offered for Use without any Guarantees and Absolutely No Warranty. No Liability is accepted for any Loss and Risk to Public Health Resulting from Use of this R-Code.

Session Information

Inspect this information for reproducibility. Of particular importance are the versions of R and the packages used to create this workflow. It is considered good practice to record this information with every analysis.

options(width = 80)
devtools::session_info()
# - Session info ---------------------------------------------------------------
#  setting  value                       
#  version  R version 4.0.2 (2020-06-22)
#  os       Windows 7 x64 SP 1          
#  system   x86_64, mingw32             
#  ui       RTerm                       
#  language EN                          
#  collate  C                           
#  ctype    German_Germany.1252         
#  tz       Europe/Vienna               
#  date     2020-07-24                  
# 
# - Packages -------------------------------------------------------------------
#  package       * version    date       lib source        
#  assertthat      0.2.1      2019-03-21 [2] CRAN (R 4.0.0)
#  backports       1.1.7      2020-05-13 [2] CRAN (R 4.0.0)
#  boot            1.3-25     2020-04-26 [2] CRAN (R 4.0.2)
#  callr           3.4.3      2020-03-28 [2] CRAN (R 4.0.0)
#  cellranger      1.1.0      2016-07-27 [2] CRAN (R 4.0.0)
#  cli             2.0.2      2020-02-28 [2] CRAN (R 4.0.0)
#  colorspace      1.4-1      2019-03-18 [2] CRAN (R 4.0.0)
#  crayon          1.3.4      2017-09-16 [2] CRAN (R 4.0.0)
#  cubature        2.0.4      2019-12-04 [2] CRAN (R 4.0.0)
#  desc            1.2.0      2018-05-01 [2] CRAN (R 4.0.0)
#  devtools        2.3.1      2020-07-21 [2] CRAN (R 4.0.2)
#  digest          0.6.25     2020-02-23 [2] CRAN (R 4.0.0)
#  dplyr           1.0.0      2020-05-29 [2] CRAN (R 4.0.0)
#  ellipsis        0.3.1      2020-05-15 [2] CRAN (R 4.0.0)
#  evaluate        0.14       2019-05-28 [2] CRAN (R 4.0.0)
#  fansi           0.4.1      2020-01-08 [2] CRAN (R 4.0.0)
#  fs              1.4.1      2020-04-04 [2] CRAN (R 4.0.0)
#  generics        0.0.2      2018-11-29 [2] CRAN (R 4.0.0)
#  ggplot2         3.3.2      2020-06-19 [2] CRAN (R 4.0.2)
#  glue            1.4.1      2020-05-13 [2] CRAN (R 4.0.0)
#  gtable          0.3.0      2019-03-25 [2] CRAN (R 4.0.0)
#  htmltools       0.5.0      2020-06-16 [2] CRAN (R 4.0.0)
#  knitr           1.29       2020-06-23 [2] CRAN (R 4.0.2)
#  lattice         0.20-41    2020-04-02 [2] CRAN (R 4.0.0)
#  lifecycle       0.2.0      2020-03-06 [2] CRAN (R 4.0.0)
#  lme4            1.1-23     2020-04-07 [2] CRAN (R 4.0.0)
#  lmerTest        3.1-2      2020-04-08 [2] CRAN (R 4.0.0)
#  magrittr        1.5        2014-11-22 [2] CRAN (R 4.0.0)
#  MASS            7.3-51.6   2020-04-26 [2] CRAN (R 4.0.0)
#  Matrix          1.2-18     2019-11-27 [2] CRAN (R 4.0.0)
#  memoise         1.1.0      2017-04-21 [2] CRAN (R 4.0.0)
#  minqa           1.2.4      2014-10-09 [2] CRAN (R 4.0.0)
#  munsell         0.5.0      2018-06-12 [2] CRAN (R 4.0.0)
#  mvtnorm         1.1-1      2020-06-09 [2] CRAN (R 4.0.0)
#  nlme            3.1-148    2020-05-24 [2] CRAN (R 4.0.0)
#  nloptr          1.2.2.1    2020-03-11 [2] CRAN (R 4.0.0)
#  numDeriv        2016.8-1.1 2019-06-06 [2] CRAN (R 4.0.0)
#  pbkrtest        0.4-8.6    2020-02-20 [2] CRAN (R 4.0.0)
#  pillar          1.4.6      2020-07-10 [2] CRAN (R 4.0.2)
#  pkgbuild        1.1.0      2020-07-13 [2] CRAN (R 4.0.2)
#  pkgconfig       2.0.3      2019-09-22 [2] CRAN (R 4.0.0)
#  pkgload         1.1.0      2020-05-29 [2] CRAN (R 4.0.0)
#  PowerTOST       1.4-9      2019-12-19 [2] CRAN (R 4.0.0)
#  prettyunits     1.1.1      2020-01-24 [2] CRAN (R 4.0.0)
#  processx        3.4.2      2020-02-09 [2] CRAN (R 4.0.0)
#  ps              1.3.3      2020-05-08 [2] CRAN (R 4.0.0)
#  purrr           0.3.4      2020-04-17 [2] CRAN (R 4.0.0)
#  R6              2.4.1      2019-11-12 [2] CRAN (R 4.0.0)
#  Rcpp            1.0.4.6    2020-04-09 [2] CRAN (R 4.0.0)
#  readxl          1.3.1      2019-03-13 [2] CRAN (R 4.0.0)
#  remotes         2.2.0      2020-07-21 [2] CRAN (R 4.0.2)
#  replicateBE   * 1.0.15     2020-07-24 [1] local         
#  rlang           0.4.7      2020-07-09 [2] CRAN (R 4.0.2)
#  rmarkdown       2.3        2020-06-18 [2] CRAN (R 4.0.0)
#  rprojroot       1.3-2      2018-01-03 [2] CRAN (R 4.0.0)
#  scales          1.1.1      2020-05-11 [2] CRAN (R 4.0.0)
#  sessioninfo     1.1.1      2018-11-05 [2] CRAN (R 4.0.0)
#  statmod         1.4.34     2020-02-17 [2] CRAN (R 4.0.0)
#  stringi         1.4.6      2020-02-17 [2] CRAN (R 4.0.0)
#  stringr         1.4.0      2019-02-10 [2] CRAN (R 4.0.0)
#  TeachingDemos   2.12       2020-04-07 [2] CRAN (R 4.0.0)
#  testthat        2.3.2      2020-03-02 [2] CRAN (R 4.0.0)
#  tibble          3.0.1      2020-04-20 [2] CRAN (R 4.0.0)
#  tidyselect      1.1.0      2020-05-11 [2] CRAN (R 4.0.0)
#  tufte           0.6        2020-05-08 [2] CRAN (R 4.0.0)
#  usethis         1.6.1      2020-04-29 [2] CRAN (R 4.0.0)
#  vctrs           0.3.1      2020-06-05 [2] CRAN (R 4.0.0)
#  withr           2.2.0      2020-04-20 [2] CRAN (R 4.0.0)
#  xfun            0.15       2020-06-21 [2] CRAN (R 4.0.2)
#  yaml            2.2.1      2020-02-01 [2] CRAN (R 4.0.0)
# 
# [1] E:/Users/HS/Documents/Rtmpgnjiup/Rinst1f304b935b2
# [2] D:/Program Files/R/R-4.0.2/library

Schütz H, Tomashevskiy M, Labes D, Shitova A, González-de la Parra M, Fuglsang A. Reference Datasets for Studies in a Replicate Design Intended for Average Bioequivalence with Expanding Limits. AAPS J. 2020; 22:44. doi:10.1208/s12248-020-0427-6.↩︎
European Medicines Agency. Annex I. London, 21 September 2016. EMA/582648/2016.↩︎
European Medicines Agency, Committee for Medicinal Products for Human Use. Guideline on the Investigation of Bioequivalence. London, 20 January 2010. CPMP/EWP/QWP/1401/98 Rev. 1/Corr **.↩︎
European Generic Medicines Association. Revised EMA Bioequivalence Guideline. 3rd EGA Symposium on Bioequivalence. London, 1 June 2010. Questions & Answers.↩︎
Satterthwaite FE. An Approximate Distribution of Estimates of Variance Components. Biometrics Bulletin. 1946; 2(6): 110–4. doi:10.2307/3002019.↩︎
Kenward MG, Roger JH. Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood. Biometrics. 1997; 53(3): 983–97. doi:10.2307/2533558.↩︎
Kackar RN, Harville DA. Approximations for Standard Errors of Estimators of Fixed and Random Effects in Mixed Linear Models. J Am Stat Assoc. 1984; 79(388): 853–62. doi:10.1080/01621459.1984.10477102.↩︎
Balaam LN. A Two-Period Design with t² Experimental Units. Biometrics. 1968; 24(1): 61–73. doi:10.2307/2528460.↩︎
Chow SC, Shao J, Wang H. Individual bioequivalence testing under 2×3 designs. Stat Med. 2002; 21(5): 629–48. doi:10.1002/sim.1056.↩︎
Napierian logarithm (base e). The decadic logarithm (base 10) is not supported.↩︎
European Medicines Agency. Questions & Answers: positions on specific questions addressed to the Pharmacokinetics Working Party (PKWP). London, June 2015 (and later revisions). EMA/618604/2008.↩︎
Contradicts the law of parsimony. Such a nesting is superfluous since in BE trials subjects are uniquely coded. If, say, subject 1 is allocated to sequence TRTR, there is no ‘other’ subject 1 allocated to sequence RTRT. This explains the many lines given with ‘.’ in the output of SAS PROC GLM and as ‘not estimable’ in Phoenix WinNonlin.↩︎
European Medicines Agency, Committee for Medicinal Products for Human Use. Guideline on the pharmacokinetic and clinical evaluation of modified release dosage forms. London, 20 November 2014. EMA/CHMP/EWP/280/96.↩︎
Of course, only if point estimates are identical.↩︎
The fences are given by the lowest datum still within m×IQR of the lower quartile, and the highest datum still within m×IQR of the upper quartile, where IQR is the interquartile range (difference between the 3rd and 1st quartiles). Data outside the fences are considered outliers. Decreasing the multiplier m to, e.g., 1.5 might result in many outliers, whereas increasing it results in only a few.
Different methods exist to calculate quartiles (nine ‘types’ are available in R, where the default is type = 7). R’s default is used by S, MATLAB, Octave, and Excel. Phoenix WinNonlin, Minitab, and SPSS use type = 6, SciPy uses type = 4, whereas the default in SAS and Stata is type = 2 (though others are available as well).↩︎
Externally studentized: $\widehat{\sigma}_{(i)}^2={1 \over n-m-1}\sum_{\begin{smallmatrix}j = 1\\j \ne i\end{smallmatrix}}^n \widehat{\varepsilon\,}_j^{\,2}$↩︎
Internally studentized: $\widehat{\sigma}^2={1 \over n-m}\sum_{j=1}^n \widehat{\varepsilon\,}_j^{\,2}$↩︎
Both are available in SAS and R, whereas only the latter is available in, e.g., Phoenix WinNonlin. In general the former are slightly more restrictive. Which one will be used has to be stated in the SAP.↩︎
For Satterthwaite, the only one available in Phoenix WinNonlin, SPSS, and Stata; available as an option in SAS by SCORING=1. For Kenward-Roger, the default option EIM in Stata. The observed information matrix is the only one available in JMP and the default (SCORING=0) in SAS; option OIM in Stata.↩︎
Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 2015; 32(1): 135–43. doi:10.1007/s11095-014-1450-z.↩︎
Muñoz J, Alcaide D, Ocaña J. Consumer’s risk in the EMA and FDA regulatory approaches for bioequivalence in highly variable drugs. Stat Med. 2016; 35(12): 1933–43. doi:10.1002/sim.6834.↩︎
Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016; 33(11): 2805–14. doi:10.1007/s11095-016-2006-1.↩︎
Molins E, Cobo E, Ocaña J. Two-Stage Designs Versus European Scaled Average Designs in Bioequivalence Studies for Highly Variable Drugs: Which to Choose? Stat Med. 2017; 36(30): 4777–88. doi:10.1002/sim.7452.↩︎
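
The notes above on internally and externally studentized residuals and on the fences can be illustrated with a minimal sketch in base R. It is not the package’s implementation. The built-in dataset cars, the model, the helper fences(), and the multiplier used in the example are assumptions made only for illustration. The deleted variance is computed by the leave-one-out identity, which is assumed here to stand in for refitting the model without observation i.

```r
# Illustrative data only: built-in 'cars', not a bioequivalence study
fit <- lm(log(dist) ~ log(speed), data = cars)
res <- residuals(fit)     # raw residuals
h   <- hatvalues(fit)     # leverages
n   <- length(res)        # number of observations
m   <- length(coef(fit))  # number of estimated parameters

# Internally studentized: variance estimated from all n residuals
s2.int <- sum(res^2) / (n - m)
r.int  <- res / sqrt(s2.int * (1 - h))

# Externally studentized: variance estimated with observation i deleted
# (leave-one-out identity, no refitting required)
s2.ext <- ((n - m) * s2.int - res^2 / (1 - h)) / (n - m - 1)
r.ext  <- res / sqrt(s2.ext * (1 - h))

# Both agree with the functions of library stats
stopifnot(isTRUE(all.equal(unname(r.int), unname(rstandard(fit)))),
          isTRUE(all.equal(unname(r.ext), unname(rstudent(fit)))))

# Hypothetical helper: fences as the most extreme data still within
# m x IQR of the quartiles; 'type' selects the quartile definition
fences <- function(x, m, type = 7) {
  q   <- quantile(x, probs = c(0.25, 0.75), type = type, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = min(x[x >= q[1] - m * iqr]),
    upper = max(x[x <= q[2] + m * iqr]))
}

f   <- fences(r.ext, m = 1.5)           # multiplier chosen for illustration
out <- r.ext[r.ext < f["lower"] | r.ext > f["upper"]]
print(f)
print(out)                              # residuals flagged as outliers

# The quartile definition matters: compare types 2, 4, 6, and 7
print(sapply(c(2, 4, 6, 7), function(type)
  quantile(r.ext, probs = 0.25, type = type, names = FALSE)))
```

Calling fences() with a larger m widens the fences and flags fewer residuals, whereas a different type may shift the quartiles and hence the fences slightly.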

Helmut Schütz 2020-07-24
