Simulation of a Two-Group Parallel-Arm RCT with Interim Analyses

Zad Rafi

Setting up the Function
A Simulated Example
Examining the Results
Statistical Environment
Article Citation

Recently Andrew Althouse informed me that he was going to simulate a two-group parallel-arm randomized trial with interim analyses using the rpact R package, so I offered to also help in constructing the R code to do so. He already has a number of R scripts on his GitHub repo for doing similar simulations, which can be viewed here and a number of tweets explaining these simulations. For this example, his goal was to simulate a trial where the outcome was binary and the probability of death for each group could be tuned in addition to:

the total number of participants

the number of interim analyses

the schedule of the interim analyses

the group-sequential design used

along with the usual trial analysis parameters such as:

the \(\alpha\)-level

the type of test (1-sided vs. 2-sided).

The goal was to be able to produce a table of various statistics such as:

the odds ratio

the confidence limits

the \(P\)-value

the number of successes

for each of the interim analyses specified.

The function below is a reflection of our efforts to do so, and also returns several plots from the rpact package for the design that is chosen along with a plot comparing the design to other designs. In order to get similar results, you will need to load the R function first, and then simply enter the proper inputs. While there may be more efficient ways to write the code, for example using lapply() instead of for loops, we have chosen not to do so, and we have also tried to minimize the number of R packages necessary for the function to work but the following will be required:

rpact

stringr

You can quickly install and load both using:

req_packs <- c("rpact", "stringr", "ggplot2")
install.packages(req_packs)
lapply(req_packs, library, character.only = TRUE)

Setting up the Function

#' @title Simulation of a Two-Group Parallel-Arm Trial With Interim Analyses
#' @docType Custom function for simulation from the rpact package
#' @author Andrew Althouse with edits by Zad Rafi
#' NOTE: If you want to confirm "type 1 error" under different stopping rules,
#' make death = in the two treatment arms (e.g. no treatment effect)
#' NOTE: I have set this one up to test the power for a treatment that would reduce mortality
#' from 40% in control group (1) to 30% in treatment group (2)
#' NOTE: Trial Design Parameters - Part 1
#' Here we will specify the basics: total N patients to enroll, and death rate for each treatment arm
#' NOTE: Trial Design Parameters - Part 2
#' Here we will define the interim analysis strategy and stopping rules
#' For this trial we will include provisions for efficacy stopping only (no pre-specified futility stopping)
#' We will use the rpact package to compute the stopping/success thresholds at the interim and final analysis
#' NOTE: Required packages: rpact and stringr
#' @param nSims # The number of simulations, the default is 1000
#' @param nPatients # here is where you specify the planned max number of patients you want included in each RCT
#' @param death1 # here is where you specify the event rate for patients receiving 'treatment 1' in these trial
#' @param death2 # here is where you specify the event rate for patients receiving 'treatment 2' in these trials
#' @param nLooks # here is where you put the number of looks that will take place (INCLUDING the final analysis)
#' @param analyses_scheduled # schedule of interim analyses
#' @param sided # Whether the test is 1-sided or 2-sided
#' @param alpha # Specified alpha level, the default is 0.05
#' @param informationRates #
#' @param trials # The total number of trials you wish to load in the table results. 
#' @param typeOfDesign # The type of design.
#' @param seed # Argument to set the seed for the simulations
#' @return list of dataframes and plots
#' @example See below this code block
interim_sim <- function(nPatients = 1000, death1 = 0.4, death2 = 0.3, 
                        nLooks = 4, analyses_scheduled = c(0.25, 0.50, 0.75, 1),
                        sided = 1, alpha = 0.05, 
                        informationRates = analyses_scheduled, 
                        typeOfDesign = "asOF", nSims = 1000, trials = 10,
                        seed = 1031) {

efficacy_thresholds <- numeric(nLooks)
design <- getDesignGroupSequential(
  sided = sided, alpha = alpha,
  informationRates = analyses_scheduled,
  typeOfDesign = typeOfDesign)
design_2 <- getDesignGroupSequential(typeOfDesign = "P") # Pocock
design_3 <- getDesignGroupSequential(typeOfDesign = "asP") # Alpha-spending Pocock
design_4 <- getDesignGroupSequential(typeOfDesign = "OF") # O'Brien-Fleming
designSet <- getDesignSet(designs = c(design, design_2,
                                      design_3, design_4),
                          variedParameters = "typeOfDesign")

RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(seed)
for (j in 1:nLooks) {
  efficacy_thresholds[j] <- design$stageLevels[j]
}

analyses_nPatients <- analyses_scheduled * nPatients
efficacy_thresholds
pb <- txtProgressBar(min = 0, max = nSims, initial = 0, style = 3)
trialnum <- numeric(nSims)
or <- data.frame(matrix(ncol = nLooks, nrow = nSims))
lcl <- data.frame(matrix(ncol = nLooks, nrow = nSims))
ucl <- data.frame(matrix(ncol = nLooks, nrow = nSims))
pval <- data.frame(matrix(ncol = nLooks, nrow = nSims))
success <- data.frame(matrix(ncol = nLooks, nrow = nSims))

strings <- c("OR_%d", "LCL_%d", "UCL_%d",
             "Pval_%d", "Success_%d")

colnames(or) <- sprintf("OR_%d", (1:nLooks))
colnames(lcl) <- sprintf("LCL_%d", (1:nLooks))
colnames(ucl) <- sprintf("UCL_%d", (1:nLooks))
colnames(pval) <- sprintf("Pval_%d", (1:nLooks))
colnames(success) <- sprintf("Success_%d", (1:nLooks))
overall_success <- numeric(nSims)
df <- data.frame(trialnum, or, lcl, ucl,
                 pval, success, overall_success)

RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(seed)
time <- system.time(
for (i in 1:nSims) {
  trialnum[i] <- i
  pid <- seq(1, nPatients, by = 1)
  treatment <- rep(1:2, nPatients / 2)
  deathprob <- numeric(nPatients)
  deathprob[treatment == 1] <- death1
  deathprob[treatment == 2] <- death2
  death <- rbinom(nPatients, 1, deathprob)
  trialdata <- data.frame(cbind(pid, treatment, death))

  for (j in 1:nLooks) {
    analysisdata <- subset(trialdata, pid <= analyses_nPatients[j])
    model <- glm(death ~ treatment,
                 family = binomial(link = "logit"),
                 data = analysisdata)
    or[i, j] <- exp(summary(model)$coefficients[2])
    lcl[i, j] <- exp(confint.default((model))[2, 1])
    ucl[i, j] <- exp(confint.default((model))[2, 2])
    pval[i, j] <- summary(model)$coefficients[8]
    success[i, j] <- ifelse(or[i, j] < 1 & pval[i, j] < efficacy_thresholds[j], 1, 0)
  }
  overall_success[i] <- 0
  for (j in 1:nLooks) {
    if (success[i, j] == 1) {
      overall_success[i] <- 1
    }
  }
    setTxtProgressBar(pb, i)
})

df <- data.frame(trialnum, or, lcl, ucl, pval,
                 success, overall_success)
simulation_results <- data.frame(matrix(vector(),
  nrow = nPatients, ncol = (length(df))))
colnames(simulation_results) <- c("trialnum", (do.call(
  rbind,
  lapply(1:length(strings),
    FUN = function(j) {
      (do.call(rbind, lapply(1:nLooks,
        FUN = function(i) ((sprintf(strings, i)))
      )[]))[, j]
    }
  )
)), "overall_success")
simulation_results[intersect(names(df), names(simulation_results))] <- (df[intersect(
  names(df),
  names(simulation_results)
)])
simulation_results <- as.data.frame(simulation_results)

outputs <- simulation_results[, c(-1, -length(simulation_results))]

cols <- as.character(1:nLooks)
rows <- as.character(1:length(strings))
interim <- matrix(
  nrow = length(strings), ncol = nLooks,
  dimnames = list((rows), (cols)), 
  data = rep(0, (length(strings) * nLooks)))

for (i in cols) {
  interim[, i] <- str_subset(colnames(outputs), i)
}

colnames(interim) <- sprintf("Interim_Look_%d", (1:nLooks))
colnames(simulation_results)[1] <- c("TrialNum")
colnames(simulation_results)[length(simulation_results)] <- c("Overall_Success")
simulation_results_trials <- head(simulation_results, trials)
results <- list(
  summary(design),
  summary(time),
  plot(design, 1) +
    theme_less(),
  plot(designSet, type = 1) +
    theme_less(),
  table(overall_success),
  simulation_results_trials)

names(results) <- c(
  "Design Summary", "Time to complete simulation", 
  "Main Design Plot", "Plot of Various Designs", 
  "Table of Overall Success", "Simulation Results")

return(results)
}

A Simulated Example

results <- interim_sim(nPatients = 1000, death1 = 0.4, death2 = 0.3, nLooks = 4,
                       analyses_scheduled = c(0.25, 0.50, 0.75, 1),
                       sided = 1, alpha = 0.025, typeOfDesign = "asOF",
                       informationRates =  c(0.25, 0.50, 0.75, 1),
                       nSims = 1000, trials = 20, seed = 1031)

Examining the Results

results[1:5]
#> $`Design Summary`
#> 
#> $`Time to complete simulation`
#>    user  system elapsed 
#>   6.892   0.611   7.496 
#> 
#> $`Main Design Plot`
#> NULL
#> 
#> $`Plot of Various Designs`
#> NULL
#> 
#> $`Table of Overall Success`
#> overall_success
#>   0   1 
#> 134 866

To examine the results, I used the kableExtra package, though this is not necessary, and simply using the following script will suffice

table_results <- results[[6]]
View(table_results)

	Interim_Look_1					Interim_Look_2					Interim_Look_3					Interim_Look_4
TrialNum	OR_1	LCL_1	UCL_1	Pval_1	Success_1	OR_2	LCL_2	UCL_2	Pval_2	Success_2	OR_3	LCL_3	UCL_3	Pval_3	Success_3	OR_4	LCL_4	UCL_4	Pval_4	Success_4	Overall_Success
1	0.936	0.565	1.55	0.797	0	0.868	0.601	1.255	0.453	0	1.036	0.767	1.398	0.818	0	0.891	0.687	1.157	0.387	0	0
2	0.87	0.519	1.459	0.598	0	0.757	0.525	1.092	0.136	0	0.745	0.551	1.007	0.056	0	0.721	0.555	0.936	0.014	1	1
3	0.59	0.35	0.996	0.048	0	0.555	0.382	0.806	0.002	0	0.676	0.5	0.916	0.011	0	0.686	0.529	0.89	0.005	1	1
4	0.652	0.385	1.103	0.111	0	0.671	0.461	0.976	0.037	0	0.725	0.533	0.988	0.042	0	0.75	0.576	0.976	0.032	0	0
5	0.674	0.398	1.142	0.143	0	0.763	0.526	1.108	0.155	0	0.835	0.617	1.132	0.246	0	0.764	0.587	0.994	0.045	0	0
6	0.525	0.309	0.892	0.017	0	0.622	0.427	0.907	0.014	0	0.668	0.491	0.909	0.01	0	0.625	0.479	0.815	0.001	1	1
7	0.742	0.433	1.269	0.275	0	0.769	0.525	1.125	0.175	0	0.76	0.558	1.037	0.083	0	0.719	0.551	0.938	0.015	1	1
8	0.661	0.394	1.108	0.116	0	0.437	0.301	0.632	0	1	0.535	0.397	0.721	0	1	0.551	0.425	0.714	0	1	1
9	1.288	0.76	2.183	0.348	0	1.113	0.769	1.612	0.571	0	0.87	0.645	1.173	0.361	0	0.771	0.596	0.999	0.049	0	0
10	0.783	0.466	1.316	0.356	0	0.701	0.484	1.015	0.06	0	0.655	0.482	0.89	0.007	1	0.614	0.472	0.798	0	1	1
11	0.577	0.344	0.968	0.037	0	0.639	0.444	0.921	0.016	0	0.56	0.415	0.754	0	1	0.577	0.446	0.748	0	1	1
12	0.538	0.315	0.919	0.023	0	0.629	0.434	0.913	0.015	0	0.576	0.426	0.78	0	1	0.561	0.431	0.731	0	1	1
13	0.79	0.474	1.315	0.364	0	0.763	0.531	1.095	0.142	0	0.789	0.588	1.06	0.115	0	0.791	0.611	1.025	0.076	0	0
14	0.535	0.318	0.902	0.019	0	0.69	0.477	0.999	0.049	0	0.583	0.43	0.792	0.001	1	0.621	0.477	0.809	0	1	1
15	0.695	0.419	1.152	0.158	0	0.744	0.516	1.073	0.114	0	0.73	0.541	0.985	0.04	0	0.64	0.493	0.83	0.001	1	1
16	1.116	0.657	1.896	0.685	0	0.763	0.526	1.108	0.155	0	0.69	0.509	0.935	0.017	0	0.69	0.532	0.896	0.005	1	1
17	0.533	0.32	0.886	0.015	0	0.585	0.407	0.839	0.004	0	0.679	0.504	0.913	0.011	0	0.622	0.48	0.806	0	1	1
18	0.613	0.364	1.033	0.066	0	0.692	0.478	1	0.05	0	0.78	0.577	1.055	0.107	0	0.705	0.543	0.914	0.008	1	1
19	0.841	0.502	1.409	0.511	0	0.768	0.531	1.11	0.16	0	0.685	0.506	0.927	0.014	0	0.665	0.511	0.864	0.002	1	1
20	0.275	0.159	0.476	0	1	0.449	0.31	0.651	0	1	0.498	0.369	0.672	0	1	0.498	0.384	0.646	0	1	1
Abbreviations:
TrialNum: Trial Number \| OR: Odds Ratio \| LCL: Lower Confidence Level \| UCL: Upper Confidence Level \| Pval: P-value

Statistical Environment

The analyses were run on:

si <- sessionInfo()
print(si, RNG = TRUE, locale = TRUE)
#> R version 4.4.3 (2025-02-28)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sequoia 15.4
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> Random number generation:
#>  RNG:     L'Ecuyer-CMRG 
#>  Normal:  Inversion 
#>  Sample:  Rejection 
#>  
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#>  [1] splines   grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] cli_3.6.4             texPreview_2.1.0      tinytex_0.56          rmarkdown_2.29        brms_2.22.0          
#>  [6] bootImpute_1.2.2      knitr_1.50            boot_1.3-31           gtsummary_2.1.0       reshape2_1.4.4       
#> [11] ProfileLikelihood_1.3 ImputeRobust_1.3-1    gamlss_5.4-22         gamlss.dist_6.1-1     gamlss.data_6.0-6    
#> [16] mvtnorm_1.3-3         performance_0.13.0    summarytools_1.1.3    rpact_4.2.0           tidybayes_3.0.7      
#> [21] htmltools_0.5.8.1     Statamarkdown_0.9.2   car_3.1-3             carData_3.0-5         qqplotr_0.0.6        
#> [26] ggcorrplot_0.1.4.1    mitml_0.4-5           pbmcapply_1.5.1       Amelia_1.8.3          Rcpp_1.0.14          
#> [31] blogdown_1.21.1       doParallel_1.0.17     iterators_1.0.14      foreach_1.5.2         lattice_0.22-7       
#> [36] bayesplot_1.12.0      wesanderson_0.3.7     VIM_6.2.2             colorspace_2.1-1      here_1.0.1           
#> [41] progress_1.2.3        loo_2.8.0             mi_1.1                Matrix_1.7-3          broom_1.0.8          
#> [46] yardstick_1.3.2       svglite_2.1.3         Cairo_1.6-2           cowplot_1.1.3         mgcv_1.9-3           
#> [51] nlme_3.1-168          xfun_0.52             broom.mixed_0.2.9.6   reticulate_1.42.0     kableExtra_1.4.0     
#> [56] posterior_1.6.1       checkmate_2.3.2       parallelly_1.43.0     miceFast_0.8.5        randomForest_4.7-1.2 
#> [61] missForest_1.5        miceadds_3.17-44      quantreg_6.1          SparseM_1.84-2        MCMCpack_1.7-1       
#> [66] MASS_7.3-65           coda_0.19-4.1         latex2exp_0.9.6       rstan_2.32.7          StanHeaders_2.32.10  
#> [71] lubridate_1.9.4       forcats_1.0.0         stringr_1.5.1         dplyr_1.1.4           purrr_1.0.4          
#> [76] readr_2.1.5           tibble_3.2.1          ggplot2_3.5.2         tidyverse_2.0.0       ggtext_0.1.2         
#> [81] concurve_2.8.0        showtext_0.9-7        showtextdb_3.0        sysfonts_0.8.9        future.apply_1.11.3  
#> [86] future_1.40.0         tidyr_1.3.1           magrittr_2.0.3        mice_3.17.0           rms_8.0-0            
#> [91] Hmisc_5.2-3          
#> 
#> loaded via a namespace (and not attached):
#>   [1] nnet_7.3-20             TH.data_1.1-3           vctrs_0.6.5             digest_0.6.37           png_0.1-8              
#>   [6] shape_1.4.6.1           proxy_0.4-27            magick_2.8.6            fontLiberation_0.1.0    withr_3.0.2            
#>  [11] ggpubr_0.6.0            survival_3.8-3          doRNG_1.8.6.2           emmeans_1.11.0          MatrixModels_0.5-4     
#>  [16] systemfonts_1.2.2       ragg_1.4.0              zoo_1.8-14              V8_6.0.3                ggdist_3.3.2           
#>  [21] DEoptimR_1.1-3-1        Formula_1.2-5           prettyunits_1.2.0       rematch2_2.1.2          httr_1.4.7             
#>  [26] rstatix_0.7.2           globals_0.16.3          rstudioapi_0.17.1       extremevalues_2.4.1     pan_1.9                
#>  [31] generics_0.1.3          base64enc_0.1-3         curl_6.2.2              mitools_2.4             desc_1.4.3             
#>  [36] xtable_1.8-4            svUnit_1.0.6            pracma_2.4.4            evaluate_1.0.3          hms_1.1.3              
#>  [41] glmnet_4.1-8            bookdown_0.42           lmtest_0.9-40           robustbase_0.99-4-1     matrixStats_1.5.0      
#>  [46] svgPanZoom_0.3.4        class_7.3-23            pillar_1.10.2           caTools_1.18.3          compiler_4.4.3         
#>  [51] stringi_1.8.7           jomo_2.7-6              minqa_1.2.8             plyr_1.8.9              crayon_1.5.3           
#>  [56] abind_1.4-8             metadat_1.4-0           sp_2.2-0                mathjaxr_1.6-0          rapportools_1.2        
#>  [61] twosamples_2.0.1        sandwich_3.1-1          whisker_0.4.1           codetools_0.2-20        multcomp_1.4-28        
#>  [66] textshaping_1.0.0       bcaboot_0.2-3           openssl_2.3.2           flextable_0.9.7         bslib_0.9.0            
#>  [71] QuickJSR_1.7.0          e1071_1.7-16            gridtext_0.1.5          lme4_1.1-37             fs_1.6.5               
#>  [76] itertools_0.1-3         listenv_0.9.1           Rdpack_2.6.4            pkgbuild_1.4.7          estimability_1.5.1     
#>  [81] ggsignif_0.6.4          tzdb_0.5.0              pkgconfig_2.0.3         tools_4.4.3             cachem_1.1.0           
#>  [86] rbibutils_2.3           viridisLite_0.4.2       DBI_1.2.3               numDeriv_2016.8-1.1     fastmap_1.2.0          
#>  [91] scales_1.3.0            sass_0.4.9              officer_0.6.8           opdisDownsampling_1.0.1 insight_1.1.0          
#>  [96] rpart_4.1.24            farver_2.1.2            reformulas_0.4.0        survminer_0.5.0         yaml_2.3.10            
#>  [ reached getOption("max.print") -- omitted 56 entries ]

Article Citation

American Statistical Association

Rafi, Zad. (2021).'Simulation of a Two-Group Parallel-Arm RCT with Interim Analyses'. 
Less Likely. Available at /statistics/interim-analysis/.

American Medical Association

Rafi, Zad. 'Simulation of a Two-Group Parallel-Arm RCT with Interim Analyses'. Less Likely. 
/statistics/interim-analysis/. 
Published February 1, 2021.

Help support the website!

statistical reporting statistical science statistical workflow data management

Simulation of a Two-Group Parallel-Arm RCT With Interim Analyses

Setting up the Function

A Simulated Example

Examining the Results

Statistical Environment

Article Citation

See also: