NCC: An R-package for analysis and simulation of platform trials with non-concurrent controls

Platform trials evaluate the efficacy of multiple treatments, allowing for late entry of the experimental arms and enabling efficiency gains by sharing controls. The power of individual treatment-control comparisons in such trials can be improved by utilizing non-concurrent controls (NCC) in the analysis. We present the R-package NCC for the design and analysis of platform trials using non-concurrent controls. NCC allows for simulating platform trials and evaluating the properties of analysis methods that make use of non-concurrent controls in a variety of settings. We describe the main NCC functions and show how to use the package to simulate and analyse platform trials by means of specific examples.


Motivation and significance
In recent years, there has been increasing interest in complex clinical trials to accelerate drug development [1,2,3,4]. Platform trials evaluate the efficacy of several experimental treatments simultaneously, usually compared to a shared control group, and are more flexible than multi-arm trials because arms can enter the trial while it is ongoing. Such designs make it possible to incorporate late-emerging treatments into the study through the common infrastructure, thus speeding up the drug evaluation process. Sharing the control arm in platform trials gives rise to a particularity with respect to the control data. For a given experimental arm, the concurrent controls (CC) are trial participants allocated to the control group while the experimental treatment is active in the platform, hence with a strictly positive probability of being randomized to the respective treatment arm [5]. In contrast, the non-concurrent controls (NCC) are participants recruited prior to that treatment arm entering the platform. Over the last few years, there has been a lively discussion on whether and how to use the non-concurrent controls together with the concurrent controls in the analysis of platform trials [6]. On the one hand, it may be beneficial to use both CC and NCC, as the efficiency of the trial can be increased and the total sample size reduced. On the other hand, using NCC can lead to biased estimates and loss of type I error control [7].
Recently, approaches to incorporate NCC while adjusting for potential time trends to control the type I error rate were proposed [8]. Frequentist methods adjust for temporal changes by adding time as a covariate to the regression model [9,10]. Bayesian approaches proposed in the context of non-concurrent and historical controls include the Time Machine approach and the Meta-Analytic-Predictive (MAP) Prior approach. The Time Machine is based on a Bayesian generalized linear model that smooths the control response over time [11]. The MAP Prior approach, originally proposed in the context of historical controls [12,13,14], is a Bayesian down-weighting method that estimates the control response in the trial by using a prior distribution derived from the non-concurrent data.
In complex designs, the use of software to design the trials and investigate their operating characteristics via simulations has become paramount [15]. Examples of commercial software are FACTS [16], EAST [17] and Solara [18], while open-source packages include OCTOPUS [19], SIMPLE [20], MAMS [21], gsDesign [22] and rpact [23]. However, to the best of our knowledge, no software, commercial or open-source, is available that implements the proposed methodologies to incorporate NCC. Moreover, there is a need for statistical tools that allow for assessing the properties of the methods when utilising NCC, and the risk of bias in the estimates, under a range of situations, including time trends.
We introduce the R package NCC [24], which implements existing methods from the literature to incorporate NCC in treatment-control comparisons of platform trials and enables simulation of flexible platform trials with customizable features, such as the timepoints at which arms enter and the number of arms to be evaluated. This helps users to assess trial characteristics and, particularly, to evaluate the performance of methods that include NCC across various scenarios.

Background
We consider a platform trial design with a flexible number of treatment arms allowed to enter the platform sequentially. The duration of the trial is divided into so-called periods, with a new period starting every time an arm joins or exits the trial (see Figure 1). In this package, we implemented the following methods:
• Frequentist model-based approaches [9,10], which adjust for time trends by adding time as a covariate to the respective linear or logistic regression model (in the form of a fixed effect, random effect, polynomial spline or piecewise polynomial). To estimate the effect of time, these approaches fit the model to the data of all patients recruited until the arm under study leaves the platform.
• The Bayesian Time Machine [11], which uses a hierarchical Bayesian model and includes a covariate adjustment for time (separating the trial into buckets of pre-defined size). This method also takes into account the data of all patients recruited until recruitment to the investigated arm is completed and the arm leaves the trial. It provides a smoothed estimate of the control response rate over time using a second-order Bayesian normal dynamic linear model.
• The Meta-Analytic-Predictive (MAP) prior approach [12,13], which derives a prior distribution for the control response in the concurrent periods from the non-concurrent control data while accounting for the between-period heterogeneity by the use of a hierarchical model.
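The period structure described above can be made concrete with a small base-R sketch. The entry and exit times are hypothetical, and the helper period_of() is introduced only for illustration; neither is part of the NCC package.

```r
# Hypothetical entry and exit times (in number of recruited participants)
# for three experimental arms; the control arm runs throughout.
entry <- c(arm1 = 0, arm2 = 100, arm3 = 250)
exit  <- c(arm1 = 200, arm2 = 300, arm3 = 450)

# A new period starts whenever an arm joins or leaves the platform,
# so the period boundaries are the sorted unique entry and exit times.
breaks    <- sort(unique(c(entry, exit)))
n_periods <- length(breaks) - 1

# Map a recruitment time t to its period index (illustrative helper).
period_of <- function(t) findInterval(t, breaks, rightmost.closed = TRUE)

# Experimental arms that are active in each period.
active <- lapply(seq_len(n_periods), function(s) {
  mid <- (breaks[s] + breaks[s + 1]) / 2
  names(entry)[entry <= mid & mid < exit]
})

# Periods that start before arm 3 enters: controls recruited in these
# periods are non-concurrent for arm 3.
ncc_periods_arm3 <- which(breaks[-length(breaks)] < entry["arm3"])
```

With these inputs the trial has five periods, and the controls from the first three periods are non-concurrent for arm 3, while those from the last two are concurrent.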
For a detailed description of the methods, we refer the reader to the corresponding methodological articles cited above. Table 2 outlines the methods implemented together with the corresponding R function and the main reference. Here, we focus on describing the usage of these methods employing the NCC package.
In Section 3, we describe the software, distinguishing between functions for data simulation, functions for data analysis, functions for data visualisation and wrapper functions. In Section 4, we present examples to illustrate the usage of the main functions. We finish the article with conclusions in Section 5.

Software description
The NCC package is implemented in R [25] and provides functions to simulate and analyse platform trials with continuous or binary endpoints. For successful installation of the NCC package, the external JAGS library [26] needs to be downloaded and installed first. The NCC package can then be installed from either CRAN (Comprehensive R Archive Network) or GitHub using the following commands:

> # devtools::install_github("pavlakrotka/NCC")
> install.packages("NCC")
> library(NCC)

The package has an accompanying website with additional explanations and short tutorials: https://pavlakrotka.github.io/NCC/.
The NCC package can be applied in trials with continuous or binary endpoints, and consists of 34 functions. Functions with the suffix _cont are for simulation and analysis of trials with continuous endpoints, while functions with the suffix _bin are for binary endpoints. The NCC functions can be grouped into three main groups according to their functionality: data simulation, analysis, and visualization and wrappers. Figure 2 outlines the package structure.
The functions datasim_bin() and datasim_cont() simulate patient data from platform trials. The analysis functions include simple approaches (naive pooling or separate analysis [27]), frequentist model-based methods with adjustments for time using fixed or random effects or polynomial functions, and Bayesian approaches. This article focuses on the functions fixmodel_bin(), MAPprior_bin() and timemachine_bin() for testing treatment efficacy compared to a control using CC and NCC data in trials with binary endpoints, with analogous functions ending in _cont for continuous endpoints. Other functions such as plot_trial() and sim_study_par() visualise platform trial data and perform simulation studies, respectively.
In what follows, we describe the usage and features of these functions, including an example. Most functions in the NCC package use common arguments. Table 2 summarises the functions described in this article, and Table 3 provides a brief description of the main arguments of these functions and their expected form. We focus mainly on the functions for binary endpoints, but the package website also details the remaining functions. In addition, further explanations regarding the methods and underlying assumptions (e.g., prior distributions in Bayesian methods) can be found in the NCC package manual.

Data simulation
Platform trials with a binary outcome are simulated using datasim_bin(). The function takes several arguments including the number of experimental treatment arms (num_arms), their sample size (n_arm), timings of arms entering the trial (d), treatment effects in terms of odds ratios (OR), and control response (p0). Sample sizes in each experimental arm are assumed to be equal. Participants are indexed by entry order, assuming that at each time unit exactly one participant is recruited and that the time of recruitment and the time of observation of the response are equal. Participants are assigned to the arms according to block randomization (with block sizes equal to period_blocks times the number of active arms in that period) using an allocation ratio of 1:1:...:1 in each period. The function simulates trial data in the presence of time trends. The time trend pattern can be specified by means of the argument trend, choosing from the options linear, stepwise, inverted-u (with a peak at time N_peak, which then needs to be specified) and seasonal (which requires the additional argument n_wave for the number of cycles), while the strength of the trend is indicated by the argument lambda; for the linear trend, for example, lambda is the slope. For more details, see the description in the corresponding functions. The argument full specifies whether the output is given solely in the form of a data frame with the trial data (full=FALSE), or whether the full output is provided in the form of a list, including the trial data and additional information (full=TRUE). Finally, the argument check controls input validation: if check=TRUE, the function returns informative error messages for invalid input.
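To illustrate what the strength parameter means for a linear trend, the following base-R sketch shifts the control response on the logit scale as recruitment progresses. The parametrisation used here (a trend rising from 0 for the first participant to lambda for the last) and all numeric values are assumptions made for illustration; the exact definitions used by datasim_bin() are given in the package documentation.

```r
p0     <- 0.7    # control response probability at the start of the trial
lambda <- 0.15   # strength of the trend (illustrative value)
N      <- 500    # total number of participants (illustrative value)
j      <- seq_len(N)

# Assumed linear trend on the logit scale: 0 for j = 1, lambda for j = N.
trend_j <- lambda * (j - 1) / (N - 1)

# Control response probability for participant j under the trend.
p_control <- plogis(qlogis(p0) + trend_j)

range(p_control)
```

Under this assumption, a positive lambda makes later control participants more likely to respond than earlier ones, which is the mechanism by which naive pooling of non-concurrent controls can bias a treatment-control comparison.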
By default, the function returns a data frame with the simulated trial data containing the columns:
• j - participant recruitment index
• response - response for participant j
• treatment - indicator of the treatment participant j was allocated to
• period - indicator of the period in which participant j was recruited
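A call to datasim_bin() along the lines described above might look as follows. This is a sketch under stated assumptions: the numeric values are illustrative, and the layout of lambda (one value per arm, control included) reflects our reading of the package documentation and should be checked against ?datasim_bin. The call is guarded so that the snippet also runs where NCC or JAGS is not installed.

```r
trial_data <- NULL
if (requireNamespace("NCC", quietly = TRUE)) {
  library(NCC)
  set.seed(1)
  trial_data <- datasim_bin(
    num_arms = 3,               # three experimental arms
    n_arm    = 100,             # sample size per experimental arm
    d        = c(0, 100, 250),  # entry times of arms 1, 2 and 3
    p0       = 0.7,             # control response rate
    OR       = rep(1.8, 3),     # odds ratio of each arm versus control
    lambda   = rep(0.15, 4),    # trend strength (assumed: control + 3 arms)
    trend    = "stepwise"       # time trend pattern
  )
  print(head(trial_data))       # columns j, response, treatment, period
}
```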

Analysis approaches
The main analysis approaches implemented in the NCC package are the frequentist model-based approach, the Time Machine, and the MAP prior approach. The arguments common to all analysis functions are data, the data frame with the trial data, consisting of columns named "response", "treatment" and "period"; arm, the indicator of the experimental treatment arm to be compared to the control; and alpha, the one-sided significance level for the frequentist methods or the decision boundary for the Bayesian approaches.
The MAP approach requires further arguments to define the type of MAP approach to be used: opt (either 1 or 2) specifies whether the MAP prior treats the non-concurrent control data as coming from one source (opt=1) or from multiple sources, here periods (opt=2), in the hierarchical model; robustify indicates whether the robustified MAP approach [12] is to be used; prior_prec_tau specifies the dispersion parameter of the half-normal prior for the between-period heterogeneity; and prior_prec_eta specifies the dispersion parameter of the normal prior for the log-odds of the controls. Some further arguments (not shown in the code example) set up the underlying JAGS model [26].
In the Time Machine, the input arguments specify the precision parameters in the normal prior distributions for the control response (prec_eta) and the treatment effect (prec_theta), as well as the parameters a and b of the Gamma prior distribution regarding the time effect (tau_a and tau_b). Furthermore, the argument bucket_size defines the length of the time bucket to be used to adjust for the time effect.
The functions perform the respective analysis of the given dataset to compare the efficacy of a specific treatment against control, thus testing the null hypothesis $H_0: \log(\mathrm{OR}_{arm}) \le 0$ against the one-sided alternative $H_1: \log(\mathrm{OR}_{arm}) > 0$ for the treatment given by arm. To test $H_0$, the frequentist model-based and the Time Machine approaches take into account all trial data until the treatment arm under study leaves the trial (i.e., including even data from unfinished arms that joined the platform up to the final analysis of the given treatment arm). The MAP approach uses all available control data and the data of the evaluated treatment arm to make the comparison.
The output of the analysis functions is a list containing the one-sided p-value, the estimated treatment effect and the $(1 - 2\alpha) \cdot 100\%$ confidence interval (for the Bayesian approaches: the posterior probability of $H_0$, the posterior mean of the effect and the credible interval), and an indicator of whether the null hypothesis was rejected. Functions for frequentist model-based approaches additionally output the fitted model.

Trial data visualization and wrapper functions
The visualization function plot_trial() takes as its argument (treatments) a vector with indicators of the assigned arm for each participant, ordered by time, and outputs a plot of the trial progress over time.
The main wrapper function is sim_study_par(), which makes it possible to run simulation studies efficiently using parallel computing. The code is parallelized at the replication level, i.e. replications of one scenario are distributed over the available cores. Using this function requires creating a data frame with the desired simulation scenarios beforehand, which is then used as input to the function (argument scenarios) as follows:

> sim_study_par(nsim, scenarios, arms, models = c("fixmodel", "sepmodel", "poolmodel"), endpoint, perc_cores = 0.9)

The remaining arguments specify how many times each scenario is replicated (nsim), the treatment arms that will be evaluated (arms), the considered analysis approaches (models), the type of endpoint (endpoint) and the approximate percentage of available cores to be used for the simulations (perc_cores). The output of sim_study_par() is a data frame with all considered scenarios and corresponding results, that is, the probability to reject the null hypothesis, the bias, and the mean squared error (MSE) of the treatment effect estimates for each evaluated treatment arm and each considered analysis method.
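The scenario data frame mentioned above could be set up as in the following sketch. The column naming convention (d1, d2, ..., OR1, OR2, ..., lambda0, lambda1, ..., plus alpha and ncc) and the value "bin" for endpoint are assumptions based on our reading of the package documentation and should be verified against ?sim_study_par; all numeric values are illustrative. The call itself is guarded so that the snippet runs even where NCC is not installed.

```r
# Two scenarios under the global null hypothesis (all odds ratios equal
# to 1): no time trend, and a linear trend of strength 0.15 in all arms.
lambda_values <- c(0, 0.15)
scenarios <- data.frame(
  num_arms = 3, n_arm = 100,
  d1 = 0, d2 = 100, d3 = 250,          # entry times of the three arms
  period_blocks = 2,
  p0 = 0.7,
  OR1 = 1, OR2 = 1, OR3 = 1,
  lambda0 = lambda_values, lambda1 = lambda_values,
  lambda2 = lambda_values, lambda3 = lambda_values,
  trend = "linear",
  alpha = 0.025,
  ncc = TRUE,
  stringsAsFactors = FALSE
)

sim_results <- NULL
if (requireNamespace("NCC", quietly = TRUE)) {
  library(NCC)
  # Small nsim to keep the run short; real studies need many more replications.
  sim_results <- sim_study_par(
    nsim = 20, scenarios = scenarios, arms = 3,
    models = c("fixmodel", "sepmodel", "poolmodel"),
    endpoint = "bin", perc_cores = 0.5
  )
}
```

Because all odds ratios are set to 1 here, the rejection probability reported for arm 3 estimates the type I error rate of each method with and without a time trend.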

Illustrative Examples
Assume a platform trial with a shared control and three experimental arms entering the trial sequentially. When arm 3 ends, we want to evaluate its efficacy compared to the control. To increase the precision of the treatment effect estimate, we want to make use of the NCC data. Suppose that the data of such a hypothetical trial are given by a data frame, trial_data,

> head(trial_data)
j response treatment period

where the patient index is given in the first column, followed by the binary responses, the treatment arm indicator and finally the period allocation. We then run

> plot_trial(trial_data$treatment)
whose output (Figure 3) visualises the entry and exit of arms over time as well as the overlaps between arms.
To compare the efficacy of treatment 3 against control, we first consider a frequentist model that adjusts for time trends. Using fixmodel_bin(), we fit a logistic regression that includes the period as a categorical covariate to compare arm 3 against control, utilising NCC. To do so, the user can run

> fixmodel_bin(data=trial_data, arm=3, alpha=0.025)

The returned list contains the p-value (p_val) for testing the null hypothesis $H_0: \log(\mathrm{OR}_3) \le 0$, the estimated treatment effect (treat_effect) on the log scale (i.e., $\log(\mathrm{OR}_3)$) and the lower and upper limits of the $(1 - 2\alpha) \cdot 100\%$ confidence interval (lower_ci, upper_ci). The list also includes a binary indicator of whether p_val < alpha, i.e., whether the null hypothesis can be rejected at the specified significance level (reject_h0). In the considered case, the null hypothesis is rejected, indicating that treatment arm 3 is efficacious. Furthermore, the output includes the fitted logistic regression model (model), here omitted for simplicity. The fitted model can be analysed further using the conventional R functions for generalized linear models, such as summary(fixmodel_bin(data=trial_data, arm=3)$model).
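The kind of model described above can be sketched in base R alone. The toy data, the effect sizes and all object names below are hypothetical, and the Wald-type test and interval are an assumption about how such quantities are typically obtained from a logistic regression; this is not the internal implementation of fixmodel_bin().

```r
set.seed(123)

# Toy platform trial (hypothetical): control (0) recruits in all three
# periods, arm 1 in periods 1-2, arm 2 in periods 2-3, arm 3 in period 3.
design <- data.frame(
  treatment = c(0, 1, 0, 1, 2, 0, 2, 3),
  period    = c(1, 1, 2, 2, 2, 3, 3, 3)
)
toy <- design[rep(seq_len(nrow(design)), each = 100), ]

# Control rate 0.5, odds ratio 1.8 for every experimental arm, and a
# stepwise time trend of 0.2 per period on the logit scale.
lin_pred <- qlogis(0.5) + log(1.8) * (toy$treatment > 0) + 0.2 * (toy$period - 1)
toy$response <- rbinom(nrow(toy), size = 1, prob = plogis(lin_pred))

# Logistic regression with treatment and period as categorical covariates.
# Arm 3 is the last arm to finish here, so all data enter the model.
fit <- glm(response ~ factor(treatment) + factor(period),
           family = binomial, data = toy)

alpha <- 0.025
coefs <- summary(fit)$coefficients["factor(treatment)3", ]
est   <- unname(coefs["Estimate"])      # estimated log odds ratio of arm 3
se    <- unname(coefs["Std. Error"])

p_val     <- pnorm(est / se, lower.tail = FALSE)     # one-sided Wald test
ci        <- est + c(-1, 1) * qnorm(1 - alpha) * se  # (1 - 2*alpha)*100% interval
reject_h0 <- p_val < alpha
```

The period coefficients are estimated from the control arm and from arm 2, which spans periods 2 and 3; this is how the non-concurrent controls from periods 1 and 2 contribute to the comparison of arm 3 against control.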
If, however, a Bayesian approach that down-weights the NCC relative to the CC data is under consideration, one could specify the prior for the control arm from the non-concurrent control data by employing the MAP Prior approach. This analysis is performed in the NCC package with MAPprior_bin(). Modelling by means of the Time Machine is enabled through timemachine_bin(). In the outputs of the Bayesian approaches, the p-value (p_val) is given by the posterior probability that $\log(\mathrm{OR}_3) \le 0$. The treatment effect (treat_effect) refers to the posterior mean of $\log(\mathrm{OR}_3)$, and the lower and upper limits (lower_ci, upper_ci) to the limits of the $(1 - 2\alpha) \cdot 100\%$ credible interval for $\log(\mathrm{OR}_3)$. Finally, reject_h0 indicates whether the posterior probability given by p_val is less than alpha.
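The two Bayesian analyses might be called as in the sketch below. The argument names are those described in the text; the numeric values for the priors and the bucket size are placeholders chosen for illustration, not recommendations, and the simulated data set stands in for the hypothetical trial_data. The calls need NCC and JAGS, so they are guarded.

```r
map_res <- NULL
tm_res  <- NULL
if (requireNamespace("NCC", quietly = TRUE)) {
  library(NCC)
  set.seed(1)
  # Simulated stand-in for the trial data (illustrative settings).
  trial_data <- datasim_bin(num_arms = 3, n_arm = 100, d = c(0, 100, 250),
                            p0 = 0.7, OR = rep(1.8, 3),
                            lambda = rep(0.15, 4), trend = "stepwise")

  # MAP prior approach: non-concurrent periods treated as separate sources
  # (opt = 2), using the robustified version of the prior.
  map_res <- MAPprior_bin(data = trial_data, arm = 3, alpha = 0.025,
                          opt = 2, robustify = TRUE,
                          prior_prec_tau = 4, prior_prec_eta = 0.001)

  # Time Machine with time buckets of 25 participants.
  tm_res <- timemachine_bin(data = trial_data, arm = 3, alpha = 0.025,
                            prec_theta = 0.001, prec_eta = 0.001,
                            tau_a = 0.1, tau_b = 0.01, bucket_size = 25)

  # Posterior probabilities that log(OR_3) <= 0 under both approaches.
  print(c(MAP = map_res$p_val, TimeMachine = tm_res$p_val))
}
```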
An example of how to use the NCC package to perform a simulation study can be found in the supplementary material.

Impact and conclusions
The use of non-concurrent data has been a subject of discussion in recent years [7,3,6]. Modelling approaches have been proposed to include non-concurrent control (NCC) data in platform trials while dealing with time trends [9,10,29,11], and methods previously considered for incorporating historical controls have been suggested in this context [8,12,28,27]. The NCC package implements methods to incorporate non-concurrent controls in platform trials. Moreover, the user can simulate a large number of trials under different scenarios and evaluate the properties of the methods and their robustness to violations of the underlying assumptions. The package can help statisticians in industry and regulators to assess the use of NCC and provide a basis for discussing the trial design under different scenarios.
In this article, we have described the functionalities of the NCC R package for the design and analysis of platform trials using NCC. To our knowledge, this is the only R package with tools for assessing the properties of methods that incorporate NCC and for simulating platform trial data in the presence of time trends. The package is available on CRAN; examples and tutorials can be found on the website. In future work, we plan to implement allocation ratios other than equal allocation and to add interim analyses. Furthermore, we will consider extending the models to platform trials with survival endpoints and implementing them in the package.

Table 3: Main input arguments together with a short description of their purpose and type, and functions included in this article using these arguments. Unless stated otherwise, the parameters are assumed to be single values. Detailed explanations can be found at https://pavlakrotka.github.io/NCC/.

Figure 1: Platform trial over time. Trial with K arms and S periods. The x-axis refers to the number of participants recruited in the trial, also interpreted as time.
Argument | Description | Functions
… | … | MAPprior_bin(), MAPprior_cont()
prior_prec_eta | Double. Precision parameter of the normal hyperprior, the prior for the hyperparameter mean of the control response | MAPprior_bin(), MAPprior_cont()
prec_theta | Double. Precision of the prior regarding the treatment effect | timemachine_bin(), timemachine_cont()
prec_eta | Double. Precision of the prior regarding the control response | timemachine_bin(), timemachine_cont()
tau_a | Double. Parameter a of the Gamma distribution for the precision parameter τ in the model for the time trend | timemachine_bin(), timemachine_cont()
tau_b | Double. Parameter b of the Gamma distribution for the precision parameter τ in the model for the time trend | timemachine_bin(), timemachine_cont()
prec_a | Double. Parameter a of the Gamma distribution regarding the precision of the responses | timemachine_cont()
prec_b | Double. Parameter b of the Gamma distribution regarding the precision of the responses | timemachine_cont()
bucket_size | Integer. Number of participants per time bucket | timemachine_bin(), timemachine_cont()

Figure 2: Scheme of the NCC package functions by functionality.

Figure 3: Output of the function plot_trial().

Figure S1: Results of the simulation study. Type I error rate, bias and MSE of the estimates of $\log(\mathrm{OR}_4)$ for treatment arm 4 with respect to the strength of the time trend.