PLoS ONE
Optimal two-stage design of single arm Phase II clinical trials based on median event time test

Competing Interests: The authors have declared that no competing interests exist.

Article Type: research-article
Abstract

Phase II clinical trials aim to assess the therapeutic efficacy of a new drug. In the Phase II setting, therapeutic efficacy has often been quantified by a response rate, such as the overall response rate or a survival probability. However, investigators strongly desire to use survival time, the gold standard endpoint for confirmatory Phase III studies, when they set the primary objective of a Phase II study and test hypotheses based on median survival. We propose a median event time test that provides the sample size calculation and the decision rule for testing. The decision rule is simple and straightforward in that it compares the observed median event time to an identified threshold. Moreover, the test is extended to an optimal two-stage design for practice, which extends the idea of Simon’s optimal two-stage design to a survival endpoint. We investigate the performance of the proposed methods through simulation studies. The proposed methods are applied to redesign a trial based on median event time as an illustration, and practical strategies are given for applying the proposed methods.

Park and Hutson: Optimal two-stage design of single arm Phase II clinical trials based on median event time test

Introduction

The primary objective of Phase II clinical trials is to test whether a therapeutic intervention works in treating a disease or indication. In Phase II settings, therapeutic efficacy has often been quantified by a response rate, for example, the overall response rate, defined as the proportion of subjects who achieve a confirmed complete or partial response as determined by RECIST 1.1 [1], or the survival probability at a certain year, defined as the proportion of patients alive (and without recurrence) at that time after the start of treatment. In practice, Simon’s two-stage minimax and optimal designs are widely used for binary endpoints [2–7]. Simon’s designs allow early trial termination due to futility, while Fleming’s design allows early termination due to either futility or superiority [8, 9]. Successful Phase II results lead to confirmatory Phase III trials with more extensive development. Therefore, it is necessary to validate and use a surrogate endpoint for overall survival, which is the gold standard endpoint for Phase III studies. When the survival probability is used, which reduces the survival time to a binary endpoint, the results must be analyzed with caution. Some designs address incomplete follow-up for some subjects at the time of the interim analysis when assessing the survival probability [10–13]. Data augmentation can be used to address the timing of events for late-onset outcomes by imputing missing outcomes [14, 15].

Surprisingly, a recent FDA report presents 22 case studies in which the results of the early and confirmatory phases disagreed [16]. Some cases showed unexpected failures in the Phase III study after promising clinical outcomes in the Phase II study (e.g., iniparib). Therefore, some investigators strongly prefer a time-to-event endpoint for evaluating the therapeutic efficacy of a drug in Phase II trials; they are interested in the improvement of median survival over standard therapy. Finkelstein et al. [17], Sun et al. [18], Jung [19], Kwak and Jung [20], and Wu et al. [21] propose designs with a one-sample log-rank test, which compares the survival distribution under a prespecified null reference with that under the desired target. Comparing survival distributions in this way permits hypothesis testing for median survival only when the survival times are assumed to follow an exponential distribution. Wu [22] proposes a single-arm Phase II clinical trial design under a class of parametric cure models. Chu et al. [23] propose a design for immunotherapy trials with a random delayed treatment effect based on survival time.

In this paper, we propose new methods for single arm Phase II studies that provide practical strategies for testing whether a new drug improves median survival compared to the standard therapy. The proposed median event time test uses the distribution of the sample median to obtain the required sample size and the threshold for hypothesis testing at the target type I and II error rates. It provides a simple and straightforward decision rule that compares the observed median survival with the identified threshold at the end of the trial. The test can be easily implemented with our Shiny application and R code built along with the method development. Moreover, the median event time test is extended in a group sequential manner. The extension follows the idea of Simon’s optimal two-stage design, in that the expected sample size is minimized under the null hypothesis, but it considers a survival endpoint and assesses improvement with the median event time test. This allows us to monitor futility of the drug based on median survival time and stop the trial early, which makes the design more efficient and practical.

The rest of the paper is organized as follows. In the Methods section, we present the median event time test and propose an optimal two-stage design based on it. In the Results section, we present simulation studies and a trial example that illustrates the application of our methods. Lastly, we provide some comments in the Discussion and concluding remarks in the Conclusion.

Methods

Median event time test

Let Y1, …, Yn be random variables from an exponential distribution with mean μ. Then, the median ϕ equals μ log 2. Motivated by single arm Phase II studies whose primary endpoint is time-to-event, we formulate a hypothesis test based on the median event time with H0: ϕ = ϕ0 versus Ha: ϕ = ϕ1 for some values of ϕ0 and ϕ1. In practice, for an intervention study, we use the median time under the standard drug or therapy and the expected (improved) median time under the intervention to specify ϕ0 and ϕ1, respectively. Let $\hat{\phi}_n(y)$ be the sample median event time obtained from a sample of size n. Then, for any λ ≥ 0, the rejection region of the test is $\{y : \hat{\phi}_n(y) > \lambda\}$. For any α and β in the unit interval (0, 1), this is a level α test with power 1 − β if $\Pr\{\hat{\phi}_n(y) > \lambda \mid H_0\} = \alpha$ and $\Pr\{\hat{\phi}_n(y) \le \lambda \mid H_a\} = \beta$. The first conditional probability is the probability of rejecting the null hypothesis when it is true, so α is the type I error rate. The second is the probability of failing to reject the null hypothesis when it is not true, so β is the type II error rate. The distribution of the sample median event time is derived in the following theoretical result.

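Theorem 1 Let Y1, …, Yn be random variables from an exponential distribution with median ϕ. Then, the probability density function (pdf) of the sample median of Y1, …, Yn is either

$$g_n(m) = \frac{n!}{(k!)^2}\,\frac{\log 2}{\phi}\,\left\{1 - e^{-m\log 2/\phi}\right\}^{k} e^{-(k+1)\,m\log 2/\phi}, \qquad m \ge 0,$$

for n = 2k + 1 or

$$g_n(m) = c\left(\frac{\log 2}{\phi}\right)^{2} e^{-2m\log 2/\phi}\int_0^{2m}\left\{1 - e^{-(m - r/2)\log 2/\phi}\right\}^{k-1} e^{-(k-1)(m + r/2)\log 2/\phi}\,dr, \qquad c = \frac{n!}{(k-1)!\,(n-k-1)!},$$

for n = 2k.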

The proof is in Appendix A. Theorem 1 yields the cumulative distribution function (cdf) of the sample median event time, $G_n(\lambda) = \int_0^{\lambda} g_n(m)\,dm$ for λ ≥ 0, where $g_n$ denotes the pdf of the sample median. Since this cdf has no closed form, a numerical search over a grid is required to identify the sample size n and threshold λ such that the empirical errors $\hat{\alpha}(n,\lambda)$ and $\hat{\beta}(n,\lambda)$ are close to the nominal type I and II error rates (i.e., α and β), respectively. The empirical errors are calculated as

$$\hat{\alpha}(n,\lambda) = 1 - G_n(\lambda \mid H_0) \quad\text{and}\quad \hat{\beta}(n,\lambda) = G_n(\lambda \mid H_a),$$

where $G_n(\cdot \mid H_0)$ and $G_n(\cdot \mid H_a)$ denote the cdf of the sample median event time under the null and alternative hypotheses, respectively. The identified sample size n therefore achieves $\{1 - \hat{\beta}(n,\lambda)\} \times 100\%$ power for a one-sided test with significance level $\hat{\alpha}(n,\lambda)$ to detect an improvement in the median event time of the experimental drug over that of the standard drug. At the end of the trial, i.e., based on the sample of size n, the null hypothesis is rejected if $\hat{\phi}_n > \lambda$, and we conclude that the experimental drug increases the median event time from ϕ0 to ϕ1. We call this the median event time test.

Consider a clinical trial testing median progression-free survival (PFS) of ϕ0 = 10 months versus ϕ1 = 17 months. Suppose that the maximum number of patients for this study is 100. We perform a grid search over integers n between 1 and 100 and real numbers λ between 10 and 17 in increments of 0.1. For the target error rates α = 0.05 and β = 0.2, the numerical search yields n = 42 and λ = 14.1 months, which minimize the deviation of the empirical errors from the target rates, i.e., $\{\hat{\alpha}(n,\lambda) - \alpha\}^2 + \{\hat{\beta}(n,\lambda) - \beta\}^2$. Fig 1 shows the results of the numerical search for n and λ, and Table 1 summarizes the decision rules for testing median survival ϕ0 versus ϕ1. Table 1 considers scenarios with different values of ϕ0 and ϕ1, chosen deliberately: to examine the performance of the method, we fixed ϕ0 and varied ϕ1, or fixed ϕ1 and varied ϕ0.
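The error rates and the grid search described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' R implementation: the function names (`median_cdf`, `error_rates`, `grid_search`) are hypothetical, survival times are complete (uncensored) exponential draws, and the cdf of the sample median is evaluated through standard order-statistic identities rather than by integrating the pdf of Theorem 1 directly. For odd n = 2k + 1 the sample median is at most λ exactly when at least k + 1 observations are at most λ. For even n = 2k the gap Y(k+1) − Y(k) is exponential with rate k log 2/ϕ and independent of Y(k), which leaves a one-dimensional integral that is handled here by Simpson's rule.

```python
import math

LOG2 = math.log(2.0)

def median_cdf(lam, n, phi, steps=200):
    """P(sample median <= lam) for n i.i.d. exponential times with median phi."""
    if lam <= 0:
        return 0.0
    rate = LOG2 / phi
    k = n // 2
    if n % 2 == 1:
        # Odd n: binomial tail, at least k + 1 of the n observations are <= lam.
        p = 1.0 - math.exp(-rate * lam)
        return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j)
                   for j in range(k + 1, n + 1))
    coef = k * math.comb(n, k)  # equals n! / {(k-1)! k!}

    def integrand(y):
        surv = math.exp(-rate * y)
        # Density of the kth order statistic Y(k) at y.
        dens_k = coef * (1.0 - surv)**(k - 1) * surv**k * rate * surv
        # P(Y(k+1) - Y(k) <= 2(lam - y)), i.e. the average of the two is <= lam.
        gap_ok = 1.0 - math.exp(-2.0 * k * rate * (lam - y))
        return dens_k * gap_ok

    # Composite Simpson's rule on [0, lam]; steps must be even.
    h = lam / steps
    total = integrand(0.0) + integrand(lam)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

def error_rates(n, lam, phi0, phi1):
    """Empirical type I and type II error rates of the rule 'reject if median > lam'."""
    return 1.0 - median_cdf(lam, n, phi0), median_cdf(lam, n, phi1)

def grid_search(phi0, phi1, alpha=0.05, beta=0.2, n_max=100, step=0.1):
    """Return (deviation, n, lam, alpha_hat, beta_hat) minimizing the squared deviation."""
    n_lam = int(round((phi1 - phi0) / step))
    best = None
    for n in range(1, n_max + 1):
        for i in range(n_lam + 1):
            lam = round(phi0 + i * step, 10)
            a_hat, b_hat = error_rates(n, lam, phi0, phi1)
            dev = (a_hat - alpha)**2 + (b_hat - beta)**2
            if best is None or dev < best[0]:
                best = (dev, n, lam, a_hat, b_hat)
    return best

if __name__ == "__main__":
    a_hat, b_hat = error_rates(42, 14.1, 10.0, 17.0)
    print(f"n = 42, lambda = 14.1: alpha_hat = {a_hat:.4f}, beta_hat = {b_hat:.4f}")
```

Calling `grid_search(10.0, 17.0)` scans n = 1, …, 100 and λ = 10.0, 10.1, …, 17.0; its output can be compared with the first scenario reported in Table 1.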

Fig 1. Plot of $\hat{\alpha}(n,\lambda)$ and $\hat{\beta}(n,\lambda)$ for the median event time test with ϕ0 = 10 and ϕ1 = 17 months. The left panel shows the results when n = 42, and the right panel shows the results when λ = 14.1 months.

Table 1. Decision rules for hypothesis testing of median PFS ϕ0 months versus ϕ1 months at target error rates of α = 0.05 and β = 0.2.

| ϕ0 | ϕ1 | n | λ | $\hat{\alpha}$ | $\hat{\beta}$ |
|---|---|---|---|---|---|
| 10 | 17 | 42 | 14.1 | 0.049 | 0.2019 |
| 8 | 17 | 21 | 12.9 | 0.049 | 0.1974 |
| 8 | 14 | 38 | 11.5 | 0.048 | 0.2016 |
| 3 | 7 | 16 | 5.2 | 0.042 | 0.2007 |
| 3 | 6 | 24 | 4.7 | 0.047 | 0.2027 |
| 3 | 5 | 48 | 4.2 | 0.042 | 0.2030 |

Note: The numerical search is done over integers n between 1 and 100 and real numbers λ between ϕ0 and ϕ1 in increments of 0.1.

We have proposed the median event time test for exponential survival times. The method extends to other parametric survival distributions. In Appendix B, we provide the distribution of the sample median event time for uniform and Weibull survival times, which determines the decision rule.

The proposed median event time test is attractive for three reasons. First, it is a new, exact test of the median event time based on the observed median event time. Second, we provide software to calculate the sample size and identify the threshold for the test, which is open for public use (https://yeonhee.shinyapps.io/METTshinyapp/). The test does not require complicated statistical analysis to reach a decision and is easy to implement. Its simple and directly interpretable rule can save considerable time in drug development. Third, it has the potential to be extended to many applications. For example, the decision rule can be used to monitor futility in group sequential designs (for Phase II or III studies), and the threshold λ provides a good candidate (or range) for a Bayesian monitoring rule. Specifically, it provides the foundation for the optimal two-stage design based on the median event time test, which is described later.

Before moving to the next section, we make a remark on the median event time test. To develop the exact median event time test, we considered hypothesized values for the median of Yi = min(Si, Ui), where Si denotes the time-to-event and Ui denotes the administrative censoring time for the ith patient. The setting in this section uses Yi and does not require information on the accrual rate or follow-up time. However, when the hypothesis test concerns the median of the survival times Si, the censoring information must be taken into account. Assume that the time to arrival of each patient and the survival time (i.e., Si) follow exponential distributions. Then Ui, which is the time to arrival of the last patient plus the follow-up time minus the time to arrival of the ith patient, follows another exponential distribution. The minimum of two exponential random variables (i.e., Yi) also follows an exponential distribution. Therefore, hypothesis testing with Yi is reasonable under the exponential survival assumption for Si and right censoring.
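The construction of Yi = min(Si, Ui) in this remark can be sketched as a short simulation. This is an illustration only, under assumptions that are not spelled out in the remark: patients arrive with exponential inter-arrival gaps (a Poisson process), the function name `simulate_observed_times` and its arguments are hypothetical, and the numeric values in the usage lines are illustrative.

```python
import math
import random

def simulate_observed_times(n, phi, accrual_rate, follow_up, rng):
    """Return a list of (y, event, u) triples for one simulated trial.

    y = min(s, u) is the observed time, event is True when the event time s
    is observed (s <= u), and u is the administrative censoring time.
    """
    mean = phi / math.log(2.0)                   # exponential mean implied by median phi
    arrivals, clock = [], 0.0
    for _ in range(n):
        clock += rng.expovariate(accrual_rate)   # exponential inter-arrival gap
        arrivals.append(clock)
    study_end = arrivals[-1] + follow_up         # last arrival plus follow-up time
    records = []
    for arrival in arrivals:
        s = rng.expovariate(1.0 / mean)          # time-to-event S_i
        u = study_end - arrival                  # censoring time U_i
        records.append((min(s, u), s <= u, u))
    return records

if __name__ == "__main__":
    rng = random.Random(2024)
    trial = simulate_observed_times(42, 10.0, 1.04, 24.0, rng)
    censored = sum(1 for _, event, _ in trial if not event)
    print(f"{censored} of {len(trial)} observations are administratively censored")
```

The last enrolled patient always has Ui equal to the follow-up time, and earlier patients have longer censoring times, so the amount of censoring depends on both the accrual rate and the follow-up time.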

Optimal two-stage design based on median event time test

The proposed median event time test is straightforward for clinicians to interpret and justifies the sample size at the exact level of errors. It is critical, especially in rare disease trials, to obtain promising evidence at the target error rates while minimizing the expected sample size. Moreover, from an ethical and practical viewpoint, it is desirable to stop the trial early if the therapeutic intervention is not effective. This motivates a single arm Phase II trial with an interim analysis planned when the total accrual reaches n1 patients. The final analysis is performed after follow-up of all n planned patients. A two-stage design using the median event time test is proposed as follows. In the first stage (i.e., at the interim), we make a go/no-go decision based on the observed median event time of the n1 patients. When the observed median event time is less than or equal to the threshold λ, the trial is stopped for futility. Otherwise, the trial continues to enroll (n − n1) patients in the second stage. At the final analysis, we conclude that the drug is sufficiently promising to evaluate against the standard therapy if the observed median event time based on all trial data is larger than the threshold.

The expected sample size for the two-stage design above is EN = n1 + (1 − PET)(n − n1), where PET denotes the probability of early termination after the first stage. EN depends on n, n1 and λ, and the two-stage design is optimal in the sense that EN is minimized under the null hypothesis. In the following, we describe how to specify n, n1 and λ for the optimal two-stage design. Let M be the maximum sample size for the study; e.g., it can be specified by clinicians or obtained from a sample size calculator for one-arm nonparametric statistics (https://stattools.crab.org/Calculators/oneNonParametricSurvival.htm). Let ϵ1 and ϵ2 denote the acceptable differences between the estimated type I and II error rates of the trial design and the corresponding target error rates. For prespecified values of ϕ0, ϕ1, α and β,

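    Step 1 For each n1 between 1 and M − 1, search for n and λ minimizing $\{\hat{\alpha}(n_1,n,\lambda) - \alpha\}^2 + \{\hat{\beta}(n_1,n,\lambda) - \beta\}^2$, where

    $$\hat{\alpha}(n_1,n,\lambda) = \Pr\{\hat{\phi}_{n_1} > \lambda,\ \hat{\phi}_{n} > \lambda \mid H_0\}, \qquad (1)$$
    $$\hat{\beta}(n_1,n,\lambda) = \Pr\{\hat{\phi}_{n_1} \le \lambda \mid H_a\} + \Pr\{\hat{\phi}_{n_1} > \lambda,\ \hat{\phi}_{n} \le \lambda \mid H_a\}, \qquad (2)$$

    over n1 < n ≤ M and ϕ0 ≤ λ ≤ ϕ1.

    Step 2 Choose the optimal combination (n, n1, λ) minimizing EN among the combinations satisfying $|\hat{\alpha}(n_1,n,\lambda) - \alpha| < \epsilon_1$ and $|\hat{\beta}(n_1,n,\lambda) - \beta| < \epsilon_2$.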
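The two steps above can be sketched with a small Monte Carlo search in Python. This is a simplified sketch under explicit assumptions, not the authors' implementation: survival times are uncensored exponential draws (no accrual process or follow-up window), the interim median is computed from the first n1 simulated patients, and the names `simulate_medians`, `two_stage_errors` and `optimal_two_stage` are hypothetical.

```python
import math
import random
import statistics

LOG2 = math.log(2.0)

def simulate_medians(n1, n, phi, n_sim, rng):
    """(interim median, final median) pairs for n_sim uncensored exponential trials."""
    rate = LOG2 / phi
    pairs = []
    for _ in range(n_sim):
        times = [rng.expovariate(rate) for _ in range(n)]
        # The interim uses the first n1 patients; the final analysis uses all n.
        pairs.append((statistics.median(times[:n1]), statistics.median(times)))
    return pairs

def two_stage_errors(pairs0, pairs1, lam):
    """Empirical type I error, type II error, and PET under H0 for threshold lam."""
    def rejects(pair):
        return pair[0] > lam and pair[1] > lam   # pass the interim, then reject H0
    alpha_hat = sum(map(rejects, pairs0)) / len(pairs0)
    beta_hat = 1.0 - sum(map(rejects, pairs1)) / len(pairs1)
    pet0 = sum(m1 <= lam for m1, _ in pairs0) / len(pairs0)
    return alpha_hat, beta_hat, pet0

def optimal_two_stage(phi0, phi1, n1_values, n_max, alpha=0.05, beta=0.2,
                      eps1=0.02, eps2=0.025, step=0.1, n_sim=1000, seed=1):
    rng = random.Random(seed)
    n_lam = int(round((phi1 - phi0) / step))
    lams = [round(phi0 + i * step, 10) for i in range(n_lam + 1)]
    best = None
    for n1 in n1_values:
        # Step 1: for this n1, find (n, lam) with the smallest squared deviation.
        step1 = None
        for n in range(n1 + 1, n_max + 1):
            pairs0 = simulate_medians(n1, n, phi0, n_sim, rng)
            pairs1 = simulate_medians(n1, n, phi1, n_sim, rng)
            for lam in lams:
                a_hat, b_hat, pet0 = two_stage_errors(pairs0, pairs1, lam)
                dev = (a_hat - alpha) ** 2 + (b_hat - beta) ** 2
                if step1 is None or dev < step1[0]:
                    step1 = (dev, n, lam, a_hat, b_hat, pet0)
        if step1 is None:
            continue
        # Step 2: among candidates within tolerance, keep the smallest EN under H0.
        _, n, lam, a_hat, b_hat, pet0 = step1
        if abs(a_hat - alpha) < eps1 and abs(b_hat - beta) < eps2:
            en0 = n1 + (1.0 - pet0) * (n - n1)
            if best is None or en0 < best["EN0"]:
                best = {"n1": n1, "n": n, "lam": lam, "alpha": a_hat,
                        "beta": b_hat, "PET0": pet0, "EN0": en0}
    return best

if __name__ == "__main__":
    # A deliberately small search (two interim sizes) to keep the run time short.
    print(optimal_two_stage(10.0, 17.0, n1_values=[30, 32], n_max=44, n_sim=300))
```

A full search over n1 = 1, …, M − 1 follows by passing `n1_values=range(1, M)` and `n_max=M`, at a much higher simulation cost. The function returns `None` when no candidate meets the tolerances ϵ1 and ϵ2.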

As with the median event time test, the marginal and joint distributions of the sample median event times have no closed forms for calculating the empirical probabilities in (1) and (2), so a numerical search is used in the first step. The criterion in the second step minimizes the expected sample size, EN. This optimality criterion can be modified according to the study objective, e.g., so that the design minimizes the expected total study length. Moreover, our design is flexible enough to use different thresholds, λ1 for the interim analysis and λ2 for the final analysis. Investigators fix a threshold λ1 for stopping early for futility and search for a threshold λ2 satisfying the target error rates (or they can search for both thresholds):

$$\hat{\alpha}(n_1,n,\lambda_1,\lambda_2) = \Pr\{\hat{\phi}_{n_1} > \lambda_1,\ \hat{\phi}_{n} > \lambda_2 \mid H_0\}, \qquad \hat{\beta}(n_1,n,\lambda_1,\lambda_2) = \Pr\{\hat{\phi}_{n_1} \le \lambda_1 \mid H_a\} + \Pr\{\hat{\phi}_{n_1} > \lambda_1,\ \hat{\phi}_{n} \le \lambda_2 \mid H_a\}.$$

Although the proposed method does not assume a specific survival distribution, the probability calculation requires one to be specified. We can borrow information from previous research to specify the survival distribution (e.g., exponential, uniform, or Weibull). To illustrate the optimal two-stage design based on the median event time test, we assume that the survival times follow an exponential distribution with median ϕ and are right-censored for subjects who have not yet met the criteria at the date of the last valid disease assessment. With median survival times ϕ0 = 10 and ϕ1 = 17 for the standard therapy and the new therapy, respectively, and target error rates α = 0.05 and β = 0.2, we consider the maximum allowable sample size M = 50 (obtained from a sample size calculator for one-arm nonparametric statistics). Patients arrive according to a Poisson process with an accrual rate of 1.04 patients per month, and follow-up continues for 24 months after the last patient is enrolled. Empirical estimates of the type I and II errors were obtained from 1000 simulated trials. Our method with ϵ1 = 0.02 and ϵ2 = 0.025 attains the optimal result at n1 = 32, n = 42 and λ = 15.2. That is, the two-stage design accrues n1 = 32 patients in the first stage. At the interim, if the observed median event time based on these n1 patients is less than or equal to λ = 15.2 months, the study is stopped early for futility. Otherwise, an additional n − n1 = 10 patients are accrued in stage 2, resulting in a total sample size of n = 42. At the end of the trial, if the observed median event time based on all n = 42 patients is larger than λ = 15.2 months, we reject the null hypothesis and claim that the treatment is sufficiently promising. Different null and alternative median event times can be considered, and the results of the optimal two-stage design are summarized in Table 2.

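Table 2. Summary results of optimal two-stage design for hypothesis testing of median survivals ϕ0 versus ϕ1 when survivals are assumed to follow an exponential distribution.

n is searched

| ϕ0 | ϕ1 | M | n1 | n | λ | $\hat{\alpha}$ | $\hat{\beta}$ | EN0 | PET0 |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 17 | 50 | 32 | 42 | 15.2 | 0.06 | 0.222 | 32.6 | 0.94 |
| 8 | 17 | 26 | 22 | 26 | 12.1 | 0.065 | 0.212 | 22.26 | 0.94 |
| 8 | 14 | 45 | 29 | 37 | 12.6 | 0.067 | 0.199 | 29.54 | 0.93 |
| 3 | 7 | 21 | 12 | 21 | 5.1 | 0.063 | 0.204 | 12.57 | 0.94 |
| 3 | 6 | 31 | 13 | 30 | 5 | 0.062 | 0.225 | 14.05 | 0.94 |
| 3 | 5 | 54 | 17 | 44 | 4.9 | 0.067 | 0.221 | 18.81 | 0.93 |

n = M is prespecified

| ϕ0 | ϕ1 | M | n1 | λ | $\hat{\alpha}$ | $\hat{\beta}$ | EN0 | PET0 |
|---|---|---|---|---|---|---|---|---|
| 10 | 17 | 50 | 34 | 15.3 | 0.062 | 0.21 | 34.99 | 0.94 |
| 8 | 17 | 26 | 23 | 12.4 | 0.066 | 0.207 | 23.30 | 0.93 |
| 8 | 14 | 45 | 28 | 12.4 | 0.062 | 0.224 | 29.05 | 0.94 |
| 3 | 7 | 21 | 12 | 5.1 | 0.062 | 0.198 | 12.56 | 0.94 |
| 3 | 6 | 31 | 13 | 5 | 0.061 | 0.223 | 14.10 | 0.94 |
| 3 | 5 | 54 | 17 | 4.9 | 0.063 | 0.225 | 19.33 | 0.94 |

Note: M denotes the maximum sample size for the study calculated from one-arm nonparametric statistics. EN0 and PET0 denote EN and PET, respectively, under the null hypothesis.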

The proposed optimal two-stage design finds, in the first stage, the total sample size n and threshold λ for the given interim size n1 and target error rates (i.e., α and β). In other words, both n and λ are searched. In case where study has a certain planned total sample size, the proposed design is tailored to identify the threshold λ in the first stage for the given interim size n1, total sample size, and target error rates. In other words, only λ is searched, and n is prespecified. As an example, the value of M obtained from one-arm nonparametric statistics in Table 2 is used for the prespecified total sample size n in scenarios. The results are also summarized in Table 2.
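As a quick consistency check, the EN0 values reported in Table 2 can be compared with the formula EN = n1 + (1 − PET)(n − n1). The sketch below is illustrative only: the helper names are hypothetical, and the inputs are the rounded n1, n and PET0 values of the "n is searched" scenarios in Table 2, so small discrepancies from rounding PET0 to two decimals are expected.

```python
def expected_sample_size(n1, n, pet):
    """EN = n1 + (1 - PET)(n - n1)."""
    return n1 + (1.0 - pet) * (n - n1)

# (n1, n, PET0, reported EN0) from the "n is searched" scenarios of Table 2.
TABLE2_ROWS = [
    (32, 42, 0.94, 32.6),
    (22, 26, 0.94, 22.26),
    (29, 37, 0.93, 29.54),
    (12, 21, 0.94, 12.57),
    (13, 30, 0.94, 14.05),
    (17, 44, 0.93, 18.81),
]

def max_discrepancy(rows):
    """Largest absolute gap between the recomputed and the reported EN0."""
    return max(abs(expected_sample_size(n1, n, pet) - en0)
               for n1, n, pet, en0 in rows)

if __name__ == "__main__":
    for n1, n, pet, en0 in TABLE2_ROWS:
        print(f"n1={n1}, n={n}: recomputed {expected_sample_size(n1, n, pet):.2f}, reported {en0}")
```

Because PET0 is reported to two decimals, the recomputed value can differ from the reported EN0 by up to 0.005 × (n − n1).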

Replacing M = 100 in Table 1 with the value obtained from one-arm nonparametric statistics, we obtained the same results. Tables 1 and 2 show that the decision rule of the two-stage design yields a smaller expected sample size under the null hypothesis than the one-stage design, except for the case with ϕ0 = 8 and ϕ1 = 17, which requires 2 or 3 more patients to be enrolled. When ϕ0 = 3 and ϕ1 = 5, the total sample size n for the two-stage design is smaller than that for the one-stage design, and the expected sample size under the null hypothesis is much smaller. Under the two-stage design, more patients can avoid being treated with an ineffective drug.

Results

We investigated the performance of the proposed optimal two-stage design based on the median event time test. First, returning to the setting examined for Table 2, with null median 10 months and alternative median 17 months, we studied how the operating characteristics change with the follow-up time and accrual rate. Given the rule specified in Table 2 (i.e., n1 = 32, n = 42 and λ = 15.2), we considered follow-up times of 6, 9, 12, 15, 18, 21, 24, 27, 30, and 36 months and accrual rates of 0.5, 0.6, 0.7, 0.8, 0.9, 1.04, 1.1, 1.2, 1.3, and 1.5 patients per month. The results are shown in Fig 2. The proposed design is robust to the follow-up time but is affected by the accrual rate. As more patients are accrued and more events become available (i.e., the observed median survival is smaller), the trial is more likely to stop early for futility, which increases the type II error rate but decreases the type I error rate. Therefore, when a study is designed using the optimal two-stage design based on the median event time test, reliable information on the accrual rate is needed; for example, the accrual history of previous studies and close collaboration with the clinicians conducting the study can help to arrive at the right design.

Fig 2. Plot of operating characteristics (i.e., expected sample size and error rates) against follow-up time and accrual rate when testing null median 10 months versus alternative median 17 months.

Moreover, we examined the performance of the proposed design with non-exponential survival times. Table 3 provides the results when survival times are generated from either a uniform or a Weibull distribution. We used a uniform distribution with minimum 0 and maximum 2ϕ to generate flat survival times and a Weibull distribution with scale parameter $\phi/(\log 2)^{1/2}$ and shape parameter 2 to obtain an increasing hazard. Compared with the results in Table 2, where survival times are assumed to follow an exponential distribution, the decision rule (n1 and λ) and the required sample size (n) differ for non-exponential survival times.
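The parameterizations above all fix the median at ϕ, which can be checked with a short sampling sketch. This is a hypothetical helper, not the authors' simulation code; the function names are illustrative, and it relies on Python's standard `random` generators, where `weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import math
import random
import statistics

def weibull_scale(phi, shape=2.0):
    """Scale parameter that gives a Weibull distribution with this shape median phi."""
    return phi / math.log(2.0) ** (1.0 / shape)

def weibull_cdf(t, phi, shape=2.0):
    """Weibull cdf parameterized by its median phi."""
    return 1.0 - math.exp(-((t / weibull_scale(phi, shape)) ** shape))

def draw_survival_time(phi, dist, rng, shape=2.0):
    """Draw one survival time with median phi from the named distribution."""
    if dist == "exponential":
        return rng.expovariate(math.log(2.0) / phi)
    if dist == "uniform":                      # uniform on (0, 2 * phi)
        return rng.uniform(0.0, 2.0 * phi)
    if dist == "weibull":                      # increasing hazard when shape > 1
        return rng.weibullvariate(weibull_scale(phi, shape), shape)
    raise ValueError(f"unknown distribution: {dist}")

if __name__ == "__main__":
    rng = random.Random(11)
    for dist in ("exponential", "uniform", "weibull"):
        sample = [draw_survival_time(10.0, dist, rng) for _ in range(20001)]
        print(dist, round(statistics.median(sample), 2))
```

With shape 2 the scale reduces to ϕ/(log 2)^{1/2}, the value used in the simulations, and the Weibull cdf evaluated at ϕ equals 1/2.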

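Table 3. Summary results of optimal two-stage design for hypothesis testing of median survivals ϕ0 versus ϕ1 when survivals are assumed to follow non-exponential distributions (i.e., uniform or Weibull distribution).

n is searched

| Distribution | ϕ0 | ϕ1 | M | n1 | n | λ | $\hat{\alpha}$ | $\hat{\beta}$ | EN0 |
|---|---|---|---|---|---|---|---|---|---|
| Uniform | 10 | 17 | 50 | 17 | 50 | 10.9 | 0.052 | 0.211 | 18.72 |
| Uniform | 8 | 17 | 26 | 14 | 26 | 9.2 | 0.049 | 0.203 | 14.59 |
| Uniform | 8 | 14 | 45 | 14 | 42 | 8.7 | 0.064 | 0.206 | 15.79 |
| Uniform | 3 | 7 | 21 | 7 | 20 | 3.5 | 0.049 | 0.197 | 7.64 |
| Uniform | 3 | 6 | 31 | 7 | 30 | 3.4 | 0.053 | 0.204 | 8.22 |
| Uniform | 3 | 5 | 54 | 7 | 46 | 3.3 | 0.057 | 0.225 | 9.22 |
| Weibull | 10 | 17 | 50 | 18 | 21 | 12.1 | 0.047 | 0.205 | 18.14 |
| Weibull | 8 | 17 | 26 | 13 | 23 | 9.4 | 0.051 | 0.208 | 13.51 |
| Weibull | 8 | 14 | 45 | 14 | 24 | 9.4 | 0.052 | 0.201 | 14.52 |
| Weibull | 3 | 7 | 21 | 6 | 21 | 3.6 | 0.043 | 0.217 | 6.65 |
| Weibull | 3 | 6 | 31 | 7 | 16 | 3.7 | 0.052 | 0.196 | 7.47 |
| Weibull | 3 | 5 | 54 | 8 | 16 | 3.7 | 0.054 | 0.199 | 8.43 |

n = M is prespecified

| Distribution | ϕ0 | ϕ1 | M | n1 | λ | $\hat{\alpha}$ | $\hat{\beta}$ | EN0 |
|---|---|---|---|---|---|---|---|---|
| Uniform | 10 | 17 | 50 | 18 | 11 | 0.061 | 0.197 | 19.95 |
| Uniform | 8 | 17 | 26 | 14 | 9.2 | 0.059 | 0.213 | 14.71 |
| Uniform | 8 | 14 | 45 | 14 | 8.8 | 0.054 | 0.209 | 15.67 |
| Uniform | 3 | 7 | 21 | 7 | 3.5 | 0.051 | 0.194 | 7.71 |
| Uniform | 3 | 6 | 31 | 7 | 3.4 | 0.053 | 0.2 | 8.27 |
| Uniform | 3 | 5 | 54 | 8 | 3.3 | 0.06 | 0.196 | 10.76 |
| Weibull | 10 | 17 | 50 | 16 | 11.2 | 0.067 | 0.224 | 18.28 |
| Weibull | 8 | 17 | 26 | 14 | 9.6 | 0.032 | 0.194 | 14.38 |
| Weibull | 8 | 14 | 45 | 14 | 9.1 | 0.067 | 0.199 | 16.08 |
| Weibull | 3 | 7 | 21 | 6 | 3.5 | 0.061 | 0.209 | 6.92 |
| Weibull | 3 | 6 | 31 | 7 | 3.6 | 0.046 | 0.196 | 8.10 |
| Weibull | 3 | 5 | 54 | 7 | 3.5 | 0.043 | 0.205 | 9.02 |

Note: M denotes the maximum sample size for the study calculated from one-arm nonparametric statistics, and EN0 denotes EN under the null hypothesis.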

We compared the proposed two-stage designs (called METT2E, METT2U, and METT2W for the optimal two-stage designs assuming exponential, uniform, and Weibull survival times, respectively) with three designs: (1) a restricted KJ design, called r-KJ [12], which tests whole survival curves using the one-sample log-rank test statistic proposed by Kwak and Jung [20]; (2) a two-stage design minimizing the expected sample size, called OES [11]; and (3) a two-stage design minimizing the expected total study length, called OETSL [11]. Both OES and OETSL use a normalized Z-statistic to test and to determine the decision rules at each stage. Because r-KJ, OES, and OETSL require a clinically meaningful time point, we used 6 months in our simulations. The null and alternative values of the 6-month survival probability were determined by the survival distribution. We assumed a follow-up time of 24 months and an accrual rate of 1.04 patients per month, the same setting as in Tables 2 and 3. Table 4 compares EN0 and PET0 for several hypothesis tests based on median survival times ϕ0 and ϕ1. As seen in Tables 2 and 3, both METT2E and METT2U tend to enroll more patients in the trial (i.e., n is close to M) than METT2W in most cases. The number of patients in the first stage (i.e., n1) obtained from METT2U or METT2W is smaller than that from METT2E. Thus, METT2W yields a smaller expected sample size than METT2E and METT2U. Since r-KJ is not restricted to a particular survival distribution and both OES and OETSL assume Weibull survival times, the results of METT2W are comparable with those of r-KJ, OES, and OETSL. In most cases, METT2W has a smaller expected sample size and stops the trial early for futility when the therapeutic intervention is not effective. METT2E and METT2U also yielded a smaller expected sample size and a larger probability of stopping the trial for futility under the null hypothesis than r-KJ.
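The statement that the 6-month survival probabilities are determined by the survival distribution can be made concrete with a small sketch. It assumes the three parameterizations used in the simulations (exponential with median ϕ, uniform on (0, 2ϕ), and Weibull with shape 2 and median ϕ); the function name `survival_probability` is hypothetical.

```python
import math

def survival_probability(t, phi, dist, shape=2.0):
    """S(t) = Pr(T > t) for a survival distribution with median phi."""
    if t <= 0:
        return 1.0
    if dist == "exponential":
        return 2.0 ** (-t / phi)
    if dist == "uniform":                        # uniform on (0, 2 * phi)
        return max(0.0, 1.0 - t / (2.0 * phi))
    if dist == "weibull":                        # scale phi / (log 2)^(1/shape)
        return 2.0 ** (-((t / phi) ** shape))
    raise ValueError(f"unknown distribution: {dist}")

if __name__ == "__main__":
    for dist in ("exponential", "uniform", "weibull"):
        p0 = survival_probability(6.0, 10.0, dist)   # null median of 10 months
        p1 = survival_probability(6.0, 17.0, dist)   # alternative median of 17 months
        print(f"{dist}: S0(6) = {p0:.3f}, S1(6) = {p1:.3f}")
```

Each formula returns 1/2 at t = ϕ, so the same pair of medians translates into different landmark probabilities depending on the assumed distribution.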

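Table 4. Comparison results of the proposed designs (METT2E, METT2U, and METT2W) with r-KJ, OES, and OETSL.

EN0

| ϕ0 | ϕ1 | M | METT2E | METT2U | METT2W | r-KJ | OES | OETSL |
|---|---|---|---|---|---|---|---|---|
| 10 | 17 | 50 | 32.6 | 18.72 | 18.14 | 72.38 | 49.08 | 49.00 |
| 8 | 17 | 26 | 22.26 | 14.59 | 13.51 | 34.06 | 25.74 | 25.54 |
| 8 | 14 | 45 | 29.54 | 15.79 | 14.52 | 54.52 | 31.52 | 31.38 |
| 3 | 7 | 21 | 12.57 | 7.64 | 6.65 | 12.83 | 7.00 | 6.00* |
| 3 | 6 | 31 | 14.05 | 8.22 | 7.47 | 17.58 | 8.41 | 6.90 |
| 3 | 5 | 54 | 18.81 | 9.22 | 8.43 | 29.71 | 10.60 | 9.77 |

PET0

| ϕ0 | ϕ1 | M | METT2E | METT2U | METT2W | r-KJ | OES | OETSL |
|---|---|---|---|---|---|---|---|---|
| 10 | 17 | 50 | 0.94 | 0.95 | 0.95 | 0.57 | 0.64 | 0.61 |
| 8 | 17 | 26 | 0.94 | 0.95 | 0.95 | 0.56 | 0.69 | 0.62 |
| 8 | 14 | 45 | 0.93 | 0.94 | 0.95 | 0.59 | 0.69 | 0.63 |
| 3 | 7 | 21 | 0.94 | 0.95 | 0.96 | 0.54 | 0.76 | 0.94* |
| 3 | 6 | 31 | 0.94 | 0.95 | 0.95 | 0.54 | 0.76 | 0.18 |
| 3 | 5 | 54 | 0.93 | 0.94 | 0.95 | 0.56 | 0.72 | 0.40 |

Note: M denotes the maximum sample size for the study calculated from one-arm nonparametric statistics; EN0 and PET0 denote EN and PET, respectively, under the null hypothesis.

* Clinical time point of 5.7 months is used.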

Application: Trial illustration

We illustrate the proposed methods with trial NCT00780494. This is a Phase II, single arm, single-institution study of bevacizumab in combination with carboplatin and capecitabine for patients with unresectable or metastatic gastroesophageal junction or gastric cancers. The study started in February 2009 and was completed in December 2017, with an enrollment of 35 participants. It enrolled two patients per month, with 12 months of follow-up at the time of study completion. The study objective was to investigate the efficacy of adding bevacizumab to standard chemotherapy based on progression-free survival (PFS), defined as the time from the start of treatment to disease progression or death. The median PFS is 5 months with the standard treatment, and the study team hypothesized that adding bevacizumab to standard chemotherapy would improve PFS by 90%, i.e., to a median PFS of 9.5 months (https://clinicaltrials.gov/ProvidedDocs/94/NCT00780494/Prot_SAP_000.pdf).

We applied the proposed methods to redesign the trial based on median PFS. The target error rates are α = 0.05 and β = 0.2, and the maximum sample size of 35 is obtained from the one-arm nonparametric test. First, we considered the case where investigators want a trial without interim monitoring that collects data from all 35 patients. The exact median event time test provided a decision rule with a threshold of 7.5 months for a sample of size 29, which yielded 80.48% power. Second, we supposed that investigators want an interim analysis for futility monitoring. Assuming exponential survival times, we applied the optimal two-stage design based on the median event time test. We set ϵ1 = ϵ2 = 0.005 to bring the empirical error rates closer to the target rates, which gave n1 = 33, n = 34 and λ = 8 months with $\hat{\alpha} = 0.047$ and $\hat{\beta} = 0.196$. This rule implies that only one additional patient is enrolled after the first stage, which is impractical. Rather than a two-stage design with a single threshold, a two-stage design with different thresholds λ1 and λ2 is suggested to obtain a reasonable decision rule. Table 5 summarizes the results of the optimal two-stage design, which searches for n1, n, and λ2 given a fixed first-stage threshold λ1. When survival times are assumed to follow the exponential distribution and the first-stage threshold is 5.5 months, the optimal two-stage design based on the median event time test determined n1 = 17, n = 32, and λ2 = 8.5 months. Specifically, the interim analysis is performed on the first 17 enrolled patients by comparing the observed median PFS with 5.5 months. If the observed median PFS is smaller than or equal to 5.5 months, the study is stopped early for futility. Otherwise, an additional 15 patients are enrolled for a total sample size of 32. At the final analysis, the observed median PFS of the 32 patients is compared with the threshold of 8.5 months, and we claim that adding bevacizumab to the standard therapy improves PFS if the observed median PFS is larger than the threshold. This decision rule yielded a power of 80.2%. When the true median PFS is 5 months, the expected sample size for this trial (i.e., EN) is 17.77 and the probability of early stopping (i.e., PET) is 94%. The required sample size and decision rule change with the choice of λ1.
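The two-threshold rule of this illustration (n1 = 17, n = 32, λ1 = 5.5 months, λ2 = 8.5 months) can be written as a small decision function. This is a hedged sketch rather than the authors' software: the function name and return strings are hypothetical, the inputs are taken to be observed PFS times in months, and the handling of censored observations is deliberately left out.

```python
import statistics

def two_stage_decision(stage1_times, stage2_times=None, lam1=5.5, lam2=8.5):
    """Apply the two-threshold rule to observed PFS times (in months).

    stage1_times: times for the first n1 patients (interim analysis).
    stage2_times: times for the additional n - n1 patients, or None if the
                  second stage has not been completed yet.
    """
    interim_median = statistics.median(stage1_times)
    if interim_median <= lam1:
        return "stop early for futility"
    if stage2_times is None:
        return "continue to stage 2"
    final_median = statistics.median(list(stage1_times) + list(stage2_times))
    return "reject H0" if final_median > lam2 else "do not reject H0"

if __name__ == "__main__":
    # Made-up PFS times, for illustration only.
    stage1 = [2.1, 3.4, 4.0, 5.2, 6.3, 6.8, 7.5, 8.1, 9.0,
              9.4, 10.2, 11.5, 12.0, 13.3, 14.1, 15.6, 18.2]
    print(two_stage_decision(stage1))
```

An interim median exactly equal to λ1 stops the trial, while the final analysis rejects the null hypothesis only when the pooled median is strictly larger than λ2, matching the inequalities stated in the text.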

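Table 5. Summary results of optimal two-stage design with a fixed threshold λ1 when the maximum sample size M is 35.

| Distribution | n1 | n | λ1 | λ2 | $\hat{\alpha}$ | $\hat{\beta}$ | EN0 | PET0 |
|---|---|---|---|---|---|---|---|---|
| Exponential | 17 | 32 | 5.5 | 8.5 | 0.051 | 0.198 | 17.77 | 0.949 |
| Exponential | 20 | 29 | 6 | 8.7 | 0.050 | 0.201 | 20.45 | 0.950 |
| Uniform | 16 | 24 | 5.5 | 5.8 | 0.054 | 0.202 | 16.43 | 0.946 |
| Uniform | 18 | 31 | 6 | 5.6 | 0.051 | 0.200 | 18.66 | 0.949 |
| Weibull | 15 | 31 | 5.5 | 5.9 | 0.052 | 0.202 | 15.83 | 0.948 |
| Weibull | 16 | 34 | 6 | 5.7 | 0.054 | 0.204 | 16.97 | 0.946 |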

In this trial illustration, PFS was assumed to follow an exponential distribution, and the decision rule was identified under that assumption. We further investigated non-exponential assumptions. As seen in Table 5, the decision rules, especially the interim monitoring rules, differ between exponential and non-exponential survival times, which implies that the survival distribution assumption matters when designing clinical trials. In practice, earlier phase trials or expert knowledge may provide information on the survival distribution for a Phase II study, and this information can be borrowed to identify an appropriate decision rule for the study.

Discussion

Most existing methods generally assume exponential survival distribution to develop statistical methods or design based on median survival time for convenience [24, 25]. From our simulation studies, we found that operating characteristics of the design depend on the survival distribution, and the decision rule of median event time test is different according to the survival assumption. Type I or II error rate can be inflated when survival distribution is misspecified. It is critical for median event time test to specify survival distribution for robust clinical trial research. The specification of survival distributions can be determined by the relevant pilot study or historical trials.

We have proposed several designs for the median event time test: single stage, two stage with a single threshold, and two stage with two different thresholds. Depending on the investigators’ interest, the study objective, and the trial assumptions, further simulation may be required to explore additional rules. Close collaboration with clinicians, together with sound statistical practice, will lead to a better and more ethical study design.

Conclusion

We proposed methods to test whether a new drug improves median survival compared to the standard therapy. The proposed median event time test provides the sample size required to control the type I and II errors for hypothesis testing based on the median event time. It also provides a simple and straightforward decision rule that compares the observed median survival with the threshold identified by the test. A Shiny application and R code were built along with the method development so that users can easily implement the hypothesis test based on median event time (https://sites.google.com/view/yeonheepark/software). The approach is extended to trials with an interim analysis at which the study monitors futility. The proposed two-stage design based on the median event time test is optimal in that the expected sample size is minimized under the null hypothesis. Early stopping for futility enhances ethics in patient care and expedites the discovery of new therapies. Moreover, our methods may reduce unexpected failures in the confirmatory phase after promising results in a Phase II study and improve the success rate of drug development.

Appendix

A. Proof of theoretical result in methods

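Proof of Theorem 1 Let $F(y)$ and $f(y)$ be the cdf and pdf, respectively, of the exponential random variable whose median is $\phi$ (i.e., mean $\mu = \phi/\log 2$). Then, we have $f(y) = \log 2 \exp(-y \log 2/\phi)/\phi$ and $F(y) = 1 - \exp(-y \log 2/\phi)$. Suppose that we have a sample of size $n = 2k + 1$ for some $k = 0, 1, 2, \ldots$. Then, by Cramér [26], the pdf of the sample median is

$$f_{\hat m}(m) = \frac{(2k+1)!}{(k!)^2}\, F(m)^k \{1 - F(m)\}^k f(m) = \frac{(2k+1)!}{(k!)^2}\, \frac{\log 2}{\phi}\, \{1 - e^{-m \log 2/\phi}\}^k\, e^{-(k+1) m \log 2/\phi}, \quad 0 < m < \infty.$$

We now derive the pdf of the sample median for a sample of size $n = 2k$ for some $k = 1, 2, \ldots$. When $n$ is an even number, the sample median is $\hat m = (Y_{(k)} + Y_{(k+1)})/2$, where $Y_{(k)}$ denotes the $k$th order statistic of the sample. Note that the joint pdf of $Y_{(k)}$ and $Y_{(k+1)}$ is

$$f_{Y_{(k)}, Y_{(k+1)}}(y_k, y_{k+1}) = \frac{c}{\mu^2}\, \{1 - e^{-y_k/\mu}\}^{k-1}\, e^{-(n-k-1) y_{k+1}/\mu}\, e^{-(y_k + y_{k+1})/\mu},$$

where $c = n!/\{(k - 1)!(n - k - 1)!\}$ and $\mu = \phi/\log 2$, for $0 < y_k < y_{k+1} < \infty$. Let $R = Y_{(k+1)} - Y_{(k)}$ and $V = (Y_{(k)} + Y_{(k+1)})/2$. Then $V$ is the quantity of interest, and the transformation from $(Y_{(k)}, Y_{(k+1)})$ to $(R, V)$ is one-to-one. Since the Jacobian of this transformation is $-1$, the joint pdf of $(R, V)$ is

$$f_{R,V}(r, v) = \frac{c}{\mu^2}\, \{1 - e^{-(v - r/2)/\mu}\}^{k-1}\, e^{-(n-k-1)(v + r/2)/\mu}\, e^{-2v/\mu}$$

for $0 < r < \infty$ and $r/2 < v < \infty$. Thus, the pdf of the sample median is

$$f_V(v) = \int_0^{2v} f_{R,V}(r, v)\, dr$$

for $0 < v < \infty$.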
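The densities in this proof can be checked numerically. The sketch below assumes the standard order-statistic forms: for n = 2k + 1, the density of the (k+1)th order statistic, and for n = 2k, the joint density of (R, V) integrated over 0 < r < 2v. The function names, the midpoint-rule integration, and the values k = 5 and ϕ = 1 are illustrative choices and are not part of the original method.

```python
import math
import random
import statistics

def pdf_median_odd(m, k, phi):
    """Density of the sample median for n = 2k + 1 exponential draws with median phi."""
    mu = phi / math.log(2)
    coef = math.factorial(2 * k + 1) / math.factorial(k) ** 2
    surv = math.exp(-m / mu)
    return coef * (1 - surv) ** k * surv ** (k + 1) / mu

def pdf_median_even(v, k, phi, steps=400):
    """Density of the sample median for n = 2k: joint pdf of (R, V) integrated over 0 < r < 2v."""
    mu = phi / math.log(2)
    n = 2 * k
    c = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k - 1))
    h = 2 * v / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h  # midpoint rule in r
        lower, upper = v - r / 2, v + r / 2
        total += ((1 - math.exp(-lower / mu)) ** (k - 1)
                  * math.exp(-(n - k - 1) * upper / mu)
                  * math.exp(-2 * v / mu))
    return c * total * h / mu ** 2

def integrate(pdf, upper, steps=400):
    """Midpoint-rule integral of pdf over (0, upper)."""
    h = upper / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h

phi, k = 1.0, 5

# For odd n, the sample median falls at or below phi with probability exactly 1/2.
odd_cdf_at_phi = integrate(lambda m: pdf_median_odd(m, k, phi), phi)

# For even n, compare the integrated density with a Monte Carlo estimate.
even_cdf_at_phi = integrate(lambda v: pdf_median_even(v, k, phi), phi)
rng = random.Random(1)
reps = 20000
hits = sum(
    statistics.median([rng.expovariate(math.log(2) / phi) for _ in range(2 * k)]) <= phi
    for _ in range(reps))
mc_even_cdf_at_phi = hits / reps

print(f"odd n:  P(median <= phi) = {odd_cdf_at_phi:.4f}")
print(f"even n: P(median <= phi) = {even_cdf_at_phi:.4f} (Monte Carlo {mc_even_cdf_at_phi:.4f})")
```

The odd-n value serves as an exact check, since the (k+1)th of 2k + 1 order statistics lies below the true median with probability one half; the even-n value has no such shortcut, which is why it is compared with simulation.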

B. Median event time test for other parametric survival distributions

We provide two theoretical results stating the distribution function of the sample median event time for uniform and Weibull distributions.

Theorem 2

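    Let $Y_1, \ldots, Y_n$ be random variables from a uniform distribution with lower bound parameter $a$ and median $\phi$ (i.e., the upper bound parameter is $2\phi - a$). Then, the pdf of the sample median $\hat m$ is

    $$f_{\hat m}(m) = \frac{(2k+1)!}{(k!)^2}\, \frac{(m - a)^k (2\phi - a - m)^k}{\{2(\phi - a)\}^{2k+1}}, \quad a \le m \le 2\phi - a,$$

    for $n = 2k + 1$ and

    $$f_{\hat m}(m) = \frac{c}{\{2(\phi - a)\}^{n}} \int_0^{2\min(m - a,\; 2\phi - a - m)} \left(m - \frac{r}{2} - a\right)^{k-1} \left(2\phi - a - m - \frac{r}{2}\right)^{n-k-1} dr, \quad a \le m \le 2\phi - a,$$

    where $c = n!/\{(k - 1)!(n - k - 1)!\}$, for $n = 2k$.

    Let $Y_1, \ldots, Y_n$ be random variables from a uniform distribution with upper bound parameter $b$ and median $\phi$ (i.e., the lower bound parameter is $2\phi - b$). Then, the pdf of the sample median $\hat m$ is

    $$f_{\hat m}(m) = \frac{(2k+1)!}{(k!)^2}\, \frac{(m - 2\phi + b)^k (b - m)^k}{\{2(b - \phi)\}^{2k+1}}, \quad 2\phi - b \le m \le b,$$

    for $n = 2k + 1$ and

    $$f_{\hat m}(m) = \frac{c}{\{2(b - \phi)\}^{n}} \int_0^{2\min(m - 2\phi + b,\; b - m)} \left(m - \frac{r}{2} - 2\phi + b\right)^{k-1} \left(b - m - \frac{r}{2}\right)^{n-k-1} dr, \quad 2\phi - b \le m \le b,$$

    where $c = n!/\{(k - 1)!(n - k - 1)!\}$, for $n = 2k$.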

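Proof. We first consider a uniform random variable with lower bound parameter $a$ and median $\phi$. Then, we have pdf $f(y) = 1/\{2(\phi - a)\}$ for $a \le y \le 2\phi - a$ and cdf

$$F(y) = \begin{cases} 0, & y < a, \\ \dfrac{y - a}{2(\phi - a)}, & a \le y \le 2\phi - a, \\ 1, & y > 2\phi - a. \end{cases}$$

By Cramér [26], if the sample size is $n = 2k + 1$ for some $k = 0, 1, 2, \ldots$, the pdf of the sample median is

$$f_{\hat m}(m) = \frac{(2k+1)!}{(k!)^2}\, F(m)^k \{1 - F(m)\}^k f(m) = \frac{(2k+1)!}{(k!)^2}\, \frac{(m - a)^k (2\phi - a - m)^k}{\{2(\phi - a)\}^{2k+1}}, \quad a \le m \le 2\phi - a.$$

When the sample size is $n = 2k$ for some $k = 1, 2, \ldots$, we use the joint pdf of $Y_{(k)}$ and $Y_{(k+1)}$, where $Y_{(k)}$ denotes the $k$th order statistic of the sample, given by

$$f_{Y_{(k)}, Y_{(k+1)}}(y_k, y_{k+1}) = \frac{c\, (y_k - a)^{k-1} (2\phi - a - y_{k+1})^{n-k-1}}{\{2(\phi - a)\}^{n}},$$

where $c = n!/\{(k - 1)!(n - k - 1)!\}$, for $a \le y_k < y_{k+1} \le 2\phi - a$. Let $R = Y_{(k+1)} - Y_{(k)}$ and $V = (Y_{(k)} + Y_{(k+1)})/2$. The one-to-one transformation from $(Y_{(k)}, Y_{(k+1)})$ to $(R, V)$ gives the joint pdf of $(R, V)$,

$$f_{R,V}(r, v) = \frac{c\, (v - r/2 - a)^{k-1} (2\phi - a - v - r/2)^{n-k-1}}{\{2(\phi - a)\}^{n}},$$

for $0 < r \le 2(\phi - a)$ and $r/2 + a \le v \le -r/2 + 2\phi - a$. Thus, the pdf of the sample median is the marginal density of $V$, which is

$$f_V(v) = \int_0^{2\min(v - a,\; 2\phi - a - v)} f_{R,V}(r, v)\, dr, \quad a \le v \le 2\phi - a.$$

Similarly, for a uniform random variable with upper bound parameter $b$ and median $\phi$, we have pdf $f(y) = 1/\{2(b - \phi)\}$ for $2\phi - b \le y \le b$ and cdf

$$F(y) = \begin{cases} 0, & y < 2\phi - b, \\ \dfrac{y - 2\phi + b}{2(b - \phi)}, & 2\phi - b \le y \le b, \\ 1, & y > b. \end{cases}$$

It is easy to verify that

$$f_{\hat m}(m) = \frac{(2k+1)!}{(k!)^2}\, \frac{(m - 2\phi + b)^k (b - m)^k}{\{2(b - \phi)\}^{2k+1}}, \quad 2\phi - b \le m \le b,$$

for $n = 2k + 1$ for some $k = 0, 1, 2, \ldots$, and

$$f_{\hat m}(m) = \frac{c}{\{2(b - \phi)\}^{n}} \int_0^{2\min(m - 2\phi + b,\; b - m)} \left(m - \frac{r}{2} - 2\phi + b\right)^{k-1} \left(b - m - \frac{r}{2}\right)^{n-k-1} dr, \quad 2\phi - b \le m \le b,$$

for $n = 2k$ for some $k = 1, 2, \ldots$.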

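Theorem 3 Let $Y_1, \ldots, Y_n$ be random variables from a Weibull distribution with shape parameter $\tau$ and median $\phi$ (i.e., the scale parameter is $\phi/(\log 2)^{1/\tau}$), with cdf $F(y) = 1 - \exp[-\{(\log 2)^{1/\tau} y/\phi\}^{\tau}]$ and pdf $f(y) = dF(y)/dy$. Then, the pdf of the sample median $\hat m$ is

$$f_{\hat m}(m) = \frac{(2k+1)!}{(k!)^2}\, F(m)^k \{1 - F(m)\}^k f(m), \quad 0 < m < \infty,$$

for $n = 2k + 1$ and

$$f_{\hat m}(m) = c \int_0^{2m} F\!\left(m - \frac{r}{2}\right)^{k-1} \left\{1 - F\!\left(m + \frac{r}{2}\right)\right\}^{n-k-1} f\!\left(m - \frac{r}{2}\right) f\!\left(m + \frac{r}{2}\right) dr, \quad 0 < m < \infty,$$

where $c = n!/\{(k - 1)!(n - k - 1)!\}$, for $n = 2k$.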

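Proof. Let $Y$ be a Weibull random variable with shape parameter $\tau$ and median $\phi$. Then, we have pdf $f(y) = \tau (\log 2)^{1/\tau} \{(\log 2)^{1/\tau} y/\phi\}^{\tau - 1} \exp[-\{(\log 2)^{1/\tau} y/\phi\}^{\tau}]/\phi$ and cdf $F(y) = 1 - \exp[-\{(\log 2)^{1/\tau} y/\phi\}^{\tau}]$. By Cramér [26], if the sample size is $n = 2k + 1$ for some $k = 0, 1, 2, \ldots$, the pdf of the sample median is

$$f_{\hat m}(m) = \frac{(2k+1)!}{(k!)^2}\, F(m)^k \{1 - F(m)\}^k f(m), \quad 0 < m < \infty.$$

When the sample size is $n = 2k$ for some $k = 1, 2, \ldots$, by the same transformation as in the proof of Theorem 1, the pdf of the sample median is

$$f_{\hat m}(m) = c \int_0^{2m} F\!\left(m - \frac{r}{2}\right)^{k-1} \left\{1 - F\!\left(m + \frac{r}{2}\right)\right\}^{n-k-1} f\!\left(m - \frac{r}{2}\right) f\!\left(m + \frac{r}{2}\right) dr,$$

where $c = n!/\{(k - 1)!(n - k - 1)!\}$, for $0 < m < \infty$.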
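For odd n, the cdf of the sample median can be evaluated without integrating the density: the sample median is at most m exactly when at least k + 1 of the n observations are at most m, which is a binomial tail probability in F(m). The sketch below uses this identity to find a rejection threshold and the corresponding power under a Weibull assumption. It assumes complete (uncensored) follow-up and a rule that rejects when the observed median exceeds the threshold; the design values (n = 41, τ = 1.5, ϕ0 = 12, ϕ1 = 18, α = 0.05) and the function names are hypothetical and are not taken from the paper's software.

```python
import math

def weibull_cdf(y, tau, phi):
    """Weibull cdf with shape tau and median phi: 1 - exp[-log 2 (y / phi)^tau]."""
    return 1.0 - math.exp(-math.log(2) * (y / phi) ** tau)

def median_cdf_odd(m, k, tau, phi):
    """P(sample median <= m) for n = 2k + 1, via the binomial tail in p = F(m)."""
    n = 2 * k + 1
    p = weibull_cdf(m, tau, phi)
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))

def find_threshold(k, tau, phi0, alpha, tol=1e-10):
    """Smallest c (up to tol) with P(sample median > c | phi0) <= alpha, by bisection."""
    lo, hi = 0.0, phi0
    while 1 - median_cdf_odd(hi, k, tau, phi0) > alpha:
        hi *= 2  # expand until the tail probability is at most alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - median_cdf_odd(mid, k, tau, phi0) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

k, tau, phi0, phi1, alpha = 20, 1.5, 12.0, 18.0, 0.05  # hypothetical design values
c = find_threshold(k, tau, phi0, alpha)
type1 = 1 - median_cdf_odd(c, k, tau, phi0)
power = 1 - median_cdf_odd(c, k, tau, phi1)
print(f"threshold c = {c:.3f}, type I error = {type1:.4f}, power = {power:.3f}")
```

The same binomial identity holds for any continuous event time distribution, so replacing `weibull_cdf` with the uniform or exponential cdf gives the corresponding odd-n calculation.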

References

1. Schwartz LH, Litière S, de Vries E, Ford R, Gwyther S, Mandrekar S, et al. RECIST 1.1-Update and clarification: From the RECIST committee. European Journal of Cancer. 2016;62:132-137. doi:10.1016/j.ejca.2016.03.081

2. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials. 1989;10(1):1-10. doi:10.1016/0197-2456(89)90015-9

3. Shan G, Zhang H, Jiang T, Peterson H, Young D, Ma C. Exact p-values for Simon’s two-stage designs in clinical trials. Statistics in Biosciences. 2016;8(2):351-357. doi:10.1007/s12561-016-9152-1

4. Shan G, Wilding GE, Hutson AD, Gerstenberger S. Optimal adaptive two-stage designs for early phase II clinical trials. Statistics in Medicine. 2016;35(8):1257-1266. doi:10.1002/sim.6794

5. Shan G, Chen JJ, Ma C. Boundary problem in Simon’s two-stage clinical trial designs. Journal of Biopharmaceutical Statistics. 2017;27(1):25-33. doi:10.1080/10543406.2016.1148716

6. Kim J, Schell MJ. Modified Simon’s minimax and optimal two-stage designs for single-arm phase II cancer clinical trials. Oncotarget. 2019;10(42):4255. doi:10.18632/oncotarget.26981

7. Edelmann D, Habermehl C, Schlenk RF, Benner A. Adjusting Simon’s optimal two-stage design for heterogeneous populations based on stratification or using historical controls. Biometrical Journal. 2020;62(2):311-329. doi:10.1002/bimj.201800390

8. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics. 1982; p. 143-151. doi:10.2307/2530297

9. Mander A, Thompson S. Two-stage designs optimal under the alternative hypothesis for phase II cancer clinical trials. Contemporary Clinical Trials. 2010;31(6):572-578. doi:10.1016/j.cct.2010.07.008

10. Case LD, Morgan TM. Design of phase II cancer trials evaluating survival probabilities. BMC Medical Research Methodology. 2003;3(1):6. doi:10.1186/1471-2288-3-6

11. Huang B, Talukder E, Thomas N. Optimal two-stage phase II designs with long-term endpoints. Statistics in Biopharmaceutical Research. 2010;2(1):51-61. doi:10.1198/sbr.2010.09001

12. Belin L, De Rycke Y, Broët P. A two-stage design for phase II trials with time-to-event endpoint using restricted follow-up. Contemporary Clinical Trials Communications. 2017;8:127-134. doi:10.1016/j.conctc.2017.09.010

13. Shan G, Zhang H. Two-stage optimal designs with survival endpoint when the follow-up time is restricted. BMC Medical Research Methodology. 2019;19(1):74. doi:10.1186/s12874-019-0696-x

14. Jin IH, Liu S, Thall PF, Yuan Y. Using data augmentation to facilitate conduct of phase I–II clinical trials with delayed outcomes. Journal of the American Statistical Association. 2014;109(506):525-536. doi:10.1080/01621459.2014.881740

15. Yuan Y, Lin R, Li D, Nie L, Warren KE. Time-to-event Bayesian optimal interval design to accelerate phase I trials. Clinical Cancer Research. 2018;24(20):4921-4930. doi:10.1158/1078-0432.CCR-18-0246

16. FDA. 22 Case studies where phase 2 and phase 3 trials had divergent results. 2017.

17. Finkelstein DM, Muzikansky A, Schoenfeld DA. Comparing survival of a sample to that of a standard population. Journal of the National Cancer Institute. 2003;95(19):1434-1439. doi:10.1093/jnci/djg052

18. Sun X, Peng P, Tu D. Phase II cancer clinical trials with a one-sample log-rank test and its corrections based on the Edgeworth expansion. Contemporary Clinical Trials. 2011;32(1):108-113. doi:10.1016/j.cct.2010.09.009

19. Jung SH. Randomized phase II cancer clinical trials. CRC Press; 2013.

20. Kwak M, Jung SH. Phase II clinical trials with time-to-event endpoints: optimal two-stage designs with one-sample log-rank test. Statistics in Medicine. 2014;33(12):2004-2016. doi:10.1002/sim.6073

21. Wu J, Chen L, Wei J, Weiss H, Chauhan A. Two-stage phase II survival trial design. Pharmaceutical Statistics. 2020;19(3):214-229. doi:10.1002/pst.1983

22. Wu J. Single-arm phase II trial design under parametric cure models. Pharmaceutical Statistics. 2015;14(3):226-232. doi:10.1002/pst.1678

23. Chu C, Liu S, Rong A. Study design of single-arm phase II immunotherapy trials with long-term survivors and random delayed treatment effect. Pharmaceutical Statistics. 2020. doi:10.1002/pst.1976

24. Thall PF, Wooten LH, Tannir NM. Monitoring event times in early phase clinical trials: some practical issues. Clinical Trials. 2005;2(6):467-478. doi:10.1191/1740774505cn121oa

25. Zhou H, Chen C, Sun L, Yuan Y. Bayesian optimal phase II clinical trial design with time-to-event endpoint. Pharmaceutical Statistics. 2020. doi:10.1002/pst.2030

26. Cramér H. Mathematical methods of statistics. Princeton University Press. 2016;9.