The other vignettes describe two-arm randomised designs. Single-arm
trials – in which every subject receives the experimental therapy and
the comparator is an external benchmark – are common in early-phase
oncology, rare-disease, and proof-of-concept studies. This vignette
shows how to set up a Goldilocks single-arm design with
survival_adapt().
Two practical constraints on single-arm designs in this package:
- A single-arm trial is signalled by setting hazard_control = NULL.
- Only method = "bayes" is supported for single-arm trials. The frequentist tests (logrank, cox, chisq) require two arms and will raise an error if used in this mode.
The decision rule
In a single-arm trial there is no concurrent control, so the “treatment effect” is replaced by the cumulative event probability on the treatment arm itself:
$$p_T = \Pr(T \le t_{\text{end}}) = 1 - S_T(t_{\text{end}}),$$
where $T$ is the event time, $S_T$ is the treatment-arm survival function, and $t_{\text{end}}$ is end_of_study.
The argument h0 plays the role of a benchmark on this
scale: a target failure probability (or, equivalently,
$1 - h_0$ is a target survival probability) drawn from external evidence such as a
published rate, registry, or historical cohort. With
alternative = "less" and prob_ha, the trial
declares success when
$$\Pr(p_T < h_0 \mid \text{data}) > \text{prob\_ha},$$
i.e. when the posterior assigns enough mass to “the experimental
therapy has a lower failure rate than the benchmark”. Choosing
alternative = "greater" reverses the direction;
alternative = "two.sided" is not allowed for
method = "bayes".
The same posterior is used at each interim look to compute the
predictive probability of eventual success, which drives the futility
(Fn) and expected-success (Sn) stopping rules.
Predictive probabilities are obtained by imputing remaining follow-up
from the posterior predictive distribution of the
(piecewise-)exponential model and re-evaluating the success criterion on
each completed dataset.
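For a constant hazard, the posterior check in the success criterion can be sketched in base R. This is only an illustration, assuming a conjugate Gamma prior on the hazard with an exponential likelihood, so that the posterior is Gamma(shape + events, rate + total exposure time). The helper name post_prob_less and the event count and exposure are made up for the example; the package's internal sampler may differ in detail.

```r
# Hypothetical helper (not part of the package): posterior probability that
# the cumulative event probability by `endtime` is below the benchmark `h0`.
# Assumes a constant hazard with a conjugate Gamma(prior[1], prior[2]) prior.
post_prob_less <- function(n_events, exposure, h0, endtime,
                           prior = c(0.1, 0.1), n_draws = 2000) {
  # Posterior draws of the hazard: Gamma(shape + events, rate + exposure)
  hazard <- rgamma(n_draws,
                   shape = prior[1] + n_events,
                   rate  = prior[2] + exposure)
  # Map each hazard draw to a cumulative event probability at `endtime`
  p_event <- 1 - exp(-hazard * endtime)
  # Share of draws in which the failure probability beats the benchmark
  mean(p_event < h0)
}

set.seed(1)
# Illustrative (made-up) data: 12 events over 1500 subject-months
pp <- post_prob_less(n_events = 12, exposure = 1500, h0 = 0.30, endtime = 24)
pp
pp > 0.95  # compare against prob_ha to decide success
```

The predictive-probability step repeats this kind of check on each imputed dataset and averages the resulting success indicators.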
Setting up the design
Suppose the existing standard of care has a 30% event probability by 24 months, and we are testing a new agent that we hope will reduce this to 20%. We use an interim look at 50 of 80 enrolled subjects:
end_of_study <- 24
benchmark <- 0.30 # external standard-of-care failure rate
target <- 0.20 # rate we hope the new therapy achieves
# Convert the target failure rate into a constant hazard (so we can simulate)
ht <- prop_to_haz(probs = target, endtime = end_of_study)
ht
#> [1] 0.009297648
Now we run survival_adapt():
out <- survival_adapt(
hazard_treatment = ht,
hazard_control = NULL, # single-arm
cutpoints = 0,
N_total = 80,
lambda = 5, # enrolments per month (constant)
lambda_time = 0,
interim_look = 50,
end_of_study = end_of_study,
prior = c(0.1, 0.1), # Gamma(0.1, 0.1) on the hazard
block = 2, # default; inert in single-arm mode
rand_ratio = c(1, 1), # default; inert in single-arm mode
prop_loss = 0.05,
alternative = "less",
h0 = benchmark, # benchmark failure probability
Fn = 0.05,
Sn = 0.95,
prob_ha = 0.95,
N_impute = 50,
N_mcmc = 2000,
method = "bayes")
out
#> prob_threshold margin alternative N_treatment N_control N_enrolled N_max
#> 1 0.95 0.3 less 80 0 80 80
#> post_prob_ha est_final ppp_success stop_futility stop_expected_success
#> 1 0.9985 0.1689997 0.86 0 0
A few points to highlight in the output:
- N_control = 0: no concurrent control was simulated.
- margin = 0.30: this is the value of h0 that the trial is testing against. Note that it is on the cumulative-failure scale, not the survival scale.
- est_final is the posterior mean of the treatment-arm cumulative event probability, $p_T$, at end_of_study, not a treatment effect relative to control.
- post_prob_ha is the posterior probability that $p_T < h_0$, i.e. that the treatment-arm failure probability is below the benchmark.
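The scale of h0 and est_final can be checked by hand. The sketch below assumes a constant hazard, for which the cumulative failure probability at time t is 1 - exp(-hazard * t). The two helper names are hypothetical; they only reproduce the conversion whose result was printed for ht earlier.

```r
# Hypothetical helpers for a constant hazard (base R only)
fail_to_haz <- function(p, endtime) -log(1 - p) / endtime
haz_to_fail <- function(h, endtime) 1 - exp(-h * endtime)

h_target <- fail_to_haz(0.20, endtime = 24)
h_target                             # agrees with the value of ht shown earlier
haz_to_fail(h_target, endtime = 24)  # back on the cumulative-failure scale
1 - 0.30                             # the benchmark as a survival probability
```

Reading est_final against margin is then a like-for-like comparison: both are failure probabilities by end_of_study.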
Why block and rand_ratio still appear
survival_adapt() shares its trial-data simulator with
the two-arm case. In single-arm mode the simulator skips
randomization() entirely and assigns every subject to the
treatment arm; block and rand_ratio are
therefore inert and can be left at their defaults. The
minimum-interim_look rule
(interim_look >= max(block)) only applies to two-arm
designs, so a single-arm trial can use any interim_look
strictly less than N_total.
Operating characteristics
A single trial does not tell you whether the design is
well-calibrated. To estimate power and type I error, we run the design
under each scenario using sim_trials(). The chunks below
are not run when knitting (each takes a few minutes) but illustrate the
workflow:
# Power: simulate under the alternative (true rate = 0.20)
out_power <- sim_trials(
N_trials = 1000,
hazard_treatment = ht,
hazard_control = NULL,
cutpoints = 0,
N_total = 80,
lambda = 5,
lambda_time = 0,
interim_look = 50,
end_of_study = end_of_study,
prior = c(0.1, 0.1),
block = 2,
rand_ratio = c(1, 1),
prop_loss = 0.05,
alternative = "less",
h0 = benchmark,
Fn = 0.05,
Sn = 0.95,
prob_ha = 0.95,
N_impute = 50,
N_mcmc = 2000,
method = "bayes")
# Type I error: simulate under the null (true rate = benchmark = 0.30)
ht_null <- prop_to_haz(probs = benchmark, endtime = end_of_study)
out_t1error <- sim_trials(
N_trials = 1000,
hazard_treatment = ht_null,
hazard_control = NULL,
cutpoints = 0,
N_total = 80,
lambda = 5,
lambda_time = 0,
interim_look = 50,
end_of_study = end_of_study,
prior = c(0.1, 0.1),
block = 2,
rand_ratio = c(1, 1),
prop_loss = 0.05,
alternative = "less",
h0 = benchmark,
Fn = 0.05,
Sn = 0.95,
prob_ha = 0.95,
N_impute = 50,
N_mcmc = 2000,
method = "bayes")
summarise_sims(list(out_power$sims, out_t1error$sims))
Calibration proceeds the same way as for two-arm designs: if the type
I error under the null (where the true rate equals the benchmark) is
above the desired level, raise prob_ha; if power is too
low, increase N_total or relax the
Fn/Sn thresholds.
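When judging whether a simulated rate is above or below a target, it helps to keep the Monte Carlo error in mind. The sketch below uses the usual binomial approximation for the standard error of a simulated proportion; the helper name mc_se and the rates plugged in are illustrative assumptions, not results from this design.

```r
# Monte Carlo standard error of a simulated rejection rate
# (binomial approximation; hypothetical helper, base R only)
mc_se <- function(rate, n_trials) sqrt(rate * (1 - rate) / n_trials)

mc_se(0.05, n_trials = 1000)  # a type I error near 5%
mc_se(0.80, n_trials = 1000)  # a power near 80%
```

With N_trials = 1000 these come to roughly 0.7 and 1.3 percentage points, so small differences between candidate values of prob_ha may not be distinguishable without more simulated trials.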
A practical caveat on benchmarks
The validity of a single-arm Goldilocks trial rests entirely on the
benchmark h0 being a fair representation of the population
the trial is enrolling. Drift in standard of care, differences in
patient mix, and unmeasured confounding all bias the comparison in a way
that randomisation would otherwise neutralise. A Bayesian framework can
incorporate uncertainty about the benchmark itself – e.g. by replacing a
fixed h0 with a prior distribution informed by historical
data – but this is outside the scope of the simple h0
scalar that survival_adapt() exposes, and would require a
custom analysis. When in doubt, simulating the design under several
plausible values of the true rate (including ones near the benchmark) is
a useful way to characterise its sensitivity.
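A sensitivity grid of this kind can be set up as follows. This is a sketch only: the true rates are illustrative choices, and the hazards are computed with the constant-hazard relationship rather than with a package function.

```r
end_of_study <- 24
true_rates <- c(0.20, 0.25, 0.28, 0.30)  # illustrative values only

# Constant hazard giving each cumulative failure rate by end_of_study
hazards <- -log(1 - true_rates) / end_of_study
data.frame(true_rate = true_rates, hazard_treatment = hazards)

# Each hazard can then be passed as hazard_treatment to sim_trials(),
# keeping every other argument as in the power and type I error runs.
```

Plotting the proportion of successful trials against the true rate then shows how quickly the success probability falls as the true rate approaches the benchmark.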
See also
- The “Example: Two-armed RCT” vignette covers the corresponding two-arm randomised design with a log-rank decision rule.
- The “Bayesian decisions with piecewise-exponential hazards” vignette
covers the same decision rule used here, but in a two-arm setting and
with non-constant hazards. The piecewise machinery applies directly to
single-arm trials too (just keep
hazard_control = NULL and pass a per-interval hazard_treatment vector).
- ?survival_adapt documents all arguments.