bdplm is used to estimate the treatment effect in the
presence of covariates using the regression analysis implementation of the
Bayesian discount prior. summary and print methods are
supported. Currently, the function only supports a two-arm clinical trial
where all of current treatment, current control, historical treatment, and
historical control data are present
Usage
bdplm(
formula = formula,
data = data,
data0 = NULL,
prior_treatment_effect = NULL,
prior_control_effect = NULL,
prior_treatment_sd = NULL,
prior_control_sd = NULL,
prior_covariate_effect = 0,
prior_covariate_sd = 10000,
number_mcmc_alpha = 5000,
number_mcmc_sigmagrid = 5000,
number_mcmc_sigma = 100,
number_mcmc_beta = 10000,
discount_function = "identity",
alpha_max = 1,
fix_alpha = FALSE,
weibull_scale = 0.135,
weibull_shape = 3,
method = "mc"
)Arguments
- formula
an object of class "formula." See "Details" for more information, including specification of treatment data indicators.
- data
a data frame containing the current data variables in the model. A column named
treatmentmust be present;treatmentmust be binary and indicate treatment group vs. control group.- data0
a data frame containing the historical data variables in the model. The column labels of data and data0 must match.
- prior_treatment_effect
scalar. The historical adjusted treatment effect. If left
NULL, value is estimated from the historical data.- prior_control_effect
scalar. The historical adjusted control effect. If left
NULL, value is estimated from the historical data.- prior_treatment_sd
scalar. The standard deviation of the historical adjusted treatment effect. If left
NULL, value is estimated from the historical data.- prior_control_sd
scalar. The standard deviation of the historical adjusted control effect. If left
NULL, value is estimated from the historical data.- prior_covariate_effect
vector. The prior mean(s) of the covariate effect(s). Default value is zero. If a single value is input, the the scalar is repeated to the length of the input covariates. Otherwise, care must be taken to ensure the length of the input matches the number of covariates.
- prior_covariate_sd
vector. The prior standard deviation(s) of the covariate effect(s). Default value is 1e4. If a single value is input, the the scalar is repeated to the length of the input covariates. Otherwise, care must be taken to ensure the length of the input matches the number of covariates. Note that covariate effects are intentionally assigned a near-zero historical discount weight, so their priors are effectively flat: the value supplied here has negligible influence on the posterior, and the effective prior standard deviation is much larger than the nominal value (by a factor of roughly 1e6 at the default). In practice the covariate coefficients are estimated almost entirely from the current data, and the posterior is insensitive to this argument across many orders of magnitude.
- number_mcmc_alpha
scalar. Number of Monte Carlo simulations for estimating the historical data weight. Default is 5000.
- number_mcmc_sigmagrid
scalar. Grid size for computing sigma. Default is 5000. See "Details" for more information.
- number_mcmc_sigma
scalar. Number of Monte Carlo simulations for estimating sigma. Default is 1000. See "Details" for more information.
- number_mcmc_beta
scalar. Number of Monte Carlo simulations for estimating beta, the vector of regression coefficients. Default is 10000.
- discount_function
character. Specify the discount function to use. Currently supports
weibull,scaledweibull, andidentity. The discount functionscaledweibullscales the output of the Weibull CDF to have a max value of 1. Theidentitydiscount function uses the posterior probability directly as the discount weight. Default value is "identity".- alpha_max
scalar. Maximum weight the discount function can apply. Default is 1. Users may specify a vector of two values where the first value is used to weight the historical treatment group and the second value is used to weight the historical control group.
- fix_alpha
logical. Fix alpha at alpha_max? Default value is FALSE.
- weibull_scale
scalar. Scale parameter of the Weibull discount function used to compute alpha, the weight parameter of the historical data. Default value is 0.135. Users may specify a vector of two values where the first value is used to estimate the weight of the historical treatment group and the second value is used to estimate the weight of the historical control group. Not used when
discount_function= "identity".- weibull_shape
scalar. Shape parameter of the Weibull discount function used to compute alpha, the weight parameter of the historical data. Default value is 3. Users may specify a vector of two values where the first value is used to estimate the weight of the historical treatment group and the second value is used to estimate the weight of the historical control group. Not used when
discount_function= "identity".- method
character. Analysis method with respect to estimation of the weight paramter alpha. Default method "
mc" estimates alpha for each Monte Carlo iteration. Alternate value "fixed" estimates alpha once and holds it fixed throughout the analysis. See the thebdplmvignettevignette("bdplm-vignette", package="bayesDP")for more details.
Value
bdplm returns an object of class "bdplm".
An object of class "bdplm" is a list containing at least the
following components:
posteriordata frame. The posterior draws of the covariates, the intercept, and the treatment effect. The grid of sigma values are included.
alpha_discountvector. The posterior probability of the stochastic comparison between the current and historical data for each of the treatment and control arms. If
method="mc", the result is a matrix of estimates, otherwise formethod="fixed", the result is a vector of estimates.estimateslist. The posterior means and standard errors of the intercept, treatment effect, covariate effect(s) and error variance.
Details
bdplm uses a two-stage approach for determining the strength
of historical data in estimation of an adjusted mean or covariate effect.
In the first stage, a discount function is used that that defines
the maximum strength of the historical data and discounts based on
disagreement with the current data. Disagreement between current and
historical data is determined by stochastically comparing the respective
posterior distributions under noninformative priors. Here with a two-arm
regression analysis, the comparison is the proability (p) that the
covariate effect of an historical data indicator is significantly different
from zero. The comparison metric p is then input into the discount
function and the final strength of the historical data is returned
(alpha).
In the second stage, posterior estimation is performed where the discount
function parameter, alpha, is used to weight the historical data
effects.
The formula must include an intercept (i.e., do not use -1 in the
formula) and both data and data0 must be present. The column names of data
and data0 must match. See examples below for example usage.
Internally, the model is parameterized without a fixed intercept, using
separate treatment and control means rather than an intercept plus a
treatment effect. In this parameterization the arm means are the fitted
values at covariate = 0, so when covariates are not centered the
treatment- and control-mean estimators become strongly correlated and
their standard errors are extrapolation errors at covariate = 0. This
distorts the discount-prior construction and the historical borrowing. To
guard against this, bdplm automatically mean-centers each covariate
on its pooled (current plus historical) mean before fitting, and
back-transforms the reported intercept onto the original covariate scale.
Specifically, if \(\bar{x}_j\) is the pooled mean of covariate \(j\)
and \(\beta_j\) its estimated effect, the reported intercept is
\(\beta_0 = \beta_0^{c} - \sum_j \beta_j \bar{x}_j\), where
\(\beta_0^{c}\) is the control-arm mean estimated on the centered scale.
The treatment effect and covariate slopes are unchanged by centering; the
reported intercept remains the control-arm mean at covariate = 0 on
the original scale. As a result, estimates are invariant to a location
shift of any covariate, and users do not need to center covariates
themselves.
The underlying model results in a marginal posterior distribution for the
error variance sigma2 that does not have a known distribution. Thus,
we use a grid search to draw samples of sigma2. First, the marginal
posterior is evaluated at a grid of number_mcmc_sigmagrid
sigma2 values. The bounds of the grid are taken as approximately six
standard deviations from an estimate of sigma2 using the lm
function. Next, number_mcmc_sigma posterior draws of sigma2
are made by sampling with replacement from the grid with each value having
the corresponding marginal posterior probability of being selected. With
posterior draws of sigma2 in hand, we can make posterior draws of
the regression coefficients.
Examples
# Set sample sizes
n_t <- 30 # Current treatment sample size
n_c <- 30 # Current control sample size
n_t0 <- 80 # Historical treatment sample size
n_c0 <- 80 # Historical control sample size
# Treatment group vectors for current and historical data
treatment <- c(rep(1, n_t), rep(0, n_c))
treatment0 <- c(rep(1, n_t0), rep(0, n_c0))
# Simulate a covariate effect for current and historical data
x <- rnorm(n_t + n_c, 1, 5)
x0 <- rnorm(n_t0 + n_c0, 1, 5)
# Simulate outcome:
# - Intercept of 10 for current and historical data
# - Treatment effect of 31 for current data
# - Treatment effect of 30 for historical data
# - Covariate effect of 3 for current and historical data
Y <- 10 + 31 * treatment + x * 3 + rnorm(n_t + n_c, 0, 5)
Y0 <- 10 + 30 * treatment0 + x0 * 3 + rnorm(n_t0 + n_c0, 0, 5)
# Place data into separate treatment and control data frames and
# assign historical = 0 (current) or historical = 1 (historical)
df_ <- data.frame(Y = Y, treatment = treatment, x = x)
df0 <- data.frame(Y = Y0, treatment = treatment0, x = x0)
# Fit model using default settings
fit <- bdplm(
formula = Y ~ treatment + x, data = df_, data0 = df0,
method = "fixed"
)
# Look at estimates and discount weight
summary(fit)
#>
#> Call:
#> bdplm(formula = Y ~ treatment + x, data = df_, data0 = df0, method = "fixed")
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -7.096 -0.492 2.569 7.794 15.307
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 9.7721 0.7334
#> treatment 30.3256 1.0342
#> x 2.8382 0.1455
#>
#> Discount function value (alpha):
#> treatment control
#> 0.2776 0.3348
#>
#> Residual standard error: 5.681
print(fit)
#>
#> Call:
#> bdplm(formula = Y ~ treatment + x, data = df_, data0 = df0, method = "fixed")
#>
#>
#> Coefficients:
#> (Intercept) treatment x
#> 9.772 30.326 2.838
#>
#>
#> Discount function value (alpha):
#> treatment control
#> 0.2776 0.3348
#>