This function takes multiple imputed datasets (as generated by the rbmi::impute() function) and runs an analysis function on each of them.
Usage
rbmi_analyse(
  imputations,
  fun = rbmi_ancova,
  delta = NULL,
  ...,
  cluster_or_cores = 1,
  .validate = TRUE
)
Arguments
- imputations
An imputations object as created by rbmi::impute().
- fun
An analysis function to be applied to each imputed dataset. See details.
- delta
A data.frame containing the delta transformation to be applied to the imputed datasets prior to running fun. See details.
- ...
Additional arguments passed onto fun.
- cluster_or_cores
The number of parallel processes to use when running this function. Can also be a cluster object created by make_rbmi_cluster(). See the parallelisation section below.
- .validate
Should imputations be checked to ensure it conforms to the required format (default = TRUE)? Setting this to FALSE can give a small performance increase when analysing a large number of samples.
Value
An analysis object, as defined by rbmi, representing the desired analysis applied to each of the imputed datasets in imputations.
Details
This function works by performing the following steps:
1. Extract a dataset from the imputations object.
2. Apply any delta adjustments as specified by the delta argument.
3. Run the analysis function fun on the dataset.
4. Repeat steps 1-3 across all of the datasets inside the imputations object.
5. Collect and return all of the analysis results.
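The steps above can be sketched in base R. This is an illustration only, not the package's implementation: a plain list of data.frames stands in for a real imputations object, and `toy_imputations`, `toy_delta`, `apply_delta()` and `toy_fun()` are hypothetical names invented for the example.

```r
# Hypothetical stand-in for an imputations object: a list of complete datasets.
set.seed(1)
toy_imputations <- lapply(1:3, function(i) {
  data.frame(
    id = rep(1:4, each = 2),
    visit = rep(c("v1", "v2"), times = 4),
    outcome = rnorm(8)
  )
})

# Hypothetical delta data.frame: one offset per subject/visit.
toy_delta <- data.frame(
  id = rep(1:4, each = 2),
  visit = rep(c("v1", "v2"), times = 4),
  delta = rep(c(0, 1), times = 4)
)

# Step 2: merge the offsets onto a dataset and shift the outcome.
apply_delta <- function(dat, delta) {
  if (is.null(delta)) return(dat)
  merged <- merge(dat, delta, by = c("id", "visit"), all.x = TRUE)
  merged$delta[is.na(merged$delta)] <- 0
  merged$outcome <- merged$outcome + merged$delta
  merged$delta <- NULL
  merged
}

# Step 3: the analysis function returns a named list of list(est = ...).
toy_fun <- function(dat, ...) {
  list(mean_v2 = list(est = mean(dat$outcome[dat$visit == "v2"])))
}

# Steps 1, 4 and 5: loop over the datasets, adjust, analyse, and collect.
results <- lapply(toy_imputations, function(dat) {
  toy_fun(apply_delta(dat, toy_delta))
})
```

Each element of `results` holds the output of the analysis function for one dataset, which mirrors how the real function collects one result per imputed dataset.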
The analysis function fun must take a data.frame as its first argument. All other arguments to rbmi_analyse() are passed onto fun via the ... argument.
fun must return a named list with each element itself being a list containing a single numeric element called est (or additionally se and df if you had originally specified rbmi::method_bayes() or rbmi::method_approxbayes()), i.e.:
myfun <- function(dat, ...) {
  mod_1 <- lm(data = dat, outcome ~ group)
  mod_2 <- lm(data = dat, outcome ~ group + covar)
  x <- list(
    trt_1 = list(
      est = coef(mod_1)[["group"]],              # [[ ]] returns an unnamed scalar
      se = sqrt(vcov(mod_1)["group", "group"]),  # standard error of the group coefficient
      df = df.residual(mod_1)
    ),
    trt_2 = list(
      est = coef(mod_2)[["group"]],              # [[ ]] returns an unnamed scalar
      se = sqrt(vcov(mod_2)["group", "group"]),  # standard error of the group coefficient
      df = df.residual(mod_2)
    )
  )
  return(x)
}
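Before passing a custom function to rbmi_analyse(), it can help to check on a small dataset that its return value has the structure described above. The following is a minimal sketch in base R; `toy_dat`, `myfun_est_only()` and `check_analysis_output()` are hypothetical names invented for this example and are not part of rbmi or this package.

```r
# Toy data with a numeric 0/1 group so the coefficient is named "group".
set.seed(42)
toy_dat <- data.frame(
  outcome = rnorm(40),
  group = rep(c(0, 1), each = 20),
  covar = rnorm(40)
)

# An analysis function that returns only est for a single parameter.
myfun_est_only <- function(dat, ...) {
  mod <- lm(outcome ~ group + covar, data = dat)
  list(trt = list(est = coef(mod)[["group"]]))
}

# Hypothetical checker: a named list whose elements are lists holding
# scalar numerics called est (and se, df when need_se_df = TRUE).
check_analysis_output <- function(res, need_se_df = FALSE) {
  required <- if (need_se_df) c("est", "se", "df") else "est"
  is_ok <- function(x) {
    is.list(x) &&
      all(required %in% names(x)) &&
      all(vapply(
        x[required],
        function(v) is.numeric(v) && length(v) == 1,
        logical(1)
      ))
  }
  is.list(res) &&
    !is.null(names(res)) &&
    all(nzchar(names(res))) &&
    all(vapply(res, is_ok, logical(1)))
}

res <- myfun_est_only(toy_dat)
ok_est_only <- check_analysis_output(res)
ok_with_se_df <- check_analysis_output(res, need_se_df = TRUE)
```

Here `ok_est_only` reports whether the output satisfies the est-only requirement, and `ok_with_se_df` reports whether it would also satisfy the stricter requirement that applies to the Bayesian methods.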
Please note that the vars$subjid column (as defined in the original call to rbmi::draws()) will be scrambled in the data.frames that are provided to fun. This is to say they will not contain the original subject values, and as such any hard-coding of subject IDs is strictly to be avoided.
By default fun is the rbmi_ancova() function. Please note that this function requires that a vars object, as created by rbmi::set_vars(), is provided via the vars argument, e.g. rbmi_analyse(imputeObj, vars = rbmi::set_vars(...)). Please see the documentation for rbmi_ancova() for full details.
Please also note that the theoretical justification for the conditional mean imputation method (method = method_condmean() in rbmi::draws()) relies on the fact that ANCOVA is a linear transformation of the outcomes. Thus care is required when applying alternative analysis functions in this setting.
The delta argument can be used to specify offsets to be applied to the outcome variable in the imputed datasets prior to the analysis. This is typically used for sensitivity or tipping point analyses. The delta dataset must contain columns vars$subjid, vars$visit (as specified in the original call to rbmi::draws()) and delta. Essentially this data.frame is merged onto the imputed dataset by vars$subjid and vars$visit and then the outcome variable is modified by:
imputed_data[[vars$outcome]] <- imputed_data[[vars$outcome]] + imputed_data[["delta"]]
Please note that in order to provide maximum flexibility, the delta argument can be used to modify any/all outcome values including those that were not imputed. Care must be taken when defining offsets. It is recommended that you use the helper function rbmi::delta_template() to define the delta datasets as this provides utility variables such as is_missing which can be used to identify exactly which visits have been imputed.
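As an illustration of restricting offsets to imputed values, the sketch below builds a delta data.frame by hand in base R. The `toy_template` object is a hypothetical stand-in with an `is_missing` flag playing the role described above; it is not the output of rbmi::delta_template(), and the column names PATIENT and VISIT are assumed to match vars$subjid and vars$visit.

```r
# Hypothetical template: one row per subject/visit, with a flag marking
# which outcome values were imputed.
toy_template <- data.frame(
  PATIENT = rep(c("p1", "p2"), each = 3),
  VISIT = rep(c("4", "5", "6"), times = 2),
  is_missing = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE),
  delta = 0
)

# Apply a fixed offset of 5 to imputed values only; observed values keep 0.
toy_template$delta <- ifelse(toy_template$is_missing, 5, 0)

# Keep the columns the delta argument requires: subject id, visit and delta.
delta_df <- toy_template[, c("PATIENT", "VISIT", "delta")]
```

A data.frame of this shape is what would be supplied through the delta argument; leaving the offset at 0 for observed rows keeps the non-imputed outcomes unchanged.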
Parallelisation
To speed up the evaluation of rbmi_analyse() you can use the cluster_or_cores argument to enable parallelisation. Simply providing an integer will get rbmi to automatically spawn that many background processes to parallelise across. If you are using a custom analysis function then you need to ensure that any libraries or global objects required by your function are available in the sub-processes. To do this you need to use the make_rbmi_cluster() function, for example:
my_custom_fun <- function(...) <some analysis code>
cl <- make_rbmi_cluster(
  4,
  objects = list("my_custom_fun" = my_custom_fun),
  packages = c("dplyr", "nlme")
)
rbmi_analyse(
  imputations = imputeObj,
  fun = my_custom_fun,
  cluster_or_cores = cl
)
parallel::stopCluster(cl)
Note that there is significant overhead both with setting up the sub-processes and with transferring data back-and-forth between the main process and the sub-processes. As such, parallelisation of the rbmi_analyse() function tends to only be worth it when you have more than 2000 samples generated by rbmi::draws(). Conversely, using parallelisation with fewer samples than this may lead to longer run times than just running it sequentially.
It is important to note that the implementation of parallel processing within rbmi_analyse() has been optimised around the assumption that the parallel processes will be spawned on the same machine and not a remote cluster. One such optimisation is that the required data is saved to a temporary file on the local disk from which it is then read into each sub-process. This is done to avoid the overhead of transferring the data over the network. Our assumption is that if you are at the stage where you need to be parallelising your analysis over a remote cluster then you would likely be better off parallelising across multiple rbmi runs rather than within a single rbmi run.
Finally, if you are doing a tipping point analysis you can get a reasonable performance improvement by re-using the cluster between each call to rbmi_analyse(), e.g.
cl <- make_rbmi_cluster(4)
ana_1 <- rbmi_analyse(
  imputations = imputeObj,
  delta = delta_plan_1,
  cluster_or_cores = cl
)
ana_2 <- rbmi_analyse(
  imputations = imputeObj,
  delta = delta_plan_2,
  cluster_or_cores = cl
)
ana_3 <- rbmi_analyse(
  imputations = imputeObj,
  delta = delta_plan_3,
  cluster_or_cores = cl
)
parallel::stopCluster(cl)
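When the number of delta plans is not fixed in advance, the same cluster re-use pattern can be wrapped in a loop. The sketch below is one possible arrangement under several assumptions: `run_tipping_point()` is a hypothetical helper, the rbmi package and the package providing rbmi_analyse() (assumed here to be junco) are installed, `n_cores` is greater than 1, and a data.frame derived from rbmi::delta_template() with a modified delta column is accepted by the delta argument.

```r
# Hypothetical helper: run one analysis per offset, sharing a single cluster.
# `imputations` and `vars` are assumed to come from the usual
# rbmi::draws() / rbmi::impute() / rbmi::set_vars() workflow.
run_tipping_point <- function(imputations, vars, offsets, n_cores = 2) {
  # Create the cluster once and make sure it is stopped on exit,
  # even if one of the analyses fails (assumes n_cores > 1).
  cl <- rbmi::make_rbmi_cluster(n_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)

  # Template with one row per subject/visit and an is_missing flag.
  template <- rbmi::delta_template(imputations)

  lapply(offsets, function(offset) {
    delta_df <- template
    # Offset imputed values only; observed values are left unchanged.
    delta_df$delta <- ifelse(delta_df$is_missing, offset, 0)
    junco::rbmi_analyse(
      imputations = imputations,
      delta = delta_df,
      vars = vars,
      cluster_or_cores = cl
    )
  })
}
```

A call such as `run_tipping_point(imputeObj, vars, offsets = c(0, 2, 4))` would then return one analysis object per offset, with the cluster set-up cost paid only once.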
See also
rbmi::extract_imputed_dfs() for manually extracting imputed datasets.
rbmi::delta_template() for creating delta data.frames.
rbmi_ancova() for the default analysis function.
Examples
library(rbmi)
#>
#> Attaching package: ‘rbmi’
#> The following object is masked from ‘package:junco’:
#>
#> make_rbmi_cluster
library(dplyr)
dat <- antidepressant_data
dat$GENDER <- as.factor(dat$GENDER)
dat$POOLINV <- as.factor(dat$POOLINV)
set.seed(123)
pat_ids <- sample(levels(dat$PATIENT), nlevels(dat$PATIENT) / 4)
dat <- dat |>
  filter(PATIENT %in% pat_ids) |>
  droplevels()
dat <- expand_locf(
  dat,
  PATIENT = levels(dat$PATIENT),
  VISIT = levels(dat$VISIT),
  vars = c("BASVAL", "THERAPY"),
  group = c("PATIENT"),
  order = c("PATIENT", "VISIT")
)
dat_ice <- dat %>%
  arrange(PATIENT, VISIT) %>%
  filter(is.na(CHANGE)) %>%
  group_by(PATIENT) %>%
  slice(1) %>%
  ungroup() %>%
  select(PATIENT, VISIT) %>%
  mutate(strategy = "JR")
dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618), ]
vars <- set_vars(
  outcome = "CHANGE",
  visit = "VISIT",
  subjid = "PATIENT",
  group = "THERAPY",
  covariates = c("THERAPY")
)
drawObj <- draws(
  data = dat,
  data_ice = dat_ice,
  vars = vars,
  method = method_condmean(type = "jackknife", covariance = "csh"),
  quiet = TRUE
)
references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
imputeObj <- impute(drawObj, references)
rbmi_analyse(imputations = imputeObj, vars = vars)
#>
#> Analysis Object
#> ---------------
#> Number of Results: 1 + 43
#> Analysis Function: rbmi_ancova
#> Delta Applied: FALSE
#> Analysis Estimates:
#> var_4
#> trt_PLACEBO_4
#> lsm_DRUG_4
#> lsm_PLACEBO_4
#> var_5
#> trt_PLACEBO_5
#> lsm_DRUG_5
#> lsm_PLACEBO_5
#> var_6
#> trt_PLACEBO_6
#> lsm_DRUG_6
#> lsm_PLACEBO_6
#> var_7
#> trt_PLACEBO_7
#> lsm_DRUG_7
#> lsm_PLACEBO_7
#>