This function takes multiple imputed datasets (as generated by the impute() function from the rbmi package) and runs an analysis function on each of them.

Usage

rbmi_analyse(
  imputations,
  fun = rbmi_ancova,
  delta = NULL,
  ...,
  cluster_or_cores = 1,
  .validate = TRUE
)

Arguments

imputations

An imputations object as created by the impute() function from the rbmi package.

fun

An analysis function to be applied to each imputed dataset. See details.

delta

A data.frame containing the delta transformation to be applied to the imputed datasets prior to running fun. See details.

...

Additional arguments passed onto fun.

cluster_or_cores

(numeric or cluster object)
The number of parallel processes to use when running this function. Can also be a cluster object created by make_rbmi_cluster(). See the parallelisation section below.

.validate

(logical)
Should imputations be checked to ensure it conforms to the required format (default = TRUE)? Setting this to FALSE can give a small performance gain when analysing a large number of samples.

Value

An analysis object, as defined by rbmi, representing the desired analysis applied to each of the imputed datasets in imputations.

Details

This function works by performing the following steps:

  1. Extract a dataset from the imputations object.

  2. Apply any delta adjustments as specified by the delta argument.

  3. Run the analysis function fun on the dataset.

  4. Repeat steps 1-3 across all of the datasets inside the imputations object.

  5. Collect and return all of the analysis results.
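
The steps above can be sketched in base R as follows. This only illustrates the control flow under simplifying assumptions and is not the rbmi or junco implementation: analyse_sketch(), toy_fun() and the toy datasets are hypothetical, the imputed datasets are represented as a plain list of data.frames, and the column names id, visit and outcome are placeholders for vars$subjid, vars$visit and vars$outcome.

```r
# Hypothetical sketch of the extract / adjust / analyse / collect loop.
# Not the actual implementation used by rbmi_analyse().
analyse_sketch <- function(datasets, fun, delta = NULL, ...) {
    # Steps 1 and 4: visit every dataset in turn
    lapply(datasets, function(dat) {
        if (!is.null(delta)) {
            # Step 2: merge the offsets on and shift the outcome
            dat <- merge(dat, delta, by = c("id", "visit"), all.x = TRUE)
            dat$delta[is.na(dat$delta)] <- 0
            dat$outcome <- dat$outcome + dat$delta
        }
        # Step 3: run the analysis function on this dataset
        fun(dat, ...)
    })
    # Step 5: lapply() collects the results into a list
}

# Toy analysis function returning the required named-list structure
toy_fun <- function(dat, ...) {
    list(mean_out = list(est = mean(dat$outcome)))
}

toy_datasets <- list(
    data.frame(id = c("a", "b"), visit = "v1", outcome = c(1, 3)),
    data.frame(id = c("a", "b"), visit = "v1", outcome = c(2, 4))
)
toy_delta <- data.frame(id = "b", visit = "v1", delta = 2)

res <- analyse_sketch(toy_datasets, toy_fun, delta = toy_delta)
```

Here res is a list with one analysis result per toy dataset, each computed after the offset for subject "b" has been added.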

The analysis function fun must take a data.frame as its first argument. Any additional arguments supplied to rbmi_analyse() via ... are passed on to fun. fun must return a named list in which each element is itself a list containing a single numeric element called est (or additionally se and df if you originally specified the method_bayes() or method_approxbayes() functions from the rbmi package), i.e.:


myfun <- function(dat, ...) {
    mod_1 <- lm(data = dat, outcome ~ group)
    mod_2 <- lm(data = dat, outcome ~ group + covar)
    x <- list(
        trt_1 = list(
            est = coef(mod_1)[['group']],  # [[ ]] returns an unnamed scalar
            se = sqrt(vcov(mod_1)['group', 'group']),  # SE of the group coefficient
            df = df.residual(mod_1)
        ),
        trt_2 = list(
            est = coef(mod_2)[['group']],  # [[ ]] returns an unnamed scalar
            se = sqrt(vcov(mod_2)['group', 'group']),  # SE of the group coefficient
            df = df.residual(mod_2)
        )
    )
    return(x)
}

Please note that the vars$subjid column (as defined in the original call to the draws() function from the rbmi package) will be scrambled in the data.frames that are provided to fun. That is, it will not contain the original subject values, so any hard coding of subject ids must be avoided.
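
Before passing a custom function to rbmi_analyse(), it can help to run it once on a single data.frame and check that the result has the structure described above. The following is a minimal sketch in base R; check_fun_result(), toy_fun() and the toy data are hypothetical and are not part of rbmi or junco.

```r
# Hypothetical helper: checks that a result is a named list whose
# elements are lists holding a single numeric `est` (and, if present,
# a single numeric `se` and `df`).
check_fun_result <- function(res) {
    is_num1 <- function(x) is.numeric(x) && length(x) == 1
    ok_elem <- function(el) {
        # [[ ]] is used to avoid the partial name matching done by $
        is.list(el) &&
            is_num1(el[["est"]]) &&
            (is.null(el[["se"]]) || is_num1(el[["se"]])) &&
            (is.null(el[["df"]]) || is_num1(el[["df"]]))
    }
    is.list(res) &&
        !is.null(names(res)) &&
        all(names(res) != "") &&
        all(vapply(res, ok_elem, logical(1)))
}

# Toy data and a toy analysis function (illustrative only)
set.seed(1)
toy <- data.frame(
    outcome = rnorm(20),
    group = rep(c(0, 1), each = 10),
    covar = rnorm(20)
)
toy_fun <- function(dat, ...) {
    mod <- lm(outcome ~ group + covar, data = dat)
    list(
        trt = list(
            est = coef(mod)[["group"]],
            se = sqrt(vcov(mod)["group", "group"]),
            df = df.residual(mod)
        )
    )
}

ok <- check_fun_result(toy_fun(toy))
bad <- check_fun_result(list(trt = list(estimate = 1)))
```

In this sketch ok is TRUE, while bad is FALSE because the element is called estimate rather than est.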

By default fun is the rbmi_ancova() function. Please note that this function requires that a vars object, as created by the set_vars() function from the rbmi package, is provided via the vars argument e.g. rbmi_analyse(imputeObj, vars = set_vars(...)). Please see the documentation for rbmi_ancova() for full details. Please also note that the theoretical justification for the conditional mean imputation method (method = method_condmean() in the draws() function from the rbmi package) relies on the fact that ANCOVA is a linear transformation of the outcomes. Thus care is required when applying alternative analysis functions in this setting.

The delta argument can be used to specify offsets to be applied to the outcome variable in the imputed datasets prior to the analysis. This is typically used for sensitivity or tipping point analyses. The delta dataset must contain columns vars$subjid, vars$visit (as specified in the original call to the draws() function from the rbmi package) and delta. Essentially this data.frame is merged onto the imputed dataset by vars$subjid and vars$visit and then the outcome variable is modified by:

imputed_data[[vars$outcome]] <- imputed_data[[vars$outcome]] + imputed_data[['delta']]

Please note that in order to provide maximum flexibility, the delta argument can be used to modify any/all outcome values including those that were not imputed. Care must be taken when defining offsets. It is recommended that you use the helper function delta_template() from the rbmi package to define the delta datasets as this provides utility variables such as is_missing which can be used to identify exactly which visits have been imputed.
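
The merge-and-offset logic described above can be illustrated with toy data in base R. This is a sketch only: the column names id, visit and outcome stand in for vars$subjid, vars$visit and vars$outcome, and template is a hand-built stand-in for the kind of data.frame that delta_template() provides, not its actual output.

```r
# Toy imputed dataset; column names are illustrative only
imputed <- data.frame(
    id = rep(c("p1", "p2"), each = 2),
    visit = rep(c("v1", "v2"), times = 2),
    outcome = c(1, 2, 3, 4)
)

# Hand-built stand-in for a delta template with an is_missing flag
template <- data.frame(
    id = rep(c("p1", "p2"), each = 2),
    visit = rep(c("v1", "v2"), times = 2),
    is_missing = c(FALSE, TRUE, FALSE, FALSE)
)

# Offset of 5, applied only to visits that were imputed
delta_df <- template
delta_df$delta <- ifelse(delta_df$is_missing, 5, 0)

# Merge by subject and visit, then shift the outcome
merged <- merge(
    imputed,
    delta_df[, c("id", "visit", "delta")],
    by = c("id", "visit")
)
merged$outcome <- merged$outcome + merged$delta
merged <- merged[order(merged$id, merged$visit), ]
```

Only the outcome for subject p1 at visit v2 changes, because that is the only row flagged as missing in the toy template.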

Parallelisation

To speed up the evaluation of rbmi_analyse() you can use the cluster_or_cores argument to enable parallelisation. Simply providing an integer will get rbmi to automatically spawn that many background processes to parallelise across. If you are using a custom analysis function then you need to ensure that any libraries or global objects required by your function are available in the sub-processes. To do this, use the make_rbmi_cluster() function, for example:

my_custom_fun <- function(...) {
    # <some analysis code>
}
cl <- make_rbmi_cluster(
    4,
    objects = list('my_custom_fun' = my_custom_fun),
    packages = c('dplyr', 'nlme')
)
rbmi_analyse(
    imputations = imputeObj,
    fun = my_custom_fun,
    cluster_or_cores = cl
)
parallel::stopCluster(cl)

Note that there is significant overhead both in setting up the sub-processes and in transferring data back and forth between the main process and the sub-processes. As such, parallelisation of the rbmi_analyse() function tends to be worthwhile only when you have more than 2000 samples generated by the draws() function from the rbmi package. With fewer samples than this, parallelisation may lead to longer run times than simply running the analysis sequentially.

It is important to note that the implementation of parallel processing within the analyse() function from the rbmi package has been optimised around the assumption that the parallel processes will be spawned on the same machine and not a remote cluster. One such optimisation is that the required data is saved to a temporary file on the local disk from which it is then read into each sub-process. This is done to avoid the overhead of transferring the data over the network. Our assumption is that if you are at the stage where you need to be parallelising your analysis over a remote cluster then you would likely be better off parallelising across multiple rbmi runs rather than within a single rbmi run.

Finally, if you are doing a tipping point analysis you can get a reasonable performance improvement by re-using the cluster between each call to rbmi_analyse() e.g.

cl <- make_rbmi_cluster(4)
ana_1 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_1,
    cluster_or_cores = cl
)
ana_2 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_2,
    cluster_or_cores = cl
)
ana_3 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_3,
    cluster_or_cores = cl
)
parallel::stopCluster(cl)
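
For a tipping point analysis the individual delta plans are often generated from a sequence of offsets rather than written by hand. A minimal base R sketch is shown below; template is a hypothetical stand-in for a delta template with an is_missing column, and the names offsets and delta_plans are illustrative only.

```r
# Hand-built stand-in for a delta template (illustrative only)
template <- data.frame(
    id = rep(c("p1", "p2"), each = 2),
    visit = rep(c("v1", "v2"), times = 2),
    is_missing = c(FALSE, TRUE, FALSE, TRUE)
)

# One delta plan per candidate offset, applied to imputed visits only
offsets <- c(0, 2, 4)
delta_plans <- lapply(offsets, function(off) {
    plan <- template
    plan$delta <- ifelse(plan$is_missing, off, 0)
    plan
})
names(delta_plans) <- paste0("offset_", offsets)
```

Each element of delta_plans could then be supplied as the delta argument in successive calls that share a single cluster.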

See also

The extract_imputed_dfs() function from the rbmi package for manually extracting imputed datasets.

The delta_template() function from the rbmi package for creating delta data.frames.

rbmi_ancova() for the default analysis function.

Examples

if (requireNamespace("rbmi", quietly = TRUE)) {
  library(rbmi)
  library(dplyr)

  dat <- antidepressant_data
  dat$GENDER <- as.factor(dat$GENDER)
  dat$POOLINV <- as.factor(dat$POOLINV)
  set.seed(123)
  pat_ids <- sample(levels(dat$PATIENT), nlevels(dat$PATIENT) / 4)
  dat <- dat |>
    filter(PATIENT %in% pat_ids) |>
    droplevels()
  dat <- expand_locf(
    dat,
    PATIENT = levels(dat$PATIENT),
    VISIT = levels(dat$VISIT),
    vars = c("BASVAL", "THERAPY"),
    group = c("PATIENT"),
    order = c("PATIENT", "VISIT")
  )
  dat_ice <- dat |>
    arrange(PATIENT, VISIT) |>
    filter(is.na(CHANGE)) |>
    group_by(PATIENT) |>
    slice(1) |>
    ungroup() |>
    select(PATIENT, VISIT) |>
    mutate(strategy = "JR")
  dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618), ]
  vars <- set_vars(
    outcome = "CHANGE",
    visit = "VISIT",
    subjid = "PATIENT",
    group = "THERAPY",
    covariates = c("THERAPY")
  )
  drawObj <- draws(
    data = dat,
    data_ice = dat_ice,
    vars = vars,
    method = method_condmean(type = "jackknife", covariance = "csh"),
    quiet = TRUE
  )
  references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
  imputeObj <- impute(drawObj, references)

  rbmi_analyse(imputations = imputeObj, vars = vars)
}
#> 
#> Attaching package: ‘rbmi’
#> The following object is masked from ‘package:junco’:
#> 
#>     make_rbmi_cluster
#> 
#> Analysis Object
#> ---------------
#> Number of Results: 1 + 43
#> Analysis Function: rbmi_ancova
#> Delta Applied: FALSE
#> Analysis Estimates:
#>     var_4
#>     trt_PLACEBO_4
#>     lsm_DRUG_4
#>     lsm_PLACEBO_4
#>     var_5
#>     trt_PLACEBO_5
#>     lsm_DRUG_5
#>     lsm_PLACEBO_5
#>     var_6
#>     trt_PLACEBO_6
#>     lsm_DRUG_6
#>     lsm_PLACEBO_6
#>     var_7
#>     trt_PLACEBO_7
#>     lsm_DRUG_7
#>     lsm_PLACEBO_7
#>