
This function takes multiple imputed datasets (as generated by the [impute()] function) and runs an analysis function on each of them.

Usage

rbmi_analyse(
  imputations,
  fun = rbmi_ancova,
  delta = NULL,
  ...,
  cluster_or_cores = 1,
  .validate = TRUE
)

Arguments

imputations

An `imputations` object as created by [impute()].

fun

An analysis function to be applied to each imputed dataset. See details.

delta

A `data.frame` containing the delta transformation to be applied to the imputed datasets prior to running `fun`. See details.

...

Additional arguments passed onto `fun`.

cluster_or_cores

The number of parallel processes to use when running this function. Can also be a cluster object created by [`make_rbmi_cluster()`]. See the parallelisation section below.

.validate

Should `imputations` be checked to ensure it conforms to the required format (default = `TRUE`)? Setting this to `FALSE` can give a small performance improvement when analysing a large number of samples.

Details

This function works by performing the following steps:

1. Extract a dataset from the `imputations` object.
2. Apply any delta adjustments as specified by the `delta` argument.
3. Run the analysis function `fun` on the dataset.
4. Repeat steps 1-3 across all of the datasets inside the `imputations` object.
5. Collect and return all of the analysis results.
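The following is a conceptual sketch of that loop only, not the actual implementation; the hypothetical `run_all()` wrapper uses the exported helper [extract_imputed_dfs()] purely for illustration and omits the delta adjustment of step 2.

# Conceptual sketch only -- rbmi_analyse() does not literally run this code
run_all <- function(imputations, fun, ...) {
    # Step 1: pull out the individual imputed data.frames
    imputed_dfs <- extract_imputed_dfs(imputations)
    # Steps 3-4: run `fun` on each dataset in turn
    results <- lapply(imputed_dfs, function(dat) fun(dat, ...))
    # Step 5: collect and return all of the analysis results
    results
}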

The analysis function `fun` must take a `data.frame` as its first argument. All other options to [rbmi_analyse()] are passed onto `fun` via `...`. `fun` must return a named list with each element itself being a list containing a single numeric element called `est` (or additionally `se` and `df` if you had originally specified [method_bayes()] or [method_approxbayes()]) i.e.:


myfun <- function(dat, ...) {
    mod_1 <- lm(data = dat, outcome ~ group)
    mod_2 <- lm(data = dat, outcome ~ group + covar)
    x <- list(
        trt_1 = list(
            est = coef(mod_1)[['group']],  # Use [[ ]] for safety
            se = sqrt(vcov(mod_1)['group', 'group']), # Use ['','']
            df = df.residual(mod_1)
        ),
        trt_2 = list(
            est = coef(mod_2)[['group']],  # Use [[ ]] for safety
            se = sqrt(vcov(mod_2)['group', 'group']), # Use ['','']
            df = df.residual(mod_2)
        )
    )
    return(x)
}
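Such a function is then supplied via the `fun` argument, with any additional arguments passed through `...`; for instance (assuming `imputeObj` is an `imputations` object, as in the Examples section below):

anaObj <- rbmi_analyse(imputations = imputeObj, fun = myfun)  # myfun runs once per imputed dataset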

Please note that the `vars$subjid` column (as defined in the original call to [draws()]) will be scrambled in the data.frames that are provided to `fun`. That is, it will not contain the original subject values, so any hard-coding of subject IDs must be avoided.

By default `fun` is the [rbmi_ancova()] function. Note that this function requires a `vars` object, as created by [set_vars()], to be provided via the `vars` argument, e.g. `rbmi_analyse(imputeObj, vars = rbmi::set_vars(...))`. See the documentation for [rbmi_ancova()] for full details. Note also that the theoretical justification for the conditional mean imputation method (`method = method_condmean()` in [draws()]) relies on the fact that ANCOVA is a linear transformation of the outcomes; care is therefore required when applying alternative analysis functions in this setting.

The `delta` argument can be used to specify offsets to be applied to the outcome variable in the imputed datasets prior to the analysis. This is typically used for sensitivity or tipping point analyses. The delta dataset must contain columns `vars$subjid`, `vars$visit` (as specified in the original call to [draws()]) and `delta`. Essentially this `data.frame` is merged onto the imputed dataset by `vars$subjid` and `vars$visit` and then the outcome variable is modified by:

imputed_data[[vars$outcome]] <- imputed_data[[vars$outcome]] + imputed_data[['delta']]

Please note that in order to provide maximum flexibility, the `delta` argument can be used to modify any/all outcome values, including those that were not imputed. Care must therefore be taken when defining offsets. It is recommended that you use the helper function [delta_template()] to define the delta datasets, as this provides utility variables such as `is_missing` which can be used to identify exactly which visits have been imputed.
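As an illustration only, a delta dataset adding a fixed offset of 5 to the imputed values could be constructed along the following lines (this sketch assumes the `imputeObj` and `vars` objects from the Examples section below; the offset of 5 and the column selection are illustrative, not prescriptive):

dt <- delta_template(imputations = imputeObj)   # one row per subject and visit
dt$delta <- ifelse(dt$is_missing, 5, 0)         # offset only the imputed values
dt <- dt[, c("PATIENT", "VISIT", "delta")]      # vars$subjid, vars$visit and delta
ana_delta <- rbmi_analyse(imputations = imputeObj, vars = vars, delta = dt)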

Parallelisation

To speed up the evaluation of `rbmi_analyse()` you can use the `cluster_or_cores` argument to enable parallelisation. Simply providing an integer will get `rbmi` to automatically spawn that many background processes to parallelise across. If you are using a custom analysis function then you need to ensure that any libraries or global objects required by your function are available in the sub-processes. To do this you need to use the [`make_rbmi_cluster()`] function, for example:

my_custom_fun <- function(...) <some analysis code>

cl <- make_rbmi_cluster(
    4,
    objects = list('my_custom_fun' = my_custom_fun),
    packages = c('dplyr', 'nlme')
)

rbmi_analyse(
    imputations = imputeObj,
    fun = my_custom_fun,
    cluster_or_cores = cl
)

parallel::stopCluster(cl)
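By contrast, if you are using the default analysis function, passing an integer is sufficient; a minimal sketch (again assuming the `imputeObj` and `vars` objects from the Examples section below):

# rbmi_analyse() spawns the 4 local background processes automatically
rbmi_analyse(imputations = imputeObj, vars = vars, cluster_or_cores = 4)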

Note that there is significant overhead both with setting up the sub-processes and with transferring data back-and-forth between the main process and the sub-processes. As such parallelisation of the `rbmi_analyse()` function tends to only be worth it when you have `> 2000` samples generated by [`draws()`]. Conversely using parallelisation if your samples are smaller than this may lead to longer run times than just running it sequentially.

It is important to note that the implementation of parallel processing within [`rbmi_analyse()`] has been optimised around the assumption that the parallel processes will be spawned on the same machine and not a remote cluster. One such optimisation is that the required data is saved to a temporary file on the local disk from which it is then read into each sub-process. This is done to avoid the overhead of transferring the data over the network. Our assumption is that if you are at the stage where you need to be parallelising your analysis over a remote cluster then you would likely be better off parallelising across multiple `rbmi` runs rather than within a single `rbmi` run.

Finally, if you are doing a tipping point analysis you can get a reasonable performance improvement by re-using the cluster between each call to `rbmi_analyse()`, e.g.:

cl <- make_rbmi_cluster(4)

ana_1 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_1,
    cluster_or_cores = cl
)

ana_2 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_2,
    cluster_or_cores = cl
)

ana_3 <- rbmi_analyse(
    imputations = imputeObj,
    delta = delta_plan_3,
    cluster_or_cores = cl
)

parallel::stopCluster(cl)

See also

[extract_imputed_dfs()] for manually extracting imputed datasets.

[delta_template()] for creating delta data.frames.

[rbmi_ancova()] for the default analysis function.

Examples

library(rbmi)
#> 
#> Attaching package: ‘rbmi’
#> The following object is masked from ‘package:junco’:
#> 
#>     make_rbmi_cluster
library(dplyr)

dat <- antidepressant_data
dat$GENDER <- as.factor(dat$GENDER)
dat$POOLINV <- as.factor(dat$POOLINV)
dat <- expand_locf(
  dat,
  PATIENT = levels(dat$PATIENT),
  # expand by PATIENT and VISIT
  VISIT = levels(dat$VISIT),
  vars = c("BASVAL", "THERAPY"),
  # fill with LOCF BASVAL and THERAPY
  group = c("PATIENT"),
  order = c("PATIENT", "VISIT")
)
dat_ice <- dat %>%
  arrange(PATIENT, VISIT) %>%
  filter(is.na(CHANGE)) %>%
  group_by(PATIENT) %>%
  slice(1) %>%
  ungroup() %>%
  select(PATIENT, VISIT) %>%
  mutate(strategy = "JR")
dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618), ]
vars <- set_vars(
  outcome = "CHANGE",
  visit = "VISIT",
  subjid = "PATIENT",
  group = "THERAPY",
  covariates = c("THERAPY")
)
drawObj <- draws(
  data = dat,
  data_ice = dat_ice,
  vars = vars,
  method = method_condmean(type = "jackknife"),
  quiet = TRUE
)
references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
imputeObj <- impute(drawObj, references)

rbmi_analyse(imputations = imputeObj, vars = vars)
#> 
#> Analysis Object
#> ---------------
#> Number of Results: 1 + 172
#> Analysis Function: rbmi_ancova
#> Delta Applied: FALSE
#> Analysis Estimates:
#>     var_4
#>     trt_PLACEBO_4
#>     lsm_DRUG_4
#>     lsm_PLACEBO_4
#>     var_5
#>     trt_PLACEBO_5
#>     lsm_DRUG_5
#>     lsm_PLACEBO_5
#>     var_6
#>     trt_PLACEBO_6
#>     lsm_DRUG_6
#>     lsm_PLACEBO_6
#>     var_7
#>     trt_PLACEBO_7
#>     lsm_DRUG_7
#>     lsm_PLACEBO_7
#>