library(appendMCP)
#> ---------------------------------------------------------------------------------------------------------
#> appendMCP: Tools for defining graphical multiple testing procedures in group-sequentially designed trials
#> ---------------------------------------------------------------------------------------------------------
#>                                                             _   __  __    _____   _____
#>                                                            | | |  \/  |  / ____| |  __ \
#>                   __ _   _ __    _ __     ___   _ __     __| | | \  / | | |      | |__) |
#>                  / _` | | '_ \  | '_ \   / _ \ | '_ \   / _` | | |\/| | | |      |  ___/
#>                 | (_| | | |_) | | |_) | |  __/ | | | | | (_| | | |  | | | |____  | |
#>                  \__,_| | .__/  | .__/   \___| |_| |_|  \__,_| |_|  |_|  \_____| |_|
#>                         | |     | |
#>                         |_|     |_|
#>     
#> ---------------------------------------------------------------------------------------------------------
#> 
#> v0.3.0: For an overview of the package's functionality enter: ?appendMCP
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(knitr)
library(purrr)

Introduction

This vignette provides a comprehensive guide to configuring studies in the appendMCP package. We’ll examine each component of the configuration object using the built-in example study to understand the structure and meaning of all data elements.

Configuration Overview

A complete study configuration is a named list with nine top-level elements:

# Load the example configuration
data("example_study_config")
config <- example_study_config

# Display the top-level structure
names(config)
#> [1] "study_name"        "study_description" "alpha"            
#> [4] "analyses"          "hypotheses"        "enroll_rate"      
#> [7] "distribution_tte"  "distribution_bin"  "graph"

Let’s examine each component in detail:

1. Study Metadata

Study Name and Description

cat("Study Name:", config$study_name, "\n")
#> Study Name: Example Study
cat("Description:", config$study_description, "\n")
#> Description: A 3-hypotheses group sequential design
cat("Alpha Level:", config$alpha, "\n")
#> Alpha Level: 0.025

Fields:

  • study_name: Character string identifying the study
  • study_description: Brief description of the study design
  • alpha: Overall Type I error rate (typically 0.025 for one-sided tests)

2. Analyses Specification

The analyses data frame defines when analyses will be conducted. Due to list columns, we display it in parts:

# Part 1: Basic trigger columns
kable(config$analyses[,c("endpoint", "strata", "treatments", "sample_size", "events")],
      caption = "Analyses Specification - Part 1: Trigger Conditions")
Analyses Specification - Part 1: Trigger Conditions
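| endpoint | strata         | treatments | sample_size | events |
|----------|----------------|------------|-------------|--------|
| CR       | Type_A, Type_B | TRT, PBO   | 500         | NA     |
| OS       | Type_A, Type_B | TRT, PBO   | NA          | 234    |
| OS       | Type_A, Type_B | TRT, PBO   | NA          | 284    |
| OS       | Type_A, Type_B | TRT, PBO   | NA          | 334    |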
# Part 2: Power subset specifications (displayed as text due to nested lists)
cat("\nPart 2: Power Subset Specifications\n")
#> 
#> Part 2: Power Subset Specifications
cat("===================================\n")
#> ===================================
for(i in seq_len(nrow(config$analyses))) {
  cat("\nAnalysis", i, "(endpoint:", config$analyses$endpoint[i], "):\n")
  cat("  power_subsets_any:", paste(names(config$analyses$power_subsets_any[[i]]), collapse = ", "), "\n")
  cat("  power_subsets_all:", paste(names(config$analyses$power_subsets_all[[i]]), collapse = ", "), "\n")
}
#> 
#> Analysis 1 (endpoint: CR ):
#>   power_subsets_any: H1 v H2, H1 v H2 v H3 
#>   power_subsets_all: H1,H2,H3 
#> 
#> Analysis 2 (endpoint: OS ):
#>   power_subsets_any: H1 v H2, H1 v H2 v H3 
#>   power_subsets_all: H1,H2,H3 
#> 
#> Analysis 3 (endpoint: OS ):
#>   power_subsets_any: H1 v H2, H1 v H2 v H3 
#>   power_subsets_all: H1,H2,H3 
#> 
#> Analysis 4 (endpoint: OS ):
#>   power_subsets_any: H1 v H2, H1 v H2 v H3 
#>   power_subsets_all: H1,H2,H3

Row Interpretation:

  • Each row represents one planned analysis and describes the conditions for an analysis to occur
  • Analysis timing, measured from randomization of the first subject, is determined by when the specified sample size or event count is reached
  • Multiple analyses can be conducted for the same endpoint at different information levels

Column Definitions:

  • endpoint: The endpoint that triggers the analysis (e.g., “CR”, “OS”, “EFS”)
  • strata: List of patient strata (or overall population) that constitute the sample size (or number of events) that triggers the analysis
  • treatments: List of treatment arms whose patients contribute to the sample size (or number of events) that triggers the analysis. Note: In multi-armed studies, an analysis can be triggered by monitoring a subset of treatment arms
  • sample_size: Target sample size for binary endpoints (NA for time-to-event)
  • events: Target number of events for time-to-event endpoints (NA for binary)
  • power_subsets_any: Named list of hypothesis subsets for “at least one rejection” power calculations. Each element is a vector of hypothesis indices (e.g., list("H1, H2" = c(1, 2)) means power to reject at least one of H1 or H2)
  • power_subsets_all: Named list of hypothesis subsets for “all rejections” power calculations. Each element is a vector of hypothesis indices (e.g., list("H1, H2, H3" = c(1, 2, 3)) means power to reject all of H1, H2, and H3)
# Show the structure of list columns
cat("Strata for Analysis 1:", paste(config$analyses$strata[[1]], collapse = ", "), "\n")
#> Strata for Analysis 1: Type_A, Type_B
cat("Treatments for Analysis 1:", paste(config$analyses$treatments[[1]], collapse = ", "), "\n")
#> Treatments for Analysis 1: TRT, PBO
cat("\nPower subsets (any) for Analysis 1:\n")
#> 
#> Power subsets (any) for Analysis 1:
print(config$analyses$power_subsets_any[[1]])
#> $`H1 v H2`
#> [1] 1 2
#> 
#> $`H1 v H2 v H3`
#> [1] 1 2 3
cat("\nPower subsets (all) for Analysis 1:\n")
#> 
#> Power subsets (all) for Analysis 1:
print(config$analyses$power_subsets_all[[1]])
#> $`H1,H2,H3`
#> [1] 1 2 3
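
The column definitions above can be illustrated by building a small analyses table by hand. The following is a minimal base-R sketch, not code taken from appendMCP: the two-row schedule is hypothetical (it reuses values from the example study), and the final check encodes the pattern seen in the example, where each row sets exactly one of sample_size or events.

```r
# Hypothetical two-row analysis schedule; values mirror the example study,
# but this construction is an assumption, not code from appendMCP.
analyses <- data.frame(
  endpoint    = c("CR", "OS"),
  sample_size = c(500, NA),   # trigger for the binary endpoint
  events      = c(NA, 234),   # trigger for the time-to-event endpoint
  stringsAsFactors = FALSE
)

# List columns: one list element per row
analyses$strata     <- list(c("Type_A", "Type_B"), c("Type_A", "Type_B"))
analyses$treatments <- list(c("TRT", "PBO"), c("TRT", "PBO"))

# Named lists of hypothesis indices, repeated for each row
analyses$power_subsets_any <- rep(
  list(list("H1 v H2" = c(1, 2), "H1 v H2 v H3" = c(1, 2, 3))), 2
)
analyses$power_subsets_all <- rep(list(list("H1,H2,H3" = c(1, 2, 3))), 2)

# In the example study each row is triggered by exactly one of the two targets
one_trigger <- xor(is.na(analyses$sample_size), is.na(analyses$events))
stopifnot(all(one_trigger))

str(analyses$power_subsets_any[[1]])
```

Storing the subsets as named lists keeps the label (for example "H1 v H2") next to the hypothesis indices it refers to, which is how the example configuration prints them.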

3. Hypotheses Definition

The hypotheses data frame specifies the statistical hypotheses to be tested. We display all columns in parts:

# Part 1: Basic hypothesis definition
kable(config$hypotheses[,c("type", "endpoint", "strata", "control", "test")],
      caption = "Hypotheses - Part 1: Basic Definition")
Hypotheses - Part 1: Basic Definition
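| type      | endpoint | strata         | control | test |
|-----------|----------|----------------|---------|------|
| Primary   | CR       | Type_A, Type_B | PBO     | TRT  |
| Primary   | OS       | Type_A, Type_B | PBO     | TRT  |
| Secondary | EFS      | Type_A, Type_B | PBO     | TRT  |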
# Part 2: Analysis assignments
analyses_info <- data.frame(
  endpoint = config$hypotheses$endpoint,
  analyses_analysed = sapply(config$hypotheses$analyses_analysed,
                             function(x) paste(x, collapse = ", "))
)
kable(analyses_info, caption = "Hypotheses - Part 2: Analysis Assignments")
Hypotheses - Part 2: Analysis Assignments
| endpoint | analyses_analysed |
|----------|-------------------|
| CR       | 1                 |
| OS       | 1, 2, 3, 4        |
| EFS      | 1, 2, 3, 4        |
# Part 3: Spending function details
spending_info <- config$hypotheses %>%
  select(endpoint, sf, sfpar, nominal) %>%
  mutate(
    sfpar = sapply(sfpar, function(x) if(is.null(x)) "NULL" else as.character(x)),
    nominal = sapply(nominal, function(x) if(is.null(x)) "NULL" else paste(x, collapse=", "))
  )
kable(spending_info, caption = "Hypotheses - Part 3: Spending Functions")
Hypotheses - Part 3: Spending Functions
| endpoint | sf    | sfpar | nominal |
|----------|-------|-------|---------|
| CR       | none  | NULL  | NULL    |
| OS       | asHSD | -1    | 0.001   |
| EFS      | asHSD | -1    | 0.001   |
# Part 4: Test method
test_info <- config$hypotheses %>%
  select(endpoint, test_method)
kable(test_info, caption = "Hypotheses - Part 4: Test Methods")
Hypotheses - Part 4: Test Methods
| endpoint | test_method          |
|----------|----------------------|
| CR       | unpooled_proportions |
| OS       | logrank              |
| EFS      | logrank              |

Column Definitions:

  • type: Hypothesis type (“Primary” or “Secondary”)
  • endpoint: Endpoint being tested
  • strata: List of strata for this hypothesis
  • control: Control treatment arm name
  • test: Test treatment arm name
  • analyses_analysed: List or single value specifying which analyses (by row index in the analyses data frame) test this hypothesis
  • sf: Spending function type (“none”, “asHSD”, “asOF”, “asP”, “asKD”, “asUser”) for group sequential design
  • sfpar: Spending function parameter (e.g., gamma for HSD; NULL if not applicable)
  • nominal: Nominal alpha spending at interim analyses (optional; NULL if not specified)
  • test_method: Statistical test method (“logrank”, “stratified_logrank”, “unpooled_proportions”, “pooled_proportions”, “cmh”)

Spending Function Types:

  • "none": No group sequential testing (single analysis)
  • "asHSD": Hwang-Shih-DeCani spending function
  • "asOF": O’Brien-Fleming spending function
  • "asP": Pocock spending function
  • "asKD": Kim-DeMets spending function
  • "asUser": User-defined spending

Test Method Types:

  • "logrank": Log-rank test for time-to-event endpoints (unstratified)
  • "stratified_logrank": Stratified log-rank test for time-to-event endpoints
  • "unpooled_proportions": Two-sample test for proportions (unpooled variance)
  • "pooled_proportions": Two-sample test for proportions (pooled variance)
  • "cmh": Cochran-Mantel-Haenszel test for stratified binary data

# Show which analyses test each hypothesis
for(i in seq_len(nrow(config$hypotheses))) {
  cat("Hypothesis", i, "(", config$hypotheses$endpoint[i], "):")
  cat(" Analyses", paste(config$hypotheses$analyses_analysed[[i]], collapse = ", "), "\n")
}
#> Hypothesis 1 ( CR ): Analyses 1 
#> Hypothesis 2 ( OS ): Analyses 1, 2, 3, 4 
#> Hypothesis 3 ( EFS ): Analyses 1, 2, 3, 4
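
To give a feel for what sf = "asHSD" with sfpar = -1 implies, the sketch below evaluates the standard Hwang-Shih-DeCani cumulative spending formula, alpha × (1 − exp(−gamma × t)) / (1 − exp(−gamma)), at the OS event targets. This is an illustration under stated assumptions, not the package's calculation: hsd_spend is a hypothetical helper, the information fraction is taken as events divided by the final event target, the local alpha is the initial OS allocation 0.025 × 0.6 = 0.015, and the nominal 0.001 spend and the CR-triggered first analysis are ignored.

```r
# Hwang-Shih-DeCani cumulative alpha spending at information fraction t.
# hsd_spend is a hypothetical helper written for this illustration.
hsd_spend <- function(t, alpha, gamma) {
  if (gamma == 0) return(alpha * t)  # limiting case: linear spending
  alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
}

alpha_local <- 0.015             # assumed: initial local alpha for OS (0.025 * 0.6)
events      <- c(234, 284, 334)  # OS event targets from the analyses table
info_frac   <- events / max(events)

cumulative  <- hsd_spend(info_frac, alpha = alpha_local, gamma = -1)
incremental <- diff(c(0, cumulative))

round(data.frame(events, info_frac, cumulative, incremental), 5)
```

With a negative gamma the function spends alpha conservatively early on and keeps most of it for the final analysis; the cumulative spend reaches the full local alpha at information fraction 1.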

4. Enrollment Rates

The enroll_rate data frame specifies patient enrollment assumptions:

kable(config$enroll_rate, caption = "Enrollment Rate Specification")
Enrollment Rate Specification
| stratum | treatments | rate      | duration | ratio |
|---------|------------|-----------|----------|-------|
| Type_A  | PBO, TRT   | 17.142857 | 28       | 1, 1  |
| Type_B  | PBO, TRT   | 4.285714  | 28       | 1, 1  |

Column Definitions:

  • stratum: Patient stratum identifier
  • treatments: List of treatment arms for this stratum
  • rate: Enrollment rate (patients per month) for this stratum
  • duration: Enrollment duration (months) for this stratum
  • ratio: Randomization ratio vector for treatment arms (e.g., c(1, 1) for 1:1 randomization)

Row Interpretation:

  • Each row represents enrollment for one stratum
  • Total enrollment = rate × duration for each stratum
  • Treatments list shows which arms patients in this stratum can be randomized to

# Calculate total enrollment
total_enrollment <- sum(config$enroll_rate$rate * config$enroll_rate$duration)
cat("Total Planned Enrollment:", total_enrollment, "patients\n")
#> Total Planned Enrollment: 600 patients

# Show treatment allocation
for(i in seq_len(nrow(config$enroll_rate))) {
  cat("Stratum", config$enroll_rate$stratum[i], "treatments:")
  cat(paste(config$enroll_rate$treatments[[i]], collapse = ", "), "\n")
}
#> Stratum Type_A treatments:PBO, TRT 
#> Stratum Type_B treatments:PBO, TRT
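
The ratio column can be combined with rate × duration to get the expected number of patients per arm within each stratum. The sketch below rebuilds the enrollment table by hand in base R; it assumes the rates are the exact fractions 120/7 and 30/7, which round to the displayed 17.142857 and 4.285714, and it is not code from the package.

```r
# Hand-built copy of the enrollment table (rates assumed to be exact fractions)
enroll <- data.frame(
  stratum  = c("Type_A", "Type_B"),
  rate     = c(120 / 7, 30 / 7),  # patients per month
  duration = c(28, 28),           # months
  stringsAsFactors = FALSE
)
enroll$treatments <- list(c("PBO", "TRT"), c("PBO", "TRT"))
enroll$ratio      <- list(c(1, 1), c(1, 1))

# Split each stratum's expected enrollment across arms by randomization ratio
per_arm <- do.call(rbind, lapply(seq_len(nrow(enroll)), function(i) {
  total <- enroll$rate[i] * enroll$duration[i]
  share <- enroll$ratio[[i]] / sum(enroll$ratio[[i]])
  data.frame(
    stratum    = enroll$stratum[i],
    treatment  = enroll$treatments[[i]],
    expected_n = total * share
  )
}))

per_arm
```

Under these assumptions the 480 Type_A and 120 Type_B patients split evenly between PBO and TRT, and the per-arm counts add up to the 600 patients planned overall.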

5. Time-to-Event Distribution Parameters

The distribution_tte data frame specifies parameters for time-to-event endpoints:

kable(config$distribution_tte, caption = "Time-to-Event Distribution Parameters", digits = 4)
Time-to-Event Distribution Parameters
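| endpoint | stratum | treatment | duration | fail_rate | dropout_rate |
|----------|---------|-----------|----------|-----------|--------------|
| EFS      | Type_A  | PBO       | Inf      | 0.0462    | 0.0088       |
| EFS      | Type_B  | PBO       | Inf      | 0.1172    | 0.0088       |
| EFS      | Type_A  | TRT       | Inf      | 0.0314    | 0.0088       |
| EFS      | Type_B  | TRT       | Inf      | 0.0797    | 0.0088       |
| OS       | Type_A  | PBO       | Inf      | 0.0289    | 0.0088       |
| OS       | Type_B  | PBO       | Inf      | 0.0866    | 0.0088       |
| OS       | Type_A  | TRT       | Inf      | 0.0199    | 0.0088       |
| OS       | Type_B  | TRT       | Inf      | 0.0598    | 0.0088       |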

Column Definitions:

  • endpoint: Time-to-event endpoint name
  • stratum: Patient stratum
  • treatment: Treatment arm
  • duration: Duration of constant hazard period (Inf = constant throughout)
  • fail_rate: Hazard rate (events per month) for this period
  • dropout_rate: Dropout hazard rate (per month)

Row Interpretation:

  • Each row defines hazard rates for one stratum-treatment-endpoint combination
  • Multiple rows per combination allow for piecewise constant hazards
  • Dropout is assumed constant across all periods

# Calculate hazard ratios
tte_summary <- config$distribution_tte %>%
  filter(endpoint == "OS") %>%
  select(stratum, treatment, fail_rate) %>%
  tidyr::pivot_wider(names_from = treatment, values_from = fail_rate) %>%
  mutate(HR = TRT / PBO)

kable(tte_summary, caption = "Hazard Ratios for OS Endpoint", digits = 3)
Hazard Ratios for OS Endpoint
| stratum | PBO   | TRT  | HR   |
|---------|-------|------|------|
| Type_A  | 0.029 | 0.02 | 0.69 |
| Type_B  | 0.087 | 0.06 | 0.69 |
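
Monthly hazard rates are easier to sanity-check when converted to medians. With a single constant-hazard period (duration = Inf), survival is exponential, so the median is log(2) / fail_rate and survival at time t is exp(−fail_rate × t). The sketch below applies this to the OS rows; the rates are copied from the table above as displayed (rounded to four digits), so the results are approximate, and the 12-month landmark is an arbitrary choice for illustration.

```r
# OS hazard rates copied from the distribution table (rounded to 4 digits)
os <- data.frame(
  stratum   = c("Type_A", "Type_B", "Type_A", "Type_B"),
  treatment = c("PBO", "PBO", "TRT", "TRT"),
  fail_rate = c(0.0289, 0.0866, 0.0199, 0.0598),
  stringsAsFactors = FALSE
)

# Exponential survival: median = log(2) / hazard, S(t) = exp(-hazard * t)
os$median_months <- log(2) / os$fail_rate
os$surv_12m      <- exp(-os$fail_rate * 12)

transform(os,
          median_months = round(median_months, 1),
          surv_12m      = round(surv_12m, 3))
```

A median that looks clinically implausible for the population is a quick signal that a hazard was entered on the wrong time scale, for example per year instead of per month.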

6. Binary Endpoint Parameters

The distribution_bin data frame specifies binary endpoint parameters:

kable(config$distribution_bin, caption = "Binary Endpoint Parameters", digits = 3)
Binary Endpoint Parameters
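| endpoint | stratum | treatment | rate | maturity_time |
|----------|---------|-----------|------|---------------|
| CR       | Type_A  | PBO       | 0.50 | 4.667         |
| CR       | Type_B  | PBO       | 0.40 | 4.667         |
| CR       | Type_A  | TRT       | 0.65 | 4.667         |
| CR       | Type_B  | TRT       | 0.55 | 4.667         |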

Column Definitions:

  • endpoint: Binary endpoint name
  • stratum: Patient stratum
  • treatment: Treatment arm
  • rate: Response rate (probability of success)
  • maturity_time: Time (months) when endpoint can be evaluated

Row Interpretation:

  • Each row defines response rate for one stratum-treatment-endpoint combination
  • Maturity time determines when patients contribute to the analysis
  • All patients must be followed for at least maturity_time months

# Calculate odds ratios and risk differences
bin_summary <- config$distribution_bin %>%
  select(stratum, treatment, rate) %>%
  tidyr::pivot_wider(names_from = treatment, values_from = rate) %>%
  mutate(
    OR = (TRT / (1 - TRT)) / (PBO / (1 - PBO)),
    Risk_Diff = TRT - PBO
  )

kable(bin_summary, caption = "Treatment Effects for Binary Endpoints", digits = 3)
Treatment Effects for Binary Endpoints
| stratum | PBO | TRT  | OR    | Risk_Diff |
|---------|-----|------|-------|-----------|
| Type_A  | 0.5 | 0.65 | 1.857 | 0.15      |
| Type_B  | 0.4 | 0.55 | 1.833 | 0.15      |
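
As a rough check on whether these response rates give a detectable effect, the sketch below computes the expected z statistic of an unpooled two-sample test for proportions at the CR analysis. Several inputs are assumptions made for illustration only: strata are mixed 80%/20% (the 480:120 planned enrollment split), the 500 patients are divided 250 per arm, and observed proportions are replaced by their assumed true values. It is not the package's power calculation.

```r
# Assumed CR response rates by stratum and arm (from the table above)
rates <- data.frame(
  stratum   = c("Type_A", "Type_B", "Type_A", "Type_B"),
  treatment = c("PBO", "PBO", "TRT", "TRT"),
  rate      = c(0.50, 0.40, 0.65, 0.55),
  stringsAsFactors = FALSE
)

# Assumed stratum mix: 480 / 600 and 120 / 600 of planned enrollment
share <- c(Type_A = 0.8, Type_B = 0.2)
rates$weight <- unname(share[rates$stratum])

# Marginal response rate for one arm, averaging over strata
marginal_rate <- function(arm) {
  rows <- rates$treatment == arm
  sum(rates$rate[rows] * rates$weight[rows])
}
p_pbo <- marginal_rate("PBO")
p_trt <- marginal_rate("TRT")

# Unpooled standard error and expected z statistic at 250 patients per arm
n_per_arm <- 250
se <- sqrt(p_trt * (1 - p_trt) / n_per_arm + p_pbo * (1 - p_pbo) / n_per_arm)
z  <- (p_trt - p_pbo) / se

c(p_pbo = p_pbo, p_trt = p_trt, z = z)
```

The unpooled form estimates the variance separately in each arm, whereas a pooled test would use a single combined proportion under the null hypothesis.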

7. Graphical Testing Procedure

The graph component defines the multiple testing strategy:

cat("Transition Matrix:\n")
#> Transition Matrix:
print(config$graph$g)
#>      [,1] [,2] [,3]
#> [1,]    0    1    0
#> [2,]    0    0    1
#> [3,]    1    0    0
cat("\nInitial Weights:\n")
#> 
#> Initial Weights:
print(config$graph$w)
#> [1] 0.4 0.6 0.0

Components:

  • g: Transition matrix (3×3 for 3 hypotheses)
    • Element g[i,j] = fraction of alpha from hypothesis i transferred to hypothesis j when i is rejected
    • Diagonal elements should be 0
    • Row sums should be ≤ 1
  • w: Initial weight vector
    • Element w[i] = initial alpha allocation to hypothesis i
    • Sum should equal 1.0
    • Determines initial local significance levels
# Verify graph properties
cat("Row sums of transition matrix:", rowSums(config$graph$g), "\n")
#> Row sums of transition matrix: 1 1 1
cat("Sum of initial weights:", sum(config$graph$w), "\n")
#> Sum of initial weights: 1
cat("Initial local alpha levels:", config$alpha * config$graph$w, "\n")
#> Initial local alpha levels: 0.01 0.015 0
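
The role of the transition matrix becomes clearer when a rejection is traced through the graph. The sketch below implements the standard update rule for graphical multiple testing procedures (weights of a rejected hypothesis are passed along its outgoing edges and the remaining edges are rewired); update_graph is a hypothetical helper written for this illustration, not a function exported by appendMCP.

```r
# Graph from the example configuration
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)
w <- c(0.4, 0.6, 0)

# Hypothetical helper: update weights and transitions after rejecting hypothesis i
update_graph <- function(w, g, i) {
  m <- length(w)
  w_new <- w + w[i] * g[i, ]   # pass the weight of i along its outgoing edges
  w_new[i] <- 0
  g_new <- matrix(0, m, m)
  for (j in seq_len(m)) {
    for (k in seq_len(m)) {
      if (j == k || j == i || k == i) next
      denom <- 1 - g[j, i] * g[i, j]
      if (denom > 0) {
        g_new[j, k] <- (g[j, k] + g[j, i] * g[i, k]) / denom
      }
    }
  }
  list(w = w_new, g = g_new)
}

after_h1 <- update_graph(w, g, i = 1)
after_h1$w          # weights after rejecting H1
after_h1$g          # transitions among the remaining hypotheses
0.025 * after_h1$w  # updated local alpha levels
```

In this example, rejecting H1 moves its full weight to H2, so H2 is then tested at the full alpha of 0.025, and H2 and H3 pass their weight to each other.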

Configuration Validation

The package includes validation to ensure configurations are properly specified:

# Validate the configuration
is_valid <- validate_config(config)
cat("Configuration is valid:", is_valid, "\n")
#> Configuration is valid: TRUE
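
Some cross-component checks are easy to run by hand while drafting a configuration. The sketch below is a set of manual sanity checks on hand-entered values from the example study; it is an assumption about what is worth checking and does not describe what validate_config() does internally.

```r
# Hand-entered summaries of the example configuration (for illustration only)
hyp_endpoints     <- c("CR", "OS", "EFS")  # endpoints in the hypotheses table
tte_endpoints     <- c("EFS", "OS")        # endpoints in distribution_tte
bin_endpoints     <- "CR"                  # endpoints in distribution_bin
n_analyses        <- 4                     # rows in the analyses table
analyses_analysed <- list(1, 1:4, 1:4)     # analyses testing each hypothesis
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)
w <- c(0.4, 0.6, 0)

checks <- c(
  endpoints_have_distribution =
    all(hyp_endpoints %in% c(tte_endpoints, bin_endpoints)),
  analysis_indices_in_range =
    all(unlist(analyses_analysed) %in% seq_len(n_analyses)),
  graph_matches_hypotheses =
    nrow(g) == length(hyp_endpoints) && length(w) == length(hyp_endpoints),
  graph_diagonal_zero = all(diag(g) == 0),
  graph_row_sums_ok   = all(rowSums(g) <= 1 + 1e-8),
  weights_sum_to_one  = isTRUE(all.equal(sum(w), 1))
)

checks
stopifnot(all(checks))
```

Naming each check makes it obvious which assumption failed when one of them returns FALSE.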

Processing the Configuration

Once configured, the study can be processed to generate all analysis components:

# Process the configuration
result <- process_config(config)

# Show what gets generated
cat("Generated components:\n")
#> Generated components:
cat(paste("-", names(result), collapse = "\n"), "\n")
#> - analyses
#> - hypotheses
#> - tables
#> - config
#> - graph_figure
#> - information_figure
#> - alpha_spend_figure
#> - timeline_type1_figure
#> - timeline_type2_figure
#> - bin_figure
#> - bin_rd_figure
#> - tte_figure
#> - tte_ahr_figure
#> - tte_cumhaz_figure
#> - tte_dropout_figure
#> - tte_dropout_probability_figure
#> - tte_hazard_figure
#> - tte_hr_figure
#> - tte_median_figure
#> - tte_quantiles_figure
#> - tte_weighted_figure
#> - er_figure
#> - er_cum_figure

Summary

A complete appendMCP configuration requires:

  1. Study metadata: Name, description, alpha level
  2. Analyses schedule: When analyses occur (by sample size or events), with power subset specifications for simulation-based operating characteristics
  3. Hypotheses: What gets tested, how (spending functions), and which test method to use
  4. Enrollment: Patient accrual rates by stratum with randomization ratios
  5. TTE distributions: Survival parameters by stratum/treatment
  6. Binary distributions: Response rates by stratum/treatment
  7. Graph structure: Multiple testing procedure definition

Each component must be internally consistent and align with the others. The validate_config() function helps ensure proper specification before analysis.

Best Practices

  • Start simple: Begin with basic designs and add complexity gradually
  • Validate early: Use validate_config() frequently during development
  • Document assumptions: Clearly specify all distributional assumptions
  • Check consistency: Ensure enrollment, analyses, and hypotheses align
  • Test scenarios: Try different parameter values to understand sensitivity

For more examples and advanced configurations, see the other package vignettes and documentation.