
Analysis/statistical function for count and percentage in core columns and (optional) relative risk columns
Source:R/a_freq_j.R
a_freq_j.RdAnalysis/statistical function for count and percentage in core columns and (optional) relative risk columns
Usage
s_freq_j(
df,
.var,
.df_row,
val = NULL,
drop_levels = FALSE,
excl_levels = NULL,
alt_df,
parent_df,
id = "USUBJID",
denom = c("n_df", "n_altdf", "N_col", "n_rowdf", "n_parentdf"),
.N_col,
countsource = c("df", "altdf")
)
a_freq_j(
df,
labelstr = NULL,
.var = NA,
val = NULL,
drop_levels = FALSE,
excl_levels = NULL,
new_levels = NULL,
new_levels_after = FALSE,
addstr2levs = NULL,
.df_row,
.spl_context,
.N_col,
id = "USUBJID",
denom = c("N_col", "n_df", "n_altdf", "N_colgroup", "n_rowdf", "n_parentdf"),
riskdiff = TRUE,
ref_path = NULL,
variables = list(strata = NULL),
conf_level = 0.95,
method = c("wald", "waldcc", "cmh", "ha", "newcombe", "newcombecc", "strat_newcombe",
"strat_newcombecc"),
weights_method = "cmh",
label = NULL,
label_fstr = NULL,
label_map = NULL,
.alt_df_full = NULL,
denom_by = NULL,
.stats = c("count_unique_denom_fraction"),
.formats = NULL,
.indent_mods = NULL,
na_str = rep("NA", 3),
.labels_n = NULL,
extrablankline = FALSE,
extrablanklineafter = NULL,
restr_columns = NULL,
colgroup = NULL,
countsource = c("df", "altdf")
)Arguments
- df
(
data.frame)
data set containing all analysis variables.- .var
(
string)
single variable name that is passed byrtableswhen requested by a statistics function.- .df_row
(
data.frame)
data frame across all of the columns for the given row split.- val
(
characteror NULL)
When NULL, all levels of the incoming variable (variable used in theanalyzecall) will be considered.
When a singlestring, only that current level/value of the incoming variable will be considered.
When multiple levels, only those levels/values of the incoming variable will be considered.
When no values are observed (eg zero row input df), a row with row-labelNo data reportedwill be included in the table.- drop_levels
(
logical)
IfTRUEnon-observed levels (based upon .df_row) will not be included.
Cannot be used together withval.- excl_levels
(
characteror NULL)
When NULL, no levels of the incoming variable (variable used in theanalyzecall) will be excluded.
When multiple levels, those levels/values of the incoming variable will be excluded.
Cannot be used together withval.- alt_df
(
dataframe)
Will be derived based upon alt_df_full and denom_by within a_freq_j.- parent_df
(
dataframe)
Will be derived within a_freq_j based upon the input dataframe that goes into build_table (df) and denom_by.
It is a data frame in the higher row-space than the current input df (which underwent row-splitting by the rtables splitting machinery).- id
(
string)
subject variable name.- denom
(
string)
See Details.- .N_col
(
integer)
column-wise N (column count) for the full column being analyzed that is typically passed byrtables.- countsource
Either
dforalt_df.
Whenalt_dfthe counts will be based upon the alternative dataframealt_df.
This is useful for subgroup processing, to present counts of subjects in a subgroup from the alternative dataframe.- labelstr
An argument to ensure this function can be used as a
cfunin asummarize_row_groupscall.
It is recommended not to utilize this argument for other purposes.
The label argument could be used instead (ifvalis a single string)
An another approach could be to utilize thelabel_mapargument to control the row labels of the incoming analysis variable.- new_levels
(list(2) or NULL)
List of length 2.
First element : names of the new levels
Second element: list with values of the new levels.- new_levels_after
(
logical)
IfTRUEnew levels will be added after last level.- addstr2levs
string, if not NULL will be appended to the rowlabel for that level, eg to add ",n (percent)" at the end of the rowlabels
- .spl_context
(
data.frame)
gives information about ancestor split states that is passed byrtables.- riskdiff
(
logical)
WhenTRUE, risk difference calculations will be performed and presented (if required risk difference column splits are included).
WhenFALSE, risk difference columns will remain blank (if required risk difference column splits are included).- ref_path
(
string)
Column path specifications for the control group for the relative risk derivation.- variables
Will be passed onto the relative risk function (internal function s_rel_risk_val_j), which is based upon
tern::s_proportion_diff().
See?tern::s_proportion_difffor details.- conf_level
(
proportion)
confidence level of the interval.- method
Will be passed onto the relative risk function (internal function s_rel_risk_val_j).
- weights_method
Will be passed onto the relative risk function (internal function s_rel_risk_val_j).
- label
(
string)
Whenvalis a singlestring, the row label to be shown on the output can be specified using this argument.
Whenvalis acharacter vector, thelabel_mapargument can be specified to control the row-labels.- label_fstr
(
string)
a sprintf style format string. It can contain up to one "\ generates the row/column label.
It will be combined with thelabelstrargument, when utilizing this function as acfunin asummarize_row_groupscall.
It is recommended not to utilize this argument for other purposes. The label argument could be used instead (ifvalis a single string)- label_map
(
tibble)
A mapping tibble to translate levels from the incoming variable into a different row label to be presented on the table.- .alt_df_full
(
dataframe)
Denominator dataset for fraction and relative risk calculations.
.alt_df_full is a crucial parameter for the relative risk calculations if this parameter is not set to utilizealt_counts_df, then the values in the relative risk columns might not be correct.
Once the rtables PR is integrated, this argument gets populated by the rtables split machinery (see rtables::additional_fun_params).- denom_by
(
character)
Variables from row-split to be used in the denominator derivation.
This controls bothdenom = "n_parentdf"anddenom = "n_altdf".
Whendenom = "n_altdf", the denominator is derived from.alt_df_fullin combination withdenom_byargument- .stats
(
character)
statistics to select for the table. See Value for list of available statistics.- .formats
(named 'character' or 'list')
formats for the statistics.- .indent_mods
(named
integer)
indent modifiers for the labels. Defaults to 0, which corresponds to the unmodified default behavior. Can be negative.- na_str
(
string)
string used to replace allNAor empty values in the output.- .labels_n
(named
character)
String to control row labels for the 'n'-statistics.
Only useful when more than one 'n'-statistic is requested (rare situations only).- extrablankline
(
logical)
WhenTRUE, an extra blank line will be added after the last value.
Avoid using this in template scripts, use section_div = " " instead (once PR for rtables is available)- extrablanklineafter
(
string)
When the row-label matches the string, an extra blank line will be added after that value.- restr_columns
character
If not NULL, columns not defined inrestr_columnswill be blanked out.- colgroup
The name of the column group variable that is used as source for denominator calculation.
Required to be specified whendenom = "N_colgroup".
Value
s_freq_j: returns a list of following statisticsn_df
n_rowdf
n_parentdf
n_altdf
denom
count
count_unique
count_unique_fraction
count_unique_denom_fraction
a_freq_j: returns a list of requested statistics with formattedrtables::CellValue().
Within the relative risk difference columns, the following stats are blanked out:any of the n-statistics (n_df, n_altdf, n_parentdf, n_rowdf, denom)
count
count_unique
For the others (count_unique_fraction, count_unique_denom_fraction), the statistic is replaced by the relative risk difference + confidence interval.
Details
denom controls the denominator used to calculate proportions/percents.
It must be one of
N_col Column count,
n_df Number of patients (based upon the main input dataframe
df),n_altdf Number of patients from the secondary dataframe (
.alt_df_full),
Note that argumentdenom_bywill perform a row-split on the.alt_df_fulldataframe.
It is a requirement that variables specified indenom_byare part of the row split specifications.N_colgroup Number of patients from the column group variable (note that this is based upon the input .alt_df_full dataframe).
Note that the argumentcolgroup(column variable) needs to be provided, as it cannot be retrieved directly from the column layout definition.n_rowdf Number of patients from the current row-level dataframe (
.row_dffrom the rtables splitting machinery).n_parentdf Number of patients from a higher row-level split than the current split.
This higher row-level split is specified in the argumentdenom_by.
Examples
library(dplyr)
adsl <- ex_adsl |> select("USUBJID", "SEX", "ARM")
adae <- ex_adae |> select("USUBJID", "AEBODSYS", "AEDECOD")
adae[["TRTEMFL"]] <- "Y"
trtvar <- "ARM"
ctrl_grp <- "B: Placebo"
adsl$colspan_trt <- factor(ifelse(adsl[[trtvar]] == ctrl_grp, " ", "Active Study Agent"),
levels = c("Active Study Agent", " ")
)
adsl$rrisk_header <- "Risk Difference (%) (95% CI)"
adsl$rrisk_label <- paste(adsl[[trtvar]], paste("vs", ctrl_grp))
adae <- adae |> left_join(adsl)
#> Joining with `by = join_by(USUBJID)`
colspan_trt_map <- create_colspan_map(adsl,
non_active_grp = "B: Placebo",
non_active_grp_span_lbl = " ",
active_grp_span_lbl = "Active Study Agent",
colspan_var = "colspan_trt",
trt_var = trtvar
)
ref_path <- c("colspan_trt", " ", trtvar, ctrl_grp)
lyt <- basic_table(show_colcounts = TRUE) |>
split_cols_by("colspan_trt", split_fun = trim_levels_to_map(map = colspan_trt_map)) |>
split_cols_by(trtvar) |>
split_cols_by("rrisk_header", nested = FALSE) |>
split_cols_by(trtvar, labels_var = "rrisk_label", split_fun = remove_split_levels(ctrl_grp))
lyt1 <- lyt |>
analyze("TRTEMFL",
show_labels = "hidden",
afun = a_freq_j,
extra_args = list(
method = "wald",
.stats = c("count_unique_denom_fraction"),
ref_path = ref_path
)
)
result1 <- build_table(lyt1, adae, alt_counts_df = adsl)
result1
#> Active Study Agent Risk Difference (%) (95% CI)
#> A: Drug X C: Combination B: Placebo A: Drug X vs B: Placebo C: Combination vs B: Placebo
#> (N=134) (N=132) (N=134) (N=134) (N=132)
#> ————————————————————————————————————————————————————————————————————————————————————————————————————————————————
#> Y 122/134 (91.0%) 120/132 (90.9%) 123/134 (91.8%) -0.7 (-7.5, 6.0) -0.9 (-7.6, 5.9)
x_drug_x <- list(length(unique(subset(adae, adae[[trtvar]] == "A: Drug X")[["USUBJID"]])))
N_x_drug_x <- length(unique(subset(adsl, adsl[[trtvar]] == "A: Drug X")[["USUBJID"]]))
y_placebo <- list(length(unique(subset(adae, adae[[trtvar]] == ctrl_grp)[["USUBJID"]])))
N_y_placebo <- length(unique(subset(adsl, adsl[[trtvar]] == ctrl_grp)[["USUBJID"]]))
tern::stat_propdiff_ci(
x = x_drug_x,
N_x = N_x_drug_x,
y = y_placebo,
N_y = N_y_placebo
)
#> [[1]]
#> [1] -0.7462687 -7.4525893 5.9600520
#>
x_combo <- list(length(unique(subset(adae, adae[[trtvar]] == "C: Combination")[["USUBJID"]])))
N_x_combo <- length(unique(subset(adsl, adsl[[trtvar]] == "C: Combination")[["USUBJID"]]))
tern::stat_propdiff_ci(
x = x_combo,
N_x = N_x_combo,
y = y_placebo,
N_y = N_y_placebo
)
#> [[1]]
#> [1] -0.8819539 -7.6386167 5.8747089
#>
extra_args_rr <- list(
denom = "n_altdf",
denom_by = "SEX",
riskdiff = FALSE,
.stats = c("count_unique")
)
extra_args_rr2 <- list(
denom = "n_altdf",
denom_by = "SEX",
riskdiff = TRUE,
ref_path = ref_path,
method = "wald",
.stats = c("count_unique_denom_fraction"),
na_str = rep("NA", 3)
)
lyt2 <- basic_table(
top_level_section_div = " ",
colcount_format = "N=xx"
) |>
split_cols_by("colspan_trt", split_fun = trim_levels_to_map(map = colspan_trt_map)) |>
split_cols_by(trtvar, show_colcounts = TRUE) |>
split_cols_by("rrisk_header", nested = FALSE) |>
split_cols_by(trtvar,
labels_var = "rrisk_label", split_fun = remove_split_levels("B: Placebo"),
show_colcounts = FALSE
) |>
split_rows_by("SEX", split_fun = drop_split_levels) |>
summarize_row_groups("SEX",
cfun = a_freq_j,
extra_args = append(extra_args_rr, list(label_fstr = "Gender: %s"))
) |>
split_rows_by("TRTEMFL",
split_fun = keep_split_levels("Y"),
indent_mod = -1L,
section_div = c(" ")
) |>
summarize_row_groups("TRTEMFL",
cfun = a_freq_j,
extra_args = append(extra_args_rr2, list(
label =
"Subjects with >=1 AE", extrablankline = TRUE
))
) |>
split_rows_by("AEBODSYS",
split_label = "System Organ Class",
split_fun = trim_levels_in_group("AEDECOD"),
label_pos = "topleft",
section_div = c(" "),
nested = TRUE
) |>
summarize_row_groups("AEBODSYS",
cfun = a_freq_j,
extra_args = extra_args_rr2
) |>
analyze("AEDECOD",
afun = a_freq_j,
extra_args = extra_args_rr2
)
result2 <- build_table(lyt2, adae, alt_counts_df = adsl)