Introduction

TrueType fonts (i.e., those where different characters have different printed widths) complicate the calculation of column widths based on the contents of a table or listing, particularly when combined with verbose, human-readable column and/or row labels.

junco provides default algorithms for calculating appropriate column widths for both tables and listings when exporting to RTF via tt_to_tlgrtf.

These can be invoked explicitly by calling the def_colwidths function on a TableTree or listing_df object, along with a font specification.

Tables

Many tables have column labels many times longer than the data in that column’s cells; the width of cell data tends to be bounded by the fact it is a set of one to three numbers interspersed with punctuation, rather than words as is the case for labels.

Pagination Assumptions

tt_to_tlgrtf allows for horizontal rtables-style pagination, but does not perform vertical pagination; each vertical strip of the table (which, mind, comes from horizontal pagination) is written to a separate file. The combined_rtf argument indicates whether a single combined RTF file should also be generated by stacking those separate sections of the table into a single RTF (as different table objects).

Algorithm And Optimality Criterion

The column-width algorithm for tables is relatively simple. For table columns, it calculates the widths required so that no cell values will be word-wrapped. This is essentially what formatters::propose_column_widths does, except that propose_column_widths also takes the column labels into account, and we have found in practice that those labels are often much wider than the cells. def_colwidths also constrains the maximum width of the row labels to the width (in inches) specified via label_width_ins, with a default of two inches.

Examples

We can see this by comparing tables with the same structure and cell values but column labels of varying verbosity.

library(junco)
#> Loading required package: formatters
#> 
#> Attaching package: 'formatters'
#> The following object is masked from 'package:base':
#> 
#>     %||%
#> Loading required package: rtables
#> Loading required package: magrittr
#> 
#> Attaching package: 'rtables'
#> The following object is masked from 'package:utils':
#> 
#>     str
#> Registered S3 method overwritten by 'tern':
#>   method   from 
#>   tidy.glm broom

adsl2 <- ex_adsl
adsl2$ARM2 <- adsl2$ARM
levels(adsl2$ARM2) <- c("A", "B", "C")
adsl2$ARM3 <- adsl2$ARM
levels(adsl2$ARM3) <- c("Full Drug Name Of Drug X", "Current Best-Practice Standard Of Care", "The Weird Other Arm")

## col-labels unmodified (middling width)
lyt1 <- basic_table() |>
  split_cols_by("ARM") |>
  split_rows_by("RACE") |>
  summarize_row_groups(format = "xx (xx.xx%)") |>
  analyze("DCSREAS")

tbl1 <- build_table(lyt1, adsl2)

head(tbl1) 
#>                                    A: Drug X    B: Placebo    C: Combination
#> ————————————————————————————————————————————————————————————————————————————
#> ASIAN                             68 (50.75%)   67 (50.00%)    73 (55.30%)  
#>   ADVERSE EVENT                        4             4              5       
#>   LACK OF EFFICACY                     5             5              2       
#>   PHYSICIAN DECISION                   2             4              4       
#>   PROTOCOL VIOLATION                   1             7              5       
#>   WITHDRAWAL BY PARENT/GUARDIAN        3             1              2

## super narrow column labels
lyt2 <- basic_table() |>
  split_cols_by("ARM", labels_var = "ARM2") |>
  split_rows_by("RACE") |>
  summarize_row_groups(format = "xx (xx.xx%)") |>
  analyze("DCSREAS")

tbl2 <- build_table(lyt2, adsl2)

head(tbl2)
#>                                        A             B             C     
#> —————————————————————————————————————————————————————————————————————————
#> ASIAN                             68 (50.75%)   67 (50.00%)   73 (55.30%)
#>   ADVERSE EVENT                        4             4             5     
#>   LACK OF EFFICACY                     5             5             2     
#>   PHYSICIAN DECISION                   2             4             4     
#>   PROTOCOL VIOLATION                   1             7             5     
#>   WITHDRAWAL BY PARENT/GUARDIAN        3             1             2

## super wide column labels
lyt3 <-  basic_table() |>
  split_cols_by("ARM", labels_var = "ARM3") |>
  split_rows_by("RACE") |>
  summarize_row_groups(format = "xx (xx.xx%)") |>
  analyze("DCSREAS")

tbl3 <- build_table(lyt3, adsl2)
head(tbl3)
#>                                   Full Drug Name Of Drug X   Current Best-Practice Standard Of Care   The Weird Other Arm
#> —————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
#> ASIAN                                   68 (50.75%)                       67 (50.00%)                     73 (55.30%)    
#>   ADVERSE EVENT                              4                                 4                               5         
#>   LACK OF EFFICACY                           5                                 5                               2         
#>   PHYSICIAN DECISION                         2                                 4                               4         
#>   PROTOCOL VIOLATION                         1                                 7                               5         
#>   WITHDRAWAL BY PARENT/GUARDIAN              3                                 1                               2

rtables’ default column widths (implemented via formatters::propose_column_widths) take the maximum width required for a label or value for each column (and the row-label pseudo column):

propose_column_widths(tbl1)
#> [1] 41 11 11 14

This means that the width of the third column will be slightly smaller for tbl2, as the column label is no longer wider than the group summary cell values. The first and second columns remain the same, as the cell value widths were already slightly larger than the labels in tbl1.

propose_column_widths(tbl2)
#> [1] 41 11 11 11

Meanwhile, the verbose column labels in tbl3 result in dramatically wider column widths, as propose_column_widths enforces no wrapping even within column labels:

propose_column_widths(tbl3)
#> [1] 41 24 38 19

In contrast, def_colwidths gives all three tables the same widths for the three data columns, matching those propose_column_widths gave for tbl2:

def_colwidths(tbl1, fontspec = font_spec(), label_width_ins = 2, col_gap = 0)
#> [1] 30 11 11 11
def_colwidths(tbl2, fontspec = font_spec(), label_width_ins = 2, col_gap = 0)
#> [1] 30 11 11 11
def_colwidths(tbl3, fontspec = font_spec(), label_width_ins = 2, col_gap = 0)
#> [1] 30 11 11 11

We see, however, that the row-label width has been reduced due to the label_width_ins constraint, which we can vary up to the maximum width the row labels need with no wrapping:

## bigger than 2, but not what we got from propose_column_widths
def_colwidths(tbl1, fontspec = font_spec(), label_width_ins = 2.2, col_gap = 0)
#> [1] 33 11 11 11
## bigger than required so we get same row label width as propose_column_widths
def_colwidths(tbl1, fontspec = font_spec(), label_width_ins = 6, col_gap = 0)
#> [1] 41 11 11 11
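
For the monospace case, the data-column widths shown so far can be reproduced in a few lines of base R. This is only an illustrative sketch, not the code that either package runs: col_labels and widest_cell are hypothetical names, and the cell width is taken from the widest group-summary value in the printed tables.

```r
# Illustrative sketch only: with a monospace font, width == character count.
col_labels <- list(
  tbl1 = c("A: Drug X", "B: Placebo", "C: Combination"),
  tbl2 = c("A", "B", "C"),
  tbl3 = c("Full Drug Name Of Drug X",
           "Current Best-Practice Standard Of Care",
           "The Weird Other Arm")
)
# Widest cell value in each data column of the tables above
widest_cell <- nchar("68 (50.75%)")

# Neither labels nor cells may wrap: the wider of the two sets the width
with_labels <- lapply(col_labels, function(lbls) pmax(nchar(lbls), widest_cell))
# Only cells may not wrap: labels no longer influence the width
cells_only <- lapply(col_labels, function(lbls) rep(widest_cell, length(lbls)))

with_labels$tbl3
#> [1] 24 38 19
cells_only$tbl3
#> [1] 11 11 11
```

The first result matches the data-column widths from propose_column_widths, and the second matches those from def_colwidths.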

While we have done these examples with the default monospace font used by rtables and formatters, the difference is often particularly large when using a TrueType font with verbose labels, as many letters have larger print widths than punctuation and numeric digit characters:

fspec_times <- font_spec("Times", 9)
propose_column_widths(tbl3, fontspec = fspec_times )
#> [1] 93 45 65 36
def_colwidths(tbl3, fontspec = fspec_times, label_width_ins = 2, col_gap = 0)
#> [1] 64 20 20 20

We note here that for our (fictional but realistically verbose) column labels in tbl3, the default behavior from formatters will not fit on a single page: even without padding between the columns, those widths take up

sum(propose_column_widths(tbl3, fontspec = fspec_times))
#> [1] 239

space-character widths (which is the unit formatters calculates widths in) while a standard page only has

formatters::page_lcpp(fontspec = fspec_times )$cpp
#> [1] 224

spaces of width available.

The column widths calculated by def_colwidths, however, easily fit on a single page.
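
The comparison can be checked directly from the numbers printed above. The snippet below is a small arithmetic sketch with hypothetical variable names; the values are copied from the output in this section rather than recomputed.

```r
# Widths reported above for tbl3 with 9pt Times (space-character units)
w_propose <- c(93, 45, 65, 36)   # from propose_column_widths
w_def     <- c(64, 20, 20, 20)   # from def_colwidths
page_cpp  <- 224                 # from page_lcpp(...)$cpp

sum(w_propose) <= page_cpp
#> [1] FALSE
sum(w_def) <= page_cpp
#> [1] TRUE
page_cpp - sum(w_def)            # spaces left over, e.g. for column gaps
#> [1] 100
```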

Listings

Listings, unlike tables, often have text in their cell values, sometimes even concatenations of multiple demographic variables into a single column. They also do not have the row-labels pseudo-column present in tables. As such, we need a different, and much more complicated, algorithm to calculate good column widths.

Pagination Assumptions

def_colwidths assumes that listings should not be horizontally paginated, so all columns, and any gaps between them, must fit within the width of a single page.

Optimality Criterion

For listings, we minimize the total number of lines a listing will require to print, including repetition of the table header. This helps control the total size of the resulting RTF file, as well as generally providing a better reading experience for the listing.

We further constrain our column widths such that no words within cell values will need to be broken up by word wrapping, if possible. We define “words” for this purpose as a string of characters separated by space(s) or “-”.

For this reason, we recommend that values concatenated into a single listing column be separated by, e.g., " / " rather than "/". Even though that makes the value slightly longer, it gives the algorithm much more flexibility to find column widths that don’t break up individual “words”.
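
A small sketch of why the separator matters, assuming a monospace font and the definition of “words” given above. The helper min_unbroken_width is hypothetical and written only for this illustration.

```r
# Smallest column width that keeps every "word" whole
# (monospace assumption; words are split on spaces and hyphens)
min_unbroken_width <- function(x) max(nchar(strsplit(x, "[ -]")[[1]]))

min_unbroken_width("NOT RECOVERED/NOT RESOLVED")
#> [1] 13
min_unbroken_width("NOT RECOVERED / NOT RESOLVED")
#> [1] 9
```

With the bare "/", RECOVERED/NOT counts as one 13-character word; with the padded separator the longest word is only 9 characters, so narrower candidate widths become available.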

This generally translates to finding widths where, after wrapping, a single column isn’t wrapped many more times than the others within the majority of rows. In practice, we have found that this results in listings that are both legible and aesthetically reasonable.

Algorithm

The algorithm for selecting column widths has two parts. First, for each column individually, all widths that would result in different numbers of total lines for the cells in that column are determined; the constraint that words within cells not be broken up is key here, as it dramatically reduces the number of widths that actually result in different numbers of lines. The second step is to search the space of candidate column widths collectively for the optimal set whose combined width fits within the total available space.

We will use the following data to illustrate:

library(rlistings)
#> Loading required package: tibble

adae <- pharmaverseadam::adae
adae$AEOUT <- gsub("/", " / ", adae$AEOUT)
adsl <-  pharmaverseadam::adsl

adsl <- adsl[, c("USUBJID", setdiff(names(adsl), names(adae)))]

lstdat <- merge(adae, adsl, by = "USUBJID")
var_labels(lstdat) <- c(var_labels(adae), var_labels(adsl)[-1])
lstdat$demog <- with_label(paste(lstdat$RACE, lstdat$SEX, lstdat$AGE, sep = " / "), "Demographic Information")

lsting <- as_listing(lstdat, key_cols = c("USUBJID"),
                     disp_cols = c("ACTARM", "COUNTRY", "demog", "AESEV", "AEBODSYS", "AEDECOD", "ASTDTM", "AENDTM", "AEOUT", "EOSSTT"))

Candidate Column Widths

For example, the last cell in the demographics column contains the value

demcell <- lstdat$demog[nrow(lstdat)]
demcell
#> [1] "BLACK OR AFRICAN AMERICAN / F / 74"

Broken up according to our definition, it contains the following “words” which must remain whole during column width selection.

wrds <- strsplit(demcell, "[ -]")[[1]]
wrds
#> [1] "BLACK"    "OR"       "AFRICAN"  "AMERICAN" "/"        "F"        "/"       
#> [8] "74"

Assuming a monospace font for simplicity, then, the smallest possible width of the column is

max(nchar(wrds))
#> [1] 8

Using that width, the first two words fit into one line, the third goes into another, the fourth sits on its own, and “words” five through eight all fit into a final line, for a total of four lines. We call this packing lines:


packed_widths <- function(...) {
  lst <- list(...)
  nchar(vapply(lst, paste, collapse = " ", ""))
}
packed_widths(wrds[1:2],
              wrds[3],
              wrds[4],
              wrds[5:8])
#> [1] 8 7 8 8

Recall that we do not care which words are allocated where, only about the total number of lines required. A column width of 10 would allow the fifth word (/) to be packed into the same line as the fourth, giving AMERICAN /, but it results in the same total number of lines, so it is not considered a distinct candidate column width with respect to that cell.

The next column width that results in fewer lines for that cell is the one at which words one through three can all be packed into a single line, with spaces between them: 16 in this case.

With that column width, we get three lines as we do not have enough room for the space required to consolidate the final two lines into one.

packed_widths(wrds[1:3], wrds[4], wrds[5:8])
#> [1] 16  8  8

Increasing the column width to 17, however, allows us to get down to two lines:

packed_widths(wrds[1:3],
              wrds[4:8])
#> [1] 16 17

Finally, the last possible width with a different line total is the smallest width that will fit the entire value, i.e., 34.

So for this cell, there are four, and only four, candidate column widths.
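
The enumeration for a single cell can be sketched in base R as follows. This is a simplified illustration under a monospace assumption, not the implementation used by the package; lines_needed and candidate_widths are hypothetical helpers.

```r
# Greedy line packing: number of lines a cell needs at a given width
# (monospace assumption; words are never broken).
lines_needed <- function(words, width) {
  n_lines <- 1L
  used <- 0L
  for (w in nchar(words)) {
    if (used == 0L) {
      used <- w                      # first word on a line
    } else if (used + 1L + w <= width) {
      used <- used + 1L + w          # a space plus the word still fits
    } else {
      n_lines <- n_lines + 1L        # start a new line
      used <- w
    }
  }
  n_lines
}

candidate_widths <- function(cell) {
  words <- strsplit(cell, "[ -]")[[1]]
  # try every width from the longest word up to the full value
  widths <- seq(max(nchar(words)), nchar(cell))
  n <- vapply(widths, function(wd) lines_needed(words, wd), integer(1))
  # keep only the smallest width for each distinct line count
  widths[!duplicated(n)]
}

candidate_widths("BLACK OR AFRICAN AMERICAN / F / 74")
#> [1]  8 16 17 34
```

For a whole column, the same idea would be applied to the total number of lines across all of its cells rather than to one cell at a time.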

Selecting The Optimal Set Of Widths

Once we have the full set of candidate widths for each column individually, the algorithm for selecting the optimal collective set is as follows:

  1. Initialize
    1. Remove candidate widths which result in column labels requiring more than the allowable number of lines (default 3)
    2. Initialize with the smallest candidate width for each column
  2. Determine the column which requires the largest total number of lines
  3. Check whether the total space allows for changing to the next candidate width for that column
    1. If it does, select that column width and return to step 2
    2. Otherwise, end the search and spread any remaining available space equally among the columns

We are able to end the search at step 3.2 because even if another column has a candidate width available that would require fewer lines, the total lines for the document are determined solely by the column which requires the most lines, so changing another column's width won’t affect the outcome.
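
The search loop can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the package's implementation: the cands input and the choose_widths helper are hypothetical, the numbers are made up, the column-label filtering from the initialization step is omitted, and leftover space is spread using integer division only.

```r
# Hypothetical input: for each column, its candidate widths (increasing)
# and the total lines the column needs at each of them (decreasing).
cands <- list(
  a = data.frame(width = c(8, 16, 34), lines = c(400, 300, 100)),
  b = data.frame(width = c(10, 20),    lines = c(250, 120)),
  c = data.frame(width = c(5, 9),      lines = c(150, 100))
)

choose_widths <- function(cands, total_width) {
  idx <- rep(1L, length(cands))          # start at the smallest candidates
  # current value of "width" or "lines" for every column
  cur <- function(what) mapply(function(d, i) d[[what]][i], cands, idx)
  repeat {
    worst <- which.max(cur("lines"))     # column requiring the most lines
    nxt <- idx[worst] + 1L
    if (nxt > nrow(cands[[worst]])) break          # no wider candidate left
    extra <- cands[[worst]]$width[nxt] - cands[[worst]]$width[idx[worst]]
    if (sum(cur("width")) + extra > total_width) break  # would not fit
    idx[worst] <- nxt
  }
  widths <- cur("width")
  # spread leftover space equally (integer part only in this sketch)
  widths + (total_width - sum(widths)) %/% length(widths)
}

choose_widths(cands, total_width = 60)
```

With these made-up inputs the search widens column a twice and column b once, then stops because the next candidate for column c does not fit, giving widths of 34, 20 and 5.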

Example

def_colwidths calls down to listing_column_widths with default values when passed a listing_df object. We will call the latter directly here for explicitness, and to make the column widths more directly comparable via export_as_txt output.


fspec_times8 <- font_spec("Times", 8, 1)
cw <- listing_column_widths(lsting, col_gap = 0, fontspec = fspec_times8, verbose = TRUE)
#> Optimizng Column Widths
#> Initial lines required: 3979
#> Available adjustment: 33 spaces
#> COL 10 width: 26->51 lines req: 3825->1914
#> COL 6 width: 40->48 lines req: 3415->2974

txt <- export_as_txt(lsting, pg_width = inches_to_spaces(8.88, fontspec = fspec_times8),
                     lpp = NULL, colwidths = cw,
                     fontspec = fspec_times8, col_gap = 0)

txt2 <- strsplit(txt, "\n", fixed = FALSE)[[1]]
head(txt2)
#> [1] "Unique Subject    Description of                                                                                                                                                                              Analysis Start    Analysis End                                                                                    "
#> [2] "  Identifier        Actual Arm        Country    Demographic Information      Severity/Intensity                 Body System or Organ Class                           Dictionary-Derived Term                    Date/Time        Date/Time                 Outcome of Adverse Event                   End of Study Status      "
#> [3] "————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————"
#> [4] "  01-701-1015        Placebo            USA           WHITE / F / 63                 MILD                GENERAL DISORDERS AND ADMINISTRATION SITE                   APPLICATION SITE ERYTHEMA                  2014-01-03           NA                   NOT RECOVERED / NOT RESOLVED                      COMPLETED           "
#> [5] "                                                                                                                         CONDITIONS                                                                                                                                                                                             "
#> [6] "                     Placebo            USA           WHITE / F / 63                 MILD                GENERAL DISORDERS AND ADMINISTRATION SITE                   APPLICATION SITE PRURITUS                  2014-01-03           NA                   NOT RECOVERED / NOT RESOLVED                      COMPLETED           "
length(txt2)
#> [1] 2096

Compare this with giving each column an equal portion of the width (admittedly an ill-conceived strategy):

txtbad <- export_as_txt(lsting, pg_width = inches_to_spaces(8.88, fontspec = fspec_times8),
                     lpp = NULL, colwidths = rep(floor(320/11), 11),
                     fontspec = fspec_times8, col_gap = 0)
txt2bad <- strsplit(txtbad, "\n", fixed = TRUE)[[1]]
head(txt2bad)
#> [1] "  Unique Subject Identifier    Description of Actual Arm             Country              Demographic Information        Severity/Intensity       Body System or Organ Class     Dictionary-Derived Term     Analysis Start Date/Time      Analysis End Date/Time      Outcome of Adverse Event        End of Study Status     "
#> [2] "———————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————"
#> [3] "         01-701-1015                    Placebo                        USA                    WHITE / F / 63                    MILD                 GENERAL DISORDERS AND      APPLICATION SITE ERYTHEMA           2014-01-03                       NA              NOT RECOVERED / NOT RESOLVED           COMPLETED          "
#> [4] "                                                                                                                                                      ADMINISTRATION SITE                                                                                                                                                      "
#> [5] "                                                                                                                                                          CONDITIONS                                                                                                                                                           "
#> [6] "                                        Placebo                        USA                    WHITE / F / 63                    MILD                 GENERAL DISORDERS AND      APPLICATION SITE PRURITUS           2014-01-03                       NA              NOT RECOVERED / NOT RESOLVED           COMPLETED          "
length(txt2bad)
#> [1] 2306

So we see that our algorithm saved 9.11 percent of the total lines required by (a set of) naive column widths in this instance.
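
The percentage follows directly from the two line counts printed above; the variable names here are hypothetical and the values are copied from that output.

```r
lines_optimized <- 2096   # length(txt2)
lines_equal     <- 2306   # length(txt2bad)
round(100 * (lines_equal - lines_optimized) / lines_equal, 2)
#> [1] 9.11
```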