This vignette is meant for those who wish to contribute to {gtsummary}, or for users who wish to understand the inner workings of a {gtsummary} object so they can more easily modify it to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction to the package’s functions and for tutorials on advanced use.
All {gtsummary} objects share a few common characteristics. Here, we review those characteristics and provide instructions on how to construct a {gtsummary} object.
Every {gtsummary} object is a list that contains, at minimum, the elements .$table_body and .$table_header, each described below.
The .$table_body object is the data frame that will ultimately be printed as the output. The table must include the columns "label", "row_type", and "variable". The "label" column is printed; the other two are hidden from the final output.
tbl_summary_ex$table_body
#> # A tibble: 8 x 5
#> variable row_type label stat_1 stat_2
#> <chr> <chr> <chr> <chr> <chr>
#> 1 age label Age, yrs 46 (37, 59) 48 (39, 56)
#> 2 age missing Unknown 7 4
#> 3 grade label Grade <NA> <NA>
#> 4 grade level I 35 (36%) 33 (32%)
#> 5 grade level II 32 (33%) 36 (35%)
#> 6 grade level III 31 (32%) 33 (32%)
#> 7 response label Tumor Response 28 (29%) 33 (34%)
#> 8 response missing Unknown 3 4
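As a minimal sketch of the requirement above, the block below builds a .$table_body by hand from a few rows of the output shown. It uses a base R data.frame, whereas {gtsummary} itself builds a tibble. It then checks that the three required columns are present. The check_table_body() helper is hypothetical and not part of {gtsummary}.

```r
# A hand-built table_body using rows from the tbl_summary() output above.
# Base R data.frame for portability; {gtsummary} itself uses a tibble.
table_body <- data.frame(
  variable = c("age", "grade", "grade", "grade"),
  row_type = c("label", "label", "level", "level"),
  label    = c("Age, yrs", "Grade", "I", "II"),
  stat_1   = c("46 (37, 59)", NA, "35 (36%)", "32 (33%)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of {gtsummary}): fail early if any of the
# three required columns is absent.
check_table_body <- function(x) {
  required <- c("variable", "row_type", "label")
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("table_body is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_table_body(table_body)
```

Only "label" among the required columns would be displayed; "variable" and "row_type" are bookkeeping columns that downstream functions rely on.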
The .$table_header object is a data frame containing information about each of the columns in .$table_body (one row per column in .$table_body). The table header has the following columns:
Column | Description |
---|---|
column | Column name from table_body |
label | Label that will be displayed (if column is displayed in output) |
hide | Logical indicating whether the column is hidden in the output |
align | Specifies the alignment/justification of the column, e.g. ‘center’ or ‘left’ |
missing_emdash | String of R code that results in a logical vector indicating the rows in which missing cells are shown as an em-dash, e.g. row_ref == TRUE in tbl_regression() |
indent | String of R code that results in a logical vector that specifies rows to indent, e.g. row_type != 'label' |
text_interpret | The {gt} function used to interpret the column label, e.g. gt::md |
bold | For columns that bold rows conditionally, a string of R code that results in a logical vector indicating the rows to bold, e.g. row_type == 'label' |
italic | For columns that italicize rows conditionally, a string of R code that results in a logical vector indicating the rows to italicize, e.g. row_type == 'label' |
fmt_fun | If the column needs to be formatted, this list column contains the function that performs the formatting. Note: this is the function object, not the character name of a function. |
footnote_abbrev | Lists the abbreviation footnotes for a table. All abbreviation footnotes are collated into a single footnote. For example, ‘OR = Odds Ratio’ and ‘CI = Confidence Interval’ appear in a single footnote. |
footnote | Lists the footnotes that will appear for each column. |
spanning_header | Includes text printed above columns as spanning headers. See tbl_merge(...)$table_header output for example of use. |
NOTE: Columns ‘missing_emdash’, ‘indent’, ‘bold’, and ‘italic’ contain strings of R code. These strings MUST follow the tidyverse style guidelines and include spaces around any variable names, e.g. row_type == 'label' (NOT row_type=='label').
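To illustrate how string-valued columns such as indent and bold can be turned into per-row logical vectors, the sketch below parses each string and evaluates it with the columns of a toy table_body in scope. The eval_row_code() helper is hypothetical. Treating an NA result as "do not apply" is an assumption, and {gtsummary}'s internal evaluation may differ.

```r
table_body <- data.frame(
  variable = c("grade", "grade", "grade", "marker"),
  row_type = c("label", "level", "level", "label"),
  label    = c("Grade", "I", "II", "Marker Level, ng/mL"),
  stringsAsFactors = FALSE
)

# Strings as they might appear in the indent and bold columns
indent_code <- "row_type != 'label'"
bold_code   <- "row_type == 'label'"

# Hypothetical helper: evaluate a string of R code with the columns of
# `data` in scope, returning one logical per row.
eval_row_code <- function(code, data) {
  out <- eval(parse(text = code), envir = data)
  # Assumption: NA results (e.g. from a missing value) are treated as FALSE
  out & !is.na(out)
}

eval_row_code(indent_code, table_body)
#> [1] FALSE  TRUE  TRUE FALSE
eval_row_code(bold_code, table_body)
#> [1]  TRUE FALSE FALSE  TRUE
```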
Example from tbl_regression():
tbl_regression_ex$table_header
#> # A tibble: 13 x 13
#> column label hide align missing_emdash indent text_interpret bold italic
#> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 varia~ vari~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 2 var_t~ var_~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 3 row_r~ row_~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 4 row_t~ row_~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 5 label **Ch~ FALSE left <NA> row_t~ gt::md <NA> <NA>
#> 6 N N TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 7 estim~ **Be~ FALSE cent~ row_ref == TR~ <NA> gt::md <NA> <NA>
#> 8 std.e~ std.~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 9 stati~ stat~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 10 conf.~ conf~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 11 conf.~ conf~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 12 ci **95~ FALSE cent~ row_ref == TR~ <NA> gt::md <NA> <NA>
#> 13 p.val~ **p-~ FALSE cent~ <NA> <NA> gt::md p.va~ <NA>
#> # ... with 4 more variables: fmt_fun <list>, footnote_abbrev <chr>,
#> # footnote <chr>, spanning_header <chr>
When constructing a {gtsummary} object, the author begins with the .$table_body object. Recall that the .$table_body data frame must include the columns "label", "row_type", and "variable". Of these, only the "label" column is printed in the final results. The "row_type" column typically controls whether the label column is indented. The "variable" column is often used by the inline_text() family of functions and when merging {gtsummary} tables with tbl_merge().
tbl_regression_ex %>%
pluck("table_body") %>%
select(variable, row_type, label)
#> # A tibble: 5 x 3
#> variable row_type label
#> <chr> <chr> <chr>
#> 1 grade label Grade
#> 2 grade level I
#> 3 grade level II
#> 4 grade level III
#> 5 marker label Marker Level, ng/mL
The other columns in .$table_body are created by the user and are usually printed in the output. Formatting and printing instructions for these columns are stored in .$table_header.
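The sketch below shows the idea behind the fmt_fun list column described in the table above. Each entry is either a function object or NULL when no formatting is needed, and each function is applied to the matching table_body column. The apply_fmt_funs() helper, the sprintf() formatter, and the estimate values are all made up for illustration.

```r
# Toy table_body with a numeric column that needs formatting.
# The estimate values are illustrative only.
table_body <- data.frame(
  variable = c("age", "marker"),
  row_type = c("label", "label"),
  label    = c("Age", "Marker Level, ng/mL"),
  estimate = c(1.234, -0.5),
  stringsAsFactors = FALSE
)

# Minimal table_header: one row per table_body column.
# fmt_fun is a list column of function objects (NULL = leave as-is).
table_header <- data.frame(column = names(table_body), stringsAsFactors = FALSE)
table_header$fmt_fun <- list(NULL, NULL, NULL, function(x) sprintf("%.2f", x))

# Hypothetical helper: apply each formatting function to its column.
apply_fmt_funs <- function(table_body, table_header) {
  for (i in seq_len(nrow(table_header))) {
    f <- table_header$fmt_fun[[i]]
    if (is.function(f)) {
      col <- table_header$column[i]
      table_body[[col]] <- f(table_body[[col]])
    }
  }
  table_body
}

formatted <- apply_fmt_funs(table_body, table_header)
formatted$estimate
#> [1] "1.23"  "-0.50"
```

Storing the function object rather than its name means the formatter can be a closure with arguments already fixed (here, two decimal places) and needs no lookup by name at print time.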
The .$table_header has one row for every column in .$table_body, containing instructions on how to format each column, the column headers, and more. There are a few internal {gtsummary} functions to assist in constructing and modifying a .$table_header data frame.
First is the table_header_fill_missing() function. This function ensures that .$table_header contains a row for every column of .$table_body. If a row for a column does not exist, one is added and populated with appropriate default values.
gtsummary:::table_header_fill_missing(
table_header = tibble(column = names(tbl_regression_ex$table_body))
)
#> # A tibble: 13 x 13
#> column label hide align missing_emdash indent text_interpret bold italic
#> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 varia~ vari~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 2 var_t~ var_~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 3 row_r~ row_~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 4 row_t~ row_~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 5 label label TRUE left <NA> row_t~ gt::md <NA> <NA>
#> 6 N N TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 7 estim~ esti~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 8 std.e~ std.~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 9 stati~ stat~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 10 conf.~ conf~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 11 conf.~ conf~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 12 ci ci TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> 13 p.val~ p.va~ TRUE cent~ <NA> <NA> gt::md <NA> <NA>
#> # ... with 4 more variables: fmt_fun <list>, footnote_abbrev <chr>,
#> # footnote <chr>, spanning_header <chr>
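A rough, hypothetical re-creation of this behavior is sketched below. fill_missing_header() is not the package's implementation, and it handles only a subset of the header fields. Its defaults are read off the output above: the label equals the column name, hide is TRUE, the label column is left-aligned with all others centered, and text_interpret is gt::md. The label column's indent string is assumed to be row_type != 'label', since the output above truncates it.

```r
# Hypothetical helper, not {gtsummary} code: return a header with one row
# per table_body column, keeping user-supplied values and filling the rest.
fill_missing_header <- function(table_header, table_body) {
  cols <- names(table_body)
  # Position of each table_body column in table_header (NA if absent)
  idx <- match(cols, table_header$column)

  # Default value of each header field, as a function of the column name
  defaults <- list(
    label          = function(col) col,
    hide           = function(col) TRUE,
    align          = function(col) if (col == "label") "left" else "center",
    indent         = function(col) if (col == "label") "row_type != 'label'" else NA_character_,
    text_interpret = function(col) "gt::md"
  )

  out <- data.frame(column = cols, stringsAsFactors = FALSE)
  for (field in names(defaults)) {
    existing <- if (field %in% names(table_header)) {
      table_header[[field]][idx]
    } else {
      rep(NA, length(cols))
    }
    default <- unlist(lapply(cols, defaults[[field]]))
    # Keep existing values; fall back to the default where missing
    out[[field]] <- ifelse(is.na(existing), default, existing)
  }
  out
}

table_body <- data.frame(variable = "age", row_type = "label", label = "Age",
                         estimate = 0.1, stringsAsFactors = FALSE)
# A partial header that only describes the estimate column
partial <- data.frame(column = "estimate", label = "**Beta**", hide = FALSE,
                      stringsAsFactors = FALSE)
header <- fill_missing_header(partial, table_body)
header$hide
#> [1]  TRUE  TRUE  TRUE FALSE
```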
The modify_header_internal() function is useful for assigning column headers. It accepts a complete {gtsummary} object as its input and returns an updated version in which the column labels have been added to .$table_header. It also switches .$table_header$hide from its default of TRUE to FALSE, so that columns with labels are printed.
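A minimal sketch of what such a helper does is shown below, written against a hand-built object. set_column_labels() is hypothetical and is not modify_header_internal() itself. It mirrors only the behavior described above: it assigns a label, then sets hide to FALSE for that column.

```r
# Hypothetical helper: assign column labels and un-hide labeled columns.
set_column_labels <- function(x, ...) {
  labels <- list(...)
  for (col in names(labels)) {
    row <- which(x$table_header$column == col)
    if (length(row) != 1) stop("No column named '", col, "' in table_header")
    x$table_header$label[row] <- labels[[col]]
    # A labeled column is no longer hidden
    x$table_header$hide[row] <- FALSE
  }
  x
}

# Hand-built object with every column hidden by default
x <- list(
  table_body = data.frame(variable = "age", row_type = "label", label = "Age",
                          estimate = 0.1, stringsAsFactors = FALSE),
  table_header = data.frame(
    column = c("variable", "row_type", "label", "estimate"),
    label  = c("variable", "row_type", "label", "estimate"),
    hide   = TRUE,
    stringsAsFactors = FALSE
  )
)

x <- set_column_labels(x, label = "**Characteristic**", estimate = "**Beta**")
x$table_header$hide
#> [1]  TRUE  TRUE FALSE FALSE
```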
All {gtsummary} objects are printed with print.gtsummary(). Before a {gtsummary} object is printed, it is converted to a {gt} object using as_gt(). This function takes the {gtsummary} object as its sole input and uses the information in .$table_header to construct a list of {gt} calls that are then executed on .$table_body. Once converted, the object is printed like any other {gt} object.
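The general pattern can be sketched without {gt}: read .$table_header, build a list of unevaluated calls, then evaluate them in order on .$table_body. In the hypothetical build_calls() below, simple base R calls stand in for {gt} steps such as hiding and labeling columns. The real as_gt() builds {gt} calls and handles many more header fields. The stat_1 values come from the .$table_body output shown earlier, and the header labels are illustrative.

```r
# Hypothetical: build a named list of unevaluated calls from table_header.
# Each call operates on an object named `x` (the table being built).
build_calls <- function(table_header) {
  shown   <- !table_header$hide
  visible <- table_header$column[shown]
  labels  <- table_header$label[shown]
  list(
    # Indent non-label rows; hard-coded here rather than read from the
    # indent column, for brevity
    indent = quote({
      x$label <- ifelse(x$row_type != "label", paste0("    ", x$label), x$label)
      x
    }),
    # Drop hidden columns
    hide = bquote(x[, .(visible), drop = FALSE]),
    # Replace column names with their header labels
    label = bquote(stats::setNames(x, .(labels)))
  )
}

# Evaluate the calls in order, feeding each result into the next
run_calls <- function(table_body, calls) {
  Reduce(function(x, call) eval(call, list(x = x)), calls, table_body)
}

table_body <- data.frame(
  variable = c("grade", "grade", "grade"),
  row_type = c("label", "level", "level"),
  label    = c("Grade", "I", "II"),
  stat_1   = c(NA, "35 (36%)", "32 (33%)"),
  stringsAsFactors = FALSE
)
table_header <- data.frame(
  column = c("variable", "row_type", "label", "stat_1"),
  label  = c("variable", "row_type", "**Characteristic**", "**Group 1**"),
  hide   = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

out <- run_calls(table_body, build_calls(table_header))
names(out)
#> [1] "**Characteristic**" "**Group 1**"
```

Keeping the steps as data (a list of calls) rather than executing them immediately allows the calls to be inspected or modified before they are run. The ordering matters here: the indent step must run before row_type is dropped by the hide step.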
In some cases, the package defaults to printing with knitr::kable() by way of the as_kable() function.
While the actual print function is slightly more involved, it is basically this: