Sourcing Titles and Footnotes From External Files

Mike Stackhouse

2020-06-17

We recommend that before reviewing this vignette, you review the basics of pharmaRTF and the more advanced features.

Working pharmaRTF Into Your Process

In this vignette, we will take you through some final features of pharmaRTF to help tie together how this package can centralize titles and footnotes within your organization, and show you the tools you need to embed this into your own internal process.

Being able to specify everything directly in a program is useful. For adhoc tables or to quickly put together an output, keeping everything central and in one spot is nice. But in a mature process, companies typically have a strategy to centralize the assignment of titles and footnotes. Keeping all of your titles and footnotes in one place has many advantages. You can global replace changes that impact numerous outputs. You can keep separate repetitive text instead of repeating it for each table. Updates take place in one file instead of having to open each individual impacted program. In short, centralizing this process is highly desirable within an organization.

pharmaRTF has the tools you need to make this process simple, and this comes from the function titles_and_footnotes_from_df. This function allows you to import your titles and footnotes from a data frame. That data frame can either come directly from your R session, or you can build a custom reader function to import the data from an external file. So let’s take a look.

A Small Note

Some of the syntax we use throughout this document uses the %>% pipe operator. This is to display support for a tidyverse style of coding. You can find the %>% operator in the magittr package.

How It Works

The idea behind titles_and_footnotes_from_df is simply that there is a set of expectations of what the data looks like that flows into the function. What that data looks like within an external file is irrelevant, as long as it can be transformed to meet the needs of titles_and_footnotes_from_df on the way in. This is possible because of the reader parameter. The reader parameter is just some function that returns a data frame, and that data frame must be compliant with the required variables and types expected by titles_and_footnotes_from_df. Those variables, types, and requirements are as follows.

Variable Column.Types Required
type character Yes
text1 character Yes
text2 character No
align character No
bold logical No
italic logical No
font character No
font_size numeric No
index numeric No

These variables and types should all look familiar because they’re exactly what’s required for the object hf_line. In fact, titles_and_footnotes_from_df simply takes this data and applies the add_titles and add_footnotes function to the rtf_doc object to attach titles and footnotes.

type and text1 are the only required values and this is because titles_and_foonotes_from_df needs to know where to attach the text, and needs to have text to attach. For variables that aren’t required, they may be omitted from the data returned by the reader function and the default values from the hf_line object will be used.

Custom Reader Functions

The entire concept behind titles_and_footnotes_from_df is allowing the freedom of creating your own process. By allowing the user to pass in a custom reader function to the reader parameter, the source of given titles and footnotes can be anything the user wants. You could read from flat text, CSV, Excel, Access, any remote file that R can process and return a data frame meeting the above requirements is on the table.

With this in mind, let’s consider a simple example. Assume that I have the excel file titles.xlsx. This file looks as follows:

table_number index type text1 text2 align bold italic
14-2.01 1 title Protocol: CDISCPILOT01 PAGE_FORMAT: Page %s of %s split TRUE TRUE
14-2.01 2 title Population: Intent-to-Treat NA left TRUE TRUE
14-2.01 3 title Table 14-2.01 NA center TRUE TRUE
14-2.01 4 title Summary of Demographic and Baseline Characteristics NA center TRUE TRUE
14-2.01 1 footnote [1] P-values are results of ANOVA treatment group comparison for continuous variable and Pearson’s chisquare test for categorical variables. NA left FALSE TRUE
14-2.01 2 footnote NOTE: Duration of disease is computed as months between date of enrollment and date of onset of the first definite symptoms of Alzheimer’s disease. NA left FALSE TRUE
14-2.01 3 footnote FILE_PATH: Source: %s DATE_FORMAT: %H:%M %A, %B %d, %Y split FALSE TRUE
14-4.01 1 title Protocol: CDISCPILOT01 PAGE_FORMAT: Page %s of %s split TRUE TRUE
14-4.01 2 title Population: Safety NA left TRUE TRUE
14-4.01 3 title Table 14-4.01 NA center TRUE TRUE
14-4.01 4 title Summary of Planned Exposure to Study Drug, as of End of Study NA center TRUE TRUE
14-4.01 1 footnote [1] Includes completers and early terminators. NA left FALSE TRUE
14-4.01 2 footnote [2] End of Study refers to week 26/Early Termination. NA left FALSE TRUE
14-4.01 3 footnote FILE_PATH: Source: %s DATE_FORMAT: %H:%M %A, %B %d, %Y split FALSE TRUE

(For more information about the special text fields you see here, see our advanced usage vignette)

Here I have everything I need to generate my titles. I just need to make sure of two things:

So overall this will be a pretty simple function. Let’s create it.

example_custom_reader <- function(..., table_number=NULL) {

  # If a column isn't populated then the type may be guessed wrong so force it
  col_types <- c('text', 'numeric', 'text', 'text', 'text', 'text', 'logical', 'logical')
  # pass through arguments from ...
  df <- readxl::read_excel(..., col_types=col_types)

  # Subset and return that dataframe
  df[df$table_number==table_number, !names(df) == 'table_number']
}

Within a few steps I have everything I need:

Note that here I use the ... parameter. This is because

  1. the from.file parameter is passed forward into the reader function
  2. the ... option on titles_and_foonotes_from_df is also passed forward, allowing you to pass whatever arguments necessary from the titles_and_footnotes_from_df call into your custom reader.

With this function now available, let’s see what the function returns.

example_custom_reader('titles.xlsx', table_number = '14-2.01')
#> # A tibble: 7 x 7
#>   index type    text1                          text2          align bold  italic
#>   <dbl> <chr>   <chr>                          <chr>          <chr> <lgl> <lgl> 
#> 1     1 title   Protocol: CDISCPILOT01         PAGE_FORMAT: … split TRUE  TRUE  
#> 2     2 title   Population: Intent-to-Treat    <NA>           left  TRUE  TRUE  
#> 3     3 title   Table 14-2.01                  <NA>           cent… TRUE  TRUE  
#> 4     4 title   Summary of Demographic and Ba… <NA>           cent… TRUE  TRUE  
#> 5     1 footno… [1] P-values are results of A… <NA>           left  FALSE TRUE  
#> 6     2 footno… NOTE: Duration of disease is … <NA>           left  FALSE TRUE  
#> 7     3 footno… FILE_PATH: Source: %s          DATE_FORMAT: … split FALSE TRUE

Here can see the data frame that the function processes and returns. In this example, the data processing was very simple - but in a more advanced case this function could read from multiple sheets, merge in default titles and footnotes, read from different data sources - whatever suits your process.

Putting It Together

Now with an understanding of how a custom reader function works, let’s tie it all together. Let’s assume that our table has been programmed, and that the huxtable table is ready in our session (see the huxtable tips vignette to for tips on how to prepare your table). The huxtable table is our session looks like this:

rowlbl1rowlbl20_C0_S
Completers at Week 24Safety Population [1]
Placebo\line (N=60)Xanomeline\line Low Dose\line (N=28)Xanomeline\line High Dose\line (N=30)Placebo\line (N=86)Xanomeline\line Low Dose\line (N=84)Xanomeline\line High Dose\line (N=84)
Average daily dose (mg)n 60 28 30 86 84 84
Mean 0.0 54.0 77.0 0.0 54.0 71.6
SD 0.00 0.00 0.58 0.00 0.00 8.11
Median 0.0 54.0 76.9 0.0 54.0 75.1
Min 0.0 54.0 76.1 0.0 54.0 54.0
Max 0.0 54.0 78.6 0.0 54.0 78.6
Cumulative dose at end of study [2]n 60 28 30 86 84 84
Mean 0.0 9918.6 14089.5 0.0 5347.3 7551.0
SD 0.00 603.84 481.01 0.00 3680.35 5531.04
Median 0.0 9936.0 14080.5 0.0 4455.0 5778.0
Min 0.0 7884.0 12960.0 0.0 108.0 54.0
Max 0.0 11448.0 15417.0 0.0 11448.0 15417.0

Note: The data here comes from the CDISC Pilot, which you can read about here. Also, see our replication of the tables from this CDISC pilot in our Github repository.

Here’s how you would go about creating your rtf_doc object. (For more information about the functions being used here, see our advanced usage vignette)

doc <- rtf_doc(ht, header_rows = 2) %>% titles_and_footnotes_from_df(
  from.file='../data/titles.xlsx',
  reader=example_custom_reader,
  table_number='14-4.01') %>%
  set_column_header_buffer(top=1) %>%
  set_font_size(10)

write_rtf(doc, file="table16.rtf")

Within the titles_and_footnotes_from_df function you can see the parameters used:

In the function call, from.file and table_number are passed forward into the custom reader. From there, validation of the incoming data frame happens, any missing variables have their defaults applied, and then add_titles and add_footnotes are called on the titles and footnotes within the data respectively. You can see in the resulting table that the titles and footnotes have been applied

And there you have it. By using this process, the addition of titles and footnotes to your rtf_doc greatly simplifies, and this code becomes very reusable from program to program, allowing you to focus on the nuance of your table creation rather than the redundant application of creating titles and footnotes. As mentioned, it’s necessary to be able to manually specify your titles and footnotes as needed - but automating the process of applying them is much more desirable when your studies reach a certain scale.

Further Reading

If you have not already reviewed them, be sure to check out our other vignettes