Descriptives statistics for Table 1

Max Gordon

2020-07-03

The purpose of the first table in a medical paper is most often to describe your population. In an RCT the table frequently compares the baseline characteristics between the randomized groups, while an observational study will often compare exposed with unexposed. In this vignette I will show how I use the functions to quickly generate a descriptive table.

We will use the mtcars dataset and compare the groups with automatic transmission to those without. First we will start by loading the dataset and labeling.

data(mtcars)
# For labeling we use the label()
# function from the Hmisc package
library(Hmisc)

# Style all the table output using htmlTable's theme handler
htmlTable::setHtmlTableTheme(css.rgroup = "")

label(mtcars$mpg) <- "Gas"
units(mtcars$mpg) <- "Miles/(US) gallon"

label(mtcars$wt) <- "Weight"
units(mtcars$wt) <- "10<sup>3</sup> kg" # not sure the unit is correct

mtcars$am <- factor(mtcars$am, levels = 0:1, labels = c("Automatic", "Manual"))
label(mtcars$am) <- "Transmission"

mtcars$gear <- factor(mtcars$gear)
label(mtcars$gear) <- "Gears"

# Make up some data for making it slightly more interesting
mtcars$col <- factor(sample(c("red", "black", "silver"),
                     size = NROW(mtcars), replace = TRUE))
label(mtcars$col) <- "Car color"

The basics of getDescriptionStatsBy

The function getDescriptionStatsBy is a simple way to do basic descriptive statistics. Mandatory inputs are the first two, x and by:

library(Gmisc)
## Loading required package: Rcpp
## Loading required package: htmlTable
getDescriptionStatsBy(x = mtcars$mpg, 
                      by = mtcars$am)
## Gas 
##     Automatic            Manual              
## Gas "17.1 (&plusmn;3.8)" "24.4 (&plusmn;6.2)"
## attr(,"class")
## [1] "matrix" "array"

If we prefer median we can simply specify the statistic used with continuous variables:

getDescriptionStatsBy(x = mtcars$mpg, 
                      by = mtcars$am,
                      continuous_fn = describeMedian)
## Gas 
##     Automatic            Manual              
## Gas "17.3 (14.9 - 19.2)" "22.8 (21.0 - 30.4)"
## attr(,"class")
## [1] "matrix" "array"

As I always have the same by-variable and have several customization, I often wire a small wrapper:

getTable1Stats <- function(x, digits = 0, ...){
  getDescriptionStatsBy(x = x, 
                        by = mtcars$am,
                        digits = digits,
                        continuous_fn = describeMedian,
                        header_count = TRUE,
                        ...)
  
}
getTable1Stats(mtcars$mpg)
## Gas 
##     Automatic<br />\n No. 19 Manual<br />\n No. 13
## Gas "17 (15 - 19)"           "23 (21 - 30)"       
## attr(,"class")
## [1] "matrix" "array"

The dot-argument (…) is useful for adding additional customization:

getTable1Stats(mtcars$mpg, use_units = TRUE)
## Gas 
##     Automatic<br />\n No. 19 Manual<br />\n No. 13 units              
## Gas "17 (15 - 19)"           "23 (21 - 30)"        "Miles/(US) gallon"
## attr(,"class")
## [1] "matrix" "array"

Magic with mergeDesc

In order to quickly assemble a “Table 1” I’ve generated a helper function mergeDesc that closely integrates with the htmlTable:

t1 <- list()
t1[["Gas"]] <-
  getTable1Stats(mtcars$mpg)
  
t1[["Weight&dagger;"]] <-
  getTable1Stats(mtcars$wt)

t1[["Color"]] <- 
  getTable1Stats(mtcars$col)

# If we want to use the labels set in the beginning
# we add an element without a name
t1 <- c(t1,
        list(getTable1Stats(mtcars$gear)))

mergeDesc(t1,
          htmlTable_args = list(caption  = "Basic descriptive statistics from the mtcars dataset",
                                tfoot = "&dagger; The weight is in 10<sup>3</sup> kg"))
Basic descriptive statistics from the mtcars dataset
Automatic
No. 19
Manual
No. 13
Gas 17 (15 - 19) 23 (21 - 30)
Weight† 4 (3 - 4) 2 (2 - 3)
Color
  black 5 (26%) 7 (54%)
  red 6 (32%) 4 (31%)
  silver 8 (42%) 2 (15%)
Gears
  3 15 (79%) 0 (0%)
  4 4 (21%) 8 (62%)
  5 0 (0%) 5 (38%)
† The weight is in 103 kg

I like generating lists but it is optional, you can achieve the same through:

mergeDesc(getTable1Stats(mtcars$mpg),
          `Weight&dagger;` = getTable1Stats(mtcars$wt),
          Color = getTable1Stats(mtcars$col),
          getTable1Stats(mtcars$gear),
          htmlTable_args = list(caption  = "Basic descriptive statistics from the mtcars dataset",
                                tfoot = "&dagger; The weight is in 10<sup>3</sup> kg"))
Basic descriptive statistics from the mtcars dataset
Automatic
No. 19
Manual
No. 13
Gas 17 (15 - 19) 23 (21 - 30)
Weight† 4 (3 - 4) 2 (2 - 3)
Color
  black 5 (26%) 7 (54%)
  red 6 (32%) 4 (31%)
  silver 8 (42%) 2 (15%)
Gears
  3 15 (79%) 0 (0%)
  4 4 (21%) 8 (62%)
  5 0 (0%) 5 (38%)
† The weight is in 103 kg

P-values

Event though p-values are discouraged in the Table 1, they are not uncommon. I have therefore added basic statistics consisting that defaults to Fisher’s exact test for proportions and Wilcoxon rank sum test for continuous values.

getTable1Stats <- function(x, digits = 0, ...){
  getDescriptionStatsBy(x = x, 
                        by = mtcars$am,
                        digits = digits,
                        continuous_fn = describeMedian,
                        header_count = TRUE,
                        statistics = TRUE,
                        ...)
  
}

t1 <- list()
t1[["Gas"]] <-
  getTable1Stats(mtcars$mpg)
  
t1[["Weight&dagger;"]] <-
  getTable1Stats(mtcars$wt)

t1[["Color"]] <- 
  getTable1Stats(mtcars$col)

library(magrittr)
mergeDesc(t1,
          getTable1Stats(mtcars$gear)) %>%
  htmlTable(caption  = "Basic descriptive statistics from the mtcars dataset",
            tfoot = "&dagger; The weight is in 10<sup>3</sup> kg")
Basic descriptive statistics from the mtcars dataset
Automatic
No. 19
Manual
No. 13
P-value
Gas 17 (15 - 19) 23 (21 - 30) 0.002
Weight† 4 (3 - 4) 2 (2 - 3) < 0.0001
Color 0.22
  black 5 (26%) 7 (54%)
  red 6 (32%) 4 (31%)
  silver 8 (42%) 2 (15%)
Gears < 0.0001
  3 15 (79%) 0 (0%)
  4 4 (21%) 8 (62%)
  5 0 (0%) 5 (38%)
† The weight is in 103 kg

Custom p-values

By popular demand I’ve expanded with the option of having custom p-values. All you need to do is to provide a function that takes two values and exports a single p-value. There are several prepared functions that you can use or use as a template for your own p-value function. They all start with getPval.., e.g. getPvalKruskal. You can either provide a single function or you can set the defaults depending on the variable type:

getTable1Stats <- function(x, digits = 0, ...){
  getDescriptionStatsBy(x = x, 
                        by = mtcars$am,
                        digits = digits,
                        continuous_fn = describeMedian,
                        header_count = TRUE,
                        statistics = list(continuous = getPvalChiSq, 
                                          factor = getPvalChiSq, 
                                          proportion = getPvalFisher),
                        ...)
  
}

t1 <- list()
t1[["Gas"]] <-
  getTable1Stats(mtcars$mpg)
  
t1[["Weight&dagger;"]] <-
  getTable1Stats(mtcars$wt)

t1[["Color"]] <- 
  getTable1Stats(mtcars$col)

mergeDesc(t1,
          getTable1Stats(mtcars$gear)) %>%
  htmlTable(caption  = "P-values generated from a custom set of values",
            tfoot = "&dagger; The weight is in 10<sup>3</sup> kg")
P-values generated from a custom set of values
Automatic
No. 19
Manual
No. 13
P-value
Gas 17 (15 - 19) 23 (21 - 30) 0.27
Weight† 4 (3 - 4) 2 (2 - 3) 0.37
Color 0.19
  black 5 (26%) 7 (54%)
  red 6 (32%) 4 (31%)
  silver 8 (42%) 2 (15%)
Gears < 0.0001
  3 15 (79%) 0 (0%)
  4 4 (21%) 8 (62%)
  5 0 (0%) 5 (38%)
† The weight is in 103 kg