Formatted Summary Statistics and Data Summary Tables with qwraps2

Peter DeWitt

2019-12-02

1 Introduction

It is common for a manuscript to require a data summary table. The table might include simple summary statistics for the whole sample and for subgroups. There are several tools available to build such tables. In my opinion, though, most of those tools have nuances imposed by the creators/authors such that other users need not only understand the tool, but also think like the authors. I wrote this package to be as flexible and general as possible. I hope you like these tools and will be able to use them in your work.

This vignette presents the use of the summary_table, qsummary, and qable functions for quickly building data summary tables. These functions implicitly use the mean_sd, median_iqr, and n_perc0 functions from qwraps2 as well.

1.1 Prerequisites: Example Data Set

We will use a modified version of the mtcars data set for examples throughout this vignette. The following packages are required to run the code in this vignette and to construct the mtcars2 data.frame.

The mtcars2 data frame will have three versions of the cyl vector: the original numeric values in cyl, a character version, and a factor version.

set.seed(42)
library(magrittr)
library(qwraps2)

# define the markup language we are working in.
# options(qwraps2_markup = "latex") is also supported.
options(qwraps2_markup = "markdown")

data(mtcars)

mtcars2 <-
  dplyr::mutate(mtcars,
                cyl_factor = factor(cyl,
                                    levels = c(6, 4, 8),
                                    labels = paste(c(6, 4, 8), "cylinders")),
                cyl_character = paste(cyl, "cylinders"))

str(mtcars2)
## 'data.frame':    32 obs. of  13 variables:
##  $ mpg          : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl          : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp         : num  160 160 108 258 360 ...
##  $ hp           : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat         : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt           : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec         : num  16.5 17 18.6 19.4 17 ...
##  $ vs           : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am           : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear         : num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb         : num  4 4 1 1 2 1 4 2 2 4 ...
##  $ cyl_factor   : Factor w/ 3 levels "6 cylinders",..: 1 1 2 1 3 1 3 2 2 1 ...
##  $ cyl_character: chr  "6 cylinders" "6 cylinders" "4 cylinders" "6 cylinders" ...

Notice that the construction of the cyl_factor and cyl_character vectors was done such that the coercion of cyl_character to a factor will not be the same as the cyl_factor vector; the levels are in a different order.

with(mtcars2, table(cyl_factor, cyl_character))
##              cyl_character
## cyl_factor    4 cylinders 6 cylinders 8 cylinders
##   6 cylinders           0           7           0
##   4 cylinders          11           0           0
##   8 cylinders           0           0          14
with(mtcars2, all.equal(factor(cyl_character), cyl_factor))
## [1] "Attributes: < Component \"levels\": 2 string mismatches >"
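The mismatch arises because factor() sorts the unique values when no levels are supplied. A minimal base R sketch of that behavior, using a short hypothetical vector (cyl below is made up for illustration, not taken from mtcars2):

```r
# Hypothetical short vector of cylinder counts
cyl <- c(6, 4, 8, 6, 4)

# Levels given explicitly in the order 6, 4, 8 (as was done for cyl_factor)
f_explicit <- factor(cyl,
                     levels = c(6, 4, 8),
                     labels = paste(c(6, 4, 8), "cylinders"))

# Levels left to the default: the sorted unique values of the character vector
f_default <- factor(paste(cyl, "cylinders"))

levels(f_explicit)
levels(f_default)

# The values agree element by element; only the level order differs
all(as.character(f_explicit) == as.character(f_default))
identical(levels(f_explicit), levels(f_default))
```

The level order matters later because the rows or columns of a summary table follow it.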

2 Review of Summary Statistic Functions and Formatting

2.1 Means and Standard Deviations

mean_sd will return the (arithmetic) mean and standard deviation for a numeric vector. For example, mean_sd(mtcars2$mpg) will return the formatted string.

mean_sd(mtcars2$mpg)
## [1] "20.09 &plusmn; 6.03"
mean_sd(mtcars2$mpg, denote_sd = "paren")
## [1] "20.09 (6.03)"

The default setting for mean_sd is to return the mean ± sd. In a table this default is helpful because the default table formatting for counts and percentages is n (%).
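For readers who want to see what goes into such a string, here is a hand-rolled base R sketch that produces the same two styles. It is not the qwraps2 implementation; the variable names are chosen for illustration, and datasets::mtcars stands in for mtcars2.

```r
# Base R sketch of a "mean +/- sd" string; not how qwraps2 builds it
x <- datasets::mtcars$mpg

m <- formatC(mean(x), digits = 2, format = "f")  # mean to two decimals
s <- formatC(sd(x),   digits = 2, format = "f")  # sd to two decimals

pm    <- paste(m, "\u00b1", s)     # plus/minus style
paren <- paste0(m, " (", s, ")")   # parenthetical style

pm
paren
```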

mean_sd and other functions are helpful for in-line text too:

The 32 vehicles in the `mtcars` data set had an average fuel
economy of 20.09 &plusmn; 6.03 miles per gallon.

produces

The 32 vehicles in the mtcars data set had an average fuel economy of 20.09 ± 6.03 miles per gallon.

2.2 Mean and Confidence Intervals

If you need the mean and a confidence interval there is mean_ci. mean_ci returns a qwraps2_mean_ci object, which is a named vector with the mean, the lower confidence limit, and the upper confidence limit. The printing method for qwraps2_mean_ci objects is a call to the frmtci function. You can modify the formatting of the printed result by adjusting the arguments passed to frmtci.

mci <- mean_ci(mtcars2$mpg)
mci
## [1] "20.09 (18.00, 22.18)"
print(mci, show_level = TRUE)
## [1] "20.09 (95% CI: 18.00, 22.18)"
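The limits printed above agree with a normal-quantile interval, mean plus or minus z times the standard error. The base R sketch below reproduces them under that assumption. It was checked only against the printed values and is not a statement about how mean_ci is written.

```r
# Normal-quantile confidence interval for the mean, in base R
x     <- datasets::mtcars$mpg
alpha <- 0.05                        # assumed level: 95% interval

se <- sd(x) / sqrt(length(x))        # standard error of the mean
z  <- qnorm(1 - alpha / 2)           # standard normal quantile

ci <- c(mean = mean(x),
        lcl  = mean(x) - z * se,
        ucl  = mean(x) + z * se)

formatted <- sprintf("%.2f (%.2f, %.2f)", ci[["mean"]], ci[["lcl"]], ci[["ucl"]])
formatted
```

Replacing qnorm with qt (and the matching degrees of freedom) would give a slightly wider interval for a sample of this size.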

2.3 Median and Interquartile Range

Similar to the mean_sd function, median_iqr returns the median and the interquartile range (IQR) of a data vector.

median_iqr(mtcars2$mpg)
## [1] "19.20 (15.43, 22.80)"
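Note that the string reports the first and third quartiles rather than the width of the interval. A base R sketch that recovers the underlying numbers, using datasets::mtcars in place of mtcars2 and the default quantile type:

```r
# Quartiles behind a "median (Q1, Q3)" summary, in base R
x <- datasets::mtcars$mpg

q <- quantile(x, probs = c(0.25, 0.50, 0.75))
q

# IQR() returns the width Q3 - Q1 as a single number
IQR(x)
```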

2.4 Count and Percentages

The n_perc function is the workhorse, but n_perc0 is also provided for ease of use in the same way that base R has paste and paste0. n_perc returns the n (%) with the percentage sign in the string; n_perc0 omits the percentage sign and, as the output below shows, reports the percentage without decimal places. The latter is good for tables, the former for in-line text.

n_perc(mtcars2$cyl == 4)
## [1] "11 (34.38%)"
n_perc0(mtcars2$cyl == 4)
## [1] "11 (34)"

n_perc(mtcars2$cyl_factor == 4)  # this returns 0 (0.00%)
## [1] "0 (0.00%)"
n_perc(mtcars2$cyl_factor == "4 cylinders")
## [1] "11 (34.38%)"
n_perc(mtcars2$cyl_factor == levels(mtcars2$cyl_factor)[2])
## [1] "11 (34.38%)"

# The count and percentage of 4- or 6-cylinder vehicles in the data set is
n_perc(mtcars2$cyl %in% c(4, 6))
## [1] "18 (56.25%)"
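The counts and percentages above come from a logical vector: the count is its sum and the percentage is 100 times its mean. The base R sketch below shows this, and also why comparing the factor to the number 4 matches nothing. The names are chosen for illustration and datasets::mtcars stands in for mtcars2.

```r
cyl <- datasets::mtcars$cyl

# Count and percentage from a logical vector
is_four <- cyl == 4
n   <- sum(is_four)          # number of TRUE values
pct <- 100 * mean(is_four)   # percentage of TRUE values
sprintf("%d (%.2f%%)", n, pct)

# A factor is compared through its labels, so 4 does not match "4 cylinders"
cyl_factor <- factor(cyl,
                     levels = c(6, 4, 8),
                     labels = paste(c(6, 4, 8), "cylinders"))
sum(cyl_factor == 4)
sum(cyl_factor == "4 cylinders")
```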

2.5 Geometric Means and Standard Deviations

Let \(\left\{x_1, x_2, x_3, \ldots, x_n \right\}\) be a sample of size \(n\) with \(x_i > 0\) for all \(i.\) Then the geometric mean, \(\mu_g,\) and geometric standard deviation, \(\sigma_g,\) are

\[ \begin{equation} \mu_g = \left( \prod_{i = 1}^{n} x_i \right)^{\frac{1}{n}} = b^{ \frac{1}{n} \sum_{i = 1}^{n} \log_{b} x_i }, \end{equation} \] and \[ \begin{equation} \sigma_g = b ^ { \sqrt{ \frac{\sum_{i = 1}^{n} \left( \log_{b} \frac{x_i}{\mu_g} \right)^2}{n} } } \end{equation} \] or, for clarity, \[ \begin{equation} \log_{b} \sigma_g = \sqrt{ \frac{\sum_{i = 1}^{n} \left( \log_{b} \frac{x_i}{\mu_g} \right)^2}{n}} \end{equation} \]

When looking for the geometric standard deviation in R, the simple exp(sd(log(x))) is not exactly correct. The geometric standard deviation uses \(n,\) the full sample size, in the denominator, whereas the sd and var functions in R use the denominator \(n - 1.\) To get the geometric standard deviation, adjust the result on the log scale by multiplying the variance by \((n - 1) / n\) or the standard deviation by \(\sqrt{(n - 1) / n}.\) See the example below.

x <- runif(6, min = 4, max = 70)

# geometric mean
mu_g <- prod(x) ** (1 / length(x))
mu_g
## [1] 46.50714
exp(mean(log(x)))
## [1] 46.50714
1.2 ** mean(log(x, base = 1.2))
## [1] 46.50714

# geometric standard deviation
exp(sd(log(x)))  ## This is wrong
## [1] 1.500247

# these equations are correct
sigma_g <- exp(sqrt(sum(log(x / mu_g) ** 2) / length(x)))
sigma_g
## [1] 1.448151

exp(sqrt((length(x) - 1) / length(x)) * sd(log(x)))
## [1] 1.448151

The functions gmean, gvar, and gsd in the package provide the geometric mean, variance, and standard deviation for a sample.

gmean(x)
## [1] 46.50714
all.equal(gmean(x), mu_g)
## [1] TRUE

gvar(x)
## [1] 1.146958
all.equal(gvar(x), sigma_g^2)  # This is supposed to be FALSE
## [1] "Mean relative difference: 0.8284385"
all.equal(gvar(x), exp(log(sigma_g)^2))
## [1] TRUE

gsd(x)
## [1] 1.448151
all.equal(gsd(x), sigma_g)
## [1] TRUE

gmean_sd will provide a quick way for reporting the geometric mean and geometric standard deviation in the same way that mean_sd does for the arithmetic mean and arithmetic standard deviation:

gmean_sd(x)
## [1] "46.51 &plusmn; 1.45"

3 Building a Data Summary Table

Objective: build a table reporting summary statistics for some of the variables in the mtcars2 data.frame overall and within subgroups. We’ll start with something very simple and build up to something bigger.

Let’s report the min, max, and mean (sd) for continuous variables and n (%) for categorical variables. We will report mpg, disp, wt, and gear overall and by number of cylinders.

The function summary_table, along with some dplyr functions, will do the work for us. summary_table takes two arguments:

  1. x, a data.frame or grouped_df.
  2. summaries, a list of summaries. This is a list-of-lists. The outer list defines the row groups and the inner lists define the specific summaries. The default is generated by the qsummary function.

args(summary_table)
## function (x, summaries = qsummary(x)) 
## NULL

Let’s build a list-of-lists to pass to the summaries argument of summary_table. Additional examples and tools for building the list-of-lists are given in the following section. The immediate example is provided to demonstrate how to use the summary_table method.

The inner lists are named formulae defining the wanted summary. These formulae are passed through dplyr::summarize to generate the table. The names are important, as they are used to label row groups and row names in the table. The arguments to the functions below use the .data pronoun for tidy evaluation (see help(topic = ".data", package = "rlang")). The use of this pronoun is not mandatory; however, it is strongly encouraged.

our_summary1 <-
  list("Miles Per Gallon" =
       list("min" = ~ min(.data$mpg),
            "max" = ~ max(.data$mpg),
            "mean (sd)" = ~ qwraps2::mean_sd(.data$mpg)),
       "Displacement" =
       list("min" = ~ min(.data$disp),
            "median" = ~ median(.data$disp),
            "max" = ~ max(.data$disp),
            "mean (sd)" = ~ qwraps2::mean_sd(.data$disp)),
       "Weight (1000 lbs)" =
       list("min" = ~ min(.data$wt),
            "max" = ~ max(.data$wt),
            "mean (sd)" = ~ qwraps2::mean_sd(.data$wt)),
       "Forward Gears" =
       list("Three" = ~ qwraps2::n_perc0(.data$gear == 3),
            "Four"  = ~ qwraps2::n_perc0(.data$gear == 4),
            "Five"  = ~ qwraps2::n_perc0(.data$gear == 5))
       )

Building the table is done with a call to summary_table:

### Overall
whole <- summary_table(mtcars2, our_summary1)
whole
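| | mtcars2 (N = 32) |
|:---|:---|
| **Miles Per Gallon** | |
| min | 10.4 |
| max | 33.9 |
| mean (sd) | 20.09 ± 6.03 |
| **Displacement** | |
| min | 71.1 |
| median | 196.3 |
| max | 472 |
| mean (sd) | 230.72 ± 123.94 |
| **Weight (1000 lbs)** | |
| min | 1.513 |
| max | 5.424 |
| mean (sd) | 3.22 ± 0.98 |
| **Forward Gears** | |
| Three | 15 (47) |
| Four | 12 (38) |
| Five | 5 (16) |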

summary_table will work with grouped data frames too.

### By number of Cylinders
by_cyl <- summary_table(dplyr::group_by(mtcars2, cyl_factor), our_summary1)
by_cyl
| | cyl_factor: 6 cylinders (N = 7) | cyl_factor: 4 cylinders (N = 11) | cyl_factor: 8 cylinders (N = 14) |
|:---|:---|:---|:---|
| **Miles Per Gallon** | | | |
| min | 17.8 | 21.4 | 10.4 |
| max | 21.4 | 33.9 | 19.2 |
| mean (sd) | 19.74 ± 1.45 | 26.66 ± 4.51 | 15.10 ± 2.56 |
| **Displacement** | | | |
| min | 145.0 | 71.1 | 275.8 |
| median | 167.6 | 108.0 | 350.5 |
| max | 258.0 | 146.7 | 472.0 |
| mean (sd) | 183.31 ± 41.56 | 105.14 ± 26.87 | 353.10 ± 67.77 |
| **Weight (1000 lbs)** | | | |
| min | 2.620 | 1.513 | 3.170 |
| max | 3.460 | 3.190 | 5.424 |
| mean (sd) | 3.12 ± 0.36 | 2.29 ± 0.57 | 4.00 ± 0.76 |
| **Forward Gears** | | | |
| Three | 2 (29) | 1 (9) | 12 (86) |
| Four | 4 (57) | 8 (73) | 0 (0) |
| Five | 1 (14) | 2 (18) | 2 (14) |
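The grouped columns can be cross-checked against base R. The sketch below recomputes the per-group sample sizes and the mean and standard deviation of mpg with tapply. It uses datasets::mtcars and the numeric cyl column, so the groups come out in sorted order (4, 6, 8) rather than in the 6, 4, 8 order of cyl_factor.

```r
d <- datasets::mtcars

# Per-group sample sizes, means, and standard deviations of mpg
group_n     <- table(d$cyl)
group_means <- tapply(d$mpg, d$cyl, mean)
group_sds   <- tapply(d$mpg, d$cyl, sd)

group_n
round(group_means, 2)
round(group_sds, 2)
```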

To report a table with both the whole sample summary and conditional columns together:

both <- cbind(whole, by_cyl)
both
| | mtcars2 (N = 32) | cyl_factor: 6 cylinders (N = 7) | cyl_factor: 4 cylinders (N = 11) | cyl_factor: 8 cylinders (N = 14) |
|:---|:---|:---|:---|:---|
| **Miles Per Gallon** | | | | |
| min | 10.4 | 17.8 | 21.4 | 10.4 |
| max | 33.9 | 21.4 | 33.9 | 19.2 |
| mean (sd) | 20.09 ± 6.03 | 19.74 ± 1.45 | 26.66 ± 4.51 | 15.10 ± 2.56 |
| **Displacement** | | | | |
| min | 71.1 | 145.0 | 71.1 | 275.8 |
| median | 196.3 | 167.6 | 108.0 | 350.5 |
| max | 472 | 258.0 | 146.7 | 472.0 |
| mean (sd) | 230.72 ± 123.94 | 183.31 ± 41.56 | 105.14 ± 26.87 | 353.10 ± 67.77 |
| **Weight (1000 lbs)** | | | | |
| min | 1.513 | 2.620 | 1.513 | 3.170 |
| max | 5.424 | 3.460 | 3.190 | 5.424 |
| mean (sd) | 3.22 ± 0.98 | 3.12 ± 0.36 | 2.29 ± 0.57 | 4.00 ± 0.76 |
| **Forward Gears** | | | | |
| Three | 15 (47) | 2 (29) | 1 (9) | 12 (86) |
| Four | 12 (38) | 4 (57) | 8 (73) | 0 (0) |
| Five | 5 (16) | 1 (14) | 2 (18) | 2 (14) |

If you want to change the column names, pass the cnames argument to qable through the print method for qwraps2_summary_table objects. Any argument that you want to send to qable can be passed this way when explicitly calling the print method.

print(both,
      rtitle = "Summary Statistics",
      cnames = c("Col 0", "Col 1", "Col 2", "Col 3"))
| Summary Statistics | Col 0 | Col 1 | Col 2 | Col 3 |
|:---|:---|:---|:---|:---|
| **Miles Per Gallon** | | | | |
| min | 10.4 | 17.8 | 21.4 | 10.4 |
| max | 33.9 | 21.4 | 33.9 | 19.2 |
| mean (sd) | 20.09 ± 6.03 | 19.74 ± 1.45 | 26.66 ± 4.51 | 15.10 ± 2.56 |
| **Displacement** | | | | |
| min | 71.1 | 145.0 | 71.1 | 275.8 |
| median | 196.3 | 167.6 | 108.0 | 350.5 |
| max | 472 | 258.0 | 146.7 | 472.0 |
| mean (sd) | 230.72 ± 123.94 | 183.31 ± 41.56 | 105.14 ± 26.87 | 353.10 ± 67.77 |
| **Weight (1000 lbs)** | | | | |
| min | 1.513 | 2.620 | 1.513 | 3.170 |
| max | 5.424 | 3.460 | 3.190 | 5.424 |
| mean (sd) | 3.22 ± 0.98 | 3.12 ± 0.36 | 2.29 ± 0.57 | 4.00 ± 0.76 |
| **Forward Gears** | | | | |
| Three | 15 (47) | 2 (29) | 1 (9) | 12 (86) |
| Four | 12 (38) | 4 (57) | 8 (73) | 0 (0) |
| Five | 5 (16) | 1 (14) | 2 (18) | 2 (14) |

3.1 Easy building of the summaries

The task of building the summaries list-of-lists can be tedious. qsummary is designed to make it easier. qsummary will use a set of predefined functions to summarize numeric columns of a data.frame, and a set of arguments to pass to qwraps2::n_perc for categorical (character and factor) variables.

By default, calling summary_table will use the default summary metrics defined by qsummary. The purpose of qsummary is to provide the same summary for all numeric variables within a data.frame and a single style of summary for categorical variables within the data.frame. For example, the default summary for a set of variables from the mtcars2 data set is

mtcars2 %>%
  dplyr::select(.data$mpg, .data$cyl_factor, .data$wt) %>%
  qsummary(.)
## $mpg
## $mpg$minimum
## ~qwraps2::frmt(min(.data[["mpg"]]))
## <environment: 0x555ad3e3a688>
## 
## $mpg$`median (IQR)`
## ~qwraps2::median_iqr(.data[["mpg"]])
## <environment: 0x555ad3e3a688>
## 
## $mpg$`mean (sd)`
## ~qwraps2::mean_sd(.data[["mpg"]])
## <environment: 0x555ad3e3a688>
## 
## $mpg$maximum
## ~qwraps2::frmt(max(.data[["mpg"]]))
## <environment: 0x555ad3e3a688>
## 
## 
## $cyl_factor
## $cyl_factor$`6 cylinders`
## ~qwraps2::n_perc(.data[["cyl_factor"]] == "6 cylinders", digits = 0, 
##     show_symbol = FALSE)
## <environment: 0x555ad3e3a688>
## 
## $cyl_factor$`4 cylinders`
## ~qwraps2::n_perc(.data[["cyl_factor"]] == "4 cylinders", digits = 0, 
##     show_symbol = FALSE)
## <environment: 0x555ad3e3a688>
## 
## $cyl_factor$`8 cylinders`
## ~qwraps2::n_perc(.data[["cyl_factor"]] == "8 cylinders", digits = 0, 
##     show_symbol = FALSE)
## <environment: 0x555ad3e3a688>
## 
## 
## $wt
## $wt$minimum
## ~qwraps2::frmt(min(.data[["wt"]]))
## <environment: 0x555ad3e3a688>
## 
## $wt$`median (IQR)`
## ~qwraps2::median_iqr(.data[["wt"]])
## <environment: 0x555ad3e3a688>
## 
## $wt$`mean (sd)`
## ~qwraps2::mean_sd(.data[["wt"]])
## <environment: 0x555ad3e3a688>
## 
## $wt$maximum
## ~qwraps2::frmt(max(.data[["wt"]]))
## <environment: 0x555ad3e3a688>

That default summary is used for a table as follows:

mtcars2 %>%
  dplyr::select(.data$mpg, .data$cyl_factor, .data$wt) %>%
  summary_table(.)
| | . (N = 32) |
|:---|:---|
| **mpg** | |
| minimum | 10.40 |
| median (IQR) | 19.20 (15.43, 22.80) |
| mean (sd) | 20.09 ± 6.03 |
| maximum | 33.90 |
| **cyl_factor** | |
| 6 cylinders | 7 (22) |
| 4 cylinders | 11 (34) |
| 8 cylinders | 14 (44) |
| **wt** | |
| minimum | 1.51 |
| median (IQR) | 3.33 (2.58, 3.61) |
| mean (sd) | 3.22 ± 0.98 |
| maximum | 5.42 |

Now, say we want to report only the minimum and maximum for each of the numeric variables, and for the categorical variables we want to show the denominator for each category and the percentage to one digit, with the percent symbol in the table. Note that when defining the list of numeric_summaries, the argument placeholder is the %s character.

new_summary <-
  mtcars2 %>%
  dplyr::select(.data$mpg, .data$cyl_factor, .data$wt) %>%
  qsummary(.,
           numeric_summaries = list("Minimum" = "~ min(%s)",
                                    "Maximum" = "~ max(%s)"),
           n_perc_args = list(digits = 1, show_symbol = TRUE, show_denom = "always"))

The resulting table is:

summary_table(mtcars2, new_summary)
| | mtcars2 (N = 32) |
|:---|:---|
| **mpg** | |
| Minimum | 10.4 |
| Maximum | 33.9 |
| **cyl_factor** | |
| 6 cylinders | 7/32 (21.9%) |
| 4 cylinders | 11/32 (34.4%) |
| 8 cylinders | 14/32 (43.8%) |
| **wt** | |
| Minimum | 1.513 |
| Maximum | 5.424 |
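The %s placeholder used in numeric_summaries reads like a sprintf template. The base R sketch below illustrates that idea by filling a template with a column reference and converting the result to a one-sided formula. It only illustrates the placeholder. It is not a description of how qsummary is implemented, and column_ref is a hypothetical value chosen to resemble the formulae printed earlier.

```r
# Template in the style of a numeric_summaries entry
template <- "~ min(%s)"

# Hypothetical column reference to substitute for the placeholder
column_ref <- '.data[["mpg"]]'

# Fill the placeholder
filled <- sprintf(template, column_ref)
filled

# The filled string can be converted to a one-sided formula
f <- stats::as.formula(filled)
class(f)
```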

The summary can easily be used on a grouped data.frame.

mtcars2 %>%
  dplyr::group_by(.data$am) %>%
  summary_table(., new_summary)
| | am: 0 (N = 19) | am: 1 (N = 13) |
|:---|:---|:---|
| **mpg** | | |
| Minimum | 10.4 | 15 |
| Maximum | 24.4 | 33.9 |
| **cyl_factor** | | |
| 6 cylinders | 4/19 (21.1%) | 3/13 (23.1%) |
| 4 cylinders | 3/19 (15.8%) | 8/13 (61.5%) |
| 8 cylinders | 12/19 (63.2%) | 2/13 (15.4%) |
| **wt** | | |
| Minimum | 2.465 | 1.513 |
| Maximum | 5.424 | 3.57 |

3.2 Adding P-values to a Summary Table

There are many, many different ways to format data summary tables. Adding p-values to a table is just one thing that can be done in more than one way. For example, if a row group reports the counts and percentages for each level of a categorical variable across multiple (column) groups, then I would argue that the p-value resulting from a chi-square test or a Fisher exact test would be best placed on the line of the table labeling the row group. However, say we reported the minimum, median, mean, and maximum within a row group for one variable. I would suggest that the p-value from a t-test, or other meaningful test, for the difference in means be reported on the line of the summary table for the mean, not on the row group itself.

With so many possibilities, I have left construction of a p-value column to be ad hoc. Perhaps an additional column wouldn’t be used and the p-values are edited into row group labels, for example.

If you want to add a p-value column to a qwraps2_summary_table object you can with some degree of ease. Note that qwraps2_summary_table objects are just character matrices.

both %>% str
##  'qwraps2_summary_table' chr [1:13, 1:4] "10.4" "33.9" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:13] "min" "max" "mean (sd)" "min" ...
##   ..$ : chr [1:4] "mtcars2 (N = 32)" "cyl_factor: 6 cylinders (N = 7)" "cyl_factor: 4 cylinders (N = 11)" "cyl_factor: 8 cylinders (N = 14)"
##  - attr(*, "rgroups")= Named int [1:4] 3 4 3 3
##   ..- attr(*, "names")= chr [1:4] "Miles Per Gallon" "Displacement" "Weight (1000 lbs)" "Forward Gears"
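Because the object is a character matrix, ordinary matrix tools apply to it. A minimal base R sketch of the idea, using a small hypothetical matrix (tab, its values, and the p-value strings are made up for illustration):

```r
# Hypothetical character matrix standing in for a summary table
tab <- matrix(c("10.4", "20.09 (6.03)", "1.513", "3.22 (0.98)"),
              ncol = 1,
              dimnames = list(c("min", "mean (sd)", "min", "mean (sd)"),
                              "Overall"))

# Append an empty column; the single "" is recycled down the rows
tab <- cbind(tab, "P-value" = "")

# Fill only the rows whose name matches "mean (sd)"
tab[grepl("mean \\(sd\\)", rownames(tab)), "P-value"] <- c("P = 0.01", "P = 0.20")

tab
```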

Let’s add p-values for testing the difference in the mean between the three cylinder groups.

# difference in means
mpvals <-
  list(lm(mpg ~ cyl_factor,  data = mtcars2),
       lm(disp ~ cyl_factor, data = mtcars2),
       lm(wt ~ cyl_factor,   data = mtcars2)) %>%
  lapply(aov) %>%
  lapply(summary) %>%
  lapply(function(x) x[[1]][["Pr(>F)"]][1]) %>%
  lapply(frmtp) %>%
  do.call(c, .)

# Fisher test
fpval <- frmtp(fisher.test(table(mtcars2$gear, mtcars2$cyl_factor))$p.value)
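The same p-values can be pulled out one at a time in base R, which may make the pipeline above easier to follow. In this sketch datasets::mtcars stands in for mtcars2, and format_p is a hypothetical helper written for illustration. It is not frmtp and makes no claim to match all of frmtp's options.

```r
d <- datasets::mtcars
d$cyl_factor <- factor(d$cyl)

# One-way ANOVA p-value for a difference in mean mpg across cylinder groups
p_anova <- summary(aov(mpg ~ cyl_factor, data = d))[[1]][["Pr(>F)"]][1]

# Fisher exact test p-value for gear by cylinder group
p_fisher <- fisher.test(table(d$gear, d$cyl_factor))$p.value

# Hypothetical formatter: "P < 0.0001" below the threshold, "P = x" otherwise
format_p <- function(p, digits = 4) {
  threshold <- 10^(-digits)
  if (p < threshold) {
    paste0("P < ", formatC(threshold, digits = digits, format = "f"))
  } else {
    paste0("P = ", formatC(p, digits = digits, format = "f"))
  }
}

format_p(p_anova)
format_p(p_fisher)
```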

Adding the p-value column is done as follows:

both <- cbind(both, "P-value" = "")
both[grepl("mean \\(sd\\)", rownames(both)), "P-value"] <- mpvals
a <- capture.output(print(both))
a[grepl("Forward Gears", a)] %<>% sub("&nbsp;&nbsp;\\ \\|$", paste(fpval, "|"), .)

and the resulting table is:

cat(a, sep = "\n")
| | mtcars2 (N = 32) | cyl_factor: 6 cylinders (N = 7) | cyl_factor: 4 cylinders (N = 11) | cyl_factor: 8 cylinders (N = 14) | P-value |
|:---|:---|:---|:---|:---|:---|
| **Miles Per Gallon** | | | | | |
| min | 10.4 | 17.8 | 21.4 | 10.4 | |
| max | 33.9 | 21.4 | 33.9 | 19.2 | |
| mean (sd) | 20.09 ± 6.03 | 19.74 ± 1.45 | 26.66 ± 4.51 | 15.10 ± 2.56 | P < 0.0001 |
| **Displacement** | | | | | |
| min | 71.1 | 145.0 | 71.1 | 275.8 | |
| median | 196.3 | 167.6 | 108.0 | 350.5 | |
| max | 472 | 258.0 | 146.7 | 472.0 | |
| mean (sd) | 230.72 ± 123.94 | 183.31 ± 41.56 | 105.14 ± 26.87 | 353.10 ± 67.77 | P < 0.0001 |
| **Weight (1000 lbs)** | | | | | |
| min | 1.513 | 2.620 | 1.513 | 3.170 | |
| max | 5.424 | 3.460 | 3.190 | 5.424 | |
| mean (sd) | 3.22 ± 0.98 | 3.12 ± 0.36 | 2.29 ± 0.57 | 4.00 ± 0.76 | P < 0.0001 |
| **Forward Gears** | | | | | P < 0.0001 |
| Three | 15 (47) | 2 (29) | 1 (9) | 12 (86) | |
| Four | 12 (38) | 4 (57) | 8 (73) | 0 (0) | |
| Five | 5 (16) | 1 (14) | 2 (18) | 2 (14) | |

Another option you might consider is to have the p-value in the row group name. Consider the following construction. The p-values are added to the names of the row groups when building the summary table.

gear_summary <-
  list("Forward Gears" =
       list("Three" = ~ qwraps2::n_perc0(.data$gear == 3),
            "Four"  = ~ qwraps2::n_perc0(.data$gear == 4),
            "Five"  = ~ qwraps2::n_perc0(.data$gear == 5)),
       "Transmission" =
       list("Automatic" = ~ qwraps2::n_perc0(.data$am == 0),
            "Manual"  = ~ qwraps2::n_perc0(.data$am == 1))
       ) %>%
setNames(.,
         c(
         paste("Forward Gears: ", frmtp(fisher.test(xtabs( ~ gear + cyl_factor, data = mtcars2))$p.value)),
         paste("Transmission: ",  frmtp(fisher.test(xtabs( ~ am + cyl_factor, data = mtcars2))$p.value)))
         )

summary_table(dplyr::group_by(mtcars2, cyl_factor), gear_summary)
| | cyl_factor: 6 cylinders (N = 7) | cyl_factor: 4 cylinders (N = 11) | cyl_factor: 8 cylinders (N = 14) |
|:---|:---|:---|:---|
| **Forward Gears: P < 0.0001** | | | |
| Three | 2 (29) | 1 (9) | 12 (86) |
| Four | 4 (57) | 8 (73) | 0 (0) |
| Five | 1 (14) | 2 (18) | 2 (14) |
| **Transmission: P = 0.0091** | | | |
| Automatic | 4 (57) | 3 (27) | 12 (86) |
| Manual | 3 (43) | 8 (73) | 2 (14) |

3.3 Using Variable Labels

Some data management paradigms use attributes to keep a label associated with a variable in a data.frame. Notable examples are the Hmisc and sjPlot packages. If you associate a label with a variable in the data frame, then that label will be used when building a summary table. This feature was suggested in https://github.com/dewittpe/qwraps2/issues/74 and implemented as follows:

new_data_frame <-
  data.frame(age = c(18, 20, 24, 17, 43),
             edu = c(1, 3, 1, 5, 2),
             rt  = c(0.01, 0.04, 0.02, 0.10, 0.06))

# Set a label for the variables
attr(new_data_frame$age, "label") <- "Age in years"
attr(new_data_frame$rt,  "label") <- "Reaction time"

# mistakenly set the attribute to name instead of label
attr(new_data_frame$edu, "name") <- "Education"

When calling qsummary, the provided labels for the age and rt variables will be used. Since the attribute “label” does not exist for the edu variable, edu will be used in the output.
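The fallback from label to column name can be sketched in base R. The helper label_or_name below is hypothetical and written only to illustrate the lookup. It is not part of qwraps2 and may differ from how qsummary resolves labels internally.

```r
# Hypothetical helper: use the "label" attribute when present, else the column name
label_or_name <- function(df) {
  vapply(names(df), function(nm) {
    lab <- attr(df[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else lab
  }, character(1), USE.NAMES = FALSE)
}

df <- data.frame(age = c(18, 20, 24),
                 edu = c(1, 3, 1),
                 rt  = c(0.01, 0.04, 0.02))
attr(df$age, "label") <- "Age in years"
attr(df$rt,  "label") <- "Reaction time"
attr(df$edu, "name")  <- "Education"   # wrong attribute name, so no label is found

label_or_name(df)
```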

qsummary(new_data_frame)
## $`Age in years`
## $`Age in years`$minimum
## ~qwraps2::frmt(min(.data[["age"]]))
## 
## $`Age in years`$`median (IQR)`
## ~qwraps2::median_iqr(.data[["age"]])
## 
## $`Age in years`$`mean (sd)`
## ~qwraps2::mean_sd(.data[["age"]])
## 
## $`Age in years`$maximum
## ~qwraps2::frmt(max(.data[["age"]]))
## 
## 
## $edu
## $edu$minimum
## ~qwraps2::frmt(min(.data[["edu"]]))
## 
## $edu$`median (IQR)`
## ~qwraps2::median_iqr(.data[["edu"]])
## 
## $edu$`mean (sd)`
## ~qwraps2::mean_sd(.data[["edu"]])
## 
## $edu$maximum
## ~qwraps2::frmt(max(.data[["edu"]]))
## 
## 
## $`Reaction time`
## $`Reaction time`$minimum
## ~qwraps2::frmt(min(.data[["rt"]]))
## 
## $`Reaction time`$`median (IQR)`
## ~qwraps2::median_iqr(.data[["rt"]])
## 
## $`Reaction time`$`mean (sd)`
## ~qwraps2::mean_sd(.data[["rt"]])
## 
## $`Reaction time`$maximum
## ~qwraps2::frmt(max(.data[["rt"]]))

This behavior is also seen with the summary_table call:

summary_table(new_data_frame)
| | new_data_frame (N = 5) |
|:---|:---|
| **Age in years** | |
| minimum | 17.00 |
| median (IQR) | 20.00 (18.00, 24.00) |
| mean (sd) | 24.40 ± 10.74 |
| maximum | 43.00 |
| **edu** | |
| minimum | 1.00 |
| median (IQR) | 2.00 (1.00, 3.00) |
| mean (sd) | 2.40 ± 1.67 |
| maximum | 5.00 |
| **Reaction time** | |
| minimum | 0.01 |
| median (IQR) | 0.04 (0.02, 0.06) |
| mean (sd) | 0.05 ± 0.04 |
| maximum | 0.10 |

4 Session Info

print(sessionInfo(), local = FALSE)
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] qwraps2_0.4.2 magrittr_1.5 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3       crayon_1.3.4     digest_0.6.23    dplyr_0.8.3     
##  [5] assertthat_0.2.1 R6_2.4.1         evaluate_0.14    highr_0.8       
##  [9] pillar_1.4.2     rlang_0.4.2      stringi_1.4.3    rmarkdown_1.18  
## [13] tools_3.6.1      stringr_1.4.0    glue_1.3.1       purrr_0.3.3     
## [17] xfun_0.11        yaml_2.2.0       compiler_3.6.1   pkgconfig_2.0.3 
## [21] htmltools_0.4.0  tidyselect_0.2.5 knitr_1.26       tibble_2.1.3