This document mainly contains examples using the recommended styles for Rmarkdown documents. Available styles in summarytools are the same as pander’s.
For `freq()`, `descr()` (and `ctable()`, although with caveats), the rmarkdown style is recommended. For `dfSummary()`, grid is recommended.
Starting with `freq()`, we’ll review the recommended methods and styles to quickly get satisfying results in Rmarkdown documents.
To see how this vignette is configured, see this section.
`freq()` is best used with `style = 'rmarkdown'`; HTML rendering is also possible.
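The two outputs that follow can be obtained with calls along these lines. This is a minimal sketch, assuming the `tobacco` data set bundled with summarytools and a chunk that uses `results = 'asis'`; the object name `ft` is hypothetical, and the exact arguments used to build this page are not shown in the text.

```r
library(summarytools)
data("tobacco")  # example data shipped with summarytools

# Markdown output: a pipe table suited to Rmarkdown documents
ft <- freq(tobacco$gender, style = "rmarkdown", plain.ascii = FALSE)
ft

# HTML output: the same object, rendered as an HTML table
print(ft, method = "render")
```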
explicit NA's detected - temporarily setting 'report.nas' to FALSE
tobacco$gender
Type: Factor
| | Freq | % | % Cum. |
---|---|---|---|
F | 489 | 48.90 | 48.90 |
M | 489 | 48.90 | 97.80 |
(Missing) | 22 | 2.20 | 100.00 |
Total | 1000 | 100.00 | 100.00 |
explicit NA's detected - temporarily setting 'report.nas' to FALSE
gender | Freq | % | % Cum. |
---|---|---|---|
F | 489 | 48.90 | 48.90 |
M | 489 | 48.90 | 97.80 |
(Missing) | 22 | 2.20 | 100.00 |
Total | 1000 | 100.00 | 100.00 |
If you find the table too large, you can use `table.classes = 'st-small'`; an example is provided further below.
Tables with headings spanning two rows are not yet fully supported in markdown, but the result is getting close to acceptable. This, however, is not true for all themes, which is why the rendering method is preferred for `ctable()`.
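For reference, a markdown cross-tabulation like the one below could be produced as follows. This is a sketch under the same assumptions as before (the bundled `tobacco` data, a chunk with `results = 'asis'`); the object name `ct_md` is hypothetical.

```r
library(summarytools)
data("tobacco")

# Markdown cross-tabulation; the two-row heading is the part that
# some themes display poorly
ct_md <- ctable(tobacco$gender, tobacco$smoker,
                style = "rmarkdown", plain.ascii = FALSE)
ct_md
```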
gender * smoker
Data Frame: tobacco
gender / smoker | Yes | No | Total |
---|---|---|---|
F | 147 (30.1%) | 342 (69.9%) | 489 (100.0%) |
M | 143 (29.2%) | 346 (70.8%) | 489 (100.0%) |
(Missing) | 8 (36.4%) | 14 (63.6%) | 22 (100.0%) |
Total | 298 (29.8%) | 702 (70.2%) | 1000 (100.0%) |
For best results, use the rendering method (`method = "render"`).
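A minimal sketch of the rendered version, assuming the bundled `tobacco` data; `ct` is a hypothetical object name. Style arguments are left out here on the assumption that HTML rendering does not depend on them.

```r
library(summarytools)
data("tobacco")

# Build the cross-tabulation once, then render it as HTML
ct <- ctable(tobacco$gender, tobacco$smoker)
print(ct, method = "render")
```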
gender / smoker | Yes | No | Total |
---|---|---|---|
F | 147 (30.1%) | 342 (69.9%) | 489 (100.0%) |
M | 143 (29.2%) | 346 (70.8%) | 489 (100.0%) |
(Missing) | 8 (36.4%) | 14 (63.6%) | 22 (100.0%) |
Total | 298 (29.8%) | 702 (70.2%) | 1000 (100.0%) |
`descr()` is also best used with `style = 'rmarkdown'`, and HTML rendering is also supported.
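A call of the following shape would give the markdown table shown next. This is a sketch assuming the bundled `tobacco` data and a chunk with `results = 'asis'`; the name `d` is hypothetical.

```r
library(summarytools)
data("tobacco")

# Non-numerical columns are skipped (with a message); statistics are
# returned in rows, variables in columns
d <- descr(tobacco, style = "rmarkdown", plain.ascii = FALSE)
d
```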
Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease
tobacco
N: 1000
| | BMI | age | cigs.per.day | samp.wgts |
---|---|---|---|---|
Mean | 25.73 | 49.60 | 6.78 | 1.00 |
Std.Dev | 4.49 | 18.29 | 11.88 | 0.08 |
Min | 8.83 | 18.00 | 0.00 | 0.86 |
Q1 | 22.93 | 34.00 | 0.00 | 0.86 |
Median | 25.62 | 50.00 | 0.00 | 1.04 |
Q3 | 28.65 | 66.00 | 11.00 | 1.05 |
Max | 39.44 | 80.00 | 40.00 | 1.06 |
MAD | 4.18 | 23.72 | 0.00 | 0.01 |
IQR | 5.72 | 32.00 | 11.00 | 0.19 |
CV | 0.17 | 0.37 | 1.75 | 0.08 |
Skewness | 0.02 | -0.04 | 1.54 | -1.04 |
SE.Skewness | 0.08 | 0.08 | 0.08 | 0.08 |
Kurtosis | 0.26 | -1.26 | 0.90 | -0.90 |
N.Valid | 974.00 | 975.00 | 965.00 | 1000.00 |
Pct.Valid | 97.40 | 97.50 | 96.50 | 100.00 |
We’ll use `table.classes = 'st-small'` to show how it affects the table’s size, compared to the `freq()` table rendered earlier.
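As a sketch (again assuming the bundled `tobacco` data; `d_small` is a hypothetical name), `table.classes` is passed to `print()` together with the rendering method:

```r
library(summarytools)
data("tobacco")

# 'st-small' is applied as a CSS class on the rendered HTML table
d_small <- descr(tobacco)
print(d_small, method = "render", table.classes = "st-small")
```

The class only has a visible effect when summarytools’ CSS is included in the document.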
Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease
| | BMI | age | cigs.per.day | samp.wgts |
---|---|---|---|---|
Mean | 25.73 | 49.60 | 6.78 | 1.00 |
Std.Dev | 4.49 | 18.29 | 11.88 | 0.08 |
Min | 8.83 | 18.00 | 0.00 | 0.86 |
Q1 | 22.93 | 34.00 | 0.00 | 0.86 |
Median | 25.62 | 50.00 | 0.00 | 1.04 |
Q3 | 28.65 | 66.00 | 11.00 | 1.05 |
Max | 39.44 | 80.00 | 40.00 | 1.06 |
MAD | 4.18 | 23.72 | 0.00 | 0.01 |
IQR | 5.72 | 32.00 | 11.00 | 0.19 |
CV | 0.17 | 0.37 | 1.75 | 0.08 |
Skewness | 0.02 | -0.04 | 1.54 | -1.04 |
SE.Skewness | 0.08 | 0.08 | 0.08 | 0.08 |
Kurtosis | 0.26 | -1.26 | 0.90 | -0.90 |
N.Valid | 974 | 975 | 965 | 1000 |
Pct.Valid | 97.40 | 97.50 | 96.50 | 100.00 |
Don’t forget to specify `plain.ascii = FALSE` (or set it as a global option with `st_options(plain.ascii = FALSE)`), or you won’t get good results.
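A grid-style call might look like the sketch below. It assumes the bundled `tobacco` data, a chunk with `results = 'asis'`, and a Unix-like system where `/tmp` exists and is writable; `dfs` is a hypothetical name, and the `graph.magnif` and `valid.col` values are borrowed from the rendering example elsewhere in this document.

```r
library(summarytools)
data("tobacco")

# Grid style needs plain.ascii = FALSE, plus a short directory
# (tmp.img.dir) in which the graph images are stored
dfs <- dfSummary(tobacco,
                 style        = "grid",
                 plain.ascii  = FALSE,
                 graph.magnif = 0.75,
                 valid.col    = FALSE,
                 tmp.img.dir  = "/tmp")  # assumption: Unix-like system
dfs
```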
This method also works really well, and not having to specify the `tmp.img.dir` parameter is a plus.
No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Valid | Missing |
---|---|---|---|---|---|---|
1 | gender [factor] | 1. F 2. M 3. (Missing) | | | 1000 (100%) | 0 (0%) |
2 | age [numeric] | Mean (sd) : 49.6 (18.3) min < med < max: 18 < 50 < 80 IQR (CV) : 32 (0.4) | 63 distinct values | | 975 (97.5%) | 25 (2.5%) |
3 | age.gr [factor] | 1. 18-34 2. 35-50 3. 51-70 4. 71 + | | | 975 (97.5%) | 25 (2.5%) |
4 | BMI [numeric] | Mean (sd) : 25.7 (4.5) min < med < max: 8.8 < 25.6 < 39.4 IQR (CV) : 5.7 (0.2) | 974 distinct values | | 974 (97.4%) | 26 (2.6%) |
5 | smoker [factor] | 1. Yes 2. No | | | 1000 (100%) | 0 (0%) |
6 | cigs.per.day [numeric] | Mean (sd) : 6.8 (11.9) min < med < max: 0 < 0 < 40 IQR (CV) : 11 (1.8) | 37 distinct values | | 965 (96.5%) | 35 (3.5%) |
7 | diseased [factor] | 1. Yes 2. No | | | 1000 (100%) | 0 (0%) |
8 | disease [character] | 1. Hypertension 2. Cancer 3. Cholesterol 4. Heart 5. Pulmonary 6. Musculoskeletal 7. Diabetes 8. Hearing 9. Digestive 10. Hypotension [ 3 others ] | | | 222 (22.2%) | 778 (77.8%) |
9 | samp.wgts [numeric] | Mean (sd) : 1 (0.1) min < med < max: 0.9 < 1 < 1.1 IQR (CV) : 0.2 (0.1) | | | 1000 (100%) | 0 (0%) |
For data frames containing numerous variables, we can use the `max.tbl.height` argument to wrap the results in a scrollable window having the specified height, in pixels. For instance:
print(dfSummary(tobacco, valid.col = FALSE, graph.magnif = 0.75),
max.tbl.height = 300, method = "render")
No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Missing |
---|---|---|---|---|---|
1 | gender [factor] | 1. F 2. M 3. (Missing) | | | 0 (0%) |
2 | age [numeric] | Mean (sd) : 49.6 (18.3) min < med < max: 18 < 50 < 80 IQR (CV) : 32 (0.4) | 63 distinct values | | 25 (2.5%) |
3 | age.gr [factor] | 1. 18-34 2. 35-50 3. 51-70 4. 71 + | | | 25 (2.5%) |
4 | BMI [numeric] | Mean (sd) : 25.7 (4.5) min < med < max: 8.8 < 25.6 < 39.4 IQR (CV) : 5.7 (0.2) | 974 distinct values | | 26 (2.6%) |
5 | smoker [factor] | 1. Yes 2. No | | | 0 (0%) |
6 | cigs.per.day [numeric] | Mean (sd) : 6.8 (11.9) min < med < max: 0 < 0 < 40 IQR (CV) : 11 (1.8) | 37 distinct values | | 35 (3.5%) |
7 | diseased [factor] | 1. Yes 2. No | | | 0 (0%) |
8 | disease [character] | 1. Hypertension 2. Cancer 3. Cholesterol 4. Heart 5. Pulmonary 6. Musculoskeletal 7. Diabetes 8. Hearing 9. Digestive 10. Hypotension [ 3 others ] | | | 778 (77.8%) |
9 | samp.wgts [numeric] | Mean (sd) : 1 (0.1) min < med < max: 0.9 < 1 < 1.1 IQR (CV) : 0.2 (0.1) | | | 0 (0%) |
As explained in the introductory vignette, `tb()` can be used to convert summarytools objects created with `freq()` and `descr()` to simple tibbles that packages specialized in table formatting will be able to process. This is particularly helpful with `stby` objects:
library(kableExtra)
library(magrittr)
stby(iris, iris$Species, descr, stats = "fivenum") %>%
  tb(order = 3) %>%
  kable(format = "html", digits = 2) %>%
  collapse_rows(columns = 1, valign = "top")
variable | Species | min | q1 | med | q3 | max |
---|---|---|---|---|---|---|
Petal.Length | setosa | 1.0 | 1.4 | 1.50 | 1.6 | 1.9 |
| | versicolor | 3.0 | 4.0 | 4.35 | 4.6 | 5.1 |
| | virginica | 4.5 | 5.1 | 5.55 | 5.9 | 6.9 |
Petal.Width | setosa | 0.1 | 0.2 | 0.20 | 0.3 | 0.6 |
| | versicolor | 1.0 | 1.2 | 1.30 | 1.5 | 1.8 |
| | virginica | 1.4 | 1.8 | 2.00 | 2.3 | 2.5 |
Sepal.Length | setosa | 4.3 | 4.8 | 5.00 | 5.2 | 5.8 |
| | versicolor | 4.9 | 5.6 | 5.90 | 6.3 | 7.0 |
| | virginica | 4.9 | 6.2 | 6.50 | 6.9 | 7.9 |
Sepal.Width | setosa | 2.3 | 3.2 | 3.40 | 3.7 | 4.4 |
| | versicolor | 2.0 | 2.5 | 2.80 | 3.0 | 3.4 |
| | virginica | 2.2 | 2.8 | 3.00 | 3.2 | 3.8 |
This vignette uses the theme `rmarkdown::html_vignette`. Its YAML section looks like this:
# ---
# title: "Recommendations for Using summarytools With Rmarkdown"
# author: "Dominic Comtois"
# date: "2020-03-02"
# output:
#   rmarkdown::html_vignette:
#     css:
#     - !expr system.file("rmarkdown/templates/html_vignette/resources/vignette.css", package = "rmarkdown")
# vignette: >
#   %\VignetteIndexEntry{Recommendations for Rmarkdown}
#   %\VignetteEngine{knitr::rmarkdown}
#   %\VignetteEncoding{UTF-8}
# ---
The following summarytools global options have been set. More of them can be useful, but this is a good starting point.
st_options(bootstrap.css     = FALSE,       # Already part of the theme so no need for it
           plain.ascii       = FALSE,       # One of the essential settings
           style             = "rmarkdown", # Idem.
           dfSummary.silent  = TRUE,        # Suppresses messages about temporary files
           footnote          = NA,          # Keeping the results minimalistic
           subtitle.emphasis = FALSE)       # For the vignette theme, this gives better results.
                                            # For other themes, using TRUE might be preferable.
Also, the following knitr chunk options were set this way:
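The chunk that set these options is not reproduced in this text. A plausible setup is sketched below; the specific values are assumptions rather than the vignette’s confirmed settings. The one that matters most for markdown-style summarytools output is `results = "asis"`, which lets knitr pass the generated markdown through untouched.

```r
library(knitr)

# Assumed values, for illustration only
opts_chunk$set(results = "asis",  # emit summarytools markdown as-is
               comment = NA,      # no "##" prefix on printed output
               prompt  = FALSE,
               cache   = FALSE,
               echo    = TRUE)
```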
Finally, summarytools’ CSS has been included in the following manner, with chunk option `echo = FALSE`:
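The corresponding chunk did not survive in this text. A minimal sketch, assuming the package’s `st_css()` helper is called with its default arguments inside a chunk declared with `echo = FALSE` and `results = 'asis'`:

```r
library(summarytools)

# Place this in a chunk declared with echo = FALSE, results = 'asis'
# so the CSS is injected without showing the code
st_css()
```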
This is by no means a definitive guide; depending on the themes you use, you could find that other settings yield better results. If you are looking to create a Word or a PDF document, you might want to try different combinations of options.