Once a dataset is cleaned and ready for statistical analysis, the first step is typically to summarize it. The univariate_table()
function makes it easy to create a custom descriptive analysis while consistently producing clean, presentation-ready output. It is built to integrate directly into your analysis workflow (e.g., R Markdown), but it can also be called from the console and rendered in a number of formats.
require(cheese)
heart_disease %>%
univariate_table()
Variable | Level | Summary |
---|---|---|
Age | 56 (48, 61) | |
Sex | Female | 97 (32.01%) |
Male | 206 (67.99%) | |
ChestPain | Typical angina | 23 (7.59%) |
Atypical angina | 50 (16.5%) | |
Non-anginal pain | 86 (28.38%) | |
Asymptomatic | 144 (47.52%) | |
BP | 130 (120, 140) | |
Cholesterol | 241 (211, 275) | |
MaximumHR | 153 (133.5, 166) | |
ExerciseInducedAngina | No | 204 (67.33%) |
Yes | 99 (32.67%) | |
HeartDisease | No | 164 (54.13%) |
Yes | 139 (45.87%) |
By default, an HTML table is produced containing descriptive statistics for columns in the dataset.
In the table above, the summary statistics are presented within the cells in a particular format for each type of data. You can use the _summary
arguments to customize not only how the results are displayed, but also which values go into them.
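To make the cell formats concrete, here is a minimal base R sketch of what a "median (q1, q3)" cell and a "count (percent%)" cell contain. It uses toy data, assumes the default quantile() definition, and is not the package's implementation; the object names are illustrative only.

```r
# Toy data for illustration only (not the heart_disease dataset)
age <- c(29, 41, 48, 56, 58, 61, 77)
sex <- factor(c("Female", "Male", "Male", "Female", "Male", "Male", "Male"))

# Numeric cell in the form "median (q1, q3)"
q <- quantile(age, probs = c(0.25, 0.5, 0.75), names = FALSE)
numeric_cell <- sprintf("%s (%s, %s)", q[2], q[1], q[3])

# Categorical cells in the form "count (percent%)", one per level
counts <- table(sex)
percents <- round(100 * as.vector(prop.table(counts)), 2)
categorical_cells <- sprintf("%s (%s%%)", as.vector(counts), percents)
names(categorical_cells) <- names(counts)

print(numeric_cell)
print(categorical_cells)
```

Each statistic is computed first and then pasted into a fixed text pattern, which is the idea behind the string templates used in the calls that follow.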
Suppose that instead of "median (q1, q3)" being displayed for numeric data, you want "mean [sd] / median", in that exact format:
heart_disease %>%
univariate_table(
numeric_summary =
c(
Summary = "mean [sd] / median"
)
)
Variable | Level | Summary |
---|---|---|
Age | 54.44 [9.04] / 56 | |
Sex | Female | 97 (32.01%) |
Male | 206 (67.99%) | |
ChestPain | Typical angina | 23 (7.59%) |
Atypical angina | 50 (16.5%) | |
Non-anginal pain | 86 (28.38%) | |
Asymptomatic | 144 (47.52%) | |
BP | 131.69 [17.6] / 130 | |
Cholesterol | 246.69 [51.78] / 241 | |
MaximumHR | 149.61 [22.88] / 153 | |
ExerciseInducedAngina | No | 204 (67.33%) |
Yes | 99 (32.67%) | |
HeartDisease | No | 164 (54.13%) |
Yes | 139 (45.87%) |
The name Summary
was used to ensure that the results for the numeric data were bound into the same column as the results for the other data types. If you name it something else, you get a new column with those summaries:
heart_disease %>%
univariate_table(
numeric_summary =
c(
NewSummary = "mean [sd] / median"
)
)
| Variable | Level | NewSummary | Summary |
|---|---|---|---|
| Age | | 54.44 [9.04] / 56 | |
| Sex | Female | | 97 (32.01%) |
| | Male | | 206 (67.99%) |
| ChestPain | Typical angina | | 23 (7.59%) |
| | Atypical angina | | 50 (16.5%) |
| | Non-anginal pain | | 86 (28.38%) |
| | Asymptomatic | | 144 (47.52%) |
| BP | | 131.69 [17.6] / 130 | |
| Cholesterol | | 246.69 [51.78] / 241 | |
| MaximumHR | | 149.61 [22.88] / 153 | |
| ExerciseInducedAngina | No | | 204 (67.33%) |
| | Yes | | 99 (32.67%) |
| HeartDisease | No | | 164 (54.13%) |
| | Yes | | 139 (45.87%) |
You can add as many summary columns as you want separately for each type of data:
heart_disease %>%
univariate_table(
numeric_summary =
c(
`Numeric only` = "mean [sd] / median",
Summary = "median (q1, q3)"
),
categorical_summary =
c(
Summary = "count",
`Categorical only` = "percent = 100 * proportion"
)
)
| Variable | Level | Numeric only | Summary | Categorical only |
|---|---|---|---|---|
| Age | | 54.44 [9.04] / 56 | 56 (48, 61) | |
| Sex | Female | | 97 | 32.01 = 100 * 0.32 |
| | Male | | 206 | 67.99 = 100 * 0.68 |
| ChestPain | Typical angina | | 23 | 7.59 = 100 * 0.08 |
| | Atypical angina | | 50 | 16.5 = 100 * 0.17 |
| | Non-anginal pain | | 86 | 28.38 = 100 * 0.28 |
| | Asymptomatic | | 144 | 47.52 = 100 * 0.48 |
| BP | | 131.69 [17.6] / 130 | 130 (120, 140) | |
| Cholesterol | | 246.69 [51.78] / 241 | 241 (211, 275) | |
| MaximumHR | | 149.61 [22.88] / 153 | 153 (133.5, 166) | |
| ExerciseInducedAngina | No | | 204 | 67.33 = 100 * 0.67 |
| | Yes | | 99 | 32.67 = 100 * 0.33 |
| HeartDisease | No | | 164 | 54.13 = 100 * 0.54 |
| | Yes | | 139 | 45.87 = 100 * 0.46 |
A more visually-appealing case for adding multiple summaries is probably when all the data is the same type:
heart_disease %>%
univariate_table(
categorical_types = NULL, #Easily disable categorical data from being summarized
numeric_summary =
c(
`Median (Q1, Q3)` = "median (q1, q3)",
`Min-Max` = "min - max",
`Mean (SD)` = "mean (sd)"
)
)
Variable | Median (Q1, Q3) | Min-Max | Mean (SD) |
---|---|---|---|
Age | 56 (48, 61) | 29 - 77 | 54.44 (9.04) |
BP | 130 (120, 140) | 94 - 200 | 131.69 (17.6) |
Cholesterol | 241 (211, 275) | 126 - 564 | 246.69 (51.78) |
MaximumHR | 153 (133.5, 166) | 71 - 202 | 149.61 (22.88) |
Or when adding a summary that applies to all columns:
heart_disease %>%
univariate_table(
all_summary =
c(
`# obs. non-missing` = "available of length"
)
)
| Variable | Level | Summary | # obs. non-missing |
|---|---|---|---|
| Age | | 56 (48, 61) | 303 of 303 |
| Sex | | | 303 of 303 |
| | Female | 97 (32.01%) | |
| | Male | 206 (67.99%) | |
| ChestPain | | | 303 of 303 |
| | Typical angina | 23 (7.59%) | |
| | Atypical angina | 50 (16.5%) | |
| | Non-anginal pain | 86 (28.38%) | |
| | Asymptomatic | 144 (47.52%) | |
| BP | | 130 (120, 140) | 303 of 303 |
| Cholesterol | | 241 (211, 275) | 303 of 303 |
| BloodSugar | | | 303 of 303 |
| MaximumHR | | 153 (133.5, 166) | 303 of 303 |
| ExerciseInducedAngina | | | 303 of 303 |
| | No | 204 (67.33%) | |
| | Yes | 99 (32.67%) | |
| HeartDisease | | | 303 of 303 |
| | No | 164 (54.13%) | |
| | Yes | 139 (45.87%) | |
These add an extra row for categorical variables. You may have also noticed that the BloodSugar
column didn't show up in the table until the all_summary
argument was used. This is because it is not classified as numeric or categorical data, and thus is not evaluated by default. See the “Backend functionality” section to learn more.
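As a rough illustration of what an "available of length" cell reports, the sketch below computes the same kind of string in base R. It assumes that "available" means the number of non-missing values, as the column label above suggests. The vector is toy data.

```r
# Toy vector with two missing values (illustration only)
x <- c(120, NA, 135, 150, NA)

# Assumption: "available" is the count of non-missing observations
available <- sum(!is.na(x))
cell <- sprintf("%d of %d", available, length(x))

print(cell)
```

Because this summary does not depend on the data type, it can be computed for every column, including ones that are neither numeric nor categorical.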
The strata
argument takes a formula()
that can be used to stratify the analysis by any number of variables. Columns on the left side will appear down the rows, and columns on the right side will spread across the columns. You can use +
on either side to specify more than one column. Let's start by stratifying sex across the columns:
heart_disease %>%
univariate_table(
strata = ~ Sex
)
Variable | Level | Female | Male |
---|---|---|---|
Age | 57 (50, 63) | 54.5 (47, 59.75) | |
ChestPain | Typical angina | 4 (4.12%) | 19 (9.22%) |
Atypical angina | 18 (18.56%) | 32 (15.53%) | |
Non-anginal pain | 35 (36.08%) | 51 (24.76%) | |
Asymptomatic | 40 (41.24%) | 104 (50.49%) | |
BP | 132 (120, 140) | 130 (120, 140) | |
Cholesterol | 254 (215, 302) | 235 (208.75, 268.5) | |
MaximumHR | 157 (142, 165) | 150.5 (132, 167.5) | |
ExerciseInducedAngina | No | 75 (77.32%) | 129 (62.62%) |
Yes | 22 (22.68%) | 77 (37.38%) | |
HeartDisease | No | 72 (74.23%) | 92 (44.66%) |
Yes | 25 (25.77%) | 114 (55.34%) |
You can do the same thing down the rows:
heart_disease %>%
univariate_table(
strata = Sex ~ 1
)
Sex | Variable | Level | Summary |
---|---|---|---|
Female | Age | 57 (50, 63) | |
ChestPain | Typical angina | 4 (4.12%) | |
Atypical angina | 18 (18.56%) | ||
Non-anginal pain | 35 (36.08%) | ||
Asymptomatic | 40 (41.24%) | ||
BP | 132 (120, 140) | ||
Cholesterol | 254 (215, 302) | ||
MaximumHR | 157 (142, 165) | ||
ExerciseInducedAngina | No | 75 (77.32%) | |
Yes | 22 (22.68%) | ||
HeartDisease | No | 72 (74.23%) | |
Yes | 25 (25.77%) | ||
Male | Age | 54.5 (47, 59.75) | |
ChestPain | Typical angina | 19 (9.22%) | |
Atypical angina | 32 (15.53%) | ||
Non-anginal pain | 51 (24.76%) | ||
Asymptomatic | 104 (50.49%) | ||
BP | 130 (120, 140) | ||
Cholesterol | 235 (208.75, 268.5) | ||
MaximumHR | 150.5 (132, 167.5) | ||
ExerciseInducedAngina | No | 129 (62.62%) | |
Yes | 77 (37.38%) | ||
HeartDisease | No | 92 (44.66%) | |
Yes | 114 (55.34%) |
Or even both:
heart_disease %>%
univariate_table(
strata = Sex ~ HeartDisease
)
Sex | Variable | Level | No | Yes |
---|---|---|---|---|
Female | Age | 54 (46, 63.25) | 60 (57, 62) | |
ChestPain | Typical angina | 4 (5.56%) | 0 (0%) | |
Atypical angina | 16 (22.22%) | 2 (8%) | ||
Non-anginal pain | 34 (47.22%) | 1 (4%) | ||
Asymptomatic | 18 (25%) | 22 (88%) | ||
BP | 130 (119.5, 140) | 140 (130, 158) | ||
Cholesterol | 249 (210.75, 289.5) | 268 (236, 307) | ||
MaximumHR | 159 (146.75, 167.25) | 146 (133, 157) | ||
ExerciseInducedAngina | No | 64 (88.89%) | 11 (44%) | |
Yes | 8 (11.11%) | 14 (56%) | ||
Male | Age | 52 (44, 57) | 57.5 (51, 61) | |
ChestPain | Typical angina | 12 (13.04%) | 7 (6.14%) | |
Atypical angina | 25 (27.17%) | 7 (6.14%) | ||
Non-anginal pain | 34 (36.96%) | 17 (14.91%) | ||
Asymptomatic | 21 (22.83%) | 83 (72.81%) | ||
BP | 130 (120, 140) | 130 (120, 140) | ||
Cholesterol | 229.5 (206.5, 250.75) | 247.5 (212, 282) | ||
MaximumHR | 163 (150, 175.75) | 141 (125, 156) | ||
ExerciseInducedAngina | No | 77 (83.7%) | 52 (45.61%) | |
Yes | 15 (16.3%) | 62 (54.39%) |
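The row and column roles expressed by a strata formula can be inspected with base R. The helper below, split_strata(), is a hypothetical name used only for this sketch and is not part of the package. It shows which variables sit on each side of the formula.

```r
# Hypothetical helper: list the variables on each side of a strata formula
split_strata <- function(f) {
  stopifnot(inherits(f, "formula"))
  if (length(f) == 2) {
    # One-sided formula (e.g. ~ Sex): only column strata
    list(rows = character(0), columns = all.vars(f[[2]]))
  } else {
    # Two-sided formula: left side goes down the rows, right side across the columns
    list(rows = all.vars(f[[2]]), columns = all.vars(f[[3]]))
  }
}

str(split_strata(~ Sex))
str(split_strata(Sex ~ 1))
str(split_strata(Sex ~ HeartDisease))
```

Note that a right-hand side of 1 contains no variables, which is why Sex ~ 1 yields row strata only.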
Now suppose you want both stratification variables across the columns:
heart_disease %>%
univariate_table(
strata = ~ Sex + HeartDisease
)
Variable | Level | Female: No | Female: Yes | Male: No | Male: Yes |
---|---|---|---|---|---|
Age | 54 (46, 63.25) | 60 (57, 62) | 52 (44, 57) | 57.5 (51, 61) | |
ChestPain | Typical angina | 4 (5.56%) | 0 (0%) | 12 (13.04%) | 7 (6.14%) |
Atypical angina | 16 (22.22%) | 2 (8%) | 25 (27.17%) | 7 (6.14%) | |
Non-anginal pain | 34 (47.22%) | 1 (4%) | 34 (36.96%) | 17 (14.91%) | |
Asymptomatic | 18 (25%) | 22 (88%) | 21 (22.83%) | 83 (72.81%) | |
BP | 130 (119.5, 140) | 140 (130, 158) | 130 (120, 140) | 130 (120, 140) | |
Cholesterol | 249 (210.75, 289.5) | 268 (236, 307) | 229.5 (206.5, 250.75) | 247.5 (212, 282) | |
MaximumHR | 159 (146.75, 167.25) | 146 (133, 157) | 163 (150, 175.75) | 141 (125, 156) | |
ExerciseInducedAngina | No | 64 (88.89%) | 11 (44%) | 77 (83.7%) | 52 (45.61%) |
Yes | 8 (11.11%) | 14 (56%) | 15 (16.3%) | 62 (54.39%) |
The levels will span the columns in a hierarchical fashion depending on their order in the formula:
heart_disease %>%
univariate_table(
strata = ~ HeartDisease + Sex
)
Variable | Level | No: Female | No: Male | Yes: Female | Yes: Male |
---|---|---|---|---|---|
Age | 54 (46, 63.25) | 52 (44, 57) | 60 (57, 62) | 57.5 (51, 61) | |
ChestPain | Typical angina | 4 (5.56%) | 12 (13.04%) | 0 (0%) | 7 (6.14%) |
Atypical angina | 16 (22.22%) | 25 (27.17%) | 2 (8%) | 7 (6.14%) | |
Non-anginal pain | 34 (47.22%) | 34 (36.96%) | 1 (4%) | 17 (14.91%) | |
Asymptomatic | 18 (25%) | 21 (22.83%) | 22 (88%) | 83 (72.81%) | |
BP | 130 (119.5, 140) | 130 (120, 140) | 140 (130, 158) | 130 (120, 140) | |
Cholesterol | 249 (210.75, 289.5) | 229.5 (206.5, 250.75) | 268 (236, 307) | 247.5 (212, 282) | |
MaximumHR | 159 (146.75, 167.25) | 163 (150, 175.75) | 146 (133, 157) | 141 (125, 156) | |
ExerciseInducedAngina | No | 64 (88.89%) | 77 (83.7%) | 11 (44%) | 52 (45.61%) |
Yes | 8 (11.11%) | 15 (16.3%) | 14 (56%) | 62 (54.39%) |
Similarly, the rows also collapse hierarchically:
heart_disease %>%
univariate_table(
strata = HeartDisease + Sex ~ 1
)
HeartDisease | Sex | Variable | Level | Summary |
---|---|---|---|---|
No | Female | Age | 54 (46, 63.25) | |
ChestPain | Typical angina | 4 (5.56%) | ||
Atypical angina | 16 (22.22%) | |||
Non-anginal pain | 34 (47.22%) | |||
Asymptomatic | 18 (25%) | |||
BP | 130 (119.5, 140) | |||
Cholesterol | 249 (210.75, 289.5) | |||
MaximumHR | 159 (146.75, 167.25) | |||
ExerciseInducedAngina | No | 64 (88.89%) | ||
Yes | 8 (11.11%) | |||
Male | Age | 52 (44, 57) | ||
ChestPain | Typical angina | 12 (13.04%) | ||
Atypical angina | 25 (27.17%) | |||
Non-anginal pain | 34 (36.96%) | |||
Asymptomatic | 21 (22.83%) | |||
BP | 130 (120, 140) | |||
Cholesterol | 229.5 (206.5, 250.75) | |||
MaximumHR | 163 (150, 175.75) | |||
ExerciseInducedAngina | No | 77 (83.7%) | ||
Yes | 15 (16.3%) | |||
Yes | Female | Age | 60 (57, 62) | |
ChestPain | Typical angina | 0 (0%) | ||
Atypical angina | 2 (8%) | |||
Non-anginal pain | 1 (4%) | |||
Asymptomatic | 22 (88%) | |||
BP | 140 (130, 158) | |||
Cholesterol | 268 (236, 307) | |||
MaximumHR | 146 (133, 157) | |||
ExerciseInducedAngina | No | 11 (44%) | ||
Yes | 14 (56%) | |||
Male | Age | 57.5 (51, 61) | ||
ChestPain | Typical angina | 7 (6.14%) | ||
Atypical angina | 7 (6.14%) | |||
Non-anginal pain | 17 (14.91%) | |||
Asymptomatic | 83 (72.81%) | |||
BP | 130 (120, 140) | |||
Cholesterol | 247.5 (212, 282) | |||
MaximumHR | 141 (125, 156) | |||
ExerciseInducedAngina | No | 52 (45.61%) | ||
Yes | 62 (54.39%) |
You can use any of the functionality described in the previous section with stratification variables as well:
heart_disease %>%
univariate_table(
strata = ~ Sex + HeartDisease,
numeric_summary =
c(
`Mean (SD)` = "mean (sd)"
),
categorical_summary =
c(
`Count (%)` = "count (percent%)"
)
)
| Variable | Level | Female, No: Mean (SD) | Female, No: Count (%) | Female, Yes: Mean (SD) | Female, Yes: Count (%) | Male, No: Mean (SD) | Male, No: Count (%) | Male, Yes: Mean (SD) | Male, Yes: Count (%) |
|---|---|---|---|---|---|---|---|---|---|
| Age | | 54.56 (10.27) | | 59.08 (4.86) | | 51.04 (8.62) | | 56.09 (8.39) | |
| ChestPain | Typical angina | | 4 (5.56%) | | 0 (0%) | | 12 (13.04%) | | 7 (6.14%) |
| | Atypical angina | | 16 (22.22%) | | 2 (8%) | | 25 (27.17%) | | 7 (6.14%) |
| | Non-anginal pain | | 34 (47.22%) | | 1 (4%) | | 34 (36.96%) | | 17 (14.91%) |
| | Asymptomatic | | 18 (25%) | | 22 (88%) | | 21 (22.83%) | | 83 (72.81%) |
| BP | | 128.74 (16.54) | | 146.6 (21.12) | | 129.65 (16.02) | | 131.93 (17.22) | |
| Cholesterol | | 256.75 (66.22) | | 276.16 (59.88) | | 231.6 (37.64) | | 246.06 (45.44) | |
| MaximumHR | | 154.03 (19.25) | | 143.16 (20.18) | | 161.78 (18.56) | | 138.4 (23.08) | |
| ExerciseInducedAngina | No | | 64 (88.89%) | | 11 (44%) | | 77 (83.7%) | | 52 (45.61%) |
| | Yes | | 8 (11.11%) | | 14 (56%) | | 15 (16.3%) | | 62 (54.39%) |
The summary columns simply get added to the column-spanning hierarchy.
The add_n
argument will add the sample size to the label for the stratification group:
heart_disease %>%
univariate_table(
strata = ~ Sex,
add_n = TRUE
)
Variable | Level | Female (N=97) | Male (N=206) |
---|---|---|---|
Age | 57 (50, 63) | 54.5 (47, 59.75) | |
ChestPain | Typical angina | 4 (4.12%) | 19 (9.22%) |
Atypical angina | 18 (18.56%) | 32 (15.53%) | |
Non-anginal pain | 35 (36.08%) | 51 (24.76%) | |
Asymptomatic | 40 (41.24%) | 104 (50.49%) | |
BP | 132 (120, 140) | 130 (120, 140) | |
Cholesterol | 254 (215, 302) | 235 (208.75, 268.5) | |
MaximumHR | 157 (142, 165) | 150.5 (132, 167.5) | |
ExerciseInducedAngina | No | 75 (77.32%) | 129 (62.62%) |
Yes | 22 (22.68%) | 77 (37.38%) | |
HeartDisease | No | 72 (74.23%) | 92 (44.66%) |
Yes | 25 (25.77%) | 114 (55.34%) |
When multiple stratification variables are added on one side of the formula, the sample size will show up on the lowest level of the hierarchy, excluding summary columns:
heart_disease %>%
univariate_table(
strata = ~ Sex + HeartDisease,
add_n = TRUE
)
Variable | Level | Female: No (N=72) | Female: Yes (N=25) | Male: No (N=92) | Male: Yes (N=114) |
---|---|---|---|---|---|
Age | 54 (46, 63.25) | 60 (57, 62) | 52 (44, 57) | 57.5 (51, 61) | |
ChestPain | Typical angina | 4 (5.56%) | 0 (0%) | 12 (13.04%) | 7 (6.14%) |
Atypical angina | 16 (22.22%) | 2 (8%) | 25 (27.17%) | 7 (6.14%) | |
Non-anginal pain | 34 (47.22%) | 1 (4%) | 34 (36.96%) | 17 (14.91%) | |
Asymptomatic | 18 (25%) | 22 (88%) | 21 (22.83%) | 83 (72.81%) | |
BP | 130 (119.5, 140) | 140 (130, 158) | 130 (120, 140) | 130 (120, 140) | |
Cholesterol | 249 (210.75, 289.5) | 268 (236, 307) | 229.5 (206.5, 250.75) | 247.5 (212, 282) | |
MaximumHR | 159 (146.75, 167.25) | 146 (133, 157) | 163 (150, 175.75) | 141 (125, 156) | |
ExerciseInducedAngina | No | 64 (88.89%) | 11 (44%) | 77 (83.7%) | 52 (45.61%) |
Yes | 8 (11.11%) | 14 (56%) | 15 (16.3%) | 62 (54.39%) |
A limitation is that when sample size is added in the presence of row and column strata, it is displayed for the marginal groups only:
heart_disease %>%
univariate_table(
strata = Sex ~ HeartDisease,
add_n = TRUE
)
Sex | Variable | Level | No (N=164) | Yes (N=139) |
---|---|---|---|---|
Female (N=97) | Age | 54 (46, 63.25) | 60 (57, 62) | |
ChestPain | Typical angina | 4 (5.56%) | 0 (0%) | |
Atypical angina | 16 (22.22%) | 2 (8%) | ||
Non-anginal pain | 34 (47.22%) | 1 (4%) | ||
Asymptomatic | 18 (25%) | 22 (88%) | ||
BP | 130 (119.5, 140) | 140 (130, 158) | ||
Cholesterol | 249 (210.75, 289.5) | 268 (236, 307) | ||
MaximumHR | 159 (146.75, 167.25) | 146 (133, 157) | ||
ExerciseInducedAngina | No | 64 (88.89%) | 11 (44%) | |
Yes | 8 (11.11%) | 14 (56%) | ||
Male (N=206) | Age | 52 (44, 57) | 57.5 (51, 61) | |
ChestPain | Typical angina | 12 (13.04%) | 7 (6.14%) | |
Atypical angina | 25 (27.17%) | 7 (6.14%) | ||
Non-anginal pain | 34 (36.96%) | 17 (14.91%) | ||
Asymptomatic | 21 (22.83%) | 83 (72.81%) | ||
BP | 130 (120, 140) | 130 (120, 140) | ||
Cholesterol | 229.5 (206.5, 250.75) | 247.5 (212, 282) | ||
MaximumHR | 163 (150, 175.75) | 141 (125, 156) | ||
ExerciseInducedAngina | No | 77 (83.7%) | 52 (45.61%) | |
Yes | 15 (16.3%) | 62 (54.39%) |
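The difference between marginal and cell-level sample sizes can be seen with a short base R sketch. The four cell counts are taken from the column-stratified table shown earlier (N=72, 25, 92, 114), and the marginal totals are simply their row and column sums.

```r
# Cell-level sample sizes, copied from the Sex + HeartDisease column labels above
cells <- matrix(
  c(72, 25,
    92, 114),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(Sex = c("Female", "Male"), HeartDisease = c("No", "Yes"))
)

# Marginal sample sizes: these are the values shown when rows and columns are both stratified
row_n <- rowSums(cells)
col_n <- colSums(cells)

print(row_n)
print(col_n)
```

The labels in the table above therefore describe each margin as a whole, and the size of an individual Sex-by-HeartDisease cell has to be read from a table like this one.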
Often when a descriptive analysis is stratified by one or more variables, it is also of interest to add statistics that compare each variable across the groups. The associations
argument takes a list of any number of functions, each of which produces a scalar value to be placed in the table. First, let's define a function:
#Function for a p-value (y = variable being summarized, x = stratification variable)
pval <-
  function(y, x) {
    #For categorical data use Fisher's Exact test
    if(some_type(y, "factor")) {
      p <- fisher.test(factor(y), factor(x), simulate.p.value = TRUE)$p.value
    #Otherwise use Kruskal-Wallis
    } else {
      p <- kruskal.test(y, factor(x))$p.value
    }
    #Format the p-value for display
    ifelse(p < 0.001, "<0.001", as.character(round(p, 2)))
  }
The stratification variable will be placed in the second argument of the function(s) provided. Now you can add it to the function call:
heart_disease %>%
univariate_table(
strata = ~ HeartDisease,
associations = list(`P-value` = pval)
)
| Variable | Level | No | Yes | P-value |
|---|---|---|---|---|
| Age | | 52 (44.75, 59) | 58 (52, 62) | 0.12 |
| Sex | | | | <0.001 |
| | Female | 72 (43.9%) | 25 (17.99%) | |
| | Male | 92 (56.1%) | 114 (82.01%) | |
| ChestPain | | | | <0.001 |
| | Typical angina | 16 (9.76%) | 7 (5.04%) | |
| | Atypical angina | 41 (25%) | 9 (6.47%) | |
| | Non-anginal pain | 68 (41.46%) | 18 (12.95%) | |
| | Asymptomatic | 39 (23.78%) | 105 (75.54%) | |
| BP | | 130 (120, 140) | 130 (120, 145) | 0.51 |
| Cholesterol | | 234.5 (208.75, 267.25) | 249 (217.5, 283.5) | 0.11 |
| MaximumHR | | 161 (148.75, 172) | 142 (125, 156.5) | 0.08 |
| ExerciseInducedAngina | | | | <0.001 |
| | No | 141 (85.98%) | 63 (45.32%) | |
| | Yes | 23 (14.02%) | 76 (54.68%) | |
The name of the function in the list becomes the column label.
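Before passing an association function to the table, it can help to try it on its own. The following is a base R variant of the same idea, using is.factor() in place of some_type() and the built-in mtcars data. The name pval_base is hypothetical, and the sketch assumes the summarized variable arrives as the first argument and the stratification variable as the second.

```r
# Hypothetical base R variant of a p-value function
# y: variable being summarized; x: stratification variable
pval_base <- function(y, x) {
  if (is.factor(y)) {
    # Categorical variable: Fisher's exact test on the cross-tabulation
    p <- fisher.test(y, factor(x))$p.value
  } else {
    # Numeric variable: Kruskal-Wallis test across the groups
    p <- kruskal.test(y, factor(x))$p.value
  }
  # Return a single display string
  ifelse(p < 0.001, "<0.001", as.character(round(p, 2)))
}

# Try both branches, with transmission type (am) as the grouping variable
numeric_result <- pval_base(mtcars$mpg, mtcars$am)
factor_result <- pval_base(factor(mtcars$cyl), mtcars$am)

print(numeric_result)
print(factor_result)
```

Both calls return a single character value, which is the scalar shape the table expects for each variable.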
The comparison is made across all of the subgroups defined by the column stratification:
heart_disease %>%
univariate_table(
strata = ~ Sex + HeartDisease,
associations = list(`P-value` = pval)
)
| Variable | Level | Female: No | Female: Yes | Male: No | Male: Yes | P-value |
|---|---|---|---|---|---|---|
| Age | | 54 (46, 63.25) | 60 (57, 62) | 52 (44, 57) | 57.5 (51, 61) | 0.53 |
| ChestPain | | | | | | <0.001 |
| | Typical angina | 4 (5.56%) | 0 (0%) | 12 (13.04%) | 7 (6.14%) | |
| | Atypical angina | 16 (22.22%) | 2 (8%) | 25 (27.17%) | 7 (6.14%) | |
| | Non-anginal pain | 34 (47.22%) | 1 (4%) | 34 (36.96%) | 17 (14.91%) | |
| | Asymptomatic | 18 (25%) | 22 (88%) | 21 (22.83%) | 83 (72.81%) | |
| BP | | 130 (119.5, 140) | 140 (130, 158) | 130 (120, 140) | 130 (120, 140) | 0.55 |
| Cholesterol | | 249 (210.75, 289.5) | 268 (236, 307) | 229.5 (206.5, 250.75) | 247.5 (212, 282) | 0.11 |
| MaximumHR | | 159 (146.75, 167.25) | 146 (133, 157) | 163 (150, 175.75) | 141 (125, 156) | 0.01 |
| ExerciseInducedAngina | | | | | | <0.001 |
| | No | 64 (88.89%) | 11 (44%) | 77 (83.7%) | 52 (45.61%) | |
| | Yes | 8 (11.11%) | 14 (56%) | 15 (16.3%) | 62 (54.39%) | |
However, adding a row stratification restricts the comparisons to within each of those groups:
heart_disease %>%
univariate_table(
strata = Sex ~ HeartDisease,
associations = list(`P-value` = pval)
)
| Sex | Variable | Level | No | Yes | P-value |
|---|---|---|---|---|---|
| Female | Age | | 54 (46, 63.25) | 60 (57, 62) | 0.17 |
| | ChestPain | | | | <0.001 |
| | | Typical angina | 4 (5.56%) | 0 (0%) | |
| | | Atypical angina | 16 (22.22%) | 2 (8%) | |
| | | Non-anginal pain | 34 (47.22%) | 1 (4%) | |
| | | Asymptomatic | 18 (25%) | 22 (88%) | |
| | BP | | 130 (119.5, 140) | 140 (130, 158) | 0.37 |
| | Cholesterol | | 249 (210.75, 289.5) | 268 (236, 307) | 0.58 |
| | MaximumHR | | 159 (146.75, 167.25) | 146 (133, 157) | 0.15 |
| | ExerciseInducedAngina | | | | <0.001 |
| | | No | 64 (88.89%) | 11 (44%) | |
| | | Yes | 8 (11.11%) | 14 (56%) | |
| Male | Age | | 52 (44, 57) | 57.5 (51, 61) | 0.29 |
| | ChestPain | | | | <0.001 |
| | | Typical angina | 12 (13.04%) | 7 (6.14%) | |
| | | Atypical angina | 25 (27.17%) | 7 (6.14%) | |
| | | Non-anginal pain | 34 (36.96%) | 17 (14.91%) | |
| | | Asymptomatic | 21 (22.83%) | 83 (72.81%) | |
| | BP | | 130 (120, 140) | 130 (120, 140) | 0.71 |
| | Cholesterol | | 229.5 (206.5, 250.75) | 247.5 (212, 282) | 0.11 |
| | MaximumHR | | 163 (150, 175.75) | 141 (125, 156) | 0.26 |
| | ExerciseInducedAngina | | | | <0.001 |
| | | No | 77 (83.7%) | 52 (45.61%) | |
| | | Yes | 15 (16.3%) | 62 (54.39%) | |
In general, there must be at least one column stratification variable in order to use association metrics. See univariate_associations()
for more details on the workhorse of this functionality.
descriptives()
is the function that drives the computation behind the statistics for the columns of the input dataset. Any of its arguments can be passed from univariate_table()
to add further customization.
As noted above, one of the columns did not appear in the table by default because it is a logical()
type. By default, only factor()
and numeric()
types are placed into the result, though there are (at least) three ways to include it:
You could simply convert the column to a conformable type outside of the call:
heart_disease %>%
dplyr::mutate(
BloodSugar = factor(BloodSugar)
) %>%
univariate_table()
Variable | Level | Summary |
---|---|---|
Age | 56 (48, 61) | |
Sex | Female | 97 (32.01%) |
Male | 206 (67.99%) | |
ChestPain | Typical angina | 23 (7.59%) |
Atypical angina | 50 (16.5%) | |
Non-anginal pain | 86 (28.38%) | |
Asymptomatic | 144 (47.52%) | |
BP | 130 (120, 140) | |
Cholesterol | 241 (211, 275) | |
BloodSugar | FALSE | 258 (85.15%) |
TRUE | 45 (14.85%) | |
MaximumHR | 153 (133.5, 166) | |
ExerciseInducedAngina | No | 204 (67.33%) |
Yes | 99 (32.67%) | |
HeartDisease | No | 164 (54.13%) |
Yes | 139 (45.87%) |
The _types
arguments allow you to specify the data types that are to be interpreted by the high-level function call. Let's allow logical()
types to be treated as a categorical variable:
heart_disease %>%
univariate_table(
categorical_types = c("factor", "logical")
)
Variable | Level | Summary |
---|---|---|
Age | 56 (48, 61) | |
Sex | Female | 97 (32.01%) |
Male | 206 (67.99%) | |
ChestPain | Typical angina | 23 (7.59%) |
Atypical angina | 50 (16.5%) | |
Non-anginal pain | 86 (28.38%) | |
Asymptomatic | 144 (47.52%) | |
BP | 130 (120, 140) | |
Cholesterol | 241 (211, 275) | |
BloodSugar | FALSE | 258 (85.15%) |
TRUE | 45 (14.85%) | |
MaximumHR | 153 (133.5, 166) | |
ExerciseInducedAngina | No | 204 (67.33%) |
Yes | 99 (32.67%) | |
HeartDisease | No | 164 (54.13%) |
Yes | 139 (45.87%) |
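The effect of the _types arguments can be pictured as a class lookup on each column. The sketch below is a simplified, hypothetical classifier (classify_column() is not a package function, and the real matching rules may differ). It shows how adding "logical" to the categorical types changes how a column is treated.

```r
# Hypothetical classifier: decide how a column would be treated, based on its class
classify_column <- function(col,
                            categorical_types = "factor",
                            numeric_types = "numeric") {
  if (any(class(col) %in% categorical_types)) {
    "categorical"
  } else if (any(class(col) %in% numeric_types)) {
    "numeric"
  } else {
    "other"
  }
}

# Toy data frame with one column of each kind (illustration only)
toy <- data.frame(
  Age = c(63, 41, 57),
  Sex = factor(c("Male", "Female", "Male")),
  BloodSugar = c(TRUE, FALSE, FALSE)
)

default_types <- vapply(toy, classify_column, character(1))
with_logical <- vapply(
  toy, classify_column, character(1),
  categorical_types = c("factor", "logical")
)

print(default_types)
print(with_logical)
```

Under the default settings in this sketch the logical column falls through to "other", and it is picked up as categorical once "logical" is listed.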
The most flexible approach is to define a dedicated set of functions for the column. By default, any data type that is not interpreted as categorical or numeric is considered “other”. There is infrastructure in place to supply functions and summaries in the same manner for these columns.
heart_disease %>%
univariate_table(
f_other = list(count = function(x) table(x)),
other_summary =
c(
Summary = "count"
)
)
Variable | Level | Summary |
---|---|---|
Age | 56 (48, 61) | |
Sex | Female | 97 (32.01%) |
Male | 206 (67.99%) | |
ChestPain | Typical angina | 23 (7.59%) |
Atypical angina | 50 (16.5%) | |
Non-anginal pain | 86 (28.38%) | |
Asymptomatic | 144 (47.52%) | |
BP | 130 (120, 140) | |
Cholesterol | 241 (211, 275) | |
BloodSugar | 258 | |
45 | ||
MaximumHR | 153 (133.5, 166) | |
ExerciseInducedAngina | No | 204 (67.33%) |
Yes | 99 (32.67%) | |
HeartDisease | No | 164 (54.13%) |
Yes | 139 (45.87%) |
You would need to also define functions for the percentages, proportions, etc. to exactly match the other examples.
You can also add custom functions to make them available for numeric or categorical columns:
heart_disease %>%
univariate_table(
categorical_types = NULL,
f_numeric =
list(
cv = ~sd(.x) / mean(.x)
),
numeric_summary =
c(
`Coef. of variation` = "sd / mean = cv"
)
)
Variable | Coef. of variation |
---|---|
Age | 9.04 / 54.44 = 0.17 |
BP | 17.6 / 131.69 = 0.13 |
Cholesterol | 51.78 / 246.69 = 0.21 |
MaximumHR | 22.88 / 149.61 = 0.15 |
The names of the functions become the patterns that are searched for in the string templates.
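The idea of matching function names inside a template can be sketched in a few lines of base R. The helper fill_template() below is hypothetical and is not how the package is implemented. It evaluates each named function on the data and substitutes the rounded result wherever the name appears as a whole word.

```r
# Hypothetical helper: replace each function name in the template with its rounded result
fill_template <- function(x, template, funs, digits = 2) {
  out <- template
  for (nm in names(funs)) {
    value <- round(funs[[nm]](x), digits)
    # \\b restricts the match to whole words, so short names do not hit longer ones
    out <- gsub(paste0("\\b", nm, "\\b"), as.character(value), out, perl = TRUE)
  }
  out
}

# Built-in summaries plus one custom function, mirroring the cv example above
funs <- list(
  mean = mean,
  sd = sd,
  cv = function(x) sd(x) / mean(x)
)

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
result <- fill_template(x, "sd / mean = cv", funs)

print(result)
```

Any text in the template that is not a function name, such as the slash and the equals sign here, is carried through unchanged.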
Finally, we'll look at a few of the appearance-related arguments. These can be applied with any combination of other arguments.
As mentioned above, the default format for the table is HTML, but you could choose an alternative with the format
argument:
heart_disease %>%
univariate_table(
format = "none"
)
## # A tibble: 14 x 3
## Variable Level Summary
## <chr> <chr> <chr>
## 1 Age "" 56 (48, 61)
## 2 Sex Female 97 (32.01%)
## 3 "" Male 206 (67.99%)
## 4 ChestPain Typical angina 23 (7.59%)
## 5 "" Atypical angina 50 (16.5%)
## 6 "" Non-anginal pain 86 (28.38%)
## 7 "" Asymptomatic 144 (47.52%)
## 8 BP "" 130 (120, 140)
## 9 Cholesterol "" 241 (211, 275)
## 10 MaximumHR "" 153 (133.5, 166)
## 11 ExerciseInducedAngina No 204 (67.33%)
## 12 "" Yes 99 (32.67%)
## 13 HeartDisease No 164 (54.13%)
## 14 "" Yes 139 (45.87%)
There are also options for "latex", "pandoc", and "markdown".
You can use the labels
and levels
arguments to add clean text to any of the variable or categorical level names, and the order
argument to change the position of the variables in the result:
heart_disease %>%
univariate_table(
labels =
c(
Age = "Age (years)",
ChestPain = "Chest pain"
),
levels =
list(
Sex =
c(
Male = "M"
)
),
order =
c(
"BP",
"Age",
"Cholesterol"
)
)
Variable | Level | Summary |
---|---|---|
BP | 130 (120, 140) | |
Age (years) | 56 (48, 61) | |
Cholesterol | 241 (211, 275) | |
Chest pain | Typical angina | 23 (7.59%) |
Atypical angina | 50 (16.5%) | |
Non-anginal pain | 86 (28.38%) | |
Asymptomatic | 144 (47.52%) | |
ExerciseInducedAngina | No | 204 (67.33%) |
Yes | 99 (32.67%) | |
HeartDisease | No | 164 (54.13%) |
Yes | 139 (45.87%) | |
MaximumHR | 153 (133.5, 166) | |
Sex | Female | 97 (32.01%) |
M | 206 (67.99%) |
Notice you only need to specify values that need to be changed. Also, ordering is done with the original names even when relabeled.
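The "only specify what changes" behavior is the usual named-vector lookup pattern in R. The sketch below is a hypothetical helper, not the package's code. It replaces the names that appear in the mapping and leaves every other name untouched.

```r
# Hypothetical helper: relabel only the entries that have a replacement in `map`
relabel <- function(x, map) {
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

variables <- c("Age", "Sex", "ChestPain", "BP")
new_labels <- relabel(
  variables,
  c(Age = "Age (years)", ChestPain = "Chest pain")
)

print(new_labels)
```

Because the lookup is keyed on the original names, any reordering step can keep referring to those original names, as in the order argument above.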
The variableName
and levelName
arguments are used to change what the headers are for the column names and categorical levels, while fill_blanks
determines what goes in empty cells. Finally, the caption
argument labels the entire table:
heart_disease %>%
univariate_table(
variableName = "THESE ARE VARIABLES",
levelName = "THESE ARE LEVELS",
fill_blanks = "BLANK",
caption = "HERE IS MY CAPTION"
)
THESE ARE VARIABLES | THESE ARE LEVELS | Summary |
---|---|---|
Age | BLANK | 56 (48, 61) |
Sex | Female | 97 (32.01%) |
Male | 206 (67.99%) | |
ChestPain | Typical angina | 23 (7.59%) |
Atypical angina | 50 (16.5%) | |
Non-anginal pain | 86 (28.38%) | |
Asymptomatic | 144 (47.52%) | |
BP | BLANK | 130 (120, 140) |
Cholesterol | BLANK | 241 (211, 275) |
MaximumHR | BLANK | 153 (133.5, 166) |
ExerciseInducedAngina | No | 204 (67.33%) |
Yes | 99 (32.67%) | |
HeartDisease | No | 164 (54.13%) |
Yes | 139 (45.87%) |