The purpose of tab is to make it easier to create tables for papers, including Table 1’s showing characteristics of the sample and summary tables for fitted regression models. Currently, the following functions are included:
tabmeans
compares means in two or more groups.tabmedians
compares medians in two or more groups.tabfreq
compares frequencies in two or more groups.tabmulti
compares multiple variables in two or more groups.tabmeans.svy
, tabmedians.svy
, tabfreq.svy
, and tabmulti.svy
serve the same purposes as the above functions, but for complex survey data.tabglm
summarizes generalized linear models (GLM’s) fit via glm
or survey::svyglm
.tabgee
summarizes generalized estimating equation models (GEE’s) fit via gee::gee
.tabcoxph
summarizes Cox Proportional Hazards models fit via survival::coxph
or survey::svycoxph
.You can use tabmulti
to compare characteristics across levels of a factor variable, e.g. here comparing age, sex, and race by treatment group in the toy dataset tabdata
.
tabmulti(Age + Sex + Race ~ Group, data = tabdata) %>% kable()
Variable | Control | Treatment | P |
---|---|---|---|
Age, M (SD) | 70.5 (5.3) | 69.5 (5.9) | 0.15 |
Sex, n (%) | <0.001 | ||
Female | 93 (68.4) | 62 (38.5) | |
Male | 43 (31.6) | 99 (61.5) | |
Race, n (%) | 0.29 | ||
White | 46 (34.1) | 65 (39.6) | |
Black | 36 (26.7) | 52 (31.7) | |
Mexican American | 21 (15.6) | 19 (11.6) | |
Other | 32 (23.7) | 28 (17.1) |
To illustrate some options, we can request Age
and Race
to print as Age (years)
and Race/ethnicity
, compare medians rather than means for age, and include the sample sizes in the column headings:
tabmulti(Age + Sex + Race ~ Group, data = tabdata,
yvarlabels = list(Age = "Age (years)", Race = "Race/ethnicity"),
ymeasures = c("median", "freq", "freq"),
listwise.deletion = TRUE,
n.headings = TRUE) %>% kable()
Variable | Control (n = 134) | Treatment (n = 158) | P |
---|---|---|---|
Age (years), Median (IQR) | 70.0 (9.8) | 69.0 (11.0) | 0.19 |
Sex, n (%) | <0.001 | ||
Female | 92 (68.7) | 60 (38.0) | |
Male | 42 (31.3) | 98 (62.0) | |
Race/ethnicity, n (%) | 0.26 | ||
White | 46 (34.3) | 64 (40.5) | |
Black | 36 (26.9) | 50 (31.6) | |
Mexican American | 21 (15.7) | 17 (10.8) | |
Other | 31 (23.1) | 27 (17.1) |
Logistic regression for 1-year mortality vs. age, sex, and treatment, with the binary factor variables displayed in a “compressed” format:
fit <- glm(death_1yr ~ Age + Sex + Group, data = tabdata, family = binomial)
fit %>% tabglm(factor.compression = 5) %>% kable()
Variable | Beta (SE) | OR (95% CI) | P |
---|---|---|---|
Intercept | -2.02 (1.76) | - | 0.25 |
Age | 0.02 (0.02) | 1.02 (0.97, 1.07) | 0.50 |
Male | 0.11 (0.29) | 1.12 (0.63, 1.97) | 0.70 |
Treatment | -0.04 (0.29) | 0.96 (0.54, 1.69) | 0.88 |
GEE for high blood pressure (measured at 3 time points longitudinally) vs. various predictors, with some higher-order terms:
tabdata2 <- reshape(data = tabdata,
varying = c("bp.1", "bp.2", "bp.3", "highbp.1", "highbp.2", "highbp.3"),
timevar = "bp.visit", direction = "long")
tabdata2 <- tabdata2[order(tabdata2$id), ]
fit <- gee(highbp ~ poly(Age, 2, raw = TRUE) + Sex + Race + Race*Sex,
id = id, data = tabdata2, family = "binomial", corstr = "unstructured")
fit %>% tabgee(data = tabdata2) %>% kable()
Variable | Beta (SE) | OR (95% CI) | P |
---|---|---|---|
Intercept | -3.10 (14.84) | - | 0.83 |
Age | 0.06 (0.43) | 1.06 (0.46, 2.45) | 0.89 |
Age squared | -0.00 (0.00) | 1.00 (0.99, 1.01) | 0.88 |
Sex | |||
Female (ref) | - | - | - |
Male | 0.48 (0.29) | 1.61 (0.91, 2.84) | 0.10 |
Race | |||
White (ref) | - | - | - |
Black | 0.04 (0.32) | 1.04 (0.56, 1.95) | 0.90 |
Mexican American | 0.13 (0.38) | 1.14 (0.55, 2.39) | 0.72 |
Other | -0.83 (0.37) | 0.43 (0.21, 0.89) | 0.02 |
Sex by Race | |||
Male, Black | 0.23 (0.42) | 1.26 (0.55, 2.87) | 0.58 |
Male, Mexican American | 0.27 (0.54) | 1.31 (0.46, 3.75) | 0.61 |
Male, Other | 1.11 (0.51) | 3.05 (1.12, 8.25) | 0.03 |
Note that we had to set data = tabdata2
here, because gee
objects don’t store all of the information on factor variables (unlike glm
objects).
Survival model for mortality vs. predictors, again compressing the factor variables, and requesting slightly differnet columns (i.e. no p-values):
library("survival")
fit <- coxph(Surv(time = time, event = delta) ~ Age + Sex + Group, data = tabdata)
fit %>% tabcoxph(factor.compression = 5, columns = c("beta", "hr.ci")) %>% kable()
Variable | Beta | HR (95% CI) |
---|---|---|
Age | 0.03 | 1.03 (1.00, 1.06) |
Male | 0.01 | 1.01 (0.74, 1.39) |
Treatment | -0.05 | 0.95 (0.69, 1.30) |
The functions in tab can also accommodate complex survey data. To illustrate with the included dataset tabsvydata
(which is data from NHANES 2003-2004, except for the made-up variables time
and event
), here’s a Table 1:
library("survey")
design <- svydesign(
data = tabsvydata,
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtmec2yr,
nest = TRUE
)
tabmulti.svy(Age + Race + BMI ~ Sex, design = design) %>% kable()
Variable | Female | Male | P |
---|---|---|---|
Age, M (SD) | 37.0 (22.5) | 34.8 (21.7) | <0.001 |
Race, % (SE) | 0.08 | ||
Non-Hispanic White | 69.7 (3.7) | 69.6 (3.8) | |
Non-Hispanic Black | 13.2 (2.0) | 11.9 (1.9) | |
Mexican American | 8.6 (2.1) | 9.8 (2.2) | |
Other | 8.4 (1.0) | 8.8 (1.3) | |
BMI, M (SD) | 26.4 (7.5) | 26.0 (6.4) | 0.11 |
And here’s a linear regression:
fit <- svyglm(BMI ~ Age + Sex + Race, design = design)
fit %>% tabglm(factor.compression = 3) %>% kable()
Variable | Beta (SE) | 95% CI | P |
---|---|---|---|
Intercept | 20.95 (0.34) | (20.27, 21.62) | <0.001 |
Age | 0.14 (0.00) | (0.13, 0.15) | <0.001 |
Female (ref) | - | - | - |
Male | -0.07 (0.23) | (-0.51, 0.37) | 0.76 |
Non-Hispanic White (ref) | - | - | - |
Non-Hispanic Black | 1.91 (0.23) | (1.46, 2.35) | <0.001 |
Mexican American | 1.06 (0.30) | (0.47, 1.66) | 0.006 |
Other | -1.09 (0.33) | (-1.73, -0.45) | 0.007 |
All of the functions in tab have an argument called print.html
which can be used to export tables to word processors. Setting print.html = TRUE
will result in a HTML table being output to your current working directory. You can open the table (e.g. in Chrome) and copy/paste into your report.
I used knitr’s kable
function for the examples here, but other approaches should also work (e.g. xtable’s xtable
or pandoc’s pandoc.table
).
Lumley, Thomas. 2019. Survey: Analysis of Complex Survey Samples. https://CRAN.R-project.org/package=survey.
Lumley, Thomas, and others. 2004. “Analysis of Complex Survey Samples.” Journal of Statistical Software 9 (1): 1–19.
R by Thomas Lumley, Vincent J Carey. Ported to, and Brian Ripley. Note that maintainers are not available to give advice on using a package they did not author. 2015. Gee: Generalized Estimation Equation Solver. https://CRAN.R-project.org/package=gee.
Terry M. Therneau, and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer.
Therneau, Terry M. 2015. A Package for Survival Analysis in S. https://CRAN.R-project.org/package=survival.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.name/knitr/.
———. 2018. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.name/knitr/.