Summary Tables with tab

Functions

The purpose of tab is to make it easier to create tables for papers, including Table 1’s showing characteristics of the sample and summary tables for fitted regression models. Currently, the following functions are included:

tabmeans compares means in two or more groups.
tabmedians compares medians in two or more groups.
tabfreq compares frequencies in two or more groups.
tabmulti compares multiple variables in two or more groups.
tabmeans.svy, tabmedians.svy, tabfreq.svy, and tabmulti.svy serve the same purposes as the above functions, but for complex survey data.
tabglm summarizes generalized linear models (GLM’s) fit via glm or survey::svyglm.
tabgee summarizes generalized estimating equation models (GEE’s) fit via gee::gee.
tabcoxph summarizes Cox Proportional Hazards models fit via survival::coxph or survey::svycoxph.

Table 1’s

You can use tabmulti to compare characteristics across levels of a factor variable, e.g. here comparing age, sex, and race by treatment group in the toy dataset tabdata.

tabmulti(Age + Sex + Race ~ Group, data = tabdata) %>% kable()

Variable	Control	Treatment	P
Age, M (SD)	70.5 (5.3)	69.5 (5.9)	0.15
Sex, n (%)			<0.001
Female	93 (68.4)	62 (38.5)
Male	43 (31.6)	99 (61.5)
Race, n (%)			0.29
White	46 (34.1)	65 (39.6)
Black	36 (26.7)	52 (31.7)
Mexican American	21 (15.6)	19 (11.6)
Other	32 (23.7)	28 (17.1)

To illustrate some options, we can request Age and Race to print as Age (years) and Race/ethnicity, compare medians rather than means for age, and include the sample sizes in the column headings:

tabmulti(Age + Sex + Race ~ Group, data = tabdata, 
         yvarlabels = list(Age = "Age (years)", Race = "Race/ethnicity"), 
         ymeasures = c("median", "freq", "freq"), 
         listwise.deletion = TRUE, 
         n.headings = TRUE) %>% kable()

Variable	Control (n = 134)	Treatment (n = 158)	P
Age (years), Median (IQR)	70.0 (9.8)	69.0 (11.0)	0.19
Sex, n (%)			<0.001
Female	92 (68.7)	60 (38.0)
Male	42 (31.3)	98 (62.0)
Race/ethnicity, n (%)			0.26
White	46 (34.3)	64 (40.5)
Black	36 (26.9)	50 (31.6)
Mexican American	21 (15.7)	17 (10.8)
Other	31 (23.1)	27 (17.1)

Regression tables

GLM’s

Logistic regression for 1-year mortality vs. age, sex, and treatment, with the binary factor variables displayed in a “compressed” format:

fit <- glm(death_1yr ~ Age + Sex + Group, data = tabdata, family = binomial)
fit %>% tabglm(factor.compression = 5) %>% kable()

Variable	Beta (SE)	OR (95% CI)	P
Intercept	-2.02 (1.76)	-	0.25
Age	0.02 (0.02)	1.02 (0.97, 1.07)	0.50
Male	0.11 (0.29)	1.12 (0.63, 1.97)	0.70
Treatment	-0.04 (0.29)	0.96 (0.54, 1.69)	0.88

GEE’s

GEE for high blood pressure (measured at 3 time points longitudinally) vs. various predictors, with some higher-order terms:

tabdata2 <- reshape(data = tabdata,
                    varying = c("bp.1", "bp.2", "bp.3", "highbp.1", "highbp.2", "highbp.3"),
                    timevar = "bp.visit", direction = "long")
tabdata2 <- tabdata2[order(tabdata2$id), ]
fit <- gee(highbp ~ poly(Age, 2, raw = TRUE) + Sex + Race + Race*Sex,
           id = id, data = tabdata2, family = "binomial", corstr = "unstructured")
fit %>% tabgee(data = tabdata2) %>% kable()

Variable	Beta (SE)	OR (95% CI)	P
Intercept	-3.10 (14.84)	-	0.83
Age	0.06 (0.43)	1.06 (0.46, 2.45)	0.89
Age squared	-0.00 (0.00)	1.00 (0.99, 1.01)	0.88
Sex
Female (ref)	-	-	-
Male	0.48 (0.29)	1.61 (0.91, 2.84)	0.10
Race
White (ref)	-	-	-
Black	0.04 (0.32)	1.04 (0.56, 1.95)	0.90
Mexican American	0.13 (0.38)	1.14 (0.55, 2.39)	0.72
Other	-0.83 (0.37)	0.43 (0.21, 0.89)	0.02
Sex by Race
Male, Black	0.23 (0.42)	1.26 (0.55, 2.87)	0.58
Male, Mexican American	0.27 (0.54)	1.31 (0.46, 3.75)	0.61
Male, Other	1.11 (0.51)	3.05 (1.12, 8.25)	0.03

Note that we had to set data = tabdata2 here, because gee objects don’t store all of the information on factor variables (unlike glm objects).

Cox proportional hazards

Survival model for mortality vs. predictors, again compressing the factor variables, and requesting slightly differnet columns (i.e. no p-values):

library("survival")
fit <- coxph(Surv(time = time, event = delta) ~ Age + Sex + Group, data = tabdata)
fit %>% tabcoxph(factor.compression = 5, columns = c("beta", "hr.ci")) %>% kable()

Variable	Beta	HR (95% CI)
Age	0.03	1.03 (1.00, 1.06)
Male	0.01	1.01 (0.74, 1.39)
Treatment	-0.05	0.95 (0.69, 1.30)

Complex survey data

The functions in tab can also accommodate complex survey data. To illustrate with the included dataset tabsvydata (which is data from NHANES 2003-2004, except for the made-up variables time and event), here’s a Table 1:

library("survey")
design <- svydesign(
  data = tabsvydata,
  ids = ~sdmvpsu,
  strata = ~sdmvstra,
  weights = ~wtmec2yr,
  nest = TRUE
)
tabmulti.svy(Age + Race + BMI ~ Sex, design = design) %>% kable()

Variable	Female	Male	P
Age, M (SD)	37.0 (22.5)	34.8 (21.7)	<0.001
Race, % (SE)			0.08
Non-Hispanic White	69.7 (3.7)	69.6 (3.8)
Non-Hispanic Black	13.2 (2.0)	11.9 (1.9)
Mexican American	8.6 (2.1)	9.8 (2.2)
Other	8.4 (1.0)	8.8 (1.3)
BMI, M (SD)	26.4 (7.5)	26.0 (6.4)	0.11

And here’s a linear regression:

fit <- svyglm(BMI ~ Age + Sex + Race, design = design)
fit %>% tabglm(factor.compression = 3) %>% kable()

Variable	Beta (SE)	95% CI	P
Intercept	20.95 (0.34)	(20.27, 21.62)	<0.001
Age	0.14 (0.00)	(0.13, 0.15)	<0.001
Female (ref)	-	-	-
Male	-0.07 (0.23)	(-0.51, 0.37)	0.76
Non-Hispanic White (ref)	-	-	-
Non-Hispanic Black	1.91 (0.23)	(1.46, 2.35)	<0.001
Mexican American	1.06 (0.30)	(0.47, 1.66)	0.006
Other	-1.09 (0.33)	(-1.73, -0.45)	0.007

Exporting tables, e.g. to Word

All of the functions in tab have an argument called print.html which can be used to export tables to word processors. Setting print.html = TRUE will result in a HTML table being output to your current working directory. You can open the table (e.g. in Chrome) and copy/paste into your report.

Options for printing in R

I used knitr’s kable function for the examples here, but other approaches should also work (e.g. xtable’s xtable or pandoc’s pandoc.table).

References

Lumley, Thomas. 2019. Survey: Analysis of Complex Survey Samples. https://CRAN.R-project.org/package=survey.

Lumley, Thomas, and others. 2004. “Analysis of Complex Survey Samples.” Journal of Statistical Software 9 (1): 1–19.

R by Thomas Lumley, Vincent J Carey. Ported to, and Brian Ripley. Note that maintainers are not available to give advice on using a package they did not author. 2015. Gee: Generalized Estimation Equation Solver. https://CRAN.R-project.org/package=gee.

Terry M. Therneau, and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer.

Therneau, Terry M. 2015. A Package for Survival Analysis in S. https://CRAN.R-project.org/package=survival.

Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.

———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.name/knitr/.

———. 2018. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.name/knitr/.

Summary Tables with ‘tab’

Dane Van Domelen
vandomed@gmail.com

2019-06-17

Functions

Table 1’s

Regression tables

GLM’s

GEE’s

Cox proportional hazards

Complex survey data

Exporting tables, e.g. to Word

Options for printing in R

References

Summary Tables with ‘tab’

Dane Van Domelen vandomed@gmail.com

2019-06-17

Functions

Table 1’s

Regression tables

GLM’s

GEE’s

Cox proportional hazards

Complex survey data

Exporting tables, e.g. to Word

Options for printing in R

References

Dane Van Domelen
vandomed@gmail.com