Clinical Research Utilities Functions (CRUF): useful functions for clinical research data analysis.
The released version is available on CRAN.
The development version is available on GitHub.
TABKRIS2 aims to provide a ready-to-use descriptive table that is easy to customize. The main principle is that, for a given dataframe, it computes descriptive statistics for each variable in the data. Only the dataframe is required; every other argument is optional.
The user can then add options to customize the results: changing the descriptive statistics method, adding a stratifying variable, performing tests, or changing the default methods and tests.
The result is a dataframe ready to export into a Markdown or LaTeX document.
It detects the type of each variable from the input dataframe. Variable types are:
The default presentation for quantitative variables is “mean (SD)” and for qualitative variables is “n (percent %)”. Binomial variables are displayed on one line; unordered and ordered categorical variables are displayed with one line per level.
Variable | Modality | N = 32 | Statistics |
mpg | 19.2 [15.43;22.8] | ||
cyl | |||
4 | 11 (34.38%) | ||
6 | 7 (21.88%) | ||
8 | 14 (43.75%) | ||
disp | 196.3 [120.83;326] | ||
hp | 123 [96.5;180] | ||
drat | 3.7 [3.08;3.92] | ||
wt | 3.33 [2.58;3.61] | ||
qsec | 17.71 [16.89;18.9] | ||
vs | 1 | 14 (43.75%) | |
am | 1 | 13 (40.62%) | |
gear | |||
3 | 15 (46.88%) | ||
4 | 12 (37.5%) | ||
5 | 5 (15.62%) | ||
carb | |||
1 | 7 (21.88%) | ||
2 | 10 (31.25%) | ||
3 | 3 (9.38%) | ||
4 | 10 (31.25%) | ||
6 | 1 (3.12%) | ||
8 | 1 (3.12%) |
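As a rough illustration of what a cell of such a table contains, the base R sketch below formats one quantitative and one qualitative variable from mtcars by hand. It does not call CRUF: the helpers `fmt_mean_sd`, `fmt_median_iqr` and `fmt_n_pct` are hypothetical names made up for this example, and rounding to two digits is an assumption chosen to resemble the table above.

```r
# Hypothetical helpers (not part of CRUF) that mimic the cell formats.

# "mean (SD)" for a quantitative variable
fmt_mean_sd <- function(x, digits = 2) {
  paste0(round(mean(x, na.rm = TRUE), digits),
         " (", round(sd(x, na.rm = TRUE), digits), ")")
}

# "median [Q1;Q3]" for a quantitative variable
fmt_median_iqr <- function(x, digits = 2) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  paste0(round(median(x, na.rm = TRUE), digits),
         " [", round(q[1], digits), ";", round(q[2], digits), "]")
}

# "n (percent %)" for each level of a qualitative variable
fmt_n_pct <- function(x, digits = 2) {
  n <- table(x)
  cells <- paste0(n, " (", round(100 * n / sum(n), digits), "%)")
  setNames(cells, names(n))
}

mpg_mean_sd    <- fmt_mean_sd(mtcars$mpg)
mpg_median_iqr <- fmt_median_iqr(mtcars$mpg)
cyl_cells      <- fmt_n_pct(mtcars$cyl)
```

Printing `cyl_cells` gives one entry per level of `cyl`, which corresponds to the one-line-per-level layout used for categorical variables.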
With auto_detect = TRUE, the function tests whether each numeric variable can be coerced to a factor. It counts the potential levels of each variable and coerces the variable to a factor if the number of levels is moderate (i.e. < 10). For a variable with two levels the method used is “bino”; otherwise it is “cate”. The cut-off on the number of levels can be set with the argument lev_co (for level cut-off); the default is 10.
# In mtcars, "cyl", "vs", "am", "gear" and "carb" are encoded as numeric, but they are really factors.
# tabkris_2 converts each of these variables and displays a message for each transformation.
desctable <- tabkris_2(mtcars, auto_detect = TRUE, lev_co = 8)
Variable | Modality | N = 32 | Statistics |
mpg | 19.2 [15.43;22.8] | ||
cyl | |||
4 | 11 (34.38%) | ||
6 | 7 (21.88%) | ||
8 | 14 (43.75%) | ||
disp | 196.3 [120.83;326] | ||
hp | 123 [96.5;180] | ||
drat | 3.7 [3.08;3.92] | ||
wt | 3.33 [2.58;3.61] | ||
qsec | 17.71 [16.89;18.9] | ||
vs | 1 | 14 (43.75%) | |
am | 1 | 13 (40.62%) | |
gear | |||
3 | 15 (46.88%) | ||
4 | 12 (37.5%) | ||
5 | 5 (15.62%) | ||
carb | |||
1 | 7 (21.88%) | ||
2 | 10 (31.25%) | ||
3 | 3 (9.38%) | ||
4 | 10 (31.25%) | ||
6 | 1 (3.12%) | ||
8 | 1 (3.12%) |
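The detection rule described above can be sketched in base R as follows. This is only an approximation of the idea, not the code used by tabkris_2: the function `guess_method` is a hypothetical name, and the way factors and ordered factors are handled here is an assumption.

```r
# Hypothetical sketch (not CRUF code): guess a method for each column.
# A column with fewer than `lev_co` distinct values is treated as a factor.
guess_method <- function(data, lev_co = 10) {
  vapply(data, function(x) {
    n_lev <- length(unique(x[!is.na(x)]))
    if (is.ordered(x)) {
      "ordo"                                  # ordered factor
    } else if (is.factor(x) || n_lev < lev_co) {
      if (n_lev == 2) "bino" else "cate"      # two levels: binomial
    } else {
      "cont"                                  # too many levels: continuous
    }
  }, character(1))
}

methods_default <- guess_method(mtcars)              # cut-off of 10
methods_strict  <- guess_method(mtcars, lev_co = 6)  # "carb" has 6 distinct values
```

With the cut-off of 10, "cyl", "gear" and "carb" come out as "cate" and "vs" and "am" as "bino"; lowering the cut-off to 6 leaves "carb" as "cont" in this sketch.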
With return_table = FALSE, the function does not return a table but a list containing all the parameters used to compute the table. The user can then modify only the parameters they want, without having to restate the unchanged parameters for every variable. To compute the table, pass the list to the function once more with return_table = TRUE.
It is possible to create a desc_prep object with every default parameter, change a parameter, compute a table, and then re-use the same desc_prep to change another parameter for another table.
desc_prep <- tabkris_2(mtcars, return_table = FALSE, auto_detect = TRUE)
# Change the method for variable "vs" from binomial to categorical
desc_prep$method["vs"] <- "cate"
desctable <- tabkris_2(desc_prep)
# Set the variable of interest to "am"; the previous change is kept
desc_prep$varint <- "am"
desctable_2 <- tabkris_2(desc_prep)
Several options change how the results are rendered: renaming the variables, changing the default presentation for qualitative and quantitative variables, displaying the number of NA, changing the default number of digits, and changing the language of the first row of the result table.
The default method used for descriptive statistics is chosen according to the detected variable type. This behavior can be changed in two ways:
default_method changes the method for every variable of a given type at once, while method fine-tunes individual variables.
default_method[x] | cont | bino | cate | ordo |
x = 1 (cont) | X | |||
x = 2 (bino) | X | X | X | |
x = 3 (cate) | X | X | ||
x = 4 (ordo) | X | X |
desc_prep <- tabkris_2(mtcars, return_table = FALSE)
# Change the method for all binomial variables to categorical
desc_prep$default_method[2] <- "cate"
desctable <- tabkris_2(desc_prep)
# Changing only the method for "vs" to categorical
desc_prep$method["vs"] <- "cate"
desctable_2 <- tabkris_2(desc_prep)
Custom labels are supplied through the names argument, as a vector with one label per variable.
With explicit_na, the user can choose whether to display the number of NA for each variable. NA are not counted in the percentages. Apply “addNA(x)” to a factor variable to count NA as a level in the descriptive statistics.
# Changing the names
lab <- c("Miles/US gallon", "Number of cylinders", "Displacement",
         "Horsepower", "Rear axle ratio", "Weight", "1/4 mile time",
         "Engine", "Transmission", "N Forward gears", "N carburetors")
desctable <- tabkris_2(mtcars, names = lab,
                       pres_quant = c("mean", "range"),
                       pres_quali = c("n", "total", "per"),
                       explicit_na = TRUE,
                       digits = 1,
                       lang = "fr",
                       auto_detect = TRUE)
Variable | Modalité | N = 32 | Statistiques |
Miles/US gallon | 20.1 (6) {10.4;33.9} | ||
NA | 0 | ||
Number of cylinders | |||
4 | 11/32 (34.4%) | ||
6 | 7/32 (21.9%) | ||
8 | 14/32 (43.8%) | ||
NA | 0 | ||
Displacement | 230.7 (123.9) {71.1;472} | ||
NA | 0 | ||
Horsepower | 146.7 (68.6) {52;335} | ||
NA | 0 | ||
Rear axle ratio | 3.6 (0.5) {2.8;4.9} | ||
NA | 0 | ||
Weight | 3.2 (1) {1.5;5.4} | ||
NA | 0 | ||
1/4 mile time | 17.8 (1.8) {14.5;22.9} | ||
NA | 0 | ||
Engine | 1 | 14/32 (43.8%) | |
NA | 0 | ||
Transmission | 1 | 13/32 (40.6%) | |
NA | 0 | ||
N Forward gears | |||
3 | 15/32 (46.9%) | ||
4 | 12/32 (37.5%) | ||
5 | 5/32 (15.6%) | ||
NA | 0 | ||
N carburetors | |||
1 | 7/32 (21.9%) | ||
2 | 10/32 (31.2%) | ||
3 | 3/32 (9.4%) | ||
4 | 10/32 (31.2%) | ||
6 | 1/32 (3.1%) | ||
8 | 1/32 (3.1%) | ||
NA | 0 |
With the varint argument, the user can specify a variable in the data on which to stratify the results. The variable of interest is removed from the descriptive table. varint must be a factor with at least two levels.
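To make the idea of stratification concrete, here is a minimal base R sketch that describes "mpg" within each level of "am". It does not use tabkris_2; the column label format and the two-decimal "mean (SD)" presentation are assumptions made for this example.

```r
# Stratifying variable: must be a factor with at least two levels.
strata <- factor(mtcars$am)
stopifnot(nlevels(strata) >= 2)

# Descriptive statistics of mpg within each level of am.
group_means <- tapply(mtcars$mpg, strata, mean)
group_sds   <- tapply(mtcars$mpg, strata, sd)
group_n     <- as.vector(table(strata))

# One "mean (SD)" cell per stratum, labelled with the stratum size.
cells <- sprintf("%.2f (%.2f)", as.numeric(group_means), as.numeric(group_sds))
names(cells) <- paste0("am = ", levels(strata), " (N = ", group_n, ")")
```

Each element of `cells` plays the role of one column of a stratified table for the row "mpg".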
If a variable of interest is specified, statistical tests of a difference between the levels of “varint” can be computed. The test used depends on the type of “varint” and on the type of the other variable. Only the p-value is displayed; tests are two-sided with a type I error of 0.05.
The behavior of the tests can be changed in two ways:
default_test changes the test for every variable of a given type at once, while test fine-tunes individual variables.
Implemented tests include t.test (with “stud”), wilcox.test (with “wilcox”), kruskal.test (with “kruskal”), chisq.test (with “chisq”), fisher.test (with “fish”). See the tables below for which tests are implemented and when each can be used.
Note: If “varint” has more than 2 levels, “default_test” is automatically set to "kruskal" for continuous and ordered variables.
test | cont | bino | cate | ordo |
stud | X | |||
wilcox | X | |||
kruskal | ||||
chisq | X | X | X | |
fisher | X | X | X |
test | cont | bino | cate | ordo |
stud | ||||
wilcox | ||||
kruskal | X | |||
chisq | X | X | ||
fisher | X |
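For reference, the underlying base R tests named above can be called directly as in the sketch below. The arguments are assumptions: for example, t.test() performs a Welch test by default, and whether tabkris_2 uses these exact settings is not stated here.

```r
# Recode the grouping variables as factors (illustration on mtcars).
dat <- transform(mtcars,
                 am   = factor(am),
                 vs   = factor(vs),
                 gear = factor(gear))

# Continuous variable, two groups
p_stud   <- t.test(mpg ~ am, data = dat)$p.value                    # Welch by default
p_wilcox <- wilcox.test(mpg ~ am, data = dat, exact = FALSE)$p.value

# Continuous variable, more than two groups
p_kruskal <- kruskal.test(mpg ~ gear, data = dat)$p.value

# Qualitative variable against the grouping variable
tab      <- table(dat$vs, dat$am)
p_chisq  <- chisq.test(tab)$p.value
p_fisher <- fisher.test(tab)$p.value

p_values <- c(stud = p_stud, wilcox = p_wilcox, kruskal = p_kruskal,
              chisq = p_chisq, fisher = p_fisher)
```

Each entry of `p_values` is a two-sided p-value, to be compared with the 0.05 threshold mentioned above.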