tidytable
Why tidytable?
- tidyverse-like syntax with data.table speed
- rlang compatibility (see the rlang compatibility section below)
- Includes functions that dtplyr is missing, including many tidyr functions

Note: tidytable functions do not use data.table's modify-by-reference semantics. Instead, they follow the copy-on-modify principles of the tidyverse and base R.
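The note above is easiest to see side by side. The sketch below assumes a tidytable version that still exports the verb.() functions such as mutate.(). The tidytable verb returns a new table and leaves its input alone. data.table's := operator adds the column to the original table in place.

```r
library(tidytable)
library(data.table)

df <- data.table(x = c(1, 2, 3))

# tidytable verb: returns a new table and leaves df itself untouched
df_new <- df %>%
  mutate.(double_x = x * 2)

cols_after_mutate <- names(df)  # "x"

# data.table's := adds the column to df itself (modify-by-reference)
df[, double_x := x * 2]

cols_after_assign <- names(df)  # "x" "double_x"
```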
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub (for example with devtools::install_github()).
tidytable includes the following functions:
- dt(): Pipeable data.table syntax (see the dt() helper section below)
- get_dummies.()
- %notin%
- arrange.()
- filter.()
- mutate.() & mutate_across.()
- select.()
- summarize.() & summarize_across.()
- bind_cols.() & bind_rows.()
- case.(): Similar to dplyr::case_when(). See ?case. for syntax
- count.()
- distinct.()
- ifelse.()
- left_join.(), inner_join.(), right_join.(), full_join.(), & anti_join.()
- lags.() & leads.()
- pull.()
- relocate.()
- rename.() & rename_with.()
- row_number.()
- slice.(), plus slice_head.(), slice_tail.(), slice_max.(), & slice_min.()
- transmute.()
- drop_na.()
- complete.()
- crossing.()
- expand.()
- expand_grid.()
- fill.()
- group_split.()
- nest_by.() & unnest.()
- pivot_longer.() & pivot_wider.()
- replace_na.()
- separate.()
- separate_rows.()
- uncount.()
- map.(), map2.(), map_*.() variants, & map2_*.() variants

tidytable uses verb.() syntax to replicate tidyverse functions:
library(tidytable)
test_df <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a","a","b"))
test_df %>%
  select.(x, y, z) %>%
  filter.(x < 4, y > 1) %>%
  arrange.(x, y) %>%
  mutate.(double_x = x * 2,
          double_y = y * 2)
#> x y z double_x double_y
#> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1: 1 4 a 2 8
#> 2: 2 5 a 4 10
#> 3: 3 6 b 6 12
Group-by operations are done from inside any function that has group-by functionality (such as summarize.() & mutate.()) through the .by argument:
- A single column: .by = z
- Multiple columns: .by = c(y, z)
tidyselect can also be used, including predicates:
- A single predicate: .by = where(is.character)
- Multiple predicates: .by = c(where(is.character), where(is.factor))
- Predicates mixed with column names: .by = c(where(is.character), y)
test_df %>%
  summarize.(avg_x = mean(x),
             count = .N,
             .by = z)
#> z avg_x count
#> <chr> <dbl> <int>
#> 1: a 1.5 2
#> 2: b 3.0 1
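The predicate forms listed above, and .by inside mutate.(), work the same way. The sketch below assumes a tidytable version that still exports summarize.() and mutate.(). It groups by every character column with where(is.character). It then shows that mutate.() with .by keeps every row instead of collapsing each group.

```r
library(tidytable)
library(data.table)

test_df <- data.table(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "a", "b"))

# Predicate form: group by every character column (here only z)
by_predicate <- test_df %>%
  summarize.(avg_x = mean(x), .by = where(is.character))

# .by inside mutate.(): every row is kept and gets its group's mean
with_group_mean <- test_df %>%
  mutate.(avg_x = mean(x), .by = z)
```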
Note: For those new to data.table, the .N helper returns the number of rows in each group, much like n() from dplyr. tidytable also contains a helper function n.(), but using .N is recommended because it performs better.
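The sketch below assumes n.() is used inside summarize.(), as the note describes. It shows that .N and n.() report the same per-group row counts. The column names count_N and count_n are only illustrative.

```r
library(tidytable)
library(data.table)

test_df <- data.table(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "a", "b"))

# Both columns hold the number of rows in each z group
counts <- test_df %>%
  summarize.(count_N = .N,
             count_n = n.(),
             .by = z)
```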
tidyselect support
tidytable allows you to select/drop columns just like you would in the tidyverse.
Normal selection can be mixed with:
- Predicates: where(is.numeric), where(is.character), etc.
- Select helpers: everything(), starts_with(), ends_with(), contains(), any_of(), etc.

test_df <- data.table(a = c(1,2,3),
                      b = c(4,5,6),
                      c = c("a","a","b"),
                      d = c("a","b","c"))

test_df %>%
  select.(where(is.numeric), d)
#> a b d
#> <dbl> <dbl> <chr>
#> 1: 1 4 a
#> 2: 2 5 b
#> 3: 3 6 c
You can also use this format to drop columns by prefixing a selection with -.
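The sketch below assumes a tidytable version that still exports select.(). A leading - turns a predicate or a set of column names into a drop, following standard tidyselect rules.

```r
library(tidytable)
library(data.table)

test_df <- data.table(a = c(1, 2, 3),
                      b = c(4, 5, 6),
                      c = c("a", "a", "b"),
                      d = c("a", "b", "c"))

# Drop every numeric column: keeps c and d
no_numeric <- test_df %>%
  select.(-where(is.numeric))

# Drop columns by name: keeps b and c
no_a_or_d <- test_df %>%
  select.(-c(a, d))
```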
These same ideas can be used whenever selecting columns in tidytable functions, for example when using count.(), drop_na.(), mutate_across.(), pivot_longer.(), etc.
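Here is a concrete mutate_across.() case. It assumes a tidytable version that still exports mutate_across.(), with the columns as the first argument after the data and the function as the second. as.character() is an arbitrary example transformation. The predicate picks which columns are transformed.

```r
library(tidytable)
library(data.table)

test_df <- data.table(a = c(1, 2, 3),
                      b = c(4, 5, 6),
                      c = c("a", "a", "b"),
                      d = c("a", "b", "c"))

# where(is.numeric) selects a and b; as.character() is applied to each of them,
# while c and d are returned unchanged
all_chr <- test_df %>%
  mutate_across.(where(is.numeric), as.character)
```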
rlang compatibility
rlang can be used to write custom functions with tidytable functions.
Custom function with mutate.():
library(rlang)  # provides enquo()
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

# Using enquo() with !!
add_one <- function(data, add_col) {
  add_col <- enquo(add_col)
  data %>%
    mutate.(new_col = !!add_col + 1)
}

# Using the {{ }} shortcut
add_one <- function(data, add_col) {
  data %>%
    mutate.(new_col = {{ add_col }} + 1)
}
df %>%
add_one(x)
#> x y z new_col
#> <dbl> <dbl> <chr> <dbl>
#> 1: 1 1 a 2
#> 2: 1 1 a 2
#> 3: 1 1 b 2
Custom function with summarize.():
df <- data.table(x = 1:10, y = c(rep("a", 6), rep("b", 4)), z = c(rep("a", 6), rep("b", 4)))

find_mean <- function(data, grouping_cols, col) {
  data %>%
    summarize.(avg = mean({{ col }}),
               .by = {{ grouping_cols }})
}
df %>%
find_mean(grouping_cols = c(y, z), col = x)
#> y z avg
#> <chr> <chr> <dbl>
#> 1: a a 3.5
#> 2: b b 8.5
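The same {{ }} pattern also forwards tidyselect expressions, such as predicates, into select.(). This is a minimal sketch assuming select.() evaluates its arguments with tidyselect. keep_cols() is a hypothetical helper for illustration, not part of tidytable.

```r
library(tidytable)
library(data.table)

# Hypothetical helper (not part of tidytable): forwards any tidyselect
# expression to select.() through the {{ }} operator
keep_cols <- function(data, cols) {
  data %>%
    select.({{ cols }})
}

df <- data.table(x = 1:3, y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))

numeric_only <- df %>% keep_cols(where(is.numeric))  # keeps x
by_name <- df %>% keep_cols(c(x, z))                 # keeps x and z
```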
All tidytable functions automatically convert data.frame and tibble inputs to a data.table:
library(dplyr)
library(data.table)
test_df <- tibble(x = c(1,2,3), y = c(4,5,6), z = c("a","a","b"))
test_df %>%
  mutate.(double_x = x * 2) %>%
  is.data.table()
#> [1] TRUE
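The same conversion applies to a plain base R data.frame. The sketch below assumes a tidytable version that still exports filter.(). It only checks the class and contents of the returned object.

```r
library(tidytable)
library(data.table)

# A plain base R data.frame as input
base_df <- data.frame(x = c(1, 2, 3), z = c("a", "a", "b"))

# filter.() converts its input, so the result is a data.table
result <- base_df %>%
  filter.(x > 1)

is.data.table(result)  # TRUE
```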
dt() helper
The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:
df <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a", "a", "b"))
df %>%
  dt(, list(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, `:=`(double_x = x * 2,
            double_y = y * 2)) %>%
  dt(, list(avg_x = mean(x)), by = z)
#> z avg_x
#> <chr> <dbl>
#> 1: a 1.5
#> 2: b 3.0
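The dt() helper can also sit in the middle of a pipeline that starts with a tidytable verb. This is a minimal sketch assuming dt() forwards its arguments to data.table's [ as in the example above. It filters with filter.() and then aggregates with data.table's j/by syntax.

```r
library(tidytable)
library(data.table)

df <- data.table(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "a", "b"))

# A tidytable verb followed by a data.table-style grouped aggregation
totals <- df %>%
  filter.(x < 3) %>%
  dt(, list(total_y = sum(y)), by = z)
```

Only the rows with x of 1 and 2 survive the filter. Both belong to group "a", so the result is a single row with total_y equal to 9.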
For those interested in performance, speed comparisons can be found here.