santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().
Here are some advantages of santoku:
By default, chop() always covers the whole range of the data, so you won’t get unexpected NA values.
chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.
Flexible labelling, including easy ways to label intervals by numerals or letters.
Convenience functions for creating quantile intervals, evenly-spaced intervals or equal-sized groups.
Convenience functions for quickly tabulating chopped data.
These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
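To make the first advantage concrete, here is a minimal sketch comparing base::cut() and chop() on the same data and breaks. The variable names (`x`, `breaks`, `cut_result`, `chop_result`) are chosen for illustration only.

```r
library(santoku)

x <- 1:10
breaks <- c(3, 5, 7)

# base::cut() only covers (3, 5] and (5, 7] here;
# values outside those intervals become NA
cut_result <- cut(x, breaks)
sum(is.na(cut_result))
#> [1] 6

# chop() extends the breaks to cover the whole range of x
chop_result <- chop(x, breaks)
sum(is.na(chop_result))
#> [1] 0
```

With base::cut() you would have to add outer breaks by hand (for example `-Inf` and `Inf`) to avoid the NA values.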
library(santoku)
# chop returns a factor:
chop(1:10, c(3, 5, 7))
#> [1] [1, 3) [1, 3) [3, 5) [3, 5) [5, 7) [5, 7) [7, 10] [7, 10] [7, 10]
#> [10] [7, 10]
#> Levels: [1, 3) [3, 5) [5, 7) [7, 10]
# exactly() gives 5 its own category;
# lbl_discrete() produces labels suited to integer data:
chop(1:10, c(3, exactly(5), 7), labels = lbl_discrete())
#> [1] 1 - 2 1 - 2 3 - 4 3 - 4 5 6 7 - 10 7 - 10 7 - 10 7 - 10
#> Levels: 1 - 2 3 - 4 5 6 7 - 10
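The repeated-break form mentioned earlier, chop(x, breaks = c(1, 2, 2, 3)), can be checked in the same way. The sketch below uses made-up data and inspects the factor's integer codes rather than assuming any particular label text, since the exact labels may differ between santoku versions.

```r
library(santoku)

# Illustrative data: two values exactly equal to 2, the rest on either side
x <- c(1, 1.5, 2, 2, 2.5)
res <- chop(x, breaks = c(1, 2, 2, 3))

# Integer codes show which factor level each value landed in
codes <- as.integer(res)

codes[3] == codes[4]  # both 2s share one level
codes[3] != codes[2]  # 1.5 falls in a different level
codes[3] != codes[5]  # and so does 2.5

# Count the values in each level
table(res)
```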
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
# chop dates into one-month-wide intervals, then tabulate:
tab_width(Sys.Date() + 1:90, months(1), labels = lbl_discrete(fmt = "%d %b"))
#> x
#> 14 Jun - 13 Jul 14 Jul - 13 Aug 14 Aug - 13 Sep
#> 30 31 29
For more information, see the vignette.