Operators
John Mount, Nina Zumel
2020-02-01
cdata recommends an operator idiom to apply data transforms.
The idea is simple, yet powerful.
First let’s start with some data.
d <- wrapr::build_frame(
  "model_id", "measure", "value" |
  1         , "AUC"    , 0.7     |
  1         , "R2"     , 0.4     |
  2         , "AUC"    , 0.8     |
  2         , "R2"     , 0.5     )
knitr::kable(d)
| model_id | measure | value |
|---------:|:--------|------:|
|        1 | AUC     |   0.7 |
|        1 | R2      |   0.4 |
|        2 | AUC     |   0.8 |
|        2 | R2      |   0.5 |
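For readers new to wrapr::build_frame(): it assembles a data.frame row by row, with "|" separating the rows. The sketch below writes the same long table in base R, assuming build_frame() returns an ordinary data.frame with character (not factor) string columns. The names d_base and same_columns are hypothetical and serve only the comparison.

```r
# The long ("blocks") data as written with wrapr::build_frame().
d <- wrapr::build_frame(
  "model_id", "measure", "value" |
  1         , "AUC"    , 0.7     |
  1         , "R2"     , 0.4     |
  2         , "AUC"    , 0.8     |
  2         , "R2"     , 0.5     )

# Hypothetical base-R equivalent, built column by column.
d_base <- data.frame(
  model_id = c(1, 1, 2, 2),
  measure  = c("AUC", "R2", "AUC", "R2"),
  value    = c(0.7, 0.4, 0.8, 0.5),
  stringsAsFactors = FALSE)

# Compare the two constructions one column at a time.
same_columns <- c(
  model_id = isTRUE(all.equal(d$model_id, d_base$model_id)),
  measure  = identical(d$measure, d_base$measure),
  value    = isTRUE(all.equal(d$value, d_base$value)))
print(same_columns)
```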
In the above data we have two measurements (AUC and R2) for each of two models, identified by the “model_id” column. Using cdata’s rowrecs_to_blocks_spec() function, we can capture a description of this record structure together with the transformation details.
library("cdata")
transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")
print(transform)
#> {
#> row_record <- wrapr::qchar_frame(
#> "model_id" , "AUC", "R2" |
#> . , AUC , R2 )
#> row_keys <- c('model_id')
#>
#> # becomes
#>
#> block_record <- wrapr::qchar_frame(
#> "model_id" , "measure", "value" |
#> . , "AUC" , AUC |
#> . , "R2" , R2 )
#> block_keys <- c('model_id', 'measure')
#>
#> # args: c(checkNames = TRUE, checkKeys = FALSE, strict = FALSE, allow_rqdatatable = TRUE)
#> }
Once we have this specification we can transform the data using operator notation.
We can collect the record blocks into rows by a “factor-out” (or aggregation/projection) step.
| model_id | measure | value |
|---------:|:--------|------:|
|        1 | AUC     |   0.7 |
|        1 | R2      |   0.4 |
|        2 | AUC     |   0.8 |
|        2 | R2      |   0.5 |
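Here is a minimal sketch of the factor-out step. It assumes, as the operator names suggest, that d %//% transform collects each model's block of rows into a single row record. The as.data.frame() conversion and the sort by model_id are illustrative additions that make the row order predictable. They are not part of cdata's idiom.

```r
library("cdata")

# Long ("block") records: one row per (model_id, measure) pair.
d <- wrapr::build_frame(
  "model_id", "measure", "value" |
  1         , "AUC"    , 0.7     |
  1         , "R2"     , 0.4     |
  2         , "AUC"    , 0.8     |
  2         , "R2"     , 0.5     )

# The same record specification as above.
transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

# Factor-out (assumed semantics): each model's two-row block becomes
# one row with a column per measure (AUC, R2).
d2 <- as.data.frame(d %//% transform)
# Sort by the record key so the row order is predictable.
d2 <- d2[order(d2$model_id), , drop = FALSE]
print(d2)
```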
We can expand record rows into blocks by a “multiplication” (or join) step.
| model_id | measure | value |
|---------:|:--------|------:|
|        1 | AUC     |   0.7 |
|        2 | AUC     |   0.8 |
|        1 | R2      |   0.4 |
|        2 | R2      |   0.5 |
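A comparable sketch of the multiplication step follows. It assumes d2 %**% transform expands each row record into a block with one row per row of the control table. The row-record table d2 is typed in directly here; its values follow from d. The result is sorted by model_id and measure only so it is easy to check.

```r
library("cdata")

# Row records: one row per model, one column per measure.
d2 <- wrapr::build_frame(
  "model_id", "AUC", "R2" |
  1         , 0.7  , 0.4  |
  2         , 0.8  , 0.5  )

transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

# Multiplication (assumed semantics): each row is expanded into a
# block with one row per control-table row ("AUC" and "R2").
d3 <- as.data.frame(d2 %**% transform)
# Sort by key columns so the row order is predictable.
d3 <- d3[order(d3$model_id, d3$measure), , drop = FALSE]
print(d3)
```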
(%//% and %**% are two operators introduced by the cdata package.)
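Our reading is that the operators are shorthand for cdata's conversion functions: %//% for blocks_to_rowrecs() and %**% for rowrecs_to_blocks(). The sketch below tests that assumption by computing both forms and comparing them. The names control_table, wide_op, wide_fn, long_op, long_fn and the canon() helper, which puts rows and columns in a fixed order, are hypothetical names introduced here.

```r
library("cdata")

d <- wrapr::build_frame(
  "model_id", "measure", "value" |
  1         , "AUC"    , 0.7     |
  1         , "R2"     , 0.4     |
  2         , "AUC"    , 0.8     |
  2         , "R2"     , 0.5     )

control_table <- wrapr::qchar_frame(
  "measure", "value" |
  "AUC"    , AUC     |
  "R2"     , R2      )
transform <- rowrecs_to_blocks_spec(control_table, recordKeys = "model_id")

# Factor-out: operator form versus a direct blocks_to_rowrecs() call.
wide_op <- as.data.frame(d %//% transform)
wide_fn <- as.data.frame(blocks_to_rowrecs(
  d, keyColumns = "model_id", controlTable = control_table))

# Multiplication: operator form versus a direct rowrecs_to_blocks() call.
long_op <- as.data.frame(wide_fn %**% transform)
long_fn <- as.data.frame(rowrecs_to_blocks(
  wide_fn, controlTable = control_table, columnsToCopy = "model_id"))

# Hypothetical helper: fix column order and row order for comparison.
canon <- function(x) {
  x <- x[, sort(colnames(x)), drop = FALSE]
  x <- x[do.call(order, unname(as.list(x))), , drop = FALSE]
  rownames(x) <- NULL
  x
}
print(isTRUE(all.equal(canon(wide_op), canon(wide_fn))))
print(isTRUE(all.equal(canon(long_op), canon(long_fn))))
```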
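And the two specialized operators have an inverse/adjoint relation: factoring out and then multiplying back by the same specification recovers the original data, up to row order.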
| model_id | measure | value |
|---------:|:--------|------:|
|        1 | AUC     |   0.7 |
|        1 | R2      |   0.4 |
|        2 | AUC     |   0.8 |
|        2 | R2      |   0.5 |
| model_id | measure | value |
|---------:|:--------|------:|
|        1 | AUC     |   0.7 |
|        2 | AUC     |   0.8 |
|        1 | R2      |   0.4 |
|        2 | R2      |   0.5 |
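The inverse relation can be checked directly. This sketch relies on R's standard left-to-right grouping of custom %op% operators, so d %//% transform %**% transform factors out first and multiplies second. Because the row order can change (compare the two tables above), the hypothetical helper sort_blocks() puts both tables in key order before they are compared.

```r
library("cdata")

d <- wrapr::build_frame(
  "model_id", "measure", "value" |
  1         , "AUC"    , 0.7     |
  1         , "R2"     , 0.4     |
  2         , "AUC"    , 0.8     |
  2         , "R2"     , 0.5     )

transform <- rowrecs_to_blocks_spec(
  wrapr::qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

# Round trip: parsed as (d %//% transform) %**% transform.
d4 <- as.data.frame(d %//% transform %**% transform)

# Hypothetical helper: same columns, rows sorted by the block keys.
sort_blocks <- function(x) {
  x <- x[order(x$model_id, x$measure), c("model_id", "measure", "value")]
  rownames(x) <- NULL
  x
}
print(isTRUE(all.equal(sort_blocks(d4), sort_blocks(d))))
```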
We can also pipe into the spec (and into its adjoint) using the wrapr dot pipe operator.
| model_id | measure | value |
|---------:|:--------|------:|
|        1 | AUC     |   0.7 |
|        2 | AUC     |   0.8 |
|        1 | R2      |   0.4 |
|        2 | R2      |   0.5 |
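Here is a sketch of the piped form. It assumes that t() applied to the specification yields its adjoint, which converts blocks to row records. It also assumes that wrapr's dot pipe %.>% applies a specification object placed on its right. The spec is named record_spec here, a hypothetical name, rather than transform, so the pipe cannot confuse it with base R's transform() function. The names t_record_spec, d_wide, d_back and sort_blocks() are likewise hypothetical.

```r
library("cdata")
library("wrapr")  # provides the dot pipe %.>%

d <- build_frame(
  "model_id", "measure", "value" |
  1         , "AUC"    , 0.7     |
  1         , "R2"     , 0.4     |
  2         , "AUC"    , 0.8     |
  2         , "R2"     , 0.5     )

record_spec <- rowrecs_to_blocks_spec(
  qchar_frame(
    "measure", "value" |
    "AUC"    , AUC     |
    "R2"     , R2      ),
  recordKeys = "model_id")

# Adjoint (transpose) specification: blocks to row records.
t_record_spec <- t(record_spec)

# Pipe the data into the specification objects themselves.
d_wide <- as.data.frame(d %.>% t_record_spec)
d_back <- as.data.frame(d %.>% t_record_spec %.>% record_spec)

# Hypothetical helper: compare block tables in key order.
sort_blocks <- function(x) {
  x <- x[order(x$model_id, x$measure), c("model_id", "measure", "value")]
  rownames(x) <- NULL
  x
}
print(d_wide)
print(sort_blocks(d_back))
```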
And, of course, the exact same functionality is available for database tables.