There are number of areas where Crunch needs to represent an object as belonging to one of many categories. The simplest and most common example of this is the categories of a categorical variable. For a categorical variable, the values of the variable can be one of a limited set of categories and those categories are specified in the Crunch API as metadata about the variable. These categoricals are similar to R’s factor
s but are richer because Crunch categoricals can have any number of missing values (compared to just NA
for factor
s), as well as a numeric representation that is separate from the category ids (which is useful for things like income bins, where you might put the middle of the bin as the value).
Moving beyond just categorical variables, we have a need to be able to represent a number of different properties, transformations, etc. in a category-like way. One concrete example is used heavily in order to add subtotals and headings to representations of categorical variables. In order to do this, we have two families of S4 classes: AbstractCategory
and AbstractCategories
Although subtotals and headings was the initial motivation for the new classes, they will allow for other types of representations and manipulations in the future.
The core classes that all other classes inherit from are AbstractCategory
and AbstractCategories
. The first, AbstractCategory
, is designed to represent a single category, which might have a number of properties about it (what those are will be explained in more detail below). The second, AbstractCategories
is designed to hold more than one AbstractCategory
together to form a coherent group. As a simple, example: an AbstractCategories
for binned income could have 5 AbstractCategory
s: <$25,000, $25,000-$49,999, $50,000-$99,999, $100,000-$199,999, >$200,000. This could be represented in R as:
income <- AbstractCategories(AbstractCategory(name = "<$25,000"),
AbstractCategory(name = "$25,000-$49,999"),
AbstractCategory(name = "$50,000-$99,999"),
AbstractCategory(name = "$100,000-$199,999"),
AbstractCategory(name = ">$200,000"))
An alternate (and less typing) way to instantiate this same AbstractCategories
is to send lists, and the constructor takes care of calling the AbstractCategory
class on each (as below). Each of the child-classes of AbstractCategories
(described in the sections below) have their own mapping of plural container to singular entity constructor in the same way, so passing Categories
a list will result in a Categories
object full of Category
objects.
income <- AbstractCategories(list(name = "<$25,000"),
list(name = "$25,000-$49,999"),
list(name = "$50,000-$99,999"),
list(name = "$100,000-$199,999"),
list(name = ">$200,000"))
Finally, there’s a data
argument, if you already have a list of AbstractCategory
s (or simply named lists!) you want to pass in (the same thing could also be accomplished with do.call
):
income_list <- list(list(name = "<$25,000"),
list(name = "$25,000-$49,999"),
list(name = "$50,000-$99,999"),
list(name = "$100,000-$199,999"),
list(name = ">$200,000"))
income <- AbstractCategories(data=income_list)
Any methods that are defined for the abstract classes will function on the subclasses as well. Child classes might have special over-ride methods defined for them, but for the most part, if a method can be used on AbstractCategories
or AbstractCategory
it can be used on the child classes as well.
AbstractCategories
inherits from list
and AbstractCategory
inherits from namedList
so many of the same methods will be work with both of them. This includes using [
, [[
, [<-
, and [[<-
to get and set subsets of AbstractCategories
and $
, and [[
to get the properties in an AbstractCategory
.
lapply
has also been defined for AbstractCategories
for easily iterating over all members. modifyCats
also allows for modifying one AbstractCategories
object by updating with new information from a second AbstractCategories
object in the same way that modifyList
works, but crucially it does not recurse into the AbstractCategory
objects themselves.
Finally, there are a few custom methods that return the values of the properties as either a vector of that property for each member (when using the plural versions against AbstractCategories
) or a vector (typically of length one) for a single member (when using the singular versions against AbstractCategory
).
names
returns the names associated with each AbstractCategory
in an AbstractCategories
object. And name
returns the names associated with an AbstractCategory
object. ids
and id
patterns the exact same way.
Categories from a categorical variable are represented by the Categories
and Category
classes. They inherit directly from AbstractCategories
and Category
respectively. For these, each Category
must have a name
and an id
, they optionally can have a numeric_value
, missing
, and selected
property.
values
and value
return the numeric_value
s property from Categories
or a single Category
respectively.is.na
and is.na
returns the missing
property from Categories
or a single Category
respectively.is.selected
and is.selected
returns the selected
property from Categories
or a single Category
respectively.Insertions allow users to insert new categories into a variable or a CrunchCube for display purposes. This is useful when the user would like to show things like aggregates (e.g. subtotals) without manipulating the underlying data (or creating a new variable). Insertions are defined as part of the Crunch API (see the Transforms section below for an explanation about where Insertions live). The Insertion
s class is designed to mirror the Crunch API for insertions as closely as possible. Insertions
and Insertion
inherit directly from AbstractCategories
and Category
respectively.
Insertion
s must have a name
and an anchor
. The name
is just like Category
names, and is used as the label to display. The anchor
is the id of the category after which the insertion should be placed.
Since insertions can represent a number of different aggregations, they also can have function
and args
properties. The function
property is a character describing the aggregation to use (e.g. "subtotal"
) and the args
property is a vector of the category id
s to use as operands for the function
.
The Insertion
class has two child classes: Subtotal
and Heading
. The Insertions
class can contain anything that inherits from Insertion
. Therefor an Insertions
object might include Insertion
s, Subtotal
s, and Heading
s.
anchors
and anchor
return the anchor property from Insertions
or a single Insertion
respectively.funcs
and func
return the function property from Insertions
or a single Insertion
respectively.arguments
returns the args
property from a single Insertion
.Subtotals and headings are both types of insertions. Because of this Subtotal
and Heading
classes inherit from Insertion
rather than directly from AbstractCategory
. These classes are designed to hold known types of Insertions to make it easier to work with Insertions (for example: testing which insertion to style in what way when using prettyPrint
functions). Additionally, these classes have slightly more user-friendly names (e.g. after
instead of anchor
), and they accept either id
s or name
s to refer to specific Category
s.
A Subtotal
must have name
, after
, and categories
properties. name
is the same as other abstract categories. after
is similar to anchor
but can be either a category id
or a category name
after which the subtotal should be placed. categories
is either the category id
s or a category name
s to subtotal.
The same as Insertion
, however some have customizations: * func
always returns the string "subtotal"
(because by definition a Subtotal
object is an Insertion
with function="subtotal"
) * anchor
and arguments
both have an option var_categories
which is required if the Subtotal
is using category names instead of ids in the after
or categories
properties. Supplying the categories is required in order to translate from category name
s to id
s which are required to be a well-formed Insertion
.
A Heading
must have name
and after
properties. Both of which have the same interpretation as Subtotal
above.
The same as Subtotal
for anchor
. func
and arguments
return NA
As a concrete example, let’s take the following categories:
feeling_cats <- Categories(
list(name = "Very Happy", id = 1),
list(name = "Somewhat Happy", id = 2),
list(name = "Neither Happy nor Unhappy", id = 3),
list(name = "Somewhat Unhappy", id = 4),
list(name = "Very Unhappy", id = 5)
)
feeling_cats
## id name value missing
## 1 1 Very Happy NA FALSE
## 2 2 Somewhat Happy NA FALSE
## 3 3 Neither Happy nor Unhappy NA FALSE
## 4 4 Somewhat Unhappy NA FALSE
## 5 5 Very Unhappy NA FALSE
And make some subtotals and headings to use as insertions:
feeling_subtotals <- Insertions(
Heading(name = "How I feel about cheese", position = "top"),
Subtotal(name = "Generally Happy", after = "Somewhat Happy",
categories = c("Very Happy", "Somewhat Happy")),
Subtotal(name = "Generally Unhappy", after = 5,
categories = c(4, 5))
)
Notice that the “Generally Happy” subtotal is made specifying category name
s for after
and categories
:
## [1] "Somewhat Happy"
## [1] "Very Happy" "Somewhat Happy"
Where as the “Generally Unhappy” subtotal uses id
s:
## [1] 5
## [1] 4 5
Since the Crunch API does not have a distinction between Subtotal
s Heading
s, and other Insertion
s, we sometimes need to convert from Subtotal
s or Heading
s to Insertion
s. This is accomplished with the method makeInsertion()
. This method takes a Subtotal
or Heading
and returns a valid Insertion
. If the Subtotal
or Heading
has category name
references instead of id
s, then you must include a Categories
object as the var_categories
argument. In general, this is only needed before sending a heterogeneous set of Insertions
to the Crunch API.
Using the examples we used before, we can see how this works:
feeling_insertions <- Insertions(data = lapply(feeling_subtotals, makeInsertion, var_categories = feeling_cats))
Now, all of the Subtotal
s and Heading
from feeling_subtotals
are proper Insertion
s:
## [1] "Insertion" "Insertion" "Insertion"
This means that the after
property has been translated into anchor
, and the function
and args
properties have been filled in appropriately:
## [1] 5
## [1] "subtotal"
## [1] 4 5
Because Insertion
s are required to use category id
s only, the new all-Insertion
s feeling_insertions
has translated the “Generally Happy” subtotal’s category name
s to id
s:
## [1] 2
## [1] 1 2
Since the Crunch API does not have a distinction between Subtotal
s Heading
s, and other Insertion
s when we get data about Insertion
s from the API, we need to change the classes for the Insertion
s that the crunch
package knows about. To do this, we can use either subtypeInsertions
to change the types of all of the members of an Insertions
object, or subtypeInsertion
to change the type of a single Insertion
object.
These functions work by inspecting the Insertion
and determining if it can be identified as one of the known child classes of Insertion
(namely: Subtotal
or Heading
).
Using the same example above, we can convert back from all Insertion
s to the subtypes:
feeling_subtotals_again <- subtypeInsertions(feeling_insertions)
sapply(feeling_subtotals_again, class)
## [1] "Heading" "Subtotal" "Subtotal"
There are two sets of inheritance: one for containers and one for members: Classes inherit from those immediately to their left
top-level classes | 1st children | 2nd children | |
---|---|---|---|
containers | AnstractCategories |
Categories |
|
AnstractCategories |
Insertions |
||
members | AbstractCategory |
Category |
|
AbstractCategory |
Insertion |
Subtotal |
|
AbstractCategory |
Insertion |
Heading |
The Transforms
class and set of functions is not an abstract category at all, but rather it mirrors the Crunch API’s set of transformations that are allowed on a variable or CrunchCube. One of the possible transformations are insertions (which is where Insertions
are stored). Currently the crunch
package doesn’t support other transformations.