This vignette is intended to showcase the usage of the gghalves extension by going through the individual _half_ geoms to explain details of usage and function arguments.
The general idea of gghalves stems from this StackOverflow question on how to plot a hybrid boxplot. This led to me developing the ggpol extension for ggplot2. However, the fact that ggpol has become a sort of aggregation for all kinds of geoms over time, and seeing that many things can be cut in half, has ultimately led to this library.
The idea is that many geoms that aggregate data, such as geom_boxplot, geom_violin and geom_dotplot, are (near) symmetric. Given that the space to display information is limited, we can make better use of it by cutting the geoms in half and displaying additional geoms that, e.g., give information about the sample size.
GeomHalfPoint, perhaps counterintuitively, does not display a literal half-circle. Rather, it plots the data points such that they fit within the space of a _half_ geom. Further, by default geom_half_point jitters the points horizontally and vertically.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_point()
The way this works is that transformation = PositionJitter is passed to the geom. We could play with the default values of this transformation by passing along a transformation_params argument (now deprecated, as the warnings in the output show):
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_point(transformation_params = list(height = 0, width = 0.001, seed = 1))
#> Warning in f(...): Argument deprecated.
#> Use `transformation = position_*(params)` instead of passing the params via `transformation_params`
#> Warning in f(...): Argument deprecated.
#> Use `transformation = position_*(params)` instead of passing the params via `transformation_params`
#> Warning in f(...): Argument deprecated.
#> Use `transformation = position_*(params)` instead of passing the params via `transformation_params`
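The deprecation warning above names the replacement: build the position object yourself and hand it to transformation. A minimal sketch, assuming that position_jitter() from ggplot2 is the intended position_* constructor here and that it takes the same height, width and seed values that were passed in the list:

```r
library(ggplot2)
library(gghalves)

# Same jitter settings as in the deprecated call, but supplied through a
# position object instead of the transformation_params list.
p_jitter <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_half_point(
    transformation = position_jitter(height = 0, width = 0.001, seed = 1)
  )
p_jitter
```

Fixing seed keeps the jitter reproducible between renders, which is mostly useful when the figure ends up in a report.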
Alternatively, we could change the transformation argument itself:
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_point(transformation = PositionIdentity)
Making the transformation work with custom Positions from ggplot2 extensions is something that will hopefully be included in future updates of this package.
Like all _half_ geoms, geom_half_point also takes a side argument, with "l" for left and "r" for right.
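As a small illustration of the side argument, the sketch below draws the jittered points in the left half of each species slot. Only the side value differs from the first example; the object name p_left is just for illustration.

```r
library(ggplot2)
library(gghalves)

# Points confined to the left half of the space allotted to each species.
p_left <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_half_point(side = "l")
p_left
```

Leaving the other half empty is what makes room for a second _half_ geom on the opposite side.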
GeomHalfBoxplot displays a boxplot that is cut in half and plotted either on the left or right side of the space allotted to the specific factor on the x-axis.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_boxplot()
In addition to the standard side argument, you can also center the half-boxplot and decide whether an errorbar is drawn.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_boxplot(side = "r", center = TRUE, errorbar.draw = FALSE)
GeomHalfViolin draws a half-violin plot. Besides the side argument, it supports all the arguments that can be passed to the standard GeomViolin.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_violin()
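To make the point about argument pass-through concrete, here is a hedged sketch that combines side with a standard violin argument. It assumes that trim, which geom_violin() accepts, is handled the same way by geom_half_violin(); check the function's help page if the call is rejected.

```r
library(ggplot2)
library(gghalves)

# Half-violin on the right-hand side. trim = FALSE is a standard
# geom_violin() argument that keeps the density tails; it is assumed here
# to be passed through unchanged by geom_half_violin().
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_half_violin(side = "r", trim = FALSE)
p_violin
```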
GeomHalfDotplot is slightly different from the other _half_ geoms in that it does not support a side argument, since this is already inherently built into the standard GeomDotplot via stackdir:
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_violin() +
geom_dotplot(binaxis = "y", method="histodot", stackdir="up")
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
So, given that geom_dotplot can be used as a _half_ geom, why the need for geom_half_dotplot? The reason is that geom_dotplot does not support dodging when there are multiple factors in play. Let’s consider the following example:
df <- data.frame(score = rgamma(150, 4, 1),
                 gender = sample(c("M", "F"), 150, replace = TRUE),
                 genotype = factor(sample(1:3, 150, replace = TRUE)))
Given this data, we want to group by genotype, but also separate the plots by gender. This does not quite work using the standard geom:
ggplot(df, aes(x = genotype, y = score, fill = gender)) +
geom_half_violin() +
geom_dotplot(binaxis = "y", method="histodot", stackdir="up", position = PositionDodge)
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
Using geom_half_dotplot, however, we can make this work:
ggplot(df, aes(x = genotype, y = score, fill = gender)) +
geom_half_violin() +
geom_half_dotplot(method="histodot", stackdir="up")
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
As mentioned in the package description, gghalves can work well in combination with certain ggplot2 extensions. One of them is geom_beeswarm of the ggbeeswarm package. Note that, currently, you will need to install the latest version from GitHub to support the passing of beeswarmArgs.
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_half_boxplot() +
geom_beeswarm(beeswarmArgs = list(side = 1))
Lastly, let us remake the plot displayed in the GitHub Readme. It is for display purposes only, and thus uses a lot of filtering and a lot of geoms…
library(dplyr)  # provides %>% and filter()

ggplot() +
  # setosa: half-boxplot plus beeswarm points
  geom_half_boxplot(
    data = iris %>% filter(Species == "setosa"),
    aes(x = Species, y = Sepal.Length, fill = Species), outlier.color = NA) +
  ggbeeswarm::geom_beeswarm(
    data = iris %>% filter(Species == "setosa"),
    aes(x = Species, y = Sepal.Length, fill = Species, color = Species), beeswarmArgs = list(side = +1)
  ) +
  # versicolor: half-violin on the right plus a downward-stacked dotplot
  geom_half_violin(
    data = iris %>% filter(Species == "versicolor"),
    aes(x = Species, y = Sepal.Length, fill = Species), side = "r") +
  geom_half_dotplot(
    data = iris %>% filter(Species == "versicolor"),
    aes(x = Species, y = Sepal.Length, fill = Species), method = "histodot", stackdir = "down") +
  # virginica: half-boxplot on the right plus jittered points on the left
  geom_half_boxplot(
    data = iris %>% filter(Species == "virginica"),
    aes(x = Species, y = Sepal.Length, fill = Species), side = "r", errorbar.draw = TRUE,
    outlier.color = NA) +
  geom_half_point(
    data = iris %>% filter(Species == "virginica"),
    aes(x = Species, y = Sepal.Length, fill = Species, color = Species), side = "l") +
  scale_fill_manual(values = c("setosa" = "#cba1d2", "versicolor" = "#7067CF", "virginica" = "#B7C0EE")) +
  scale_color_manual(values = c("setosa" = "#cba1d2", "versicolor" = "#7067CF", "virginica" = "#B7C0EE")) +
  theme(legend.position = "none")