Split Violin Plots

Tom Kelly

2020-06-15

##Violin Plots

Violin plots are a powerful tool to help researchers visualise data, particularly in the quality-checking and exploratory stages of an analysis. Among their benefits, as shown below for the iris dataset, violin plots show distribution information that a boxplot cannot.

###General Set up

library("vioplot")

We split the data into two categories by Sepal Width (above the mean, and at or below it) as follows:

data(iris)
summary(iris$Sepal.Width)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.800   3.000   3.057   3.300   4.400
table(iris$Sepal.Width > mean(iris$Sepal.Width))
## 
## FALSE  TRUE 
##    83    67
iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]
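
As a quick sanity check on this split, the sketch below counts the rows in each subset and tabulates species within each group. The subset sizes should agree with the table above (67 above the mean, 83 at or below it). The names `width_mean`, `n_large`, `n_small` and `species_by_width` are introduced here for illustration only.

```r
data(iris)
width_mean <- mean(iris$Sepal.Width)
iris_large <- iris[iris$Sepal.Width > width_mean, ]
iris_small <- iris[iris$Sepal.Width <= width_mean, ]

# Subset sizes: together they should cover every row of iris
n_large <- nrow(iris_large)
n_small <- nrow(iris_small)
stopifnot(n_large + n_small == nrow(iris))

# Species counts within each Sepal Width group
species_by_width <- table(
  Species = iris$Species,
  Width = ifelse(iris$Sepal.Width > width_mean, "large", "small")
)
print(species_by_width)
```

If a species is rare in one group, its violin for that group is estimated from few observations and should be read with care.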

###Boxplots

First we plot Sepal Length on its own:

boxplot(Sepal.Length~Species, data=iris, col="grey")

An indirect comparison can be achieved with par:

{
  par(mfrow=c(2,1))
  boxplot(Sepal.Length~Species, data=iris_small, col = "lightblue")
  boxplot(Sepal.Length~Species, data=iris_large, col = "palevioletred")
  par(mfrow=c(1,1))
}

###Violin Plots

First we plot Sepal Length on its own:

vioplot(Sepal.Length~Species, data=iris)

An indirect comparison can be achieved with par:

{
  par(mfrow=c(2,1))
  vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line")
  vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line")
  par(mfrow=c(1,1))
}

###Split Violin Plots

A more direct comparison can be made with the side argument, using add = TRUE on the second plot:

vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line", side = "right")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line", side = "left", add = TRUE)
title(xlab = "Species", ylab = "Sepal Length")
legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")

###Custom axes labels

Custom axis labels are supported for split violin plots. However, you must pass these arguments in the first call to vioplot.
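
A minimal sketch of setting labels on the first call is given here. It assumes the `names`, `main`, `xlab` and `ylab` arguments referred to in the warnings shown in this section; the label text and the `species_labels` vector are illustrative choices, not taken from the original.

```r
library("vioplot")

data(iris)
iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]

# Illustrative group labels (hypothetical, one per species level)
species_labels <- c("Setosa", "Versicolor", "Virginica")

# First call (add = FALSE by default): names, title and axis labels go here
vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred",
        plotCentre = "line", side = "right",
        names = species_labels, main = "Sepal Length by Species",
        xlab = "Species", ylab = "Sepal Length")

# Second call: add = TRUE and no label arguments
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue",
        plotCentre = "line", side = "left", add = TRUE)

legend("topleft", fill = c("lightblue", "palevioletred"),
       legend = c("small", "large"), title = "Sepal Width")
```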

Note that these arguments are disabled for the second vioplot call (when add = TRUE) to avoid overlaying labels; passing them there raises warnings such as:

## Warning in vioplot.formula(Sepal.Length ~ Species, data = iris_small, col = "lightblue", : Warning: names can only be changed on first call of vioplot (when add = FALSE)
## Warning in vioplot.formula(Sepal.Length ~ Species, data = iris_small, col = "lightblue", : Warning: x-axis labels can only be changed on first call of vioplot (when add = FALSE)
## Warning in vioplot.formula(Sepal.Length ~ Species, data = iris_small, col = "lightblue", : Warning: y-axis labels can only be changed on first call of vioplot (when add = FALSE)
## Warning in vioplot.default(x, ...): Warning: names can only be changed on first call of vioplot (when add = FALSE)
## Warning in vioplot.default(x, ...): Warning: main title can only be changed on first call of vioplot (when add = FALSE)

###Median

The line median option is more suitable for side-by-side comparisons, but the point option is still available:
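
One way the point option might be used with a split plot is sketched here, assuming `plotCentre = "point"` as the alternative to the `"line"` value used earlier. The `centre_style` variable is introduced only to keep the two calls consistent.

```r
library("vioplot")

data(iris)
iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]

# Mark each median with a point rather than a line
centre_style <- "point"

vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred",
        plotCentre = centre_style, side = "right")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue",
        plotCentre = centre_style, side = "left", add = TRUE)

legend("topleft", fill = c("lightblue", "palevioletred"),
       legend = c("small", "large"), title = "Sepal Width")
```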

It may be necessary to add a points call to restore a median point that has been overwritten by subsequent plots:
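
A sketch of one such fix follows, using base R `tapply` and `points` to recompute and redraw the medians after both halves have been drawn. The horizontal offset of 0.05 and the colour choices are arbitrary illustrative values, and `med_large`, `med_small` and `x_pos` are names introduced here.

```r
library("vioplot")

data(iris)
iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]

vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred",
        plotCentre = "point", side = "right")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue",
        plotCentre = "point", side = "left", add = TRUE)

# Median Sepal Length per species within each Sepal Width group
med_large <- tapply(iris_large$Sepal.Length, iris_large$Species, median)
med_small <- tapply(iris_small$Sepal.Length, iris_small$Species, median)

# Redraw the medians, nudged to each side so neither hides the other
x_pos <- seq_along(levels(iris$Species))
points(x_pos + 0.05, med_large, pch = 21,
       col = "palevioletred4", bg = "palevioletred2")
points(x_pos - 0.05, med_small, pch = 21,
       col = "lightblue4", bg = "lightblue2")
```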

Similarly, points can be added where a line has been used previously:
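
For the line variant, a comparable sketch draws the split plot with `plotCentre = "line"` and then overlays median points with base R `points`. The offset, plotting symbol and the names `med_line_large`, `med_line_small` and `x_line` are illustrative assumptions.

```r
library("vioplot")

data(iris)
iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]

# Split plot with median lines, as in the earlier examples
vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred",
        plotCentre = "line", side = "right")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue",
        plotCentre = "line", side = "left", add = TRUE)

# Per-species medians for each half
med_line_large <- tapply(iris_large$Sepal.Length, iris_large$Species, median)
med_line_small <- tapply(iris_small$Sepal.Length, iris_small$Species, median)

# Overlay a solid point on each median line (0.05 offset is arbitrary)
x_line <- seq_along(levels(iris$Species))
points(x_line + 0.05, med_line_large, pch = 19, col = "palevioletred4")
points(x_line - 0.05, med_line_small, pch = 19, col = "lightblue4")
```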

Plotted this way, categorical differences in the mean and variation of a continuous variable are aesthetically pleasing and intuitive to interpret.

##Sources

The extensions to vioplot described here are based on those provided here:

These have previously been discussed on the following sites: