Radar-Boxplot

This package provides the implementation of the radar-boxplot, a chart created and developed by the author.

Installation

The package is available in the CRAN repository.

install.packages("radarBoxplot")

Usage

There are two variants, Default and Formula, which accept different sets of arguments as input.

Default

Formula

Additional optional arguments

Description

The radar-boxplot merges the concepts of the radar chart and the boxplot, allowing multivariate data to be compared for multiple classes/clusters at a time. It provides an intuitive understanding of the data by creating radar polygons that can be compared in terms of shape and thickness, giving meaningful insight into which classes/clusters have high inner variation and which are similar to each other.

By interpreting the radar-boxplot, it is possible to anticipate classification confusion between classes, understand why it occurs, and see what could be done to achieve better results.

The radar-boxplot draws two regions in different colors that represent the same information a boxplot would, but for multiple attributes at once. The following example shows the radar-boxplot over the Iris dataset. The inner red region represents the 25th to 75th percentiles of each attribute, while the blue area represents the total range, excluding outliers as defined by Tukey (1977). Outliers appear as whiskers, just like in the classic boxplot.

IQR = Q3 - Q1
LOWER_OUTLIER = Q1 - (1.5 x IQR)
UPPER_OUTLIER = Q3 + (1.5 x IQR)
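
The rule above can be tried out in base R for a single attribute of a single class. This is only a minimal sketch: the variable names are made up for illustration, and it assumes R's default quantile method, which may not be exactly what the package uses internally.

```r
# Tukey fences for one attribute of one class (base R only).
x <- iris$Sepal.Width[iris$Species == "setosa"]

# Q1 and Q3 with R's default quantile method (an assumption, see above).
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fences as defined above: Q1 - 1.5 x IQR and Q3 + 1.5 x IQR.
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values outside the fences are the outliers.
outliers <- x[x < lower_fence | x > upper_fence]

# Range of the remaining values, i.e. the total range excluding outliers.
range_without_outliers <- range(x[x >= lower_fence & x <= upper_fence])
```

Here `q` corresponds to the inner region and `range_without_outliers` to the outer region for that one axis; the radar-boxplot repeats this for every attribute.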

Radar-boxplot example with red wine quality dataset

You can see that as the rating gets higher, two different things happen. First, there is a shift in the overall shape, which becomes more defined and “pointy” towards citric acid and alcohol, while turning concave for volatile acidity. Second, the inner variation appears to shrink, suggesting that top quality wines must conform to a stricter set of parameters, while intermediate ones can have a mixture of poor properties and high quality ones that compensate for each other. I could also propose a cluster analysis within ratings 5 and 6 (because of their high inner variation) to try to understand whether there are multiple patterns within them, which could reveal different sets of intermediate wines.
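
The "inner variation" read off the chart can also be checked numerically. The sketch below is a rough illustration rather than anything the package does: it uses the built-in iris data so it runs without extra dependencies, assumes each attribute is rescaled to [0, 1] so the axes are comparable, and uses hypothetical names (`scaled`, `iqr_by_class`, `thickness`).

```r
# Rescale each numeric attribute to [0, 1] (an assumption made here so that
# attributes measured in different units can be compared).
num <- iris[, 1:4]
scaled <- as.data.frame(lapply(num, function(v) (v - min(v)) / (max(v) - min(v))))

# IQR of every attribute within each class.
iqr_by_class <- aggregate(scaled, by = list(class = iris$Species), FUN = IQR)

# Mean IQR per class: a single-number proxy for how "thick" the inner
# polygon of that class would look.
thickness <- setNames(rowMeans(iqr_by_class[, -1]),
                      as.character(iqr_by_class$class))
```

For the wine data, the same steps could be applied with `winequality_red` and `quality` as the grouping column; a class with a clearly larger value would be a candidate for the kind of follow-up cluster analysis mentioned above.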

The radar-boxplot is best suited when you have more than 4 relevant variables for your clustering/classification task, because it gives the possibility to represent higher dimensionality while still being readable.

Example

library(radarBoxplot)
data("winequality_red")

# Regular
radarBoxplot(quality ~ ., winequality_red)

# Orange and green pattern with white median
orange = "#FFA500CC"
green = rgb(0, .7, 0, 0.6)
radarBoxplot(quality ~ ., winequality_red,
             use.ggplot2=FALSE, medianLine=list(col="white"),
             innerPolygon=list(col=orange),
             outerPolygon=list(col=green))


# Plot in 2 rows and 3 columns
# change columns order (counter clockwise)
radarBoxplot(quality ~ volatile.acidity + citric.acid +
             residual.sugar + fixed.acidity + chlorides +
             free.sulfur.dioxide + total.sulfur.dioxide +
             density + pH + sulphates + alcohol,
             data = winequality_red,
             mfrow=c(2,3))

Acknowledgments

Thanks to Dr. Michael Friendly for his great suggestions for improving this package; I’m still working on those. I’d also like to thank Dr. Peter Rousseeuw for his valuable feedback.