An Introduction to corrplot Package

Introduction

The corrplot package provides a graphical display of a correlation matrix and its confidence intervals. It also contains several algorithms for matrix reordering. In addition, corrplot gives fine control over details such as colors, text labels, color legends, and layout.

Visualization methods

There are seven visualization methods (parameter method) in the corrplot package, named "circle", "square", "ellipse", "number", "shade", "color", "pie".

By default, positive correlations are displayed in blue and negative correlations in red. The color intensity and the size of each glyph (for example, the circle) are proportional to the correlation coefficient.

library(corrplot)
## corrplot 0.84 loaded
M <- cor(mtcars)
corrplot(M, method = "circle")

plot of chunk methods

corrplot(M, method = "square")

plot of chunk methods

corrplot(M, method = "ellipse")

plot of chunk methods

corrplot(M, method = "number") # Display the correlation coefficient

plot of chunk methods

corrplot(M, method = "shade")

plot of chunk methods

corrplot(M, method = "color")

plot of chunk methods

corrplot(M, method = "pie")

plot of chunk methods

Layout

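There are three layout types (parameter type): "full" (the default) displays the whole correlation matrix, "upper" displays only the upper triangle, and "lower" displays only the lower triangle.

corrplot(M, type = "upper")
corrplot(M, type = "lower")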

plot of chunk layout

corrplot.mixed() is a wrapper function for mixed visualization styles: it draws one method in the lower triangle and another in the upper triangle.

corrplot.mixed(M)

plot of chunk mixed

corrplot.mixed(M, lower.col = "black", number.cex = .7)

plot of chunk mixed

corrplot.mixed(M, lower = "ellipse", upper = "circle")

plot of chunk mixed

corrplot.mixed(M, lower = "square", upper = "circle", tl.col = "black")

plot of chunk mixed

Reorder a correlation matrix

The correlation matrix can be reordered according to the correlation coefficients. Reordering helps reveal hidden structure and patterns in the matrix. There are four methods in corrplot (parameter order), named "AOE", "FPC", "hclust", "alphabet". More algorithms can be found in the seriation package.

You can also reorder the matrix “manually” via function corrMatOrder().
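As a minimal sketch of manual reordering, assuming corrMatOrder() returns a permutation of the variable indices, the matrix can be permuted first and then plotted without an order argument. The names ord and M_ordered are illustrative only.

```r
library(corrplot)

M <- cor(mtcars)

# Compute the ordering once; the result is a permutation of 1..ncol(M)
ord <- corrMatOrder(M, order = "AOE")

# Apply the same permutation to rows and columns so the matrix stays symmetric
M_ordered <- M[ord, ord]

# Plot the pre-ordered matrix; the order parameter is left at its default
corrplot(M_ordered)
```

Keeping the permutation in a variable is useful when the same ordering has to be applied to a second matrix, such as a matrix of p-values.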

corrplot(M, order = "AOE")

plot of chunk order

corrplot(M, order = "hclust")

plot of chunk order

corrplot(M, order = "FPC")

plot of chunk order

corrplot(M, order = "alphabet")

plot of chunk order

When order = "hclust" is used, corrplot() can draw rectangles around groups of variables in the correlation matrix, based on the results of hierarchical clustering. The parameter addrect sets the number of rectangles.

corrplot(M, order = "hclust", addrect = 2)

plot of chunk rectangles

corrplot(M, order = "hclust", addrect = 3)

plot of chunk rectangles
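To see which variables fall into each rectangle, the clustering can be approximated with base R. This is a sketch under stated assumptions, not corrplot's own code: it assumes that 1 minus the correlation is used as the dissimilarity and that the linkage is "complete". The names hc and groups are illustrative.

```r
M <- cor(mtcars)

# Assumption: treat 1 - correlation as a dissimilarity, with complete linkage
hc <- hclust(as.dist(1 - M), method = "complete")

# Cut the tree into 3 groups, mirroring addrect = 3
groups <- cutree(hc, k = 3)

# List the variables of each cluster in dendrogram order
split(names(groups)[hc$order], groups[hc$order])
```

If the groups do not match the rectangles in the plot, the linkage method is the first assumption to revisit.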

# Change background color to lightblue
corrplot(M, type = "upper", order = "hclust",
         col = c("black", "white"), bg = "lightblue")

plot of chunk hclust-lightblue

Using different color spectra

As shown in the section above, the colors of the correlogram can be customized. The function colorRampPalette() is very convenient for generating a color spectrum.

col1 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "white",
                           "cyan", "#007FFF", "blue", "#00007F"))
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582",
                           "#FDDBC7", "#FFFFFF", "#D1E5F0", "#92C5DE",
                           "#4393C3", "#2166AC", "#053061"))
col3 <- colorRampPalette(c("red", "white", "blue")) 
col4 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "#7FFF7F",
                           "cyan", "#007FFF", "blue", "#00007F"))
whiteblack <- c("white", "black")

## using these color spectra
corrplot(M, order = "hclust", addrect = 2, col = col1(100))

plot of chunk color

corrplot(M, order = "hclust", addrect = 2, col = col2(50))

plot of chunk color

corrplot(M, order = "hclust", addrect = 2, col = col3(20))

plot of chunk color

corrplot(M, order = "hclust", addrect = 2, col = col4(10))

plot of chunk color

corrplot(M, order = "hclust", addrect = 2, col = whiteblack, bg = "gold2")

plot of chunk color
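The calls col1(100), col2(50), and so on work because colorRampPalette() returns a function rather than a vector of colors. The short sketch below, using only base R, shows that behavior; the names pal, cols5, and cols200 are illustrative.

```r
# colorRampPalette() returns a function, not a vector of colors
pal <- colorRampPalette(c("red", "white", "blue"))

# Calling that function with n yields n interpolated hex colors
cols5 <- pal(5)
cols200 <- pal(200)

length(cols5)
length(cols200)

# The endpoints are the first and last colors passed in
cols5[c(1, 5)]
```

A larger n gives a smoother gradient in the plot, while a small n (as in col4(10) above) produces visibly discrete color bands.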

You can also use the standard color palettes (package grDevices).

corrplot(M, order = "hclust", addrect = 2, col = heat.colors(100))

plot of chunk hclust-stdcolors

corrplot(M, order = "hclust", addrect = 2, col = terrain.colors(100))

plot of chunk hclust-stdcolors

corrplot(M, order = "hclust", addrect = 2, col = cm.colors(100))

plot of chunk hclust-stdcolors

corrplot(M, order = "hclust", addrect = 2, col = gray.colors(100))

plot of chunk hclust-stdcolors

Another option is to use the RColorBrewer package.

library(RColorBrewer)

corrplot(M, type = "upper", order = "hclust",
         col = brewer.pal(n = 8, name = "RdBu"))

plot of chunk hclust-rcolorbrewer

corrplot(M, type = "upper", order = "hclust",
         col = brewer.pal(n = 8, name = "RdYlBu"))

plot of chunk hclust-rcolorbrewer

corrplot(M, type = "upper", order = "hclust",
         col = brewer.pal(n = 8, name = "PuOr"))

plot of chunk hclust-rcolorbrewer

Changing color and rotation of text labels and legend

The cl.* parameters control the color legend, and the tl.* parameters control the text labels (the text legend). For the text labels, tl.col (text label color) and tl.srt (text label string rotation) change the text color and rotation.

Here are some examples.

## remove color legend and text legend 
corrplot(M, order = "AOE", cl.pos = "n", tl.pos = "n")  

plot of chunk color-label

## color legend at the bottom, text labels on the diagonal, rotated text labels
corrplot(M, order = "AOE", cl.pos = "b", tl.pos = "d", tl.srt = 60)

plot of chunk color-label

## a wider color legend with numbers right aligned
corrplot(M, order = "AOE", cl.ratio = 0.2, cl.align = "r")

plot of chunk color-label

## text labels rotated 45 degrees
corrplot(M, type = "lower", order = "hclust", tl.col = "black", tl.srt = 45)

plot of chunk color-label

Dealing with a non-correlation matrix

corrplot(abs(M), order = "AOE", col = col3(200), cl.lim = c(0, 1))

plot of chunk non-corr

## visualize a matrix with values in [-100, 100]
ran <- round(matrix(runif(225, -100,100), 15))
corrplot(ran, is.corr = FALSE, method = "square")

plot of chunk non-corr

## a beautiful color legend 
corrplot(ran, is.corr = FALSE, method = "ellipse", cl.lim = c(-100, 100))

plot of chunk non-corr

If your matrix is rectangular, you can adjust the aspect ratio with the win.asp parameter so that each cell of the matrix is rendered as a square.

ran <- matrix(rnorm(70), ncol = 7)
corrplot(ran, is.corr = FALSE, win.asp = .7, method = "circle")

plot of chunk non-corr-asp

Dealing with missing (NA) values

By default, corrplot renders NA values as "?" characters. With the na.label parameter, you can use a different label (at most two characters are supported).

M2 <- M
diag(M2) <- NA
corrplot(M2)

plot of chunk NAs

corrplot(M2, na.label = "o")

plot of chunk NAs

corrplot(M2, na.label = "NA")

plot of chunk NAs

Using “plotmath” expressions in labels

Since version 0.78, it is possible to use plotmath expressions in variable names. To activate plotmath rendering, prefix your label with one of the characters ":", "=" or "$".

M2 <- M[1:5,1:5]
colnames(M2) <- c("alpha", "beta", ":alpha+beta", ":a[0]", "=a[beta]")
rownames(M2) <- c("alpha", "beta", NA, "$a[0]", "$ a[beta]")
corrplot(M2)

plot of chunk plotmath

Combining correlogram with the significance test

res1 <- cor.mtest(mtcars, conf.level = .95)
res2 <- cor.mtest(mtcars, conf.level = .99)

## mark insignificant coefficients according to the significance level
corrplot(M, p.mat = res1$p, sig.level = .2)

plot of chunk test

corrplot(M, p.mat = res1$p, sig.level = .05)

plot of chunk test

corrplot(M, p.mat = res1$p, sig.level = .01)

plot of chunk test

## leave insignificant coefficients blank
corrplot(M, p.mat = res1$p, insig = "blank")

plot of chunk test

## print p-values on insignificant coefficients
corrplot(M, p.mat = res1$p, insig = "p-value")

plot of chunk test

## add all p-values
corrplot(M, p.mat = res1$p, insig = "p-value", sig.level = -1)

plot of chunk test

## draw a cross on insignificant coefficients
corrplot(M, p.mat = res1$p, order = "hclust", insig = "pch", addrect = 3)

plot of chunk test
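The p.mat argument expects a square matrix of p-values in the same variable order as M. As a rough sketch of what such a matrix contains, the following builds one with stats::cor.test() for every pair of columns. The helper name pairwise_p is hypothetical and is not part of corrplot; cor.mtest() additionally returns the confidence bounds (lowCI, uppCI) used in the next section.

```r
# Hypothetical helper: pairwise p-values computed with stats::cor.test()
pairwise_p <- function(dat, ...) {
  dat <- as.matrix(dat)
  n <- ncol(dat)
  p <- matrix(NA_real_, n, n,
              dimnames = list(colnames(dat), colnames(dat)))
  diag(p) <- 0  # convention: a variable against itself gets p = 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Test H0: the true correlation between columns i and j is zero
      p[i, j] <- p[j, i] <- cor.test(dat[, i], dat[, j], ...)$p.value
    }
  }
  p
}

p_sketch <- pairwise_p(mtcars)

# Number of variable pairs that are not significant at the 0.05 level
sum(p_sketch[upper.tri(p_sketch)] > 0.05)
```

Note that these p-values are not adjusted for multiple comparisons, so with many variables some crosses will be missing purely by chance.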

Visualize confidence interval

corrplot(M, low = res1$lowCI, upp = res1$uppCI, order = "hclust",
         rect.col = "navy", plotC = "rect", cl.pos = "n")

plot of chunk ci

corrplot(M, p.mat = res1$p, low = res1$lowCI, upp = res1$uppCI,
         order = "hclust", pch.col = "red", sig.level = 0.01,
         addrect = 3, rect.col = "navy", plotC = "rect", cl.pos = "n")

plot of chunk ci

res1 <- cor.mtest(mtcars, conf.level = .95)

corrplot(M, p.mat = res1$p, insig = "label_sig",
         sig.level = c(.001, .01, .05), pch.cex = .9, pch.col = "white")

plot of chunk ci_with_label

corrplot(M, p.mat = res1$p, method = "color",
         insig = "label_sig", pch.col = "white")

plot of chunk ci_with_label

corrplot(M, p.mat = res1$p, method = "color", type = "upper",
         sig.level = c(.001, .01, .05), pch.cex = .9,
         insig = "label_sig", pch.col = "white", order = "AOE")

plot of chunk ci_with_label

corrplot(M, p.mat = res1$p, insig = "label_sig", pch.col = "white",
         pch = "p<.05", pch.cex = .5, order = "AOE")

plot of chunk ci_with_label

Customize the correlogram

# matrix of the p-value of the correlation
p.mat <- cor.mtest(mtcars)$p
head(p.mat[, 1:5])
##              [,1]         [,2]         [,3]         [,4]         [,5]
## [1,] 0.000000e+00 6.112687e-10 9.380327e-10 1.787835e-07 1.776240e-05
## [2,] 6.112687e-10 0.000000e+00 1.802838e-12 3.477861e-09 8.244636e-06
## [3,] 9.380327e-10 1.802838e-12 0.000000e+00 7.142679e-08 5.282022e-06
## [4,] 1.787835e-07 3.477861e-09 7.142679e-08 0.000000e+00 9.988772e-03
## [5,] 1.776240e-05 8.244636e-06 5.282022e-06 9.988772e-03 0.000000e+00
## [6,] 1.293959e-10 1.217567e-07 1.222320e-11 4.145827e-05 4.784260e-06
# Mark insignificant coefficients according to the significance level
corrplot(M, type = "upper", order = "hclust", 
         p.mat = p.mat, sig.level = 0.01)

plot of chunk pmat

# Leave insignificant coefficients blank
corrplot(M, type = "upper", order = "hclust", 
         p.mat = p.mat, sig.level = 0.01, insig = "blank")

plot of chunk pmat

In the figures above, correlations with a p-value > 0.01 are considered insignificant. Depending on the insig setting, these cells are either marked with a cross (the default) or left blank.

col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color", col = col(200),
         type = "upper", order = "hclust", number.cex = .7,
         addCoef.col = "black", # Add coefficient of correlation
         tl.col = "black", tl.srt = 90, # Text label color and rotation
         # Combine with significance
         p.mat = p.mat, sig.level = 0.01, insig = "blank", 
         # hide correlation coefficient on the principal diagonal
         diag = FALSE)

plot of chunk customized

Note: Some of the plots were taken from this blog.

Explore Large Feature Matrices

# generating large feature matrix (cols=features, rows=samples)
num_features <- 60 # how many features
num_samples <- 300 # how many samples
DATASET <- matrix(runif(num_features * num_samples),
               nrow = num_samples, ncol = num_features)

# setting some dummy names for the features e.g. f23
colnames(DATASET) <- paste0("f", 1:ncol(DATASET))

# let's make 30% of all features to be correlated with feature "f1"
num_feat_corr <- num_features * .3
idx_correlated_features <- as.integer(seq(from = 1,
                                          to = num_features,
                                          length.out = num_feat_corr))[-1]
for (i in idx_correlated_features) {
  DATASET[,i] <- DATASET[,1] + runif(num_samples) # adding some noise
}

corrplot(cor(DATASET), diag = FALSE, order = "FPC",
         tl.pos = "td", tl.cex = 0.5, method = "color", type = "upper")

plot of chunk large_matrix
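With this many features the plot gives an overview, but the individual labels are hard to read. As a complementary sketch using only base R, the features tied to "f1" can also be listed numerically. The fixed seed and the cutoff of 0.5 are arbitrary assumptions made for illustration, and the names r_f1 and found are not from the original.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

num_features <- 60
num_samples <- 300
DATASET <- matrix(runif(num_features * num_samples),
                  nrow = num_samples, ncol = num_features)
colnames(DATASET) <- paste0("f", seq_len(ncol(DATASET)))

# Same construction as above: tie a subset of the features to "f1"
idx_correlated_features <- as.integer(seq(from = 1, to = num_features,
                                          length.out = num_features * .3))[-1]
for (i in idx_correlated_features) {
  DATASET[, i] <- DATASET[, 1] + runif(num_samples)  # adding some noise
}

# Correlation of every other feature with f1, strongest first
r_f1 <- cor(DATASET)[-1, "f1"]
head(sort(abs(r_f1), decreasing = TRUE))

# Features whose absolute correlation with f1 exceeds the chosen cutoff
found <- names(r_f1)[abs(r_f1) > 0.5]
found
```

Because each constructed feature is f1 plus independent uniform noise of the same variance, its correlation with f1 should be close to 1/sqrt(2), well above the cutoff.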
