The corrplot package is a graphical display of a correlation matrix, confidence interval. It also contains some algorithms to do matrix reordering. In addition, corrplot is good at details, including choosing color, text labels, color labels, layout, etc.
There are seven visualization methods (parameter method
) in corrplot package, named "circle"
, "square"
, "ellipse"
, "number"
, "shade"
, "color"
, "pie"
.
Positive correlations are displayed in blue and negative correlations in red color. Color intensity and the size of the circle are proportional to the correlation coefficients.
library(corrplot)
## corrplot 0.84 loaded
M <- cor(mtcars)
corrplot(M, method = "circle")
corrplot(M, method = "square")
corrplot(M, method = "ellipse")
corrplot(M, method = "number") # Display the correlation coefficient
corrplot(M, method = "shade")
corrplot(M, method = "color")
corrplot(M, method = "pie")
There are three layout types (parameter type
):
"full"
(default) : display full correlation matrix"upper"
: display upper triangular of the correlation matrix"lower"
: display lower triangular of the correlation matrixcorrplot(M, type = "upper")
corrplot(M, type = "upper")
corrplot.mixed()
is a wrapped function for mixed visualization style.
corrplot.mixed(M)
corrplot.mixed(M, lower.col = "black", number.cex = .7)
corrplot.mixed(M, lower = "ellipse", upper = "circle")
corrplot.mixed(M, lower = "square", upper = "circle", tl.col = "black")
The correlation matrix can be reordered according to the correlation
coefficient. This is important to identify the hidden structure and pattern in
the matrix. There are four methods in corrplot (parameter order
), named
"AOE"
, "FPC"
, "hclust"
, "alphabet"
. More algorithms can be found in
seriation package.
You can also reorder the matrix “manually” via function corrMatOrder()
.
"AOE"
is for the angular order of the eigenvectors. It is calculated from
the order of the angles \(a_i\),
\[ a_i = \begin{cases} \tan (e_{i2}/e_{i1}), & \text{if $e_{i1}>0$;} \newline \tan (e_{i2}/e_{i1}) + \pi, & \text{otherwise.} \end{cases} \]
where \(e_1\) and \(e_2\) are the largest two eigenvalues of the correlation matrix. See Michael Friendly (2002) for details.
"FPC"
for the first principal component order.
"hclust"
for hierarchical clustering order, and "hclust.method"
for the
agglomeration method to be used. "hclust.method"
should be one of
"ward"
, "single"
, "complete"
, "average"
, "mcquitty"
, "median"
or
"centroid"
.
"alphabet"
for alphabetical order.
corrplot(M, order = "AOE")
corrplot(M, order = "hclust")
corrplot(M, order = "FPC")
corrplot(M, order = "alphabet")
If using "hclust"
, corrplot()
can draw rectangles around the chart of corrrlation matrix based on the results of hierarchical clustering.
corrplot(M, order = "hclust", addrect = 2)
corrplot(M, order = "hclust", addrect = 3)
# Change background color to lightblue
corrplot(M, type = "upper", order = "hclust",
col = c("black", "white"), bg = "lightblue")
As shown in the above section, the color of the correlogram can be customized.
The function colorRampPalette()
is very convenient for generating color
spectrum.
col1 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "white",
"cyan", "#007FFF", "blue", "#00007F"))
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582",
"#FDDBC7", "#FFFFFF", "#D1E5F0", "#92C5DE",
"#4393C3", "#2166AC", "#053061"))
col3 <- colorRampPalette(c("red", "white", "blue"))
col4 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "#7FFF7F",
"cyan", "#007FFF", "blue", "#00007F"))
whiteblack <- c("white", "black")
## using these color spectra
corrplot(M, order = "hclust", addrect = 2, col = col1(100))
corrplot(M, order = "hclust", addrect = 2, col = col2(50))
corrplot(M, order = "hclust", addrect = 2, col = col3(20))
corrplot(M, order = "hclust", addrect = 2, col = col4(10))
corrplot(M, order = "hclust", addrect = 2, col = whiteblack, bg = "gold2")
You can also use the standard color palettes (package grDevices
)
corrplot(M, order = "hclust", addrect = 2, col = heat.colors(100))
corrplot(M, order = "hclust", addrect = 2, col = terrain.colors(100))
corrplot(M, order = "hclust", addrect = 2, col = cm.colors(100))
corrplot(M, order = "hclust", addrect = 2, col = gray.colors(100))
Other option would be to use RcolorBrewer
package.
library(RColorBrewer)
corrplot(M, type = "upper", order = "hclust",
col = brewer.pal(n = 8, name = "RdBu"))
corrplot(M, type = "upper", order = "hclust",
col = brewer.pal(n = 8, name = "RdYlBu"))
corrplot(M, type = "upper", order = "hclust",
col = brewer.pal(n = 8, name = "PuOr"))
Parameter cl.*
is for color legend, and tl.*
if for text legend. For the
text label, tl.col
(text label color) and tl.srt
(text label string
rotation) are used to change text colors and rotations.
Here are some examples.
## remove color legend and text legend
corrplot(M, order = "AOE", cl.pos = "n", tl.pos = "n")
## bottom color legend, diagonal text legend, rotate text label
corrplot(M, order = "AOE", cl.pos = "b", tl.pos = "d", tl.srt = 60)
## a wider color legend with numbers right aligned
corrplot(M, order = "AOE", cl.ratio = 0.2, cl.align = "r")
## text labels rotated 45 degrees
corrplot(M, type = "lower", order = "hclust", tl.col = "black", tl.srt = 45)
corrplot(abs(M),order = "AOE", col = col3(200), cl.lim = c(0, 1))
## visualize a matrix in [-100, 100]
ran <- round(matrix(runif(225, -100,100), 15))
corrplot(ran, is.corr = FALSE, method = "square")
## a beautiful color legend
corrplot(ran, is.corr = FALSE, method = "ellipse", cl.lim = c(-100, 100))
If your matrix is rectangular, you can adjust the aspect ratio with the
win.asp
parameter to make the matrix rendered as a square.
ran <- matrix(rnorm(70), ncol = 7)
corrplot(ran, is.corr = FALSE, win.asp = .7, method = "circle")
By default, corrplot renders NA values as "?"
characters. Using na.label
parameter, it is possible to use a different value (max. two characters are
supported).
M2 <- M
diag(M2) = NA
corrplot(M2)
corrplot(M2, na.label = "o")
corrplot(M2, na.label = "NA")
Since version 0.78
, it is possible to use
plotmath
expression in variable names. To activate plotmath rendering, prefix your label
with one of the characters ":"
, "="
or "$"
.
M2 <- M[1:5,1:5]
colnames(M2) <- c("alpha", "beta", ":alpha+beta", ":a[0]", "=a[beta]")
rownames(M2) <- c("alpha", "beta", NA, "$a[0]", "$ a[beta]")
corrplot(M2)
res1 <- cor.mtest(mtcars, conf.level = .95)
res2 <- cor.mtest(mtcars, conf.level = .99)
## specialized the insignificant value according to the significant level
corrplot(M, p.mat = res1$p, sig.level = .2)
corrplot(M, p.mat = res1$p, sig.level = .05)
corrplot(M, p.mat = res1$p, sig.level = .01)
## leave blank on no significant coefficient
corrplot(M, p.mat = res1$p, insig = "blank")
## add p-values on no significant coefficient
corrplot(M, p.mat = res1$p, insig = "p-value")
## add all p-values
corrplot(M, p.mat = res1$p, insig = "p-value", sig.level = -1)
## add cross on no significant coefficient
corrplot(M, p.mat = res1$p, order = "hclust", insig = "pch", addrect = 3)
corrplot(M, low = res1$lowCI, upp = res1$uppCI, order = "hclust",
rect.col = "navy", plotC = "rect", cl.pos = "n")
corrplot(M, p.mat = res1$p, low = res1$lowCI, upp = res1$uppCI,
order = "hclust", pch.col = "red", sig.level = 0.01,
addrect = 3, rect.col = "navy", plotC = "rect", cl.pos = "n")
res1 <- cor.mtest(mtcars, conf.level = .95)
corrplot(M, p.mat = res1$p, insig = "label_sig",
sig.level = c(.001, .01, .05), pch.cex = .9, pch.col = "white")
corrplot(M, p.mat = res1$p, method = "color",
insig = "label_sig", pch.col = "white")
corrplot(M, p.mat = res1$p, method = "color", type = "upper",
sig.level = c(.001, .01, .05), pch.cex = .9,
insig = "label_sig", pch.col = "white", order = "AOE")
corrplot(M, p.mat = res1$p, insig = "label_sig", pch.col = "white",
pch = "p<.05", pch.cex = .5, order = "AOE")
# matrix of the p-value of the correlation
p.mat <- cor.mtest(mtcars)$p
head(p.mat[, 1:5])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.000000e+00 6.112687e-10 9.380327e-10 1.787835e-07 1.776240e-05
## [2,] 6.112687e-10 0.000000e+00 1.802838e-12 3.477861e-09 8.244636e-06
## [3,] 9.380327e-10 1.802838e-12 0.000000e+00 7.142679e-08 5.282022e-06
## [4,] 1.787835e-07 3.477861e-09 7.142679e-08 0.000000e+00 9.988772e-03
## [5,] 1.776240e-05 8.244636e-06 5.282022e-06 9.988772e-03 0.000000e+00
## [6,] 1.293959e-10 1.217567e-07 1.222320e-11 4.145827e-05 4.784260e-06
# Specialized the insignificant value according to the significant level
corrplot(M, type = "upper", order = "hclust",
p.mat = p.mat, sig.level = 0.01)
# Leave blank on no significant coefficient
corrplot(M, type = "upper", order = "hclust",
p.mat = p.mat, sig.level = 0.01, insig = "blank")
In the above figure, correlations with p-value > 0.01 are considered as
insignificant. In this case the correlation coefficient values are leaved blank
or crosses are added.
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color", col = col(200),
type = "upper", order = "hclust", number.cex = .7,
addCoef.col = "black", # Add coefficient of correlation
tl.col = "black", tl.srt = 90, # Text label color and rotation
# Combine with significance
p.mat = p.mat, sig.level = 0.01, insig = "blank",
# hide correlation coefficient on the principal diagonal
diag = FALSE)
Note: Some of the plots were taken from this blog.
# generating large feature matrix (cols=features, rows=samples)
num_features <- 60 # how many features
num_samples <- 300 # how many samples
DATASET <- matrix(runif(num_features * num_samples),
nrow = num_samples, ncol = num_features)
# setting some dummy names for the features e.g. f23
colnames(DATASET) <- paste0("f", 1:ncol(DATASET))
# let's make 30% of all features to be correlated with feature "f1"
num_feat_corr <- num_features * .3
idx_correlated_features <- as.integer(seq(from = 1,
to = num_features,
length.out = num_feat_corr))[-1]
for (i in idx_correlated_features) {
DATASET[,i] <- DATASET[,1] + runif(num_samples) # adding some noise
}
corrplot(cor(DATASET), diag = FALSE, order = "FPC",
tl.pos = "td", tl.cex = 0.5, method = "color", type = "upper")