Getting started with wrGraph

Wolfgang Raffelsberger

2020-07-09

Introduction

This package contains a collection of various plotting tools, mostly as an extension of packages wrMisc and wrProteo.

To get started, we need to load the package wrMisc and wrProteo (this package), all are available from CRAN.

library("wrMisc")
library("wrGraph")
# This is version no:
packageVersion("wrGraph")
#> [1] '1.0.2'

Prepare Layout for Accomodating Multiple Smaller Plots

The function partitionPlot() prepares a matrix to serve as grid for segmenting the current device (ie the available plotting region). It aims to optimize the layout based on a given number of plots to accomodate. The user may choose if the layout should rather be adopted to lanscape (default) or portrait geometry.

## as the last column of the iris-data is not numeric we choose -1
(part <- partitionPlot(ncol(iris)-1))
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
layout(part)
for(i in colnames(iris)[-5]) hist(iris[,i],main=i)

One More Histogram Function

Histograms are a very versitile tool to rapidly gain insights in the distribution of data. This package presents a histogram function allowing to convienently work with log2-data and (if desired) display numbers calculated back to linear values on the x-axis.

set.seed(2016); dat1 <- round(c(rnorm(200,6,0.5),rlnorm(300,2,0.5),rnorm(100,17)),2)
dat1 <- dat1[which(dat1 <50 & dat1 > 0.2)]
histW(dat1,br="FD",isLog=FALSE,silent=TRUE)

## view as log, but x-scale in linear
histW(log2(dat1),br="FD",isLog=TRUE,silent=TRUE)

Now we can combine this with the previous segmentation :

## quick overview of distributions  
layout(partitionPlot(4))
for(i in 1:4) histW(iris[,i],isLog=FALSE,tit=colnames(iris)[i])

Small Histogram(s) as Legend

layout(1)
plot(iris[,1:2],main="Iris data")
legendHist(iris[,1],loc="br",legTit=colnames(iris)[1],cex=0.5)
legendHist(iris[,2],loc="tl",legTit=colnames(iris)[2],cex=0.5)

Violin Plots

Violin plots or vioplots are basically an adaptation of plotting the Kernel density estimation. Compared to the ‘original’ vioplots in R from package vioplot, the function provided here offers more flexibility for data-formats accepted (inlcuding data.frames and lists), coloring and display of n. In the case of the iris-data, there are no NAs and thus n is constant, thus the number of values (n) will be displayed only once. However, when working with data-sets containing NAs, or simply when working with lists the number of values per data-set/violin n may vary.

vioplotW(iris[,-5],tit="Iris-data")

Plotting Sorted Values (‘Summed Frequency’)

This plot offers an alterantive to histograms and density-plots. While histograms and density-plots are very intuitive, their interpretation may pose some difficulties due to the smoothing effect of Kernel-functions or the non-trvial choice of optimal width of bars (histigram) may influence interpretation.

As alternative, a plot is presented which basically reads like a summed frequency plot and has the main advantage, that all points of data may be easily displayed. Thus the resultant plot does not suffer from deformation due to binning or smoothing and offers maximal ‘resolution’.

cumFrqPlot(iris[,1:4])

For example, the iris-data are rounded. As a result in the plot above the line is not progessing smooth but with more marked charcater of steps of stairs. To get the same conclusion one would need to increase the number of bars in a histogram very much which would makes it in our experience more difficult to avaluate the same time the global distribution character.

At this plot you may note that the curves patal.width and petal.length look differently. On the previous vioplots you may have noticed the bimodal character of the values, again this plot may be helpful to identify distributios which are very difficult to see well using boxplots.

Examine Counts based on Variable Threshold Levels (for ROC curves)

The next plot is dedicated to visualize counting results with mooving thresholds. While a given threshold criteria moves up or down the resulting number of values passing may not necessarily follow in a linear way. This fucntion lets you follow muliple types of samples (eg type of leave in the iris-data) in a single plot. In particular when constructing ROC curves it may also be helpful to visualize the (absolute) counting data used underneith before determining TP and FP ratios. Typically used in context of benchmark-tests in proteomics.

thr <- seq(min(iris[,1:4]),max(iris[,1:4])+0.1,length.out=100)
irisC <- sapply(thr,function(x) colSums(iris[,1:4] < x))
irisC <- cbind(thr,t(irisC))

  head(irisC)
#>            thr Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 0.1000000            0           0            0           0
#> [2,] 0.1797980            0           0            0           5
#> [3,] 0.2595960            0           0            0          34
#> [4,] 0.3393939            0           0            0          41
#> [5,] 0.4191919            0           0            0          48
#> [6,] 0.4989899            0           0            0          48
staggerdCountsPlot(irisC[,],countsCol=colnames(iris)[1:4],tit="Iris-data")

staggerdCountsPlot(irisC[,],varCountNa="Sepal",tit="Iris-data")

staggerdCountsPlot(irisC[,],varCountNa="Sepal",tit="Iris-data (log-scale)",logScale=TRUE)

Compare Two Groups with Sub-Organisation Each

In real-world testing data have often some nested structure. For example repeated measures from a set of patients which can be organized as diseased and non-diseased. This plot allows to plot all values obtained from the each patient together, then organized by disease-groups.

For this example suppose the iris-data were organized as 10 sets of 5 measures each (of course, in this case it is a pure hypothesis). Then, we can plot while highlighting the two factors (ie species and set of measurement). Basically we need to furnish with the main data two additional factors for the groupings. Note, that the 1st factor should contain the smaller sub-groups to visually inspect if there are any batch effects. This plot is not well adopted to big data (it will get too crowded).

dat <- iris[which(iris$Species %in% c("setosa","versicolor")),]
plotBy2Groups(dat$Sepal.Length,gl(2,50,labels=c("setosa","versicolor")),
  gl(20,5),yLab="Sepal.Length")

Plotting linear regression and confidence intervals

The function plotLinReg() provides help to display a series of bivariate points given in ‘dat’ (multiple data formats possible), to model a linear regression-plots and plot the results.

plotLinReg(iris$Sepal.Length, iris$Petal.Width, tit="Iris-data")

Principal Components Analysis (PCA)

Principal components analysis, PCA, is a very powerful method to investigate similarity and correlation in larger sets of data.
Please note that several implementations exist in R (eg prcomp() in the base package stats or the package FactoMineR). We’ll start by looking at the plot produced with the basic function and FactoMineR, too.

Let’s look at the similarity of the 3 iris-species from the iris data-set.

## the basic way
iris.prc <- prcomp(iris[,1:4], scale.=TRUE)
biplot(iris.prc)              # traditional plot

## via FactoMineR
library(FactoMineR); library(dplyr); library(factoextra)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
iris.Fac <- PCA(iris[,1:4],scale.unit=TRUE, graph=FALSE)
fviz_pca_ind(iris.Fac, geom.ind="point", col.ind=iris$Species, palette=c(2,4,3), 
  addEllipses=TRUE, legend.title="Groups" )

However, some sets of points do not always follow elliptic shapes. Note, that FactoMineR represented the 2nd principal component upside-down compared to the very first PCA figure. To facilitate comparisons, the function plotPCAw has an argument allowing to rotate/flip any princial component axis.

With more crowded data-sets it may be useful to rather highlight the more dense regions. For this reason this package proposes to use bagplots to highlight the region with 50% of data-points (in analogy to boxplots), a simple line draws the contour of the most distant points.

## via wrGraph, similar to FactoMineR but with bagplots
plotPCAw(t(as.matrix(iris[,-5])),gl(3,50,labels=c("setosa","versicolor","virginica")),
  tit="Iris data", rowTyName="types of leaves", suplFig=FALSE,cexTxt=1.3,rotatePC=2)

Thus, you can see in this case there is some intersection between versicolor and virginica species, but the center regions stay apart. Of course, similar to boxplots, this representation won’t work well with multi-modal distributions within one group of points. Luckily this happens only very rarely with principal components.

You can also add the 3rd principal component and the Scree-plot :

## including 3rd component and Screeplot
plotPCAw(t(as.matrix(iris[,-5])),gl(3,50,labels=c("setosa","versicolor","virginica")),
  tit="Iris data", rowTyName="types of leaves",cexTxt=2)

MA-plot

The aim of MA-plots is to display a relative change on one axis (ordinate) while showing the absolute (mean) value on the other axis (abscissa). This plot is very useful when inspecting larger data-sets for random or systematic effects, numerous implementations for specific applications exist (eg plotMA in DEseq2). The version presented here is rather generic and uses transparent points to avoid getting plots too crowded.

## let's generate some toy data
set.seed(2005); mat <- matrix(round(runif(2400),3),ncol=6)
mat[11:90,4:6] <- mat[11:90,4:6] +round(abs(rnorm(80)),3)
mat[11:90,] <- mat[11:90,] +0.3
dimnames(mat) <- list(paste("li",1:nrow(mat),sep="_"),paste(rep(letters[1:2],each=3),1:6,sep=""))
## assume 2 groups with 3 samples each
matMeans <- round(cbind(A=rowMeans(mat[,1:3]),B=rowMeans(mat[,4:6])),4)
MAplotW(matMeans[,1]-matMeans[,2],rowMeans(mat)) 

The function MAplotW may also be used to conveniently inspect results from t-tests performed with the package wrMisc or from the very popular package limma.

It is possible to add lines for a given fold-change threshold (which is meant to be on linear data ratios, thus 1.5 will draw a lines at +0.58 and -0.58).

## assume 2 groups with 3 samples each and run moderated t-test (from package 'limma')
tRes <- wrMisc::moderTest2grp(mat,gl(2,3),addResults=c("FDR","Mval","means"))
## 
## convenient way, add fold-change threshold and mark who is beyond
MAplotW(tRes,FCth=1.5,cexLa=1)    
#>  n=400  FCthrs=1.5  filt=400  passFC=56  range Mva -2.72 and 0.594  alph=0.25  useCex=0.93  alph2=0.3

In the last plot one can see easily the 80 points that have been increased in the second group, many of them exceed a (based on linear data) fold-change of 1.5 (which on log2 scale apprears at 0.58).

Standalone html Page with Plot and Mouse-Over Interactive Features

The idea of this function dates before somehow similar applications were made possible using Shiny. In this case the origial idea was simply to provide help to identify points in a plot and display via mouse-over labels to display and by providing clickable links. To do so, very simple html documents are created which display a separately saved image and have a section indicating at which location of the figure which information should be displayed as mouse-over and providing the clickable links. In high-throughput biology many times reserachers want quickly know which protein or gene a given point in a graphic correpsonds and being able to get further information on this protein or gene via links to UniProt or GenBank.

As this functionality was made for separate html documents it’s output cannot be easily integrated into a vignette like this one. To generate such a simple mouse-over interactive html document, the user is invited to run the code of this vignette. At the last step you could ask R to open the interactive html page in the default broswer.

Concerning the path provided in the argument pngFileNa of the function mouseOverHtmlFile: In a real world case, you migth want to choose other locations to save files rather than R-temp which will be deleted when closing your instance of R. It is also possible to work with relative paths (eg by giving the png-filename without path). Then the resultant html will simply require the png-file to be in the same directory as the html itself, and similary for the clickable links. This results in files that are very easy to distribute to other people, in particular if the clickable links are pointing to the internet, however, the pnh-file always needs to be in the same directory as the html … If you have the possibility to make the png-file accessible through a url, you could also provide this url.

The example below shows usage when specifying absolute paths. Please note that the resulting html will not display the image if your browser cannot access the image any more.

## Let's make some toy data 
df1 <- data.frame(id=letters[1:10],x=1:10,y=rep(5,10),mou=paste("point",letters[1:10]),
  link=file.path(tempdir(),paste(LETTERS[1:10],".html",sep="")),stringsAsFactors=FALSE)  
## here we'll use R's tempdir, later you may want to choose other locations
pngFile <- file.path(tempdir(),"test01.png")
png(pngFile,width=800, height=600,res=72)
## here we'll just plot a set of horiontal points ...
plot(df1[,2:3],las=1,main="test01")
dev.off()
#> png 
#>   2
## Note : Special characters should be converted for proper display in html during mouse-over
library(wrMisc)
df1$mou <- htmlSpecCharConv(df1$mou)
## Let's add the x- and y-coordiates of the points in pixels to the data.frame
df1 <- cbind(df1,convertPlotCoordPix(x=df1[,2],y=df1[,3],plotD=c(800,600),plotRes=72))
#>  ->  convertPlotCoordPix : supl params : mar=6.2,4,4,2  dim=800x600   res=72
head(df1)
#>   id x y     mou                                                      link xPix
#> 1  a 1 5 point a C:\\Users\\wraff\\AppData\\Local\\Temp\\RtmpSKVyRE/A.html   81
#> 2  b 2 5 point b C:\\Users\\wraff\\AppData\\Local\\Temp\\RtmpSKVyRE/B.html  155
#> 3  c 3 5 point c C:\\Users\\wraff\\AppData\\Local\\Temp\\RtmpSKVyRE/C.html  229
#> 4  d 4 5 point d C:\\Users\\wraff\\AppData\\Local\\Temp\\RtmpSKVyRE/D.html  303
#> 5  e 5 5 point e C:\\Users\\wraff\\AppData\\Local\\Temp\\RtmpSKVyRE/E.html  377
#> 6  f 6 5 point f C:\\Users\\wraff\\AppData\\Local\\Temp\\RtmpSKVyRE/F.html  451
#>   yPix
#> 1  284
#> 2  284
#> 3  284
#> 4  284
#> 5  284
#> 6  284
## Now make the html-page allowing to display mouse-over to the png made before
htmFile <- file.path(tempdir(),"test01.html")
mouseOverHtmlFile(df1,pngFile,HtmFileNa=htmFile,pxDiam=15,
  textAtStart="Points in the figure are interactive to mouse-over ...",
  textAtEnd="and/or may contain links")
## We still need to make some toy links
for(i in 1:nrow(df1)) cat(paste("point no ",i," : ",df1[i,1]," x=",df1[i,2]," y=",
  df1[i,3],sep=""), file=df1$link[i]) 
## Now we are ready to open the html file using any browser ..
#from within R# browseURL(htmFile) 

Thank you for you interest in this package. This package is still under development, new functions will be added to the next version.

Session-Info

#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                   LC_CTYPE=French_France.1252   
#> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=French_France.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.0      factoextra_1.0.7 ggplot2_3.3.2    FactoMineR_2.3  
#> [5] wrGraph_1.0.2    wrMisc_1.2.4    
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0     xfun_0.15            purrr_0.3.4         
#>  [4] haven_2.3.1          lattice_0.20-41      carData_3.0-4       
#>  [7] tcltk_4.0.2          colorspace_1.4-1     vctrs_0.3.1         
#> [10] generics_0.0.2       htmltools_0.5.0      yaml_2.2.1          
#> [13] rlang_0.4.6          sm_2.2-5.6           pillar_1.4.4        
#> [16] ggpubr_0.4.0         foreign_0.8-80       glue_1.4.1          
#> [19] withr_2.2.0          RColorBrewer_1.1-2   readxl_1.3.1        
#> [22] lifecycle_0.2.0      stringr_1.4.0        cellranger_1.1.0    
#> [25] munsell_0.5.0        ggsignif_0.6.0       gtable_0.3.0        
#> [28] zip_2.0.4            leaps_3.1            evaluate_0.14       
#> [31] labeling_0.3         forcats_0.5.0        knitr_1.29          
#> [34] rio_0.5.16           curl_4.3             broom_0.5.6         
#> [37] Rcpp_1.0.5           scales_1.1.1         backports_1.1.8     
#> [40] flashClust_1.01-2    limma_3.44.3         scatterplot3d_0.3-41
#> [43] abind_1.4-5          farver_2.0.3         hms_0.5.3           
#> [46] digest_0.6.25        openxlsx_4.1.5       stringi_1.4.6       
#> [49] rstatix_0.6.0        ggrepel_0.8.2        grid_4.0.2          
#> [52] tools_4.0.2          magrittr_1.5         tibble_3.0.1        
#> [55] cluster_2.1.0        crayon_1.3.4         tidyr_1.1.0         
#> [58] car_3.0-8            pkgconfig_2.0.3      MASS_7.3-51.6       
#> [61] ellipsis_0.3.1       data.table_1.12.8    rmarkdown_2.3       
#> [64] R6_2.4.1             nlme_3.1-148         compiler_4.0.2