In the interactive manhattan, Q-Q and volcano plots below, we use only a subset (chromosomes 4 to 7) of the HapMap data included in this package.
Manhattan, Q-Q and volcano plots are popular graphical methods for visualizing results from high-dimensional data analyses such as (epi)genome-wide association studies (GWAS or EWAS). In a manhattan plot, p-values, Z-scores or other test statistics are plotted on a scatter plot against their genomic position, which helps visualize potential regions of interest in the genome that are associated with a phenotype. Q-Q plots compare the observed test statistics with the distribution expected under the null hypothesis, and so tell us about the distributional assumptions of the observed test statistics. Volcano plots show the negative log10 p-values plotted against their effect size, odds ratio or log fold-change. They are used to identify clinically meaningful markers in genomic experiments, i.e., markers that are statistically significant and have an effect size greater than some threshold.
Interactive manhattan, Q-Q and volcano plots allow the inspection of specific values (e.g. rs number or gene name) by hovering the mouse over a point, as well as zooming into a region of the genome (e.g. a chromosome) by dragging a rectangle around the relevant area.
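As a rough illustration of the quantities described above, the following base R sketch computes the y-axis values shared by manhattan and volcano plots and flags the volcano-plot points that pass both cutoffs. The data are simulated, and the names p_cutoff and effect_cutoff and their values are assumptions chosen for illustration; they are not defaults of this package.

```r
# Simulated results for illustration only (not the HapMap data)
set.seed(42)
n <- 1000
p <- runif(n)                  # p-values under the null hypothesis
effect <- rnorm(n, sd = 0.5)   # hypothetical effect sizes

# Manhattan and volcano plots both show -log10(p) on the y-axis,
# so smaller p-values appear higher on the plot
logp <- -log10(p)

# A volcano plot flags markers that are both statistically significant
# and large in magnitude; both cutoffs here are arbitrary
p_cutoff <- 0.01
effect_cutoff <- 1
flagged <- logp > -log10(p_cutoff) & abs(effect) > effect_cutoff

table(flagged)
```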
You can install manhattanly from CRAN:
install.packages("manhattanly")
Alternatively, you can install the development version of manhattanly from GitHub with:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("sahirbhatnagar/manhattanly")
The manhattanly package ships with an example dataset called HapMap. See help(HapMap) for more details about how this dataset was created. Here is what the HapMap dataset looks like:
# load the manhattanly library
library(manhattanly)
head(HapMap)
##   CHR      BP         P        SNP ZSCORE EFFECTSIZE    GENE DISTANCE
## 1   1  937641 0.3353438  rs9697358 0.9634    -0.0946   ISG15     1068
## 2   1 1136887 0.2458571 rs34945898 1.1605    -0.0947 TNFRSF4        0
## 3   1 2116240 0.8232859 rs12034613 0.2233    -0.0741  FP7162        0
## 4   1 2310562 0.4932038  rs4648633 0.6852     0.0146   MORN1        0
## 5   1 2681715 0.6053916  rs4430271 0.5167     0.1234   MMEL1   127427
## 6   1 2917484 0.1944431  rs6685625 1.2975     0.1979  ACTRT2    10421
dim(HapMap)
## [1] 14412     8
The required columns to create a manhattan plot are the chromosome, base-pair position and p-value. By default, the manhattanly function assumes these columns are named CHR, BP and P (but these can be specified by the user if they are different).
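When the columns have other names, they can be pointed to explicitly. The sketch below assumes the column-name arguments are chr, bp and p (check help(manhattanr) to confirm); the results data frame and its column names are hypothetical and filled with simulated values.

```r
library(manhattanly)

# Hypothetical results table whose column names differ from the defaults
set.seed(1)
results <- data.frame(
  chrom = rep(1:3, each = 50),   # chromosome, stored as a number
  position = rep(seq(1e5, 5e6, length.out = 50), times = 3),
  pval = runif(150)
)

# Tell manhattanly which columns hold the chromosome, position and p-value
manhattanly(results, chr = "chrom", bp = "position", p = "pval")
```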
Create an interactive manhattan plot using one command:
manhattanly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE")
The arguments snp = "SNP" and gene = "GENE" specify that we want to add SNP and gene information to each point. This information is found in the columns named "SNP" and "GENE" in the HapMap dataset. See help(manhattanly) for a full list of options.
Similarly, we can create an interactive Q-Q plot using one command (see help(qqly) for a full list of options):
qqly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE")
You can then save the plot as a .png file by clicking on the camera icon in the toolbar (which appears when you hover your mouse over the plot).
You can also make a volcano plot, which by default highlights the points that exceed both the default genomewideline and effect_size_line thresholds:
volcanoly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE")
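The two thresholds can also be set explicitly. This is a minimal sketch, assuming genomewideline is given on the -log10(p) scale and effect_size_line is a vector holding the lower and upper effect-size cutoffs; the values below are arbitrary, so see help(volcanoly) for the actual defaults.

```r
library(manhattanly)

hapmap_subset <- subset(HapMap, CHR %in% 4:7)

# genomewideline is on the -log10(p) scale; effect_size_line gives the
# lower and upper effect-size cutoffs. Both values below are arbitrary.
volcanoly(hapmap_subset, snp = "SNP", gene = "GENE",
          genomewideline = -log10(1e-2),
          effect_size_line = c(-0.5, 0.5))
```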
We can also highlight SNPs of interest using the highlight argument. This package comes with a list of SNPs of interest called significantSNP (see help(significantSNP) for more details). To highlight these SNPs we simply pass this vector to the highlight argument (note that these SNPs need to be present in the "SNP" column of your data):
manhattanly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE", highlight = significantSNP)
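Any character vector of SNP names can be used in the same way. The sketch below builds one from the data itself by taking the five smallest p-values; the choice of five and the names hapmap_subset and top_snps are arbitrary and only for illustration.

```r
library(manhattanly)

hapmap_subset <- subset(HapMap, CHR %in% 4:7)

# Pick the five SNPs with the smallest p-values (an arbitrary choice)
top_snps <- as.character(head(hapmap_subset$SNP[order(hapmap_subset$P)], 5))

# Every highlighted SNP must be present in the "SNP" column
stopifnot(all(top_snps %in% hapmap_subset$SNP))

manhattanly(hapmap_subset, snp = "SNP", gene = "GENE", highlight = top_snps)
```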
You can add up to 4 annotations. In the following plot we add the snp, gene, the distance to the nearest gene and the effect size:
manhattanly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE",
            annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE",
            highlight = significantSNP)
The annotations in the previous plots only appear when we hover the mouse over a point. Once we have identified one or a few SNPs of interest, we may want to show their annotation information explicitly and save the plot. The output of the manhattanly function is a plotly object which can be further manipulated using the %>% operator from the magrittr package:
# the %>% operator used below comes from magrittr
library(magrittr)
p <- manhattanly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE",
                 annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE",
                 highlight = significantSNP)
# get the x and y coordinates from the pre-processed data
plotData <- manhattanr(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE")[["data"]]
# annotate the smallest p-value
annotate <- plotData[which.min(plotData$P),]
# x and y coordinates of SNP with smallest p-value
xc <- annotate$pos
yc <- annotate$logp
p %>% plotly::layout(annotations = list(
  list(x = xc, y = yc,
       text = paste0(annotate$SNP, "<br>", "GENE: ", annotate$GENE),
       font = list(family = "serif", size = 10))))
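To keep an interactive copy of a plot outside of R, one option is to write it to a standalone HTML file. This is a minimal sketch, assuming the object returned by manhattanly is an htmlwidget that htmlwidgets::saveWidget can write; the output file name is arbitrary.

```r
library(manhattanly)

p <- manhattanly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE")

# Write the interactive plot to an HTML file (file name is arbitrary).
# selfcontained = FALSE stores the JavaScript dependencies in a folder
# next to the HTML file instead of bundling them into it.
out_file <- file.path(tempdir(), "manhattan.html")
htmlwidgets::saveWidget(p, file = out_file, selfcontained = FALSE)
```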
R Markdown
R Markdown is an R software package that allows the creation of dynamic documents, i.e., documents that embed R code with text to create fully reproducible reports. Furthermore, it allows easy creation of HTML reports (such as this vignette) without knowing how to code in HTML. This means you can embed interactive manhattan and Q-Q plots in HTML reports using the manhattanly package. For example, to embed the above manhattan plot I included the following code chunk in the .Rmd file:
manhattanly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE")
The manhattanly package separates the data pre-processing from the rendering of the plot object (inspired by the heatmaply package by Tal Galili). The pre-processing is done by the manhattanr and qqr functions:
# create an object of class `manhattanr`
manhattanrObject <- manhattanr(HapMap)
# what's in there
str(manhattanrObject)
## List of 10
## $ data :'data.frame': 14412 obs. of 6 variables:
## ..$ CHR : int [1:14412] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ BP : int [1:14412] 937641 1136887 2116240 2310562 2681715 2917484 2942700 3298358 3501155 3676178 ...
## ..$ P : num [1:14412] 0.335 0.246 0.823 0.493 0.605 ...
## ..$ logp : num [1:14412] 0.4745 0.6093 0.0844 0.307 0.218 ...
## ..$ pos : num [1:14412] 937641 1136887 2116240 2310562 2681715 ...
## ..$ index: num [1:14412] 1 1 1 1 1 1 1 1 1 1 ...
## $ xlabel : chr "Chromosome"
## $ ticks : num [1:23] 1.24e+08 3.68e+08 5.88e+08 7.83e+08 9.68e+08 ...
## $ labs : int [1:23] 1 2 3 4 5 6 7 8 9 10 ...
## $ nchr : int 23
## $ pName : chr "P"
## $ snpName : logi NA
## $ geneName : logi NA
## $ annotation1Name: logi NA
## $ annotation2Name: logi NA
## - attr(*, "class")= chr "manhattanr"
# the data used for plotting is a data.frame
# this data.frame can be used with any graphics function such as in base R
# you do not need to use plotly
head(manhattanrObject[["data"]])
##   CHR      BP         P       logp     pos index
## 1   1  937641 0.3353438 0.47450973  937641     1
## 2   1 1136887 0.2458571 0.60931719 1136887     1
## 3   1 2116240 0.8232859 0.08444933 2116240     1
## 4   1 2310562 0.4932038 0.30697357 2310562     1
## 5   1 2681715 0.6053916 0.21796358 2681715     1
## 6   1 2917484 0.1944431 0.71120743 2917484     1
is.data.frame(manhattanrObject[["data"]])
## [1] TRUE
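Because the pre-processed data is a plain data.frame, it can be drawn without plotly. The following base R sketch uses only the fields visible in the structure listed above (data, xlabel, ticks and labs); the colours and point sizes are arbitrary choices.

```r
library(manhattanly)

manhattanrObject <- manhattanr(HapMap)
d <- manhattanrObject[["data"]]

# Alternate two colours by chromosome so neighbouring chromosomes are distinct
point_col <- c("grey30", "steelblue")[d$CHR %% 2 + 1]

plot(d$pos, d$logp, col = point_col, pch = 20, cex = 0.6,
     xaxt = "n", xlab = manhattanrObject[["xlabel"]],
     ylab = "-log10(p)")

# Label the x-axis with the tick positions and labels stored in the object
axis(1, at = manhattanrObject[["ticks"]], labels = manhattanrObject[["labs"]])
```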
This manhattanrObject, which is of class manhattanr, can also be passed to the manhattanly function (we omit the plot here to keep the rendered vignette small):
manhattanly(manhattanrObject)
We can specify more annotations in the data using the snp, gene, annotation1 and annotation2 arguments:
# create an object of class `manhattanr`
manhattanrObject <- manhattanr(HapMap, snp = "SNP", gene = "GENE",
                               annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE")
# the annotation columns are now part of the data.frame
head(manhattanrObject[["data"]])
##   CHR      BP         P        SNP    GENE DISTANCE EFFECTSIZE       logp
## 1   1  937641 0.3353438  rs9697358   ISG15     1068    -0.0946 0.47450973
## 2   1 1136887 0.2458571 rs34945898 TNFRSF4        0    -0.0947 0.60931719
## 3   1 2116240 0.8232859 rs12034613  FP7162        0    -0.0741 0.08444933
## 4   1 2310562 0.4932038  rs4648633   MORN1        0     0.0146 0.30697357
## 5   1 2681715 0.6053916  rs4430271   MMEL1   127427     0.1234 0.21796358
## 6   1 2917484 0.1944431  rs6685625  ACTRT2    10421     0.1979 0.71120743
##       pos index
## 1  937641     1
## 2 1136887     1
## 3 2116240     1
## 4 2310562     1
## 5 2681715     1
## 6 2917484     1
is.data.frame(manhattanrObject[["data"]])
## [1] TRUE
Similarly, the data used for the Q-Q plot can be created using the qqr function:
qqrObject <- qqr(HapMap)
str(qqrObject)
## List of 6
## $ data :'data.frame': 14412 obs. of 3 variables:
## ..$ P : num [1:14412] 6.75e-10 3.41e-09 3.95e-09 4.71e-09 5.02e-09 ...
## ..$ OBSERVED: num [1:14412] 9.17 8.47 8.4 8.33 8.3 ...
## ..$ EXPECTED: num [1:14412] 4.46 3.98 3.76 3.61 3.51 ...
## $ pName : chr "P"
## $ snpName : logi NA
## $ geneName : logi NA
## $ annotation1Name: logi NA
## $ annotation2Name: logi NA
## - attr(*, "class")= chr "qqr"
head(qqrObject[["data"]])
##                P OBSERVED EXPECTED
## 4346 6.75010e-10 9.170690 4.459754
## 4347 3.41101e-09 8.467117 3.982633
## 4344 3.95101e-09 8.403292 3.760784
## 4338 4.70701e-09 8.327255 3.614656
## 4342 5.02201e-09 8.299122 3.505512
## 4341 6.22801e-09 8.205651 3.418362
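The EXPECTED values above are consistent with the usual Q-Q construction for p-values, in which the sorted observed -log10 p-values are compared with -log10 of evenly spaced plotting positions. The base R sketch below reproduces the first two EXPECTED values with stats::ppoints; that qqr computes them this way internally is an assumption, not something confirmed here.

```r
n <- 14412  # number of SNPs in HapMap, from the dimensions shown earlier

# Expected -log10(p) under a uniform null, largest first
expected <- -log10(ppoints(n))
round(head(expected, 2), 6)
## [1] 4.459754 3.982633

# Observed values are the -log10 p-values sorted in decreasing order;
# the smallest p-value listed above gives the first OBSERVED entry
-log10(6.75010e-10)
```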
This qqrObject, which is of class qqr, can also be passed to the qqly function (we omit the plot here to keep the rendered vignette small):
qqly(qqrObject)