ACSNMineR is an R package, freely available.
ACSN stands for Atlas of Cancer Signaling Networks; it describes gene interactions in pathways relevant to cancer.
This package is designed for easy analysis of gene maps (either imported by the user from GMT files or built-in ACSN maps). It tests which pathways are statistically enriched or depleted in a user-supplied gene list, and provides graphical representations of the results.
GMT files can be imported with the format_from_gmt function. The example below uses a sample file shipped with the package:
# Retrieve path of the example gmt
file<-system.file("extdata", "cellcycle_short.gmt", package = "ACSNMineR")
# Then import it
gmt<-ACSNMineR::format_from_gmt(file)
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
---|---|---|---|---|---|---|---|---|---|
E2F1 | 19 | ATM | ATR | CHEK2 | CREBBP | TFDP1 | E2F1 | EP300 | HDAC1 |
MAPK | 9 | ATM | GSK3B | MAX | MDM2 | CDKN1A | TP53 | BCL2 | MYC |
MOMP_REGULATION | 13 | CDK1 | GSK3B | PPP1CA | PPP1CB | PPP1CC | TP53 | BBC3 | BCL2 |
HEDGEHOG | 9 | CREBBP | GSK3B | HDAC1 | CCNB1 | CCNB2 | CCNB3 | CCNC | BCL2 |
P21CIP | 3 | AKT1 | PCNA | CDKN1A | | | | | |
G2_CC_PHASE | 4 | CDK1 | CDC25C | WEE1 | CCNB1 | | | | |
CYCLIND | 9 | AKT1 | CDC37 | CDK4 | CDK6 | GSK3B | HSP90AA1 | CCND1 | CCND2 |
E2F4 | 8 | TFDP2 | E2F4 | E2F5 | HDAC1 | SIN3B | SUV39H1 | RBL1 | RBL2 |
CYCLINE | 7 | CDK2 | CDKN3 | CCNE1 | CCNE2 | RBL1 | RBL2 | CDKN1B | |
E2F6 | 13 | EHMT2 | TFDP1 | TFDP2 | E2F6 | E2F8 | EED | EHMT1 | EPC1 |
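For orientation, a GMT file is tab-separated with one gene set per line: the set name, a description field, then the genes. The base R sketch below parses a tiny made-up file into a shape similar to the table above (name, size, genes). The function `parse_gmt_sketch` and the file contents are hypothetical and only illustrate the idea; this is not the package's format_from_gmt implementation, and filling the second column with the gene count is an assumption based on the printed output.

```r
# Hypothetical two-line GMT content: name <TAB> description <TAB> genes...
gmt_lines <- c(
  "P21CIP\tna\tAKT1\tPCNA\tCDKN1A",
  "G2_CC_PHASE\tna\tCDK1\tCDC25C\tWEE1\tCCNB1"
)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_path)

# Illustrative parser (NOT ACSNMineR's own code)
parse_gmt_sketch <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  # Drop the name and description fields, keep the genes
  genes <- lapply(fields, function(x) x[-(1:2)])
  width <- max(lengths(genes))
  # Pad shorter gene lists with "" so every row has the same number of columns
  padded <- t(vapply(genes,
                     function(g) c(g, rep("", width - length(g))),
                     character(width)))
  colnames(padded) <- paste0("V", seq_len(width) + 2)
  data.frame(V1 = vapply(fields, `[`, character(1), 1),  # module name
             V2 = lengths(genes),                        # module size
             padded,
             stringsAsFactors = FALSE)
}

parsed <- parse_gmt_sketch(gmt_path)
print(parsed)
```

Shorter modules end up with empty trailing cells, which is why some rows in the table above have blanks at the end.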
ACSN maps are built in and can be accessed through ACSNMineR::ACSN_maps:
# Name of available maps:
names(ACSNMineR::ACSN_maps)
Apoptosis |
CellCycle |
DNA_repair |
EMT_motility |
Survival |
Master |
# Access a single map:
ACSNMineR::ACSN_maps$CellCycle
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
---|---|---|---|---|---|---|---|---|---|
APC | 15 | ANAPC1 | ANAPC2 | CDC27 | ANAPC4 | ANAPC5 | CDC16 | ANAPC7 | CDC23 |
APOPTOSIS_ENTRY | 10 | AKT1 | ATM | ATR | CHEK1 | CHEK2 | E2F1 | MDM2 | NBN |
CDC25 | 2 | CDC25C | CHEK2 | | | | | | |
CYCLINA | 6 | CDK2 | CCNA1 | CCNA2 | RBL1 | RBL2 | CDKN1B | | |
CYCLINB | 7 | ATM | CDK1 | MDM2 | PKMYT1 | CCNB1 | CCNB2 | CCNB3 | |
CYCLINC | 2 | CDK3 | CCNC | | | | | | |
The gene set that was used for tests is the following:
ACSNMineR::genes_test
ATM |
ATR |
CHEK2 |
CREBBP |
TFDP1 |
E2F1 |
EP300 |
HDAC1 |
KAT2B |
GTF2H1 |
GTF2H2 |
GTF2H2B |
Gene set enrichment for a single set can be performed by calling:
Example <- ACSNMineR::enrichment(ACSNMineR::genes_test,
    min_module_size = 10,
    threshold = 0.05,
    maps = list(cellcycle = ACSNMineR::ACSN_maps$CellCycle))
module | module_size | nb_genes_in_module | genes_in_module | universe_size | nb_genes_in_universe | p.value | p.value.corrected | test |
---|---|---|---|---|---|---|---|---|
APOPTOSIS_ENTRY | 10 | 4 | ATM ATR CHEK2 E2F1 | 227 | 12 | 0.0023144 | 0.011572 | greater |
E2F1 | 19 | 12 | ATM ATR CHEK2 CREBBP TFDP1 E2F1 EP300 HDAC1 KAT2B GTF2H1 GTF2H2 GTF2H2B | 227 | 12 | 0.0000000 | 0.000000 | greater |
Where:
genes_test is the character vector of genes to test
min_module_size is the minimal size a module must have to be taken into account
threshold is the maximal p-value that will be displayed in the results (all modules with p-values higher than threshold are removed)
maps is a list of maps, either built-in ACSN maps (here, the cell cycle map) or maps imported with the package's format_from_gmt() function
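The relation between the p.value and p.value.corrected columns can be checked by hand. In the APOPTOSIS_ENTRY row the corrected value is exactly five times the raw one, which is what a Bonferroni correction over five tested modules would give. The sketch below reproduces that arithmetic with base R's `p.adjust`. The correction method and the number of tested modules are inferred from the numbers in the table, not stated by the package, so treat both as assumptions.

```r
# Raw p-value for APOPTOSIS_ENTRY, copied from the results table above
p_raw <- 0.0023144

# Assumption: five modules were tested (inferred from 0.011572 / 0.0023144 = 5)
n_tested <- 5

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
p_corrected <- p.adjust(p_raw, method = "bonferroni", n = n_tested)
print(p_corrected)

# Whether `threshold` is compared with the raw or the corrected value is not
# stated here; in this example both are below 0.05, so the row is kept either way
threshold <- 0.05
print(c(raw = p_raw <= threshold, corrected = p_corrected <= threshold))
```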
Gene set enrichment for multiple sets/cohorts can be performed by calling:
Example <- ACSNMineR::multisample_enrichment(
    Genes_by_sample = list(set1 = ACSNMineR::genes_test[-1],
                           set2 = ACSNMineR::genes_test[-2]),
    maps = ACSNMineR::ACSN_maps$CellCycle,
    min_module_size = 10,
    cohort_threshold = FALSE)
print(Example[[1]])
module | module_size | nb_genes_in_module | genes_in_module | universe_size | nb_genes_in_universe | p.value | p.value.corrected | test |
---|---|---|---|---|---|---|---|---|
E2F1 | 19 | 11 | ATR CHEK2 CREBBP TFDP1 E2F1 EP300 HDAC1 KAT2B GTF2H1 GTF2H2 GTF2H2B | 227 | 11 | 0 | 1e-07 | greater |
print(Example[[2]])
module | module_size | nb_genes_in_module | genes_in_module | universe_size | nb_genes_in_universe | p.value | p.value.corrected | test |
---|---|---|---|---|---|---|---|---|
E2F1 | 19 | 11 | ATM CHEK2 CREBBP TFDP1 E2F1 EP300 HDAC1 KAT2B GTF2H1 GTF2H2 GTF2H2B | 227 | 11 | 0 | 1e-07 | greater |
Where:
Genes_by_sample is a list of character vectors to test
min_module_size is the minimal size of a module to be taken into account
maps is a list of maps, either built-in ACSN maps (here, the cell cycle map) or maps imported with the package's format_from_gmt() function
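The two sample sets in the call above are built with negative indexing, which drops one element from the test gene list. The base R sketch below copies the gene list printed earlier into a plain vector (so it runs without the package) and shows which gene each set is missing. This explains why ATM is absent from the first result table, ATR from the second, and why nb_genes_in_universe is 11 in both.

```r
# Gene list copied from the genes_test listing above
genes <- c("ATM", "ATR", "CHEK2", "CREBBP", "TFDP1", "E2F1",
           "EP300", "HDAC1", "KAT2B", "GTF2H1", "GTF2H2", "GTF2H2B")

# x[-i] returns x without its i-th element
sets <- list(set1 = genes[-1],   # everything except the 1st gene
             set2 = genes[-2])   # everything except the 2nd gene

print(setdiff(genes, sets$set1))  # "ATM"
print(setdiff(genes, sets$set2))  # "ATR"
print(lengths(sets))              # 11 genes in each set
```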
Results from the enrichment functions can be plotted with the represent_enrichment function. Two plot types are available: heatmap and barplot.
Heatmaps of p-values, for a single sample or for multiple samples, can be generated with represent_enrichment:
# theme() and element_text() come from ggplot2
library(ggplot2)
ACSNMineR::represent_enrichment(enrichment = list(
        SampleA = ACSNMineR::enrichment_test[1:10, ],
        SampleB = ACSNMineR::enrichment_test[3:10, ]),
    plot = "heatmap",
    scale = "reverselog",
    low = "steelblue", high = "white",
    na.value = "grey") +
    theme(axis.text = element_text(size = 6, angle = 45),
          legend.text = element_text(size = 6),
          legend.title = element_text(size = 8))
Where:
enrichment is the result from the enrichment or multisample_enrichment function
scale sets the scale of the color gradient; it can be set to "identity" or "log" (the example above uses "reverselog")
low is the color for low (significant) p-values
high is the color for high (less significant) p-values
na.value is the color used for tiles whose value is NA
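NA tiles can arise when a module appears in the results of one sample but not another. In the heatmap call, SampleA uses rows 1:10 of enrichment_test and SampleB rows 3:10, so two modules have no value for SampleB. The base R sketch below shows the idea with invented data frames and an outer merge. The module names and p-values are hypothetical, and the way the package combines samples internally is an assumption, not taken from its source.

```r
# Hypothetical enrichment results for two samples (all values invented)
sample_a <- data.frame(module  = c("M1", "M2", "M3"),
                       p.value = c(0.001, 0.02, 0.04))
sample_b <- data.frame(module  = c("M2", "M3"),
                       p.value = c(0.03, 0.01))

# Outer join on module: modules missing from one sample get NA there
combined <- merge(sample_a, sample_b, by = "module", all = TRUE,
                  suffixes = c(".SampleA", ".SampleB"))
print(combined)
```

In a heatmap built from such a table, the M1 cell for SampleB has no p-value and would be drawn in the na.value color.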
A barplot can be produced with:
ACSNMineR::represent_enrichment(enrichment = list(
        SampleA = ACSNMineR::enrichment_test[1:10, ],
        SampleB = ACSNMineR::enrichment_test[3:10, ]),
    plot = "bar",
    scale = "reverselog")
Where:
enrichment is the result from the enrichment or multisample_enrichment function
scale can be set to "identity" or "log" (the example above uses "reverselog") and affects the gradient of colors
nrow is the number of rows that should be used to plot all barplots (default is 1)