Transcription factors and microRNAs are important for regulating the gene expression in normal physiology and pathological conditions. Many bioinformatic tools were built to predict and identify transcription factors and microRNA targets and their role in development of diseases including cancers. The availability of public access high-throughput data allowed for data-driven discoveries and validations of these predictions. Here, we build on that kind of tools and integrative analyses to provide a tool to access, manage and visualize data from open source databases. cRegulome provides a programmatic access to the regulome (microRNA and transcription factor) correlations with target genes in cancer. The package obtains a local instance of Cistrome Cancer and miRCancerdb databases and provides objects and methods to interact with and visualize the correlation data.
To get started with cRegulome, we show a very quick example. We first start by downloading a small test database file, make a simple query and convert the output to a cRegulome object to print and visualize.
# load required libraries
library(cRegulome)
library(RSQLite)
library(ggplot2)
# download the db file when using it for the first time
destfile = paste(tempdir(), 'cRegulome.db.gz', sep = '/')
if(!file.exists(destfile)) {
get_db(test = TRUE)
}
# connect to the db file
db_file = paste(tempdir(), 'cRegulome.db', sep = '/')
conn <- dbConnect(SQLite(), db_file)
# alternative to downloading the database file
wget https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/9537385/cRegulome.db.gz
gunzip cRegulome.db.gz
# enter a custom query with different arguments
dat <- get_mir(conn,
mir = 'hsa-let-7g',
study = 'STES',
min_abs_cor = .3,
max_num = 5)
# make a cmicroRNA object
ob <- cmicroRNA(dat)
# print object
ob
## A cmicroRNA object: microRNA-gene correlations in Cancer
## Contains:
## 1 Cancer study/ies: STES
## 1 microRNA/s: hsa-let-7g
## 5 features: ARHGEF15 LSM11 STARD13 CXorf36 IL13
# plot object
cor_plot(ob)
The two main sources of data used by this package are Cistrome Cancer and miRCancerdb databases. Cistrome Cancer is based on an integrative analysis of The Cancer Genome Atlas (TCGA) and public ChIP-seq data. It provides calculated correlations of (n = 320) transcription factors and their target genes in (n = 29) cancer study. In addition, Cistrome Cancer provides the transcription factors regulatory potential to target and non-target genes. miRCancerdb uses TCGA data and TargetScan annotations to correlate known microRNAs (n = 750) and target and non-target genes in (n = 25) cancer studies.
cRegulome obtains a pre-build SQLite database file of the Cistrome Cancer and miRCancerdb databases. The details of this build is provided at cRegulomedb in addition to the scripts used to pull, format and deposit the data at an on-line repository. Briefly, the SQLite database consist of 4 tables cor_mir
and cor_tf
for correlation values and targets_mir
and targets_tf
for microRNA miRBase ID and transcription factors symbols to genes mappings. Two indices were created to facilitate the database search using the miRBase IDs and transcription factors symbols. The database file can be downloaded using the function get_db
.
To show the details of the database file, the following code connects to the database and show the names of tables and fields in each of them.
# table names
tabs <- dbListTables(conn)
print(tabs)
## [1] "cor_mir" "cor_tf" "targets_mir" "targets_tf"
# fields/columns in the tables
for(i in seq_along(tabs)) {
print(dbListFields(conn, tabs[i]))
}
## [1] "mirna_base" "feature" "STES"
## [1] "tf" "feature" "STES"
## [1] "mirna_base" "feature"
## [1] "tf" "feature" "study"
To query the database using cRegulome, we provide two main functions; get_mir
and get_tf
for querying microRNA and transcription factors correlations respectively. Users need to provide the proper IDs for microRNA, transcription factor symbols and/or TCGA study identifiers. microRNAs are referred to by the official miRBase IDs, transcription factors by their corresponding official gene symbols that contains them and TCGA studies with their common identifiers. In either cases, the output of calling the these functions is a tidy data frame of 4 columns; mirna_base
/ tf
, feature
, cor
and study
These correspond to the miRBase IDs or transcription factors symbol, gene symbol, correlation value and the TCGA study identifier.
Here we show an example of such a query. Then, we illustrate how this query is executed on the database using basic RSQLite
and dbplyr
which is what the get_*
functions are doing.
# query the db for two microRNAs
dat_mir <- get_mir(conn,
mir = c('hsa-let-7g', 'hsa-let-7i'),
study = 'STES')
# query the db for two transcription factors
dat_tf <- get_tf(conn,
tf = c('LEF1', 'MYB'),
study = 'STES')
# show first 6 line of each of the data.frames
head(dat_mir); head(dat_tf)
## mirna_base feature cor study
## 1 hsa-let-7g ABL2 0.11 STES
## 2 hsa-let-7g ACTR10 -0.16 STES
## 3 hsa-let-7g ACVR1C -0.13 STES
## 4 hsa-let-7g ADAMTS1 0.27 STES
## 5 hsa-let-7g ADAMTS5 0.20 STES
## 6 hsa-let-7g ADCY9 0.18 STES
## tf feature cor study
## 1 LEF1 A2M 0.55 STES
## 2 LEF1 A4GALT 0.40 STES
## 3 LEF1 ABCA1 0.44 STES
## 4 LEF1 ABCA6 0.54 STES
## 5 LEF1 ABCA8 0.41 STES
## 6 LEF1 ABCA9 0.48 STES
Two S3 objects are provided by cRegulome to store and dispatch methods on the correlation data. cmicroRNA and cTF for microRNA and transcription factors respectively. The structure of these objects is very similar. Basically, as all S3 objects, it’s a list of 4 items; microRNA or TF for the regulome element, features for the gene hits, studies for the TCGA studies and finally corr is either a data.frame
when the object has data.from a single TCGA study or a named list of data.frames when it has data from multiple studies. Each of these data.frames has the regulome element (microRNAs or transcription factors) in columns and features/genes in rows.
To construct these objects, users need to call a constructor function with the corresponding names on the data.frame output form get_*
. The reverse is possible by calling the function cor_tidy
on the object to get back the tidy data.frame.
# explore the cmicroRNA object
ob_mir <- cmicroRNA(dat_mir)
class(ob_mir)
## [1] "cmicroRNA"
str(ob_mir)
## List of 4
## $ microRNA: chr [1:2] "hsa-let-7g" "hsa-let-7i"
## $ features: chr [1:612] "ABL2" "ACTR10" "ACVR1C" "ADAMTS1" ...
## $ studies : chr "STES"
## $ corr :List of 1
## ..$ STES:'data.frame': 612 obs. of 2 variables:
## .. ..$ hsa-let-7g: num [1:612] NA NA NA NA 0.11 NA NA NA -0.16 NA ...
## .. ..$ hsa-let-7i: num [1:612] -0.2 0.12 0.14 -0.1 0.27 -0.12 0.13 0.1 NA 0.23 ...
## - attr(*, "class")= chr "cmicroRNA"
# explore the cTF object
ob_tf <- cTF(dat_tf)
class(ob_tf)
## [1] "cTF"
str(ob_tf)
## List of 4
## $ TF : chr [1:2] "LEF1" "MYB"
## $ features: chr [1:1953] "A2M" "A4GALT" "ABCA1" "ABCA6" ...
## $ studies : chr "STES"
## $ corr :List of 1
## ..$ STES:'data.frame': 1953 obs. of 2 variables:
## .. ..$ LEF1: num [1:1953] 0.55 0.4 0.44 0.54 0.41 0.48 0.56 0.43 0.54 0.45 ...
## .. ..$ MYB : num [1:1953] NA NA NA NA NA NA NA NA NA NA ...
## - attr(*, "class")= chr "cTF"
cRegulome provides S3 methods to interact a visualize the correlations data in the cmicroRNA and cTF objects. Table 1 provides an over view of these functions. These methods dispatch directly on the objects and could be customized and manipulated in the same way as their generics.
# cmicroRNA object methods
methods(class = 'cmicroRNA')
## [1] cor_hist cor_igraph cor_joy cor_plot
## [5] cor_tidy cor_upset cor_venn_diagram print
## see '?methods' for accessing help and source code
# cTF object methods
methods(class = 'cTF')
## [1] cor_hist cor_igraph cor_joy cor_plot
## [5] cor_tidy cor_upset cor_venn_diagram print
## see '?methods' for accessing help and source code
# tidy method
head(cor_tidy(ob_mir))
## mirna_base feature cor study
## 5 hsa-let-7g ABL2 0.11 STES
## 9 hsa-let-7g ACTR10 -0.16 STES
## 11 hsa-let-7g ACVR1C -0.13 STES
## 13 hsa-let-7g ADAMTS1 0.27 STES
## 16 hsa-let-7g ADAMTS5 0.20 STES
## 17 hsa-let-7g ADCY9 0.18 STES
# cor_hist method
cor_hist(ob_mir,
breaks = 100,
main = '', xlab = 'Correlation')
dev.off()
## null device
## 1
# cor_joy method
cor_joy(ob_mir) +
labs(x = 'Correlation', y = '')
dev.off()
## null device
## 1
# cor_venn_diagram method
cor_venn_diagram(ob_mir, cat.default.pos = 'text')
dev.off()
## null device
## 1
# cor_upset method
cor_upset(ob_mir)
dev.off()
## null device
## 1
Comments, issues and contributions are welcomed at: https://github.com/MahShaaban/cRegulome
Please cite:
citation('cRegulome')