The wTO
package, can be used on any kind of count data, but we highly
recommend to use normalized and quality controlled data according to the
data type such as RMA, MD5 for microarray, RPKM, TPM or PKM for RNA-seq,
sample normalized data for metagenomics.
As an example, the package contains three data sets, two from microarray
chips (Microarray_Expression1
and Microarray_Expression2
), and one
from abundance in metagenomics (metagenomics_abundance
).
The wTO method is a method for building networks based on pairwise correlations normalized and corrected by all shared correlations. For this reason, the user can choose a set of factors of interest, called here Overlaps, those are the nodes that will be corrected and normalized by all other factors in the dataset. Those factors can be Transcription Factor, long non coding RNAs, a set of species of interest etc.
The wTO
package contains 2 data sets that were obtained using
expression arrays (Microarray_Expression1
and
Microarray_Expression2
), they were previously normalized and the
quality control was done. We will use it to build the wTO network using
the different methods implemented in the package.
First we are going to inspect those data sets.
require(wTO)
#> Loading required package: wTO
require(magrittr)
#> Loading required package: magrittr
data("ExampleGRF")
data("Microarray_Expression1")
data("Microarray_Expression2")
dim(Microarray_Expression1)
#> [1] 268 18
dim(Microarray_Expression2)
#> [1] 268 18
Microarray_Expression1[1:5,1:5]
#> ID1 ID2 ID3 ID4 ID5
#> FAM122B 5.325653 5.039814 5.099828 5.053185 5.213816
#> DEFB108B 2.038747 1.965599 1.925807 1.977435 2.079381
#> CCSER2 4.973347 4.865783 4.818910 5.024392 4.697314
#> GPD2 5.453287 5.595471 5.223886 5.130226 5.370672
#> HECW1 4.350837 4.279759 4.218375 4.472152 4.408025
Microarray_Expression2[1:5,1:5]
#> ID1 ID2 ID3 ID4 ID5
#> FAM122B 5.532142 6.395654 5.159082 5.806040 5.339848
#> DEFB108B 2.456210 1.993044 2.251673 2.440018 2.493610
#> CCSER2 5.164945 4.923511 4.979691 5.080116 5.014569
#> GPD2 5.742455 5.649180 5.430411 6.007418 5.662126
#> HECW1 4.595407 5.243644 4.802716 4.957706 4.738554
head(ExampleGRF)
#> x
#> 1 ACAD8
#> 2 ANAPC2
#> 3 ANKRD22
#> 4 ANKRD2
#> 5 ARHGAP35
#> 6 ASH1L
Please, note that the individuals are in the columns and the gene
expressions are in the rows. Moreover, the row.names()
are the names
of the genes. The list of genes that will be used for measuring the
interactions are in ExampleGRF. There should always be more than 2 of
them contained in the expression set. If there are no common nodes to be
measured, the method will retun an error.
sum(ExampleGRF$x %in% row.names(Microarray_Expression1))
#> [1] 168
We can run the wTO
package with 3 modes. The first one is running the
wTO without resampling. For that we can use the wTO()
. The second
one, wTO.Complete()
, gives you the whole diagnosis plot,
hard-threshold on the ωi, j, the
ωi, j, |ωi, j| values and p-values.
The last mode, wTO.fast()
, just returns the ωi, j
values and p-value.
wTO()
function:To use the wTO()
function, the first step is to compute the
correlation among the nodes of interest using CorrelationOverlap()
and
then use it as input for the wTO()
. In the first function the user is
allowed to choose the method for correlation between Pearson ('p') or
Spearman ('s'). The second function allows the choice between absolute
values ('abs') or signed values ('sign'). Please, keep in mind that the
result of the wTO()
function is a matrix, and it can be easily
converted to an edge list using the function wTO.in.line()
.
wTO_p_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'abs')
wTO_p_abs[1:5,1:5]
#> ZNF333 ZNF28 ANKRD22 ZFR TRIM33
#> ZNF333 0.352 0.237 0.269 0.242 0.241
#> ZNF28 0.237 0.287 0.209 0.206 0.239
#> ANKRD22 0.269 0.209 0.299 0.199 0.252
#> ZFR 0.242 0.206 0.199 0.328 0.258
#> TRIM33 0.241 0.239 0.252 0.258 0.361
wTO_p_abs %<>% wTO.in.line()
head(wTO_p_abs)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 0.237
#> 2: ZNF333 ANKRD22 0.269
#> 3: ZNF28 ANKRD22 0.209
#> 4: ZNF333 ZFR 0.242
#> 5: ZNF28 ZFR 0.206
#> 6: ANKRD22 ZFR 0.199
wTO_s_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'abs') %>% wTO.in.line()
head(wTO_s_abs)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 0.236
#> 2: ZNF333 ANKRD22 0.258
#> 3: ZNF28 ANKRD22 0.215
#> 4: ZNF333 ZFR 0.264
#> 5: ZNF28 ZFR 0.187
#> 6: ANKRD22 ZFR 0.193
wTO_p_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_p_sign)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 -0.099
#> 2: ZNF333 ANKRD22 -0.185
#> 3: ZNF28 ANKRD22 0.076
#> 4: ZNF333 ZFR -0.117
#> 5: ZNF28 ZFR -0.077
#> 6: ANKRD22 ZFR -0.036
wTO_s_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_s_sign)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 -0.064
#> 2: ZNF333 ANKRD22 -0.143
#> 3: ZNF28 ANKRD22 0.029
#> 4: ZNF333 ZFR -0.164
#> 5: ZNF28 ZFR -0.011
#> 6: ANKRD22 ZFR 0.024
wTO.Complete()
function:The usage of the function wTO.Complete()
is straight-forward. No
plug-in-functions() are necessary. The arguments parsed to the
wTO.Complete()
functions are the number k of threads that should be
used for computing the ωi, j, the amount of
replications, n, the expression matrix, Data, the Overlapping
nodes, the correlation method (Pearson or Spearman) for the
method_resampling that should be Bootstrap, BlockBootstrap or
Reshuffle, the p-value correction method, pvalmethod (any from the
p.adjust.methods), if the correlation should be saved, the δ is the
expected difference, expected.diff, between the resampled values and
the ωi, j and also if the diagnosis plot should be
plotted.
wTO_s_sign_complete = wTO.Complete(k = 5, n = 250, Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', method_resampling = 'Bootstrap', pvalmethod = 'BH', savecor = TRUE, expected.diff = 0.2, plot = TRUE)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
The diagnosis plot shows the quality of the resampling (first two plots). The closer the purple line to the black line, the better. The ωi, j vs |ωi, j| shows the amount of ωi, j being affected by cancellations on the heuristics of the method, the more similar to a smile plot, the better. The last two plots show the relashionship between p-values and the ωi, j. It is expected that higher ω's presents lower p-values.
The resulting object from the wTO.Complete()
function is a list
containing: * wTO an edge list of informations such as the signed and
unsigned ωi, j values its raw and adjusted p-values. *
Correlation values, also as an edge list * Quantiles, the quantiles
from the empirical distribution and the calculated ω's from the
original data, for both signed and unsigned networks.
wTO_s_sign_complete
#> $wTO
#> Node.1 Node.2 wTO_sign wTO_abs pval_sig pval_abs Padj_sig
#> 1: ZNF333 ZNF28 -0.099 0.237 0.168 0.004 0.3607366
#> 2: ZNF333 ANKRD22 -0.185 0.269 0.188 0.016 0.3607366
#> 3: ZNF333 ZFR -0.117 0.242 0.180 0.012 0.3607366
#> 4: ZNF333 TRIM33 0.007 0.241 0.136 0.008 0.3607366
#> 5: ZNF333 RIMS3 -0.325 0.409 0.144 0.000 0.3607366
#> ---
#> 14024: ANAPC2 SBNO2 -0.147 0.298 0.156 0.000 0.3607366
#> 14025: ANAPC2 ZNF528 -0.142 0.222 0.152 0.016 0.3607366
#> 14026: TIGD7 SBNO2 -0.297 0.354 0.128 0.004 0.3607366
#> 14027: TIGD7 ZNF528 -0.099 0.219 0.212 0.032 0.3607366
#> 14028: SBNO2 ZNF528 0.141 0.311 0.368 0.024 0.4030531
#> Padj_abs
#> 1: 0.01167541
#> 2: 0.02395390
#> 3: 0.02083624
#> 4: 0.01712559
#> 5: 0.00000000
#> ---
#> 14024: 0.00000000
#> 14025: 0.02395390
#> 14026: 0.01167541
#> 14027: 0.03670450
#> 14028: 0.03016774
#>
#> $Correlation
#> Node.1 Node.2 Cor
#> 1: FAM122B DEFB108B 0.366857931
#> 2: FAM122B CCSER2 0.278870911
#> 3: DEFB108B CCSER2 -0.252482453
#> 4: FAM122B GPD2 -0.005649124
#> 5: DEFB108B GPD2 -0.107064848
#> ---
#> 35774: TRIM23 ZNF528 0.054249174
#> 35775: ZNF559 ZNF528 -0.218309729
#> 35776: ANAPC2 ZNF528 -0.013821370
#> 35777: TIGD7 ZNF528 0.011807143
#> 35778: SBNO2 ZNF528 0.092317502
#>
#> $Quantiles
#> 0.1% 2.5% 10% 90% 97.5% 99.9%
#> Empirical.Quantile -0.56 -0.46 -0.34 0.37 0.48 0.56
#> Quantile -0.50 -0.40 -0.28 0.32 0.43 0.52
#> Empirical.Quantile.abs 0.21 0.24 0.27 0.47 0.53 0.57
#> Quantile.abs 0.17 0.19 0.21 0.41 0.47 0.53
#>
#> attr(,"class")
#> [1] "wTO" "list"
wTO.fast()
function:The wTO.fast()
function is a simplified verion of the wTO.Complete()
function, that doesn't return diagnosis, correlation, nor the quantiles,
but allows the user to choose the method for correlation, the sign of
the ω to be calculated and the resampling method should be one of the
two Bootrastap or BlockBootstrap. The p-values are the raw
p-values and if the user desires to calculate its correction it can be
easily done as shown above.
fast_example = wTO.fast(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's', sign = 'sign', delta = 0.2, n = 250, method_resampling = 'Bootstrap')
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
head(fast_example)
#> Node.1 Node.2 wTO pval
#> 1: ZNF333 ZNF28 -0.064 0.264
#> 2: ZNF333 ANKRD22 -0.143 0.236
#> 3: ZNF28 ANKRD22 0.029 0.188
#> 4: ZNF333 ZFR -0.164 0.232
#> 5: ZNF28 ZFR -0.011 0.256
#> 6: ANKRD22 ZFR 0.024 0.264
fast_example$adj.pval = p.adjust(fast_example$pval)
Along with the expression data, the wTO
package also includes a
metagenomics dataset that is the abundance of some OTU's in bacterias
collected since 1997. More about this data can be found at
\[
The OTU (Operational Taxonomic Units) contains the taxonomy of the particular OTU and from Week1 to Week98, the abundance of that particular OTU in that week.
data("metagenomics_abundance")
metagenomics_abundance[2:10, 1:10]
#> OTU
#> 2 Root;k__Archaea;p__Euryarchaeota;c__Thermoplasmata;o__E2;f__MarinegroupII;g__;s__
#> 3 Root;k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__OCS155;g__;s__
#> 4 Root;k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Microbacteriaceae;g__;s__
#> 5 Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__;s__
#> 6 Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__JTB248;s__
#> 7 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__;g__;s__
#> 8 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__;s__
#> 9 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__Fluviicola;s__
#> 10 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__;s__
#> Week1 Week2 Week3 Week4 Week5 Week6 Week7 Week8 Week9
#> 2 1 6 0 0 0 1 0 1 0
#> 3 0 0 0 0 0 0 0 5 0
#> 4 0 0 0 0 0 0 0 0 0
#> 5 0 1 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0 0 0
#> 7 0 1 0 0 0 0 0 1 0
#> 8 0 0 0 0 0 0 0 1 0
#> 9 0 0 0 0 0 0 0 1 0
#> 10 0 1 0 0 0 0 0 7 0
Before we are able to define the network, we have first to understand
the patterns of autocorrelation of each species, and then define the
lag, that will be used for the BlockBootstrap resampling in the
wTO.Complete()
or fast.wTO()
functions. To define the lag, we use
autocorrelation function acf()
.
row.names(metagenomics_abundance) = metagenomics_abundance$OTU
metagenomics_abundance = metagenomics_abundance[,-1]
par(mfrow = c(3,3))
for ( i in 1:nrow(metagenomics_abundance)){
acf(t(metagenomics_abundance[i,]))
}
Because most of them have only a high autocorrelation with itself or maximum 2 weeks, we will use a lag of 2 for the blocks used in the bootstrap.
The functions wTO.fast()
and wTO.Complete()
are able to accomodate
the lag parameter, therefore, they will be used here.
Meta_fast = wTO.fast(Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 'p', sign = 'sign', n = 250, method_resampling = 'BlockBootstrap', lag = 2)
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
Meta_Complete = wTO.Complete(k = 1, n = 250, Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 's' , method_resampling = 'BlockBootstrap', lag = 2 )
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
From the expression data-sets, we are able to draw a Consensus Network.
For that, the function wTO.Consensus()
can be used. This function
works in a special way, that the user should pass a list of data.frames
containing the Nodes names and the wTO and p-values. We show an example
above.
Let's calculate the networks the same way we did in the Section Genomic data.
wTO_Data1 = wTO.fast(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', n = 250)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
wTO_Data2 = wTO.fast(Data = Microarray_Expression2, Overlap = ExampleGRF$x, method = 'p', n = 250)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
Now, let's combine both networks in one Consensus Network.
CN_expression = wTO.Consensus(data = list (wTO_Data1 = data.frame
(Node.1 = wTO_Data1$Node.1,
Node.2 = wTO_Data1$Node.2,
wTO = wTO_Data1$wTO,
pval = wTO_Data1$pval)
, wTO_Data2C = data.frame
(Node.1 = wTO_Data2$Node.1,
Node.2 = wTO_Data2$Node.2,
wTO = wTO_Data2$wTO,
pval = wTO_Data2$pval)))
#> Joining by: Node.1, Node.2
#> Joining by: Node.1, Node.2
#> Joining by: ID
#> Total common nodes: 168
Or using the wTO.Complete()
:
wTO_Data1C = wTO.Complete(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', n = 250, k = 5)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
wTO_Data2C = wTO.Complete(Data = Microarray_Expression2, Overlap = ExampleGRF$x, method = 'p', n = 250, k = 5)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
Now, let's combine both networks in one Consensus Network.
CN_expression = wTO.Consensus(data = list (wTO_Data1C = data.frame
(Node.1 = wTO_Data1C$wTO$Node.1,
Node.2 = wTO_Data1C$wTO$Node.2,
wTO = wTO_Data1C$wTO$wTO_sign,
pval = wTO_Data1C$wTO$pval_sig), wTO_Data2C = data.frame
(Node.1 = wTO_Data2C$wTO$Node.1,
Node.2 = wTO_Data2C$wTO$Node.2,
wTO = wTO_Data2C$wTO$wTO_sign,
pval = wTO_Data2C$wTO$pval_sig)))
#> Joining by: Node.1, Node.2
#> Joining by: Node.1, Node.2
#> Joining by: ID
#> Total common nodes: 168
head(CN_expression)
#> Node.1 Node.2 CN pval.fisher
#> 1 ZNF333 ZNF28 -0.1191288 0.06006662
#> 2 ZNF333 ANKRD22 -0.1400000 0.13016905
#> 3 ZNF333 ZFR -0.1091659 0.10147819
#> 4 ZNF333 TRIM33 -0.0240000 0.11707440
#> 5 ZNF333 RIMS3 -0.2798101 0.10797662
#> 6 ZNF333 ZNF595 0.1542850 0.11529832
The wTO
package also includes an interactive visualization tool that
can be used to inspect the results of the wTO netwoks or Consensus
Network.
The arguments given to this function are the Nodes names, its wTO and
p-values. Optionals are the cutoffs that can be applied to the p-value
or to the wTO value. We highly reccomend using both by subseting the
data previous to the visualization. The layout of the network can be
also chosen from a variety that are implemented in igraph package, for
the the Make_Cluster argument many clustering algorithms that are
implemented in igraph can be used. The final graph can be exported as an
html
or as png
.
Visualization = NetVis(Node.1 = CN_expression$Node.1,
Node.2 = CN_expression$Node.2,
wTO = CN_expression$CN,
pval = CN_expression$pval.fisher, cutoff = list(kind = 'pval', value = 0.001), MakeGroups = 'louvain', layout = 'layout_components')
#> Joining by: id
CN_expression_filtered = subset(CN_expression, abs(CN_expression$CN)> 0.4 & CN_expression$pval.fisher < 0.0001)
dim(CN_expression_filtered)
#> [1] 45 4
Visualization2 = NetVis(
Node.1 = CN_expression_filtered$Node.1,
Node.2 = CN_expression_filtered$Node.2,
wTO = CN_expression_filtered$CN,
pval = CN_expression_filtered$pval.fisher,
cutoff = list(kind = 'pval', value = 0.001),
MakeGroups = 'louvain',
layout = 'layout_components', path = 'Vis.html')
#> Joining by: id
#> Vis.html
sessionInfo()
#> R version 3.4.4 (2018-03-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] magrittr_1.5 wTO_1.6.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.17 visNetwork_2.0.3 digest_0.6.15
#> [4] rprojroot_1.3-2 plyr_1.8.4 jsonlite_1.5
#> [7] backports_1.1.2 evaluate_0.10.1 som_0.3-5.1
#> [10] stringi_1.2.2 reshape2_1.4.3 data.table_1.11.4
#> [13] rmarkdown_1.10 tools_3.4.4 stringr_1.3.1
#> [16] htmlwidgets_1.2 igraph_1.2.1 parallel_3.4.4
#> [19] yaml_2.1.19 compiler_3.4.4 pkgconfig_2.0.1
#> [22] htmltools_0.3.6 knitr_1.20