Input data

The wTO package, can be used on any kind of count data, but we highly recommend to use normalized and quality controlled data according to the data type such as RMA, MD5 for microarray, RPKM, TPM or PKM for RNA-seq, sample normalized data for metagenomics.

As an example, the package contains three data sets, two from microarray chips (Microarray_Expression1 and Microarray_Expression2), and one from abundance in metagenomics (metagenomics_abundance).

wTO

The wTO method is a method for building networks based on pairwise correlations normalized and corrected by all shared correlations. For this reason, the user can choose a set of factors of interest, called here Overlaps, those are the nodes that will be corrected and normalized by all other factors in the dataset. Those factors can be Transcription Factor, long non coding RNAs, a set of species of interest etc.

Genomic data

The wTO package contains 2 data sets that were obtained using expression arrays (Microarray_Expression1 and Microarray_Expression2), they were previously normalized and the quality control was done. We will use it to build the wTO network using the different methods implemented in the package.

First we are going to inspect those data sets.

require(wTO)
#> Loading required package: wTO
require(magrittr)
#> Loading required package: magrittr
data("ExampleGRF")
data("Microarray_Expression1")
data("Microarray_Expression2")

dim(Microarray_Expression1)
#> [1] 268  18
dim(Microarray_Expression2)
#> [1] 268  18

Microarray_Expression1[1:5,1:5]
#>               ID1      ID2      ID3      ID4      ID5
#> FAM122B  5.325653 5.039814 5.099828 5.053185 5.213816
#> DEFB108B 2.038747 1.965599 1.925807 1.977435 2.079381
#> CCSER2   4.973347 4.865783 4.818910 5.024392 4.697314
#> GPD2     5.453287 5.595471 5.223886 5.130226 5.370672
#> HECW1    4.350837 4.279759 4.218375 4.472152 4.408025
Microarray_Expression2[1:5,1:5]
#>               ID1      ID2      ID3      ID4      ID5
#> FAM122B  5.532142 6.395654 5.159082 5.806040 5.339848
#> DEFB108B 2.456210 1.993044 2.251673 2.440018 2.493610
#> CCSER2   5.164945 4.923511 4.979691 5.080116 5.014569
#> GPD2     5.742455 5.649180 5.430411 6.007418 5.662126
#> HECW1    4.595407 5.243644 4.802716 4.957706 4.738554

head(ExampleGRF)
#>          x
#> 1    ACAD8
#> 2   ANAPC2
#> 3  ANKRD22
#> 4   ANKRD2
#> 5 ARHGAP35
#> 6    ASH1L

Please, note that the individuals are in the columns and the gene expressions are in the rows. Moreover, the row.names() are the names of the genes. The list of genes that will be used for measuring the interactions are in ExampleGRF. There should always be more than 2 of them contained in the expression set. If there are no common nodes to be measured, the method will retun an error.

sum(ExampleGRF$x %in% row.names(Microarray_Expression1))
#> [1] 168

Running the wTO

We can run the wTO package with 3 modes. The first one is running the wTO without resampling. For that we can use the wTO() . The second one, wTO.Complete(), gives you the whole diagnosis plot, hard-threshold on the ω_i, j, the ω_i, j, |ω_i, j| values and p-values. The last mode, wTO.fast(), just returns the ω_i, j values and p-value.

Using the `wTO()` function:

To use the wTO() function, the first step is to compute the correlation among the nodes of interest using CorrelationOverlap() and then use it as input for the wTO(). In the first function the user is allowed to choose the method for correlation between Pearson ('p') or Spearman ('s'). The second function allows the choice between absolute values ('abs') or signed values ('sign'). Please, keep in mind that the result of the wTO() function is a matrix, and it can be easily converted to an edge list using the function wTO.in.line().

wTO_p_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'abs')

wTO_p_abs[1:5,1:5]
#>         ZNF333 ZNF28 ANKRD22   ZFR TRIM33
#> ZNF333   0.352 0.237   0.269 0.242  0.241
#> ZNF28    0.237 0.287   0.209 0.206  0.239
#> ANKRD22  0.269 0.209   0.299 0.199  0.252
#> ZFR      0.242 0.206   0.199 0.328  0.258
#> TRIM33   0.241 0.239   0.252 0.258  0.361
wTO_p_abs %<>% wTO.in.line()
head(wTO_p_abs)
#>     Node.1  Node.2   wTO
#> 1:  ZNF333   ZNF28 0.237
#> 2:  ZNF333 ANKRD22 0.269
#> 3:   ZNF28 ANKRD22 0.209
#> 4:  ZNF333     ZFR 0.242
#> 5:   ZNF28     ZFR 0.206
#> 6: ANKRD22     ZFR 0.199

wTO_s_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'abs') %>% wTO.in.line()
head(wTO_s_abs)
#>     Node.1  Node.2   wTO
#> 1:  ZNF333   ZNF28 0.236
#> 2:  ZNF333 ANKRD22 0.258
#> 3:   ZNF28 ANKRD22 0.215
#> 4:  ZNF333     ZFR 0.264
#> 5:   ZNF28     ZFR 0.187
#> 6: ANKRD22     ZFR 0.193

wTO_p_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_p_sign)
#>     Node.1  Node.2    wTO
#> 1:  ZNF333   ZNF28 -0.099
#> 2:  ZNF333 ANKRD22 -0.185
#> 3:   ZNF28 ANKRD22  0.076
#> 4:  ZNF333     ZFR -0.117
#> 5:   ZNF28     ZFR -0.077
#> 6: ANKRD22     ZFR -0.036

wTO_s_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_s_sign)
#>     Node.1  Node.2    wTO
#> 1:  ZNF333   ZNF28 -0.064
#> 2:  ZNF333 ANKRD22 -0.143
#> 3:   ZNF28 ANKRD22  0.029
#> 4:  ZNF333     ZFR -0.164
#> 5:   ZNF28     ZFR -0.011
#> 6: ANKRD22     ZFR  0.024

Using the `wTO.Complete()` function:

The usage of the function wTO.Complete() is straight-forward. No plug-in-functions() are necessary. The arguments parsed to the wTO.Complete() functions are the number k of threads that should be used for computing the ω_i, j, the amount of replications, n, the expression matrix, Data, the Overlapping nodes, the correlation method (Pearson or Spearman) for the method_resampling that should be Bootstrap, BlockBootstrap or Reshuffle, the p-value correction method, pvalmethod (any from the p.adjust.methods), if the correlation should be saved, the δ is the expected difference, expected.diff, between the resampled values and the ω_i, j and also if the diagnosis plot should be plotted.

wTO_s_sign_complete = wTO.Complete(k = 5, n = 250, Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', method_resampling = 'Bootstrap', pvalmethod = 'BH', savecor = TRUE, expected.diff = 0.2, plot = TRUE)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!

The diagnosis plot shows the quality of the resampling (first two plots). The closer the purple line to the black line, the better. The ω_i, j vs |ω_i, j| shows the amount of ω_i, j being affected by cancellations on the heuristics of the method, the more similar to a smile plot, the better. The last two plots show the relashionship between p-values and the ω_i, j. It is expected that higher ω's presents lower p-values.

The resulting object from the wTO.Complete() function is a list containing: * wTO an edge list of informations such as the signed and unsigned ω_i, j values its raw and adjusted p-values. * Correlation values, also as an edge list * Quantiles, the quantiles from the empirical distribution and the calculated ω's from the original data, for both signed and unsigned networks.

wTO_s_sign_complete
#> $wTO
#>        Node.1  Node.2 wTO_sign wTO_abs pval_sig pval_abs  Padj_sig
#>     1: ZNF333   ZNF28   -0.099   0.237    0.168    0.004 0.3607366
#>     2: ZNF333 ANKRD22   -0.185   0.269    0.188    0.016 0.3607366
#>     3: ZNF333     ZFR   -0.117   0.242    0.180    0.012 0.3607366
#>     4: ZNF333  TRIM33    0.007   0.241    0.136    0.008 0.3607366
#>     5: ZNF333   RIMS3   -0.325   0.409    0.144    0.000 0.3607366
#>    ---                                                            
#> 14024: ANAPC2   SBNO2   -0.147   0.298    0.156    0.000 0.3607366
#> 14025: ANAPC2  ZNF528   -0.142   0.222    0.152    0.016 0.3607366
#> 14026:  TIGD7   SBNO2   -0.297   0.354    0.128    0.004 0.3607366
#> 14027:  TIGD7  ZNF528   -0.099   0.219    0.212    0.032 0.3607366
#> 14028:  SBNO2  ZNF528    0.141   0.311    0.368    0.024 0.4030531
#>          Padj_abs
#>     1: 0.01167541
#>     2: 0.02395390
#>     3: 0.02083624
#>     4: 0.01712559
#>     5: 0.00000000
#>    ---           
#> 14024: 0.00000000
#> 14025: 0.02395390
#> 14026: 0.01167541
#> 14027: 0.03670450
#> 14028: 0.03016774
#> 
#> $Correlation
#>          Node.1   Node.2          Cor
#>     1:  FAM122B DEFB108B  0.366857931
#>     2:  FAM122B   CCSER2  0.278870911
#>     3: DEFB108B   CCSER2 -0.252482453
#>     4:  FAM122B     GPD2 -0.005649124
#>     5: DEFB108B     GPD2 -0.107064848
#>    ---                               
#> 35774:   TRIM23   ZNF528  0.054249174
#> 35775:   ZNF559   ZNF528 -0.218309729
#> 35776:   ANAPC2   ZNF528 -0.013821370
#> 35777:    TIGD7   ZNF528  0.011807143
#> 35778:    SBNO2   ZNF528  0.092317502
#> 
#> $Quantiles
#>                         0.1%  2.5%   10%  90% 97.5% 99.9%
#> Empirical.Quantile     -0.56 -0.46 -0.34 0.37  0.48  0.56
#> Quantile               -0.50 -0.40 -0.28 0.32  0.43  0.52
#> Empirical.Quantile.abs  0.21  0.24  0.27 0.47  0.53  0.57
#> Quantile.abs            0.17  0.19  0.21 0.41  0.47  0.53
#> 
#> attr(,"class")
#> [1] "wTO"  "list"

Using the `wTO.fast()` function:

The wTO.fast() function is a simplified verion of the wTO.Complete() function, that doesn't return diagnosis, correlation, nor the quantiles, but allows the user to choose the method for correlation, the sign of the ω to be calculated and the resampling method should be one of the two Bootrastap or BlockBootstrap. The p-values are the raw p-values and if the user desires to calculate its correction it can be easily done as shown above.

fast_example = wTO.fast(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's', sign = 'sign', delta = 0.2, n = 250, method_resampling = 'Bootstrap')
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!

head(fast_example)
#>     Node.1  Node.2    wTO  pval
#> 1:  ZNF333   ZNF28 -0.064 0.264
#> 2:  ZNF333 ANKRD22 -0.143 0.236
#> 3:   ZNF28 ANKRD22  0.029 0.188
#> 4:  ZNF333     ZFR -0.164 0.232
#> 5:   ZNF28     ZFR -0.011 0.256
#> 6: ANKRD22     ZFR  0.024 0.264

fast_example$adj.pval = p.adjust(fast_example$pval)

Metagenomic data

Along with the expression data, the wTO package also includes a metagenomics dataset that is the abundance of some OTU's in bacterias collected since 1997. More about this data can be found at \[\].

The OTU (Operational Taxonomic Units) contains the taxonomy of the particular OTU and from Week1 to Week98, the abundance of that particular OTU in that week.

data("metagenomics_abundance")
metagenomics_abundance[2:10, 1:10]
#>                                                                                                            OTU
#> 2                            Root;k__Archaea;p__Euryarchaeota;c__Thermoplasmata;o__E2;f__MarinegroupII;g__;s__
#> 3                   Root;k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__OCS155;g__;s__
#> 4         Root;k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Microbacteriaceae;g__;s__
#> 5                  Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__;s__
#> 6            Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__JTB248;s__
#> 7                          Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__;g__;s__
#> 8            Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__;s__
#> 9  Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__Fluviicola;s__
#> 10        Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__;s__
#>    Week1 Week2 Week3 Week4 Week5 Week6 Week7 Week8 Week9
#> 2      1     6     0     0     0     1     0     1     0
#> 3      0     0     0     0     0     0     0     5     0
#> 4      0     0     0     0     0     0     0     0     0
#> 5      0     1     0     0     0     0     0     0     0
#> 6      0     0     0     0     0     0     0     0     0
#> 7      0     1     0     0     0     0     0     1     0
#> 8      0     0     0     0     0     0     0     1     0
#> 9      0     0     0     0     0     0     0     1     0
#> 10     0     1     0     0     0     0     0     7     0

Before we are able to define the network, we have first to understand the patterns of autocorrelation of each species, and then define the lag, that will be used for the BlockBootstrap resampling in the wTO.Complete() or fast.wTO() functions. To define the lag, we use autocorrelation function acf().

row.names(metagenomics_abundance) = metagenomics_abundance$OTU
metagenomics_abundance = metagenomics_abundance[,-1]
par(mfrow = c(3,3))
for ( i in 1:nrow(metagenomics_abundance)){
  acf(t(metagenomics_abundance[i,]))
}

Because most of them have only a high autocorrelation with itself or maximum 2 weeks, we will use a lag of 2 for the blocks used in the bootstrap.

The functions wTO.fast() and wTO.Complete() are able to accomodate the lag parameter, therefore, they will be used here.

Meta_fast = wTO.fast(Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 'p', sign = 'sign', n = 250, method_resampling = 'BlockBootstrap', lag = 2)
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!

Meta_Complete = wTO.Complete(k = 1, n = 250, Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 's' , method_resampling = 'BlockBootstrap', lag = 2 )
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!

Consensus Network

From the expression data-sets, we are able to draw a Consensus Network. For that, the function wTO.Consensus() can be used. This function works in a special way, that the user should pass a list of data.frames containing the Nodes names and the wTO and p-values. We show an example above.

Let's calculate the networks the same way we did in the Section Genomic data.

wTO_Data1 = wTO.fast(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', n = 250)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
wTO_Data2 = wTO.fast(Data = Microarray_Expression2, Overlap = ExampleGRF$x, method = 'p', n = 250)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!

Now, let's combine both networks in one Consensus Network.

CN_expression = wTO.Consensus(data = list (wTO_Data1 = data.frame
                                    (Node.1 = wTO_Data1$Node.1, 
                                     Node.2 = wTO_Data1$Node.2, 
                                     wTO = wTO_Data1$wTO,
                                      pval = wTO_Data1$pval)
                                    , wTO_Data2C = data.frame
                                    (Node.1 = wTO_Data2$Node.1, 
                                     Node.2 = wTO_Data2$Node.2, 
                                     wTO = wTO_Data2$wTO,
                                      pval = wTO_Data2$pval)))
#> Joining by: Node.1, Node.2
#> Joining by: Node.1, Node.2
#> Joining by: ID
#> Total common nodes: 168

Or using the wTO.Complete():

wTO_Data1C = wTO.Complete(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', n = 250, k = 5)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
wTO_Data2C = wTO.Complete(Data = Microarray_Expression2, Overlap = ExampleGRF$x, method = 'p', n = 250, k = 5)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs

#> Done!

Now, let's combine both networks in one Consensus Network.

CN_expression = wTO.Consensus(data = list (wTO_Data1C = data.frame
                                    (Node.1 = wTO_Data1C$wTO$Node.1, 
                                     Node.2 = wTO_Data1C$wTO$Node.2, 
                                     wTO = wTO_Data1C$wTO$wTO_sign,
                                      pval = wTO_Data1C$wTO$pval_sig), wTO_Data2C = data.frame
                                    (Node.1 = wTO_Data2C$wTO$Node.1, 
                                     Node.2 = wTO_Data2C$wTO$Node.2, 
                                     wTO = wTO_Data2C$wTO$wTO_sign,
                                      pval = wTO_Data2C$wTO$pval_sig)))
#> Joining by: Node.1, Node.2
#> Joining by: Node.1, Node.2
#> Joining by: ID
#> Total common nodes: 168

head(CN_expression)
#>   Node.1  Node.2         CN pval.fisher
#> 1 ZNF333   ZNF28 -0.1191288  0.06006662
#> 2 ZNF333 ANKRD22 -0.1400000  0.13016905
#> 3 ZNF333     ZFR -0.1091659  0.10147819
#> 4 ZNF333  TRIM33 -0.0240000  0.11707440
#> 5 ZNF333   RIMS3 -0.2798101  0.10797662
#> 6 ZNF333  ZNF595  0.1542850  0.11529832

Visualization

The wTO package also includes an interactive visualization tool that can be used to inspect the results of the wTO netwoks or Consensus Network.

The arguments given to this function are the Nodes names, its wTO and p-values. Optionals are the cutoffs that can be applied to the p-value or to the wTO value. We highly reccomend using both by subseting the data previous to the visualization. The layout of the network can be also chosen from a variety that are implemented in igraph package, for the the Make_Cluster argument many clustering algorithms that are implemented in igraph can be used. The final graph can be exported as an html or as png.

Visualization = NetVis(Node.1 = CN_expression$Node.1, 
       Node.2 = CN_expression$Node.2,
       wTO = CN_expression$CN, 
       pval = CN_expression$pval.fisher, cutoff = list(kind = 'pval', value = 0.001), MakeGroups = 'louvain', layout = 'layout_components')
#> Joining by: id

CN_expression_filtered = subset(CN_expression, abs(CN_expression$CN)> 0.4 & CN_expression$pval.fisher < 0.0001)

dim(CN_expression_filtered)
#> [1] 45  4

Visualization2 = NetVis(
  Node.1 = CN_expression_filtered$Node.1, 
       Node.2 = CN_expression_filtered$Node.2,
       wTO = CN_expression_filtered$CN, 
       pval = CN_expression_filtered$pval.fisher, 
       cutoff = list(kind = 'pval', value = 0.001), 
       MakeGroups = 'louvain', 
       layout = 'layout_components', path = 'Vis.html')
#> Joining by: id
#> Vis.html

sessionInfo()
#> R version 3.4.4 (2018-03-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04 LTS
#> 
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] magrittr_1.5 wTO_1.6.1   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.17      visNetwork_2.0.3  digest_0.6.15    
#>  [4] rprojroot_1.3-2   plyr_1.8.4        jsonlite_1.5     
#>  [7] backports_1.1.2   evaluate_0.10.1   som_0.3-5.1      
#> [10] stringi_1.2.2     reshape2_1.4.3    data.table_1.11.4
#> [13] rmarkdown_1.10    tools_3.4.4       stringr_1.3.1    
#> [16] htmlwidgets_1.2   igraph_1.2.1      parallel_3.4.4   
#> [19] yaml_2.1.19       compiler_3.4.4    pkgconfig_2.0.1  
#> [22] htmltools_0.3.6   knitr_1.20