Demonstration of the functionalities of the R ODAM package

Description

  • ‘ODAM’ (Open Data for Access and Mining) is a framework that implements a simple way to make research data broadly accessible and fully available for reuse, including from a scripting language such as R. Its main purpose is to make a dataset accessible online with minimal effort from the data provider, and to let any scientist or bioinformatician explore the dataset and then extract part or all of the data according to their needs.

  • The Rodam package has a single class, odamws, which provides methods for retrieving online data through the ‘ODAM’ Web Services. This requires that the data are organized according to the ‘ODAM’ approach, namely that the data subsets have been deposited in a suitable data repository as TSV files, together with their metadata, which are also described in TSV files.
  • The R ODAM package offers a set of functions for retrieving the data and metadata of datasets that are implemented with the help of the “Experimental Data Table Management System” (EDTMS) called ODAM, which stands for “Open Data for Access and Mining”.

  • See https://www.slideshare.net/danieljacob771282/odam-open-data-access-and-mining for further information.


Load the R ODAM package

library(Rodam)
## Loading required package: RCurl
## Loading required package: bitops


Initialize the ODAM object

Initialize the ‘ODAM’ object with the name of the desired dataset and the URL of the corresponding web service.

dh <- new('odamws', wsURL='https://pmb-bordeaux.fr/getdata/', dsname='frim1')


Get the Data Tree

options(width=256)
options(warn=-1)
options(stringsAsFactors=FALSE)

show(dh)
##                        levelName SetID Identifier WSEntry                             Description Count
## 1  plants                            1    PlantID   plant                          Plant features   552
## 2   °--samples                       2   SampleID  sample                         Sample features  1288
## 3       ¦--aliquots                  3  AliquotID aliquot                       Aliquots features   530
## 4       ¦   ¦--cellwall_metabo       4  AliquotID aliquot      Cell wall Compound quantifications    75
## 5       ¦   ¦--cellwall_metaboFW     5  AliquotID aliquot Cell Wall Compound quantifications (FW)    75
## 6       ¦   ¦--activome              6  AliquotID aliquot                       Activome Features   266
## 7       ¦   ¦--plato_hexosesP       10  AliquotID aliquot                       Hexoses Phosphate   266
## 8       ¦   ¦--lipids_AG            11  AliquotID aliquot                               Lipids AG    57
## 9       ¦   °--AminoAcid            12  AliquotID aliquot                             Amino Acids    69
## 10      °--pools                     7     PoolID    pool                Pools of remaining pools   195
## 11          ¦--qMS_metabo            8     PoolID    pool             MS Compounds quantification    25
## 12          °--qNMR_metabo           9     PoolID    pool            NMR Compounds quantification    65


Get all WebService entries

Get all WebService entries defined in the data subset ‘samples’

dh$getWSEntryByName("samples")
##    Subset Attribute   WSEntry
## 1  plants   PlantID     plant
## 2  plants      Rank       row
## 3  plants  PlantNum  plantnum
## 4  plants Treatment treatment
## 5 samples  SampleID    sample
## 6 samples     Truss     truss
## 7 samples  DevStage     stage
## 8 samples  FruitAge       age

NOTE:

A ‘WSEntry’ is an alias associated with an attribute; it lets the user query the data subset by putting a filter condition (i.e. a selection constraint) on that attribute. Only some attributes have a WSEntry, mainly those in the identifier and factor categories. For instance, the WSEntry of the ‘SampleID’ attribute is ‘sample’. Thus, to select only the samples whose ID equals 365, specify the filter condition ‘sample/365’.
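To make the filter syntax concrete, here is a minimal sketch in base R. It does not call the web service: the `ws_entries` table copies a few rows of the output above, and `make_filter()` is a hypothetical helper introduced for illustration, not a function of the Rodam package.

```r
# Local copy of a few rows of the WSEntry table shown above
# (typed by hand, not fetched from the server)
ws_entries <- data.frame(
  Subset    = c("plants", "plants", "samples", "samples"),
  Attribute = c("PlantID", "Treatment", "SampleID", "DevStage"),
  WSEntry   = c("plant", "treatment", "sample", "stage"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a "<WSEntry>/<value>" filter from an attribute name
make_filter <- function(entries, attribute, value) {
  alias <- entries$WSEntry[entries$Attribute == attribute]
  if (length(alias) != 1) {
    stop("No unique WSEntry for attribute: ", attribute)
  }
  paste(alias, value, sep = "/")
}

make_filter(ws_entries, "SampleID", 365)   # "sample/365"
```

The resulting string has the same form as the filter condition that is passed as the second argument of getDataByName.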



Get data from ‘samples’ subset with a constraint

data <- dh$getDataByName('samples','sample/365')
data
##   PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW DW
## 1     E35    E      311   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.46       48.98   83.32 NA
## 2     A17    A       17   Control      365    T6    FR.02    47DPA       40423         0.5             3         56.59       47.77   82.02 NA
## 3      A8    A        8   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.11       44.90   71.82 NA
## 4      D3    D      210   Control      365    T6    FR.02    47DPA       40423         0.5             5         49.28       44.35   58.28 NA
## 5     H11    H      356   Control      365    T6    FR.02    47DPA       40423         0.5             6         46.68       38.69   49.25 NA


If the WSEntry concept is not clear to you, you can retrieve the full data subset and then perform a local selection, as shown below:

data <- dh$getDataByName('samples') 
data[data$SampleID==365, ]
##     PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW DW
## 22       A8    A        8   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.11       44.90   71.82 NA
## 41      A17    A       17   Control      365    T6    FR.02    47DPA       40423         0.5             3         56.59       47.77   82.02 NA
## 402      D3    D      210   Control      365    T6    FR.02    47DPA       40423         0.5             5         49.28       44.35   58.28 NA
## 590     E35    E      311   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.46       48.98   83.32 NA
## 662     H11    H      356   Control      365    T6    FR.02    47DPA       40423         0.5             6         46.68       38.69   49.25 NA


Convert all numeric date and time values into a human-readable format

data$HarvestDate <- dh$dateToStr(data$HarvestDate)
data$HarvestHour <- dh$timeToStr(data$HarvestHour)
data[data$SampleID==365, ]
##     PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW DW
## 22       A8    A        8   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             5         55.11       44.90   71.82 NA
## 41      A17    A       17   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             3         56.59       47.77   82.02 NA
## 402      D3    D      210   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             5         49.28       44.35   58.28 NA
## 590     E35    E      311   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             5         55.46       48.98   83.32 NA
## 662     H11    H      356   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             6         46.68       38.69   49.25 NA
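The numeric values above are consistent with a spreadsheet-style encoding: a date stored as a day count and a time stored as a fraction of a day. The sketch below reproduces the two conversions in base R under that assumption. The functions `serial_to_date()` and `fraction_to_time()` are hypothetical names, and the way Rodam implements dateToStr and timeToStr internally may differ.

```r
# Assumption: dates are day counts from 1899-12-30 (common spreadsheet convention)
serial_to_date <- function(x) {
  format(as.Date(x, origin = "1899-12-30"))
}

# Assumption: times are fractions of a 24-hour day
fraction_to_time <- function(x) {
  minutes <- round(x * 24 * 60)
  paste0(minutes %/% 60, "h", minutes %% 60)
}

serial_to_date(40423)   # "2010-09-02"
fraction_to_time(0.5)   # "12h0"
```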



Get ‘activome’ data subset

Get ‘activome’ data subset along with its metadata

ds <- dh$getSubsetByName('activome')
ds$samples   # Show the identifier defined in the data subset
## [1] "AliquotID"
ds$facnames  # Show all factors defined in the data subset
## [1] "Treatment" "DevStage"  "FruitAge"
ds$varnames  # Show all quantitative variables defined in the data subset
##  [1] "PGM"           "F16BP_Cyt"     "PyrK"          "CitS"          "PPI"           "AcoS"          "PFK"           "FruS"          "F16BP_Stroma" 
## [10] "GluS"          "ISODH_NAD"     "EnoS"          "ISODH_NADP"    "PEPC"          "FBPA"          "SucCoALig"     "MALDH"         "AlaS"         
## [19] "FumS"          "AspS"          "GLUDH_NADP"    "GAPDH_NAD"     "GAPDH_NADP"    "GLUDH_NAD"     "TPI"           "PhoS"          "NI"           
## [28] "AciS"          "G6PDH"         "UGPS"          "SucS"          "MAL_NAD"       "ShiS"          "MAL_NADP"      "PGI_tot"       "SolStarchS"   
## [37] "AGPS"          "SucPhosphateS"
ds$qualnames # Show all qualitative variables defined in the data subset
## character(0)
ds$WSEntry   # Show all WS entries defined in the data subset
##      Subset Attribute   WSEntry
## 1    plants   PlantID     plant
## 2    plants      Rank       row
## 3    plants  PlantNum  plantnum
## 4    plants Treatment treatment
## 5   samples  SampleID    sample
## 6   samples     Truss     truss
## 7   samples  DevStage     stage
## 8   samples  FruitAge       age
## 9  aliquots  SampleID    sample
## 10 aliquots AliquotID   aliquot
## 11 activome AliquotID   aliquot


Boxplot of all variables defined in ds$varnames

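# Colour index per variable: rounded mean of its log10 values (order of magnitude)
Rank <- sapply(ds$varnames, function(x) round(mean(log10(ds$data[, x]), na.rm=TRUE)))
cols <- c('red', 'orange', 'darkgreen', 'blue', 'purple')
# Horizontal boxplots on a log10 scale, without outliers, coloured by order of magnitude
boxplot(log10(ds$data[, ds$varnames]), outline=FALSE, horizontal=TRUE, border=cols[Rank], las=2, cex.axis=0.8)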
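The colouring step can be checked on a small example. The sketch below uses invented values spanning three orders of magnitude and computes the colour index of each variable in the same way. The clamping step is an addition for illustration: it keeps the index between 1 and the number of colours, which matters if a variable has a mean log10 value that rounds to 0 or to more than 5.

```r
# Toy data: three variables spanning different orders of magnitude (made-up values)
toy <- data.frame(
  low  = c(8, 10, 12),
  mid  = c(90, 100, 110),
  high = c(900, 1000, 1100)
)
cols <- c("red", "orange", "darkgreen", "blue", "purple")

# Order of magnitude of each variable: rounded mean of its log10 values
magnitude <- sapply(toy, function(x) round(mean(log10(x), na.rm = TRUE)))

# Clamp to the valid index range so that cols[] never yields NA or drops an entry
idx <- pmin(pmax(magnitude, 1), length(cols))
border_cols <- cols[idx]
border_cols   # "red" "orange" "darkgreen"
```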


Find how many IDs the subsets have in common

Based on the subset network, the common ID to be considered is the “SampleID” identifier.

 refID <- "SampleID"
 subsetList <- c( "samples", "activome", "qNMR_metabo", "cellwall_metabo" )
 n <- length(subsetList)
 Mintersubsets <- matrix(data=0, nrow=n, ncol=n)
 for (i in 1:(n-1))
     for (j in (i+1):n)
          Mintersubsets[i,j] <- length(dh$getCommonID(refID,subsetList[i],subsetList[j]))
 
 rownames(Mintersubsets) <- subsetList
 colnames(Mintersubsets) <- subsetList
 Mintersubsets[ -n, -1 ]
##             activome qNMR_metabo cellwall_metabo
## samples          254         191              70
## activome           0         191              70
## qNMR_metabo        0           0              24


Get the merged data of two data subsets based on their common identifiers

setNameList <- c("activome", "qNMR_metabo" )
dsMerged <- dh$getSubsetByName(setNameList)
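As a rough local analogy, merging on common identifiers resembles an inner join. The sketch below uses base R `merge()` on two invented tables that share a "SampleID" column. Whether the web service performs exactly this kind of join is an assumption, and the variable name `glucose` is made up.

```r
# Toy subsets sharing a "SampleID" column (values are made up)
activome_toy <- data.frame(SampleID = c(1, 2, 3, 4), PGM = c(10, 11, 12, 13))
qnmr_toy     <- data.frame(SampleID = c(2, 3, 5), glucose = c(0.5, 0.7, 0.9))

# Inner join: keep only the samples present in both subsets
merged <- merge(activome_toy, qnmr_toy, by = "SampleID")
merged

# Keep track of which variables came from which subset,
# similar in spirit to the varsBySubset field of the merged object
vars_by_subset <- list(activome = "PGM", qNMR_metabo = "glucose")
```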

Boxplot of all variables defined in dsMerged$varnames

# One colour per variable, according to the subset it comes from
cols <- c( rep('red', length(dsMerged$varsBySubset[[setNameList[1]]])), 
           rep('darkgreen', length(dsMerged$varsBySubset[[setNameList[2]]])) )
boxplot(log10(dsMerged$data[, dsMerged$varnames]), outline=FALSE, horizontal=TRUE, border=cols, las=2, cex.axis=0.8)





R Session Information

options(width=128)
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                   LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Rodam_0.1.6     RCurl_1.95-4.11 bitops_1.0-6   
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.4   xfun_0.3           Rook_1.1-1         purrr_0.2.5        colorspace_1.3-2   htmltools_0.3.6   
##  [7] viridisLite_0.3.0  yaml_2.2.0         XML_3.98-1.16      rlang_0.3.1        pillar_1.3.1       glue_1.3.0        
## [13] RColorBrewer_1.1-2 bindrcpp_0.2.2     bindr_0.1.1        plyr_1.8.4         stringr_1.3.1      munsell_0.5.0     
## [19] gtable_0.2.0       data.tree_0.7.8    visNetwork_2.0.4   htmlwidgets_1.3    evaluate_0.11      knitr_1.21        
## [25] DiagrammeR_1.0.0   Rcpp_1.0.0         readr_1.1.1        scales_1.0.0       jsonlite_1.6       rgexf_0.15.3      
## [31] gridExtra_2.3      brew_1.0-6         ggplot2_3.1.0      hms_0.4.2          digest_0.6.18      stringi_1.2.4     
## [37] dplyr_0.7.8        grid_3.5.1         influenceR_0.1.0   tools_3.5.1        magrittr_1.5       lazyeval_0.2.1    
## [43] tibble_2.0.1       crayon_1.3.4       tidyr_0.8.1        pkgconfig_2.0.2    downloader_0.4     assertthat_0.2.0  
## [49] rmarkdown_1.11     rstudioapi_0.8     viridis_0.5.1      R6_2.3.0           igraph_1.2.2       compiler_3.5.1