‘ODAM’ (Open Data for Access and Mining) is a framework that provides a simple way to make research data broadly accessible and fully available for reuse, including from a scripting language such as R. Its main purpose is to make a dataset accessible online with minimal effort from the data provider, and to allow any scientist or bioinformatician to explore the dataset and then extract all or part of the data according to their needs.
The R package Rodam offers a set of functions for retrieving data and their metadata from datasets that are implemented with the help of the “Experimental Data Table Management System” (EDTMS) called ODAM.
See https://www.slideshare.net/danieljacob771282/odam-open-data-access-and-mining for further information.
library(Rodam)
## Loading required package: RCurl
## Loading required package: bitops
Initialize the ‘ODAM’ object with the desired dataset and the URL of its web service
dh <- new('odamws', wsURL='https://pmb-bordeaux.fr/getdata/', dsname='frim1')
options(width=256)
options(warn=-1)
options(stringsAsFactors=FALSE)
show(dh)
## levelName SetID Identifier WSEntry Description Count
## 1 plants 1 PlantID plant Plant features 552
## 2 °--samples 2 SampleID sample Sample features 1288
## 3 ¦--aliquots 3 AliquotID aliquot Aliquots features 530
## 4 ¦ ¦--cellwall_metabo 4 AliquotID aliquot Cell wall Compound quantifications 75
## 5 ¦ ¦--cellwall_metaboFW 5 AliquotID aliquot Cell Wall Compound quantifications (FW) 75
## 6 ¦ ¦--activome 6 AliquotID aliquot Activome Features 266
## 7 ¦ ¦--plato_hexosesP 10 AliquotID aliquot Hexoses Phosphate 266
## 8 ¦ ¦--lipids_AG 11 AliquotID aliquot Lipids AG 57
## 9 ¦ °--AminoAcid 12 AliquotID aliquot Amino Acids 69
## 10 °--pools 7 PoolID pool Pools of remaining pools 195
## 11 ¦--qMS_metabo 8 PoolID pool MS Compounds quantification 25
## 12 °--qNMR_metabo 9 PoolID pool NMR Compounds quantification 65
Get all WebService entries defined in the data subset ‘samples’
dh$getWSEntryByName("samples")
## Subset Attribute WSEntry
## 1 plants PlantID plant
## 2 plants Rank row
## 3 plants PlantNum plantnum
## 4 plants Treatment treatment
## 5 samples SampleID sample
## 6 samples Truss truss
## 7 samples DevStage stage
## 8 samples FruitAge age
A ‘WSEntry’ is an alias associated with an attribute that lets the user query a data subset by putting a filter condition (i.e. a selection constraint) on that attribute. Only a few attributes have a WSEntry, mainly those in the identifier and factor categories. For instance, the WSEntry of the ‘SampleID’ attribute is ‘sample’. Thus, to select only the samples whose ID equals 365, specify the filter condition ‘sample/365’.
data <- dh$getDataByName('samples','sample/365')
data
## PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW DW
## 1 E35 E 311 Control 365 T6 FR.02 47DPA 40423 0.5 5 55.46 48.98 83.32 NA
## 2 A17 A 17 Control 365 T6 FR.02 47DPA 40423 0.5 3 56.59 47.77 82.02 NA
## 3 A8 A 8 Control 365 T6 FR.02 47DPA 40423 0.5 5 55.11 44.90 71.82 NA
## 4 D3 D 210 Control 365 T6 FR.02 47DPA 40423 0.5 5 49.28 44.35 58.28 NA
## 5 H11 H 356 Control 365 T6 FR.02 47DPA 40423 0.5 6 46.68 38.69 49.25 NA
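To make the mapping from a filter condition to an attribute concrete, here is a minimal local sketch in base R. It does not call the web service and is not how ODAM implements filtering; the `apply_filter` helper and the toy data frame are hypothetical, and the lookup table simply copies the WSEntry listing shown earlier for the ‘samples’ subset.

```r
# WSEntry-to-attribute lookup, copied from the listing for the 'samples' subset
ws_entries <- data.frame(
  Attribute = c("PlantID", "Rank", "PlantNum", "Treatment",
                "SampleID", "Truss", "DevStage", "FruitAge"),
  WSEntry   = c("plant", "row", "plantnum", "treatment",
                "sample", "truss", "stage", "age"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply an 'entry/value' condition to a local data frame
apply_filter <- function(df, condition, entries) {
  parts <- strsplit(condition, "/", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  # Resolve the alias (e.g. 'sample') to its attribute name (e.g. 'SampleID')
  attribute <- entries$Attribute[entries$WSEntry == parts[1]]
  stopifnot(length(attribute) == 1)
  df[as.character(df[[attribute]]) == parts[2], , drop = FALSE]
}

# Toy data (made up for illustration; not the frim1 dataset)
toy <- data.frame(
  PlantID  = c("A8", "A17", "B2"),
  SampleID = c(365, 365, 12),
  stringsAsFactors = FALSE
)

selected <- apply_filter(toy, "sample/365", ws_entries)
print(selected)
```

The condition is split into an alias and a value, the alias is resolved to its attribute name, and only the rows matching that value are kept.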
If the WSEntry concept is unclear, you can instead retrieve the full data subset and then perform a local selection, as shown below:
data <- dh$getDataByName('samples')
data[data$SampleID==365, ]
## PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW DW
## 22 A8 A 8 Control 365 T6 FR.02 47DPA 40423 0.5 5 55.11 44.90 71.82 NA
## 41 A17 A 17 Control 365 T6 FR.02 47DPA 40423 0.5 3 56.59 47.77 82.02 NA
## 402 D3 D 210 Control 365 T6 FR.02 47DPA 40423 0.5 5 49.28 44.35 58.28 NA
## 590 E35 E 311 Control 365 T6 FR.02 47DPA 40423 0.5 5 55.46 48.98 83.32 NA
## 662 H11 H 356 Control 365 T6 FR.02 47DPA 40423 0.5 6 46.68 38.69 49.25 NA
data$HarvestDate <- dh$dateToStr(data$HarvestDate)
data$HarvestHour <- dh$timeToStr(data$HarvestHour)
data[data$SampleID==365, ]
## PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW DW
## 22 A8 A 8 Control 365 T6 FR.02 47DPA 2010-09-02 12h0 5 55.11 44.90 71.82 NA
## 41 A17 A 17 Control 365 T6 FR.02 47DPA 2010-09-02 12h0 3 56.59 47.77 82.02 NA
## 402 D3 D 210 Control 365 T6 FR.02 47DPA 2010-09-02 12h0 5 49.28 44.35 58.28 NA
## 590 E35 E 311 Control 365 T6 FR.02 47DPA 2010-09-02 12h0 5 55.46 48.98 83.32 NA
## 662 H11 H 356 Control 365 T6 FR.02 47DPA 2010-09-02 12h0 6 46.68 38.69 49.25 NA
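The raw values above (40423 and 0.5) are consistent with a spreadsheet-style serial day number and a fraction of a day. Assuming that encoding (a day count starting from 1899-12-30, which is inferred from the output and not confirmed by the package documentation), the same strings can be reproduced in base R. This sketch is not the implementation of `dateToStr` or `timeToStr`.

```r
# Raw values as they appear in the 'samples' subset
serial_date  <- 40423   # assumed: days since 1899-12-30
day_fraction <- 0.5     # assumed: fraction of a 24-hour day

# Serial day number to an ISO date string
date_str <- as.character(as.Date(serial_date, origin = "1899-12-30"))

# Day fraction to an 'HhM' string, matching the format shown in the output
minutes  <- as.integer(round(day_fraction * 24 * 60))
hour_str <- sprintf("%dh%d", minutes %/% 60L, minutes %% 60L)

print(date_str)
print(hour_str)
```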
Get ‘activome’ data subset along with its metadata
ds <- dh$getSubsetByName('activome')
ds$samples # Show the identifier defined in the data subset
## [1] "AliquotID"
ds$facnames # Show all factors defined in the data subset
## [1] "Treatment" "DevStage" "FruitAge"
ds$varnames # Show all quantitative variables defined in the data subset
## [1] "PGM" "F16BP_Cyt" "PyrK" "CitS" "PPI" "AcoS" "PFK" "FruS" "F16BP_Stroma"
## [10] "GluS" "ISODH_NAD" "EnoS" "ISODH_NADP" "PEPC" "FBPA" "SucCoALig" "MALDH" "AlaS"
## [19] "FumS" "AspS" "GLUDH_NADP" "GAPDH_NAD" "GAPDH_NADP" "GLUDH_NAD" "TPI" "PhoS" "NI"
## [28] "AciS" "G6PDH" "UGPS" "SucS" "MAL_NAD" "ShiS" "MAL_NADP" "PGI_tot" "SolStarchS"
## [37] "AGPS" "SucPhosphateS"
ds$qualnames # Show all qualitative variables defined in the data subset
## character(0)
ds$WSEntry # Show all WS entries defined in the data subset
## Subset Attribute WSEntry
## 1 plants PlantID plant
## 2 plants Rank row
## 3 plants PlantNum plantnum
## 4 plants Treatment treatment
## 5 samples SampleID sample
## 6 samples Truss truss
## 7 samples DevStage stage
## 8 samples FruitAge age
## 9 aliquots SampleID sample
## 10 aliquots AliquotID aliquot
## 11 activome AliquotID aliquot
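# Order of magnitude (rounded mean of log10) of each variable, used as a color index
Rank <- simplify2array(lapply(ds$varnames, function(x) { round(mean(log10(ds$data[ , x]), na.rm=TRUE)) }))
cols <- c('red', 'orange', 'darkgreen', 'blue', 'purple')
# Horizontal boxplots on a log10 scale, with borders colored by order of magnitude
boxplot(log10(ds$data[, ds$varnames]), outline=FALSE, horizontal=TRUE, border=cols[Rank], las=2, cex.axis=0.8)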
Based on the subset network, the common ID to be considered is the “SampleID” identifier
refID <- "SampleID"
subsetList <- c( "samples", "activome", "qNMR_metabo", "cellwall_metabo" )
n <- length(subsetList)
Mintersubsets <- matrix(data=0, nrow=n, ncol=n)
for (i in 1:(n-1)) {
  for (j in (i+1):n) {
    Mintersubsets[i,j] <- length(dh$getCommonID(refID,subsetList[i],subsetList[j]))
  }
}
rownames(Mintersubsets) <- subsetList
colnames(Mintersubsets) <- subsetList
Mintersubsets[ -n, -1 ]
## activome qNMR_metabo cellwall_metabo
## samples 254 191 70
## activome 0 191 70
## qNMR_metabo 0 0 24
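The counts in this matrix are the sizes of pairwise intersections of identifier sets. As a sketch that runs without the web service, the same kind of upper-triangular matrix can be built with base R `intersect()` on made-up identifier vectors. The `ids` list below is hypothetical and only mimics what `getCommonID` is assumed to return for each pair of subsets.

```r
# Made-up identifier sets, one per subset (not the frim1 data)
ids <- list(
  samples     = c(1, 2, 3, 4, 5),
  activome    = c(2, 3, 4, 5),
  qNMR_metabo = c(3, 4, 9)
)

n <- length(ids)
M <- matrix(0, nrow = n, ncol = n, dimnames = list(names(ids), names(ids)))

# Fill the upper triangle with the number of identifiers shared by each pair
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    M[i, j] <- length(intersect(ids[[i]], ids[[j]]))
  }
}

# Drop the last row and first column, which carry no pairwise information
print(M[-n, -1])
```

Only the upper triangle is filled because the intersection is symmetric, which is why the lower triangle stays at 0 in the output above.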
setNameList <- c("activome", "qNMR_metabo" )
dsMerged <- dh$getSubsetByName(setNameList)
cols <- c( rep('red', length(dsMerged$varsBySubset[[setNameList[1]]])),
rep('darkgreen', length(dsMerged$varsBySubset[[setNameList[2]]])) )
boxplot(log10(dsMerged$data[, dsMerged$varnames]), outline=FALSE, horizontal=TRUE, border=cols, las=2, cex.axis=0.8)
options(width=128)
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=C LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C
## [5] LC_TIME=French_France.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Rodam_0.1.6 RCurl_1.95-4.11 bitops_1.0-6
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.4 xfun_0.3 Rook_1.1-1 purrr_0.2.5 colorspace_1.3-2 htmltools_0.3.6
## [7] viridisLite_0.3.0 yaml_2.2.0 XML_3.98-1.16 rlang_0.3.1 pillar_1.3.1 glue_1.3.0
## [13] RColorBrewer_1.1-2 bindrcpp_0.2.2 bindr_0.1.1 plyr_1.8.4 stringr_1.3.1 munsell_0.5.0
## [19] gtable_0.2.0 data.tree_0.7.8 visNetwork_2.0.4 htmlwidgets_1.3 evaluate_0.11 knitr_1.21
## [25] DiagrammeR_1.0.0 Rcpp_1.0.0 readr_1.1.1 scales_1.0.0 jsonlite_1.6 rgexf_0.15.3
## [31] gridExtra_2.3 brew_1.0-6 ggplot2_3.1.0 hms_0.4.2 digest_0.6.18 stringi_1.2.4
## [37] dplyr_0.7.8 grid_3.5.1 influenceR_0.1.0 tools_3.5.1 magrittr_1.5 lazyeval_0.2.1
## [43] tibble_2.0.1 crayon_1.3.4 tidyr_0.8.1 pkgconfig_2.0.2 downloader_0.4 assertthat_0.2.0
## [49] rmarkdown_1.11 rstudioapi_0.8 viridis_0.5.1 R6_2.3.0 igraph_1.2.2 compiler_3.5.1