In R, data are often stored in data frames, which are tables in which each row represents a record and each column a variable. Because data frames are so widely used, they have been extended by other objects such as tibble, data.table or AnnotatedDataFrame.
However, in many projects the data do not fit in a single table; they are organized in several data frames, each dealing with a specific concept. These tables are often related to each other by shared variables. Depending on the diversity of the concepts, it can be difficult to remember what these tables represent and how they are connected. Fortunately, because they are tables, a set of data frames can be directly documented using a relational data model.
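To make the idea of related data frames concrete, here is a minimal base R sketch. The table names, column names and values are made up for illustration and are not taken from any package; the point is only that a shared column ties two tables together.

```r
# Toy data: two data frames describing different concepts.
# All names and values are hypothetical.
phenotypes <- data.frame(
   id = c("P1", "P2", "P3"),
   name = c("Phenotype one", "Phenotype two", "Phenotype three"),
   stringsAsFactors = FALSE
)
synonyms <- data.frame(
   id = c("P1", "P1", "P3"),
   synonym = c("First", "No. 1", "Third"),
   stringsAsFactors = FALSE
)

# The shared "id" column relates the two tables:
# every synonyms$id should refer to an existing phenotypes$id.
all_ids_known <- all(synonyms$id %in% phenotypes$id)

# Joining on that column brings the two concepts together.
joined <- merge(phenotypes, synonyms, by = "id")
```

A relational data model writes down exactly this kind of link, so that it does not have to be rediscovered from the data each time.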
The datamodelr R package provides tools to document relational data. The generated data models are leveraged by the dm R package to interact more easily with relational data.
Here we present the ReDaMoR package which also allows the manipulation of relational data models in R but with an approach quite different from the one implemented in datamodelr. It provides functions to create, import and save relational data models. These functions are accessible through a graphical user interface made with Shiny.
The main features of ReDaMoR are the following:
The ReDaMoR R package is licensed under GPL-3.
This package is not yet available on CRAN.
The following R packages available on CRAN are required:
devtools::install_github("patzaw/ReDaMoR")
The Shiny app is launched with the following command:
library(ReDaMoR)
m <- model_relational_data()
When the Done button is clicked, the model is returned to the R environment. Because the interface can be closed accidentally, the model is also autosaved and can be recovered using the recover_RelDataModel() function. The recovered model can be provided as modelInput when calling model_relational_data(). For example:
m <- model_relational_data(recover_RelDataModel())
An example data model is provided within the package. It represents data extracted from the Human Phenotype Ontology (HPO) (1), a subset of which is also provided within the ReDaMoR package (more details are given in the Confronting data section).
This example can be imported from the Shiny app by clicking on the Import button and then on the Try an example link. It can also be loaded and displayed (and edited) using the following commands:
hpo_model <- read_json_data_model(
   system.file("examples/HPO-model.json", package="ReDaMoR")
)
plot(hpo_model)
## Edit the model
# m <- model_relational_data(hpo_model)
The view is rendered by the visNetwork package. This means that it can take advantage of all the functions provided by visNetwork, including the use of the model in Shiny apps.
Each box represents a table. The header of the box corresponds to the name of the table and the following lines document each field:
When the cursor is over a box, table and field comments are displayed.
Each arrow represents a foreign key:
The app is divided into three main parts:
The model view provides a view of the data model rendered by the visNetwork package. Tables can be selected by clicking on them or by searching for them using the box just above the model view. Autodraw and autofit capabilities are also provided in this area.
In the edition view the user can:
A help tour can be launched by clicking on the button in the main menu. This help tour is contextual: its content depends on the state of the app.
Some common keyboard shortcuts are implemented:
A public instance of the app is available here.
You can easily deploy your own instance by copying these two lines in an app.R file:
library(ReDaMoR)
model_relational_data()
Data can be confronted with the model using the confront_data() function. During this process the following checks are performed:
A subset of the HPO (1) (500 phenotypes among more than 14 000 in the original resource) is provided within the package and can be directly confronted with the data model. The function prints a report with messages about the global success of the data model confrontation and additional failure or warning messages for relevant tables.
confrontation_report <- confront_data(
   hpo_model,
   path=list.files(
      system.file("examples/HPO-subset", package="ReDaMoR"),
      full.names=TRUE
   ),
   returnData=TRUE
)
#> Processing "HPO_hp" (table 1 / 9)
#> Processing "HPO_altId" (table 2 / 9)
#> Processing "HPO_sourceFiles" (table 3 / 9)
#> Processing "HPO_diseases" (table 4 / 9)
#> Processing "HPO_diseaseHP" (table 5 / 9)
#> Processing "HPO_diseaseSynonyms" (table 6 / 9)
#> Processing "HPO_parents" (table 7 / 9)
#> Processing "HPO_descendants" (table 8 / 9)
#> Processing "HPO_synonyms" (table 9 / 9)
#> Model
#> SUCCESS
#>
#> Check configuration
#> - Optional checks: unique, not nullable, foreign keys
#> - Maximum number of records: Inf
#>
#> HPO_hp
#> SUCCESS
#> Field issues or warnings
#> - description: SUCCESS Missing values 117/500 = 23%
This report can also be formatted using markdown and included in a document such as this one.
library(dplyr) # Provides the %>% pipe, and slice() and mutate() used further below
# view_confrontation_report(confrontation_report) # Use RStudio viewer
format_confrontation_report_md(
   confrontation_report,
   title="Example: Confrontation with original data",
   level=1, numbered=FALSE
) %>%
   cat()
SUCCESS
SUCCESS
The returnData argument is used to keep the data in the confrontation report.
hpo_tables <- confrontation_report$data
In the example below, the data are altered and confronted with the data model again. The confrontation report shows the discrepancies between the model and the data, which makes it easier to correct one or the other.
hpo_tables$HPO_diseases <- hpo_tables$HPO_diseases %>% slice(1:100)
hpo_tables$HPO_synonyms[1:10, "synonym"] <- NA
hpo_tables$HPO_hp <- hpo_tables$HPO_hp %>% mutate(level=as.character(level))
confront_data(hpo_model, hpo_tables, verbose=FALSE) %>%
   format_confrontation_report_md(
      title="Example: Confrontation with altered data",
      level=1, numbered=FALSE
   ) %>%
   cat()
FAILURE
FAILURE
FAILURE
FAILURE
FAILURE
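The kinds of discrepancies triggered by the alterations above (a changed column type, missing values, records removed from a referenced table) can be illustrated with a small base R sketch. This is not how confront_data() is implemented; the tables, the expected types and all variable names below are hypothetical and only show the general idea behind each kind of check.

```r
# Toy tables; names and values are hypothetical.
hp <- data.frame(
   id = c("1", "2", "3"),
   level = c("1", "2", "2"),      # stored as character, expected integer
   stringsAsFactors = FALSE
)
synonyms <- data.frame(
   id = c("1", "2", "4"),         # "4" does not exist in hp$id
   synonym = c("a", NA, "c"),     # one missing value
   stringsAsFactors = FALSE
)

# Hypothetical expectations about column types
expected_types <- list(hp = c(id = "character", level = "integer"))

# 1. Type check: compare the observed class of each column with the expected one
observed <- vapply(hp, function(x) class(x)[1], character(1))
type_mismatch <- names(observed)[observed != expected_types$hp[names(observed)]]

# 2. Not-nullable check: count missing values in a mandatory column
n_missing <- sum(is.na(synonyms$synonym))

# 3. Uniqueness check: a key column must not contain duplicates
id_is_unique <- anyDuplicated(hp$id) == 0

# 4. Foreign key check: every reference must exist in the target table
orphans <- setdiff(synonyms$id, hp$id)
```

Each of these results corresponds to one category of message that a confrontation report can contain for a table.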
Besides confronting a model with data, a model can also be drafted from a set of tables. Only column names and types are documented during this process. Uniqueness, mandatory and key constraints still need to be documented by the user to complete the model. This can easily be done with model_relational_data().
hpo_tables <- confrontation_report$data
new_model <- df_to_model(
   list=names(hpo_tables), envir=as.environment(hpo_tables)
)
new_model %>%
   auto_layout(lengthMultiplier=250) %>%
   plot()
# model_relational_data(new_model)
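To give an idea of what drafting from column names and types amounts to, here is a minimal base R sketch. It is not the implementation of df_to_model(); the describe_table() helper and the toy tables are hypothetical. Note that nothing in it can tell whether a column is unique, mandatory or a key, which is why those constraints have to be added by the user afterwards.

```r
# Toy tables; names and values are hypothetical.
tables <- list(
   hp = data.frame(
      id = c("1", "2"), level = c(1L, 2L),
      stringsAsFactors = FALSE
   ),
   synonyms = data.frame(
      id = c("1", "1"), synonym = c("a", "b"),
      stringsAsFactors = FALSE
   )
)

# Hypothetical helper: list the column names of a data frame
# together with the first class of each column.
describe_table <- function(df) {
   data.frame(
      field = names(df),
      type = unname(vapply(df, function(x) class(x)[1], character(1))),
      stringsAsFactors = FALSE
   )
}

# One description per table: a bare draft without any constraint
draft <- lapply(tables, describe_table)
draft$hp
```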
This work was entirely supported by UCB Pharma (Early Solutions department).