1 Introduction

The biocompute package offers a toolkit to create, validate, and export BioCompute Objects (BCOs). The package follows the tidyverse design principles and can be used together with other packages that follow similar designs.

library("biocompute")

2 Design for Reproducibility

To improve reproducibility, the composing and validation functions in this package are versioned. This means BCOs can be created and validated against a fixed version of the BioCompute Object specification when needed.

For example, to compose the provenance domain, one could use compose_provenance(), which is an alias for the function implementing the current stable version of the specification. Alternatively, one could use the versioned function compose_provenance_v1.3.0(). As the specification evolves, functions for new specification versions can be added, and compose_provenance() might point to a newer version in the future, while compose_provenance_v1.3.0() will not change over time.
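The aliasing scheme can be sketched in plain R. All function names below (compose_example_v1.3.0(), compose_example_v1.4.0(), compose_example()) are hypothetical stand-ins, not functions exported by biocompute, and "1.4.0" is an invented future version. The point is only that the unversioned name is a binding that can be repointed, while the versioned names stay fixed.

```r
# Hypothetical versioned implementations (NOT part of biocompute).
# Each one is frozen to a single specification version.
compose_example_v1.3.0 <- function(name) {
  list(spec_version = "1.3.0", name = name)
}
compose_example_v1.4.0 <- function(name) {  # hypothetical future version
  list(spec_version = "1.4.0", name = name)
}

# The unversioned name is simply an alias for the current stable version.
compose_example <- compose_example_v1.3.0

# Reproducible code calls the versioned function directly;
# code that wants the current behaviour calls the alias.
pinned <- compose_example_v1.3.0("my pipeline")
latest <- compose_example("my pipeline")

# If a later release repoints the alias, pinned calls are unaffected.
compose_example <- compose_example_v1.4.0
repointed <- compose_example("my pipeline")
```

Because the alias is just another name bound to the same function object, switching the default version is a one-line change that never alters what the versioned functions return.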

The function biocompute::versions() returns the current version and all available versions of the BioCompute Object specification supported by this package:

biocompute::versions()

$current
[1] "1.3.0"

$available
[1] "1.3.0"
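When a script pins a specification version, it can fail early if that version is not available. The helper below is a hypothetical sketch, not part of biocompute; it only assumes that biocompute::versions() returns a list with current and available elements, as printed above.

```r
# Hypothetical helper (not part of biocompute): check a pinned specification
# version against a list shaped like the output of biocompute::versions().
check_spec_version <- function(versions, pinned) {
  if (!pinned %in% versions$available) {
    stop("Specification version ", pinned, " is not available; available: ",
         paste(versions$available, collapse = ", "))
  }
  if (!identical(versions$current, pinned)) {
    # Not an error: the versioned compose_*() functions can still be used.
    message("Pinned version ", pinned, " differs from the current version ",
            versions$current, ".")
  }
  invisible(TRUE)
}

# A literal list mirroring the output printed above; in a session with
# biocompute loaded, pass biocompute::versions() instead.
v <- list(current = "1.3.0", available = "1.3.0")
check_spec_version(v, "1.3.0")
```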

3 Compose BioCompute Object Domains

The package takes structured, native R data structures (vectors or data frames) and turns them into BioCompute Objects. The compose_*() functions compose the individual BioCompute Object domains, and biocompute::compose() composes the final BioCompute Object.

For example, to compose the provenance domain, we first prepare the data as data frames or vectors with a fixed set of variable names, and then pass them to compose_provenance():

name <- "HCV1a ledipasvir resistance SNP detection"
version <- "1.0.0"
review <- data.frame(
  "status" = c("approved", "approved"),
  "reviewer_comment" = c(
    "Approved by [company name] staff. Waiting for approval from FDA Reviewer",
    "The revised BCO looks fine"
  ),
  "date" = c(
    as.POSIXct("2017-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST"),
    as.POSIXct("2017-12-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "America/Los_Angeles")
  ),
  "reviewer_name" = c("Jane Doe", "John Doe"),
  "reviewer_affiliation" = c("Seven Bridges Genomics", "U.S. Food and Drug Administration"),
  "reviewer_email" = c("example@sevenbridges.com", "example@fda.gov"),
  "reviewer_contribution" = c("curatedBy", "curatedBy"),
  "reviewer_orcid" = c("https://orcid.org/0000-0000-0000-0000", NA),
  stringsAsFactors = FALSE
)

derived_from <- "https://github.com/biocompute-objects/BCO_Specification/blob/1.2.1-beta/HCV1a.json"
obsolete_after <- as.POSIXct("2018-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")

embargo <- c(
  "start_time" = as.POSIXct("2017-10-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST"),
  "end_time" = as.POSIXct("2017-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")
)

created <- as.POSIXct("2017-01-20T09:40:17", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")

modified <- as.POSIXct("2019-05-10T09:40:17", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")

contributors <- data.frame(
  "name" = c("Jane Doe", "John Doe"),
  "affiliation" = c("Seven Bridges Genomics", "U.S. Food and Drug Administration"),
  "email" = c("example@sevenbridges.com", "example@fda.gov"),
  "contribution" = I(list(c("createdBy", "curatedBy"), c("authoredBy"))),
  "orcid" = c("https://orcid.org/0000-0000-0000-0000", NA),
  stringsAsFactors = FALSE
)

license <- "https://creativecommons.org/licenses/by/4.0/"

compose_provenance(
  name, version, review, derived_from, obsolete_after,
  embargo, created, modified, contributors, license
) %>% convert_json()

{
  "name": "HCV1a ledipasvir resistance SNP detection",
  "version": "1.0.0",
  "review": [
    {
      "status": "approved",
      "reviewer_comment": "Approved by [company name] staff. Waiting for approval from FDA Reviewer",
      "date": 1510507848,
      "reviewer": [
        {
          "reviewer_name": "Jane Doe",
          "reviewer_affiliation": "Seven Bridges Genomics",
          "reviewer_email": "example@sevenbridges.com",
          "reviewer_contribution": "curatedBy",
          "reviewer_orcid": "https://orcid.org/0000-0000-0000-0000"
        }
      ]
    },
    {
      "status": "approved",
      "reviewer_comment": "The revised BCO looks fine",
      "date": 1513110648,
      "reviewer": [
        {
          "reviewer_name": "John Doe",
          "reviewer_affiliation": "U.S. Food and Drug Administration",
          "reviewer_email": "example@fda.gov",
          "reviewer_contribution": "curatedBy",
          "reviewer_orcid": "NA"
        }
      ]
    }
  ],
  "derived_from": "https://github.com/biocompute-objects/BCO_Specification/blob/1.2.1-beta/HCV1a.json",
  "obsolete_after": "2018-11-12T12:30:48-0500",
  "embargo": ["2017-10-12T13:30:48-0400", "2017-11-12T12:30:48-0500"],
  "created": "2017-01-20T09:40:17-0500",
  "modified": "2019-05-10T09:40:17-0500",
  "contributors": [
    {
      "name": "Jane Doe",
      "affiliation": "Seven Bridges Genomics",
      "email": "example@sevenbridges.com",
      "contribution": ["createdBy", "curatedBy"],
      "orcid": "https://orcid.org/0000-0000-0000-0000"
    },
    {
      "name": "John Doe",
      "affiliation": "U.S. Food and Drug Administration",
      "email": "example@fda.gov",
      "contribution": "authoredBy",
      "orcid": "NA"
    }
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
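The contribution column of contributors is a list column. Wrapping the list in I() stops data.frame() from spreading its elements across separate columns, so each row can hold a character vector of any length. This matches the JSON above, where Jane Doe's contribution becomes an array and John Doe's a single value. A quick base R check on a subset of the same columns:

```r
# Same shape as the contributors data frame above (subset of columns).
contributors <- data.frame(
  "name" = c("Jane Doe", "John Doe"),
  # I() keeps the list as one list column instead of letting data.frame()
  # expand its elements into separate columns.
  "contribution" = I(list(c("createdBy", "curatedBy"), c("authoredBy"))),
  stringsAsFactors = FALSE
)

# Each row holds its own character vector; lengths() counts roles per row.
n_roles <- lengths(contributors$contribution)
print(n_roles)
```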

4 Compose BioCompute Objects

After all the domains are composed, use compose_tlf() to compose the top-level fields. compose_tlf() takes all the domains as input because they are used to calculate the SHA-256 checksum. Next, use biocompute::compose() to compose the complete BioCompute Object.

tlf <- compose_tlf(
  compose_provenance(), compose_usability(), compose_extension(),
  compose_description(), compose_execution(), compose_parametric(),
  compose_io(), compose_error()
)
biocompute::compose(
  tlf,
  compose_provenance(), compose_usability(), compose_extension(),
  compose_description(), compose_execution(), compose_parametric(),
  compose_io(), compose_error()
) %>% convert_json()

{
  "bco_spec_version": "https://w3id.org/biocompute/1.3.0/",
  "bco_id": "https://biocompute.sbgenomics.com/bco/eb6b41ac-155d-4f51-b712-1d9f371fcfc7",
  "checksum": "d743d2f1407707eadcc2a700f7c506f13deddf6bf9c8d9d888e9087cc67a1c8f",
  "provenance_domain": {
    "name": [],
    "version": [],
    "review": [],
    "derived_from": [],
    "obsolete_after": [],
    "embargo": [],
    "created": [],
    "modified": [],
    "contributors": [],
    "license": []
  },
  "usability_domain": [],
  "extension_domain": {
    "fhir_extension": [],
    "scm_extension": []
  },
  "description_domain": {
    "keywords": [],
    "xref": [],
    "platform": "Seven Bridges Platform",
    "pipeline_steps": []
  },
  "execution_domain": {
    "script": [],
    "script_driver": [],
    "software_prerequisites": [],
    "external_data_endpoints": [],
    "environment_variables": []
  },
  "parametric_domain": [],
  "io_domain": {
    "input_subdomain": [],
    "output_subdomain": []
  },
  "error_domain": {
    "empirical_error": [],
    "algorithmic_error": []
  }
}
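The idea behind the checksum can be illustrated with the jsonlite and digest packages. This is a sketch under assumptions, not biocompute's implementation: the exact serialization that compose_tlf() hashes is not described here, so the hypothetical bco_checksum() below simply serializes a list of toy domains to compact JSON and takes its SHA-256 digest. Any change to a domain yields a different checksum.

```r
library("jsonlite")
library("digest")

# Toy domain content; real BCO domains carry many more fields.
domains <- list(
  provenance_domain = list(
    name = "HCV1a ledipasvir resistance SNP detection",
    version = "1.0.0"
  )
)

# Hypothetical helper. Assumption: the checksum is the SHA-256 digest of the
# domains serialized as compact JSON. serialize = FALSE hashes the string
# itself rather than R's binary serialization of the object.
bco_checksum <- function(domains) {
  json <- as.character(jsonlite::toJSON(domains, auto_unbox = TRUE))
  digest::digest(json, algo = "sha256", serialize = FALSE)
}

original <- bco_checksum(domains)

# Editing any field changes the digest.
domains$provenance_domain$version <- "1.0.1"
edited <- bco_checksum(domains)
```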

5 Convert to JSON or YAML

As shown above, use convert_json() or convert_yaml() to convert domain objects or complete BCOs to JSON or YAML.
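To show what the two serializations look like for the same nested R list, here is a sketch that calls jsonlite and yaml directly. It is not necessarily how convert_json() and convert_yaml() are implemented; the list is a toy stand-in for a domain object.

```r
library("jsonlite")
library("yaml")

# A small nested R list standing in for a domain object.
domain <- list(
  name = "HCV1a ledipasvir resistance SNP detection",
  version = "1.0.0",
  contributors = list(
    list(name = "Jane Doe", contribution = c("createdBy", "curatedBy"))
  )
)

# JSON: auto_unbox = TRUE writes length-1 vectors as scalars, not arrays.
json_text <- jsonlite::toJSON(domain, auto_unbox = TRUE, pretty = TRUE)
cat(json_text, "\n")

# YAML: the same structure in block style.
yaml_text <- yaml::as.yaml(domain)
cat(yaml_text)
```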

6 Validate BioCompute Objects

To make sure that a BioCompute Object has not been tampered with and follows the standard, we can validate its checksum or validate it against the BCO JSON schemas. For example:

bco <- tempfile(fileext = ".json")
generate_example("HCV1a") %>%
  convert_json() %>%
  export_json(bco)
bco %>% validate_checksum()

── Loading BioCompute Object ───────────────────────────────────────────────────
── Validating Checksum ─────────────────────────────────────────────────────────
Documented checksum: 31ad400fdc50d044255391288b9f48d33beb82a58bf16ec2185c58596277a243
Calculated checksum: 31ad400fdc50d044255391288b9f48d33beb82a58bf16ec2185c58596277a243
Documented and calculated checksum matched.
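The comparison reported above (documented versus calculated checksum) can be sketched as follows. The helper verify_checksum() is hypothetical, and it assumes that the checksum lives in a top-level checksum field and was computed as the SHA-256 of the remaining content serialized as compact JSON; biocompute's actual rules, including which top-level fields are excluded, may differ.

```r
library("jsonlite")
library("digest")

# Hypothetical helper (not part of biocompute); see assumptions above.
verify_checksum <- function(bco) {
  documented <- bco$checksum
  content <- bco[setdiff(names(bco), "checksum")]  # drop the stored checksum
  json <- as.character(jsonlite::toJSON(content, auto_unbox = TRUE))
  calculated <- digest::digest(json, algo = "sha256", serialize = FALSE)
  cat("Documented checksum:", documented, "\n")
  cat("Calculated checksum:", calculated, "\n")
  identical(documented, calculated)
}

# Build a toy object whose checksum is correct by construction...
bco <- list(provenance_domain = list(name = "HCV1a", version = "1.0.0"))
bco$checksum <- digest::digest(
  as.character(jsonlite::toJSON(bco, auto_unbox = TRUE)),
  algo = "sha256", serialize = FALSE
)
ok <- verify_checksum(bco)

# ...then tamper with it: the recalculated checksum no longer matches.
bco$provenance_domain$version <- "2.0.0"
tampered <- verify_checksum(bco)
```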

bco <- tempfile(fileext = ".json")
generate_example("HCV1a") %>%
  convert_json() %>%
  export_json(bco)
bco %>% validate_schema()

── 0: Validating BioCompute Object ─────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                                 field           message
1 data.extension_domain.fhir_extension is the wrong type

── 1: Validating Provenance Domain ─────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                field                  message
1  data.review.0.date        is the wrong type
2  data.review.1.date        is the wrong type
3 data.obsolete_after must be date-time format
4        data.embargo        is the wrong type
5        data.created must be date-time format
6       data.modified must be date-time format

── 2: Validating Usability Domain ──────────────────────────────────────────────
[1] TRUE

── 3.1: Validating Extension Domain (FHIR Extension) ───────────────────────────
[1] FALSE
attr(,"errors")
                  field           message
1 data.fhir_resources.0 is the wrong type
2 data.fhir_resources.1 is the wrong type
3 data.fhir_resources.2 is the wrong type
4 data.fhir_resources.3 is the wrong type
5 data.fhir_resources.4 is the wrong type

── 3.2: Validating Extension Domain (SCM Extension) ────────────────────────────
[1] TRUE

── 4: Validating Description Domain ────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                              field                  message
1                   data.xref.0.ids        is the wrong type
2           data.xref.0.access_time must be date-time format
3                   data.xref.1.ids        is the wrong type
4           data.xref.1.access_time must be date-time format
5           data.xref.2.access_time must be date-time format
6                   data.xref.3.ids        is the wrong type
7           data.xref.3.access_time must be date-time format
8                     data.platform        is the wrong type
9 data.pipeline_steps.0.step_number        is the wrong type

── 5: Validating Execution Domain ──────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                           field           message
1                    data.script is the wrong type
2 data.external_data_endpoints.0 is the wrong type
3 data.external_data_endpoints.1 is the wrong type
4 data.external_data_endpoints.2 is the wrong type

── 6: Validating Parametric Domain ─────────────────────────────────────────────
[1] TRUE

── 7: Validating I/O Domain ────────────────────────────────────────────────────
[1] TRUE

── 8: Validating Error Domain ──────────────────────────────────────────────────
[1] TRUE
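Several of these schema errors concern timestamps. The JSON Schema date-time format follows RFC 3339, which expects a string with the UTC offset written with a colon (for example -05:00). In the provenance domain composed earlier, review dates were serialized as numbers (Unix time) and other timestamps carry offsets such as -0500, so errors like these can arise from that kind of serialization. If you need strings that satisfy the format, one base R sketch is to format POSIXct values with %z and insert the colon. Both to_rfc3339() and is_rfc3339() are hypothetical helpers, not biocompute functions, and the regular expression is a simplified check.

```r
# Hypothetical helper: format POSIXct values as RFC 3339 date-time strings.
to_rfc3339 <- function(x) {
  s <- format(x, "%Y-%m-%dT%H:%M:%S%z")            # %z gives offsets like -0500
  sub("([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", s)   # -0500 -> -05:00
}

# Hypothetical, simplified check modelled on the RFC 3339 date-time grammar.
is_rfc3339 <- function(s) {
  grepl(paste0("^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}",
               "(\\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})$"), s)
}

created <- as.POSIXct("2017-01-20T09:40:17", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")
la <- as.POSIXct("2017-12-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S",
                 tz = "America/Los_Angeles")

to_rfc3339(created)
to_rfc3339(la)
is_rfc3339(format(created, "%Y-%m-%dT%H:%M:%S%z"))  # offset without colon
```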

7 Export BioCompute Objects

The biocompute package offers a few convenient functions for exporting BioCompute Objects to JSON (export_json()) and to PDF, HTML, or Word documents (export_pdf(), export_html(), export_word()), as well as for exporting (uploading) them to cloud-based platforms (export_sevenbridges()). See the function documentation for details.

Create and Manipulate BioCompute Objects with R

Nan Xiao <nan.xiao@sevenbridges.com>

2019-11-22