Introduction to DataSpaceR

This package provides a thin wrapper around Rlabkey and connects to the CAVD DataSpace database, making it easier to fetch datasets from specific studies.

Configuration

First, go to DataSpace (https://dataspace.cavd.org) and set yourself up with an account.

To connect to the CAVD DataSpace via DataSpaceR, you need a netrc file in your home directory that contains a machine name (the hostname of DataSpace), a login, and a password. There are two ways to create a netrc file.

Creating a netrc file with `writeNetrc`

In your R console, create a netrc file using the `writeNetrc` function from DataSpaceR:

library(DataSpaceR)
writeNetrc(
  login = "yourEmail@address.com",
  password = "yourSecretPassword",
  netrcFile = "/your/home/directory/.netrc" # use getNetrcPath() to get the default path
)

This will create a netrc file in your home directory. Make sure you have a valid login and password.

Manually creating a netrc file

Alternatively, you can manually create a netrc file.

On Windows, this file should be named _netrc.
On UNIX/Mac, it should be named .netrc.
The file should be located in the user’s home directory, and its permissions should make it unreadable by everyone except the owner.
To determine your home directory, run Sys.getenv("HOME") in R.
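As a minimal sketch, the naming and location rules above can be written in base R. `netrcPathSketch` is a hypothetical helper used only for illustration; DataSpaceR's own `getNetrcPath()` is the supported way to get the default path.

```r
# Hypothetical helper (not part of DataSpaceR): builds the default netrc
# path from the naming rules above. Use getNetrcPath() in practice.
netrcPathSketch <- function(os = .Platform$OS.type, home = Sys.getenv("HOME")) {
  # Windows expects "_netrc"; UNIX and Mac expect ".netrc"
  fileName <- if (identical(os, "windows")) "_netrc" else ".netrc"
  file.path(home, fileName)
}

netrcPathSketch()
```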

The following three lines must be included in the .netrc or _netrc file, separated by white space (spaces, tabs, or newlines) or commas. Multiple such blocks can exist in one file.

machine dataspace.cavd.org
login myuser@domain.com
password supersecretpassword
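The same three lines can also be written from R. The sketch below is an illustration, not a DataSpaceR function: it writes to a temporary file (so an existing netrc is not overwritten), restricts permissions to the owner, and then re-reads the tokens, splitting on white space or commas, to confirm that the `machine` entry is present. Point `netrcFile` at your real netrc path and use your own credentials when doing this for real.

```r
# Illustration only: a temporary file stands in for your real netrc path.
netrcFile <- tempfile("netrc")
writeLines(
  c(
    "machine dataspace.cavd.org",
    "login myuser@domain.com",
    "password supersecretpassword"
  ),
  netrcFile
)

# Make the file readable and writable by the owner only (mode 0600).
# On Windows, R can only change the read-only attribute, so this has
# little effect there.
Sys.chmod(netrcFile, mode = "0600", use_umask = FALSE)

# Re-read the file and split it into tokens on white space or commas.
tokens <- unlist(strsplit(readLines(netrcFile), "[[:space:],]+"))
tokens <- tokens[nzchar(tokens)]

# The token after each "machine" keyword is a hostname.
hosts <- tokens[which(tokens == "machine") + 1]
"dataspace.cavd.org" %in% hosts
```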

See the documentation of the netrc file format for more information.

Initiate a connection

We’ll be looking at study cvd256. To use a different study, change the study name passed to getStudy. You can instantiate multiple connections to different studies simultaneously.

library(DataSpaceR)
#> By exporting data from the CAVD DataSpace, you agree to be bound by the Terms of Use available on the CAVD DataSpace sign-in page at https://dataspace.cavd.org
con <- connectDS()
con
#> <DataSpaceConnection>
#>   URL: https://dataspace.cavd.org
#>   User: jkim2345@scharp.org
#>   Available studies: 254
#>     - 72 studies with data
#>     - 4872 subjects
#>     - 407558 data points
#>   Available groups: 6

The call to connectDS instantiates the connection. Printing the object shows where it’s connected and the available studies.

knitr::kable(head(con$availableStudies))

study_name	short_name	title	type	status	stage	species	start_date	strategy	network	data_availability
cvd232	Parks_RV_232	Limiting Dose Vaginal SIVmac239 Challenge of RhCMV-SIV vaccinated Indian rhesus macaques.	Pre-Clinical NHP	Inactive	Assays Completed	Rhesus macaque	2009-11-24	Vector vaccines (viral or bacterial)	CAVD	NA
cvd234	Zolla-Pazner_Mab_test1 Study	Zolla-Pazner_Mab_Test1	Antibody Screening	Inactive	Assays Completed	Non-Organism Study	2009-02-03	Prophylactic neutralizing Ab	CAVD	NA
cvd235	mAbs potency	Weiss mAbs potency	Antibody Screening	Inactive	Assays Completed	Non-Organism Study	2008-08-21	Prophylactic neutralizing Ab	CAVD	NA
cvd236	neutralization assays	neutralization assays	Antibody Screening	Active	In Progress	Non-Organism Study	2009-02-03	Prophylactic neutralizing Ab	CAVD	NA
cvd238	Gallo_PA_238	HIV-1 neutralization responses in chronically infected individuals	Antibody Screening	Inactive	Assays Completed	Non-Organism Study	2009-01-08	Prophylactic neutralizing Ab	CAVD	NA
cvd239	CAVIMC-015	Lehner_Thorstensson_Allovac	Pre-Clinical NHP	Inactive	Assays Completed	Rhesus macaque	2009-01-08	Protein and peptide vaccines	CAVD	This study has assay data (NAB) in the DataSpace.

con$availableStudies shows the available studies in the CAVD DataSpace. Check out the reference page of DataSpaceConnection for all available fields and methods.
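Since availableStudies is a table, it can be narrowed down with ordinary R subsetting before choosing a study. The sketch below uses a toy data frame holding a few rows and columns copied from the table above; with a live connection you would use con$availableStudies instead. subset() and $ are used because they behave the same on a data frame and on a data.table.

```r
# Toy stand-in for con$availableStudies: a few rows and columns copied
# from the table above (the real table has more of both).
studies <- data.frame(
  study_name = c("cvd232", "cvd234", "cvd238", "cvd239"),
  type = c("Pre-Clinical NHP", "Antibody Screening",
           "Antibody Screening", "Pre-Clinical NHP"),
  data_availability = c(NA, NA, NA,
                        "This study has assay data (NAB) in the DataSpace."),
  stringsAsFactors = FALSE
)

# Studies that report assay data in DataSpace
withData <- subset(studies, !is.na(data_availability))$study_name

# Preclinical nonhuman primate studies
nhp <- subset(studies, type == "Pre-Clinical NHP")$study_name

withData
nhp
```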

cvd256 <- con$getStudy("cvd256")
cvd256
#> <DataSpaceStudy>
#>   Study: cvd256
#>   URL: https://dataspace.cavd.org/CAVD/cvd256
#>   Available datasets:
#>     - BAMA
#>     - Demographics
#>     - NAb
#>   Available non-integrated datasets:

con$getStudy creates a connection to the study cvd256. Printing the object shows where it’s connected, to what study, and the available datasets.

knitr::kable(cvd256$availableDatasets)

name	label	n	integrated
BAMA	Binding Ab multiplex assay	6740	TRUE
Demographics	Demographics	121	TRUE
NAb	Neutralizing antibody	1419	TRUE

knitr::kable(cvd256$treatmentArm)

arm_id	arm_part	arm_group	arm_name	randomization	coded_label	last_day	description
cvd256-NA-A-A	NA	A	A	Vaccine	Group A Vaccine	168	DNA-C 4 mg administered IM at weeks 0, 4, and 8 AND NYVAC-C 10^7pfu/mL administered IM at week 24
cvd256-NA-B-B	NA	B	B	Vaccine	Group B Vaccine	168	DNA-C 4 mg administered IM at weeks 0 and 4 AND NYVAC-C 10^7pfu/mL administered IM at weeks 20 and 24

Available datasets and treatment arm information for the study can be accessed through the availableDatasets and treatmentArm fields.

Fetching datasets

We can fetch any of the datasets listed in availableDatasets.

NAb <- cvd256$getDataset("NAb")
dim(NAb)
#> [1] 1419   29
colnames(NAb)
#>  [1] "ParticipantId"          "ParticipantVisit/Visit" "visit_day"             
#>  [4] "assay_identifier"       "summary_level"          "specimen_type"         
#>  [7] "antigen"                "antigen_type"           "virus"                 
#> [10] "virus_type"             "virus_insert_name"      "clade"                 
#> [13] "neutralization_tier"    "tier_clade_virus"       "target_cell"           
#> [16] "initial_dilution"       "titer_ic50"             "titer_ic80"            
#> [19] "response_call"          "nab_lab_source_key"     "lab_code"              
#> [22] "exp_assayid"            "titer_ID50"             "titer_ID80"            
#> [25] "nab_response_ID50"      "nab_response_ID80"      "slope"                 
#> [28] "vaccine_matched"        "study_prot"

The cvd256 object is an instance of an R6 class, so it behaves like a true object: functions such as getDataset are members of the object, which is why they are accessed with $.
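The member-function semantics can be mimicked in base R with an environment, which is also what an R6 object is under the hood. The toy object below is hypothetical and only illustrates why functions are called with $; it does not reproduce DataSpaceR's classes or fetch any data.

```r
# Toy illustration (not DataSpaceR code): fields and functions live in the
# same environment, so both are reached with `$`, as in cvd256$getDataset().
makeToyStudy <- function(study, datasets) {
  self <- new.env()
  self$study <- study
  self$availableDatasets <- datasets
  # A member function that can see the object's own fields
  self$getDataset <- function(name) {
    if (!name %in% self$availableDatasets) {
      stop("Dataset not available: ", name)
    }
    paste("fetching", name, "from", self$study)
  }
  self
}

toy <- makeToyStudy("cvd256", c("BAMA", "Demographics", "NAb"))
toy$getDataset("NAb")
```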

We can get detailed variable information using getDatasetDescription.

knitr::kable(cvd256$getDatasetDescription("NAb"))

fieldName	caption	type	description
ParticipantId	Participant ID	Text (String)	Subject identifier
antigen	Antigen name	Text (String)	The name of the antigen (virus) being tested.
antigen_type	Antigen type	Text (String)	The standardized term for the type of virus used in the construction of the nAb antigen.
assay_identifier	Assay identifier	Text (String)	Name identifying assay
clade	Virus clade	Text (String)	The clade (gene subtype) of the virus (antigen) being tested.
exp_assayid	Experimental Assay Design Code	Integer	Unique ID assigned to the experiment design of the assay for tracking purposes.
initial_dilution	Initial dilution	Number (Double)	Indicates the initial specimen dilution.
lab_code	Lab ID	Text (String)	A code indicating the lab performing the assay.
nab_lab_source_key	Data provenance	Integer	Details regarding the provenance of the assay results.
nab_response_ID50	Response call ID50	True/False (Boolean)	Indicates if neutralization is detected based on ID50 titer.
nab_response_ID80	Response call ID80	True/False (Boolean)	Indicates if neutralization is detected based on ID80 titer.
neutralization_tier	Neutralization tier	Text (String)	A classification specific to HIV NAb assay design, in which an antigen is assessed for its ease of neutralization (1=most easily neutralized, 3=least easily neutralized)
response_call	Response call	True/False (Boolean)	Indicates if neutralization is detected.
slope	Slope	Number (Double)	The slope calculated using the difference between 50% and 80% neutralization.
specimen_type	Specimen type	Text (String)	The type of specimen used in the assay. For nAb assays, this is generally serum or plasma.
study_prot	Study Protocol	Text (String)	Study protocol
summary_level	Data summary level	Text (String)	Defines the level at which the magnitude or response has been summarized (e.g. summarized at the isolate level).
target_cell	Target cell	Text (String)	The cell line used in the assay to determine infection (lack of neutralization). Generally TZM-bl or A3R5, but can also be other cell lines or non-engineered cells.
tier_clade_virus	Neutralization tier + Antigen clade + Virus	Text (String)	A combination of neutralization tier, antigen clade, and virus used for filtering.
titer_ID50	Titer ID50	Number (Double)	The adjusted value of 50% maximal inhibitory dilution (ID50).
titer_ID80	Titer ID80	Number (Double)	The adjusted value of 80% maximal inhibitory dilution (ID80).
titer_ic50	Titer IC50	Number (Double)	The half maximal inhibitory concentration (IC50).
titer_ic80	Titer IC80	Number (Double)	The 80% maximal inhibitory concentration (IC80).
vaccine_matched	Antigen vaccine match indicator	True/False (Boolean)	Indicates if the interactive part of the antigen was designed to match the immunogen in the vaccine.
virus	Virus name	Text (String)	The term for the virus (antigen) being tested.
virus_insert_name	Virus insert name	Text (String)	The amino acid sequence inserted in the virus construct.
virus_type	Virus type	Text (String)	The type of virus used in the construction of the nAb antigen.
visit_day	Visit Day	Integer	Target study day defined for a study visit. Study days are relative to Day 0, where Day 0 is typically defined as enrollment and/or first injection.
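The description table can also guide which variables to work with. As a sketch, assuming the result behaves like a data frame, the snippet below uses a toy copy of a few rows from the table above and lists the Boolean and numeric fields.

```r
# Toy copy of a few rows of cvd256$getDatasetDescription("NAb") shown above
nabFields <- data.frame(
  fieldName = c("antigen", "initial_dilution", "nab_response_ID50",
                "response_call", "titer_ID50", "vaccine_matched"),
  type = c("Text (String)", "Number (Double)", "True/False (Boolean)",
           "True/False (Boolean)", "Number (Double)", "True/False (Boolean)"),
  stringsAsFactors = FALSE
)

# Response-call style (logical) variables
booleanFields <- subset(nabFields, type == "True/False (Boolean)")$fieldName

# Numeric variables, such as dilutions and titers
numericFields <- subset(nabFields, type == "Number (Double)")$fieldName

booleanFields
numericFields
```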

To get only a subset of the data and speed up the download, filters can be passed to getDataset. The filters are created using the makeFilter function of the Rlabkey package.

library(Rlabkey) # provides makeFilter
cvd256Filter <- makeFilter(c("visit_day", "EQUAL", "0"))
NAb_day0 <- cvd256$getDataset("NAb", colFilter = cvd256Filter)
dim(NAb_day0)
#> [1] 709  29

See ?makeFilter for more information on the syntax.

Creating a connection to all studies

To fetch data from multiple studies, create a connection at the project level.

cavd <- con$getStudy("")

This will instantiate a connection at the CAVD level. Most functions work on cross-study connections just as they do on single-study connections.

You can get a list of datasets available across all studies.

cavd
#> <DataSpaceStudy>
#>   Study: CAVD
#>   URL: https://dataspace.cavd.org/CAVD
#>   Available datasets:
#>     - BAMA
#>     - Demographics
#>     - ELISPOT
#>     - ICS
#>     - NAb
#>   Available non-integrated datasets:
knitr::kable(cavd$availableDatasets)

name	label	n	integrated
BAMA	Binding Ab multiplex assay	169071	TRUE
Demographics	Demographics	4872	TRUE
ELISPOT	Enzyme-Linked ImmunoSpot	5610	TRUE
ICS	Intracellular Cytokine Staining	182779	TRUE
NAb	Neutralizing antibody	50098	TRUE

In an all-study connection, getDataset combines the requested dataset across studies. In most cases, the combined dataset has too many subjects for a quick transfer, so filtering the data is a necessity. The colFilter argument can be used here, as described in the Fetching datasets section.

conFilter <- makeFilter(c("species", "EQUAL", "Human"))
human <- cavd$getDataset("Demographics", colFilter = conFilter)
dim(human)
#> [1] 3087   36
colnames(human)
#>  [1] "SubjectId"                       "SubjectVisit/Visit"             
#>  [3] "species"                         "subspecies"                     
#>  [5] "sexatbirth"                      "race"                           
#>  [7] "ethnicity"                       "country_enrollment"             
#>  [9] "circumcised_enrollment"          "bmi_enrollment"                 
#> [11] "agegroup_range"                  "agegroup_enrollment"            
#> [13] "age_enrollment"                  "study_label"                    
#> [15] "study_start_date"                "study_first_enr_date"           
#> [17] "study_fu_complete_date"          "study_public_date"              
#> [19] "study_network"                   "study_last_vaccination_day"     
#> [21] "study_type"                      "study_part"                     
#> [23] "study_group"                     "study_arm"                      
#> [25] "study_arm_summary"               "study_arm_coded_label"          
#> [27] "study_randomization"             "study_product_class_combination"
#> [29] "study_product_combination"       "study_short_name"               
#> [31] "study_grant_pi_name"             "study_strategy"                 
#> [33] "study_prot"                      "genderidentity"                 
#> [35] "studycohort"                     "bmi_category"

Check out the reference page of DataSpaceStudy for all available fields and methods.

Connect to a saved group

A group is a curated collection of participants, defined by filtering on treatments, products, studies, or species, and it is created in the DataSpace App.

Suppose you are using the App to filter and visualize data and want to save the selection for later, or explore it in R with DataSpaceR. You can save a group by clicking the Save button on the Active Filter Panel.

We can browse the available saved groups, or the groups curated by the DataSpace team, via availableGroups.

knitr::kable(con$availableGroups)

id	label	original_label	description	created_by	shared	n	studies
216	mice	mice	NA	readjk	FALSE	75	c("cvd468", "cvd483", "cvd316", "cvd331")
217	CAVD 242	CAVD 242	This is a fake group for CAVD 242	readjk	FALSE	30	cvd242
220	NYVAC durability comparison	NYVAC_durability	Compare durability in 4 NHP studies using NYVAC-C (vP2010) and NYVAC-KC-gp140 (ZM96) products.	ehenrich	TRUE	78	c("cvd281", "cvd434", "cvd259", "cvd277")
224	cvd338	cvd338	NA	readjk	FALSE	36	cvd338
228	HVTN 505 case control subjects	HVTN 505 case control subjects	Participants from HVTN 505 included in the case-control analysis	drienna	TRUE	189	vtn505
230	HVTN 505 polyfunctionality vs BAMA	HVTN 505 polyfunctionality vs BAMA	Compares ICS polyfunctionality (CD8+, Any Env) to BAMA mfi-delta (single Env antigen) in the HVTN 505 case control cohort	drienna	TRUE	170	vtn505

To fetch data from a saved group, create a connection at the project level with a group ID. For example, we can use getGroup to connect to the “NYVAC durability comparison” group, which has group ID 220.

nyvac <- con$getGroup(220)
nyvac
#> <DataSpaceStudy>
#>   Group: NYVAC durability comparison
#>   URL: https://dataspace.cavd.org/CAVD
#>   Available datasets:
#>     - BAMA
#>     - Demographics
#>     - ELISPOT
#>     - ICS
#>     - NAb
#>   Available non-integrated datasets:

Retrieving a dataset is the same as before.

NAb_nyvac <- nyvac$getDataset("NAb")
dim(NAb_nyvac)
#> [1] 4281   29

Access monoclonal antibody data

See the other vignette for a tutorial on accessing monoclonal antibody data with DataSpaceR:

vignette("Accessing_Monoconal_Antibody_Data")

Reference Tables

The following tables list all fields and methods of the DataSpaceConnection, DataSpaceStudy, and DataSpaceMab objects and can be used as a quick reference.

`DataSpaceConnection`

Name	Description
`availableStudies`	The table of available studies.
`availableGroups`	The table of available groups.
`mabGrid`	The filtered mAb grid.
`mabGridSummary`	The summarized mAb grid with updated `n_` columns and `geometric_mean_curve_ic50`.
`filterMabGrid`	Filter rows in the mAb grid by specifying the values to keep in the columns found in the `mabGrid` field.
`resetMabGrid`	Reset the mAb grid to the unfiltered state.
`getMab`	Create a `DataSpaceMab` object from the filtered `mabGrid`.
`getStudy`	Create a `DataSpaceStudy` object for a study.
`getGroup`	Create a `DataSpaceStudy` object for a group.

`DataSpaceStudy`

Name	Description
`study`	The study name.
`group`	The group name.
`availableDatasets`	The table of datasets available in the study object.
`treatmentArm`	The table of treatment arm information for the connected study. Not available for the all-study connection.
`dataDir`	The default target directory for downloading non-integrated datasets.
`studyInfo`	Stores the information about the study.
`getDataset`	Get a dataset from the connection.
`getDatasetDescription`	Get variable information.
`setDataDir`	Set default target directory for downloading non-integrated datasets.

`DataSpaceMab`

Name	Description
`studyAndMabs`	The table of available mAbs by study.
`mabs`	The table of available mAbs and their attributes.
`nabMab`	The table of mAbs and their neutralizing measurements against viruses.
`studies`	The table of available studies.
`assays`	The table of assay status by study.
`variableDefinitions`	The table of variable definitions.

Introduction to DataSpaceR

Ju Yeong Kim

2020-01-07

Configuration

Creating a netrc file with `writeNetrc`

Manually creating a netrc file

Initiate a connection

Fetching datasets

Creating a connection to all studies

Connect to a saved group

Access monoclonal antibody data

Reference Tables

`DataSpaceConnection`

`DataSpaceStudy`

`DataSpaceMab`

Session information
