This vignette describes how to archive data into Dataverse directly from R.

library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

SWORD-based Data Archiving

The main data archiving (or “deposit”) workflow for Dataverse is built on SWORD v2.0. This means that to create a new dataset listing, you will first have to initialize a dataset entry with some metadata, then add one or more files to the dataset, and finally publish it. This looks something like the following:

# retrieve your service document
d <- service_document()

# list current datasets in a dataverse
list_datasets("mydataverse")

# create a new dataset
## create a list of metadata
metadat <- list(title = "My Study",
                creator = "Doe, John",
                description = "An example study")
## initiate the dataset
dat <- initiate_sword_dataset("mydataverse", body = metadat)

Once the dataset is initiated, it is possible to add and delete files:

tmp <- tempfile()
write.csv(iris, file = tmp)
f <- add_file(dat, file = tmp)
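
An R object such as a data.frame can also be passed in place of a file name, in which case it is deposited without you writing it to disk first (a minimal sketch, reusing the dat object from above):

# deposit a data.frame directly
f2 <- add_file(dat, file = iris)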

The add_file() function accepts, via its file argument, a character vector of file names, a data.frame, or a list of R objects. Files can be deleted using delete_file(). Once the dataset is finalized, it can be published using publish_sword_dataset():

publish_sword_dataset(dat)

It will then show up in the list of published datasets returned by list_datasets("mydataverse").

Native API

Dataverse also implements a second way to release datasets, called the “native” API. The workflow is similar to the SWORD API:

# create the dataset (metadata is supplied via 'body'; the metadata list from above is reused here for illustration)
ds <- create_dataset("mydataverse", body = metadat)

# add files
tmp <- tempfile()
write.csv(iris, file = tmp)
f <- add_dataset_file(file = tmp, dataset = ds)

# publish dataset
publish_dataset(ds)

# the published dataset will now appear in the dataverse contents
dataverse_contents("mydataverse")
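
To confirm the deposit or work with its metadata later, the dataset can be retrieved with get_dataset(). This is a sketch that assumes the ds object returned by create_dataset() (or the dataset's persistent identifier) can be passed directly:

# retrieve metadata for the newly published dataset
get_dataset(ds)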