README

Import and Export CSV Data With a YAML Metadata Header

CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.). The CSVY file specification is simple: place a YAML header on top of a regular CSV. The YAML header is formatted according to the Table Schema of a Tabular Data Package.

A CSVY file looks like this:

#---
#profile: tabular-data-resource
#name: my-dataset
#path: https://raw.githubusercontent.com/csvy/csvy.github.io/master/examples/example.csvy
#title: Example file of csvy
#description: Show a csvy sample file.
#format: csvy
#mediatype: text/vnd.yaml
#encoding: utf-8
#schema:
#  fields:
#  - name: var1
#    type: string
#  - name: var2
#    type: integer
#  - name: var3
#    type: number
#dialect:
#  csvddfVersion: 1.0
#  delimiter: ","
#  doubleQuote: false
#  lineTerminator: "\r\n"
#  quoteChar: "\""
#  skipInitialSpace: true
#  header: true
#sources:
#- title: The csvy specifications
#  path: http://csvy.org/
#  email: ''
#licenses:
#- name: CC-BY-4.0
#  title: Creative Commons Attribution 4.0
#  path: https://creativecommons.org/licenses/by/4.0/
#---
var1,var2,var3
A,1,2.0
B,3,4.3

Which we can read into R like this:

library("csvy")
str(read_csvy(system.file("examples", "example1.csvy", package = "csvy")))

## 'data.frame': 2 obs. of 3 variables:
##  $ var1: chr "A" "B"
##  $ var2: int 1 3
##  $ var3: num 2 4.3
##  - attr(*, "profile")= chr "tabular-data-resource"
##  - attr(*, "title")= chr "Example file of csvy"
##  - attr(*, "description")= chr "Show a csvy sample file."
##  - attr(*, "name")= chr "my-dataset"
##  - attr(*, "format")= chr "csvy"
##  - attr(*, "sources")=List of 1
##   ..$ :List of 3
##   .. ..$ name : chr "CC-BY-4.0"
##   .. ..$ title: chr "Creative Commons Attribution 4.0"
##   .. ..$ path : chr "https://creativecommons.org/licenses/by/4.0/"

Optional comment characters on the YAML lines make the data readable with any standard CSV parser while retaining the ability to import and export variable- and file-level metadata. The CSVY specification does not use these, but the csvy package for R does, so that you (and other users) can continue to rely on utils::read.csv() or readr::read_csv() as usual. The import() function in rio supports CSVY natively.
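To make the layout concrete, here is a minimal base-R sketch of how a commented YAML header can be separated from the CSV body. It is only an illustration of the file structure, not how the csvy package is implemented; the sample lines and the variable names (csvy_lines, yaml_header, csv_body) are made up for this example.

```r
# A tiny CSVY document with a commented YAML header (illustrative content)
csvy_lines <- c(
  "#---",
  "#name: my-dataset",
  "#fields:",
  "#  - name: var1",
  "#    type: string",
  "#---",
  "var1,var2,var3",
  "A,1,2.0",
  "B,3,4.3"
)

# Drop the optional leading '#' so the '---' fence lines can be recognised
stripped <- sub("^#", "", csvy_lines)
fences <- which(stripped == "---")

# The header sits between the first two fences; the data follows the second
yaml_header <- stripped[(fences[1] + 1):(fences[2] - 1)]
csv_body <- csvy_lines[(fences[2] + 1):length(csvy_lines)]

# The body is ordinary CSV and can be parsed with base R
dat <- utils::read.csv(text = paste(csv_body, collapse = "\n"))
str(dat)
```

The yaml_header lines could then be handed to a YAML parser of your choice; that step is left out here to keep the sketch free of extra dependencies.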

Export

To create a CSVY file from R, just do:

library("csvy")
library("datasets")
write_csvy(iris, "iris.csvy")

It is also possible to export the metadata to a separate YAML or JSON file (and then to import from those separate files) by specifying the metadata argument in write_csvy() and read_csvy().
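A minimal sketch of that workflow follows. It assumes that the metadata argument accepts a path to a .yaml file in both functions; the file names and the temporary directory are hypothetical and chosen only for illustration.

```r
library("csvy")
library("datasets")

# Hypothetical file names in a temporary directory
data_file <- file.path(tempdir(), "iris.csv")
meta_file <- file.path(tempdir(), "iris.yaml")

# Write the data as CSV and the metadata to a separate YAML file
write_csvy(iris, data_file, metadata = meta_file)

# Read the data back, pointing read_csvy() at the same metadata file
d <- read_csvy(data_file, metadata = meta_file)
str(d)
```

Using a .json extension for the metadata file should work the same way if JSON is preferred.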

Import

To read a CSVY file into R, just do:
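The call itself is not shown here. A minimal sketch, assuming the iris data exported in the Export section (recreated below in a temporary location so the snippet runs on its own; the path variable is illustrative), would be:

```r
library("csvy")
library("datasets")

# Assumption: recreate the exported file in a temporary location
path <- file.path(tempdir(), "iris.csvy")
write_csvy(iris, path)

# Read the data together with its YAML metadata
d1 <- read_csvy(path)
str(d1)
```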

## 'data.frame': 150 obs. of 5 variables:
##  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
##   ..- attr(*, "levels")= chr "setosa" "versicolor" "virginica"
##  - attr(*, "profile")= chr "tabular-data-package"
##  - attr(*, "name")= chr "iris"

or use any other appropriate data import function to ignore the YAML metadata:
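The call is not shown here either. One possible sketch uses utils::read.table(), which by default treats lines starting with "#" as comments and therefore skips the commented YAML header; utils::read.csv() would need comment.char = "#" set explicitly to do the same. On R 4.0 and later, stringsAsFactors = TRUE is needed to get Species as a factor. The temporary path is an assumption made so the snippet runs on its own.

```r
library("csvy")
library("datasets")

# Assumption: recreate the exported file in a temporary location
path <- file.path(tempdir(), "iris.csvy")
write_csvy(iris, path)

# The commented YAML header is skipped as comment lines;
# only the CSV body is parsed
d2 <- utils::read.table(path, sep = ",", header = TRUE,
                        stringsAsFactors = TRUE)
str(d2)
```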

## 'data.frame': 150 obs. of 5 variables:
##  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
