README

io is an R package for reading from, writing to, and plotting to files in a unified way:

Several file formats and data types can be read using qread and written using qwrite. Through unit tests, we ensure that a file written by qwrite can be read by qread; futher, a subsequent qwrite will produce an identical file as the original.

File formats are automatically inferred from file extensions or R object object types. qread handles gzip, bzip2, or xz compressed files and can iteratively read all files in a directory. qwrite creates directory structure for output files.

With the filenamer package, output filenames (of the filename class) can also be automatically organized into directories with time stamps and symbolically linked to the base directory (see the filenamer package for details).

Say no to generic file extensions. Say yes to standardized file formats.

Dependencies

Build

Clone the repository, build the documentation with roxygen2, then install.

$ git clone https://bitbucket.org/djhshih/io.git
$ cd io
$ R

R> library(roxygen2)
R> roxygenize()
R> quit()

$ R CMD INSTALL .

Usage

Load the library and use qwrite to write data and qread to read data. Output type is inferred automatically from the file extension.

library(io)
data(cars)

qwrite(cars, "cars.rds")
qwrite(cars, "cars.csv")

Input type is also inferred automatically.

cars_rds <- qread("cars.rds")
cars_csv <- qread("cars.csv")

The RDS format preserves the R object.

identical(cars_rds, cars)
## TRUE

The CSV format does not preserve the R object (but it may be more easily read by an external programs).

identical(cars_csv, cars)
## FALSE

In cars_csv, the data is read in as integer instead of numeric by the underlying read.table function.

str(cars_csv)

To share data with other programs, consider using HDF5 (binary) or YAML (text) format. For the HDF5 format, attributes are normally stripped from R objects. qwrite ensures that names and dimnames are saved to and loaded from HDF5 files. To avoid data loss, you should export only objects of vector, matrix, data.frame or array without additional attributes.

Supports for other formats are available via optional dependencies:

io_supported(c("xml", "yaml", "json", "hdf5"))

Plotting in multiple file formats is easy with qdraw. By default, plots are drawn to screen and then printed to file (in the the specified format).

plot_cars <- function() plot(dist ~ speed, cars)
qdraw(plot_cars())
qdraw(plot_cars(), "cars.pdf")
qdraw(plot_cars(), "cars.png")
qdraw(plot_cars(), "cars.svg")

As shown, we change the output format by simply changing the extension. We no longer manually call initilization or finalization functions, nor do we worry about the possibility of extension and format mismatch for our output files.

See ?io.

Extension

Owing to R’s S3 class system, it is trivial to extend io with read and write functions for additional formats, without modifying the package itself. Simply fill in the blank:

qread.EXTENSION <- function(file, type, ...) {
  ## BLANK: read data from `file` to `x`
    x
}

qwrite.EXTENSION <- function(x, file, type, ...) {
  ## BLANK: write data from `x` to `file`
}

Replace EXTENSION with the file extension of the new format, and complete the implementation details. Then, you use qread and qwrite just as you would any currently supported format and benefit from the features of io and filenamer.

And if you have not done your good deed of the day yet, write a test file using qwrite, read the data back in using qread, ensure the new file outputted by another qwrite is the same as the original test file, and kindly send us a pull request to contribute to the package.