Travis build status Coverage status Build status CRAN_Status_Badge lifecycle CRAN RStudio mirror downloads DOI

arkdb

The goal of arkdb is to provide a convienent way to move data from large compressed text files (tsv, csv, etc) into any DBI-compliant database connection (e.g. MYSQL, Postgres, SQLite; see DBI), and move tables out of such databases into text files. The key feature of arkdb is that files are moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much to large to read into memory all at once.

Installation

You can install arkdb from GitHub with:

# install.packages("devtools")
devtools::install_github("cboettig/arkdb")

Basic use

library(arkdb)

# additional libraries just for this demo
library(dbplyr)
library(dplyr)
library(fs)

Creating an archive of a database

Consider the nycflights database in SQLite:

tmp <- tempdir() # Or can be your working directory, "."
db <- dbplyr::nycflights13_sqlite(tmp)
#> Caching nycflights db at /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//Rtmp3oFbR8/nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather

Create an archive of the database:

dir <- fs::dir_create(fs::path(tmp, "nycflights"))
ark(db, dir, lines = 50000)
#> Exporting airlines in 50000 line chunks:
#>  ...Done! (in 0.00859499 secs)
#> Exporting airports in 50000 line chunks:
#>  ...Done! (in 0.03927302 secs)
#> Exporting flights in 50000 line chunks:
#>  ...Done! (in 27.67571 secs)
#> Exporting planes in 50000 line chunks:
#>  ...Done! (in 0.2452919 secs)
#> Exporting weather in 50000 line chunks:
#>  ...Done! (in 2.398423 secs)

Unarchive

Import a list of compressed tabular files (i.e. *.csv.bz2) into a local SQLite database:

files <- fs::dir_ls(dir)
new_db <- src_sqlite(fs::path(tmp, "local.sqlite"), create=TRUE)

unark(files, new_db, lines = 50000)
#> Importing airlines.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.04621601 secs)
#> Importing airports.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.146152 secs)
#> Importing flights.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 17.44235 secs)
#> Importing planes.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.06386805 secs)
#> Importing weather.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 1.224544 secs)

new_db
#> src:  sqlite 3.22.0 [/var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/Rtmp3oFbR8/local.sqlite]
#> tbls: airlines, airports, flights, planes, weather

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

ropensci_footer