This CRAN task view contains a list of packages related to accessibility of different databases. This does not include data import/export or data management.
As datasets become larger and larger, it is impossible for people to save them in traditional file formats such as spreadsheet, raw text file, etc., which could not fit on devices with limited storage and could not be easily shared across collaborators. Instead, people nowadays tend to store data in databases for more scalable and reliable data management.
Database systems are often classified based on the
database models
that they support.
Relational databases
became dominant in the 1980s. The data in relational databases is modeled as rows and columns in a series of tables with the use of
SQL
to express the logic for writing and querying data. The tables are relational, e.g. you have a user who users your softwares and those softwares have creators and contributors. Non-relational databases became popular in recent years due to huge demand in storing unstructured data with the use of
NoSQL
as the query language. Users generally don't need to define the data schema up front. If there are changing requirements in the applications, non-relational databases can be much easier to use and manage.
The content presented in this Task View are undergoing rapid
changes in industries and academia. Please send any suggestions to the
task view maintainer
or submit a pull request or issue to the
Github repository of this task view
.
The
ctv
package supports these Task Views. Its functions
install.views
and
update.views
allow,
respectively, installation or update of packages from a given Task View;
the option
coreOnly
can restrict operations to packages labeled as
core
below.
Suggestions and corrections by Achim Zeileis, Kirill Müller, Hannes Mühleisen, Rich FitzJohn, Dirk Eddelbuettel, and Hadley Wickham (as well as others I may have forgotten to add here) are gratefully acknowledged. Thanks to Dirk Eddelbuettel who made the initial
.ctv
file and the Markdown conversion script available at the Github repository of CRAN Task View for High Performance Computing
here
. Last but not least, thanks to Achim Zeileis who helped me get started on organizing this task view.
Relational Databases
This section includes packages that provides access to relational databases within R.
-
The
DBI
package provides a database interface definition for communication between R and relational database management systems. It's worth noting that some packages try to follow this interface definition (DBI-compliant) but many existing packages don't.
-
The
RODBC
package provides access to databases through an ODBC interface.
-
The
RMariaDB
package provides a DBI-compliant interface to
MariaDB
and
MySQL
.
-
The
RMySQL
package provides the interface to MySQL. Note that this is the legacy DBI interface to MySQL and MariaDB based on old code ported from S-PLUS. A modern MySQL client based on Rcpp is available from the RMariaDB package we listed above.
-
Packages for
PostgreSQL
, an open-source relational database:
-
The
RPostgreSQL
package and
RPostgres
package both provide fully DBI-compliant Rcpp-backed interfaces to PostgreSQL.
-
The
rpostgis
package provides the interface to its spatial extension
PostGIS
.
-
The
RGreenplum
provides a fully DBI-compliant interface to
Greenplum
, an open-source parallel database on top of PostgreSQL.
-
The
ROracle
package is a DBI-compliant
Oracle database
driver based on the OCI.
The
ora
package provides convenience functions to query and browse a database through the
ROracle
connection.
-
Packages for
SQLite
, a self-contained, high-reliability, embedded, full-featured, public-domain, SQL database engine:
-
The
RSQLite
package embeds the SQLite database engine in R and provides an interface compliant with the DBI package.
-
The
filehashSQLite
package is a simple key-value database using SQLite as the backend.
-
The
liteq
package provides temporary and permanent message queues for R, built on top of SQLite.
-
The
bigrquery
package provides the interface to
Google BigQuery
, Google's fully managed, petabyte scale, low cost analytics data warehouse.
-
The
RDruid
package provides the interface to
Apache Druid
, a high performance analytics data store for event-driven data.
-
The
RH2
package provides the interface to
H2 Database Engine
, the Java SQL database.
-
The
influxdbr
package provides the interface to
InfluxDB
, a time series database designed to handle high write and query loads.
-
The
odbc
package provides a DBI-compliant interface to drivers of
Open Database Connectivity (ODBC)
, which is a low-level, high-performance interface that is designed specifically for relational data stores.
-
The
RPresto
package implements a DBI-compliant interface to
Presto
, an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
-
The
RJDBC
package is an implementation of R's DBI interface using JDBC as a back-end. This allows R to connect to any DBMS that has a JDBC driver.
-
The
implyr
package provides the back-end for
Apache Impala
, which enables low-latency SQL queries on data stored in the Hadoop Distributed File System (HDFS), Apache HBase, Apache Kudu, Amazon Simple Storage Service (S3), Microsoft Azure Data Lake Store (ADLS), and Dell EMC Isilon.
-
The
dbx
package provides intuitive functions for high performance batch operations and safe inserts/updates/deletes without writing SQL on top of
DBI. It is designed for both research and production environments and supports multiple database backends such as Postgres, MySQL, MariaDB, and SQLite.
-
The
sparklyr
package provides provides a
dplyr
interface to
Apache Spark
DataFrames as well as an R interface to Spark’s distributed machine learning pipelines.
-
The
RClickhouse
is a
DBI
interface for
Yandex Clickhouse
, which is a high-performance relational column-store database to enable big data exploration and scaling to petabytes of data. It provides basic
dplyr
support by auto-generating SQL-commands using
dbplyr.
Non-Relational Databases
This section includes packages that provides access to non-relational databases within R.
-
Packages for
Redis
, an open-source, in-memory data structure store that can be used as a database, cache and message broker:
-
The
RcppRedis
package provides interface to Redis using the
hiredis library
.
-
The
redux
package provides a low-level interface to Redis, allowing execution of arbitrary Redis commands with almost no interface, and a high-level generated interface to more than 200 redis commands.
-
Packages for
Elasticsearch
, an open-source, RESTful, distributed search and analytics engine:
-
The
elastic
package provides a general purpose interface to Elasticsearch.
-
The
uptasticsearch
package is a Elasticsearch client tailored to data science workflows.
-
The
mongolite
package provides a high-level, high-performance
MongoDB
client based on
libmongoc
, including support for aggregation, indexing, map-reduce, streaming, SSL encryption and SASL authentication.
-
The
R4CouchDB
package provides a collection of functions for basic database and document management operations in
CouchDB
.
-
The
RCassandra
package provides a direct interface (without the use of Java) to the most basic functionality of
Apache Cassanda
such as login, updates and queries.
-
The
aws.dynamodb
package provides access to
Amazon DynamoDB
.
-
The
rrocksdb
package provides access to
RocksDB
.
Databases Tools
This section includes packages that provides tools for working and testing with databases, databases table manipulations, etc.
-
The
pool
package enables the creation of object pools, which make it less computationally expensive to fetch a new object.
-
The
DBItest
package is a helper that tests DBI back ends for conformity to the interface.
-
The
dbplyr
package is a
dplyr
back-end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features works with any database that has a DBI back-end; more advanced features require SQL translation to be provided by the package author.
-
The
sqldf
package provides functionalities to manipulate R Data Frames Using SQL.
-
The
pointblank
package provides tools to validate data tables in databases such as PostgreSQL and MySQL.
-
The
TScompare
package provides utilities for comparing the equality of series on two databases.