CRAN Task View: Web Technologies and Services

Maintainer:Scott Chamberlain, Thomas Leeper, Patrick Mair, Karthik Ram, Christopher Gandrud
Contact:myrmecocystus at gmail.com
Version:2020-08-02
URL:https://CRAN.R-project.org/view=WebTechnologies

This Task View contains information about to use R and the world wide web together. The base version of R does not ship with many tools for interacting with the web. Thankfully, there are an increasingly large number of tools for interacting with the web. This task view focuses on packages for obtaining web-based data and information, frameworks for building web-based R applications, and online services that can be accessed from R. A list of available packages and functions is presented below, grouped by the type of activity. The rOpenSci Task View: Open Data provides further discussion of online data sources that can be accessed from R.

If you have any comments or suggestions for additions or improvements for this Task View, go to GitHub and submit an issue , or make some changes and submit a pull request . If you can’t contribute on GitHub, send Scott an email . If you have an issue with one of the packages discussed below, please contact the maintainer of that package. If you know of a web service, API, data source, or other online resource that is not yet supported by an R package, consider adding it to the package development to do list on GitHub .

Tools for Working with the Web from R

Core Tools For HTTP Requests

There are three main packages that should cover most use cases of interacting with the web from R. crul is an R6-based HTTP client that provides asynchronous HTTP requests, a pagination helper, HTTP mocking via webmockr, and request caching for unit tests via vcr. crul targets R developers more so than end users. httr provides more of a user facing client for HTTP requests and differentiates from the former package in that it provides support for OAuth. Note that you can pass in additional curl options when you instantiate R6 classes in crul, and the config parameter in httr. curl is a lower-level package that provides a closer interface between R and the libcurl C library , but is less user-friendly. curl underlies both crul and httr. curl may be useful for operations on web-based XML or to perform FTP operations (as crul and httr are focused primarily on HTTP). curl::curl() is an SSL-compatible replacement for base R’s url() and has support for http 2.0, SSL (https, ftps), gzip, deflate and more. For websites serving insecure HTTP (i.e. using the “http” not “https” prefix), most R functions can extract data directly, including read.table and read.csv; this also applies to functions in add-on packages such as jsonlite::fromJSON() and XML::parseXML. For more specific situations, the following resources may be useful:

Handling HTTP Errors/Codes

Parsing Structured Web Data

The vast majority of web-based data is structured as plain text, HTML, XML, or JSON (javascript object notation). Web service APIs increasingly rely on JSON, but XML is still prevalent in many applications. There are several packages for specifically working with these format. These functions can be used to interact directly with insecure webpages or can be used to parse locally stored or in-memory web files.

Tools for Working with URLs

Tools for Working with Scraped Webpage Contents

Security

Other Useful Packages and Functions

Web and Server Frameworks

Web Services

Cloud Computing and Storage

Document and Code Sharing

Data Analysis and Processing Services

Social Media Clients

Web Analytics Services

Web Services for R Package Development

Other Web Services

CRAN packages:

Related links: