Mark Edmondson


This section covers less common cases: paging API responses, skipping content parsing, and batching and caching API responses.

Paging API responses

A common need is to make multiple API calls because all the data cannot fit in one response. googleAuthR provides the gar_api_page() function to help with this. It operates on the function you generate with gar_api_generator(), after its response has been parsed by the data_parse_function. The output is a list of the API responses, which you then need to combine into one object if you want (e.g. a list of data.frames that you turn into one data.frame via Reduce(rbind, response_list)).

Depending on the API, you have two strategies to consider:

  1. A lot of the Google APIs provide a nextLink field - if you make this available in your data_parse_function you can then use this to fetch all the results.
  2. In some cases the nextLink field is not available - in those cases you can use paging parameters to walk through the pages. These are typically called start-index and page-size, and control which rows of the results each call returns.

The two approaches can both be applied to the Google Analytics segments API, and both provide equivalent output.
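As a minimal sketch of the two strategies, assuming the Google Analytics v3 management API and its nextLink, startIndex, itemsPerPage and totalResults response fields: the function names ga_segment_list1, ga_segment_list2 and paging_function are illustrative only, and calling the first two requires googleAuthR plus authentication for that API.

```r
# Strategy 1: follow the nextLink field returned in each response
ga_segment_list1 <- function() {
  segs <- googleAuthR::gar_api_generator(
    "https://www.googleapis.com/analytics/v3/management/segments",
    "GET",
    pars_args = list("max-results" = 10),
    data_parse_function = function(x) x
  )
  googleAuthR::gar_api_page(segs,
                            page_method = "url",
                            page_f = function(x) x$nextLink)
}

# Strategy 2: work out the next start-index from fields in the response.
# Returning NULL signals that there are no more pages to fetch.
paging_function <- function(x) {
  next_entry <- x$startIndex + x$itemsPerPage
  if (next_entry > x$totalResults) {
    return(NULL)
  }
  next_entry
}

# The paging argument (start-index) must be present in pars_args
# so that it can be modified on each call.
ga_segment_list2 <- function() {
  segs <- googleAuthR::gar_api_generator(
    "https://www.googleapis.com/analytics/v3/management/segments",
    "GET",
    pars_args = list("start-index" = 1, "max-results" = 10),
    data_parse_function = function(x) x
  )
  googleAuthR::gar_api_page(segs,
                            page_method = "param",
                            page_f = paging_function,
                            page_arg = "start-index")
}
```

Once authenticated, identical(ga_segment_list1(), ga_segment_list2()) would be one way to check that both return the same list of responses.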

Skip parsing

In some cases you may want to skip all parsing of API content, for example because it is not JSON.

For this, you can use the option options("googleAuthR.rawResponse" = TRUE) to skip all tests and return the raw response.

Here is an example of this from the googleCloudStorageR library:
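A sketch in that style, assuming a Cloud Storage object download where the content may not be JSON. The helper with_raw_response and the function gcs_get_object_raw are hypothetical names for illustration, not functions exported by either package, and the download itself needs googleAuthR and authentication.

```r
# Hypothetical helper: turn on raw responses for one expression only,
# then restore whatever the option was set to before.
with_raw_response <- function(code) {
  old <- options("googleAuthR.rawResponse" = TRUE)
  on.exit(options(old), add = TRUE)
  force(code)
}

gcs_get_object_raw <- function(bucket, object_name) {
  ob <- googleAuthR::gar_api_generator(
    "https://www.googleapis.com/storage/v1/",
    "GET",
    path_args = list(b = bucket, o = object_name),
    pars_args = list(alt = "media")
  )
  # skip parsing, as we do not expect JSON back
  with_raw_response(ob())
}
```

Restoring the option afterwards matters because it is global: leaving it set to TRUE would skip parsing for every other API call in the session.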

Batching API requests

If you are doing many API calls, you can speed this up a lot by using the batch option. This takes the API functions you have created and wraps them in the gar_batch() function to request them all in one POST call. You then receive the responses in a list.

Note that this does not count as one call for API limits purposes; it only speeds up the processing.
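A minimal sketch of wrapping two generated functions in one batch, assuming the Google Analytics v3 management API. The function names here are illustrative, and running it needs googleAuthR and authentication.

```r
batch_accounts_and_segments <- function() {
  list_accounts <- googleAuthR::gar_api_generator(
    "https://www.googleapis.com/analytics/v3/management/accounts",
    "GET",
    data_parse_function = function(x) x$items
  )
  list_segments <- googleAuthR::gar_api_generator(
    "https://www.googleapis.com/analytics/v3/management/segments",
    "GET",
    data_parse_function = function(x) x$items
  )

  # Both requests go out in a single POST; the result is a list with
  # one element per request. From googleAuthR 0.6.0 the batch endpoint
  # option also needs to be set before calling this.
  googleAuthR::gar_batch(list(list_accounts(), list_segments()))
}
```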

Setting batch endpoint

From googleAuthR version 0.6.0 you also need to set the batch endpoint as an option. This is because Google has deprecated multi-batch endpoints. You can also no longer send batches for multiple APIs in one call.

The batch endpoint is usually of the form:

For example, for BigQuery the option is:
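As a sketch, assuming the option is named googleAuthR.batch_endpoint and that the endpoint follows Google's usual batch URL pattern (api-name and api-version below are placeholders):

```r
# General form:
#   https://www.googleapis.com/batch/api-name/api-version

# BigQuery v2
options("googleAuthR.batch_endpoint" = "https://www.googleapis.com/batch/bigquery/v2")
```

Check the batching documentation of the API you are calling for its exact endpoint.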

Walking through batch requests

A common batch task is to walk through the same API call, modifying only one parameter. An example is walking through Google Analytics API calls by date to avoid sampling. This is implemented in gar_batch_walk().

This modifies the API function parameters, so you need to tell it which parameters you will (and will not) vary, as well as the values you want to walk through. Some rules to help you get started are:

  • The f function needs to be a gar_api_generator() function that uses at least one of path_args, pars_args or body_args to construct the URL (rather than say using sprintf() to create the API URL)
  • You don’t need to set the headers as described in the Google docs for batching API functions - those are done for you.
  • The walk_vector argument needs to be a vector of the values to walk over. You indicate which pars, path or body argument of the function these apply to via one of the *_walk arguments, e.g. if walking over id=1, id=2, etc. for a path argument, it would be path_walk="id" and walk_vector=c(1,2,3,4)
  • gar_batch_walk() only supports changing one value at a time, for one or multiple arguments (walking over start-date and end-date together is probably the main case where you would vary more than one argument per call)
  • batch_size should be over 1 for batching to be of any benefit at all
  • The batch_function argument gives you a way to operate on the parsed output of each call, before they are merged.

An example is shown below - this gets a Google Analytics WebProperty object for each Account ID, in batches of 100 per API call:
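A sketch of that walk, assuming the Google Analytics v3 management API. The wrapper name walk_webproperties is illustrative, and running it needs googleAuthR, authentication and a batch endpoint option.

```r
walk_webproperties <- function(account_ids) {
  # Build the URL from path_args rather than writing it out in full,
  # so that gar_batch_walk() can modify the accounts argument.
  get_webproperties <- googleAuthR::gar_api_generator(
    "https://www.googleapis.com/analytics/v3/management/",
    "GET",
    path_args = list(accounts = "default", webproperties = ""),
    data_parse_function = function(x) x$items
  )

  googleAuthR::gar_batch_walk(
    get_webproperties,
    walk_vector = account_ids,             # values to walk over
    gar_paths = list(webproperties = ""),  # path arguments that stay fixed
    path_walk = "accounts",                # path argument that varies
    batch_size = 100,                      # requests per batched call
    data_frame_output = FALSE
  )
}
```

The account IDs passed in would typically come from another generated function that lists the accounts.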

Caching API calls

You can also set up local caching of API calls. This uses the memoise package to let you write API responses to memory or disk, and to read them from there instead of calling the API when the same parameters are used.

A demonstration is shown below:
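As a minimal sketch, assuming you want to switch between an in-memory and an on-disk cache: setup_api_cache is a hypothetical wrapper, not part of googleAuthR, and needs both googleAuthR and memoise installed when called.

```r
setup_api_cache <- function(cache_dir = NULL) {
  mcache <- if (is.null(cache_dir)) {
    # cache in RAM, lost when the R session ends
    memoise::cache_memory()
  } else {
    # cache to a local folder, kept between R sessions
    memoise::cache_filesystem(cache_dir)
  }
  googleAuthR::gar_cache_setup(mcache)
}
```

For example, setup_api_cache("cache_folder") would write responses to a folder of that name in the working directory, and repeated identical API calls would then be read from there.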

Caching is activated by using the gar_cache_setup() function.

The default uses memoise::cache_memory(), which will cache the response to RAM, but you can change this to any of the memoise cache functions such as cache_s3() or cache_filesystem().

cache_filesystem() will write to a local folder, meaning you can save API responses between R sessions.

Other cache functions

You can see the current cache location function via gar_cache_get_loc() and stop caching via gar_cache_empty().

Invalidating cache

There are two hard things in Computer Science: cache invalidation and naming things.

In some cases, you may only want to cache the API responses under certain conditions. A common use case is an API call that checks whether a job is running or finished. You would only want to cache the finished state; otherwise a cached "running" response would keep being returned and the check would never finish.

For those circumstances, you can supply a function that takes the API response as its only input and returns TRUE or FALSE to indicate whether to cache it. This allows you to introduce a check, e.g. for finished jobs.

The default caches only when the request succeeds with a 200 status code:
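A sketch of a function with that behaviour, tried against mocked responses. The name default_invalid_func is illustrative, and the package's own default may differ in detail.

```r
# Cache only when the response carries a 200 status code
default_invalid_func <- function(req) {
  req$status_code == 200
}

# Mocked responses for illustration
default_invalid_func(list(status_code = 200))
default_invalid_func(list(status_code = 404))
```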

For more advanced use cases, examine the response of failed and successful API calls, and create the appropriate function. Pass that function when creating the cache:
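As a sketch of the finished-job case, assuming the parsed response body is available as req$content and that it contains a status$state field set to "DONE" when the job completes. Both are assumptions for illustration, so inspect real responses from your API first. The name cache_finished_jobs is hypothetical.

```r
# Cache only successful responses that report a finished job
cache_finished_jobs <- function(req) {
  isTRUE(req$status_code == 200) &&
    identical(req$content$status$state, "DONE")
}

# Pass it in when creating the cache (needs googleAuthR):
# googleAuthR::gar_cache_setup(invalid_func = cache_finished_jobs)
```

Using isTRUE() and identical() means a response that is missing either field is simply not cached, rather than raising an error.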

Batching and caching

If you are caching a batched call, your cache invalidation function will need to take into account that it will receive a response with multipart/mixed; boundary=batch_{random_string} as its content-type header. This response will need to be parsed into JSON first, before applying your data parsing functions and/or deciding to cache. To get you started here is a cache function:
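A sketch of such a function, assuming a batch of Google Analytics reporting calls whose successful responses contain "kind":"analytics#gaData". To stay short it searches the raw response text for that marker rather than parsing the multipart body into JSON, and it needs httr installed when a batched response is passed in.

```r
batched_caching <- function(req) {
  content_type <- req$headers$`content-type`

  # only handle batched responses; do not cache anything else
  if (is.null(content_type) ||
      !grepl("^multipart/mixed; boundary=batch_", content_type)) {
    return(FALSE)
  }

  # look for content that indicates a successful request
  body_text <- httr::content(req, as = "text", encoding = "UTF-8")
  grepl('"kind":"analytics#gaData', body_text, fixed = TRUE)
}
```

A stricter version would split the body on the batch boundary and parse each part, so that one failed request in the batch prevents the whole response from being cached.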

Using caching

Once set, if the function call name, arguments and body are the same, it will attempt to find a cache. If one exists, it will read from there rather than making the API call. If not, it will make the API call and save the response to where you have specified.

Applications include saving large responses to RAM during paging calls, so that if a request fails the retry quickly resumes from where the API left off, or writing API responses to disk for multi-session caching. You could also use this for unit testing, although it's recommended to use the httptest library for that, as it has wider support for authentication obscuration etc.

Be careful to only use caching where you know the API request won’t change - if you want to reset the cache you can run the gar_cache_setup function again or delete the individual cache file in the directory.


All packages should ideally have tests to ensure that any changes do not break functionality.

If new to testing, first read this guide on using the tidyverse’s testthat, then read this for a beginner’s guide to travis-CI for R, which is a continuous integration system that will run and test your code every time it is committed to GitHub.

However, testing APIs that need authentication is more complicated, as you need to deal with the authentication token to get a correct response.

One option is to encrypt and upload your token; Jenny Bryan has written a guide on how to do this.

Mocking tests with httptest

However, I recommend using mocking to run tests online.

Mocking means you do not have to upload your authentication token - instead you first run your tests locally with your normal authentication token and the responses are saved to disk. When those tests are then run online, instead of calling the API the saved responses on disk are used to test the function.

In truth, you are most interested in whether your functions call the API correctly, rather than whether the API is working (the API provider has their own tests for that), so you don’t lose coverage by testing against a mock file. This has the added advantage of being much quicker, so you can run tests more often.

When creating your tests, I suggest two types: unit tests that will run against your mocks; and integration tests that will run against the API but only when used locally.

To help with mocking, Neal Richardson has created a package called httptest which we use in the examples below:
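A minimal sketch of a mocked unit test, needing testthat, httptest and httr installed. The function list_accounts() is a hypothetical stand-in that calls httr directly; in a real package it would be one of your functions built with gar_api_generator().

```r
library(testthat)
library(httptest)

# Hypothetical stand-in for one of your package's API functions
list_accounts <- function() {
  httr::GET("https://www.googleapis.com/analytics/v3/management/accounts")
}

# Run once locally, with real authentication, to save responses to disk:
# capture_requests({ list_accounts() })

# Unit test: inside with_mock_api() no real HTTP request is made
with_mock_api({
  test_that("list_accounts() calls the expected endpoint", {
    expect_GET(
      list_accounts(),
      "https://www.googleapis.com/analytics/v3/management/accounts"
    )
  })
})
```

With recorded responses on disk, the same with_mock_api() block can instead assert on the parsed output of the function, which is what lets these tests run online without a token.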


One of the nicest features of continuous integration / Travis is that it can be used for code coverage. This uses your tests to find out how many lines of code are triggered within them, which lets you see how thorough your tests are. Developers covet the 100% coverage badge, as it shows all of your code is checked every time you push to GitHub and helps mitigate bugs.

Using code coverage is a case of adding another metafile and a command to your travis file. This post by Eryk has some details.

Offline code coverage

As mentioned above, you may not be able to run some tests online due to authentication issues. If you want, you can run these tests locally instead - go to your repository's page on Codecov, select settings and get your Repository Upload Token. Place this in an environment variable like this:
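A sketch, assuming the token is read from the CODECOV_TOKEN environment variable. The value shown is a placeholder to replace with your own token.

```r
# Set for the current R session only; the value is a placeholder
Sys.setenv(CODECOV_TOKEN = "your-repository-upload-token")
```

To keep the token between sessions, and out of your scripts, you could instead add a CODECOV_TOKEN line to your .Renviron file.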


…then run covr::codecov() in the package home directory. It will run your tests and upload the coverage results to Codecov.