The import package

Stefan Holst Milton Bache. This version: June, 2015.


Introduction and Motivation

One of the most important aspects of the R ecosystem is the ease with which extensions and new features can be developed and distributed in the form of packages. The main distribution channel is the Comprehensive R Archive Network, from which packages can be installed directly from R. Another popular option is using GitHub repositories from which packages can also be painlessly installed, e.g. using install_github from the devtools package. The drat package is a third option which provides functionality to use and manage package repositories.

The import package provides an alternative approach to using external functionality in R programs; first, however, it is useful to describe the standard approach to clarify how import may serve as an improvement. The most common way to include the functionality provided by a package is to use the library function:

library(PackageA)
library(PackageB)

value1 <- function_a(...) # Supposedly this comes from PackageA, 
value2 <- function_b(...) # and this from PackageB, but who knows?!
...

In some situations this is fine; however there are some subtle shortcomings:

  1. Packages are attached and all of their exported objects are exposed,
  2. When using more packages this way, the order in which they are attached can be important,
  3. It quickly becomes unclear to the reader of a script which package provides certain functionality, and
  4. the terms “library” and “package” are often used incorrectly (although a minor point, it seems to confuse somewhat).

The problem with (1) is that the search path is populated with more objects than are needed and it is not immediately clear whether name clashes will occur. Problem (2) refers to the case where packages export different objects with the same names, say if function_b is exported in both PackageA and PackageB above. In this case the name will point to the object from the package attached last. The earlier exposed objects are said to be masked. Even if this is not a problem when writing the script, an update of packages may cause this problem later on when executing the script, and tracking down the resulting errors may be tough and time consuming. Problem (3) may appear unimportant, but it is not to be underestimated. Code snippets are very commonly shared, and spending time figuring out where functionality comes from is neither a satisfying nor a value-adding activity.
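The masking described in (2) can be reproduced with base R alone. The following is a minimal sketch in which two plain lists stand in for the hypothetical PackageA and PackageB (they are not real packages); attach places each on the search path much as library would do with a package's exports.

```r
# Two stand-ins for packages that export an object with the same name.
# "PackageA" and "PackageB" are hypothetical names used for illustration.
pkg_a <- list(function_b = function() "from PackageA")
pkg_b <- list(function_b = function() "from PackageB")

attach(pkg_a, name = "PackageA")
attach(pkg_b, name = "PackageB")  # attached last, so it is found first

masked_result <- function_b()
print(masked_result)
# [1] "from PackageB"

# Clean up the search path again.
detach("PackageB")
detach("PackageA")
```

Swapping the two attach calls changes which definition is found, which is exactly why the order of library calls can matter.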

To overcome the above, one may alternatively import exported objects, one at a time, using the (double) “colon syntax”,

function_a <- PackageA::function_a
function_b <- PackageB::function_b

or functions can be explicitly qualified with pkg:: when used, but this is often overly verbose and does not provide an easily accessible overview of what external functionality is used in a script.
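As a concrete illustration of both variants, the sketch below uses the tools package, which ships with R. The local name sans_ext is chosen here for the example only.

```r
# Import a single exported object under a local name ...
sans_ext <- tools::file_path_sans_ext
imported_result <- sans_ext("report.csv")
print(imported_result)
# [1] "report"

# ... or qualify the function explicitly at the point of use.
qualified_result <- tools::file_path_sans_ext("notes.txt")
print(qualified_result)
# [1] "notes"
```

Neither form attaches the package, so no other objects from tools are added to the search path.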

While packages form the backbone of code distribution, another option comes in the form of scripts, but these are usually task specific and not commonly used to “bundle” functionality for use in other scripts. In particular, when source is used to include contents from one script in another, once again all objects produced by the script will be “exposed” and may “over populate” the working environment, masking other objects or at least producing some mental clutter. Scope management is therefore not too comfortable when splitting functionality across files in a modular way.
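To see the effect of source, consider the following sketch. It writes a small throwaway script to a temporary file (the names helper and main are made up for the example) and sources it into a fresh environment that stands in for the working environment.

```r
# A throwaway script defining one function of interest and one helper.
script <- tempfile(fileext = ".R")
writeLines(c(
  "helper <- function(x) x + 1",
  "main   <- function(x) helper(x) * 2"
), script)

# Stand-in for the working environment of the calling script.
workspace <- new.env()
source(script, local = workspace)

# Everything the script created is now exposed, not only `main`.
exposed <- ls(workspace)
print(exposed)
# [1] "helper" "main"
```

Even though only main may be of interest, helper ends up alongside it and can mask or be confused with other objects of the same name.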

The import package sets out to improve the way external functionality is included in your code by alleviating some of the concerns raised above by providing an expressive way of importing objects from both packages and scripts. The latter provides a bridge between the package approach to distribution and simple stand-alone script files. This allows for the use of scripts as modules, a collection of related object definitions, each of which may be used at different places without exposing more than necessary.

Usage

The import package itself should not be attached (don’t include it via library, you will get a warning). Rather, it is designed to be expressive when using the colon syntax. A first pseudo-example is:

import::from(magrittr, "%>%")
import::from(dplyr, mutate, keep_when = filter)
import::from(tidyr, spread)
import::from(broom, tidy)

ready_data <-
  raw_data %>% 
  mutate(var2 = fun(var1)) %>% 
  keep_when(var1 > 0) %>% 
  spread(key, value) 

linear_model <- 
  lm(var1 ~ ., data = ready_data) %>% 
  tidy
  
# ... and more code below.

In the above, it is clear which package provides which functions (one could e.g. otherwise be tempted to think that tidy belonged to tidyr). Note that ordering is irrelevant, even if tidyr at some point exposes a function tidy after an update, as import is explicit about importing. Surely the import statements are more verbose than the simpler library alternative, but they are much clearer, safer, and more informative.

The example also shows that one can import multiple objects in a single statement, and even rename objects if desired; for example, in the above one can imagine that filter from stats is needed later on, and so dplyr’s filter is renamed to avoid confusion. Sometimes, it is not at all clear what purpose a package has; e.g. the name magrittr does not immediately reveal that its main purpose is to provide the pipe operator, %>%.

When import::from is used to import objects it will place them in an environment attached in the search path under the name “imports”. This name is a default, and can be specified:

import::from(dplyr, mutate, select, .into = "wrangling")

# or equivalently:
import::into("wrangling", mutate, select, .from = dplyr)

ls("wrangling") # Also viewable in RStudio's environment browser!
# => [1] "mutate" "select"

Note that the .from and .into arguments are prefixed by a dot to clearly distinguish them from other named arguments. The two alternatives above are equivalent (the choice is a matter of preference).

If it is desired to import objects directly into the current environment, this can be accomplished by import::here. This is particularly useful when importing inside a function definition, or module scripts as described below.
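A minimal sketch of import::here inside a function definition follows. It assumes that the import package is installed; the function describe_file is hypothetical, and file_ext is taken from the tools package that ships with R.

```r
# `describe_file` is a made-up example function. The import happens in
# the function's own environment each time the function is called.
describe_file <- function(path) {
  import::here(file_ext, .from = tools)
  paste("extension:", file_ext(path))
}

# Guarded so the sketch also runs where import is not installed.
if (requireNamespace("import", quietly = TRUE)) {
  print(describe_file("foo.R"))
  # file_ext was imported into the function only, not the global environment:
  print(exists("file_ext", envir = globalenv(), inherits = FALSE))
}
```

Since the import is local to the function, nothing is added to the search path and the caller's environment is left untouched.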

Finally, as with library, package (and object) names can be quoted or unquoted:

import::from("magrittr", "%>%", "%$%")
import::from(magrittr, "%>%", "%$%") # Special names, however, need the quotes.

“Module” scripts

Suppose that you have some related functionality that you wish to bundle, and that authoring a full package seems excessive or inappropriate for the specific task, for example bundling related UI components for a shiny application. Since import can also import from stand-alone R files (identified as existing files ending in .R), one option is to author a module (script), say as outlined below:

# File: foo.R
# Desc: Functionality related to foos.
# Imports from other_resources.R
import::here(fun_a, fun_b, .from = "other_resources.R")

internal_fun <- function(...) ...

fun_c <- function(...) 
{
  ...
  a <- fun_a(...)
  i <- internal_fun(...)
  ...
}

fun_d <- function(...) ...

Then in another file we need fun_c, but have no interest in accessing the other objects:

# File: bar.R
# Desc: Functionality related to bars. 
# Imports from foo.R
import::here(fun_c, .from = "foo.R")
...

In the above, only fun_c is visible inside bar.R. The functions on which it depends exist, but are not exposed. Also, note that imported scripts may themselves import externally defined objects. To avoid imports becoming global, it is suggested to use import::here rather than import::from, in which case imports are only exposed to the module itself. Note how this approach gives a better (or cleaner) way of managing scope than using source.
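The scoping idea can be imitated by hand in base R, which may help to see why the dependencies of fun_c keep working without being exposed. The sketch below is not how import is implemented; it only illustrates the principle, using a temporary file as a stand-in for foo.R and a plain environment as a stand-in for the importing script.

```r
# Stand-in for the module file foo.R.
module_file <- tempfile(fileext = ".R")
writeLines(c(
  "internal_fun <- function(x) x * 10",
  "fun_c <- function(x) internal_fun(x) + 1"
), module_file)

# Source the module into its own environment ...
module_env <- new.env()
sys.source(module_file, envir = module_env)

# ... and copy only the requested object to the consumer.
consumer <- new.env()
assign("fun_c", get("fun_c", envir = module_env), envir = consumer)

visible <- ls(consumer)
print(visible)
# [1] "fun_c"

# fun_c still finds internal_fun through its enclosing environment.
value <- evalq(fun_c(2), consumer)
print(value)
# [1] 21
```

The function fun_c carries a reference to the environment in which it was defined, so internal_fun remains reachable from fun_c while staying invisible to the consumer.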

It is ill-advised to use library in module scripts, although this is not enforced. Attachments made by a module are detached again, but loaded namespaces remain loaded. This means that values created by functions in an attached namespace will work with import, but functions to be exported should not rely on such functions (use function importing in the modules instead).

When importing from a module, it is sourced into an environment managed by import, and will not be sourced again upon subsequent imports (unless the file has changed). For example, in a shiny application, importing some objects in server.R and others in ui.R from the same module will not cause it to be sourced twice.

Modules, as described here, are not meant as an alternative to packages. For general distribution of reusable code and functionality, packages are obviously far superior. The idea is that in some cases a full package may not be needed, and modules are more light-weight, require less development time, and may on occasion be rather convenient.

Specifying a Library

The import package will by default only use the latest specified library (i.e. the result of .libPaths()[1L]). It is possible to specify a different library using the .library argument in any of the import functions. One import call can only use one library, so there is no ambiguity as to where imports come from.

Development

To follow the development of import, visit http://github.com/smbache/import; to file an issue, provide feedback and/or suggestions, visit http://github.com/smbache/import/issues.

***