Package overview

This vignette provides a brief overview of the most important functions in eia. Other vignettes go into greater depth on specific topics and API endpoints.

API key

Register a key with EIA

Obtaining an API key is easy and free.

Pulling data from the US Energy Information Administration (EIA) API requires a registered API key. A key can be obtained at no cost here. A valid email and agreement to the API Terms of Service is required to obtain a key.

It is important to store your API key somewhere secure. Do not commit it to a repository or otherwise share it. For example, you could store it in your .Renviron file.

Key storage and retrieval

You can always provide the key argument to every API function call, but you do not have to. There are getter and setter helpers available to make using eia functions a more seamless experience.

eia_set_key gives you the option of storing your key for the duration of your R session.

library(eia)
# eia_set_key("yourkey")
# eia_get_key() # retrieve it

If the key already exists in the system environment and you plan to pass key to functions explicitly, you could start as follows.

key <- Sys.getenv("EIA_KEY")

# or:
key <- eia_get_key()

In general, however, if your key is set globally such as in .Renviron, you do not need to do anything regarding the key when you use the package. See the vignette on API details for more information about all the options you have for key storage.

EIA categories

Once you have your EIA registered API key and have it in place for your R session by whichever method you prefer, you are ready to begin accessing data from the EIA API.

It is helpful to be aware of various categories of data that are available. Each category has a unique ID number that is required to access associated data. The first call below does not include an ID. The result is a list of two data frames. The first is metadata associated with that position in the category hierarchy. The second is the child category information.

Here is the top-level category information.

eia_cats()
#> $category
#> # A tibble: 1 x 3
#>   category_id name          notes
#>   <chr>       <chr>         <chr>
#> 1 371         EIA Data Sets ""   
#> 
#> $childcategories
#> # A tibble: 14 x 2
#>    category_id name                               
#>          <int> <chr>                              
#>  1           0 Electricity                        
#>  2       40203 State Energy Data System (SEDS)    
#>  3      714755 Petroleum                          
#>  4      714804 Natural Gas                        
#>  5      711224 Total Energy                       
#>  6      717234 Coal                               
#>  7      829714 Short-Term Energy Outlook          
#>  8      964164 Annual Energy Outlook              
#>  9     1292190 Crude Oil Imports                  
#> 10     2123635 U.S. Electric System Operating Data
#> 11     2134384 International Energy Data          
#> 12     2251604 CO2 Emissions                      
#> 13     2631064 International Energy Outlook       
#> 14     2889994 U.S. Nuclear Outages

The child category IDs can be used to query results from that category.

eia_cats(0)
#> $category
#> # A tibble: 1 x 4
#>   category_id parent_category_id name        notes
#>   <chr>       <chr>              <chr>       <chr>
#> 1 0           371                Electricity ""   
#> 
#> $childcategories
#> # A tibble: 19 x 2
#>    category_id name                                                              
#>          <int> <chr>                                                             
#>  1           1 Net generation                                                    
#>  2          35 Total consumption                                                 
#>  3          32 Total consumption (Btu)                                           
#>  4          36 Consumption for electricity generation                            
#>  5          33 Consumption for electricity generation (Btu)                      
#>  6          37 Consumption for useful thermal output                             
#>  7          34 Consumption for useful thermal output (Btu)                       
#>  8        1017 Plant level data                                                  
#>  9          38 Retail sales of electricity                                       
#> 10          39 Revenue from retail sales of electricity                          
#> 11          40 Average retail price of electricity                               
#> 12     1718389 Number of customer accounts                                       
#> 13       41137 Fossil-fuel stocks for electricity generation                     
#> 14       41138 Receipts of fossil fuels by electricity plants                    
#> 15       41139 Receipts of fossil fuels by electricity plants (Btu)              
#> 16       41140 Average cost of fossil fuels for electricity generation           
#> 17       41141 Average cost of fossil fuels for electricity generation (per Btu) 
#> 18       41142 Quality of fossil fuels in electricity generation : sulfur content
#> 19       41143 Quality of fossil fuels in electricity generation : ash content

EIA time series data

Time series data is obtained by series ID. Most columns contain metadata. The data column contains the time series data.

library(dplyr)
library(tidyr)
library(ggplot2)

id <- "ELEC.GEN.ALL-AK-99.A"
(x <- eia_series(id))
#> # A tibble: 1 x 13
#>   series_id    name                 units      f     description                 copyright source          iso3166 geography start end   updated     data    
#>   <chr>        <chr>                <chr>      <chr> <chr>                       <chr>     <chr>           <chr>   <chr>     <chr> <chr> <chr>       <list>  
#> 1 ELEC.GEN.AL~ Net generation : al~ thousand ~ A     "Summation of all fuels us~ None      EIA, U.S. Ener~ USA-AK  USA-AK    2001  2019  2020-03-23~ <tibble~

x$data[[1]]
#> # A tibble: 19 x 3
#>    value date        year
#>    <dbl> <date>     <int>
#>  1 6340. 2019-01-01  2019
#>  2 6247. 2018-01-01  2018
#>  3 6497. 2017-01-01  2017
#>  4 6335. 2016-01-01  2016
#>  5 6285. 2015-01-01  2015
#>  6 6043. 2014-01-01  2014
#>  7 6497. 2013-01-01  2013
#>  8 6946. 2012-01-01  2012
#>  9 6871. 2011-01-01  2011
#> 10 6760. 2010-01-01  2010
#> 11 6702. 2009-01-01  2009
#> 12 6775. 2008-01-01  2008
#> 13 6821. 2007-01-01  2007
#> 14 6674. 2006-01-01  2006
#> 15 6577. 2005-01-01  2005
#> 16 6527. 2004-01-01  2004
#> 17 6339. 2003-01-01  2003
#> 18 6767. 2002-01-01  2002
#> 19 6744. 2001-01-01  2001

select(x, units, data) %>% unnest(cols = data) %>%
  ggplot(aes(date, value)) + geom_line() +
  labs(y = x$units[1], title = "Net electricity generation, Alaska, all fuels")

plot of chunk series1

You can provide arguments like the following:

eia_series(id) # max results
eia_series(id, n = 5) # most recent five
eia_series(id, end = 2016, n = 5) # ending in 2016
eia_series(id, start = 2000, end = 2016) # specific period

As with eia_cats, the output format does not need to be tidy:

eia_series(id, n = 5, tidy = FALSE) # results of jsonlite::fromJSON
eia_series(id, n = 5, tidy = NA) # origina JSON as character string

This allows you to use the returned results with existing code you may have that requires data in one of these less processed structures.

EIA geosets

Geosets are metadata structures organizing time series datasets that can be mapped. Arguments to eia_geoset are the same as eia_series with the addition of region. Like id, region can be a vector. Most of the details are the same as before.

In the example below using total electricity generation, get the last two data points for each of and two US states. dplyr and tidyr are used here to clean up the result a bit for purposes of display. gpplot2 is used to graph the data after it has been unnested for each state.

id <- c("ELEC.GEN.ALL-99.M") # monthly
region <- c("USA-CA", "USA-NY")
(x <- eia_geoset(id, region, start = "201801", end = "201812"))
#> # A tibble: 2 x 11
#>   geoset_id      setname                        f     units          series_id       name                               region latlon start end   data       
#> * <chr>          <chr>                          <chr> <chr>          <chr>           <chr>                              <chr>  <chr>  <chr> <chr> <list>     
#> 1 ELEC.GEN.ALL-~ Net generation : all fuels : ~ M     thousand mega~ ELEC.GEN.ALL-C~ Net generation : all fuels : Cali~ USA-CA <NA>   2001~ 2020~ <tibble [1~
#> 2 ELEC.GEN.ALL-~ Net generation : all fuels : ~ M     thousand mega~ ELEC.GEN.ALL-N~ Net generation : all fuels : New ~ USA-NY <NA>   2001~ 2020~ <tibble [1~

select(x, region, data) %>% unnest(cols = data)
#> # A tibble: 24 x 5
#>    region  value date        year month
#>    <chr>   <dbl> <date>     <int> <int>
#>  1 USA-CA 14346. 2018-12-01  2018    12
#>  2 USA-CA 14587. 2018-11-01  2018    11
#>  3 USA-CA 16844. 2018-10-01  2018    10
#>  4 USA-CA 17574. 2018-09-01  2018     9
#>  5 USA-CA 20972. 2018-08-01  2018     8
#>  6 USA-CA 22332. 2018-07-01  2018     7
#>  7 USA-CA 17500. 2018-06-01  2018     6
#>  8 USA-CA 15922. 2018-05-01  2018     5
#>  9 USA-CA 14800. 2018-04-01  2018     4
#> 10 USA-CA 14024. 2018-03-01  2018     3
#> # ... with 14 more rows

unnest(x, cols = data) %>%
  ggplot(aes(date, value, color = region)) +
  geom_line() +
  labs(y = x$units[1], title = "Net electricity generation, all fuels")

plot of chunk geoset1

Another convenience of eia_geoset is the ability to provide regions in the following forms.

These shortcuts make it easier to construct an API call involving several states.

region <- c("AK", "New England")
x <- eia_geoset(id, region, n = 2)
select(x, region, data) %>% unnest(cols = data)
#> # A tibble: 14 x 5
#>    region value date        year month
#>    <chr>  <dbl> <date>     <int> <int>
#>  1 USA-AK  583. 2020-02-01  2020     2
#>  2 USA-AK  607. 2020-01-01  2020     1
#>  3 USA-CT 3516. 2020-02-01  2020     2
#>  4 USA-CT 3806. 2020-01-01  2020     1
#>  5 USA-MA 1576. 2020-02-01  2020     2
#>  6 USA-MA 1903. 2020-01-01  2020     1
#>  7 USA-ME  829. 2020-02-01  2020     2
#>  8 USA-ME  859. 2020-01-01  2020     1
#>  9 USA-NH 1394. 2020-02-01  2020     2
#> 10 USA-NH 1520. 2020-01-01  2020     1
#> 11 USA-RI  628. 2020-02-01  2020     2
#> 12 USA-RI  686. 2020-01-01  2020     1
#> 13 USA-VT  208. 2020-02-01  2020     2
#> 14 USA-VT  206. 2020-01-01  2020     1

region <- "Middle Atlantic"
x <- eia_geoset(id, region, n = 12)
select(x, region, data) %>% unnest(cols = data)
#> # A tibble: 36 x 5
#>    region value date        year month
#>    <chr>  <dbl> <date>     <int> <int>
#>  1 USA-NJ 4928. 2020-02-01  2020     2
#>  2 USA-NJ 5497. 2020-01-01  2020     1
#>  3 USA-NJ 6302. 2019-12-01  2019    12
#>  4 USA-NJ 5586. 2019-11-01  2019    11
#>  5 USA-NJ 5715. 2019-10-01  2019    10
#>  6 USA-NJ 6144. 2019-09-01  2019     9
#>  7 USA-NJ 6878. 2019-08-01  2019     8
#>  8 USA-NJ 7468. 2019-07-01  2019     7
#>  9 USA-NJ 5740. 2019-06-01  2019     6
#> 10 USA-NJ 4909. 2019-05-01  2019     5
#> # ... with 26 more rows

unnest(x, cols = data) %>%
  ggplot(aes(date, value, color = region)) +
  geom_line() +
  labs(y = x$units[1], title = "Net electricity generation, all fuels")

plot of chunk geoset2

Even more convenient is that these names are available in R. See the datasets::state.* functions and the geoset vignette.