Geocoding services provide data about locations, such as longitude and latitude coordinates. The goal of tidygeocoder is to make getting data from these services easy. The two main functions are geocode(), which takes a dataframe as input, and geo(), which takes character values as inputs.
The geocode() function extracts the specified address columns from the input dataframe and passes them to geo() to perform geocoding. All extra arguments (...) given to geocode() are passed to geo(), so refer to the geo() documentation for all the arguments you can give to geocode().
library(tibble)
library(DT)
library(dplyr)
library(tidygeocoder)
address_single <- tibble(singlelineaddress = c('11 Wall St, NY, NY',
  '600 Peachtree Street NE, Atlanta, Georgia'))
address_components <- tribble(
  ~street,                   ~cty,      ~st,
  '11 Wall St',              'NY',      'NY',
  '600 Peachtree Street NE', 'Atlanta', 'GA'
)
You can use the address argument to specify single-line addresses. Note that when multiple addresses are provided, the batch geocoding functionality of the Census geocoder service is used. Additionally, verbose = TRUE displays logs to the console.
address_single %>% geocode(address = singlelineaddress, method = 'census',
verbose = TRUE)
#> Number of Unique Addresses: 2
#> Passing 2 addresses to the census batch geocoder
#> Querying API URL: https://geocoding.geo.census.gov/geocoder/locations/addressbatch
#> Passing the following parameters to the API:
#> format : "json"
#> benchmark : "Public_AR_Current"
#> vintage : "Current_Current"
#>
#> Query completed in: 1.5 seconds
#> # A tibble: 2 x 3
#> singlelineaddress lat long
#> <chr> <dbl> <dbl>
#> 1 11 Wall St, NY, NY 40.7 -74.0
#> 2 600 Peachtree Street NE, Atlanta, Georgia 33.8 -84.4
Alternatively, you can run the same query with the geo() function by passing the address values from the dataframe directly. In either geo() or geocode(), the lat and long arguments are used to name the resulting latitude and longitude fields. Here the method argument is used to specify the OSM (Nominatim) geocoder service.
geo(address = address_single$singlelineaddress, method = 'osm',
lat = latitude, long = longitude)
#> # A tibble: 2 x 3
#> address latitude longitude
#> <chr> <dbl> <dbl>
#> 1 11 Wall St, NY, NY 40.7 -74.0
#> 2 600 Peachtree Street NE, Atlanta, Georgia 33.8 -84.4
Instead of single-line addresses, you can use any combination of the following arguments to specify your addresses: street, city, state, county, postalcode, and country.
address_components %>% geocode(street = street, city = cty, state = st,
method = 'census')
#> # A tibble: 2 x 5
#> street cty st lat long
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 11 Wall St NY NY 40.7 -74.0
#> 2 600 Peachtree Street NE Atlanta GA 33.8 -84.4
The cascade method first tries one geocoder service and then uses a second service to retry any addresses that were not found. By default it uses the Census geocoder first and then OSM, but you can specify any two methods (in order) with the cascade_order argument.
addr_comp1 <- address_components %>%
  bind_rows(tibble(cty = c('Toronto', 'Tokyo'), country = c('Canada', 'Japan')))
addr_comp1 %>% geocode(street = street, state = st, city = cty,
  country = country, method = 'cascade')
#> # A tibble: 4 x 7
#> street cty st country lat long geo_method
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 11 Wall St NY NY <NA> 40.7 -74.0 census
#> 2 600 Peachtree Street NE Atlanta GA <NA> 33.8 -84.4 census
#> 3 <NA> Toronto <NA> Canada 43.6 -79.4 osm
#> 4 <NA> Tokyo <NA> Japan 35.7 139. osm
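The two-pass idea behind the cascade method can be pictured with a small offline sketch. This is only an illustration of the concept, not tidygeocoder's implementation: lookup_first and lookup_second are hypothetical hard-coded tables standing in for two geocoder services, the query strings are made up for the demo, and the coordinates are the rounded values from the output above.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for two geocoder services (no network access).
lookup_first <- tibble(
  query = c("11 Wall St, NY", "600 Peachtree Street NE, Atlanta"),
  lat = c(40.7, 33.8), long = c(-74.0, -84.4)
)
lookup_second <- tibble(
  query = c("Toronto, Canada", "Tokyo, Japan"),
  lat = c(43.6, 35.7), long = c(-79.4, 139)
)

queries <- tibble(query = c("11 Wall St, NY", "600 Peachtree Street NE, Atlanta",
                            "Toronto, Canada", "Tokyo, Japan"))

# Pass 1: try the first service for every query.
pass1 <- queries %>%
  left_join(lookup_first, by = "query") %>%
  mutate(geo_method = if_else(is.na(lat), NA_character_, "first"))

# Pass 2: retry only the queries the first service could not resolve.
pass2 <- pass1 %>%
  filter(is.na(lat)) %>%
  select(query) %>%
  left_join(lookup_second, by = "query") %>%
  mutate(geo_method = if_else(is.na(lat), NA_character_, "second"))

# Fill the gaps left by pass 1 with pass-2 results, keeping the input row order.
cascade_sketch <- pass1 %>%
  left_join(pass2, by = "query", suffix = c("", "_2")) %>%
  mutate(lat = coalesce(lat, lat_2),
         long = coalesce(long, long_2),
         geo_method = coalesce(geo_method, geo_method_2)) %>%
  select(query, lat, long, geo_method)

cascade_sketch
```

Only the rows left unresolved by the first pass are sent to the second lookup, which is why each row ends up tagged with exactly one method.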
To return more data than just the latitude and longitude coordinates, specify full_results = TRUE. Additionally, for the Census geocoder you can get fields for geographies such as Census tracts by specifying return_type = 'geographies'. Be sure to use full_results = TRUE with return_type = 'geographies' so that the Census geography columns are returned.
census_full1 <- address_single %>% geocode(address = singlelineaddress,
method = 'census', full_results = TRUE, return_type = 'geographies')
glimpse(census_full1)
#> Rows: 2
#> Columns: 14
#> $ singlelineaddress <chr> "11 Wall St, NY, NY", "600 Peachtree Street NE, Atl…
#> $ lat <dbl> 40.70747, 33.77085
#> $ long <dbl> -74.01122, -84.38505
#> $ id <int> 1, 2
#> $ input_address <chr> "11 Wall St, NY, NY, , , ", "600 Peachtree Street N…
#> $ match_indicator <chr> "Match", "Match"
#> $ match_type <chr> "Exact", "Non_Exact"
#> $ matched_address <chr> "11 WALL ST, NEW YORK, NY, 10005", "600 PEACHTREE S…
#> $ tiger_line_id <int> 59659656, 17343689
#> $ tiger_side <chr> "R", "L"
#> $ state_fips <int> 36, 13
#> $ county_fips <int> 61, 121
#> $ census_tract <int> 700, 1900
#> $ census_block <int> 1008, 2003
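Because the FIPS and tract columns above come back as integers, their leading zeros are lost. If you need the conventional 11-digit tract identifier (2-digit state, 3-digit county, 6-digit tract) to join against other Census data, one possible approach is to zero-pad and concatenate the pieces. The sketch below uses a small data frame with values copied from the glimpse() output rather than a live query; the column name tract_geoid is introduced here purely for illustration.

```r
# Values copied from the glimpse() output above.
census_geo <- data.frame(
  state_fips   = c(36L, 13L),
  county_fips  = c(61L, 121L),
  census_tract = c(700L, 1900L)
)

# Zero-pad to the standard widths: 2 (state) + 3 (county) + 6 (tract).
census_geo$tract_geoid <- sprintf(
  "%02d%03d%06d",
  census_geo$state_fips, census_geo$county_fips, census_geo$census_tract
)

census_geo$tract_geoid
#> [1] "36061000700" "13121001900"
```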
As mentioned earlier, the geocode() function passes the addresses in a dataframe to the geo() function for geocoding, so we can also use geo() directly in a similar way:
salz <- geo('Salzburg, Austria', method = 'osm', full_results = TRUE)
glimpse(salz)
#> Rows: 1
#> Columns: 13
#> $ address <chr> "Salzburg, Austria"
#> $ lat <dbl> 47.79813
#> $ long <dbl> 13.04648
#> $ place_id <int> 206608
#> $ licence <chr> "Data © OpenStreetMap contributors, ODbL 1.0. https://os…
#> $ osm_type <chr> "node"
#> $ osm_id <int> 34964314
#> $ boundingbox <list> [<"47.6381346", "47.9581346", "12.8864806", "13.2064806…
#> $ display_name <chr> "Salzburg, 5020, Österreich"
#> $ class <chr> "place"
#> $ type <chr> "city"
#> $ importance <dbl> 0.6854709
#> $ icon <chr> "https://nominatim.openstreetmap.org/images/mapicons/poi…
Only unique addresses are passed to geocoder services even if your data contains duplicates. Missing/NA and blank addresses are excluded from queries.
duplicate_addrs <- address_single %>%
  bind_rows(address_single) %>%
  bind_rows(tibble(singlelineaddress = rep(NA, 3)))
duplicates_geocoded <- duplicate_addrs %>%
  geocode(address = singlelineaddress, method = 'census', verbose = TRUE)
#> Number of Unique Addresses: 2
#> Passing 2 addresses to the census batch geocoder
#> Querying API URL: https://geocoding.geo.census.gov/geocoder/locations/addressbatch
#> Passing the following parameters to the API:
#> format : "json"
#> benchmark : "Public_AR_Current"
#> vintage : "Current_Current"
#>
#> Query completed in: 1.4 seconds
knitr::kable(duplicates_geocoded)
singlelineaddress | lat | long |
---|---|---|
11 Wall St, NY, NY | 40.70747 | -74.01122 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38505 |
11 Wall St, NY, NY | 40.70747 | -74.01122 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38505 |
NA | NA | NA |
NA | NA | NA |
NA | NA | NA |
As shown above, duplicates are not removed from your results by default. However, you can return only unique results by using unique_only = TRUE. Note that passing unique_only = TRUE to geocode() causes the original dataframe format (including column names) to be discarded in favor of the standard field names (i.e. "address", "city", "state", etc.).
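The behavior described above (query each distinct address once, yet keep every input row) can be pictured as a deduplicate-then-join pattern. The following is a conceptual sketch only, not tidygeocoder's internals: fake_geocoder is a hypothetical offline stand-in for a geocoder service, and its coordinates are copied from the table above.

```r
library(dplyr)
library(tibble)

addrs <- tibble(singlelineaddress = c(
  "11 Wall St, NY, NY",
  "600 Peachtree Street NE, Atlanta, Georgia",
  "11 Wall St, NY, NY",
  NA
))

# Hypothetical stand-in for a geocoder service (hard-coded, no network access).
fake_geocoder <- function(address) {
  known <- tibble(
    singlelineaddress = c("11 Wall St, NY, NY",
                          "600 Peachtree Street NE, Atlanta, Georgia"),
    lat  = c(40.70747, 33.77085),
    long = c(-74.01122, -84.38505)
  )
  tibble(singlelineaddress = address) %>%
    left_join(known, by = "singlelineaddress")
}

# Query only the distinct, non-missing addresses ...
unique_addrs <- addrs %>%
  filter(!is.na(singlelineaddress)) %>%
  distinct()
coords <- fake_geocoder(unique_addrs$singlelineaddress)

# ... then join the coordinates back so that every input row is preserved.
rejoined <- addrs %>% left_join(coords, by = "singlelineaddress")
rejoined
```

Here coords plays the role of a unique-only result (two rows), while rejoined keeps all four input rows, with NA coordinates for the missing address.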
The limit argument can be specified to return multiple matches per address if available:
geo_limit <- geo(c('Lima, Peru', 'Cairo, Egypt'), method = 'osm',
limit = 3, full_results = TRUE)
glimpse(geo_limit)
#> Rows: 4
#> Columns: 13
#> $ address <chr> "Lima, Peru", "Lima, Peru", "Lima, Peru", "Cairo, Egypt"
#> $ lat <dbl> -12.06211, -12.20011, -11.99997, 30.04882
#> $ long <dbl> -77.03653, -76.28506, -76.83322, 31.24367
#> $ place_id <int> 286976132, 235673177, 235480647, 236205997
#> $ licence <chr> "Data © OpenStreetMap contributors, ODbL 1.0. https://os…
#> $ osm_type <chr> "relation", "relation", "relation", "relation"
#> $ osm_id <int> 1944756, 1944659, 1944670, 5466227
#> $ boundingbox <list> [<"-12.0797663", "-12.0303496", "-77.0884555", "-77.001…
#> $ display_name <chr> "Lima, Peru", "Lima, Peru", "Lima, Peru", "القاهرة, Egyp…
#> $ class <chr> "boundary", "boundary", "boundary", "place"
#> $ type <chr> "administrative", "administrative", "administrative", "c…
#> $ importance <dbl> 0.8930015, 0.7219761, 0.7034835, 0.7960286
#> $ icon <chr> "https://nominatim.openstreetmap.org/images/mapicons/poi…
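When several matches come back per address, you may want to count them or keep just one row per address. The sketch below works on a small stand-in tibble (geo_limit_demo, a name used only for this example) whose values are copied from the glimpse() output above, so it runs without a query. Ranking by the Nominatim importance field is just one possible convention, not something tidygeocoder does for you.

```r
library(dplyr)
library(tibble)

# Stand-in for a multi-match result, with values copied from the output above.
geo_limit_demo <- tibble(
  address    = c("Lima, Peru", "Lima, Peru", "Lima, Peru", "Cairo, Egypt"),
  lat        = c(-12.06211, -12.20011, -11.99997, 30.04882),
  long       = c(-77.03653, -76.28506, -76.83322, 31.24367),
  importance = c(0.8930015, 0.7219761, 0.7034835, 0.7960286)
)

# How many matches were returned for each address?
match_counts <- geo_limit_demo %>% count(address)

# Keep only the highest-importance match for each address.
best_match <- geo_limit_demo %>%
  group_by(address) %>%
  slice_max(importance, n = 1) %>%
  ungroup()

match_counts
best_match
```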
To pass API parameters that are specific to a given method, you can use the custom_query argument. For example, the Nominatim (OSM) geocoder has a polygon_geojson parameter that can be used to return GeoJSON geometry content. To pass this parameter, supply it in a named list via custom_query:
cairo_geo <- geo('Cairo, Egypt', method = 'osm', full_results = TRUE,
custom_query = list(polygon_geojson = 1), verbose = TRUE)
#> Number of Unique Addresses: 1
#> Querying API URL: http://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> limit : "1"
#> q : "Cairo, Egypt"
#> polygon_geojson : "1"
#> format : "json"
#>
#> Query completed in: 0.3 seconds
#> Total query time (including sleep): 1 seconds
#>
glimpse(cairo_geo)
#> Rows: 1
#> Columns: 15
#> $ address <chr> "Cairo, Egypt"
#> $ lat <dbl> 30.04882
#> $ long <dbl> 31.24367
#> $ place_id <int> 236205997
#> $ licence <chr> "Data © OpenStreetMap contributors, ODbL 1.0. htt…
#> $ osm_type <chr> "relation"
#> $ osm_id <int> 5466227
#> $ boundingbox <list> [<"29.7483062", "30.3209168", "31.2200331", "31.…
#> $ display_name <chr> "القاهرة, Egypt / مصر"
#> $ class <chr> "place"
#> $ type <chr> "city"
#> $ importance <dbl> 0.7960286
#> $ icon <chr> "https://nominatim.openstreetmap.org/images/mapic…
#> $ geojson.type <chr> "Polygon"
#> $ geojson.coordinates <list> [<array[1 x 119 x 2]>]
To test a query without sending any data to a geocoder service, you can use no_query = TRUE (NA results are returned).
geo(c('Vancouver, Canada', 'Las Vegas, NV'), no_query = TRUE,
method = 'osm')
#> Number of Unique Addresses: 2
#> Executing single address geocoding...
#>
#> Number of Unique Addresses: 1
#> Querying API URL: http://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> limit : "1"
#> q : "Vancouver, Canada"
#> format : "json"
#>
#> Number of Unique Addresses: 1
#> Querying API URL: http://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> limit : "1"
#> q : "Las Vegas, NV"
#> format : "json"
#>
#> # A tibble: 2 x 3
#> address lat long
#> <chr> <lgl> <lgl>
#> 1 Vancouver, Canada NA NA
#> 2 Las Vegas, NV NA NA
Here are some additional usage notes for the geocode() and geo() functions:
- The API URL can be customized with the api_url argument. Alternatively, the iq_region and geocodio_v arguments are helpers for customizing the API URL.
- The min_time argument defaults to 1 second for Nominatim (OSM) and Location IQ to abide by usage limits. If you are using a local Nominatim server or have a commercial Location IQ plan with less restrictive usage limits, you can manually set min_time to a lower value or to 0.
- Batch or single-address geocoding can be selected with the mode argument.
You can refer to the api_parameter_reference dataset to see which parameters are supported by each geocoder service. This dataset is displayed below.
Refer to ?api_parameter_reference for more details and links to the API documentation for each geocoder service.
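The min_time note in the list above amounts to padding each query with a pause so that it takes at least a minimum amount of time. A minimal sketch of that idea follows; paced_queries is a hypothetical helper written for this illustration (it is not a tidygeocoder function), and toupper stands in for a real query so that nothing is sent to a service.

```r
# Hypothetical helper: call `query_fun` once per address, sleeping as needed
# so that each call takes at least `min_time` seconds in total.
paced_queries <- function(addresses, query_fun, min_time = 1) {
  lapply(addresses, function(addr) {
    start <- Sys.time()
    result <- query_fun(addr)
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed < min_time) Sys.sleep(min_time - elapsed)
    result
  })
}

# Usage with a dummy query function and a short interval to keep the demo fast.
timing <- system.time(
  paced_out <- paced_queries(c("a", "b", "c"), toupper, min_time = 0.2)
)
unlist(paced_out)
timing[["elapsed"]]  # roughly 3 x 0.2 seconds or slightly more
```

Setting min_time = 0 in this sketch removes the pause entirely, which mirrors the advice for local servers or plans without strict rate limits.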