taxizedb
- Tools for Working with Taxonomic Databases on your Machine
taxize is a heavily used taxonomic toolbelt package in R - However, it makes web requests for nearly all methods. That is fine for most cases, but when the user has many, many names it is much more efficient to do requests to a local SQL database.
Not all taxonomic databases are publicly available, or possible to mash into a SQLized version. Taxonomic DB’s supported thus far:
Get in touch in the issues with any ideas on new data sources.
This package for each data sources performs the following tasks:
db_download_*
db_load_*
dplyr
SQL backend - src_*
cran version
install.packages("taxizedb")
dev version
devtools::install_github("ropensci/taxizedb")
library("taxizedb")
library("dplyr")
Remember to start your PostgreSQL database for ITIS and ThePlantList and your MySQL database for COL
ITIS
x <- db_download_itis()
db_load_itis(x)
The Plant List (TPL)
x <- db_download_tpl()
db_load_tpl(x)
Catalogue of Life (COL)
x <- db_download_col()
db_load_col(x)
ITIS
src <- src_itis(user = "<user name>", password = "<password>")
TPL
src <- src_tpl()
COL
src <- src_col()
sql_collect(src, "select * from hierarchy limit 5")
#> # A tibble: 5 x 5
#> hierarchy_string tsn parent_tsn level childrencount
#> * <chr> <int> <int> <int> <int>
#> 1 202422 202422 0 0 154282
#> 2 202422-846491 846491 202422 1 2666
#> 3 202422-846491-660046 660046 846491 2 2654
#> 4 202422-846491-660046-846497 846497 660046 3 7
#> 5 202422-846491-660046-846497-846508 846508 846497 4 6
# or pipe the src to sql_collect
src %>% sql_collect("select * from hierarchy limit 5")
#> # A tibble: 5 x 5
#> hierarchy_string tsn parent_tsn level childrencount
#> * <chr> <int> <int> <int> <int>
#> 1 202422 202422 0 0 154282
#> 2 202422-846491 846491 202422 1 2666
#> 3 202422-846491-660046 660046 846491 2 2654
#> 4 202422-846491-660046-846497 846497 660046 3 7
#> 5 202422-846491-660046-846497-846508 846508 846497 4 6
get a tbl
hiers <- src %>% tbl("hierarchy")
#> # Source: table<hierarchy> [?? x 5]
#> # Database: postgres 9.6.0 [sacmac@localhost:5432/ITIS]
#> hierarchy_string tsn parent_tsn level childrencount
#> <chr> <int> <int> <int> <int>
#> 1 202422 202422 0 0 154282
#> 2 202422-846491 846491 202422 1 2666
#> 3 202422-846491-660046 660046 846491 2 2654
#> 4 202422-846491-660046-846497 846497 660046 3 7
#> 5 202422-846491-660046-846497-846508 846508 846497 4 6
#> 6 202422-846491-660046-846497-846508-846553 846553 846508 5 5
#> 7 202422-846491-660046-846497-846508-846553-954935 954935 846553 6 3
#> 8 202422-846491-660046-846497-846508-846553-954935-5549 5549 954935 7 2
#> 9 202422-846491-660046-846497-846508-846553-954935-5549-5550 5550 5549 8 0
#> 10 202422-846491-660046-846497-846508-846553-954936 954936 846553 6 0
#> # ... with more rows
select certain fields
hiers %>% select(tsn, level)
#> # Source: lazy query [?? x 2]
#> # Database: postgres 9.6.0 [sacmac@localhost:5432/ITIS]
#> tsn level
#> <int> <int>
#> 1 202422 0
#> 2 846491 1
#> 3 660046 2
#> 4 846497 3
#> 5 846508 4
#> 6 846553 5
#> 7 954935 6
#> 8 5549 7
#> 9 5550 8
#> 10 954936 6
#> # ... with more rows
taxizedb
in R doing citation(package = 'taxizedb')