`taxize` allows users to search over many taxonomic data sources for species names (scientific and common) and download up- and downstream taxonomic hierarchical information, among other things.
The taxize book: https://taxize.dev
Package documentation: https://docs.ropensci.org/taxize/
Source | Function prefix | API Docs | API key |
---|---|---|---|
Encyclopedia of Life | `eol` | link | none |
Taxonomic Name Resolution Service | `tnrs` | none | none |
Integrated Taxonomic Information System | `itis` | link | none |
Global Names Resolver | `gnr` | link | none |
Global Names Index | `gni` | link | none |
IUCN Red List | `iucn` | link | link |
Tropicos | `tp` | link | link |
Theplantlist dot org | `tpl` | ** | none |
National Center for Biotechnology Information | `ncbi` | none | none |
CANADENSYS Vascan name search API | `vascan` | link | none |
International Plant Names Index (IPNI) | `ipni` | none | none |
Barcode of Life Data Systems (BOLD) | `bold` | link | none |
National Biodiversity Network (UK) | `nbn` | link | none |
Index Fungorum | `fg` | none | none |
EU BON | `eubon` | link | none |
Index of Names (ION) | `ion` | link | none |
Open Tree of Life (TOL) | `tol` | link | none |
World Register of Marine Species (WoRMS) | `worms` | link | none |
NatureServe | `natserv` | link | link |
Wikipedia | `wiki` | link | none |
Kew’s Plants of the World | `pow` | none | none |
**: There are none! We suggest using the `TPL` and `TPLck` functions in the Taxonstand package. We provide two functions to get bulk data: `tpl_families` and `tpl_get`.
***: There are none! The function scrapes the web directly.
See the datasources tag in the issue tracker
Windows users should install Rtools first.
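A minimal installation sketch follows. It assumes the package is currently available on CRAN and that the development version lives in the `ropensci/taxize` GitHub repository; the `remotes` package is an assumption here, and any equivalent installer would do.

```r
# Stable release (assumes taxize is available on CRAN)
install.packages("taxize")

# Development version (assumes the ropensci/taxize GitHub repository;
# requires the remotes package)
# install.packages("remotes")
# remotes::install_github("ropensci/taxize")

# Load the package before running the examples
library("taxize")
```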
A lot of `taxize` revolves around taxonomic identifiers. Because names can be a mess (misspellings, synonyms, etc.), it’s better to first get an identifier that a particular data source knows about. We can then move on to acquiring more fun taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
#> ══ 2 queries ═══════════════
#> ✔ Found: Chironomus+riparius
#> ✔ Found: Chaetopteryx
#> ══ Results ═════════════════
#>
#> ● Total: 2
#> ● Found: 2
#> ● Not Found: 0
Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
#>
#> $`492549`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
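Each element of the result is a data frame with `name`, `rank`, and `id` columns, so a single rank can be pulled out with ordinary subsetting. The sketch below rebuilds the six rows printed above as a stand-in for one element of `out`. `rank_name()` is a hypothetical helper written for this example and is not part of `taxize`.

```r
# Rows copied from the classification() output shown above
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id = c("131567", "2759", "33154", "33208", "6072", "33213"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of taxize): return the name found at a
# given rank, or NA if that rank does not occur in the data frame
rank_name <- function(x, rank) {
  hit <- x$name[x$rank == rank]
  if (length(hit) > 0) hit[[1]] else NA_character_
}

rank_name(cls, "kingdom")
#> [1] "Metazoa"
rank_name(cls, "family")
#> [1] NA
```

With real results, the same helper could be applied to each element of `out`, assuming every element is a data frame (a taxon that was not found may come back in a different form).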
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#> childtaxa_id childtaxa_name childtaxa_rank
#> 1 2705433 Salmo ghigii species
#> 2 2304090 Salmo abanticus species
#> 3 2126688 Salmo ciscaucasicus species
#> 4 1509524 Salmo marmoratus x Salmo trutta species
#> 5 1484545 Salmo cf. cenerinus BOLD:AAB3872 species
#> 6 1483130 Salmo zrmanjaensis species
#> 7 1483129 Salmo visovacensis species
#> 8 1483128 Salmo rhodanensis species
#> 9 1483127 Salmo pellegrini species
#> 10 1483126 Salmo opimus species
#> 11 1483125 Salmo macedonicus species
#> 12 1483124 Salmo lourosensis species
#> 13 1483123 Salmo labecula species
#> 14 1483122 Salmo farioides species
#> 15 1483121 Salmo chilo species
#> 16 1483120 Salmo cettii species
#> 17 1483119 Salmo cenerinus species
#> 18 1483118 Salmo aphelios species
#> 19 1483117 Salmo akairos species
#> 20 1201173 Salmo peristericus species
#> 21 1035833 Salmo ischchan species
#> 22 700588 Salmo labrax species
#> 23 602068 Salmo caspius species
#> 24 237411 Salmo obtusirostris species
#> 25 235141 Salmo platycephalus species
#> 26 234793 Salmo letnica species
#> 27 62065 Salmo ohridanus species
#> 28 33518 Salmo marmoratus species
#> 29 33516 Salmo fibreni species
#> 30 33515 Salmo carpio species
#> 31 8032 Salmo trutta species
#> 32 8030 Salmo salar species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
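The list above includes a hybrid (`Salmo marmoratus x Salmo trutta`) and a provisional name (`Salmo cf. cenerinus BOLD:AAB3872`) alongside ordinary binomials. One way to keep only plain two-word names is a regular-expression filter on `childtaxa_name`. This is a sketch using four rows copied from the output above. The column types are an assumption, and the pattern is a rough heuristic, not a taxonomic rule.

```r
# Four rows copied from the children() output above (types assumed)
kids <- data.frame(
  childtaxa_id = c("1509524", "1484545", "8032", "8030"),
  childtaxa_name = c("Salmo marmoratus x Salmo trutta",
                     "Salmo cf. cenerinus BOLD:AAB3872",
                     "Salmo trutta",
                     "Salmo salar"),
  childtaxa_rank = "species",
  stringsAsFactors = FALSE
)

# Heuristic: exactly one capitalised genus word plus one lowercase epithet
is_binomial <- grepl("^[A-Z][a-z]+ [a-z]+$", kids$childtaxa_name)

plain <- kids[is_binomial, ]
plain$childtaxa_name
#> [1] "Salmo trutta" "Salmo salar"
```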
Get all species in the genus Apis.
downstream(as.tsn(154395), db = 'itis', downto = 'species', messages = FALSE)
#> $`154395`
#> tsn parentname parenttsn rankname taxonname rankid
#> 1 154396 Apis 154395 species Apis mellifera 220
#> 2 763550 Apis 154395 species Apis andreniformis 220
#> 3 763551 Apis 154395 species Apis cerana 220
#> 4 763552 Apis 154395 species Apis dorsata 220
#> 5 763553 Apis 154395 species Apis florea 220
#> 6 763554 Apis 154395 species Apis koschevnikovi 220
#> 7 763555 Apis 154395 species Apis nigrocincta 220
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus', messages = FALSE)
#> ══ 1 queries ═══════════════
#> ✔ Found: Pinus contorta
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $`Pinus contorta`
#> tsn parentname parenttsn rankname taxonname rankid
#> 1 18031 Pinaceae 18030 genus Abies 180
#> 2 18033 Pinaceae 18030 genus Picea 180
#> 3 18035 Pinaceae 18030 genus Pinus 180
#> 4 183396 Pinaceae 18030 genus Tsuga 180
#> 5 183405 Pinaceae 18030 genus Cedrus 180
#> 6 183409 Pinaceae 18030 genus Larix 180
#> 7 183418 Pinaceae 18030 genus Pseudotsuga 180
#> 8 822529 Pinaceae 18030 genus Keteleeria 180
#> 9 822530 Pinaceae 18030 genus Pseudolarix 180
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Acer drummondii", db="itis")
#> ══ 1 queries ═══════════════
#> ✔ Found: Acer drummondii
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $`Acer drummondii`
#> sub_tsn acc_name acc_tsn acc_author
#> 1 183671 Acer rubrum var. drummondii 526853 (Hook. & Arn. ex Nutt.) Sarg.
#> 2 183671 Acer rubrum var. drummondii 526853 (Hook. & Arn. ex Nutt.) Sarg.
#> 3 183671 Acer rubrum var. drummondii 526853 (Hook. & Arn. ex Nutt.) Sarg.
#> syn_author syn_name syn_tsn
#> 1 (Hook. & Arn. ex Nutt.) E. Murray Acer rubrum ssp. drummondii 28730
#> 2 Hook. & Arn. ex Nutt. Acer drummondii 183671
#> 3 (Hook. & Arn. ex Nutt.) Small Rufacer drummondii 183672
#>
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
get_ids("Salvelinus fontinalis", db = c('itis', 'ncbi'), messages = FALSE)
#> ══ db: itis ═════════════════
#> ══ 1 queries ═══════════════
#> ✔ Found: Salvelinus fontinalis
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> ══ db: ncbi ═════════════════
#> ══ 1 queries ═══════════════
#> ✔ Found: Salvelinus+fontinalis
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $itis
#> Salvelinus fontinalis
#> "162003"
#> attr(,"class")
#> [1] "tsn"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#>
#> $ncbi
#> Salvelinus fontinalis
#> "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
You can limit results to certain rows when getting ids in any of the `get_*()` functions:
get_ids("Poa annua", db = "gbif", rows=1)
#> ══ db: gbif ═════════════════
#> ══ 1 queries ═══════════════
#> ✔ Found: Poa annua
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"
Furthermore, you can get back all ids, if that’s your jam, with the `get_*_()` functions (every `get_*()` function with an additional `_` underscore at the end of the function name):
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> ══ db: nbn ══════════════════
#> $nbn
#> $nbn$`Chironomus riparius`
#> guid scientificName rank taxonomicStatus
#> 1 NBNSYS0000027573 Chironomus riparius species accepted
#> 2 NBNSYS0000007169 Elaphrus riparius species accepted
#> 3 NBNSYS0000023573 Quedius riparius species accepted
#>
#> $nbn$`Pinus contorta`
#> guid scientificName rank taxonomicStatus
#> 1 NBNSYS0000004786 Pinus contorta species accepted
#> 2 NHMSYS0000494848 Pinus contorta var. contorta variety accepted
#> 3 NHMSYS0000494858 Pinus contorta var. murrayana variety accepted
#>
#>
#> attr(,"class")
#> [1] "ids"
sci2comm('Helianthus annuus', db = 'itis')
#> ══ 1 queries ═══════════════
#> ✔ Found: Helianthus annuus
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower" "wild sunflower" "annual sunflower"
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus americanus luteolus" "Ursus americanus"
#> [3] "Ursus americanus" "Ursus americanus americanus"
#> [5] "Chiropotes satanas" "Ursus thibetanus"
#> [7] "Ursus thibetanus"
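The result above repeats some names and mixes species with subspecies. Base R is enough to tidy it. The sketch below copies the seven names from that output, drops duplicates, and then keeps only two-word names. Treating "two words" as "species" is a simplifying assumption made for this example.

```r
# Names copied from the comm2sci() output shown above
sci <- c("Ursus americanus luteolus", "Ursus americanus",
         "Ursus americanus", "Ursus americanus americanus",
         "Chiropotes satanas", "Ursus thibetanus", "Ursus thibetanus")

# Drop exact duplicates, keeping first occurrences in order
sci_unique <- unique(sci)

# Keep names made of exactly two space-separated words (assumed binomials)
n_words <- lengths(strsplit(sci_unique, " ", fixed = TRUE))
species_only <- sci_unique[n_words == 2]
species_only
#> [1] "Ursus americanus"   "Chiropotes satanas" "Ursus thibetanus"
```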
spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#> ══ 3 queries ═══════════════
#> ✔ Found: Sus+scrofa
#> ✔ Found: Homo+sapiens
#> ✔ Found: Nycticebus+coucang
#> ══ Results ═════════════════
#>
#> ● Total: 3
#> ● Found: 3
#> ● Not Found: 0
#> ══ 3 queries ═══════════════
#> ✔ Found: Sus+scrofa
#> ✔ Found: Homo+sapiens
#> ✔ Found: Nycticebus+coucang
#> ══ Results ═════════════════
#>
#> ● Total: 3
#> ● Found: 3
#> ● Not Found: 0
#> name rank id
#> 21 Boreoeutheria below-class 1437010
`numeric` to uid
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
`list` to uid
as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339" "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "https://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "https://www.ncbi.nlm.nih.gov/taxonomy/9696"
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#> ids class match multiple_matches pattern_match
#> 1 315567 uid found FALSE FALSE
#> 2 3339 uid found FALSE FALSE
#> 3 9696 uid found FALSE FALSE
#> uri
#> 1 https://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2 https://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3 https://www.ncbi.nlm.nih.gov/taxonomy/9696
See our CONTRIBUTING document.
Check out our milestones to see what we plan to get done for each version.
Get citation information for `taxize` in R by running `citation(package = 'taxize')`.