This package provides programmatic access to the Chromosome Counts Database (CCDB) API. The CCDB is a community resource for plant chromosome numbers. For more details on the database, see the associated publication by Rice et al. in New Phytologist. This package is maintained by Matthew Pennell (who is not affiliated with the CCDB group).
The package can be installed directly from CRAN:
install.packages("chromer")
or, for the latest version, you can install directly from GitHub using devtools:
## install.packages("devtools")
devtools::install_github("ropensci/chromer")
It is possible to query the database in four ways: by species, genus, family, and majorGroup. For example, if we are interested in the genus Solanum (Solanaceae), which contains the potato, tomato, and eggplant, we would query the database as follows:
library(chromer)
sol_gen <- chrom_counts(taxa="Solanum", rank="genus")
head(sol_gen)
nrow(sol_gen)
There are over 3000 records for Solanum alone! If we are interested in a particular species, such as tomatoes, we can search for the species directly.
sol_tom <- chrom_counts(taxa="Solanum_lycopersicum", rank="species")
head(sol_tom)
Note that taxa="Solanum lycopersicum" (including a space between the genus and species name) will also work here.
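If you would rather pass names in one consistent form, a small helper can convert between the two spellings before querying. This is only a sketch: normalize_taxon() is a hypothetical function written for this example and is not part of chromer.

```r
# Hypothetical helper (not part of chromer): trim surrounding whitespace and
# replace each run of internal whitespace with a single underscore.
normalize_taxon <- function(x) {
  gsub("[[:space:]]+", "_", trimws(x))
}

normalize_taxon("Solanum lycopersicum")
normalize_taxon("  Solanum   tuberosum ")
```

The result could then be passed as the taxa argument of chrom_counts() with rank="species".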
If we wanted to get data on the whole family, we simply type
sol_fam <- chrom_counts(taxa="Solanaceae", rank="family")
head(sol_fam)
Or, expand the scope much further and get all Angiosperms (this will take some time)
ang <- chrom_counts(taxa="Angiosperms", rank="majorGroup")
head(ang)
There are two options for returning data. The first (default) returns only the species name information (including taxonomic resolutions made by Taxonome) and the haploid and diploid counts. The second, selected by setting the argument full=TRUE, returns additional information on each record.
sol_gen_full <- chrom_counts("Solanum", rank="genus", full=TRUE)
head(sol_gen_full)
The Chromosome Counts Database is a fantastic resource, but because it compiles a large number of resources and studies, the data are somewhat messy and challenging to work with. We have written a small function that does some post-processing to make the data easier to handle. The function summarize_counts() does the following:
Aggregates multiple records for the same species
Infers the gametophytic (haploid) number of chromosomes when only the sporophytic (diploid) counts are available.
Parses the records for numeric values. In some cases chromosomal counts also include text characters (e.g., #-#; c.#; #,#,#; and many other varieties). As there are many possible ways that chromosomal counts may be listed in the database, the function takes the naive approach and simply searches the strings for integers. In most cases this is sensible, but it may produce odd results on occasion. Some degree of manual curation will probably be necessary, and the output of the summary should be used with caution in downstream analyses.
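To make the "search the strings for integers" idea concrete, here is a minimal base R sketch. It is an illustration of the general approach only, not the code used inside summarize_counts(); the helper extract_counts() and the example strings are made up for this example.

```r
# Hypothetical helper (not part of chromer): pull every run of digits out of
# each count string and return the runs as integers, one vector per string.
extract_counts <- function(x) {
  matches <- regmatches(x, gregexpr("[0-9]+", x))
  lapply(matches, as.integer)
}

# Made-up strings in the styles mentioned above.
raw <- c("12-14", "c.24", "12,24,36", "2n=24")
parsed <- extract_counts(raw)
names(parsed) <- raw
parsed
```

Note what happens with the last string: a purely digit-based search treats the 2 in "2n=24" as a count alongside 24. This is the kind of odd result that makes manual checking worthwhile.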
To summarize and clean the count data obtained from chrom_counts(), simply use
summarize_counts(sol_gen)
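For intuition about the aggregation and haploid-inference steps, the following toy example works through them in base R. It is a sketch under stated assumptions and does not reproduce the internals of summarize_counts(): the species names, column names, and counts are invented, halving the sporophytic count is an assumed inference rule, and taking the most frequent value per species is just one plausible way to aggregate.

```r
# Toy records; names and values are invented and do not mirror the columns
# returned by chrom_counts().
toy <- data.frame(
  species     = c("Genus_a", "Genus_a", "Genus_a", "Genus_b", "Genus_b"),
  gametophyte = c(12, 12, NA, NA, NA),
  sporophyte  = c(24, NA, 24, 18, 18),
  stringsAsFactors = FALSE
)

# Assumption: when no haploid count is reported, use half the diploid count.
toy$n <- ifelse(is.na(toy$gametophyte), toy$sporophyte / 2, toy$gametophyte)

# Most frequent value of n within a species (ties go to the smaller value).
modal <- function(v) {
  tab <- table(v)
  as.numeric(names(tab)[which.max(tab)])
}

# Collapse multiple records to one row per species.
summary_n <- aggregate(n ~ species, data = toy, FUN = modal)
summary_n
```

Compare the real output of summarize_counts(sol_gen) against the raw records before relying on it, since the package's own rules may differ from this sketch.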