An introduction to the phyloregion package

3. Input data

Phylogenies

In R, phylogenetic relationships among species or other taxa are often represented as a phylo object, a class implemented in the ape package¹. Phylogenies (often in Newick or Nexus format) can be imported into R with the read.tree or read.nexus functions of ape.
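As a minimal sketch of importing a phylogeny, the snippet below parses a small Newick string with read.tree. The four-taxon tree and its branch lengths are hypothetical and only serve to show what a phylo object looks like; a real analysis would pass a file path instead of the text argument.

```r
library(ape)

# A hypothetical four-taxon tree in Newick format (branch lengths in Myr)
newick <- "((A:1,B:1):2,(C:2,D:2):1);"
tree_toy <- read.tree(text = newick)

class(tree_toy)            # class of the imported object
tree_toy$tip.label         # tip labels in the order they appear in the string
sum(tree_toy$edge.length)  # total branch length of the tree
```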

library(ape)
library(Matrix)
library(sp)
library(phyloregion)  # provides the africa example data set
data(africa)
sparse_comm <- africa$comm

tree <- africa$phylo
tree <- keep.tip(tree, intersect(tree$tip.label, colnames(sparse_comm)))
par(mar=c(2,2,2,2))
plot(tree, show.tip.label=FALSE)

Figure 2. Phylogenetic tree of the woody plants of southern Africa inferred from DNA barcodes using a maximum likelihood approach, with branch lengths converted to millions of years by enforcing a relaxed molecular clock and multiple calibrations.²

Distribution data input

The phyloregion package has functions for manipulating three kinds of distribution data: point records, polygons and raster layers. These are handled by the functions points2comm, polys2comm and raster2comm, respectively. Whatever the data source, all three functions provide a convenient interface for converting distribution data to a community matrix at varying spatial grains and extents for downstream analyses.

We demonstrate each of these functions in turn.

Function `points2comm`

Here, we will generate random points in geographic space, similar to occurrence data obtained from museum records, GBIF, iDigBio, or CIESIN, which typically have columns of geographic coordinates for each observation.

s <- readRDS(system.file("nigeria/nigeria.rds", package = "phyloregion"))

set.seed(1)
m <- data.frame(sp::spsample(s, 10000, type = "nonaligned"))
names(m) <- c("lon", "lat")
species <- paste0("sp", sample(1:1000))
m$taxon <- sample(species, size = nrow(m), replace = TRUE)

pt <- points2comm(dat = m, mask = s, res = 0.5, lon = "lon", lat = "lat",
            species = "taxon")
head(pt[[1]][1:5, 1:5])

## 5 x 5 sparse Matrix of class "dgCMatrix"
##      sp1 sp10 sp100 sp1000 sp101
## v100   .    .     .      .     1
## v101   .    .     .      .     .
## v102   .    .     .      .     .
## v103   .    1     .      .     .
## v104   .    .     .      .     .
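To make the idea behind this conversion concrete, the following sketch bins simulated point records into grid cells and builds a sparse sites-by-species matrix using only the Matrix package. It is an illustration of the general approach under our own assumptions, not the implementation used by points2comm: the coordinates, taxon names and cell labels (c<column>_<row>) are all made up.

```r
library(Matrix)

set.seed(1)
n <- 200
# Hypothetical occurrence records: coordinates plus a taxon name
obs <- data.frame(
  lon   = runif(n, min = 3, max = 14),
  lat   = runif(n, min = 4, max = 13),
  taxon = sample(paste0("sp", 1:10), n, replace = TRUE)
)

res <- 0.5  # grid resolution in decimal degrees
# Label each record with the grid cell it falls in (cell names are made up here)
cell <- paste0("c", floor(obs$lon / res), "_", floor(obs$lat / res))

cells <- factor(cell)
taxa  <- factor(obs$taxon)

# Sparse sites x species matrix; repeated (cell, taxon) pairs are summed to counts
counts <- sparseMatrix(
  i = as.integer(cells), j = as.integer(taxa), x = rep(1, n),
  dims = c(nlevels(cells), nlevels(taxa)),
  dimnames = list(levels(cells), levels(taxa))
)

# Presence/absence version: every stored entry becomes 1
pres <- counts
pres@x[] <- 1

dim(counts)
```

Changing res alters the spatial grain: a coarser grid yields fewer, more species-rich cells.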

Function `polys2comm`

This function converts polygons to a community matrix at varying spatial grains and extents for downstream analyses. Polygons can be derived from the IUCN Red List spatial database (https://www.iucnredlist.org/resources/spatial-data-download), or from published monographs or field guides validated by taxonomic experts. To illustrate this function, we will use the function random_species to generate random polygons for five random species over the landscape of Nigeria as follows:

s <- readRDS(system.file("nigeria/nigeria.rds", package="phyloregion"))
sp <- random_species(100, species=5, shp=s)
pol <- polys2comm(dat = sp, species = "species", trace=0)
head(pol[[1]][1:5, 1:5])

## 5 x 5 sparse Matrix of class "dgCMatrix"
##     species1 species2 species3 species4 species5
## v10        .        .        1        .        .
## v12        1        .        1        1        1
## v13        1        .        1        1        1
## v14        1        .        1        1        1
## v15        1        .        1        1        1

Function `raster2comm`

This third function converts raster layers (often derived from species distribution modeling, such as AquaMaps³) to a community matrix.

fdir <- system.file("NGAplants", package="phyloregion")
files <- file.path(fdir, dir(fdir))
ras <- raster2comm(files)
head(ras[[1]])

## 6 x 16 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 16 column names 'Chytranthus_gilletii', 'Commelina_ramulosa', 'Cymbopogon_caesius' ... ]]

##                                     
## v100 . . . 1 . . . . . . . . . . . .
## v101 . . . 1 . . . . . . . . . . . .
## v102 . . . 1 . . . . . . . . . . . .
## v103 . . . 1 . . . . . . . . . . . .
## v104 . . . 1 . . . . . . . . . . . .
## v105 . . . 1 . . . . . . . . . . . .

The object ras is a list of two elements: a community matrix and a shapefile of grid cells with the number of species per cell. The latter can be plotted as a heatmap using the plot_swatch function as follows:

s <- readRDS(system.file("nigeria/SR_Naija.rds", package = "phyloregion"))
par(mar=rep(0,4))
plot_swatch(s, values = s$SR, k = 20, leg=1, border=NA)

Figure 3. Species richness of plants in Nigeria across equal area grid cells. This is to demonstrate how the function plot_swatch works.

Community data

Community data are commonly stored in a matrix with the sites as rows and species / operational taxonomic units (OTUs) as columns. The elements of the matrix are numeric values indicating the abundance/observations or presence/absence (0/1) of OTUs in different sites. In practice, such a matrix can contain many zero values because species are known to generally have unimodal distributions along environmental gradients,⁴ and storing and analyzing every single element of that matrix can be computationally challenging and expensive.

phyloregion differs from other R packages (e.g. vegan,⁵ picante⁶ or betapart⁷) in that the data are not stored in a (dense) matrix or data.frame but as a sparse matrix making use of the infrastructure provided by the Matrix package.⁸ A sparse matrix is a matrix with a high proportion of zero entries⁹, of which only the non-zero entries are stored and used for downstream analysis.
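The short sketch below shows what "only the non-zero entries are stored" means in practice for the compressed column format (class dgCMatrix) of the Matrix package. The small site-by-species matrix is hypothetical.

```r
library(Matrix)

# Hypothetical presence/absence matrix: 3 sites x 4 species, mostly zeros
dense_toy <- matrix(
  c(1, 0, 0,   # sp1
    0, 0, 1,   # sp2
    0, 0, 0,   # sp3
    1, 1, 0),  # sp4
  nrow = 3,
  dimnames = list(paste0("site", 1:3), paste0("sp", 1:4))
)

sparse_toy <- Matrix(dense_toy, sparse = TRUE)

sparse_toy@x  # the stored non-zero values
sparse_toy@i  # their zero-based row indices
sparse_toy@p  # column pointers: where each column starts within @x

# Converting back recovers the original dense matrix
all(as.matrix(sparse_toy) == dense_toy)
```

Of the 12 cells, only the four non-zero entries are held in the x slot; the zeros are implied by the index slots.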

A sparse matrix representation has two advantages. First, the community matrix can be stored in a much more memory-efficient manner, allowing analysis of larger datasets. Second, for very large datasets spanning thousands of taxa and sites, computations with a sparse matrix are often much faster.

The phyloregion package contains functions to conveniently change between data formats.

library(Matrix) 
data(africa)
sparse_comm <- africa$comm
dense_comm <- as.matrix(sparse_comm) 
object.size(dense_comm)

## 4216952 bytes

object.size(sparse_comm)

## 885952 bytes

Here, the dense matrix representation of the data set consumes roughly 4.8 times as much memory as the sparse representation.

4. Analysis

Alpha diversity

We demonstrate the utility of phyloregion in mapping standard conservation metrics of species richness, weighted endemism (weighted_endemism) and threat (map_trait), as well as fast computation of phylodiversity measures such as phylogenetic diversity (PD), phylogenetic endemism (phylo_endemism), and evolutionary distinctiveness and global endangerment (EDGE). The major advantage of these functions over available tools, e.g. Biodiverse,¹⁰ is their use of sparse matrices, which speeds up the analyses and reduces memory use, making them suitable for data ranging from small local scales to large regional and global scales.

Function `weighted_endemism`

Weighted endemism is species richness inversely weighted by species ranges¹¹,¹²,¹³.
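As a worked illustration of this definition, the sketch below computes weighted endemism by hand in base R for a hypothetical community of three sites and four species, taking a species' range as the number of sites it occupies. This is an assumption made for the example; it is not meant to reproduce the internals of weighted_endemism.

```r
# Hypothetical presence/absence community matrix (sites x species)
comm_toy <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), c("A", "B", "C", "D"))
)

# Range size of each species = number of sites in which it occurs
range_size <- colSums(comm_toy)

# Each occurrence contributes 1 / range size; sum the contributions per site
we <- drop(comm_toy %*% (1 / range_size))
we
```

Because each species contributes a total weight of one across all sites, the site values sum to the number of species (four here), which is a handy sanity check.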

library(raster)

## 
## Attaching package: 'raster'

## The following objects are masked from 'package:ape':
## 
##     rotate, zoom

data(africa)
Endm <- weighted_endemism(africa$comm)
head(Endm)

##    v3635    v3636    v3637    v3638    v3639    v3640 
## 1.770041 2.637894 1.825862 1.270093 1.043782 0.259324

m <- merge(africa$polys, data.frame(grids=names(Endm), WE=Endm), by="grids")
m <- m[!is.na(m@data$WE),]

par(mar=rep(0,4))
plot_swatch(m, values = m$WE, k=20, leg = 3, border = NA)

Figure 4. Geographic distributions of weighted endemism for woody plants of southern Africa.

Function `PD` – phylogenetic diversity

Phylogenetic diversity (PD) represents the total length of the evolutionary pathways that connect a given set of taxa on a rooted phylogenetic tree.¹⁴ For dated phylogenies, this metric is expressed in units of time (millions of years). We will map PD for plants of southern Africa.
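The following base R sketch works through this definition on a hypothetical four-taxon tree, ((A:1,B:1):2,(C:2,D:2):1), encoded by hand as a branch-by-tip incidence matrix. We assume here that the path to the root is included in PD; the names desc and edge_len are ours and are not part of the package.

```r
tips <- c("A", "B", "C", "D")

# Branch lengths of the hypothetical tree, one entry per branch
edge_len <- c(eA = 1, eB = 1, eAB = 2, eC = 2, eD = 2, eCD = 1)

# desc[e, t] = 1 if tip t descends from branch e
desc <- matrix(0, nrow = 6, ncol = 4, dimnames = list(names(edge_len), tips))
desc["eA", "A"] <- 1
desc["eB", "B"] <- 1
desc["eAB", c("A", "B")] <- 1
desc["eC", "C"] <- 1
desc["eD", "D"] <- 1
desc["eCD", c("C", "D")] <- 1

# Hypothetical presence/absence community matrix (sites x species)
comm_toy <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), tips)
)

# A branch is present in a site if at least one of its descendant tips occurs there
edge_by_site <- ((comm_toy %*% t(desc)) > 0) * 1

# PD of a site = summed length of the branches present in it
pd_toy <- drop(edge_by_site %*% edge_len)
pd_toy
```

A site holding all four taxa recovers the total length of the tree, whereas a site with two sister species (A and B) picks up only their shared and private branches.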

data(africa)
comm <- africa$comm
tree <- africa$phylo
poly <- africa$polys

mypd <- PD(comm, tree)
head(mypd)

##    v3635    v3636    v3637    v3638    v3639    v3640 
## 4226.216 5372.009 4377.735 3783.992 3260.111 1032.685

M <- merge(poly, data.frame(grids=names(mypd), pd=mypd), by="grids")
M <- M[!is.na(M@data$pd),]
head(M)

##   grids       pd
## 1 v3635 4226.216
## 2 v3636 5372.009
## 3 v3637 4377.735
## 4 v3638 3783.992
## 5 v3639 3260.111
## 6 v3640 1032.685

par(mar=rep(0,4))
plot_swatch(M, values = M$pd, k=20, border=NA, leg=3)

Figure 5. Geographic distributions of phylogenetic diversity for woody plants of southern Africa.

Function `phylo_endemism` – phylogenetic endemism

Phylogenetic endemism is not influenced by variations in taxonomic opinion because it measures endemism on the branches of the phylogeny, weighting each branch by the size of its range, rather than relying on named taxa.¹³,¹⁵
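A minimal base R sketch of this range weighting, on a hypothetical four-taxon tree ((A:1,B:1):2,(C:2,D:2):1) encoded by hand: each branch's length is divided by the number of sites in which the branch occurs, and the resulting values are summed over the branches present in a site. The object names are ours, and the code assumes every branch occurs in at least one site (otherwise the division would give non-finite values).

```r
tips <- c("A", "B", "C", "D")
edge_len <- c(eA = 1, eB = 1, eAB = 2, eC = 2, eD = 2, eCD = 1)

# desc[e, t] = 1 if tip t descends from branch e
desc <- matrix(0, nrow = 6, ncol = 4, dimnames = list(names(edge_len), tips))
desc["eA", "A"] <- 1
desc["eB", "B"] <- 1
desc["eAB", c("A", "B")] <- 1
desc["eC", "C"] <- 1
desc["eD", "D"] <- 1
desc["eCD", c("C", "D")] <- 1

# Hypothetical presence/absence community matrix (sites x species)
comm_toy <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), tips)
)

# Presence of each branch in each site (sites x branches)
edge_by_site <- ((comm_toy %*% t(desc)) > 0) * 1

# Range of a branch = number of sites in which it occurs
edge_range <- colSums(edge_by_site)

# PE of a site = sum over its branches of (branch length / branch range)
pe_toy <- drop(edge_by_site %*% (edge_len / edge_range))
pe_toy
```

Since each branch's length is shared out among the sites where it occurs, the site values add up to the total length of the tree.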

library(raster)
data(africa)
comm <- africa$comm
tree <- africa$phylo
poly <- africa$polys

pe <- phylo_endemism(comm, tree)
head(pe)

##     v3635     v3636     v3637     v3638     v3639     v3640 
## 32.536530 45.262625 35.004944 27.603721 23.183947  6.439589

mx <- merge(poly, data.frame(grids=names(pe), pe=pe), by="grids")
mx <- mx[!is.na(mx@data$pe),]
head(mx)

##   grids        pe
## 1 v3635 32.536530
## 2 v3636 45.262625
## 3 v3637 35.004944
## 4 v3638 27.603721
## 5 v3639 23.183947
## 6 v3640  6.439589

par(mar=rep(0,4))
plot_swatch(mx, values = mx$pe, k=20, border=NA, leg=3)

Figure 6. Geographic distributions of phylogenetic endemism for woody plants of southern Africa.

Function `EDGE` – Evolutionary Distinctiveness and Global Endangerment

This function calculates EDGE by combining evolutionary distinctiveness (ED; i.e., phylogenetic isolation of a species) with global endangerment (GE) status as defined by the International Union for Conservation of Nature (IUCN).
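To show how the two components combine, the sketch below applies the widely used formulation EDGE = ln(1 + ED) + GE × ln(2), with Red List categories coded as LC = 0, NT = 1, VU = 2, EN = 3 and CR = 4. The ED values and threat categories are hypothetical, and we assume, rather than assert, that this coding matches the one applied by the EDGE function.

```r
# Hypothetical evolutionary distinctiveness scores (in Myr)
ed <- c(sp_a = 1, sp_b = 12.5, sp_c = 30)

# Hypothetical IUCN Red List categories for the same species
status <- c(sp_a = "LC", sp_b = "VU", sp_c = "CR")

# Numeric coding of global endangerment (assumed coding, see text)
ge_codes <- c(LC = 0, NT = 1, VU = 2, EN = 3, CR = 4)
ge <- unname(ge_codes[status])

# Each step up in threat category doubles the weight on the log scale
edge_toy <- log(1 + ed) + ge * log(2)
edge_toy
```

A highly distinct but unthreatened species can therefore score lower than a moderately distinct species that is critically endangered.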

data(africa)
comm <- africa$comm
threat <- africa$IUCN
tree <- africa$phylo
poly <- africa$polys

x <- EDGE(threat, tree, Redlist = "IUCN", species="Species")
head(x)

##        Abutilon_angulatum_OM1934    Abutilon_sonneratianum_LTM034 
##                         2.903551                         2.903551 
## Acalypha_glabrata_glabrata_OM441  Acalypha_glabrata_pilosa_OM1979 
##                         2.480505                         2.480505 
##       Acalypha_sonderiana_OM2163  Acokanthera_oblongifolia_OM2240 
##                         2.914481                         2.211561

y <- map_trait(comm, x, FUN = sd, shp=poly)

par(mar=rep(0,4))
plot_swatch(y, y$traits, k=20, border=NA, leg=3)

Figure 7. Geographic distributions of evolutionary distinctiveness and global endangerment for woody plants of southern Africa.

Analysis of beta diversity (phylogenetic and non-phylogenetic)

Three commonly used indices for quantifying β-diversity, the variation in species composition among sites, are Simpson, Sørensen and Jaccard.¹⁶ The phyloregion functions beta_diss and phylobeta efficiently compute pairwise dissimilarity matrices for large sparse community matrices and phylogenetic trees, for taxonomic and phylogenetic turnover, respectively. The results are stored as distance objects for subsequent analyses.
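The three indices can be written in terms of a (species shared by two sites), b and c (species found in only one of the two sites): Sørensen = (b + c) / (2a + b + c), Jaccard = (b + c) / (a + b + c) and Simpson = min(b, c) / (a + min(b, c)). The base R sketch below evaluates them for a hypothetical community; it illustrates the formulas only and does not use beta_diss.

```r
# Hypothetical presence/absence community matrix (sites x species)
comm_toy <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), c("A", "B", "C", "D"))
)

shared <- tcrossprod(comm_toy)  # a: species shared by each pair of sites
rich   <- diag(shared)          # species richness of each site

only_i <- rich - shared         # b: species in site i but not in site j
only_j <- t(only_i)             # c: species in site j but not in site i
min_bc <- pmin(only_i, only_j)

sorensen <- as.dist((only_i + only_j) / (2 * shared + only_i + only_j))
jaccard  <- as.dist((only_i + only_j) / (shared + only_i + only_j))
simpson  <- as.dist(min_bc / (shared + min_bc))

sorensen
jaccard
simpson
```

Simpson dissimilarity is zero between site1 and site3 because site1 is a nested subset of site3, whereas Sørensen and Jaccard also register the difference in richness. Sites without any species would produce non-finite values and should be removed beforehand.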

Phylogenetic beta diversity

phyloregion offers a fast means of computing phylogenetic beta diversity, the turnover of branch lengths among sites. It makes use of and improves on the infrastructure provided by the betapart package⁷ by allowing a sparse community matrix as input.

data(africa)
sparse_comm <- africa$comm

tree <- africa$phylo
tree <- keep.tip(tree, intersect(tree$tip.label, colnames(sparse_comm)))
pb <- phylobeta(sparse_comm, tree)

y <- phyloregion(pb[[1]], shp=africa$polys)

plot_NMDS(y, cex=3)
text_NMDS(y)

par(mar=rep(0,4))
plot(y, palette="NMDS")

Session Information

sessionInfo()

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] raster_3.3-7      sp_1.4-2          Matrix_1.2-18     ape_5.4          
## [5] phyloregion_1.0.4
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5       highr_0.8        compiler_4.0.2   tools_4.0.2     
##  [5] magic_1.5-9      betapart_1.5.1   digest_0.6.25    evaluate_0.14   
##  [9] nlme_3.1-148     lattice_0.20-41  mgcv_1.8-31      pkgconfig_2.0.3 
## [13] rlang_0.4.6      fastmatch_1.1-0  igraph_1.2.5     rgdal_1.5-12    
## [17] yaml_2.2.1       parallel_4.0.2   xfun_0.15        stringr_1.4.0   
## [21] knitr_1.29       cluster_2.1.0    rgeos_0.5-3      rcdd_1.2-2      
## [25] grid_4.0.2       rmarkdown_2.3    phangorn_2.5.5   magrittr_1.5    
## [29] codetools_0.2-16 htmltools_0.5.0  MASS_7.3-51.6    splines_4.0.2   
## [33] abind_1.4-5      permute_0.9-5    picante_1.8.2    colorspace_1.4-1
## [37] quadprog_1.5-8   stringi_1.4.6    geometry_0.4.5   vegan_2.5-6

REFERENCES

1. Paradis, E. & Schliep, K. Ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2018).

2. Daru, B. H., Bank, M. van der & Davies, T. J. Spatial incongruence among hotspots and complementary areas of tree diversity in southern Africa. Diversity and Distributions 21, 769–780 (2015).

3. Kaschner, K. et al. AquaMaps: Predicted range maps for aquatic species. World wide web electronic publication, wwwaquamapsorg, Version 10, 2008 (2008).

4. Ter Braak, C. J. F. & Prentice, I. A theory of gradient analysis. in Advances in ecological research: Classic papers vol. 34 235–282 (Academic Press, 2004).

5. Oksanen, J. et al. Vegan: Community ecology package. (2019).

6. Kembel, S. W. et al. Picante: R tools for integrating phylogenies and ecology. Bioinformatics 26, 1463–1464 (2010).

7. Baselga, A. & Orme, C. D. L. Betapart: An R package for the study of beta diversity. Methods in Ecology and Evolution 3, 808–812 (2012).

8. Bates, D. & Maechler, M. Matrix: Sparse and dense matrix classes and methods. (2019).

9. Duff, I. S. A survey of sparse matrix research. Proceedings of the IEEE 65, 500–535 (1977).

10. Laffan, S. W., Lubarsky, E. & Rosauer, D. F. Biodiverse, a tool for the spatial analysis of biological and related diversity. Ecography 33, 643–647 (2010).

11. Crisp, M. D., Laffan, S., Linder, H. P. & Monro, A. Endemism in the Australian flora. Journal of Biogeography 28, 183–198 (2001).

12. Laffan, S. W. & Crisp, M. D. Assessing endemism at multiple spatial scales, with an example from the Australian vascular flora. Journal of Biogeography 30, 511–520 (2003).

13. Daru, B. H., Farooq, H., Antonelli, A. & Faurby, S. Endemism patterns are scale dependent. Nature Communications 11, 2115 (2020).

14. Faith, D. P. Conservation evaluation and phylogenetic diversity. Biological Conservation 61, 1–10 (1992).

15. Rosauer, D., Laffan, S. W., Crisp, M. D., Donnellan, S. C. & Cook, L. G. Phylogenetic endemism: A new approach for identifying geographical concentrations of evolutionary history. Molecular Ecology 18, 4061–4072 (2009).

16. Laffan, S. W. et al. Range-weighted metrics of species and phylogenetic turnover can better resolve biogeographic transition zones. Methods in Ecology and Evolution 7, 580–588 (2016).

17. Daru, B. H., Karunarathne, P. & Schliep, K. Phyloregion: R package for biogeographic regionalization and macroecology. bioRxiv (2020) doi:10.1101/2020.02.12.945691.

18. Daru, B. H., Elliott, T. L., Park, D. S. & Davies, T. J. Understanding the processes underpinning patterns of phylogenetic regionalization. Trends in Ecology & Evolution 32, 845–860 (2017).

An introduction to the phyloregion package

Barnabas H. Daru, Piyal Karunarathne & Klaus Schliep

July 19, 2020

1. Installation

2. Overview and general workflow of `phyloregion`

3. Input data

Phylogenies

Distribution data input

Function `points2comm`

Function `polys2comm`

Function `raster2comm`

Community data

4. Analysis

Alpha diversity

Function `weighted_endemism`

Function `PD` – phylogenetic diversity

Function `phylo_endemism` – phylogenetic endemism

Function `EDGE` – Evolutionary Distinctiveness and Global Endangerment

Analysis of beta diversity (phylogenetic and non-phylogenetic)

Phylogenetic beta diversity

Session Information

REFERENCES
