Profile Parsimony (Faith & Trueman, 2001) finds the tree that is most faithful to the information contained within a given dataset. It is the ‘exact solution’ that implied weights parsimony approximates.
Profile Parsimony is currently implemented in ‘TreeSearch’ only for binary characters with no ambiguous tokens.
A companion vignette gives details on installing the package and getting up and running.
Once installed, load the TreeSearch package into R using
library('TreeSearch')
In order to reproduce the random elements of this document, set a random seed:
# Set a random seed so that random functions in this document are reproducible
suppressWarnings(RNGversion("3.5.0")) # Until we can require R3.6.0
set.seed(888)
Here’s an example of using the package to conduct tree search with profile parsimony. You can load your own dataset, but for this example, we’ll use a simulated dataset that comes bundled with the TreeSearch package.
data(congreveLamsdellMatrices)
my.data <- congreveLamsdellMatrices[[10]]
my.phyDat <- phangorn::phyDat(my.data, type = 'USER', levels = c(1, 2))
We then need to prepare our dataset for Profile Parsimony by calculating the information loss implied by each additional step in each character. Ideally we’d pick a high value of precision (> 1e+06): this takes a while, but only needs doing once for each dataset of a given size.
my.prepdata <- suppressWarnings(PrepareDataProfile(my.phyDat, precision = 4e+04))
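To make the idea of "information loss implied by each additional step" concrete, here is a minimal base-R sketch. It is not what PrepareDataProfile does internally; the cumulative proportions below are hypothetical values chosen purely for illustration, and the calculation simply assumes that the information content of a character fitting a tree with at most k steps is the negative log (base 2) of the proportion of trees on which it needs at most k steps.

```r
# Hypothetical cumulative proportions (illustrative values only): the fraction
# of all trees on which a binary character needs at most k steps
p_at_most <- c(steps1 = 0.002, steps2 = 0.05, steps3 = 0.40, steps4 = 1)

# Information content, in bits, if the character fits with at most k steps
info_bits <- -log2(p_at_most)

# Information lost by accepting each additional step
info_loss <- -diff(info_bits)

print(round(info_bits, 2))
print(round(info_loss, 2))
```

Under these assumed proportions, the first extra step costs the most information and later steps cost progressively less, which is the kind of step-dependent penalty that a fixed weighting function can only approximate.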
To start analysis, we need to load a starting tree. We can do this at random:
tree <- TreeTools::RandomTree(my.phyDat, root = TRUE)
Or using a neighbour joining method, to start at a reasonably good tree:
tree <- TreeTools::NJTree(my.phyDat)
par(mar = rep(0.25, 4), cex = 0.75) # make plot easier to read
plot(tree)
Let’s calculate this starting tree’s profile parsimony score, then search for a better tree:
ProfileScore(tree, my.prepdata)
## [1] -279.5769
better.tree <- ProfileTreeSearch(tree, my.prepdata, EdgeSwapper = RootedTBRSwap)
## - Performing tree search. Initial score: -279.576861720899
## - Final score -279.576861720899 found 0 times after 100 rearrangements.
The parsimony ratchet (Nixon, 1999) is better at finding globally optimal trees. ProfileRatchet is a convenient wrapper for the underlying function Ratchet.
# Longwinded approach:
better.tree <- Ratchet(better.tree, my.prepdata, searchHits = 10,
                       searchIter = 100, ratchIter = 5,
                       swappers = list(RootedTBRSwap, RootedSPRSwap,
                                       RootedNNISwap),
                       InitializeData = ProfileInitMorphy,
                       CleanUpData = ProfileDestroyMorphy,
                       TreeScorer = ProfileScoreMorphy,
                       Bootstrapper = ProfileBootstrap)
# Equivalent, but less typing!
RootedSwappers <- list(RootedTBRSwap, RootedSPRSwap, RootedNNISwap)
better.tree <- ProfileRatchet(better.tree, my.prepdata,
                              swappers = RootedSwappers,
                              searchHits = 10, searchIter = 100, ratchIter = 5)
Let’s see the resultant tree, and its score:
attr(better.tree, 'score')
## [1] -309.1615
par(mar = rep(0.25, 4), cex = 0.75) # make plot easier to read
plot(better.tree)
The default parameters may not be enough to find the optimal tree; type ?Ratchet to view all search parameters.
In parsimony search, it is good practice to consider trees that are slightly suboptimal (Smith, 2019).
Here, we’ll take a consensus that includes all trees that are suboptimal by up to 1.5 bits. To sample this region of tree space well, the trick is to use large values of ratchHits and ratchIter, and small values of searchHits and searchIter, so that many runs don’t quite hit the optimal tree. In a serious study, you would want to sample many more than the 25 Ratchet hits (ratchHits) we’ll settle for here, probably using many more Ratchet iterations.
suboptimals <- ProfileRatchet(better.tree, my.prepdata,
                              swappers = list(RootedTBRSwap),
                              returnAll = TRUE, suboptimal = 5,
                              ratchHits = 25, ratchIter = 500,
                              bootstrapHits = 15, bootstrapIter = 450,
                              searchHits = 10, searchIter = 100)
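To illustrate what a suboptimality threshold means in practice, the following base-R sketch keeps only those trees whose score lies within 1.5 bits of the best score. The score vector is hypothetical (illustrative values, not real output), and the sketch assumes, as the search output earlier in this document suggests, that lower (more negative) scores are better.

```r
# Hypothetical scores for five trees (illustrative values, not real output)
scores <- c(-309.2, -308.9, -308.1, -307.5, -305.0)

# Amount of suboptimality to tolerate, in bits
threshold <- 1.5

# Assumption: lower (more negative) scores are better
best <- min(scores)
keep <- scores <= best + threshold

# Indices of the trees that fall within the threshold
print(which(keep))
```

If the scores were stored alongside a list of trees, the same logical vector could be used to subset that list before taking a consensus.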
The consensus of these slightly suboptimal trees provides a less resolved, but typically more reliable, summary of the signal within the phylogenetic dataset (Smith, 2019):
par(mar=rep(0.25, 4), cex=0.75)
plot(my.consensus <- ape::consensus(suboptimals))
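By default, ape::consensus returns a strict consensus (p = 1), which retains only the groupings present in every tree. As a sketch of how the choice of p affects resolution, the example below compares a strict and a majority-rule consensus. The random trees are a hypothetical stand-in for a set of search results, so the resulting topologies carry no biological meaning.

```r
set.seed(1)

# Hypothetical stand-in for a list of search results: ten random 8-tip trees
trees <- ape::rmtree(10, 8)

# Strict consensus: groupings present in every tree (p = 1 is the default)
strict <- ape::consensus(trees)

# Majority-rule consensus: groupings present in at least half of the trees
majority <- ape::consensus(trees, p = 0.5)

# Compare resolution by counting internal nodes
print(c(strict = ape::Nnode(strict), majority = ape::Nnode(majority)))
```

A strict consensus is the more conservative summary; whether a majority-rule consensus is appropriate depends on how the set of trees was sampled.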
Faith, D. P., & Trueman, J. W. H. (2001). Towards an inclusive philosophy for phylogenetic inference. Systematic Biology, 50(3), 331–350. doi:10.1080/10635150118627
Nixon, K. C. (1999). The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics, 15(4), 407–414. doi:10.1111/j.1096-0031.1999.tb00277.x
Smith, M. R. (2019). Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets. Biology Letters, 15(2), 20180632. doi:10.1098/rsbl.2018.0632