Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference

NU author(s): Dr Naomi Hannaford, Dr Sarah Heaps, Dr Tom Nye, Emeritus Professor T. Martin Embley FMedSci FRS

Licence

This is the authors' accepted manuscript of an article that has been published in its final definitive form by Institute of Mathematical Statistics, 2020.

For re-use rights please refer to the publisher's terms and conditions.


Abstract

Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree's root position. This hampers inference because a tree's biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is non-stationary, with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a non-reversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its sub-trees could have arisen from the same family of non-stationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which non-reversible but stationary, and non-stationary but reversible models cannot identify a plausible root.
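
The abstract's point about root invariance can be illustrated with a toy calculation. The sketch below is not the model or software from the article: it assumes a two-taxon tree, a simple reversible F81-style rate matrix, and hypothetical helper names (`expm`, `f81_rate_matrix`, `pair_likelihood`) chosen only for this example. It computes the likelihood of observing states x and y at the two tips when the root sits at distance t1 from one tip and t2 from the other. When the root distribution equals the stationary distribution of a reversible process, moving the root along the branch leaves the likelihood unchanged; starting the same process from a non-stationary root distribution makes the likelihood depend on the root position.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[math.fsum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, terms=30, squarings=10):
    """Approximate exp(q * t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def f81_rate_matrix(pi):
    """Reversible rate matrix with off-diagonal Q[i][j] = pi[j]; rows sum to zero."""
    n = len(pi)
    return [[pi[j] if i != j else pi[j] - 1.0 for j in range(n)] for i in range(n)]


def pair_likelihood(root_dist, q, x, y, t1, t2):
    """Likelihood of tip states (x, y) on a two-taxon tree rooted t1 from x and t2 from y."""
    p1 = expm(q, t1)
    p2 = expm(q, t2)
    return math.fsum(root_dist[r] * p1[r][x] * p2[r][y] for r in range(len(q)))


PI = [0.1, 0.2, 0.3, 0.4]           # assumed stationary distribution (illustrative values)
Q = f81_rate_matrix(PI)
UNIFORM = [0.25, 0.25, 0.25, 0.25]  # a root distribution that is not stationary for Q

# Same total branch length (1.0), two different root positions.
rev_a = pair_likelihood(PI, Q, 0, 1, 0.1, 0.9)
rev_b = pair_likelihood(PI, Q, 0, 1, 0.9, 0.1)
non_a = pair_likelihood(UNIFORM, Q, 0, 1, 0.1, 0.9)
non_b = pair_likelihood(UNIFORM, Q, 0, 1, 0.9, 0.1)

print("stationary, reversible:", rev_a, rev_b)
print("non-stationary root:   ", non_a, non_b)
```

In this toy setting the first pair of values agrees to numerical precision while the second pair does not, which is the sense in which a non-stationary likelihood can carry information about where a tree is rooted. The article's model differs in that the rate matrix itself changes at each speciation event and is drawn from a non-reversible Lie Markov family.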


Publication metadata

Author(s): Hannaford NE, Heaps SE, Nye TMW, Williams TA, Embley TM

Publication type: Article

Publication status: Published

Journal: The Annals of Applied Statistics

Year: 2020

Volume: 14

Issue: 4

Pages: 1964-1983

Online publication date: 19/12/2020

Acceptance date: 02/07/2020

Date deposited: 17/07/2020

ISSN (print): 1932-6157

ISSN (electronic): 1941-7330

Publisher: Institute of Mathematical Statistics

URL: https://doi.org/10.1214/20-AOAS1369

DOI: 10.1214/20-AOAS1369



Funding

Funder reference: EP/L015358/1

Funder name: EPSRC
