

A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant

NU author(s): Dr Naomi Hannaford, Dr Sarah Heaps, Dr Tom Nye, Professor Thomas Curtis, Dr Ben Allen



This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


© 2022 The Author(s)

Proper function of a wastewater treatment plant (WWTP) relies on maintaining a delicate balance between a multitude of competing microorganisms. Gaining a detailed understanding of the complex network of interactions therein is essential not only for maximising current operational efficiencies but also for the effective design of new treatment technologies. Metagenomics offers an insight into these dynamic systems through the analysis of the microbial DNA sequences present. Unique taxa are deduced through sequence clustering to form operational taxonomic units (OTUs), with per-taxon abundance estimates obtained from corresponding sequence counts. The data in this study comprise weekly OTU counts from an activated sludge (AS) tank of a WWTP along with corresponding measurements of chemical and environmental (CE) covariates. Directly fitting a model to the OTU data is incredibly challenging because of the high dimensionality and sparsity of the observations. The first step is therefore to aggregate the OTUs into twelve microbial communities or “bins” using a seasonal phase-based clustering approach. The mean abundances in the twelve bins are assumed to vary over time according to a multivariate linear regression on the CE covariates. Deviations from the mean are then modelled using a vector autoregressive (VAR) model of order one, which is a linear approximation to the commonly used generalised Lotka-Volterra (gLV) model. Sparsity is assumed in the interactions between microbial communities by carrying out inference in a hierarchical Bayesian framework which uses a shrinkage prior for the autoregressive coefficient matrix of the VAR model. Different shrinkage priors are explored by analysing simulated data sets before selecting the regularised horseshoe prior for the biological application. It is found that ammonia and chemical oxygen demand have a positive relationship with several bins and pH has a positive relationship with one bin. These results are supported by findings in the biological literature. Several negative interactions are also identified. These novel biological findings suggest OTUs in different bins may be competing for resources and that these relationships are complex. Although simpler than a gLV model, the VAR model is still able to offer valuable insight into the microbial dynamics of the WWTP.
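The model structure described in the abstract (a regression mean on the CE covariates plus VAR(1) deviations whose coefficient matrix has a shrinkage prior) can be illustrated with a small forward simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the number of covariates and weeks, the values of `tau`, `c` and `sigma`, and the rescaling step used to keep the process stationary are all hypothetical choices made for illustration. It draws a coefficient matrix from a regularised horseshoe prior in its standard form and simulates from it, whereas the paper carries out posterior inference.

```python
import numpy as np


def draw_regularised_horseshoe(shape, tau=0.05, c=1.0, rng=None):
    """Draw coefficients from a regularised horseshoe prior (standard form).

    tau is the global scale and c the slab scale; both values are
    illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = np.abs(rng.standard_cauchy(size=shape))  # half-Cauchy local scales
    # Regularised local scales: large lam values are capped by the slab c.
    lam_tilde_sq = (c**2 * lam**2) / (c**2 + tau**2 * lam**2)
    return rng.normal(size=shape) * tau * np.sqrt(lam_tilde_sq)


def simulate_var1(A, B, X, sigma=0.1, rng=None):
    """Simulate y_t = X_t B + d_t with d_t = A d_{t-1} + Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    T, K = X.shape[0], A.shape[0]
    mean = X @ B                  # regression mean, shape (T, K)
    dev = np.zeros((T, K))        # deviations from the mean, d_0 = 0
    for t in range(1, T):
        dev[t] = A @ dev[t - 1] + rng.normal(scale=sigma, size=K)
    return mean + dev, dev


rng = np.random.default_rng(0)
K, P, T = 12, 3, 104              # 12 bins (from the abstract); P and T assumed

A = draw_regularised_horseshoe((K, K), rng=rng)
# Assumed safeguard: shrink A if needed so the VAR(1) process is stationary.
rho = np.max(np.abs(np.linalg.eigvals(A)))
if rho >= 0.95:
    A *= 0.95 / rho

X = rng.normal(size=(T, P))             # stand-in for the CE covariates
B = rng.normal(scale=0.5, size=(P, K))  # stand-in regression coefficients
Y, dev = simulate_var1(A, B, X, rng=rng)
print(Y.shape)
```

Most entries of `A` drawn this way are close to zero, with a few larger ones, which is the kind of sparse interaction pattern between bins that the shrinkage prior is intended to encode.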

Publication metadata

Author(s): Hannaford NE, Heaps SE, Nye TMW, Curtis TP, Allen B, Golightly A, Wilkinson DJ

Publication type: Article

Publication status: Published

Journal: Computational Statistics and Data Analysis

Year: 2023

Volume: 179

Print publication date: 01/03/2023

Online publication date: 17/11/2022

Acceptance date: 17/10/2022

Date deposited: 05/12/2022

ISSN (print): 0167-9473

ISSN (electronic): 1872-7352

Publisher: Elsevier B.V.


DOI: 10.1016/j.csda.2022.107659




Funder reference / Funder name
Centre for Doctoral Training in Cloud Computing for Big Data (grant number EP/L015358/1)
EPSRC (grant number EP/N510129/1) via the Alan Turing Institute project “Streaming data modelling for real-time monitoring and forecasting”
Engineering and Physical Sciences Research Council (EPSRC)