Modelling Computational Resources for Next Generation Sequencing Bioinformatics Analysis of 16S rRNA Samples

Wade, MJ; Curtis, TP; Davenport, RJ

Newcastle University author(s): Dr Matthew Wade, Professor Thomas Curtis, Professor Russell Davenport

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.

Abstract

In the rapidly evolving domain of next generation sequencing (NGS) and bioinformatics analysis, data generation is one aspect that is increasing at a concomitant rate. The burden associated with processing large amounts of sequencing data has emphasised the need to allocate sufficient computing resources to complete analyses in the shortest possible time with manageable and predictable costs. A novel method for predicting time to completion for a popular bioinformatics software package (QIIME) was developed using key variables characteristic of the input data that were assumed to impact processing time. Multiple linear regression models were developed to predict run time for two denoising algorithms and a general bioinformatics pipeline. The models accurately predicted clock time for denoising sequences from a naturally assembled community dataset, but not from an artificial community. Speedup and efficiency tests for AmpliconNoise also highlighted that caution was needed when allocating resources for parallel processing of data. Accurate modelling of computational processing time using easily measurable predictors can assist NGS analysts in determining resource requirements for bioinformatics software and pipelines. Whilst demonstrated on a specific group of scripts, the methodology can be extended to encompass other packages running on multiple architectures, either in parallel or sequentially.
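The abstract names two techniques without detail: a multiple linear regression model that predicts run time from measurable input characteristics, and speedup/efficiency tests for parallel runs. The following is a minimal sketch of both, not the authors' code. It assumes hypothetical predictors (reads in thousands and number of samples), plain ordinary least squares via the normal equations, and the standard definitions speedup S_p = T_1 / T_p and efficiency E_p = S_p / p. The function names are hypothetical and are not part of QIIME or AmpliconNoise, and the example data are synthetic.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares: returns [b0, b1, ..., bk] for y ~ b0 + sum(bi * xi)."""
    # Prepend a column of ones so that b0 acts as the intercept.
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Build the normal equations (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Solve by Gaussian elimination with partial pivoting on the augmented matrix.
    a = [row[:] + [v] for row, v in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few runs")
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution yields the regression coefficients.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def predict(b: Sequence[float], x: Sequence[float]) -> float:
    """Predicted run time for one input described by predictor values x."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_1 / T_p: how many times faster the p-process run is."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """E_p = S_p / p: fraction of ideal linear speedup actually achieved."""
    return speedup(t_serial, t_parallel) / p


if __name__ == "__main__":
    # Synthetic, noise-free data for illustration only (not results from the paper):
    # run_time = 2 + 0.5 * reads_in_thousands + 3 * n_samples
    X = [[10, 1], [20, 2], [40, 1], [80, 4], [160, 3]]
    y = [2 + 0.5 * r + 3 * s for r, s in X]
    coeffs = fit_mlr(X, y)
    print([round(c, 6) for c in coeffs])  # recovers approximately [2.0, 0.5, 3.0]
    print(speedup(120.0, 40.0), efficiency(120.0, 40.0, 4))  # 3.0 0.75
```

An efficiency well below 1 (here 0.75 on 4 processes) signals that adding cores yields diminishing returns. This is the kind of result that motivates caution when allocating resources for parallel denoising runs.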

Publication metadata

Author(s): Wade MJ, Curtis TP, Davenport RJ

Publication type: Online Publication

Publication status: Published

Year: 2015

Acceptance date: 10 March 2015

Publisher: arXiv

Place Published: Ithaca NY

Access Date: 13 May 2021

Type of Medium: E-print

URL: http://arxiv.org/abs/1503.02974
