NU author(s): Dr Matthew Wade, Professor Thomas Curtis, Professor Russell Davenport
In the rapidly evolving domain of next generation sequencing and bioinformatics analysis, data generation is one aspect that is increasing at a concomitant rate. The burden associated with processing large amounts of sequencing data has emphasised the need to allocate sufficient computing resources to complete analyses in the shortest possible time with manageable and predictable costs. A novel method for predicting time to completion for a popular bioinformatics software (QIIME), was developed using key variables characteristic of the input data assumed to impact processing time. Multiple Linear Regression models were developed to determine run time for two denoising algorithms and a general bioinformatics pipeline. The models were able to accurately predict clock time for denoising sequences from a naturally assembled community dataset, but not an artificial community. Speedup and efficiency tests for AmpliconNoise also highlighted that caution was needed when allocating resources for parallel processing of data. Accurate modelling of computational processing time using easily measurable predictors can assist NGS analysts in determining resource requirements for bioinformatics software and pipelines. Whilst demonstrated on a specific group of scripts, the methodology can be extended to encompass other packages running on multiple architectures, either in parallel or sequentially.
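As a rough illustration of the approach the abstract describes, the sketch below fits a multiple linear regression of run time on two input-data predictors and computes parallel speedup and efficiency using their standard definitions (speedup = serial time / parallel time; efficiency = speedup / number of processors). It is not the authors' model: the predictors (number of sequences, mean sequence length), the function names, and the training data are all hypothetical, and the run times are synthetic values generated from a known linear rule purely so the fit can be checked.

```python
import numpy as np


def fit_runtime_model(X, y):
    """Ordinary least squares fit of y ~ b0 + b1*x1 + ... + bk*xk.

    Returns the coefficient vector with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_runtime(coef, X):
    """Predict run time for one or more rows of predictor values."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return coef[0] + X @ coef[1:]


def speedup(t_serial, t_parallel):
    """Standard parallel speedup: serial time divided by parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Standard parallel efficiency: speedup divided by processor count."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical predictors: [number of sequences, mean sequence length].
X_train = np.array([
    [1000, 250],
    [2000, 250],
    [1000, 400],
    [4000, 300],
    [3000, 350],
])

# Synthetic run times (arbitrary units) from a known linear rule,
# NOT measurements from QIIME or AmpliconNoise.
y_train = 5.0 + 0.01 * X_train[:, 0] + 0.02 * X_train[:, 1]

coef = fit_runtime_model(X_train, y_train)
predicted = predict_runtime(coef, [2000, 300])
```

With real benchmarks, `y_train` would hold measured clock times and the fit would carry residual error. An efficiency well below 1 as processors are added is the kind of signal that calls for the caution the abstract mentions when allocating parallel resources.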
Author(s): Wade MJ, Curtis TP, Davenport RJ
Publication type: Online Publication
Publication status: Published
Access Year: 2021
Acceptance date: 10/03/2015
Place Published: Ithaca NY
Access Date: 13 May
Type of Medium: E-print