

Scalable and Efficient Whole-exome Data Processing Using Workflows on the Cloud

Newcastle University author(s): Dr Jacek Cala, Eyad Marei, Dr Yaobo Xu, Professor Paolo Missier



This is the final published version of a report, published in its definitive form by the School of Computing Science, University of Newcastle upon Tyne, in 2016.

For re-use rights, please refer to the publisher's terms and conditions.


Dataflow-style workflows offer a simple, high-level programming model for flexible prototyping of scientific applications, and are an attractive alternative to low-level scripting. At the same time, workflow management systems (WfMS) can support data parallelism over big datasets by providing scalable, distributed deployment and execution of the workflow on a cloud infrastructure. In theory, the combination of these properties makes workflows a natural choice for implementing Big Data processing pipelines, which are common, for instance, in bioinformatics. In practice, however, correctly designing workflows for parallel Big Data problems can be complex and very time-consuming.
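To illustrate the data-parallel dataflow model mentioned in the abstract, here is a minimal sketch. It assumes that each workflow block maps one input item to one output item, and that the WfMS runs the same chain of blocks independently for each input sample. The stage names `align` and `call_variants` are hypothetical placeholders, not the report's actual pipeline. A local thread pool stands in for the distributed cloud execution that a real WfMS would provide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical stage type: a workflow block mapping one item to one item.
Stage = Callable[[str], str]


def run_pipeline(stages: List[Stage], item: str) -> str:
    """Apply each stage in order to a single item (one path through the dataflow)."""
    for stage in stages:
        item = stage(item)
    return item


def run_data_parallel(stages: List[Stage], items: Iterable[str], workers: int = 4) -> List[str]:
    """Run the same pipeline independently over every item, in parallel.

    Executor.map returns results in input order, so output i corresponds to input i.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: run_pipeline(stages, x), items))


# Hypothetical placeholder stages standing in for real tools.
def align(sample: str) -> str:
    return f"{sample}.bam"


def call_variants(bam: str) -> str:
    return bam.replace(".bam", ".vcf")


if __name__ == "__main__":
    samples = ["S1", "S2", "S3"]
    print(run_data_parallel([align, call_variants], samples))
    # prints ['S1.vcf', 'S2.vcf', 'S3.vcf']
```

Because each sample flows through the stages independently, parallelism comes from the structure of the data rather than from changes to the stages themselves. The abstract notes that getting this decomposition right for real Big Data pipelines is considerably harder than this toy case suggests.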

Publication metadata

Author(s): Cala J, Marei E, Xu Y, Takeda K, Missier P

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2016

Pages: 25

Print publication date: 01/01/2016

Acceptance date: 01/01/2016

Report Number: 1491

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne