Open Access ePrints

Sharing and performance optimization of reproducible workflows in the cloud

NU author(s): Rawaa Qasha, Dr Zhenyu Wen, Dr Jacek Cala, Professor Paul Watson


Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND).


Abstract

Scientific workflows play a vital role in modern science as they enable scientists to specify, share and reuse computational experiments. To maximise the benefits, workflows need to support the reproducibility of the experimental methods they capture. Reproducibility enables effective sharing, as scientists can re-execute experiments developed by others and quickly derive new or improved results. However, achieving reproducibility in practice is problematic: previous analyses highlight issues caused by uncontrolled changes in the input data, configuration parameters, workflow description and the software used to implement the workflow tasks. The resulting problems have become known as workflow decay.

In this paper we present a novel framework that addresses workflow decay by integrating system description, version control, container management and automated deployment techniques. We then introduce a set of performance optimization techniques that significantly reduce the runtime overheads caused by making workflows reproducible. The resulting system significantly improves the performance, repeatability and ability to share and re-use workflows by combining a method to uniquely identify task and workflow images with an automated image capture facility and a multi-level cache.

The system is evaluated through an extensive set of experiments that validate the approach and highlight the key benefits of the proposed optimisations. These include methods that reduce the runtime of workflows by up to an order of magnitude in cases where workflows are enacted concurrently on the same host VM and in different Clouds, and where they share tasks.
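The abstract combines two ideas: giving each task image a unique identifier, and looking images up through a multi-level cache before capturing a new one. The sketch below illustrates how those ideas can fit together. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an image identifier can be derived by hashing a canonical (key-sorted) JSON form of the task description, and that cache levels are ordered from fastest (the host VM) to slowest (a shared repository). The names `image_id`, `MultiLevelImageCache` and the `build` callback are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple


def image_id(task_spec: dict) -> str:
    """Hypothetical content-based identifier for a task image.

    Serialising with sorted keys makes the identifier independent of key
    order, so equivalent task descriptions map to the same image.
    """
    canonical = json.dumps(task_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MultiLevelImageCache:
    """Hypothetical cache; levels are checked from fastest to slowest."""

    def __init__(self, levels: List[Tuple[str, Dict[str, str]]],
                 build: Callable[[dict], str]):
        self.levels = levels  # e.g. [("host", {...}), ("repository", {...})]
        self.build = build    # assumed "image capture" step, run only on a full miss

    def get(self, task_spec: dict) -> Tuple[str, str]:
        key = image_id(task_spec)
        for i, (name, store) in enumerate(self.levels):
            if key in store:
                image = store[key]
                # Promote the hit into every faster level so later lookups stop earlier.
                for _, faster in self.levels[:i]:
                    faster[key] = image
                return image, name
        # Miss at every level: capture the image once and populate all levels.
        image = self.build(task_spec)
        for _, store in self.levels:
            store[key] = image
        return image, "built"


if __name__ == "__main__":
    host, repo = {}, {}
    cache = MultiLevelImageCache([("host", host), ("repository", repo)],
                                 build=lambda spec: "image-for-" + spec["task"])
    print(cache.get({"task": "align", "tool": "bwa"}))  # ('image-for-align', 'built')
    print(cache.get({"tool": "bwa", "task": "align"}))  # ('image-for-align', 'host')
```

In this sketch, a second workflow that shares a task with the first reuses the image from the host level instead of rebuilding it. A fresh VM with an empty host level would still find the image in the shared repository and promote it locally. This reuse is the kind of saving the abstract attributes to concurrent and task-sharing enactments.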


Publication metadata

Author(s): Qasha R, Wen Z, Cala J, Watson P

Publication type: Article

Publication status: Published

Journal: Future Generation Computer Systems

Year: 2019

Volume: 98

Pages: 487-502

Print publication date: 01/09/2019

Online publication date: 03/04/2019

Acceptance date: 24/03/2019

Date deposited: 06/04/2019

ISSN (print): 0167-739X

ISSN (electronic): 1872-7115

Publisher: Elsevier BV

URL: https://doi.org/10.1016/j.future.2019.03.045

DOI: 10.1016/j.future.2019.03.045
