Prediction of workflow execution time using provenance traces: practical applications in medical data processing

Newcastle University author(s): Dr Hugo Hiden, Dr Simon Woodman, Professor Paul Watson



This is the authors' accepted manuscript of a conference proceedings (inc. abstract) that has been published in its final definitive form by Institute of Electrical and Electronics Engineers Inc., 2016.

For re-use rights please refer to the publisher's terms and conditions.


© 2016 IEEE. The use of cloud resources for processing and analysing medical data has the potential to revolutionise the treatment of a number of chronic conditions. For example, it has been shown that it is possible to manage conditions such as diabetes, obesity and cardiovascular disease by increasing the right forms of physical activity for the patient. Typically, movement data is collected for a patient over a period of several weeks using a wrist-worn accelerometer. This data, however, is large and its analysis can require significant computational resources. Cloud computing offers a convenient solution, as it can be paid for as needed and is capable of scaling to store and process large numbers of data sets simultaneously. However, because the charging model for the cloud represents, to some extent, an unknown cost and therefore a risk to project managers, it is important to have an estimate of the likely data processing and storage costs that will be required to analyse a set of data. This could take the form of data collected from a patient in clinic or of entire cohorts of data collected from large studies. If an accurate model were available that could predict the compute and storage requirements associated with a piece of analysis code, decisions could be made as to the scale of resources required in order to obtain results within a known timescale. This paper makes use of provenance and performance data collected as part of routine e-Science Central workflow executions to examine the feasibility of automatically generating predictive models for workflow execution times based solely on observed characteristics such as data volumes processed, algorithm settings and execution durations. The utility of this approach is demonstrated via a set of benchmarking examples before being used to model workflow executions performed as part of two large medical movement analysis studies.
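To illustrate the general idea of predicting execution time from observed characteristics, the following is a minimal sketch only, not the modelling method used in the paper. It assumes a single predictor (input data volume) and a simple ordinary least squares line. The function names `fit_linear` and `predict`, and the trace values, are hypothetical and chosen purely for illustration; they are not taken from e-Science Central or from the studies described above.

```python
from statistics import mean


def fit_linear(sizes, durations):
    """Ordinary least squares fit of: duration = intercept + slope * size."""
    mean_size, mean_duration = mean(sizes), mean(durations)
    sxx = sum((x - mean_size) ** 2 for x in sizes)
    sxy = sum((x - mean_size) * (y - mean_duration)
              for x, y in zip(sizes, durations))
    slope = sxy / sxx
    intercept = mean_duration - slope * mean_size
    return intercept, slope


def predict(model, size):
    """Predict an execution duration for a given input size."""
    intercept, slope = model
    return intercept + slope * size


# Hypothetical provenance records (illustrative values, not real measurements):
# (input data volume in MB, observed execution duration in seconds)
traces = [(10, 25.0), (20, 45.0), (40, 85.0), (80, 165.0)]

sizes = [size for size, _ in traces]
durations = [duration for _, duration in traces]

model = fit_linear(sizes, durations)
estimate = predict(model, 160)  # estimated duration for a 160 MB input
print(f"intercept={model[0]:.2f}s slope={model[1]:.2f}s/MB estimate={estimate:.1f}s")
```

A predicted duration of this kind could then be multiplied by an assumed per-second resource price to give a rough cost estimate. A realistic model would likely need further predictors, such as algorithm settings, and some measure of prediction uncertainty.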

Publication metadata

Author(s): Hiden HG, Woodman SJ, Watson P

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 2016 IEEE 12th International Conference on e-Science

Year of Conference: 2016

Pages: 21-30

Online publication date: 06/03/2017

Acceptance date: 01/08/2016

Date deposited: 14/09/2016

Publisher: Institute of Electrical and Electronics Engineers Inc.


DOI: 10.1109/eScience.2016.7870882

ISBN: 9781509042739