Prediction of workflow execution time using provenance traces: practical applications in medical data processing

NU author(s): Dr Hugo Hiden, Dr Simon Woodman, Professor Paul Watson



The use of cloud resources for processing and analysing medical data has the potential to revolutionise the treatment of a number of chronic conditions. For example, it has been shown that conditions such as diabetes, obesity and cardiovascular disease can be managed by increasing the right forms of physical activity for the patient. Typically, movement data is collected for a patient over a period of several weeks using a wrist-worn accelerometer. This data, however, is large and its analysis can require significant computational resources. Cloud computing offers a convenient solution as it can be paid for as needed and is capable of scaling to store and process large numbers of data sets simultaneously. However, because the charging model for the cloud represents, to some extent, an unknown cost and therefore a risk to project managers, it is important to have an estimate of the likely data processing and storage costs that will be required to analyse a set of data. Such a set could be data collected from a patient in clinic or entire cohorts of data collected from large studies. If an accurate model were available that could predict the compute and storage requirements associated with a piece of analysis code, decisions could be made as to the scale of resources required in order to obtain results within a known timescale. This paper makes use of provenance and performance data collected as part of routine e-Science Central workflow executions to examine the feasibility of automatically generating predictive models for workflow execution times based solely on observed characteristics such as data volumes processed, algorithm settings and execution durations. The utility of this approach is demonstrated via a set of benchmarking examples before it is used to model workflow executions performed as part of two large medical movement analysis studies.

Publication metadata

Author(s): Hiden H, Woodman S, Watson P

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2016

Pages: 10

Print publication date: 14/09/2016

Acceptance date: 14/09/2016

Report Number: 1499

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne