
Why-Diff: Explaining differences amongst similar workflow runs by exploiting scientific metadata

Lookup NU author(s): Priyaa Thavasimani, Dr Jacek Cala, Professor Paolo Missier


Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


© 2017 IEEE. The majority of workflows executed nowadays need to process a massive amount of data. Re-execution of such data-intensive scientific workflows often results in different outputs. Scientific research progresses when discoveries are reproduced and verified. However, simply re-enacting a scientific computation, such as a workflow, does not guarantee the correctness of results because of unintentional changes that may have interfered with the re-enactment process. We investigate the hypothesis that the metadata of a workflow execution can be used to explain why the experimenter observes different results (cause analysis). Similarly, scientific metadata can be used to determine the impact of intentional variations that the experimenter may have injected into a new version of the workflow. We explore these two complementary cases using a simple algorithm for traversing two metadata traces in lock-step mode, which we illustrate through two human genomics data analysis workflows.

Publication metadata

Author(s): Thavasimani P, Cala J, Missier P

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: International Conference on Big Data

Year of Conference: 2017

Pages: 3031-3041

Online publication date: 15/01/2018

Acceptance date: 02/04/2016

Publisher: IEEE


DOI: 10.1109/BigData.2017.8258275


ISBN: 9781538627150