

Golden-Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository

NU author(s): Professor Paolo Missier



Experimental science can be thought of as the exploration of a large research space in search of a few valuable results. While it is this "Golden Data" that gets published, the history of the exploration is often as valuable to the scientists as some of its outcomes. We envision an e-research infrastructure that is capable of systematically and automatically recording such history, an assumption that already holds today for a number of workflow management systems routinely used in e-science. In keeping with our gold-rush metaphor, the provenance of a valuable result is a Golden Trail: logically, it is a detailed account of how the Golden Data was arrived at; technically, it is a sub-graph of the much larger graph of provenance traces that collectively tell the story of the entire research effort (or of part of it).

In this paper we describe a model and architecture for a repository dedicated to storing provenance traces and selectively retrieving Golden Trails from it. Because traces from multiple experiments over long periods of time are accommodated, a trail may be a sub-graph of a single trace, or it may be the logical representation of a virtual experiment obtained by joining together traces that share common data.

The project has been carried out within the Provenance Working Group of the Data Observation Network for Earth (DataONE) NSF project. Our longer-term plan is to integrate the provenance repository into the data preservation architecture currently being developed by DataONE.
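The two retrieval cases described in the abstract (a trail inside one trace, and a trail across several traces joined on shared data) can be illustrated as a graph traversal. This is a minimal sketch under stated assumptions, not the repository's actual model or query mechanism: it assumes, hypothetically, that each trace is a list of (effect, cause) dependency edges between data and process identifiers, and that traces sharing a data item use the same identifier for it. The function names `merge_traces` and `golden_trail` are illustrative only. Merging the traces joins them on shared nodes; a breadth-first walk backwards from the Golden Data item then collects the edges of its trail.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Iterable, Set, Tuple

# Hypothetical trace format: (effect, cause) edges. ("d2", "p1") reads
# "d2 was generated by process p1"; ("p1", "d1") reads "p1 used d1".
Edge = Tuple[str, str]


def merge_traces(*traces: Iterable[Edge]) -> DefaultDict[str, Set[str]]:
    """Union several traces into one dependency map.

    Nodes with the same identifier (shared data) become a single node,
    which joins the traces into one 'virtual experiment' graph.
    """
    deps: DefaultDict[str, Set[str]] = defaultdict(set)
    for trace in traces:
        for effect, cause in trace:
            deps[effect].add(cause)
    return deps


def golden_trail(deps: DefaultDict[str, Set[str]], golden: str) -> Set[Edge]:
    """Return the edges of the sub-graph that `golden` transitively depends on."""
    trail: Set[Edge] = set()
    seen = {golden}
    queue = deque([golden])
    while queue:
        node = queue.popleft()
        # .get() avoids inserting empty entries into the defaultdict.
        for cause in deps.get(node, ()):
            trail.add((node, cause))
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return trail


# Two hypothetical experiments; exp2 reuses data item d2 produced by exp1.
exp1 = [("d2", "p1"), ("p1", "d1"), ("x", "q"), ("q", "d1")]
exp2 = [("d3", "p2"), ("p2", "d2"), ("p2", "d5")]

trail = golden_trail(merge_traces(exp1, exp2), "d3")
```

Here the trail for `d3` crosses from `exp2` into `exp1` through the shared item `d2`, while the unrelated branch that produced `x` is left out, even though it shares the input `d1`. A backward walk is used because a Golden Trail, as described above, is the history leading to a result, not everything derived from its inputs.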

Publication metadata

Author(s): Missier P, Ludäscher B, Dey S, Wang M, McPhillips T, Bowers S, Agun M, Altintas I

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2011

Pages: 14

Print publication date: 01/11/2011

Source Publication Date: November 2011

Report Number: 1300

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne