
Is newer better?-evaluating the effects of data curation on integrated analyses in Saccharomyces cerevisiae

Lookup NU author(s): Dr Katherine James, Professor Anil Wipat, Dr Jennifer Hallinan



Recent high-throughput experiments have produced a wealth of heterogeneous datasets, each of which provides information about different aspects of the cell. Consequently, integration of diverse data types is essential in order to address many biological questions. The quality of any integrated analysis system depends on the quality of its component data and of the Gold Standard data used to evaluate it. It is commonly assumed that the quality of data improves as databases grow and change, particularly for manually curated databases. However, the validity of this assumption can be questioned, given the constant changes in the data coupled with the high level of noise associated with high-throughput experimental techniques. One of the most powerful approaches to data integration is the use of Probabilistic Functional Integrated Networks (PFINs). Here, we systematically analyse the changes in four highly curated and widely used online databases and evaluate the extent to which these changes affect the protein function prediction performance of PFINs in the yeast Saccharomyces cerevisiae. We find that, globally, network performance tends to improve over time. Where individual areas of biology are concerned, however, the most recent files do not always produce the best results. Individual datasets have unique biases towards different biological processes, and performance can be improved by selecting and integrating relevant datasets. When using any type of integrated system to answer a specific biological question, careful selection of raw data and Gold Standard is vital, since the most recent data may not be the most appropriate.

Publication metadata

Author(s): James K, Wipat A, Hallinan J

Publication type: Article

Publication status: Published

Journal: Integrative Biology

Year: 2012

Volume: 4

Issue: 7

Pages: 715-727

Print publication date: 23/04/2012

ISSN (print): 1757-9694

ISSN (electronic): 1757-9708

Publisher: Royal Society of Chemistry


DOI: 10.1039/c2ib00123c



Funder reference | Funder name
(none listed) | Newcastle University Centre for the Integrative Systems Biology of Ageing and Nutrition (CISBAN)
BB/F529038/1 | Biotechnology and Biological Sciences Research Council (BBSRC) Systems Approaches to Biological Research (SABR) initiative
BB/F006039/1 | Biotechnology and Biological Sciences Research Council (BBSRC) Systems Approaches to Biological Research (SABR) initiative