Open Access ePrints

Architecting Holistic Fault Tolerance

Lookup NU author(s): Rem Gensh, Dr Ashur Rafiev, Professor Alexander Romanovsky, Dr Fei Xia, Professor Alex Yakovlev


Licence

This is the final published version of a conference proceedings paper (including abstract), published in its final definitive form by IEEE Computer Society in 2017.

For re-use rights please refer to the publisher's terms and conditions.


Abstract

The optimality and maintainability of fault tolerance mechanisms in a computer system have typically not been a major topic of concern, mostly because fault tolerance is a non-functional system requirement. This paper proposes a Holistic Fault Tolerance architecture based on centralised fault tolerance management, with the related functionality distributed across the entire system. A special crosscutting controller chooses the most suitable error detection and error recovery strategies for a given application, depending on error rates, system performance and resource utilisation requirements. We discuss the motivation for introducing this holistic fault tolerance architecture and reason about its benefits from the point of view of optimal system operation and improved maintainability. The advantages and possible implementation challenges of the proposed approach are demonstrated using a real-world application.
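The abstract describes a single crosscutting controller that selects error detection and recovery strategies from observed error rates and from performance and resource requirements. The following is one hypothetical way to express that selection step in Python; it is not the authors' implementation. The `Strategy`, `Requirements` and `HolisticFTController` types, the strategy names, every numeric value, the error-rate unit (errors per second), and the selection rule (fewest extra resources first, then lowest performance overhead) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Strategy:
    """Hypothetical error detection + recovery pairing with illustrative costs."""
    name: str
    max_error_rate: float  # assumed highest error rate (errors/s) it can cope with
    perf_overhead: float   # assumed fractional slowdown, e.g. 0.10 = 10%
    resource_cost: float   # assumed extra resource units (e.g. spare cores)


@dataclass(frozen=True)
class Requirements:
    """Hypothetical system-wide limits the controller must respect."""
    max_perf_overhead: float
    max_resource_cost: float


# Illustrative catalogue only; names and numbers are assumptions, not from the paper.
CATALOGUE = [
    Strategy("checksum+retry", max_error_rate=0.01, perf_overhead=0.02, resource_cost=0.0),
    Strategy("checkpoint+rollback", max_error_rate=0.1, perf_overhead=0.10, resource_cost=1.0),
    Strategy("tmr+voting", max_error_rate=1.0, perf_overhead=0.05, resource_cost=2.0),
]


class HolisticFTController:
    """One centralised controller choosing a strategy for the whole system."""

    def __init__(self, catalogue: Iterable[Strategy], requirements: Requirements):
        self.catalogue = list(catalogue)
        self.requirements = requirements

    def select(self, observed_error_rate: float) -> Optional[Strategy]:
        # Keep only strategies that tolerate the observed error rate and
        # stay within the performance and resource limits.
        candidates = [
            s for s in self.catalogue
            if s.max_error_rate >= observed_error_rate
            and s.perf_overhead <= self.requirements.max_perf_overhead
            and s.resource_cost <= self.requirements.max_resource_cost
        ]
        if not candidates:
            return None  # no strategy satisfies every constraint
        # Assumed rule: fewest extra resources, then lowest overhead, then name.
        return min(candidates, key=lambda s: (s.resource_cost, s.perf_overhead, s.name))


if __name__ == "__main__":
    ctrl = HolisticFTController(CATALOGUE, Requirements(max_perf_overhead=0.15, max_resource_cost=2.0))
    for rate in (0.005, 0.05, 0.5, 5.0):
        choice = ctrl.select(rate)
        print(rate, choice.name if choice else "no suitable strategy")
```

With these assumed numbers, a low error rate of 0.005 selects `checksum+retry`, a rate of 0.5 forces the resource-hungry `tmr+voting`, and a rate of 5.0 returns `None`, signalling that the requirements would have to be relaxed. Keeping the choice in one place, rather than hard-wiring a mechanism into each component, is what lets the strategy change as error rates change.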


Publication metadata

Author(s): Gensh R, Rafiev A, Romanovsky A, Garcia A, Xia F, Yakovlev A

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 18th IEEE International Symposium on High-Assurance Systems Engineering (HASE 2017)

Year of Conference: 2017

Pages: 5-8

Online publication date: 27/04/2017

Acceptance date: 02/04/2016

Date deposited: 27/06/2017

Publisher: IEEE Computer Society

URL: https://doi.org/10.1109/HASE.2017.13

DOI: 10.1109/HASE.2017.13

ISBN: 9781509046355
