
Architecting Fault Tolerant Systems

Lookup NU author(s): Professor Alexander Romanovsky



Building trustworthy (dependable) systems is one of the major challenges faced by software developers, and dealing with threats such as faults, errors and failures is becoming one of the main foci of software and system research and development. At the core of ensuring system dependability is acceptance of the fact that errors will always occur in spite of all efforts to eliminate faults in the system, its components and its environment. To this end, various fault tolerance mechanisms have been developed by researchers and used in industry. Unfortunately, more often than not these solutions ignore the earlier development phases, most importantly architecture design, and focus exclusively on the implementation. This creates a dangerous gap between the requirement to build dependable (and fault tolerant) systems and the failure to deal with these issues until the implementation step. Software Architecture (SA) has been widely accepted as a way to achieve better software quality while reducing the time and cost of production. It provides both a high-level behavioural abstraction of components and their interactions (connectors) and a description of the static structure of the system. Typical SA specifications model only the normal behaviour of the system and ignore abnormal behaviour, so that the faults and errors the system faces may cause it to fail in unexpected ways. Recently, however, several approaches have been developed that break this pattern by specifically considering abnormal system behaviour and dealing with errors to prevent system failures. The aim of this paper is to survey the existing approaches to architecting fault tolerant systems, allowing readers to gain a better understanding of state-of-the-art research in this emerging area.
The survey is built on a two-dimensional classification of the existing solutions: the first dimension is based on traditional software engineering characteristics, while the second uses fault tolerance related parameters. The paper provides a unified view of the area, analyses the major trends and identifies possible directions for future research.

Publication metadata

Author(s): Muccini H, Romanovsky A

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2007

Pages: 70

Print publication date: 01/09/2007

Source Publication Date: September 2007

Report Number: 1051

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne