NU author(s): Emeritus Professor Alexander Romanovsky
As building trustworthy (dependable) systems is one of the major challenges facing software developers, dealing with threats such as errors, faults and failures is becoming one of the main foci of software and systems research and development. At the core of ensuring system dependability is the acceptance that errors will always happen, in spite of all efforts to eliminate faults in the system, its components and its environment. To this end, researchers have developed various fault tolerance mechanisms, and industry has adopted them. Unfortunately, more often than not these solutions ignore the earlier development phases (most importantly, architecture design) and focus exclusively on implementation. This creates a dangerous gap between the requirement to build dependable (and fault tolerant) systems and the failure to address these issues before the implementation step. Software Architecture (SA) has been widely accepted as a way to achieve better software quality while reducing the time and cost of production. It provides both a high-level behavioural abstraction of components and their interactions (connectors) and a description of the static structure of the system. Typical SA specifications model only the normal behaviour of the system and ignore abnormal behaviour, so the faults and errors the system will face may cause it to fail in unexpected ways. Recently, however, several approaches have been developed that break this pattern by explicitly considering abnormal system behaviour and handling errors to prevent system failures. The aim of this paper is to survey the existing approaches to architecting fault tolerant systems, allowing readers to gain a better understanding of the state of the art in this emerging research area.
This survey is built on a two-dimensional classification of the existing solutions: the first dimension is based on traditional software engineering characteristics, while the second uses fault tolerance related parameters. The paper provides a unified view of the area, analyses the major trends and identifies possible directions for future research.
Author(s): Muccini H, Romanovsky A
Publication type: Report
Publication status: Published
Series Title: School of Computing Science Technical Report Series
Year: 2007
Pages: 70
Print publication date: 01/09/2007
Source Publication Date: September 2007
Report Number: 1051
Institution: School of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/1051.pdf