Lookup NU author(s): Professor Brian Randell
We discuss a general approach to the design of fault-tolerant computing systems, concentrating on issues of system structuring rather than on the design of particular algorithms. Three forms of structuring are described. The first is based on the use of what we term "idealised fault-tolerant components". Such components provide a means of system structuring which makes it easy to identify what parts of a system have what responsibilities for trying to cope with what sorts of faults. The second is a "recursive structuring" scheme. It involves using complete computers as the basic idealised fault-tolerant components of a distributed computing system whose functionality matches that of its component computers. Finally we discuss a generalisation of the usual concepts of an "atomic action", which provides a means of structuring both forward and backward error recovery in distributed systems. These discussions are given in general terms, and also illustrated by brief accounts of recent and current work at Newcastle on the construction of UNIX-based fault-tolerant and distributed systems.
Author(s): Randell B
Publication type: Report
Publication status: Published
Series Title: Computing Laboratory Technical Report Series
Year: 1983
Pages: 24
Report Number: 189
Institution: Computing Laboratory, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/189.pdf