Lookup NU author(s): Professor Brian Randell
This paper concerns an important aspect of the problem of designing fault-tolerant distributed computing systems. The concepts involved in "backward error recovery", i.e. restoring a system, or some part of a system, to a previous state which it is hoped or believed preceded the occurrence of any existing errors, are formalised and generalised so as to apply to concurrent, e.g. distributed, systems. The formalisation is based on the use of what we term "Occurrence Graphs" to represent the cause-effect relationships that exist between the events that occur when a system is operational, and to indicate existing possibilities for state restoration. A protocol is presented which could be used in each of the nodes in a distributed computing system in order to provide system recoverability even in the face of multiple faults. This presentation includes a proof of the protocol's correctness.
Author(s): Merlin PM, Randell B
Publication type: Report
Publication status: Published
Series Title: Computing Laboratory Technical Report Series
Year: 1977
Pages: 46
Report Number: 113
Institution: Computing Laboratory, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/113.pdf