State Restoration in Distributed Systems

NU author(s): Professor Brian Randell


Abstract

This paper concerns an important aspect of the problem of designing fault-tolerant distributed computing systems. The concepts involved in "backward error recovery", i.e. restoring a system, or some part of a system, to a previous state which it is hoped or believed preceded the occurrence of any existing errors, are formalised and generalised so as to apply to concurrent, e.g. distributed, systems. Since in distributed systems there may exist a great deal of independence between activities, the system can be restored to a state that could have existed rather than to a state that actually existed. The formalisation is based on the use of what we term "Occurrence Graphs" to represent the cause-effect relationships that exist between the events that occur when a system is operational, and to indicate existing possibilities for state restoration. A protocol is presented which could be used in each of the nodes in a distributed computing system in order to provide system recoverability even in the face of multiple faults.
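As an informal illustration of the idea in the abstract, the sketch below models cause-effect relationships between events as a directed graph and computes which events would have to be undone if one event is judged erroneous. It is not the paper's formalisation or protocol, whose details are not given in this record. The function names, the event labels (`p1`, `q2`, and so on) and the two-process example are all hypothetical, and the graph is only a simplified stand-in for an occurrence graph.

```python
from collections import defaultdict, deque


def events_to_undo(edges, faulty):
    """Return the faulty events plus every event that causally depends on them.

    `edges` is an iterable of (cause, effect) pairs. This is an assumed,
    simplified stand-in for an occurrence graph, not the paper's definition.
    """
    successors = defaultdict(set)
    for cause, effect in edges:
        successors[cause].add(effect)

    undone = set(faulty)
    queue = deque(faulty)
    while queue:
        event = queue.popleft()
        for nxt in successors[event]:
            if nxt not in undone:
                undone.add(nxt)
                queue.append(nxt)
    return undone


def restored_state(events, edges, faulty):
    """Return the events that survive restoration.

    The surviving set is closed under causes: no kept event depends on an
    undone one, so it describes a state that could have existed.
    """
    return set(events) - events_to_undo(edges, faulty)


# Hypothetical example: processes P and Q, with P's event p2 causing Q's q2
# (for instance, a message sent at p2 and received at q2).
EVENTS = ["p1", "p2", "p3", "q1", "q2", "q3"]
EDGES = [
    ("p1", "p2"), ("p2", "p3"),   # P's local order
    ("q1", "q2"), ("q2", "q3"),   # Q's local order
    ("p2", "q2"),                 # cross-process dependency
]

if __name__ == "__main__":
    print(sorted(restored_state(EVENTS, EDGES, ["p2"])))
    print(sorted(restored_state(EVENTS, EDGES, ["q2"])))
```

In this toy example, an error at `p2` forces both processes back, because `q2` depends on it. An error at `q2` leaves all of P's events in place, so the restored state pairs P's latest events with an earlier point in Q, a combination that need not ever have been the actual global state.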


Publication metadata

Author(s): Merlin PM, Randell B

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 8th International Conference on Fault-Tolerant Computing (FTCS)

Year of Conference: 1978

Pages: 129-134

Publisher: IEEE Computer Society Press

