NU author(s): Professor Pete Lee
Full text is not currently available for this publication.
This paper focuses on the problem of fault tolerance in shared memory multiprocessors, and describes an architecture designed for transparently tolerating processor failures. The Recoverable Shared Memory (RSM) is the novel component of this architecture, providing a hardware-supported backward error recovery mechanism which minimises the propagation of recovery when a processor fails. The RSM permits a shared memory multiprocessor to be constructed using standard caches and cache coherence protocols, and does not require any changes to be made to applications software. A prototype design for the RSM is also described. The performance of the recovery scheme supported by the RSM is evaluated and compared with other schemes that have been proposed for fault-tolerant shared memory multiprocessors. The performance study has been conducted by simulation using address traces collected from real parallel applications.
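The abstract does not describe the RSM's internals, so the following is only a hedged illustration of the general idea it names: backward error recovery in shared memory, where a processor failure should force as few other processors as possible to roll back. The Python sketch models a shared memory that keeps an undo log since the last recovery point and records which processors have used another processor's uncommitted data. When a processor fails, only it and its transitive dependents are rolled back. Everything here (the class `ToyRecoverableMemory`, the undo-log approach, and treating overwrites of uncommitted data as dependencies) is an assumption for illustration, not the RSM's hardware mechanism. Processor register state is not modelled.

```python
from collections import defaultdict


class ToyRecoverableMemory:
    """Hypothetical software model of backward error recovery in shared memory.

    Generic undo-log scheme with inter-processor dependency tracking;
    NOT the RSM's hardware design. Processor register state is not modelled.
    """

    def __init__(self):
        self.mem = {}                       # address -> current value
        self.undo_log = []                  # (cpu, addr, old value or None) since last recovery point
        self.last_writer = {}               # address -> cpu holding an uncommitted write
        self.depends_on = defaultdict(set)  # cpu -> cpus whose uncommitted data it has used

    def _note_dependency(self, cpu, addr):
        writer = self.last_writer.get(addr)
        if writer is not None and writer != cpu:
            self.depends_on[cpu].add(writer)

    def read(self, cpu, addr):
        self._note_dependency(cpu, addr)
        return self.mem.get(addr, 0)

    def write(self, cpu, addr, value):
        # Overwriting another CPU's uncommitted value also counts as a dependency,
        # so that undoing log entries newest-first stays consistent.
        self._note_dependency(cpu, addr)
        self.undo_log.append((cpu, addr, self.mem.get(addr)))
        self.mem[addr] = value
        self.last_writer[addr] = cpu

    def establish_recovery_point(self):
        # Commit every update made so far: drop undo information and dependencies.
        self.undo_log.clear()
        self.last_writer.clear()
        self.depends_on.clear()

    def rollback_set(self, failed_cpu):
        # The failed CPU plus every CPU that transitively used its uncommitted data.
        affected = {failed_cpu}
        changed = True
        while changed:
            changed = False
            for cpu, deps in list(self.depends_on.items()):
                if cpu not in affected and deps & affected:
                    affected.add(cpu)
                    changed = True
        return affected

    def recover(self, failed_cpu):
        affected = self.rollback_set(failed_cpu)
        kept = []
        # Undo the affected CPUs' writes, newest first; keep everyone else's.
        for cpu, addr, old in reversed(self.undo_log):
            if cpu in affected:
                if old is None:
                    self.mem.pop(addr, None)
                else:
                    self.mem[addr] = old
            else:
                kept.append((cpu, addr, old))
        self.undo_log = kept[::-1]
        # Rebuild bookkeeping from the surviving (still uncommitted) writes.
        self.last_writer = {addr: cpu for cpu, addr, _ in self.undo_log}
        for cpu in affected:
            self.depends_on.pop(cpu, None)
        return affected


if __name__ == "__main__":
    m = ToyRecoverableMemory()
    m.write(0, 100, 1)
    m.establish_recovery_point()   # address 100 = 1 is now committed
    m.write(1, 200, 7)             # CPU 1 makes an uncommitted write
    print(m.read(2, 200))          # 7; CPU 2 now depends on CPU 1
    m.write(3, 100, 9)             # CPU 3 does not depend on CPU 1
    print(sorted(m.recover(1)))    # [1, 2]; CPU 3 is not rolled back
    print(m.mem)                   # {100: 9}
```

In this toy model, CPU 3's work survives CPU 1's failure because it never used CPU 1's uncommitted data, which is the sense in which recovery propagation is limited. An undo log is just one way to hold recovery data; counting write-after-write as a dependency is a conservative simplification chosen here so that newest-first undo is always safe.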
Author(s): Banatre M, Gefflaut A, Joubert P, Morin C, Lee PA
Publication type: Report
Publication status: Published
Series Title: Department of Computing Science Technical Report Series
Print publication date: 01/01/1994
Source Publication Date: 1994
Report Number: 485
Institution: Department of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne