Implementing fail-silent nodes for distributed systems

Brasileiro, FV; Ezhilchelvan, PD; Shrivastava, SK; Speirs, NA; Tao, S

doi:10.1109/12.544479

NU author(s): Dr Paul Ezhilchelvan, Emeritus Professor Santosh Shrivastava, Dr Neil Speirs

Abstract

A fail-silent node is a self-checking node that either functions correctly or stops functioning after an internal failure is detected. Such a node can be constructed from a number of conventional processors. In a software-implemented fail-silent node, the nonfaulty processors of the node need to execute message order and comparison protocols to "keep in step" and check each other, respectively. In this paper, the design and implementation of efficient protocols for a two-processor fail-silent node are described in detail. The performance figures obtained indicate that in a wide class of applications requiring a high degree of fault-tolerance, software-implemented fail-silent nodes constructed simply by utilizing standard "off-the-shelf" components are an attractive alternative to their hardware-implemented counterparts that do require special-purpose hardware components, such as fault-tolerant clocks, comparator, and bus interface circuits. ©1996 IEEE.
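
The fail-silent behaviour described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the protocols from the paper: the two "processors" are plain Python functions run in one process, messages are integers already delivered in an agreed order, and the names FailSilentNode, deliver, double and faulty_double are hypothetical. It shows just the core idea that an output is released only when both replicas agree, and that the node stops producing output once a disagreement is detected.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FailSilentNode:
    """Toy two-replica node: releases an output only if both replicas agree."""
    replica_a: Callable[[int], int]
    replica_b: Callable[[int], int]
    silent: bool = False
    outputs: List[int] = field(default_factory=list)

    def deliver(self, message: int) -> Optional[int]:
        # Once a disagreement has been detected, the node stays silent.
        if self.silent:
            return None
        # Assumption: both replicas see the same messages in the same order,
        # which is what an order protocol would have to guarantee.
        result_a = self.replica_a(message)
        result_b = self.replica_b(message)
        # Comparison step: a mismatch is treated as an internal failure.
        if result_a != result_b:
            self.silent = True
            return None
        self.outputs.append(result_a)
        return result_a


def double(x: int) -> int:
    """Correct replica (hypothetical workload)."""
    return 2 * x


def faulty_double(x: int) -> int:
    """Hypothetical faulty replica: wrong result for the input 3 only."""
    return 2 * x + 1 if x == 3 else 2 * x


if __name__ == "__main__":
    node = FailSilentNode(double, faulty_double)
    for m in [1, 2, 3, 4]:
        print(m, node.deliver(m))
```

In this sketch the node answers normally for the first two messages, detects the mismatch on the third, and returns nothing from then on, even for inputs on which the replicas would have agreed. A real two-processor node would additionally need the replicas to exchange and check results over a link, which this model leaves out.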

Publication metadata

Author(s): Brasileiro FV, Ezhilchelvan PD, Shrivastava SK, Speirs NA, Tao S

Publication type: Article

Publication status: Published

Journal: IEEE Transactions on Computers

Year: 1996

Volume: 45

Issue: 11

Pages: 1226-1238

Print publication date: 01/01/1996

Date deposited: 07/02/2011

ISSN (print): 0018-9340

ISSN (electronic): 1557-9956

Publisher: IEEE

URL: http://dx.doi.org/10.1109/12.544479

DOI: 10.1109/12.544479
