

Design and Performance Analysis of Fail-Signal Based Consensus Protocols for Byzantine Faults

NU author(s): Qurat-Ul-Ain Tariq

Abstract

Services offered by computing systems continue to play a crucial role in our everyday lives. This thesis examines and solves a challenging problem: making these services dependable by means that can be assured not to compromise service responsiveness, particularly when no failure occurs. Services become undependable because of faults, and faults of all known origins, including malicious attacks, are collectively referred to as Byzantine faults. Service replication, also known as state machine replication, is the only known technique for tolerating Byzantine faults. It becomes more effective when replicas are spread over a wide area network (WAN) such as the Internet, because this adds tolerance to localised disasters. Replication requires that replicas process randomly arriving user requests in an identical order. Meeting this requirement while also guaranteeing deterministic termination is impossible in a fail-prone environment. The impossibility arises because a bound on inter-replica communication delays over a WAN cannot be estimated accurately. Canonical protocols in the literature are therefore designed to delay termination until the actual WAN delays converge with the delay estimate in use. They thus risk degrading the performance of the replicated service.
We eliminate this risk by using fail-signal processes to circumvent the impossibility. A fail-signal (FS) process is made up of redundant, Byzantine-prone processes that continually check each other's performance. Consequently, an FS process fails only by crashing, and it signals its imminent failure. Using FS process constructs, a family of three order protocols was developed: Protocol-0, Protocol-I and Protocol-II. Each protocol caters for a particular set of assumptions about how FS processes are constructed and how they subsequently behave. Protocol-I is extensively compared with a canonical protocol of Castro and Liskov, which is widely acknowledged for its good performance.
The study comprehensively establishes the costs and benefits of our approach in a variety of real and emulated network settings, varying the number of replicas, the system load and the cryptographic techniques used. It shows that Protocol-I has superior performance when no failures occur.
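
The fail-signal idea described in the abstract can be illustrated with a minimal sketch. The sketch rests on several assumptions. An FS process is modelled as just two replicas of a deterministic state machine, here a running counter chosen only for illustration. Both replicas receive the same requests in the same order. The names `Replica`, `FailSignalPair` and `process` are hypothetical and do not come from the thesis.

The pair releases an output only when both replicas agree. On the first disagreement it records a fail-signal and then stays silent, so it fails only by crashing. The sketch compares output values only. It omits the timing checks and message authentication that a practical FS construction would likely need.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Replica:
    """Hypothetical deterministic state machine: a running counter."""
    state: int = 0
    # Optional corruption of the output, used to model a Byzantine fault.
    fault: Optional[Callable[[int], int]] = None

    def apply(self, request: int) -> int:
        self.state += request  # deterministic state transition
        out = self.state
        if self.fault is not None:
            out = self.fault(out)
        return out


@dataclass
class FailSignalPair:
    """Hypothetical FS process built from two mutually checking replicas."""
    a: Replica
    b: Replica
    crashed: bool = False
    signals: List[str] = field(default_factory=list)

    def process(self, request: int) -> Optional[int]:
        """Return the agreed output, or None once the pair has failed."""
        if self.crashed:
            return None  # crashed: no further output is produced
        out_a = self.a.apply(request)
        out_b = self.b.apply(request)
        if out_a != out_b:
            # Mismatch detected: emit a fail-signal, then behave as crashed.
            self.crashed = True
            self.signals.append(f"FAIL-SIGNAL at request {request}")
            return None
        return out_a


# Both replicas are correct, so every output is released.
healthy = FailSignalPair(Replica(), Replica())
healthy_out = [healthy.process(r) for r in [1, 2, 3]]

# Replica b starts corrupting outputs once its state exceeds 3.
faulty = FailSignalPair(Replica(), Replica(fault=lambda out: out + 1 if out > 3 else out))
faulty_out = [faulty.process(r) for r in [1, 2, 3, 4]]
print(healthy_out, faulty_out, faulty.signals)
```

Both replicas are deterministic and see the same request order, so a correct pair always agrees, and `healthy_out` is `[1, 3, 6]`. The injected fault in `faulty` surfaces at the third request. From then on the pair returns only `None`, and it has recorded a single fail-signal.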


Publication metadata

Author(s): Tariq QI

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2008

Pages: 203

Print publication date: 01/01/2008

Source Publication Date: January 2008

Report Number: 1065

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne

URL: http://www.cs.ncl.ac.uk/publications/trs/papers/1065.pdf

