Lookup NU author(s): Dr Neil Speirs
In this paper, two software-based architectures for providing fail-silent processes, Voltan and Chameleon ARMORs, are analyzed using fault injection. The goal is to compare the fail-silence coverage provided by the internal error detection techniques in Chameleon ARMORs with an ideal case of full duplication provided by Voltan. Rather than providing fault tolerance through redundant customized hardware, Voltan and Chameleon take the alternative approach of providing fail-silence in software using "off-the-shelf" hardware components. Voltan uses duplicated processes to provide the abstraction of a fail-silent node running on a conventional processor. Chameleon supports a range of execution modes including replication and a variety of error detection techniques to provide node and process fail-silence. The goal of this study is to compare only the self-checking features of Chameleon ARMORs (i.e., ARMORs provided with the internal detection techniques) with full duplication in Voltan. The paper presents results from three different injection campaigns with two applications: Fast Fourier Transform and radix sort. The first campaign, which exercised the specific detection techniques in each system, yielded a fail-silence coverage of 100% for Voltan and 99.5% for Chameleon ARMORs. The second campaign, where injection was done to areas not directly protected by any detection technique, gave a coverage of 43.3% for Voltan and 45.8% for Chameleon ARMORs. The third campaign, where random injections were done to the heap, stack, and code segments of the application processes, showed Voltan to be fail-silent 97.5% of the time and Chameleon ARMORs 84.6% of the time. In addition to providing an assessment of the fail-silence achieved by the two systems, the study also gives insights into the issues in comparing systems with different designs, implementations, and assumptions through fault injection experiments.
Author(s): Stott DT, Speirs NA, Xu J, Bagchi S, Whisnant K, Kalbarczyk Z, Iyer RK
Publication type: Report
Publication status: Published
Series Title: Department of Computing Science Technical Report Series
Year: 2000
Pages: 24
Print publication date: 01/01/2000
Source Publication Date: 2000
Report Number: 694
Institution: Department of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/694.pdf