Lookup NU author(s): Emeritus Professor Alexander Romanovsky
Developers of fault-tolerant distributed systems must guarantee that the fault tolerance mechanisms they build are themselves reliable. Otherwise, these mechanisms might reduce overall system dependability, defeating the purpose of introducing fault tolerance into the system. To achieve the desired levels of reliability, the development of mechanisms for detecting and handling errors should be rigorous or formal. We present an approach to modeling and verifying fault-tolerant distributed systems that use exception handling as the main fault tolerance mechanism. The proposed approach is based on a formal model for specifying the structure of a system in terms of cooperating participants that handle exceptions in a coordinated manner. We employ coordinated atomic actions as a representative mechanism for exception handling in concurrent systems. We have validated the proposed approach by means of two case studies: (i) a system responsible for managing a production cell and (ii) a medical control system. For both systems, the proposed approach helped us to uncover design faults in the form of implicit assumptions and omissions in the original specifications.
Author(s): Castor Filho F, Romanovsky A, Rubira CMF
Publication type: Report
Publication status: Published
Series Title: School of Computing Science Technical Report Series
Year: 2008
Pages: 43
Print publication date: 01/06/2008
Source Publication Date: June 2008
Report Number: 1105
Institution: School of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/1105.pdf