Improving reliability of cooperative concurrent systems with exception flow analysis

Castor, F; Romanovsky, A; Rubira, CMF

doi:10.1016/j.jss.2008.12.015

NU author(s): Emeritus Professor Alexander Romanovsky

Abstract

Developers of fault-tolerant distributed systems need to guarantee that fault tolerance mechanisms they build are in themselves reliable. Otherwise, these mechanisms might in the end negatively affect overall system dependability, thus defeating the purpose of introducing fault tolerance into the system. To achieve the desired levels of reliability, mechanisms for detecting and handling errors should be developed rigorously or formally. We present an approach to modeling and verifying fault-tolerant distributed systems that use exception handling as the main fault tolerance mechanism. In the proposed approach, a formal model is employed to specify the structure of a system in terms of cooperating participants that handle exceptions in a coordinated manner, and coordinated atomic actions serve as representatives of mechanisms for exception handling in concurrent systems. We validate the approach through two case studies: (i) a system responsible for managing a production cell, and (ii) a medical control system. In both systems, the proposed approach has helped us to uncover design faults in the form of implicit assumptions and omissions in the original specifications. © 2008 Elsevier Inc. All rights reserved.

Publication metadata

Author(s): Castor F, Romanovsky A, Rubira CMF

Publication type: Article

Publication status: Published

Journal: Journal of Systems and Software

Year: 2009

Volume: 82

Issue: 5

Pages: 874-890

Date deposited: 28/09/2010

ISSN (print): 0164-1212

ISSN (electronic): 1873-1228

Publisher: Elsevier Inc.

URL: http://dx.doi.org/10.1016/j.jss.2008.12.015

DOI: 10.1016/j.jss.2008.12.015

Funding

Funder reference	Funder name
	EPSRC/UK TrAmS
02/13996-2	FAPESP/Brazil
06/04976-9	FAPESP/Brazil
301446/2006-7	CNPq/Brazil
481147/2007-1	CNPq/Brazil
550895/2007-8	CNPq/Brazil
484138/2006-5	CNPq/Brazil
