Lookup NU author(s): Dr James Smith, Professor Paul Watson
It is argued that there is a significant class of pipelined large grain data flow computations whose wide area distribution and long running nature suggest a need for fault-tolerance, but for which existing approaches appear either costly or incomplete. This paper presents an approach which exploits limited input from the application layer to implement a low overhead recovery protocol for such data flow computations. Over a large range of possible data flow graphs, the protocol supports tolerance of a single machine failure, per execution of the computation, and in many cases a greater degree of fault-tolerance. The protocol is implemented within an emulation of a distributed query processing system. Preliminary performance measurements suggest that the overhead is indeed low.
Author(s): Smith J, Watson P
Publication type: Report
Publication status: Published
Series Title: School of Computing Science Technical Report Series
Year: 2004
Pages: 15
Print publication date: 01/10/2004
Source Publication Date: October 2004
Report Number: 861
Institution: School of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/861.pdf