
Applying Low-Overhead Rollback-Recovery to Wide Area Distributed Query Processing

NU author(s): Dr James Smith, Professor Paul Watson



It is argued that there is a significant class of pipelined, large-grain data flow computations whose wide-area distribution and long-running nature suggest a need for fault tolerance, but for which existing approaches appear either costly or incomplete. This paper presents an approach that exploits limited input from the application layer to implement a low-overhead recovery protocol for such data flow computations. Over a large range of possible data flow graphs, the protocol supports tolerance of a single machine failure per execution of the computation and, in many cases, a greater degree of fault tolerance. The protocol is implemented within an emulation of a distributed query processing system. Preliminary performance measurements suggest that the overhead is indeed low.

Publication metadata

Author(s): Smith J, Watson P

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2004

Pages: 15

Print publication date: 01/10/2004

Source Publication Date: October 2004

Report Number: 861

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne