On the use and implementation of message logging

Elmootazbellah N. Elnozahy*, Willy Zwaenepoel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

57 Scopus citations

Abstract

Message logging has long been advocated as offering better failure-free performance than coordinated checkpointing. On the contrary, we present a number of experiments showing that for compute-intensive applications executing in parallel on clusters of workstations, message logging has higher failure-free overhead than coordinated checkpointing. Message logging protocols, however, result in much shorter output latency than coordinated checkpointing. Therefore, message logging should be used for applications involving substantial interactions with the outside world, while coordinated checkpointing should be used otherwise. We also present an unorthodox message logging design that uses coordinated checkpointing with message logging, departing from the conventional approaches that use independent checkpointing. This combination of message logging and coordinated checkpointing offers several advantages, including improved failure-free performance, bounded recovery time, simplified garbage collection, and reduced complexity. Meanwhile, the new protocols retain the advantages of the conventional message logging protocols with respect to output commit. Finally, we discuss three "lessons learned" from an implementation of various message logging protocols. First, during output commit, only the dependency information for the messages in the log needs to be written to stable storage. It is not necessary to write the message data to stable storage, leading to faster output commit. Second, the use of copy-on-write in the implementation of message logging substantially reduces the logging overhead for communication-intensive programs. Third, we provide quantitative evidence supporting previous qualitative claims about the superiority of sender-based message logging over receiver-based logging.

Original language: English (US)
Title of host publication: Digest of Papers - International Symposium on Fault-Tolerant Computing
Publisher: Publ by IEEE
Pages: 298-307
Number of pages: 10
ISBN (Print): 0818655224
State: Published - 1994
Externally published: Yes
Event: Proceedings of the 24th International Symposium on Fault-Tolerant Computing - Austin, TX, USA
Duration: Jun 15 1994 - Jun 17 1994

Publication series

Name: Digest of Papers - International Symposium on Fault-Tolerant Computing
ISSN (Print): 0731-3071

Other

Other: Proceedings of the 24th International Symposium on Fault-Tolerant Computing
City: Austin, TX, USA
Period: 06/15/94 - 06/17/94

ASJC Scopus subject areas

  • Hardware and Architecture
  • General Engineering
