Fault prediction in distributed systems gone wild

Marco Canini*, Dejan Novaković, Vojin Jovanović, Dejan Kostić

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

We consider the problem of predicting faults in deployed, large-scale distributed systems that are heterogeneous and federated. Motivated by the importance of ensuring the reliability of the services these systems provide, we argue that automatically predicting faults is the key step in making these systems reliable. For example, doing so is vital for avoiding Internet-wide outages that occur due to programming errors or misconfigurations.

Original language: English (US)
Title of host publication: Proceedings of the 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010
Pages: 7-11
Number of pages: 5
State: Published - 2010
Externally published: Yes
Event: 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010 - Zurich, Switzerland
Duration: Jul 28 2010 to Jul 29 2010

Publication series

Name: Proceedings of the 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010

Other

Other: 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010
Country/Territory: Switzerland
City: Zurich
Period: 07/28/10 to 07/29/10

Keywords

  • BGP
  • fault prediction
  • federated systems
  • heterogeneous systems
  • shadow snapshot
  • spatial and temporal awareness

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
