Scalability of partial differential equations preconditioner resilient to soft and hard faults

Karla Morris*, Francesco Rizzi, Khachik Sargsyan, Kathryn Dahlgren, Paul Mycek, Cosmin Safta, Olivier Le Maître, Omar Knio, Bert Debusschere

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present a resilient domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm reformulates the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to both soft and hard faults. We discuss an implementation based on a server-client model where all state information is held by the servers, while clients are designed solely as computational units. Servers are assumed to be “sandboxed”, while no assumption is made on the reliability of the clients. We explore the scalability of the algorithm up to ∼12k cores, build an SST/macro skeleton to extrapolate to ∼50k cores, and show the resilience under simulated hard and soft faults for a 2D linear Poisson equation.
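
The sampling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 1D Poisson problem (the paper treats 2D), two overlapping subdomains, ordinary least squares for the fit, and hard faults only, modeled as lost client results that are simply re-dispatched. All function names and parameters (`local_solve`, `fit_boundary_map`, `solve_two_subdomains`, `fault_rate`) are hypothetical. Each subdomain's unknown interface value is sampled, a local solve is run per sample, and an affine map from boundary value to the neighbor's interface value is fitted. The two maps are then solved together for the interface values.

```python
import numpy as np


def local_solve(f, x, left, right):
    """Solve -u'' = f on the uniform grid x with Dirichlet values left/right."""
    h = x[1] - x[0]
    m = len(x) - 2  # number of interior unknowns
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    rhs = f(x[1:-1]).astype(float)
    rhs[0] += left / h**2
    rhs[-1] += right / h**2
    u = np.empty_like(x)
    u[0], u[-1] = left, right
    u[1:-1] = np.linalg.solve(A, rhs)
    return u


def fit_boundary_map(f, x, fixed_side, fixed_value, probe_idx, rng,
                     n_samples=8, fault_rate=0.3):
    """Fit u[probe_idx] as an affine function of the sampled interface value.

    A simulated hard fault means the client returns nothing; the task is
    drawn again until n_samples results have been collected.
    """
    samples, outputs = [], []
    while len(samples) < n_samples:
        g = rng.uniform(-1.0, 1.0)
        if rng.random() < fault_rate:  # client lost: no result comes back
            continue
        bc = (fixed_value, g) if fixed_side == "left" else (g, fixed_value)
        u = local_solve(f, x, *bc)
        samples.append(g)
        outputs.append(u[probe_idx])
    slope, intercept = np.polyfit(samples, outputs, 1)
    return slope, intercept


def solve_two_subdomains(f, n=20, i_left=8, i_right=12, a=0.0, b=0.0, seed=0):
    """Return the grid and the two interface values g1 = u(x[i_right]),
    g2 = u(x[i_left]) for -u'' = f on [0, 1] with u(0) = a, u(1) = b."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n + 1)
    x1, x2 = x[: i_right + 1], x[i_left:]  # overlapping subdomains
    # Subdomain 1: g2 = s1 * g1 + c1 (probe at x[i_left])
    s1, c1 = fit_boundary_map(f, x1, "left", a, i_left, rng)
    # Subdomain 2: g1 = s2 * g2 + c2 (probe at x[i_right])
    s2, c2 = fit_boundary_map(f, x2, "right", b, i_right - i_left, rng)
    # Solve the coupled 2x2 system for the interface values.
    M = np.array([[1.0, -s2], [-s1, 1.0]])
    g1, g2 = np.linalg.solve(M, [c2, c1])
    return x, g1, g2
```

For f = 1 with homogeneous boundary conditions the exact solution is u(x) = x(1 - x)/2, so calling `solve_two_subdomains(lambda x: np.ones_like(x))` should return interface values close to 0.12 at both x = 0.4 and x = 0.6. Handling soft faults (silently corrupted results) would require a fit that tolerates outliers, which this sketch does not attempt.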

Original language: English (US)
Title of host publication: High Performance Computing - 31st International Conference, ISC High Performance 2016, Proceedings
Editors: Jack Dongarra, Julian M. Kunkel, Pavan Balaji
Publisher: Springer Verlag
Pages: 469-485
Number of pages: 17
ISBN (Print): 9783319413204
State: Published - 2016
Event: 31st International Conference on High Performance Computing, ISC High Performance 2016 - Frankfurt, Germany
Duration: Jun 19 2016 to Jun 23 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9697
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 31st International Conference on High Performance Computing, ISC High Performance 2016
Country/Territory: Germany
City: Frankfurt
Period: 06/19/16 to 06/23/16

Bibliographical note

Funding Information:
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Award Number 13-016717. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Publisher Copyright:
© Springer International Publishing Switzerland 2016.

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
