TY - GEN
T1 - Chaos Engineering for Enhanced Resilience of Cyber-Physical Systems
AU - Konstantinou, Charalambos
AU - Stergiopoulos, George
AU - Parvania, Masood
AU - Esteves-Verissimo, Paulo
N1 - KAUST Repository Item: Exported on 2023-05-05
PY - 2021
Y1 - 2021
N2 - Cyber-physical systems (CPS) incorporate the complex and large-scale engineered systems behind critical infrastructure operations, such as water distribution networks, energy delivery systems, healthcare services, manufacturing systems, and transportation networks. Industrial CPS in particular need to simultaneously satisfy requirements of available, secure, safe and reliable system operation against diverse threats, in an adaptive and sustainable way. These adverse events can be of accidental or malicious nature and may include natural disasters, hardware or software faults, cyberattacks, or even infrastructure design and implementation faults. They may drastically affect the results of CPS algorithms and mechanisms, and subsequently the operations of industrial control systems (ICS) deployed in those critical infrastructures. Such a demanding combination of properties and threats calls for resilience-enhancement methodologies and techniques, working in real-time operation. However, the analysis of CPS resilience is a difficult task as it involves evaluation of various interdependent layers with heterogeneous computing equipment, physical components, network technologies, and data analytics. In this paper, we apply the principles of chaos engineering (CE) to industrial CPS, in order to demonstrate the benefits of such practices on system resilience. The systemic uncertainty of adverse events can be tamed by applying runtime CE-based analyses to CPS in production, in order to predict environment changes and thus apply mitigation measures limiting the range and severity of the event, and minimizing its blast radius.
AB - Cyber-physical systems (CPS) incorporate the complex and large-scale engineered systems behind critical infrastructure operations, such as water distribution networks, energy delivery systems, healthcare services, manufacturing systems, and transportation networks. Industrial CPS in particular need to simultaneously satisfy requirements of available, secure, safe and reliable system operation against diverse threats, in an adaptive and sustainable way. These adverse events can be of accidental or malicious nature and may include natural disasters, hardware or software faults, cyberattacks, or even infrastructure design and implementation faults. They may drastically affect the results of CPS algorithms and mechanisms, and subsequently the operations of industrial control systems (ICS) deployed in those critical infrastructures. Such a demanding combination of properties and threats calls for resilience-enhancement methodologies and techniques, working in real-time operation. However, the analysis of CPS resilience is a difficult task as it involves evaluation of various interdependent layers with heterogeneous computing equipment, physical components, network technologies, and data analytics. In this paper, we apply the principles of chaos engineering (CE) to industrial CPS, in order to demonstrate the benefits of such practices on system resilience. The systemic uncertainty of adverse events can be tamed by applying runtime CE-based analyses to CPS in production, in order to predict environment changes and thus apply mitigation measures limiting the range and severity of the event, and minimizing its blast radius.
UR - http://hdl.handle.net/10754/669825
UR - https://ieeexplore.ieee.org/document/9611797/
UR - http://www.scopus.com/inward/record.url?scp=85121945207&partnerID=8YFLogxK
U2 - 10.1109/RWS52686.2021.9611797
DO - 10.1109/RWS52686.2021.9611797
M3 - Conference contribution
SN - 978-1-6654-2906-1
BT - 2021 Resilience Week (RWS)
PB - IEEE
ER -