Modeling and evaluating the resilience of systems, potentially complex and large-scale in nature, has recently raised significant interest among both practitioners and researchers. This recent interest has resulted in several definitions of the concept of resilience and several approaches to measuring this concept, across several application domains. As such, this paper presents a review of recent research articles related to defining and quantifying resilience in various disciplines, with a focus on engineering systems. We provide a classification scheme for the approaches in the literature, focusing on qualitative and quantitative approaches and their subcategories. This review offers extensive coverage of the literature, an exploration of current gaps and challenges, and several directions for future research.
Modern societies are becoming increasingly dependent on critical infrastructure systems (CISs) to provide essential services that support economic prosperity, governance, and quality of life. These systems are not alone but interdependent at multiple levels to enhance their overall performance. However, recent worldwide events such as the 9/11 terrorist attack, Gulf Coast hurricanes, the Chile and Japanese earthquakes, and even heat waves have highlighted that interdependencies among CISs increase the potential for cascading failures and amplify the impact of both large and small scale initial failures into events of catastrophic proportions. To better understand CISs to support planning, maintenance and emergency decision making, modeling and simulation of interdependencies across CISs has recently become a key field of study. This paper reviews the studies in the field and broadly groups the existing modeling and simulation approaches into six types: empirical approaches, agent based approaches, system dynamics based approaches, economic theory based approaches, network based approaches, and others. Different studies for each type of the approaches are categorized and reviewed in terms of fundamental principles, such as research focus, modeling rationale, and the analysis method, while different types of approaches are further compared according to several criteria, such as the notion of resilience. Finally, this paper offers future research directions and identifies critical challenges in the field.
In this paper, we have reviewed various approaches to defining resilience and the assessment of resilience. We have seen that while resilience is a useful concept, its diversity in usage complicates its interpretation and measurement. In this paper, we have proposed a resilience analysis framework and a metric for measuring resilience. Our analysis framework consists of system identification, resilience objective setting, vulnerability analysis, and stakeholder engagement. The implementation of this framework is focused on the achievement of three resilience capacities: adaptive capacity, absorptive capacity, and recoverability. These three capacities also form the basis of our proposed resilience factor and uncertainty-weighted resilience metric. We have also identified two important unresolved discussions emerging in the literature: the idea of resilience as an epistemological versus inherent property of the system, and design for ecological versus engineered resilience in socio-technical systems. While we have not resolved this tension, we have shown that our framework and metric promote the development of methodologies for investigating “deep” uncertainties in resilience assessment while retaining the use of probability for expressing uncertainties about highly uncertain, unforeseeable, or unknowable hazards in design and management activities.
Condition-based maintenance (CBM) is a maintenance strategy that collects and assesses real-time information, and recommends maintenance decisions based on the current condition of the system. In recent decades, research on CBM has been rapidly growing due to the rapid development of computer-based monitoring technologies. Research studies have proven that CBM, if planned properly, can be effective in improving equipment reliability at reduced costs. This paper presents a review of CBM literature with emphasis on mathematical modeling and optimization approaches. We focus this review on important aspects of the CBM, such as optimization criteria, inspection frequency, maintenance degree, solution methodology, etc. Since the modeling choice for the stochastic deterioration process greatly influences CBM strategy decisions, this review classifies the literature on CBM models based on the underlying deterioration processes, namely discrete- and continuous-state deterioration, and proportional hazard model. CBM models for multi-unit systems are also reviewed in this paper. This paper provides useful references for CBM management professionals and researchers working on CBM modeling and optimization.
Effective health diagnosis provides multiple benefits, such as improved safety, improved reliability, and reduced costs for operation and maintenance of complex engineered systems. This paper presents a novel multi-sensor health diagnosis method using a deep belief network (DBN). The DBN has recently become a popular approach in machine learning for its promised advantages, such as fast inference and the ability to encode richer and higher-order network structures. The DBN employs a hierarchical structure with multiple stacked restricted Boltzmann machines and works through a layer-by-layer successive learning process. The proposed multi-sensor health diagnosis methodology using DBN-based state classification can be structured in three consecutive stages: first, defining health states and preprocessing sensory data for DBN training and testing; second, developing DBN-based classification models for diagnosis of predefined health states; third, validating the DBN classification models with a testing sensory dataset. Health diagnosis using the DBN-based health state classification technique is compared with four existing diagnosis techniques. Benchmark classification problems and two engineering health diagnosis applications, aircraft engine health diagnosis and electric power transformer health diagnosis, are employed to demonstrate the efficacy of the proposed approach.
Traditionally, system prognostics and health management (PHM) depends on sufficient prior knowledge of the degradation process of critical components in order to predict the remaining useful life (RUL). However, accurate physical or expert models are not available in most cases. This paper proposes a new data-driven approach for prognostics using deep convolutional neural networks (DCNN). A time window approach is employed for sample preparation to enable better feature extraction by the DCNN. Raw collected data, after normalization, are used directly as inputs to the proposed network, and no prior expertise in prognostics or signal processing is required, which facilitates the application of the proposed method. To show the effectiveness of the proposed approach, experiments are carried out on the popular C-MAPSS dataset for aero-engine unit prognostics. High prognostic accuracy on the RUL estimation is achieved. The superiority of the proposed method is demonstrated by comparisons with other popular approaches and the state-of-the-art results on the same dataset. The results of this study suggest that the proposed data-driven prognostic method offers a new and promising approach.
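The time window sample preparation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `min_max_normalize` and `make_windows`, the choice of min-max scaling to [-1, 1], and the labelling of each window with the RUL at its last time step are all assumptions made for the example.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each sensor channel (column) to [-1, 1].

    Assumed normalization scheme; constant channels map to -1.
    """
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return 2.0 * (x - lo) / span - 1.0

def make_windows(series, window):
    """Slice a (time, sensors) run-to-failure record into overlapping windows.

    Returns samples of shape (n_windows, window, sensors) and, for each
    window, the remaining useful life at its last time step (assumed label).
    """
    n_steps = series.shape[0]
    if n_steps < window:
        return np.empty((0, window, series.shape[1])), np.empty((0,))
    samples = np.stack([series[i:i + window] for i in range(n_steps - window + 1)])
    # The final window ends at failure, so its RUL is 0.
    rul = np.arange(n_steps - window, -1, -1, dtype=float)
    return samples, rul

# Hypothetical record: 10 time steps, 2 sensors.
record = np.arange(20, dtype=float).reshape(10, 2)
samples, rul = make_windows(min_max_normalize(record), window=4)
```

Each sample keeps the raw (normalized) multi-sensor history as a 2-D array, which is the kind of input a convolutional network can consume without hand-crafted features.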
The concept of system resilience is important and popular, and has in fact become hyper-popular over the last few years. Clarifying the technical meanings and foundations of the concept of resilience would appear to be necessary. Proposals for defining resilience are flourishing as well. This paper organizes the different technical approaches to the question of what resilience is and how to engineer it in complex adaptive systems. This paper groups the different uses of the label ‘resilience’ around four basic concepts: (1) resilience as rebound from trauma and return to equilibrium; (2) resilience as a synonym for robustness; (3) resilience as the opposite of brittleness, i.e., as graceful extensibility when surprise challenges boundaries; (4) resilience as network architectures that can sustain the ability to adapt to future surprises as conditions evolve.
This article surveys the application of gamma processes in maintenance. Since the introduction of the gamma process in the area of reliability in 1975, it has been increasingly used to model stochastic deterioration for optimising maintenance. Because gamma processes are well suited for modelling the temporal variability of deterioration, they have proven to be useful in determining optimal inspection and maintenance decisions. An overview is given of the rich theoretical aspects as well as the successful maintenance applications of gamma processes. The statistical properties of the gamma process as a probabilistic stress–strength model are given and put in a historic perspective. Furthermore, methods for estimation, approximation, and simulation of gamma processes are reviewed. Finally, an extensive catalogue of inspection and maintenance models under gamma-process deterioration is presented with the emphasis on engineering applications.
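As an illustration of the simulation of gamma processes mentioned above, the following sketch draws sample paths of a stationary gamma process by summing independent gamma-distributed increments. The parameter names (`shape_rate`, `scale`) and the grid-based first-passage approximation are assumptions chosen for the example, not notation taken from the survey.

```python
import numpy as np

def simulate_gamma_paths(shape_rate, scale, dt, n_steps, n_paths, seed=0):
    """Simulate stationary gamma-process deterioration paths.

    Increments over a step dt are Gamma(shape=shape_rate * dt, scale=scale),
    so paths are non-decreasing and E[X(t)] = shape_rate * t * scale.
    Returns an array of shape (n_paths, n_steps + 1) starting at 0.
    """
    rng = np.random.default_rng(seed)
    increments = rng.gamma(shape=shape_rate * dt, scale=scale,
                           size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    return np.hstack([np.zeros((n_paths, 1)), paths])

def first_passage_times(paths, dt, threshold):
    """First grid time at which each path reaches the threshold.

    NaN marks paths that never cross within the simulated horizon.
    The grid makes this an upper approximation of the true crossing time.
    """
    crossed = paths >= threshold
    idx = crossed.argmax(axis=1).astype(float)
    idx[~crossed.any(axis=1)] = np.nan
    return idx * dt

paths = simulate_gamma_paths(shape_rate=2.0, scale=0.5, dt=0.1,
                             n_steps=50, n_paths=2000, seed=1)
failure_times = first_passage_times(paths, dt=0.1, threshold=3.0)
```

The empirical distribution of `failure_times` can then be compared against a candidate inspection interval, which is the type of question the maintenance models in the survey address analytically.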
Safety analysis in gas process facilities is necessary to prevent unwanted events that may cause catastrophic accidents. Accident scenario analysis with probability updating is the key to dynamic safety analysis. Although conventional failure assessment techniques such as fault tree (FT) have been used effectively for this purpose, they suffer severe limitations of static structure and uncertainty handling, which are of great significance in process safety analysis. Bayesian network (BN) is an alternative technique with ample potential for application in safety analysis. BNs have a strong similarity to FTs in many respects; however, the distinct advantages making them more suitable than FTs are their ability in explicitly representing the dependencies of events, updating probabilities, and coping with uncertainties. The objective of this paper is to demonstrate the application of BNs in safety analysis of process systems. The first part of the paper shows those modeling aspects that are common between FT and BN, giving preference to BN due to its ability to update probabilities. The second part is devoted to various modeling features of BN, helping to incorporate multi-state variables, dependent failures, functional uncertainty, and expert opinion which are frequently encountered in safety analysis, but cannot be considered by FT. The paper concludes that BN is a superior technique in safety analysis because of its flexible structure, allowing it to fit a wide variety of accident scenarios.
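The probability updating that distinguishes a BN from a static FT can be sketched on a toy example: two independent basic events feeding one top event. The event names, the prior probabilities, and the helper `posterior_given_top` are hypothetical and chosen only for illustration; inference is done by brute-force enumeration rather than with any BN library.

```python
from itertools import product

def posterior_given_top(p_a, p_b, cpt_top):
    """Update basic-event probabilities after observing the top event.

    p_a, p_b : prior failure probabilities of independent events A and B.
    cpt_top  : dict mapping (a, b) to P(Top = 1 | A = a, B = b).
    Returns (P(Top), P(A | Top), P(B | Top)).
    """
    joint_top = {}
    for a, b in product((0, 1), repeat=2):
        prior = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        joint_top[(a, b)] = prior * cpt_top[(a, b)]
    p_top = sum(joint_top.values())
    post_a = sum(v for (a, _), v in joint_top.items() if a) / p_top
    post_b = sum(v for (_, b), v in joint_top.items() if b) / p_top
    return p_top, post_a, post_b

# A deterministic OR gate, i.e. what a fault tree can already express.
or_gate = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0}
# A "noisy" gate with assumed values, which a plain fault tree cannot express.
noisy_gate = {(0, 0): 0.001, (0, 1): 0.9, (1, 0): 0.95, (1, 1): 0.99}

p_top, post_a, post_b = posterior_given_top(0.01, 0.02, or_gate)
```

With the deterministic gate, the prior top-event probability matches the fault-tree result 1 - (1 - 0.01)(1 - 0.02), while the posteriors show how evidence of an accident shifts belief toward the more likely cause. Swapping in `noisy_gate` changes only the conditional probability table, not the inference code.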
The objective of this paper is to provide a systematic view on the problem of vulnerability and risk analysis of critical infrastructures. Reflections are made on the inherent complexities of these systems, related challenges are identified and possible ways forward for their analysis and management are indicated. Specifically: the framework of vulnerability and risk analysis is examined in relation to its application for the protection and resilience of critical infrastructures; it is argued that the complexity of these systems is a challenging characteristic, which calls for the integration of different modeling perspectives and new approaches of analysis; examples are given in relation to the Internet and, particularly, the electric power grid, as representative of critical infrastructures and the associated complexity; the integration of different types of analyses and methods of system modeling is put forward for capturing the inherent structural and dynamic complexities of critical infrastructures and eventually evaluating their vulnerability and risk characteristics, so that decisions on protection and resilience actions can be taken with the required confidence.
This paper reviews the definition and meaning of the concept of risk. The review has a historical and development trend perspective, also covering recent years. It is questioned if, and to what extent, it is possible to identify some underlying patterns in the way risk has been, and is being understood today. The analysis is based on a new categorisation of risk definitions and an assessment of these categories in relation to a set of critical issues, including how these risk definitions match typical daily-life phrases about risk. The paper presents a set of constructed development paths for the risk concept and concludes that over the last 15–20 years we have seen a shift from rather narrow perspectives based on probabilities to ways of thinking which highlight events, consequences and uncertainties. However, some of the more narrow perspectives (like expected values and probability-based perspectives) are still strongly influencing the risk field, although arguments can be provided against their use. The implications of this situation for risk assessment and risk management are also discussed.
Global sensitivity analysis (SA) aims at quantifying the respective effects of input random variables (or combinations thereof) onto the variance of the response of a physical or mathematical model. Among the abundant literature on sensitivity measures, the Sobol’ indices have received much attention since they provide accurate information for most models. The paper introduces generalized polynomial chaos expansions (PCE) to build surrogate models that allow one to compute the Sobol’ indices as a post-processing of the PCE coefficients. Thus the computational cost of the sensitivity indices practically reduces to that of estimating the PCE coefficients. An original non intrusive regression-based approach is proposed, together with an experimental design of minimal size. Various application examples illustrate the approach, both from the field of global SA (i.e. well-known benchmark problems) and from the field of stochastic mechanics. The proposed method gives accurate results for various examples that involve up to eight input random variables, at a computational cost which is 2–3 orders of magnitude smaller than the traditional Monte Carlo-based evaluation of the Sobol’ indices.
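The post-processing step described above can be sketched as follows, assuming the PCE uses an orthonormal basis indexed by multi-indices so that the output variance is the sum of squared coefficients of all non-constant terms. The function name `sobol_from_pce` and the toy coefficients are hypothetical; the regression step that estimates the coefficients is not shown.

```python
import numpy as np

def sobol_from_pce(multi_indices, coeffs):
    """First-order and total Sobol' indices from PCE coefficients.

    multi_indices : (P, d) integer array; row k holds the polynomial degree
                    of basis term k in each of the d input variables.
    coeffs        : (P,) coefficients on an orthonormal basis (assumption).
    """
    alpha = np.asarray(multi_indices)
    c2 = np.asarray(coeffs, dtype=float) ** 2
    nonconstant = alpha.sum(axis=1) > 0
    variance = c2[nonconstant].sum()
    d = alpha.shape[1]
    first, total = np.zeros(d), np.zeros(d)
    for i in range(d):
        involves_i = alpha[:, i] > 0
        # Terms that depend on variable i and on no other variable.
        only_i = involves_i & (np.delete(alpha, i, axis=1).sum(axis=1) == 0)
        first[i] = c2[only_i].sum() / variance
        total[i] = c2[involves_i].sum() / variance
    return first, total

# Toy expansion: Y = 1 + 2*psi_1(X1) + 3*psi_1(X2) + 1*psi_1(X1)*psi_1(X2)
alpha = [[0, 0], [1, 0], [0, 1], [1, 1]]
coeffs = [1.0, 2.0, 3.0, 1.0]
first, total = sobol_from_pce(alpha, coeffs)
```

For the toy expansion the variance is 4 + 9 + 1 = 14, so the first-order indices are 4/14 and 9/14 and the total indices are 5/14 and 10/14. No additional model evaluations are needed once the coefficients are available.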
Resilience is generally understood as the ability of an entity to recover from an external disruptive event. In the system domain, a formal definition and quantification of the concept of resilience has been elusive. This paper proposes generic metrics and formulae for quantifying system resilience. The discussions and graphical examples illustrate that the quantitative model is aligned with the fundamental concept of resilience. Based on the approach presented, it is possible to analyze resilience as a time-dependent function in the context of systems. The paper describes the metrics of network and system resilience, time for resilience, and total cost of resilience. The paper also describes the key parameters necessary to analyze system resilience, namely disruptive events, component restoration, and overall resilience strategy. A road network example is used to demonstrate the applicability of the proposed resilience metrics and how these analyses form the basis for developing effective resilience design strategies. The metrics described are generic enough to be implemented in a variety of applications as long as appropriate figures-of-merit and the necessary system parameters, system decomposition and component parameters are defined. ► Propose a graphical model for the understanding of the resilience process. ► Mathematical description of resilience as a function of time. ► Identification of necessary concepts to define and evaluate network resilience. ► Development of cost and time to recovery metrics based on resilience formulation.
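A time-dependent resilience metric of the kind described above can be sketched as the fraction of lost performance that has been recovered by time t. The abstract does not state the formula, so the ratio form used here, the function names, and the performance values are all assumptions made for illustration.

```python
def resilience(f_t, f_nominal, f_disrupted):
    """Fraction of the performance loss recovered (assumed ratio form).

    f_t         : figure-of-merit value at the time of interest.
    f_nominal   : figure-of-merit value before the disruptive event.
    f_disrupted : figure-of-merit value right after the disruptive event.
    """
    loss = f_nominal - f_disrupted
    if loss <= 0:
        raise ValueError("disrupted performance must be below nominal")
    return (f_t - f_disrupted) / loss

def time_to_resilience(times, performance, f_nominal, f_disrupted, target=1.0):
    """First recorded time at which resilience reaches the target, else None."""
    for t, f in zip(times, performance):
        if resilience(f, f_nominal, f_disrupted) >= target:
            return t
    return None

# Hypothetical recovery trajectory of a network figure-of-merit.
times = [0, 1, 2, 3, 4]
performance = [40.0, 55.0, 70.0, 85.0, 100.0]
halfway = resilience(70.0, f_nominal=100.0, f_disrupted=40.0)  # 0.5
full_recovery = time_to_resilience(times, performance, 100.0, 40.0)  # 4
```

Under this form the metric is 0 immediately after the disruption and 1 once nominal performance is restored, so different restoration strategies can be compared by how quickly their trajectories approach 1.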
Disruptive events, whether malevolent attacks, natural disasters, manmade accidents, or common failures, can have significant widespread impacts when they lead to the failure of network components and ultimately the larger network itself. An important consideration in the behavior of a network following disruptive events is its resilience, or the ability of the network to “bounce back” to a desired performance state. Building on the extensive reliability engineering literature on measuring component importance, or the extent to which individual network components contribute to network reliability, this paper provides two resilience-based component importance measures. The two measures quantify (i) the potential adverse impact on system resilience from a disruption affecting a given link, and (ii) the potential positive impact on system resilience when that link cannot be disrupted. The resilience-based component importance measures, and an algorithm to perform stochastic ordering of network components due to the uncertain nature of network disruptions, are illustrated with a 20-node, 30-link network example.
The first recorded usage of the word reliability dates back to the 1800s, albeit in reference to a person and not a technical system. Since then, the concept of reliability has become a pervasive attribute worthy of both qualitative and quantitative connotations. In particular, the revolutionary social, cultural and technological changes that have occurred from the 1800s to the 2000s have contributed to the need for a rational framework and quantitative treatment of the reliability of engineered systems and plants. This has led to the rise of reliability engineering as a scientific discipline. In this paper, some considerations are shared with respect to a number of problems and challenges which researchers and practitioners in reliability engineering are facing when analyzing today's complex systems. The focus will be on the contribution of reliability to system safety and on its role within system risk analysis.
A probabilistic Physics of Failure-based framework for fatigue life prediction of aircraft gas turbine discs operating under uncertainty is developed. The framework incorporates the overall uncertainties appearing in a structural integrity assessment. A comprehensive uncertainty quantification (UQ) procedure is presented to quantify multiple types of uncertainty using multiplicative and additive UQ methods. In addition, the factors that contribute the most to the resulting output uncertainty are investigated and identified for uncertainty reduction in decision-making. The prediction accuracy of the proposed framework is validated through a comparison of model predictions with the experimental results of GH4133 superalloy and full-scale tests of aero-engine high-pressure turbine discs.
Applying reliability methods to a complex structure is often delicate for two main reasons. First, such a structure is fortunately designed with codified rules leading to a large safety margin, which means that failure is a small-probability event. Such a probability level is difficult to assess efficiently. Second, the structure's mechanical behaviour is modelled numerically in an attempt to reproduce the real response, and the numerical model tends to become more and more time-demanding as its complexity is increased to improve accuracy and to consider particular mechanical behaviour. As a consequence, performing a large number of model computations to assess the failure probability is not feasible. To overcome these issues, this paper proposes an original and easily implementable method called AK-IS, for active learning and Kriging-based Importance Sampling. This new method is based on the AK-MCS algorithm previously published by Echard et al. [AK-MCS: an active learning reliability method combining Kriging and Monte Carlo simulation. Structural Safety 2011;33(2):145–54]. It associates the Kriging metamodel and its advantageous stochastic property with the Importance Sampling method to assess small failure probabilities. It enables the correction or validation of the FORM approximation with only a few mechanical model computations. The efficiency of the method is first demonstrated on two academic applications. It is then applied to assess the reliability of a challenging aerospace case study subjected to fatigue.
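The Importance Sampling ingredient of such a method can be sketched on a toy problem in standard normal space. This is not AK-IS itself: the Kriging metamodel and the active learning loop are omitted, the limit state is evaluated directly, and the design point is assumed to be already known (for instance from a prior FORM analysis). The function name and the linear limit state are hypothetical.

```python
import math
import numpy as np

def importance_sampling_pf(limit_state, design_point, n_samples=20000, seed=0):
    """Estimate P[g(U) <= 0] for standard normal U by importance sampling.

    Samples are drawn from a unit-variance normal centred on the design
    point u*, and reweighted by phi(u) / phi(u - u*).
    Returns the estimate and its coefficient of variation.
    """
    rng = np.random.default_rng(seed)
    u_star = np.asarray(design_point, dtype=float)
    u = rng.standard_normal((n_samples, u_star.size)) + u_star
    # log of phi(u) / phi(u - u*) = -u.u* + |u*|^2 / 2
    log_w = -u @ u_star + 0.5 * (u_star @ u_star)
    contrib = (limit_state(u) <= 0.0) * np.exp(log_w)
    pf = contrib.mean()
    cov = contrib.std(ddof=1) / (math.sqrt(n_samples) * pf) if pf > 0 else math.inf
    return pf, cov

# Toy linear limit state g(u) = beta - u_1, whose exact answer is Phi(-beta).
beta = 4.0
pf, cov = importance_sampling_pf(lambda u: beta - u[:, 0], [beta, 0.0])
exact = 0.5 * math.erfc(beta / math.sqrt(2.0))
```

For a failure probability of about 3e-5, crude Monte Carlo would need millions of limit-state evaluations for a comparable coefficient of variation. Centring the sampling density on the design point is what makes small probabilities tractable, and replacing `limit_state` with a cheap surrogate would further reduce calls to the expensive model.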
This paper deals with a proactive condition-based maintenance (CBM) policy considering both perfect and imperfect maintenance actions for a deteriorating system. Perfect maintenance actions completely restore the system to the ‘as good as new’ state. Their related costs are, however, often high. The first objective of the paper is to investigate the impacts of imperfect maintenance actions. Both positive and negative impacts are considered. The positive impact is that the imperfect maintenance cost is usually low. The negative impacts are that (i) imperfect maintenance restores a system to a state between good-as-new and bad-as-old and (ii) each imperfect preventive action may accelerate the system's deterioration process. The second objective of the paper is to propose an adaptive maintenance policy that helps to optimally select maintenance actions (perfect or imperfect), if needed, at each inspection time. Moreover, the time interval between two successive inspection points is determined according to a remaining useful life (RUL) based inspection policy. Finally, a numerical example is introduced to illustrate the use of the proposed maintenance policy.
Multi-objective formulations are realistic models for many complex engineering optimization problems. In many real-life problems, objectives under consideration conflict with each other, and optimizing a particular solution with respect to a single objective can result in unacceptable results with respect to the other objectives. A reasonable solution to a multi-objective problem is to investigate a set of solutions, each of which satisfies the objectives at an acceptable level without being dominated by any other solution. In this paper, an overview and tutorial is presented describing genetic algorithms (GA) developed specifically for problems with multiple objectives. They differ primarily from traditional GA by using specialized fitness functions and introducing methods to promote solution diversity.
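The notion of a solution that is not dominated by any other solution can be sketched with a small filter. The sketch assumes that every objective is to be minimized and uses hypothetical objective vectors; it shows only the dominance test, not a full multi-objective GA.

```python
def dominates(a, b):
    """True if a dominates b under minimization (assumed convention):
    a is no worse in every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Return the points that no other point dominates (the Pareto set)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cost, weight) pairs for five candidate designs.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = non_dominated(candidates)  # [(1, 5), (2, 3), (4, 1)]
```

Multi-objective GAs typically build their specialized fitness functions on this dominance relation, for example by ranking individuals according to which front they belong to, and add a diversity measure so that the population spreads along the front rather than collapsing onto one point.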
This paper aims to provide practical options for prognostics so that beginners can select appropriate methods for their fields of application. To achieve this goal, several popular data-driven and physics-based prognostics algorithms are first reviewed. Each algorithm’s attributes and pros and cons are analyzed in terms of model definition, model parameter estimation, and ability to handle noise and bias in data. Fatigue crack growth examples are then used to illustrate the characteristics of the different algorithms. In order to suggest a suitable algorithm, several studies are made based on the number of data sets, the level of noise and bias, the availability of loading and physical models, and the complexity of the damage growth behavior. Based on these studies, it is concluded that the Gaussian process is easy and fast to implement, but works well only when the covariance function is properly defined. The neural network has the advantage in the case of large noise and complex models, but only with many training data sets. The particle filter and Bayesian method are superior to the former methods because they are less affected by noise and model complexity, but they work only when a physical model and loading conditions are available.