Predictive analytics are essential for the operation of complex engineered systems

Predictive analytics is receiving an upgrade, thanks to the Garrick Institute for the Risk Sciences. Ali Mosleh, the Director of the Institute at the University of California, Los Angeles (UCLA), tells us more.

Predictive analytics, in the broadest definition, is a branch of advanced analytics that uses available information and models to make predictions about future events. As such, it is forward-looking, using past events to anticipate the future, considering inherent uncertainties, and often using probability to codify such uncertainties. It has a long history going back at least four centuries, to when the first ideas on the mathematical theories of chance events, statistics, and probability emerged. While the original development and application contexts were insurance and gambling, predictive analytics now plays a central role in all branches of science.

The past 50 years have witnessed significant advances in various subdomains of predictive analytics, which are now firmly recognised as indispensable tools in the engineering of ultra-complex systems. Applications include system simulation for design, design optimisation under uncertainty, system control algorithms, system prognostics and health management, reliability assurance, risk control, digital twins, and autonomous operations and safety.

Predictive analytics methods for engineering applications include traditional statistical and probabilistic techniques, as well as more advanced data analytics such as machine learning and artificial intelligence (AI). In many cases, data analytics are used in conjunction with physical models, inductive and deductive logic models, and computer simulations. In the majority of cases, a primary objective is to predict possible trajectories or scenarios of system behaviour in time, covering both normal, anticipated behaviour and abnormal, unexpected behaviour.

Ultra-complex systems, however, pose formidable predictability challenges stemming from complexity in topological, functional, and behavioural features, as well as limitations in the data and knowledge needed to understand the complexity. Examples of complex technologies that have benefited from or heavily relied on predictive analytics include nuclear power, petrochemical industries, space systems, numerous consumer products, communication networks, and autonomous transportation systems.

The scale and scope of models and data needed to apply predictive analysis to such systems vary depending on the end use and the level of resolution needed for the engineering application. In some cases, multiscale modelling and integration of many different types of analytical and numerical techniques might be required, almost always with the aid of highly advanced computational techniques and platforms. The following are some examples of successful implementations in a few complex technologies.

Nuclear power safety and reliability

Predictive analytics in the form of Probabilistic Risk Analysis (PRA) and Reliability Analysis play a pivotal role in design, operation, and regulatory compliance in the nuclear industry. In fact, the nuclear industry was the birthplace of modern model-based PRA. Several techniques have found important applications in plant operations; examples include plant availability and capacity forecasting and planning, Reliability Centred Maintenance, online risk monitoring, and refuelling shutdown management. The conventional PRA approach in the nuclear industry is based on two primary modelling techniques, the Event Tree (ET) and Fault Tree (FT) methods, which are used to predict risk scenarios. A spectrum of evidence (including expert opinion and engineering analysis) is then used to estimate the probability of those scenarios.
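
To make the ET/FT idea concrete, the following is a minimal Python sketch of how a fault tree and an event tree can be quantified together. It is an illustration only, not a nuclear PRA tool: the component names, the probabilities, and the helper functions `or_gate` and `and_gate` are all hypothetical, and basic events are assumed to be statistically independent.

```python
from math import prod

def or_gate(probs):
    """P(at least one input event occurs), assuming independent inputs."""
    return 1.0 - prod(1.0 - p for p in probs)

def and_gate(probs):
    """P(all input events occur), assuming independent inputs."""
    return prod(probs)

# Hypothetical basic-event probabilities (illustrative values only).
p_pump_a = 0.01
p_pump_b = 0.01
p_valve = 0.001

# Fault tree: cooling fails if the valve fails OR both redundant pumps fail.
p_cooling_fails = or_gate([p_valve, and_gate([p_pump_a, p_pump_b])])

# Event tree: an initiating event followed by two safety functions.
f_initiator = 0.1       # hypothetical initiating-event frequency per year
p_backup_fails = 0.05   # hypothetical failure probability of a backup system

scenarios = {
    "cooling works": f_initiator * (1 - p_cooling_fails),
    "cooling fails, backup works":
        f_initiator * p_cooling_fails * (1 - p_backup_fails),
    "cooling fails, backup fails":
        f_initiator * p_cooling_fails * p_backup_fails,
}

for name, freq in scenarios.items():
    print(f"{name}: {freq:.3e} per year")
```

The fault tree supplies the branch probability for the event tree, and the scenario frequencies sum back to the initiating-event frequency, which is a useful sanity check on any such model.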

More recently, powerful simulation-based PRA methods (also known as Dynamic PRA) have been explored, and some have been implemented in computational platforms. Dynamic PRA methods significantly improve plant PRAs by providing rich contextual information and explicit consideration of feedback arising from complex equipment dependencies, plant physical process variables, operator actions, and control software. The Accident Dynamics Simulator (ADS), developed by researchers at the UCLA Garrick Institute, is one such dynamic method. ADS couples a plant thermal-hydraulic model with an operations crew cognitive model to simulate plant response and operator performance during potential nuclear power plant accidents. ADS generates a discrete dynamic event tree (DDET) containing a very large number of scenarios based on hardware/software failures, plant thermal and hydraulic response, operator decisions and actions, and stochastically varying timing of events. In ADS, the experience and training of each crew operator are captured in a computer knowledge base model that includes the information needed to assess the plant state, execute procedural actions, and match memorised response actions to perceived plant needs.

Compared to more traditional risk assessment methods (using linked ET and FT), dynamic PRA offers several significant advantages. Dynamic simulation methods more explicitly represent the timing and sequencing of events, can directly calculate the impact of variations of hardware and operator performance on the plant state, and are capable of capturing complex interdependencies. This results in the generation of high-fidelity and more realistic accident evolution scenarios, their consequences, and corresponding probabilities. Simulation-based dynamic PRA has the potential to be the basis of a human operator decision support system and can even function as a virtual operator, particularly in the emerging multi-unit small modular reactors.
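
The branching logic behind a discrete dynamic event tree can be sketched in a few lines of Python. This toy example is not ADS and makes no claim about how ADS is implemented: the one-variable "plant model", the temperature limit, the operator action probability, and every name in the code are hypothetical, chosen only to show how timing, process variables, and operator actions jointly shape the scenarios.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    time: int         # minutes since the initiating event
    temp: float       # hypothetical process temperature
    cooling_on: bool  # whether the crew has started cooling
    prob: float       # probability of the path leading to this state
    history: tuple    # branch labels taken so far

DT = 10                 # branching interval in minutes (assumed)
T_END = 30              # end of the simulated mission time (assumed)
TEMP_LIMIT = 130.0      # damage threshold (assumed)
P_OPERATOR_ACTS = 0.6   # chance the crew starts cooling in an interval (assumed)
CUTOFF = 1e-4           # paths less likely than this are not expanded further

def step(s, cooling_on, label, p):
    # Toy "physics": cool by 10 per interval with cooling, heat by 15 without.
    temp = s.temp - 10.0 if cooling_on else s.temp + 15.0
    return State(s.time + DT, temp, cooling_on, s.prob * p, s.history + (label,))

def expand(s, end_states):
    # Stop at damage, at end of mission time, or when the path is negligible.
    if s.prob < CUTOFF or s.temp >= TEMP_LIMIT or s.time >= T_END:
        end_states.append(s)
        return
    if s.cooling_on:
        expand(step(s, True, "cooling continues", 1.0), end_states)
    else:
        expand(step(s, True, "operator starts cooling", P_OPERATOR_ACTS), end_states)
        expand(step(s, False, "no action", 1.0 - P_OPERATOR_ACTS), end_states)

end_states = []
expand(State(0, 100.0, False, 1.0, ()), end_states)
p_damage = sum(s.prob for s in end_states if s.temp >= TEMP_LIMIT)

for s in end_states:
    print(f"{s.prob:.3f}  T={s.temp:5.1f}  {' -> '.join(s.history)}")
print(f"P(damage) = {p_damage:.3f}")
```

Unlike a static event tree, the outcome here depends on when the operator acts: a late action still succeeds only if the temperature has not yet reached the limit.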

Civil aviation system-wide safety

The civil aviation system is an extremely complex web of private and governmental organisations operating or regulating flights involving diverse types of aircraft, ground support, and other physical and organisational infrastructures.

In contrast with many other complex systems, the aviation system may be characterised as an ‘open’ system, as there are many dynamic interfaces with outside organisations, commercial entities, individuals, physical systems, and environments.

Aircraft manufacturers, airlines, airport authorities, and regulatory/oversight agencies such as the US Federal Aviation Administration (FAA) are increasingly relying on predictive analytics to manage complex design and operational decisions. These methods include traditional statistical methods, operations research, reliability analysis, advanced machine learning techniques, and risk-informed decision-making.

One of the most advanced capabilities developed by the UCLA Garrick Institute researchers for the FAA is the Integrated Risk Information System (IRIS), a software platform to help the agency in risk-informing its safety oversight. IRIS uses a wide range of predictive analytics, including a new generation of system modelling known as Hybrid Causal Logic (HCL) methodology. HCL provides a multi-layered capability to fully and realistically capture the effect of factors that directly or indirectly impact civil aviation safety. The main layers include:
• A model to define safety context. This is done using a technique known as the Event Sequence Diagram (ESD) method that helps define the kinds of accident and incident scenarios that aviation should be concerned with.
• A model to capture the behaviours of the physical system (hardware, software, and environmental factors) as possible causes or contributing factors to accident and incident scenarios delineated by the ESDs. This is done using common system modelling techniques such as Fault Tree.
• A model to extend the causal chain of events to potential human and organisational roots. This is done using a Bayesian Network (BN). BNs are particularly useful since they do not require complete knowledge of causes and effects.
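
How the three layers can feed one another is illustrated by the minimal Python sketch below. It is not the IRIS or HCL implementation: the two-node Bayesian network, the single OR gate, the three-step event sequence, and all probabilities and names are hypothetical, and independence is assumed wherever probabilities are multiplied.

```python
# Layer 3 (Bayesian network): organisational factor -> human error.
# Hypothetical conditional probabilities of a maintenance error.
P_ERROR_GIVEN_TRAINING = {"poor": 0.05, "good": 0.01}

# Layer 2 (fault tree) and layer 1 (event sequence) parameters, all assumed.
P_HARDWARE_FAILS = 0.002
P_INITIATOR = 1e-3
P_CREW_FAILS_TO_RECOVER = 0.1

def maintenance_error_prob(p_training_poor):
    """Marginalise the BN over the 'training quality' parent node."""
    return (p_training_poor * P_ERROR_GIVEN_TRAINING["poor"]
            + (1 - p_training_poor) * P_ERROR_GIVEN_TRAINING["good"])

def accident_prob(p_training_poor):
    """Propagate the BN result through the fault tree into the event sequence."""
    p_error = maintenance_error_prob(p_training_poor)
    # Fault tree OR gate: system fails on hardware failure OR maintenance error.
    p_system_fails = 1 - (1 - P_HARDWARE_FAILS) * (1 - p_error)
    # Event sequence: initiator occurs, system fails, crew does not recover.
    return P_INITIATOR * p_system_fails * P_CREW_FAILS_TO_RECOVER

baseline = accident_prob(0.20)   # 20% chance that training is poor
improved = accident_prob(0.05)   # after a hypothetical training programme

# Diagnostic use of the BN via Bayes' rule: P(training poor | error observed).
p_poor_given_error = (0.20 * P_ERROR_GIVEN_TRAINING["poor"]
                      / maintenance_error_prob(0.20))

print(f"baseline accident probability: {baseline:.4e}")
print(f"after improved training:       {improved:.4e}")
print(f"P(poor training | maintenance error) = {p_poor_given_error:.3f}")
```

Changing one organisational input and re-running the chain is a small-scale version of asking what the safety impact of a change would be, while the Bayes' rule step runs the same model backwards from an observed event to its likely causes.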

The integrated model is, therefore, a hybrid causal model with the corresponding sets of analytical and computational procedures to quantify the event probabilities. IRIS offers a unifying framework for system safety assessment, hazard analysis, and risk analysis. As a causal model, it provides a vehicle for identifying cause-effect relations between various elements of the aviation system. Categories of causal factors include human activities (ground and flight crews, inspectors), organisational factors (airline management, FAA regulatory and oversight functions), hardware/software failures, and adverse conditions of the physical environment. IRIS software provides probabilistic answers to some of the most frequently raised questions regarding aviation safety:

• What is the current level of aviation safety?
• What are the most important contributors to aviation risk and hazards?
• What is the safety/risk impact of changing ‘x’ (e.g., introducing a new operating procedure)?
• What are the likely causes of a given incident/accident?
• How significant is a given ‘safety finding’ by inspectors?
• What should we use as Safety Performance Indicators?
• Is reducing the statistical rate of accidents (e.g., crash rates) the only way to know that we have improved safety? If not, how do we truly know that we have improved safety?

Fig. 2: Electric power network wildfire resilience management decision support platform

IRIS is:

• A platform to answer the above questions, answers that are reproducible and supported by a broad base of shared knowledge.
• A common platform for communication on safety and operational matters between the regulator (FAA) and the aviation industry.
• A platform to support designing and monitoring risk-informed regulation and oversight.
• A platform for communication of safety matters between the FAA and industry.
• Expandable in scope and depth to assess safety, security, and operational risks, identify and rank hazards, and analyse accident ‘precursors.’
• A platform that helps to identify common root causes in support of accident investigation.
• A platform that supports the identification and quantification of ‘safety performance indicators’.

Electric power network wildfire resilience assessment and management

Another application of modern predictive analytics is in assessing and improving the resilience of the complex electric power transmission and distribution network with respect to natural hazards such as wildfires. Wildfire events have been growing in frequency and intensity worldwide in recent years and not only threaten public safety but have recently resulted in billions of dollars in direct and indirect damages for single events. In the past few years, a number of fairly advanced predictive analytics have been developed and applied by the owners and operators of electric power networks in order to identify and assess the effectiveness of current and proposed preventive and mitigating technologies such as undergrounding powerlines, vegetation management around powerlines and substations, and smart selective public safety power shutoff (PSPS).

As a major advancement in providing wildfire risk management capability to network operators, the UCLA Garrick Institute, in collaboration with the Pacific Gas & Electric public utility company, has developed an integrated predictive analytics platform based on the Hybrid Causal Logic approach, which, as mentioned earlier, is also used for aviation systems safety management. The web-based software platform is designed as a decision support system in three different modes:

• Planning Mode for long-term risk management and decisions such as asset management strategies and prioritisation of wildfire risk mitigation options.
• Operational Mode for continuous risk monitoring and decision support based on real-time or near real-time information (e.g., meteorological conditions) to alert operators to changing risk levels and provide input to action decisions such as proactive PSPS.
• Event Mode for decision support during an active fire situation, including dynamic updating of risks associated with fire propagation and support for decisions on evacuating threatened communities.

The current scope is the assessment and management of risks due to wildfires caused by equipment failure. Predictive scenarios generated by the software are based on models for predicting the behaviour of the natural system, understanding causes of equipment failure, analysing deterministic and stochastic behaviour of wildfires, and understanding complex planning and decisions to mitigate the risks.

The software dashboard for the first mode, Planning Mode (see Fig. 2), provides ranked wildfire susceptibility of the individual powerlines and the aggregated system-level risks for the entire power network. It also provides the operators with a window into the causes and factors contributing to the risk and a set of quantitative measures of the consequences in the form of risk curves for public safety, financial loss, and duration of power loss to customers.
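
A risk curve of the kind mentioned above is commonly presented as an exceedance curve: the probability that a consequence exceeds a given level. The Python sketch below shows one simple way such a curve could be estimated from simulated scenarios. It does not reproduce the platform's models; the lognormal outage-duration distribution, its parameters, the thresholds, and the function name `exceedance_curve` are assumptions made for illustration.

```python
import random

def exceedance_curve(samples, thresholds):
    """Return (threshold, P(consequence > threshold)) pairs from samples."""
    n = len(samples)
    return [(t, sum(1 for x in samples if x > t) / n) for t in thresholds]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical consequence model: each simulated scenario yields a
# customer-outage duration in hours, drawn from a lognormal distribution.
outage_hours = [rng.lognormvariate(1.0, 0.8) for _ in range(10_000)]

thresholds = [1, 2, 5, 10, 24]
curve = exceedance_curve(outage_hours, thresholds)

for t, p in curve:
    print(f"P(outage > {t:>2} h) = {p:.3f}")
```

The same function could be applied to simulated financial losses or safety consequences, giving one curve per consequence measure.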

Predictive analytics at the heart of digital twins

The National Academy of Sciences defines a digital twin as: “A set of virtual information constructs that mimics the structure, context, and behaviour of a natural, engineered, or social system, is dynamically updated with data from its physical twin, has a predictive capability, and informs decisions that realise value.”¹ In pairing physical and digital twins, the physical system is equipped with sensors, data acquisition and data fusion capabilities. In contrast, the virtual twin possesses features such as modelling and simulation, AI, and first-principle mechanistic and empirical models. The two systems communicate (normally in real-time), with sensor data sent by the physical system to the digital twin and automated control and decisions flowing from the digital twin to the physical system. Predictive analytics are clearly essential to digital twin concepts for predicting the behaviour of the physical system. In fact, the same scenario generation and quantification capabilities discussed earlier could be used as the core of a digital twin’s decision support capabilities.
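
The update-predict-decide loop described above can be sketched with a deliberately simple Python example. It assumes a single component whose failures follow a Poisson process with an uncertain rate, and it uses the standard conjugate Gamma-Poisson update as the twin's "model"; a real digital twin would combine far richer physics and data. The class name `FailureRateTwin`, the prior, the sensor batches, and the decision threshold are all hypothetical.

```python
class FailureRateTwin:
    """Toy digital twin of one component: a Gamma belief over its failure rate."""

    def __init__(self, shape, rate):
        self.shape = shape  # Gamma shape (acts like a pseudo-count of failures)
        self.rate = rate    # Gamma rate (acts like pseudo exposure time, hours)

    def update(self, failures, hours):
        # Conjugate Gamma-Poisson update with newly observed field data.
        self.shape += failures
        self.rate += hours

    def mean_rate(self):
        # Posterior mean failure rate per hour.
        return self.shape / self.rate

    def prob_failure_within(self, hours):
        # Posterior predictive probability of at least one failure in 'hours':
        # 1 - E[exp(-lambda * t)] = 1 - (rate / (rate + t)) ** shape.
        return 1.0 - (self.rate / (self.rate + hours)) ** self.shape


twin = FailureRateTwin(shape=1.0, rate=1000.0)  # hypothetical prior belief
ALERT_THRESHOLD = 0.10                          # assumed decision threshold

# Hypothetical batches of (failures observed, operating hours) from sensors.
for failures, hours in [(0, 500.0), (2, 500.0)]:
    twin.update(failures, hours)
    p_fail = twin.prob_failure_within(100.0)
    action = "schedule inspection" if p_fail > ALERT_THRESHOLD else "continue"
    print(f"mean rate {twin.mean_rate():.5f}/h, "
          f"P(failure in next 100 h) = {p_fail:.3f} -> {action}")
```

Data flow from the physical system into `update`, and the prediction from `prob_failure_within` flows back as a recommended action, mirroring the two-way exchange between physical and digital twins.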

Challenges exist in computationally processing large volumes of data, conducting large-scale probabilistic simulation of system behaviour, and applying other predictive analytics techniques to complex systems. The magnitude of the challenge depends on the level of complexity of the engineered system and the required fidelity of the results. Efforts have been made to overcome these challenges with existing computational infrastructures, including parallel and cloud computing. In the long term, a possible solution could be found in quantum computation, which is emerging as a promising technology to address many of the most complex computational and system simulation challenges. Some key areas of focus include enhancing combinatorial optimisation solutions and accelerating sampling-based inferential approaches. This is also one of the areas of active research at the UCLA Garrick Institute.

The UCLA Garrick Institute for the Risk Sciences (GIRS) is dedicated to providing methods and technology for assessing and managing risks to society for the purpose of saving lives, protecting the environment, and the overall betterment of society. Founded in 2014, the Institute is the umbrella organisation for risk, reliability, and resilience research and related educational activities at UCLA. It has over 80 core, adjunct, and affiliate faculty members with diverse expertise in engineering and scientific domains.

References

1. Committee on Foundational Research Gaps and Future Directions for Digital Twins et al., Foundational Research Gaps and Future Directions for Digital Twins. Washington, D.C.: National Academies Press, 2024. doi: 10.17226/26894

Please note, this article will also appear in the 18th edition of our quarterly publication.
