ANALYSE -- Learning to Attack Cyber-Physical Energy Systems With Intelligent Agents

The ongoing penetration of energy systems with information and communications technology (ICT) and the introduction of new markets increase the potential for malicious or profit-driven attacks that endanger system stability. To ensure security-of-supply, it is necessary to analyze such attacks and their underlying vulnerabilities, to develop countermeasures and improve system design. We propose ANALYSE, a machine-learning-based software suite to let learning agents autonomously find attacks in cyber-physical energy systems, consisting of the power system, ICT, and energy markets. ANALYSE is a modular, configurable, and self-documenting framework designed to find yet unknown attack types and to reproduce many known attack strategies in cyber-physical energy systems from the scientific literature.


Motivation and Significance
Energy systems worldwide are growing more complex as information and communications technology (ICT) is added for monitoring and control and as energy markets are included. This results in faster time scales, smart controllable loads, and more automation and intelligence in the system. Despite the intended gains in controllability, adaptability, and cost efficiency, the resulting cyber-physical energy systems (CPES) pose new challenges for system operators because of their increased complexity [1]. While system complexity increases, grid operators are under growing economic pressure to maintain grid stability at minimal cost and investment. It must be ensured that no shortcuts are taken in terms of robustness.
Meanwhile, energy systems are increasingly targeted by cyber attacks, terrorism, warfare, and other unwanted interventions from adversarial players. Recent examples are the ongoing attacks on the Ukrainian power system since 2017 and on the Nord Stream gas pipelines in 2022. Primarily due to the increasing complexity and interleaving of these interconnected systems, we expect a growing potential for such attacks [2], with potentially catastrophic consequences. Besides attacks with malicious intent, another possible motivation is to maximize profit by exploiting flawed energy market designs [3]. The potential consequences can be the same, however: the destabilization of interconnected energy systems.
Therefore, an important research task is to develop methodologies and tools to identify systemic vulnerabilities, potential attack vectors, and unwanted destabilization incentives in existing and future CPES. This way, preventive measures can be taken to remove vulnerabilities or to deploy countermeasures. One major issue is that the still unknown attack vectors are the most relevant ones because no defense mechanism exists for them, similar to zero-day attacks in cyber security. While a growing body of literature investigates attacks on power systems, it mainly considers rather predictable attacks or variants of known strategies. However, given the system-of-systems characteristics of CPES, future attacks can exploit the system's increasing complexity and interleaving, making them impossible to foresee at design time with traditional approaches.
One emerging approach to searching for unknown attack possibilities is to place learning attacker agents within the system, e.g., [3,4,5]. In the literature, mostly deep reinforcement learning (DRL) agents are trained to maximize damage in the system, for example, by creating a blackout [4]. In reinforcement learning (RL), an agent uses trial and error to find actions in an environment that maximize a reward function [6]. Consequently, learning attacker agents will autonomously develop strategies to attack the system if such strategies are possible and rewarded during training. From a research perspective, this allows us to extract new and unknown attack vectors and to stress-test a given CPES in simulation. This is a prerequisite for developing defensive strategies in the future. However, the learning agent can only utilize the degrees of freedom and information it is given. This is especially relevant in research, where over-simplified scenarios are often considered. For example, in current research, the different layers of a CPES, i.e., the physical component layer, the ICT layer, and the function layer, are typically modeled and evaluated separately. Such a limited action space may restrict the agent's ability to find new strategies, resulting in a false sense of security if no attack possibilities are found.
An artificial intelligence does not necessarily require a simplified system. DRL algorithms in particular have demonstrated that they can handle highly complex tasks, e.g., mastering the game of Go [7]. Therefore, one next step in this research area is to move from simplified scenarios to more realistic and interconnected ones, yielding more realistic attack strategies. Applied to the energy system, this means that ICT, energy grids, and markets should be investigated more holistically as a CPES to account for their interdependencies.
In the following, we present ANALYSE (ANAlyzing compLex cYber-physical Systems wEaknesses), a tool-suite that uses learning agents to analyze coupled power grid, ICT, and market systems for vulnerabilities and potential attack strategies. The main features of this software artifact are as follows:
• ANALYSE is the first open-source tool-suite that combines power grid, energy market, and ICT infrastructure into one coupled simulation to analyze them for vulnerabilities.
• It allows one or multiple learning (or non-learning) agents to be placed into the system, with access to sensors and actuators in all three domains.
• ANALYSE is modular, configurable, self-documenting, and provides advanced data logging.
Since ANALYSE utilizes a co-simulation framework, it is easily extensible, for example, if alternative or additional domains need to be considered. This way, ANALYSE is designed to reproduce known attack strategies and scenarios from the literature or to find novel, more complex attack vectors spanning multiple domains.
In section 2, we provide an overview of ANALYSE and its components. Further, we present its key functionalities. In section 3, we provide a short illustrative example of how the tool-suite works. In section 4, we discuss the potential impact of ANALYSE before we conclude our work in section 5.

Software Description: ANALYSE
ANALYSE describes the concept of multiple frameworks and modular simulators that work together as a tool-suite to analyze agent misbehavior in CPES. We have developed multiple open-source tools that fulfill different sub-tasks towards that general goal but can also be used independently. This way, we achieve the best possible re-usability of the different software parts and a highly modular software architecture. In the following, we will present the interplay of the ANALYSE components, the details of each component, and the overall functionalities.

Software Architecture
The architecture and interplay of ANALYSE and its components are shown in Figure 1, which we describe in the following section from top to bottom. The goal is to analyze attack scenarios in CPES. The scenarios are defined declaratively in YAML files and are the starting points of ANALYSE experiments. They define which power grid to use, how many agents act in that environment, their objectives, etc. The first step of an experiment is performed by arsenAI, which translates the scenario file into multiple experiment runs by applying techniques from design-of-experiments (DoE).
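The translation of one scenario file into several experiment runs can be pictured as a design-of-experiments expansion. The following minimal sketch assumes a full-factorial design over a hypothetical `factors` dictionary; the function name and the factor names are illustrative and not taken from arsenAI, which may apply other sampling strategies.

```python
from itertools import product


def expand_runs(factors):
    """Expand {factor: [levels]} into one dict per experiment run (full factorial)."""
    names = sorted(factors)  # fixed order keeps the expansion reproducible
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical scenario factors, for illustration only.
runs = expand_runs({"grid": ["4-bus", "medium-voltage"], "num_agents": [1, 2, 3]})
for index, run in enumerate(runs):
    print(index, run)
```

Two grid variants and three agent counts yield six experiment runs, each of which could then be executed independently.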
The experiment runs are performed by palaestrAI, an RL framework that allows the training of one or multiple agents in an RL environment. To build modular RL environments, we developed MIDAS, which assembles a single environment from multiple simulators (in this case, a power system simulator, a market simulator, and an ICT simulator) by using the co-simulation framework mosaik. During the whole simulation and training, all relevant data is logged into a database. These data can later be analyzed regarding system vulnerabilities, noteworthy emerging agent strategies, potential countermeasures, etc. The different components will be presented in more detail in the following sections.

Mosaik and MIDAS
The open-source co-simulation framework mosaik is, on a technical level, responsible for data exchange and synchronization of the different simulators [8]. Mosaik follows a concept of so-called simulators and models. A model can be anything, from a simple function to a complex simulation environment, while a simulator implements the mosaik API for that model. A simulator can manage several models. To allow different models to exchange data, they must be connected via mosaik. The synchronization is done using individual but fixed time intervals for each simulator.
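The effect of individual but fixed time intervals can be illustrated without the mosaik API. The following sketch is plain Python, not mosaik code; the simulator names and step sizes are assumptions chosen for illustration. It computes at which simulation times each simulator would be stepped.

```python
def build_timeline(step_sizes, until):
    """Map each simulation time to the simulators that step at that time."""
    timeline = {}
    for name, step in step_sizes.items():
        for time in range(0, until, step):
            timeline.setdefault(time, []).append(name)
    return dict(sorted(timeline.items()))


# Hypothetical step sizes in seconds; each simulator keeps its own fixed interval.
timeline = build_timeline({"power_grid": 900, "market": 900, "ict": 300}, until=1800)
for time, simulators in timeline.items():
    print(time, simulators)
```

In this toy setting, the ICT simulator steps three times as often as the grid and the market, and all three coincide every 900 seconds, which is when data could be exchanged between all of them.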
The more simulators and models are involved, the more complex the orchestration process of starting and connecting models and managing data flows becomes. Therefore, we developed the open-source framework MIDAS, which handles the assembly of mosaik scenarios and comes with some pre-configured smart grid simulators and scenarios. MIDAS also enables aggregating all mosaik simulators into an environment for RL agents, defining actions, observations, and rewards.

Power System Simulator
ANALYSE uses the power grid simulator pandapower to simulate a power grid with its topology and power flows [9]. To achieve more realistic behavior, we also added simulators for publicly available load profiles and a simulator for a photovoltaic (PV) model that takes real weather data as input. Since the PV models depend on environmental conditions, we added a simulator for publicly available weather data as well. Although all those simulators are individual components of the mosaik scenario, we refer to them collectively as the power system simulator in the following for simplicity.

Reactive Power Market Simulator
As a market system, we added a local reactive power market. In this type of market, the grid operator procures reactive power from generators and other energy resources in the local system, mainly to perform voltage control [10]. Reactive power markets are still under active research and were shown to be susceptible to profit-oriented attacks [3,11] and the exercise of market power [10,12]. We chose a reactive power market because its local nature allows for clear system boundaries, in contrast to, e.g., the wholesale energy market [3]. Most reactive power market models from the literature are based on solving an optimal power flow (OPF), i.e., the grid operator solves an optimization problem to determine the optimal reactive power procurement [10]. That is problematic for DRL, which often requires thousands to millions of iterations. To reduce computational effort, we implemented a non-OPF-based market, where the grid operator accepts the cheapest and most useful reactive power offers to perform voltage control in the system.
To achieve competition, we also implemented some basic non-learning agents participating in the market.
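A non-OPF-based clearing rule can be sketched as a simple merit order. The example below is an assumption-laden illustration, not the ANALYSE market implementation: it ranks offers by price only and ignores how useful an offer is for voltage control, and the function name, offer format, and units are hypothetical.

```python
def clear_market(offers, demand_kvar):
    """Accept the cheapest offers until the reactive power demand is covered.

    offers: list of (provider, quantity_kvar, price_per_kvar) tuples.
    Returns the accepted (provider, quantity_kvar, price_per_kvar) tuples.
    """
    accepted = []
    remaining = demand_kvar
    for provider, quantity, price in sorted(offers, key=lambda offer: offer[2]):
        if remaining <= 0:
            break
        taken = min(quantity, remaining)  # partial acceptance of the marginal offer
        accepted.append((provider, taken, price))
        remaining -= taken
    return accepted


# Hypothetical offers from three PV systems and a demand of 800 kvar.
offers = [("pv_1", 500, 12.0), ("pv_2", 500, 8.0), ("pv_3", 500, 10.0)]
accepted = clear_market(offers, demand_kvar=800)
print(accepted)
```

Such a rule needs only one sort per market step, which keeps the cost per RL iteration low compared to solving an optimization problem.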

rettij ICT Simulator
The rettij ICT simulator has been developed to support CPES research regarding ICT security [13]. It provides a simple-to-use yet scalable network simulator that can be combined with co-simulation frameworks like mosaik. rettij makes use of the Linux network stack, as its network components (i.e., switches, routers, and hosts) are based on containerization technology and use Kubernetes as an orchestrator. The network simulator's architecture makes it possible to simulate realistic network traffic without the noise of management traffic that often occurs in plain container communication. rettij does not aim to simulate specific network technologies such as cellular or WiFi but takes a more general approach in which a normal media access control (MAC) layer is simulated. However, network channels can be throttled, and delays and packet losses can be introduced to mimic such communication technologies. Network topologies are defined declaratively in YAML configuration files. Any custom component can be used within the simulation as long as it can run in a Docker container. For ANALYSE, we implemented interfaces that allow for interaction between rettij and the agents to control and monitor parts of the network infrastructure. Sensors provide the ability to read network states, such as the utilization of network interfaces, while actuators offer the possibility to manipulate data in the network or restart targeted nodes. A more in-depth description of rettij and a standalone version of a co-simulation usage example can be found in [13,14].

palaestrAI: Learning Agents Framework
palaestrAI is a framework to train and evaluate learning agent systems. It differs from existing DRL frameworks in its clear separation of agents, environment, and experimental framework. palaestrAI focuses on the reliable execution of experiments with learning agents in complex environments, including co-simulations. It allows the implementation of various learning agents (DRL, neuroevolution, etc.) and a variety of environments.
The palaestrAI subpackage arsenAI parses experiment documents and uses statistical means to generate several experiment run definitions with specific parameter combinations defined in the experiment document. Each experiment run is reproducible; feeding the same experiment run document to different palaestrAI instances will produce the same experimentation result each time, which helps counter the AI reproducibility crisis [15,16,17]. The key design goal of palaestrAI was to facilitate complex experiments in which agents act in different, co-simulated simulators that form an RL environment while learning from the experiences gathered from all simulators.

Software Functionalities
The core of ANALYSE is the co-simulation ability gained from the mosaik interface. It interfaces to separate simulators for power grid, market, and ICT. Through palaestrAI's approach, these three simulators can be treated as one large RL environment. For example, an agent can observe the power grid's current state and decide to bid on the market based on this observation. The bid is then communicated to the market by ICT components. The goal is to simulate potential attack vectors for analysis, such as manipulating the power grid through malicious assets, shutting an actor out of the market with denial-of-service (DoS) attacks, or gaming the market through an agent that controls multiple assets. More importantly, ANALYSE allows learning agents to discover weaknesses that emerge through the interconnection of three complex systems.
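The chain of observing the grid, bidding, and transporting the bid over ICT can be sketched with three toy stand-ins. All classes, field names, and numbers below are hypothetical placeholders, not ANALYSE, MIDAS, or palaestrAI interfaces; the sketch only shows how a denial-of-service attack on the network can shut an agent out of the market.

```python
from dataclasses import dataclass, field


@dataclass
class Bid:
    agent: str
    quantity_kvar: int
    price: float


@dataclass
class ToyGrid:
    voltage_pu: float = 1.06

    def observe(self):
        return {"voltage_pu": self.voltage_pu}


@dataclass
class ToyNetwork:
    link_up: bool = True

    def deliver(self, message):
        # A denial-of-service attack is modeled as a dropped message.
        return message if self.link_up else None


@dataclass
class ToyMarket:
    bids: list = field(default_factory=list)

    def submit(self, bid):
        self.bids.append(bid)


def bidding_policy(observation):
    # Offer more reactive power when the observed voltage is high.
    quantity = 400 if observation["voltage_pu"] > 1.05 else 100
    return Bid(agent="pv_agent", quantity_kvar=quantity, price=9.0)


def environment_step(grid, network, market, policy):
    """One coupled step: observe the grid, bid, and route the bid through ICT."""
    observation = grid.observe()
    delivered = network.deliver(policy(observation))
    if delivered is not None:
        market.submit(delivered)
    return delivered is not None


normal_market, attacked_market = ToyMarket(), ToyMarket()
print(environment_step(ToyGrid(), ToyNetwork(link_up=True), normal_market, bidding_policy))
print(environment_step(ToyGrid(), ToyNetwork(link_up=False), attacked_market, bidding_policy))
```

With the link down, the bid never reaches the market, so the effect of an ICT-layer attack becomes visible in the market layer.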
To investigate and analyze a variety of potential attack scenarios, we emphasize the following key features.

Co-Simulation and Modularity
ANALYSE uses mosaik to aggregate multiple independently developed simulators to create a single RL environment. This results in a modular tool where simulators can be added, removed, or exchanged easily. For example, a simulator of the gas system or the balancing power market could be added.

Self-Documentation with Run Files
Inherited from palaestrAI, experiments in ANALYSE are defined with run files. Run files are YAML files that define which RL algorithm, environment, actuators, sensors, objectives, etc. are used for the experiment. This improves documentation of large-scale experimentation because the exact experiment definition for every experiment is always automatically stored. Further, this allows us to investigate diverse variants of one environment. Usually, in RL research, the environment is a fixed benchmark environment. However, in applied RL, we often want to investigate multiple variants of the environment, e.g., to explore the consequences of different design decisions, like controller design, market design, etc.

Logging and Evaluation
To evaluate experiments, we designed a custom vulnerability-analyzer component. Its core concept stems from cyber security, where log data from different sources is collected, harmonized, and shipped to a security information and event management (SIEM) system. The SIEM allows analyzing the data and correlations of events. To simplify log collection, the ANALYSE components create event logs and simulation status logs. Those logs are shipped to an Elasticsearch database and can be visualized with Elastic's Kibana interface. This typical Elastic stack architecture allows efficient threat hunting [18] for ICT security and is used here for experiment evaluation. This flexible tool chain makes it possible to integrate further data sources, moving towards a big data platform for detecting interdependent attacks.
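Harmonizing logs from different simulators amounts to agreeing on a shared record structure before shipping. The following sketch uses only the Python standard library and writes newline-delimited JSON; the field names are assumptions for illustration and do not reflect the actual ANALYSE log schema or an Elasticsearch client call.

```python
import json


def make_event(source, event_type, sim_time, details):
    """Build one harmonized log record shared by all simulators."""
    return {
        "source": source,          # e.g. "market", "ict", "power_grid"
        "event_type": event_type,
        "sim_time": sim_time,      # simulation time in seconds
        "details": details,
    }


def to_json_lines(events):
    """Serialize events as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(event, sort_keys=True) for event in events)


# Hypothetical events from two domains at the same simulation time.
events = [
    make_event("market", "bid_accepted", 900, {"agent": "pv_agent", "kvar": 300}),
    make_event("ict", "node_restart", 900, {"node": "switch_0"}),
]
log_text = to_json_lines(events)
print(log_text)
```

Because both records carry the same `sim_time` field, events from the market and the ICT layer can be correlated after they have been indexed.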

Illustrative Example
The following example demonstrates the capabilities of our software tool-suite ANALYSE. Figure 2 shows a simple scenario with a small power grid, a local reactive power market, and the underlying ICT. The power grid consists of a 4-bus network with two PV panels at buses 3 and 4, respectively, managed by four agents. Each agent can offer the flexibility of its PV on the market. The market operator communicates with the market agents via an ICT network. To simplify the ICT network, we have connected them to one switch. The ICT network is also used by the market operator to communicate to the PVs how much reactive power they should deliver, i.e., how much power was procured on the market. Such scenarios can be created in YAML files. Figure 3 shows configuration snippets for our example. The core element is the schedule, which defines the different phases of the scenario, e.g., training and testing of the learning agents. For each phase, configuration parameters for environments and agents need to be provided. For the environment, this includes the configuration for the co-simulation scenario with MIDAS/mosaik. The behavior of a palaestrAI agent is defined by its brain and muscle, which together form the agent, its objective, and the available sensors and actuators.
[Figure 3: run file excerpt beginning with uid: "minimal example" and the schedule list.]
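The structure described above (a schedule of phases, each with environments and agents, and each agent with brain, muscle, objective, sensors, and actuators) can be mirrored in a small data structure. The sketch below is a hypothetical Python rendering with placeholder values; apart from `uid` and `schedule`, the key names and layout are assumptions and should not be read as the palaestrAI run file schema.

```python
REQUIRED_AGENT_KEYS = {"brain", "muscle", "objective", "sensors", "actuators"}


def missing_agent_keys(run):
    """Return {(phase, agent): missing keys} for agents that are under-specified."""
    problems = {}
    for phase in run["schedule"]:
        for agent in phase["agents"]:
            missing = REQUIRED_AGENT_KEYS - set(agent)
            if missing:
                problems[(phase["name"], agent["name"])] = sorted(missing)
    return problems


# Placeholder run definition; every value is illustrative.
run = {
    "uid": "minimal example",
    "schedule": [
        {
            "name": "training",
            "environments": [{"name": "cosimulation_scenario"}],
            "agents": [{
                "name": "attacker",
                "brain": "placeholder_brain",
                "muscle": "placeholder_muscle",
                "objective": "placeholder_objective",
                "sensors": ["grid_voltage"],
                "actuators": ["market_bid"],
            }],
        },
        {
            "name": "testing",
            "environments": [{"name": "cosimulation_scenario"}],
            "agents": [{
                "name": "attacker",
                "brain": "placeholder_brain",
                "muscle": "placeholder_muscle",
                "objective": "placeholder_objective",
            }],
        },
    ],
}
print(missing_agent_keys(run))
```

A check of this kind shows why a declarative definition is useful: an incomplete phase, here the testing phase without sensors and actuators, can be detected before any simulation is started.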

Impact
The impact of ANALYSE can be derived directly from its features and the co-developed frameworks. As mentioned before, most research in the field focuses on single domains and sub-systems. The coupling of power systems, markets, ICT, and potentially other domains enables researchers to look at new research questions that focus on interconnected multi-domain systems and their interrelations. For example, ANALYSE has the potential to investigate research questions like "How much can profit on the energy market be increased when we apply false data injection at point xy in the system?" or "How much of the ICT needs to be compromised by an attacker to create a blackout?". Normally, such research questions would require extensive implementation and modeling. With ANALYSE, the co-simulation only needs to be supplemented by the required simulators and models with little implementation effort, if they do not exist already.
Second, we can place learning agents at almost arbitrary places in the system. By defining their reward function as maximizing damage, we have a general tool to investigate potential attack strategies in CPES. Similarly, it is possible to find market exploits by defining market profit as the reward [3]. The opposite is conceivable, too. By defining the reward function as maximizing, e.g., system stability or welfare, we can determine possible defensive strategies, like controller settings, by learning them. It is therefore also possible to place both attacker and defender agents into the system to investigate their interrelations, which is the adversarial resilience learning (ARL) approach [19]. However, the main focus of ANALYSE is attack and vulnerability analysis.
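The symmetry between attacker and defender objectives can be made concrete with a toy damage measure. In the sketch below, damage is assumed to be the summed violation of a voltage band of 0.95 to 1.05 per unit; this measure, the band limits, and the function names are illustrative assumptions and not the reward functions used in ANALYSE.

```python
def voltage_violation(voltages_pu, lower=0.95, upper=1.05):
    """Sum of per-bus voltage band violations in per unit."""
    return sum(max(0.0, lower - v) + max(0.0, v - upper) for v in voltages_pu)


def attacker_reward(voltages_pu):
    # The attacker is rewarded for pushing voltages out of the band.
    return voltage_violation(voltages_pu)


def defender_reward(voltages_pu):
    # The defender receives the negated damage, i.e., it is rewarded for stability.
    return -voltage_violation(voltages_pu)


voltages = [1.00, 1.08, 0.93]  # hypothetical bus voltages in per unit
print(attacker_reward(voltages), defender_reward(voltages))
```

With both agents trained on such opposed rewards in the same environment, their interaction forms a zero-sum setting, which is one simple way to frame attacker and defender interplay.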
Besides enabling new research questions, ANALYSE will also prove helpful for existing research directions. Again, modularity and configurability are the key functionalities. For example, Wolgast et al. [3] investigate attack scenarios in a coupled energy and reactive power market system. With ANALYSE, it would be possible to reproduce that research by removing the ICT simulator and defining the RL actuators, sensors, and reward according to the objective in the paper. Chen et al. [20] apply false data injection attacks to attack automatic voltage control in the power network. Their work could be reproduced by removing the market simulation and again defining the RL problem accordingly. In conclusion, ANALYSE builds a foundation to reproduce existing research in the field, investigate different variants of these scenarios, or increase their complexity. ANALYSE is still newly published. However, we believe it can provide an important contribution to investigating unwanted attack scenarios in CPES and to deriving potential countermeasures.

Conclusions
ANALYSE is a co-simulation-based tool-suite that uses DRL to find and analyze attack strategies in cyber-physical energy systems. Its current version combines a power system, ICT, and an energy market implementation. We designed ANALYSE to be modular, configurable, and self-documenting to allow for the investigation of diverse research questions. This way, ANALYSE is designed to reproduce a broad range of existing research and to find new attack strategies in interconnected multi-domain systems.
Currently, ANALYSE focuses on a single learning attacker agent, but the long-term idea is to allow for multi-agent systems, add defender agents, analyze their interplay, and improve system design based on the findings.

Conflict of Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Figure 2: ANALYSE: Illustrative example consisting of a power grid, a local market, market participants, the communication network, and the attacker agent.