# Real-Time Multi-Rate Electromagnetic Transient Simulation on Conventional CPUs

P. Le-Huy, S. Guérette

*Abstract*—This paper presents a multi-rate simulation approach for Hydro-Québec's real-time simulation software (Hypersim). This approach does not rely on specialized hardware; instead, it schedules conventional CPUs differently. The modifications required to enable multi-rate simulation are discussed. A three-rate application example is presented in which a wind power plant is simulated in great detail while the related power transmission system is simulated with a larger time step. This multi-rate simulation is carried out on a regular workstation, the latest RT hardware platform for Hypersim. These regular PCs offer excellent RT performance and are affordable. However, they have limited computing resources and offer no scalability. Nonetheless, combined with multi-rate simulation, this low-cost solution can be used for a vast number of cases.

*Keywords*: Electromagnetic transient, multi-rate, real-time, simulation, wind turbine generator.

## I. INTRODUCTION

TRADITIONAL electromagnetic transient (EMT) hardware-in-the-loop (HIL) studies are limited in scope by the computational burden and the availability of expensive real-time (RT) simulation hardware.

EMT simulation of power systems, as originally described in [1], was computationally expensive for the computers of the time, and it remains costly today because of the increased complexity of the simulations: complex power electronic apparatus, high-fidelity modeling of nonlinear phenomena, sophisticated control and protection systems, wide-area simulation, etc. In addition, several of these complex models and power electronic devices require a high simulation bandwidth (small time step), whereas line-frequency phenomena are accurately represented with time steps in the range of 50 $\mu$s. Consequently, parts of the simulated power system do not require such bandwidth. This additional burden is even more costly in RT, as it involves finer partitioning to respect the RT constraint, which in turn leads to a greater parallelization cost in terms of both hardware resources and execution time [2].

This issue can be addressed in several ways.

- Raw computing power can be applied if an EMT representation is required for the whole simulation. However, RT performance with very small time steps is not trivial to achieve and can require a lot of hardware resources [2].
- Multiple simulation domains can be combined to represent different parts of the simulation. By using simpler and faster models for certain parts, it is possible to reduce the hardware resources required to simulate a specific power system in RT. However, RT operation of EMT combined with other modeling approaches is neither a trivial nor a widespread solution [3]-[5].
- Multiple simulation rates, each better suited to a given part of the simulation, can be used. This multi-rate (MR) approach is very attractive and is typically used with specialized hardware to simulate high-frequency power electronic devices or devices requiring many real-world IOs [6]-[7]. The specialized hardware typically operates as a black or gray box (internal observability, if available, is limited) with a smaller time step than the related RT EMT simulation. MR simulation on conventional hardware is also possible in RT with smaller or larger time steps [8]. However, that specific implementation limits the rate ratio to between 2 and 5 times the base rate of the simulation.

This paper presents the work done to enhance Hydro-Québec's RT simulator in order to increase the availability of RT simulators and expand the scope of possible studies on a fixed amount of computational resources. It starts with a brief discussion of the latest evolution of the Hypersim RT hardware platform and its impact on the RT software architecture. The modifications required to implement MR RT simulation are then presented: the cornerstone of MR lies in the synchronization barrier, but MR also affects all inter-thread and inter-process communications. Section IV then presents an MR application example in which three different rates are used to simulate a simple power transmission system, a wind power plant and its control system. The simulation implementation, the RT simulation setup, the results and the RT performance are discussed. The paper closes with a few concluding remarks.

## II. HYPERSIM RT EMT SIMULATOR

Hydro-Québec's real-time simulator is a large-scale multiprocessor simulator used for power system studies as well as for the development, validation, tuning and commissioning of control systems [9]. The computational effort is automatically spread across the available processors using the natural propagation delay of the transmission lines. As a result, the large system impedance matrix is divided into several smaller submatrices which can be solved in parallel by many processors without introducing any error, thus drastically increasing the simulation speed [10].

Philippe Le-Huy and Sylvain Guérette are Research Engineers with IREQ, Hydro-Québec's research institute (System Simulation and Evolution unit), 1800 Lionel-Boulet, Varennes, Québec, Canada, J3X 1S1 (e-mail of corresponding author: le-huy.philippe@ireq.ca).

Paper submitted to the International Conference on Power Systems Transients (IPST2019) in Perpignan, France, June 17-20, 2019.
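The line-based decoupling used to spread the computational effort relies on the travel time of transmission lines: in a travelling-wave (Bergeron-type) line model, the two ends of a line interact only through history terms computed one travel time earlier, so a line whose delay is at least one time step can separate two subsystems that are then solved independently. The sketch below checks this condition. It assumes lossless lines with a wave speed close to the speed of light; the function names and the `velocity_fraction` parameter are illustrative only and are not part of Hypersim.

```python
# Minimal sketch (hypothetical helpers, not Hypersim code): a travelling-wave
# line can split the network into independent subsystems only if its
# propagation delay is at least one simulation time step.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def travel_time_us(length_km: float, velocity_fraction: float = 1.0) -> float:
    """Propagation delay of a lossless line, in microseconds.

    velocity_fraction is the assumed wave speed as a fraction of the speed
    of light (close to 1.0 for overhead lines).
    """
    velocity_km_per_s = SPEED_OF_LIGHT_KM_PER_S * velocity_fraction
    return length_km / velocity_km_per_s * 1e6


def can_decouple(length_km: float, time_step_us: float,
                 velocity_fraction: float = 1.0) -> bool:
    """True if the line delay covers at least one time step."""
    return travel_time_us(length_km, velocity_fraction) >= time_step_us


def min_length_km(time_step_us: float, velocity_fraction: float = 1.0) -> float:
    """Shortest line that can still decouple subsystems at this time step."""
    return time_step_us * 1e-6 * SPEED_OF_LIGHT_KM_PER_S * velocity_fraction


if __name__ == "__main__":
    for dt_us in (5.0, 40.0, 80.0):
        print(f"dt = {dt_us:4.0f} us -> minimum line length "
              f"{min_length_km(dt_us):5.1f} km")
```

Under these assumptions, a line must be about 24 km long to act as a decoupling point at 80 $\mu$s, about 12 km at 40 $\mu$s and only about 1.5 km at 5 $\mu$s.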

The next two subsections describe the latest evolution of the hardware platform and of the software architecture.

### A. Hardware Platform

In the late 1990s, in order to achieve RT performance, several tens of computation units had to work together. Each of those units came with its own motherboard and IOs. As such, Hypersim had to be optimized for large-scale supercomputers with several tens of motherboards, with all the associated communication and synchronization difficulties. Originally, each of these motherboards had only a single socket hosting a single-core CPU.

As CPU technology advanced, each socket came to host a CPU with tens of computation cores and enough IOs to communicate effectively with its neighboring sockets and with the PCI bus to access real-world IOs or specialized coprocessors such as FPGAs or GPUs. This led to supercomputers with hundreds to thousands of computation cores. To cope with such a number of computation cores, Hypersim had to be slightly modified [2].

However, such systems are expensive, even for utilities, and since a majority of simulation cases can be addressed with a relatively small number of cores, efforts were put into exploring low-cost, low-core-count alternative hardware platforms for RT simulation.

It was found that RT simulation is possible on regular single-socket workstations, such as the HP Z440 used for the application example of Section IV, with impressive performance at the computational level as well as for communication and synchronization. Internally, these RT workstations are called RTPCs. However, these workstations cannot be scaled up if their computational resources are insufficient for the task at hand, and they have few PCI slots for real-world IOs or specialized coprocessors.

Overall, their strong performance, availability and low cost make these workstations ideal for scale-limited RT studies.

### B. RT Software Architecture

The RTPC software architecture for RT simulation remains almost identical to the one used on supercomputers, with the exception of the main server. Supercomputers have a multiuser main server which allows several users to simulate on the available resources. On RTPCs, as simulation resources are rather limited, the main server only allows a single simulation to be active, as illustrated in Fig. 1.

It is important to note that, like all other Hypersim hardware platforms, the RTPCs do not host the graphical user interface or other applications. The simulation hardware platform only hosts the server and the simulation processes, for maximum performance. The workstation depicted in Fig. 1 is a separate PC, which may be located anywhere as long as it can make RPCs to the different servers (main and simulation).



Fig. 1. RTPC software architecture for maximum RT thread isolation. The communication mechanisms (Remote Procedure Call (RPC), shared memory (shmem), Direct Memory Access (DMA) and real voltages and currents) are identified with different colors.

## III. RT MULTI-RATE SIMULATION

Hypersim's main activities during a time step are illustrated in Fig. 2. All operations requiring communication between the various threads or processes involved in the simulation had to be modified to operate in MR: exchanges with the simulation server, simulation thread synchronization and inter-task communications. The required modifications are detailed in the following subsections, but the overall structure of a time step remains unaltered. Additional details on the MR approach can be found in [5], as this mechanism was also used for RT EMT-transient stability co-simulation.



Fig. 2. Conceptual Hypersim simulation time step execution flow.

### A. Synchronization Barrier

The synchronization barrier had to be modified to manage different synchronization rates, as illustrated in Fig. 3. The time steps of the first and second rates, $T_1$ and $T_2$, must be integer multiples of the base time step $T_b$, i.e., $T_1 = N_1 T_b$ and $T_2 = N_2 T_b$, and $N_2$ must in turn be an integer multiple of $N_1$. For each rate, a distinct synchronization barrier is created with a defined set of cores to block. The global barrier, where all the simulating cores are synchronized, is associated with the slowest simulation rate, $T_2$ in this case. The $T_b$ barrier blocks only the cores operating at the base rate, while each of the other barriers blocks all the cores operating at its specific rate and faster (i.e., the $T_1$ barrier blocks both the $T_1$ and $T_b$ cores). To direct the cores to the correct barrier, the required information was added to the simulation management flags of the synchronization core. This core must be a base rate core to maintain correct service handling (e.g., monitoring, signal acquisition, disturbance/fault triggering and parameter changes).
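The barrier selection described above can be sketched as follows, using the 5, 40 and 80 $\mu$s rates of Section IV (factors 1, 8 and 16). This is a simplified, single-threaded model written for illustration: it assumes rates are expressed as integer factors of the base step and that a barrier is reached at the end of each base step. The `Rate` class and the `check_rates` and `barrier_at` functions are hypothetical and only reproduce the selection logic, not the actual blocking of cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    name: str
    factor: int  # time step expressed as an integer multiple of T_b


def check_rates(rates):
    """Check the MR constraints: a base rate exists and factors nest."""
    factors = sorted(r.factor for r in rates)
    if factors[0] != 1:
        raise ValueError("the fastest rate must be the base rate (factor 1)")
    for faster, slower in zip(factors, factors[1:]):
        if slower % faster != 0:
            raise ValueError(f"{slower} is not an integer multiple of {faster}")


def barrier_at(step, rates):
    """Barrier reached at the end of base step `step` (counted from 1).

    The barrier of the slowest rate completing a time step is used; it
    blocks the cores of that rate and of every faster rate.
    """
    completing = [r for r in rates if step % r.factor == 0]
    slowest = max(completing, key=lambda r: r.factor)
    blocked = [r.name for r in sorted(rates, key=lambda r: r.factor)
               if r.factor <= slowest.factor]
    return slowest.name, blocked


if __name__ == "__main__":
    rates = [Rate("Tb", 1), Rate("T1", 8), Rate("T2", 16)]
    check_rates(rates)
    for step in (1, 8, 16):
        print(step, *barrier_at(step, rates))
```

At step 8 only the $T_b$ and $T_1$ cores meet, while at step 16 the global $T_2$ barrier blocks every core; a factor set such as 1, 8, 12 is rejected because 12 is not a multiple of 8.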



Fig. 3. Synchronization barriers for MR (for clarity, not all possible communication links are illustrated).

### B. Inter-task Communications

Hypersim's inter-task communication engine was also modified to take the different simulation rates into account. A standard double-buffer communication scheme is used for all communications. Each communication link is tagged with the rate of its slower end. With that information, only the required communication links are updated at each base rate step, which reduces the load on the communication engine.
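A minimal sketch of rate-tagged, double-buffered links follows. It assumes each link publishes its data by swapping its two buffers at the end of a time step of its slower end; the `DoubleBufferLink` class and the `update_links` function are hypothetical names, and the memory layout of the actual engine is not described in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class DoubleBufferLink:
    factor: int  # rate of the slower end, as a multiple of the base step
    write_buf: list = field(default_factory=lambda: [0.0])
    read_buf: list = field(default_factory=lambda: [0.0])

    def write(self, values):
        # The producer fills the write buffer during its step.
        self.write_buf = list(values)

    def read(self):
        # The consumer only ever sees the last published data.
        return self.read_buf

    def swap(self):
        # Publish: written data becomes readable for the next step.
        self.write_buf, self.read_buf = self.read_buf, self.write_buf


def update_links(step, links):
    """Swap only the links whose slower end completes a step at `step`."""
    swapped = 0
    for link in links:
        if step % link.factor == 0:
            link.swap()
            swapped += 1
    return swapped


if __name__ == "__main__":
    links = [DoubleBufferLink(1), DoubleBufferLink(8), DoubleBufferLink(16)]
    total = sum(update_links(step, links) for step in range(1, 17))
    print(f"{total} swaps over 16 base steps instead of {16 * len(links)}")
```

With the factors 1, 8 and 16 of Section IV, these three links are swapped 19 times over one 80 $\mu$s global period instead of 48 times if every link were updated at every base step.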

### C. IOs

Signals from all operating rates can be assigned to IOs and the procedure is the same regardless of the simulation rate. No modifications were required to the IO system: IOs are serviced normally regardless of the read/write rate.

However, better execution performance is achieved if all the IOs related to an IO expansion chassis originate from the same time-rate domain.

### D. Simulation Services

The different simulation rates involved in the MR approach had to be taken into account in how the simulation server (Figs. 1 and 2) communicates with the simulation threads. Minor modifications, such as extending timeouts, were required for most services, but signal acquisition and simulation termination required more adaptation.

For signal acquisition, it was necessary to keep track of exactly when the user requested the acquisition in order to properly pad the signal acquisition buffers of the threads operating at rates 1 and 2. This is necessary because base rate threads start filling their acquisition buffers as soon as they receive the acquisition request, while slower threads might receive the acquisition order several time steps later. For simulation termination, a synchronization mechanism was implemented to stop all threads at the barrier of the slowest rate. In both cases, the relevant information is passed in the simulation management flags.
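The bookkeeping behind the padding and the coordinated stop can be sketched with base-step counters. This sketch assumes that a thread only sees a new management flag at the end of one of its own time steps (a multiple of its rate factor) and that acquisition buffers are stored on the base-rate time grid, so padding is counted in base steps. The function names and the `fill` value used for padding are hypothetical, since the paper does not state what the padding contains.

```python
def first_visible_step(request_step: int, factor: int) -> int:
    """First base step at which a thread of this rate sees a request.

    Assumes flags are only read at the thread's own barrier, i.e. at a
    multiple of its rate factor (factor 1 = base rate).
    """
    return ((request_step + factor - 1) // factor) * factor


def padding_length(request_step: int, factor: int) -> int:
    """Base-step slots missing before a slower thread starts recording."""
    return first_visible_step(request_step, factor) - request_step


def pad_buffer(samples, request_step: int, factor: int, fill=None):
    """Prepend padding so the buffer is aligned on the request step."""
    return [fill] * padding_length(request_step, factor) + list(samples)


def termination_step(request_step: int, slowest_factor: int) -> int:
    """Stop every thread together at the next barrier of the slowest rate."""
    return first_visible_step(request_step, slowest_factor)
```

For example, with the 40 $\mu$s rate (factor 8), a request made at base step 3 reaches the slower thread at step 8, so its buffer needs 5 padding slots, while a stop requested at step 17 with an 80 $\mu$s slowest rate (factor 16) takes effect at step 32.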

## IV. APPLICATION EXAMPLE

A wind power plant (WPP) connected to a simple 7-machine power system, shown in Fig. 7, is used to illustrate the MR approach. The base time step for this simulation is 5 $\mu$s, while the other time steps are 40 and 80 $\mu$s.
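With the notation of Section III-A, these three time steps satisfy the integer-multiple constraint on the barriers:

$$
N_1 = \frac{T_1}{T_b} = \frac{40\ \mu\text{s}}{5\ \mu\text{s}} = 8, \qquad
N_2 = \frac{T_2}{T_b} = \frac{80\ \mu\text{s}}{5\ \mu\text{s}} = 16, \qquad
\frac{N_2}{N_1} = 2.
$$

Within each 80 $\mu$s global period, the base-rate tasks therefore complete 16 time steps, the 40 $\mu$s tasks complete 2 and the 80 $\mu$s tasks complete 1.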

### A. Wind Power Plant

Based on the type-4 model presented in [11], the 200 MW WPP is represented by a single wind turbine generator (WTG) with an aggregated collector system (see Fig. 7). The converters (boost, chopper and inverter), the low-level control (pulse generation and protective features), the filters, the collector system and the step-up transformer are all simulated at the base rate (5  $\mu$ s). The mechanical parts, the synchronous generator and the high-level control system are simulated with a time step of 40  $\mu$ s. Table I contains the main WPP parameters.

### B. Power System

With a total generating capacity of 29.2 GW, excluding the WPP contribution, the simulated power system supplies a 24.5 GW load through a 735 kV transmission system (only major loads are represented in Fig. 7). The seven generating units represent hydraulic power generation, with all the associated control systems, including power system stabilizers. Load 1 is divided into two parts (constant impedance and dynamic load), while Load 2 is constant impedance only. A three-phase fault is placed at the point of common coupling (PCC) with the WPP. This power system is simulated at 80 $\mu$s, and its main characteristics are presented in Table I.

### C. Complete System Implementation

The simulated system implementation is also illustrated in Fig. 7: CPU #1 handles the base rate simulation (5 $\mu$s), while CPUs #2 and #3 simulate at 40 and 80 $\mu$s, respectively.

As described previously, the low-level control is simulated at the base rate in order to provide sufficient time resolution for the PWM pulse generator. As the WTG converter operates at 2 kHz, the base rate provides a 1% time resolution for the PWM, which is adequate in this case. Furthermore, the base rate simulation requires iterations for numerical stability: without iterations, the many naturally-switched devices (i.e., diodes) can adopt an incorrect state and produce very large uncharacteristic voltage and current spikes that rapidly lead to divergence. The maximum number of iterations was fixed at four in this case to respect the RT constraint: left unlimited, this case uses at most six iterations, but it would then not respect the RT constraint when also servicing real-world IOs. The impact of limiting the maximum number of iterations to four is subtle and unnoticeable to the naked eye: the WTG behavior, the protection schemes and the overall system response are unaffected. However, this limit frees around 1.4 $\mu$s to service the real-world IOs.
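The two figures quoted in this paragraph follow from simple arithmetic: the PWM resolution is the base step relative to one carrier period, and the freed time is the difference between the six-iteration and four-iteration execution times reported for CPU #1 in Table II.

$$
T_{\text{PWM}} = \frac{1}{2\ \text{kHz}} = 500\ \mu\text{s}, \qquad
\frac{T_b}{T_{\text{PWM}}} = \frac{5\ \mu\text{s}}{500\ \mu\text{s}} = 1\,\%
$$

$$
\Delta t_{\text{freed}} = t_{\text{exec}}(6\ \text{iterations}) - t_{\text{exec}}(4\ \text{iterations}) = 5.1\ \mu\text{s} - 3.7\ \mu\text{s} = 1.4\ \mu\text{s}
$$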

Simulating the WTG control system and synchronous machine at the base rate is unnecessary, as their behavior is accurately represented with a larger time step such as 40 $\mu$s. The same applies to the rest of the power system, simulated at 80 $\mu$s. Both of these simulation tasks are interfaced with the base rate simulation task through a classical ideal transformer decoupling scheme with delay compensation. Iterations are not necessary for CPUs #2 and #3.
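The principle of an ideal transformer model (ITM) interface between two rates can be illustrated on a purely resistive circuit. This is a simplified sketch, not Hypersim's implementation: it assumes the fast side sees the slow side as a held current source and the slow side sees the fast side as a held voltage source, and it omits the delay compensation mentioned above, whose details are not given here. The function `simulate_itm` and its parameters are hypothetical.

```python
def simulate_itm(v_source=1.0, r_source=0.5, r_load=2.0,
                 factor=8, n_base_steps=400):
    """Multi-rate ITM interface on a purely resistive circuit (sketch).

    Fast side, solved every base step: v_source behind r_source, with the
    interface represented by a current source holding the last current
    published by the slow side.
    Slow side, solved every `factor` base steps: r_load fed by a voltage
    source holding the last interface voltage published by the fast side.
    Returns the interface voltage and current after n_base_steps.
    """
    i_held = 0.0  # current seen by the fast side (held between slow steps)
    v_held = 0.0  # voltage seen by the slow side (held between slow steps)
    for step in range(1, n_base_steps + 1):
        # Fast side: interface node voltage with the held current injection.
        v_fast = v_source - r_source * i_held
        if step % factor == 0:
            # Slow side samples the latest fast-side voltage ...
            v_held = v_fast
            # ... and publishes the resulting current for the next period.
            i_held = v_held / r_load
    return v_held, i_held


if __name__ == "__main__":
    print(simulate_itm())                          # stable: R_s < R_L
    print(simulate_itm(r_source=2.0, r_load=1.0))  # unstable: R_s > R_L
```

In this resistive example each slow update computes $i_k = (v_s - R_s i_{k-1})/R_L$, which converges to the exact current $v_s/(R_s + R_L)$ (0.4 for the default values) only when $R_s < R_L$; with the resistances swapped, the exchanged quantities oscillate and grow, which illustrates the classical ITM stability condition on the impedance ratio of the two sides.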

Execution times for all CPUs are presented in Table II as well as other characteristics to help understand each CPU's workload.

TABLE I

POWER SYSTEM PARAMETERS

|              | Parameter                   | Value                                                  |
|--------------|-----------------------------|--------------------------------------------------------|
| WPP          | Station transformer         | 735 kV / 25 kV, 220 MVA                                |
|              |                             | R = 0.00266 pu, L = 0.08 pu                            |
|              | Equivalent collector system | Req = 0.016 pu, Xeq = 0.059 pu                         |
|              |                             | Beq = 0.032 pu                                         |
|              | Wind turbine transformer    | 25 kV / 575 V, 220 MVA                                 |
|              |                             | Rc = 0.0016 pu, Lc = 0.05 pu                           |
|              |                             | Cfilter = 15 MVar (Q = 50)                             |
|              | Converter                   | Protection resistance: $R_p = 6\ \text{m}\Omega$       |
|              |                             | DC bus capacitor: $C = 9\ \text{F}$                    |
|              |                             | Boost inductance: $L = 1.2\ \mu\text{H}$ ($R = 5\ \mu\Omega$) |
|              | SG                          | 730 V, 220 MVA                                         |
| Power system | Total generation            | 29.2 GW                                                |
|              | Total load                  | 24.5 GW                                                |
|              |                             | (18 GW constant Z, 6.5 GW dynamic)                     |
|              | Voltage levels              | Generation: 13.8 kV                                    |
|              |                             | Transmission: 735 kV                                   |
|              |                             | Loads: 25 kV                                           |
|              | Total line length           | 4060 km                                                |
|              | Series compensation         | 15-40 %                                                |

### D. Simulation Setup

The application example was simulated on the setup pictured in Fig. 4. Instead of the usual SGI (now HPE) supercomputer (e.g. [2], [5], [9]-[11]), the simulation was done on an HP Z440 workstation equipped with an Intel Xeon E5-1650v3 (6 cores, 3.5 GHz) CPU (RTPC02 in Fig. 4). A One Stop Systems x4 adapter card is used in RTPC02 to connect an Opal-RT OP5607 IO expansion chassis for real-world IOs. A Tektronix scope completes the simulation setup.

### E. RT Results and Performance

The system response to a three-cycle three-phase fault at the PCC on the power system side is illustrated in Figs. 5 and 6. During the fault, as the voltage at the PCC drops to zero, power transfer to the power system drops to zero as well: the WTG cannot output the collected wind energy, so its internal DC voltage starts to rise. As it crosses the safe operating threshold, the DC voltage protection scheme is activated to shed the excess energy into the chopper resistor. The pitch of the blades is reduced to limit the amount of wind energy harvested by the WTG. When the fault is cleared, power transfer is restored and WTG operation returns to normal.

Table II lists the content of each simulation task and its performance metrics. The IOs, the communication between the simulation thread and the simulation server, and the synchronization consume a little more than 1 $\mu$s on each of the three CPUs. For the base rate CPU, this is obviously critical and limits the maximum number of iterations to four for RT operation. For the 40 and 80 $\mu$s rates, it is of no consequence, as CPU #2 is very lightly loaded and CPU #3 has more than 7 $\mu$s to spare.
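A rough way to reproduce this budget reasoning is to add, for each CPU, the execution time and the per-step overheads reported in Table II and compare the sum with the time step. This is a back-of-the-envelope sketch under the assumption that these components run strictly one after another within a step; it is not a description of how Hypersim measures RT margins, and CPU #3 is omitted because its overheads are not itemized in Table II.

```python
# Per-step times in microseconds, as reported in Table II. The additive
# model below is an assumption: execution, IOs, synchronization and
# services are treated as strictly sequential within one time step.
CPU1_EXEC_US = {4: 3.7, 5: 4.5, 6: 5.1}  # execution time per iteration cap
CPU1_OVERHEAD_US = {"ios": 0.4, "synch": 0.4, "services": 0.1}
CPU2_EXEC_US = 1.9
CPU2_OVERHEAD_US = {"ios": 0.4, "synch": 0.4, "services": 0.4}


def step_time_us(exec_us: float, overhead_us: dict) -> float:
    """Total time spent in one step under the additive model."""
    return exec_us + sum(overhead_us.values())


def fits(exec_us: float, overhead_us: dict, time_step_us: float) -> bool:
    """True if the step completes within the RT time step."""
    return step_time_us(exec_us, overhead_us) <= time_step_us


if __name__ == "__main__":
    for iterations, exec_us in CPU1_EXEC_US.items():
        total = step_time_us(exec_us, CPU1_OVERHEAD_US)
        print(f"CPU #1, {iterations} iterations: {total:.1f} us of 5 us, "
              f"fits = {fits(exec_us, CPU1_OVERHEAD_US, 5.0)}")
    total = step_time_us(CPU2_EXEC_US, CPU2_OVERHEAD_US)
    print(f"CPU #2: {total:.1f} us of 40 us")
```

Under this simple additive model, only the four-iteration cap fits within the 5 $\mu$s base step, which is consistent with the limit discussed above, while CPU #2 uses only a small fraction of its 40 $\mu$s step.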

A regular single-rate RT simulation of this system at 5 $\mu$s would not be trivial to accomplish, as it would require more computational resources (a multi-socket hardware platform would be required), and synchronization and communication costs would increase accordingly. For applications requiring small time steps, this MR approach on conventional CPUs is therefore quite promising.



Fig. 4. Simulation setup for the application example: RTPC02 (simulating) and RTPC03 (used for another project) (HP Z440 workstation, Intel Xeon E5-1650v3 (6 cores, 3.5 GHz)); Opal-RT OP5607 IO expansion chassis and a Tektronix TPS 2014 scope.



Fig. 5. Application of a three-phase fault (3-cycle duration) at the PCC (CPU #3). Left: PCC voltages, currents and power (CPU #1, kV, A, MW; negative power is sent to the power system); right: synchronous generator M1 (CPU #3) and WTG synchronous generator (CPU #2) speed (p.u.); WTG internal DC voltage (CPU #1, V) and WTG blade pitch angle (CPU #2, degrees).



Fig. 6. Application of a three-phase fault (3-cycle duration) at the PCC (CPU #3). CH1 and CH2: PCC voltage and current, phase A (CPU #1, 400 kV/div, 100 A/div); CH3: WTG internal DC voltage (CPU #1, 400 V/div); CH4: synchronous generator M1 speed (CPU #3, DC offset removed, 0.005 p.u./div).

These results demonstrate the computational power of the RTPC, here an HP Z440, a low-cost, entry-level workstation. It is a good example of how RTPCs can tackle a demanding but scope-limited simulation. The MR approach helps broaden that scope by allowing different parts of the simulation to operate at different rates without compromising thread synchronization and communications, real-world IOs or simulation services.

## V. CONCLUSIONS

This paper explained the MR approach available in Hypersim and illustrated its use with an application example consisting of an aggregated WPP connected to a simple power transmission system. With the multi-barrier synchronization scheme, different parts of the simulation truly operate at different rates, i.e., they are not simply executed once every $N_1$ or $N_2$ steps. These simulation tasks can work without interruption for the whole duration of their time step, $T_1$ or $T_2$.

The latest work on the Hypersim hardware platform was also discussed. To address internal needs for affordable and compact RT simulation setups, the use of conventional PCs was explored; they provide powerful yet limited RT platforms. Combined with the MR approach, however, these resource-limited PCs have the potential to handle a vast array of simulation cases, in the same manner as illustrated by the application example.

## VI. REFERENCES

- [1] H. W. Dommel, "Digital Computer Solution of Electromagnetic Transients in Single and Multiphase Networks," *IEEE Trans. Power Apparatus and Systems*, vol. PAS-88, no. 4, pp. 388-399, 1969.
- [2] P. Le-Huy, M. Woodacre, S. Guérette, É. Lemieux, "Massively Parallel Real-Time Simulation of Very-Large-Scale Power Systems," IPST'17, Seoul, Republic of Korea, June 26-29, 2017.
- [3] V. Jalili-Marandi, V. Dinavahi, K. Strunz, J. A. Martinez, A. Ramirez, "Interfacing techniques for transient stability and electromagnetic transient programs IEEE task force on interfacing techniques for simulation tools," *IEEE Trans. Power Del.*, vol. 24, no. 4, pp. 2385-2395, Oct. 2009.
- [4] Y. Zhang, A. M. Gole, W. Wu, B. Zhang, H. Sun, "Development and analysis of applicability of a hybrid transient simulation platform combining TSA and EMT elements," *IEEE Trans. Power Syst.*, vol. 28, no. 1, pp. 357-366, Feb. 2013.
- [5] P. Le-Huy, G. Sybille, P. Giroux, L. Loud, J. Huang, I. Kamwa, "Real-Time Electromagnetic Transient and Transient Stability Co-Simulation Based on Hybrid Line Modelling," *IET Gen., Trans. & Dist.*, vol. 11, no. 12, pp. 2983-2990, Sept. 2017.
- [6] Y. Chen, V. Dinavahi, "Multi-FPGA digital hardware design for detailed large-scale real-time electromagnetic transient simulation of power systems," *IET Gener. Transm. Distrib.*, vol. 7, no. 5, pp. 451-463, 2013.
- [7] J. K. Debnath, A. M. Gole, W.-K. Fung, "Graphics-Processing-Unit-Based Acceleration of Electromagnetic Transients Simulation," *IEEE Trans. Power Delivery*, vol. 31, no. 5, pp. 2036-2044, 2016.
- [8] RTDS.com, RTDS Superstep, 2018. [Online]. Available: https://www.rtds.com/superstep/. [Accessed: 31-Oct-2018].
- [9] V. Q. Do, J.-C. Soumagne, G. Sybille, G. Turmel, P. Giroux, G. Cloutier, S. Poulin, "Hypersim, an Integrated Real-Time Simulator for Power Networks and Control Systems," ICDS'99, Vasteras, Sweden, May 25-28, 1999.
- [10] D. Paré, G. Turmel, J.-C. Soumagne, V. A. Do, S. Casoria, M. Bissonnette, B. Marcoux, D. McNabb, "Validation tests of the Hypersim digital real time simulator with a large AC-DC network," IPST'03, New Orleans, USA, Sept. 28 - Oct. 2, 2003.
- [11] O. Tremblay, R. Gagnon, M. Fecteau, "Real-Time Simulation of a Fully Detailed Type-IV Wind Turbine," IPST'13, Vancouver, Canada, July 18-20, 2013.



Fig. 7. MR application example: a WPP (200 MW WTG with aggregated collector system) running at a base rate of 5 µs and controlled at 40 µs is connected to a simple 7-machine power system simulated with an 80 µs time step. Load 1 contains a constant impedance load and a dynamic load. Load transformers are not illustrated.

TABLE II

RT SIMULATION PERFORMANCE

| CPU | Nodes | Transformers | Passives | Sources | Switches | Comms<br>in / out | Execution time<br>(µs) | IOs<br>(µs) | Synch.<br>(µs) | Services<br>(µs) |
|-----|-------|--------------|----------|---------|----------|-------------------|------------------------|-------------|----------------|------------------|
| 1 | 37 | 9 | 46 | 6 | 15 | 8 / 17 | 3.7 (4 iterations)<br>4.5 (5 iterations)<br>5.1 (6 iterations) | 0.4 | 0.4 | 0.1 |
| 2 | WTG control system and synchronous generator | | | | | 14 / 5 | 1.9 | 0.4 | 0.4 | 0.4 |
| 3 | Power system: 132 states, 238 nodes, 8 machines | | | | | | 72.4 | | | |