# FPGA-based Simulation of Grid-tied Converters using Frequency-dependent Network Equivalent

Fahimeh Hajizadeh, Alireza Masoom, Tarek Ould-Bachir, Jean-Pierre David

Abstract: This paper introduces a real-time simulation framework for grid-tied converters, implemented on field-programmable gate arrays (FPGAs). The framework incorporates a Frequency-Dependent Network Equivalent (FDNE) to reduce the part of the original circuit that is not directly under study to a frequency-dependent admittance model, enabling precise modeling of the power network's frequency-dependent dynamics while streamlining the onboard simulation and modeling process. The proposed framework is implemented on the Alveo U280 FPGA, achieving sub-microsecond latencies, low resource utilization, and high computational fidelity across various data types, including single-precision, double-precision, and customized floating-point formats. Numerical tests and validation were conducted using a high-voltage power network that includes detailed models of transmission lines, loads, and a Static Synchronous Compensator (STATCOM). Simulation results show strong alignment with reference models developed in EMTP, and the implementation achieves faster-than-real-time performance. These findings demonstrate the effectiveness of the proposed solution in delivering high-speed, resource-efficient, and scalable real-time simulations, providing a promising approach for testing and validating advanced control strategies in modern power systems.

Keywords: FPGA, FDNE, Real-time simulation, Electromagnetic simulation, STATCOM

# I. INTRODUCTION

The integration of power converters in modern power systems has become essential to accommodate the growing adoption of distributed energy resources (DERs) such as wind turbines, solar photovoltaics, and charging stations for electric vehicles [1].
Power converters are essential for connecting renewable energy sources to the grid, facilitating efficient energy transfer, and improving grid stability through active and reactive power management [2], [3]. They also facilitate cleaner energy consumption and help address intermittency and fluctuating load demands [4], [5], [6]. However, the extensive use of power electronic devices, especially in grids with many DERs, poses distinct challenges to grid stability and requires advanced control solutions to address potential power quality issues. In particular, voltage source converters (VSCs) and other inverter-based resources can interact with conventional synchronous machines and other grid elements, causing oscillations, higher rates of change of frequency, and frequency overshoots [7], [8].

This research was funded in part by a collaborative research and development grant from CRIAQ/NSERC, in partnership with the industrial collaborators Bombardier Aviation, Pratt & Whitney Canada Inc., OPAL-RT, and IDS North America Ltd. F. Hajizadeh is with the Department of Electrical Engineering, Polytechnique Montréal, QC, H3T 0A3 Canada (e-mail: fahimeh.hajizadeh@polymtl.ca). A. Masoom is with Hydro-Québec Research Institute (IREQ), Varennes, QC, J3X 1S1 Canada (e-mail: masoom.alireza@hydroquebec.com). T. Ould-Bachir is with the MOTCE Laboratory, DGIGL, Polytechnique Montréal, QC, H3T 0A3 Canada (e-mail: t.ould-bachir@polymtl.ca). J. P. David is with the Department of Electrical Engineering, Polytechnique Montréal, QC, H3T 0A3 Canada (e-mail: jean-pierre.david@polymtl.ca). Paper submitted to the International Conference on Power Systems Transients (IPST2025) in Guadalajara, Mexico, June 8-12, 2025.

Moreover, control strategies to manage the dynamic performance of these converters must be meticulously designed to minimize harmonic distortion and maintain unity power factor for improved grid resilience [9]. As the use of such devices grows, it is essential to establish effective control methodologies to manage interactions with traditional grid infrastructure.

This paper presents a simulation framework for grid-connected converters to address these challenges, designed using field-programmable gate arrays (FPGAs). The framework integrates a Frequency-Dependent Network Equivalent (FDNE) model to accurately represent the frequency-dependent behavior of the grid. The framework achieves sub-microsecond latencies while preserving computational accuracy. A Static Synchronous Compensator (STATCOM) is employed as a test case to validate the effectiveness of the proposed approach. The primary contributions of this work include:

1) The development of two novel FDNE integration approaches based on state-space equations, enabling efficient real-time simulation of an FDNE-integrated STATCOM model on FPGA. These formulations leverage matrix-based computations, optimizing execution speed and numerical stability.
2) The implementation of a resource-efficient FPGA framework, utilizing high-level synthesis (HLS) for FPGA programming and integrating Customized Floating-Point based (CuFP-based) arithmetic. CuFP allows customizable precision, balancing accuracy and hardware efficiency to achieve optimal FPGA utilization.
3) A demonstration that the proposed model achieves faster-than-real-time performance, significantly reducing simulation latencies. This capability positions the framework as a powerful tool for accelerating electromagnetic transient (EMT) simulation applications.
4) An in-depth analysis of resource utilization and computational trade-offs, emphasizing the role of the CuFP library in delivering optimal results.

The remainder of this paper is organized as follows: Section II provides an overview of the key concepts. Section III outlines the proposed implementation methodology. Section IV describes the test case used to assess the proposed model's performance compared to the commercial tool EMTP®. Section V offers a comprehensive analysis of the FPGA implementation, focusing on latency, resource utilization, and accuracy. Finally, Section VI presents the conclusions.

### II. BACKGROUND

### A. FPGA-based Power System Simulation

FPGAs have been used in real-time simulation applications for many years, particularly in hardware-in-the-loop (HIL) configurations. The inherent parallelism of FPGAs allows for the simulation of complex power electronic circuits with small time steps, ensuring high fidelity and precise interfacing with physical controllers [10], [11]. Beyond real-time simulation, research has extended the use of FPGAs to faster-than-real-time (FTRT) simulation. This approach allows for simulating the behavior of systems in less time than their actual operation, which is beneficial for offline analysis, optimization, and design validation [12], [13].

The authors in [10] provide a comprehensive review of the current state of real-time simulation technologies for power systems. The study focuses on digital real-time simulation (DRTS) and HIL simulation, examining their evolution, computing capabilities, common features, hardware and software components, and solution methodologies across various simulator platforms. The work has demonstrated the simulation of power electronics systems with time steps in the range of a few microseconds. The paper [14] emphasizes the role of real-time simulation technologies in design, prototyping, testing, and teaching, categorizing applications by field, fidelity, and multiphysics aspects, with a focus on transmission and distribution systems.
It highlights that the time-step in real-time simulation is critical and varies by application: microsecond-range steps are used for HVdc systems and EMT simulations, while millisecond-range steps are typical for phasor simulations. Paper [15] presents a wide-band multi-port system equivalent for real-time digital simulators, integrating an FDNE for high-frequency EMT and Transient Stability Analysis (TSA) for electromechanical transients. This approach enables accurate simulation of both fast and slow power system dynamics while reducing hardware costs. The multi-port equivalent can be directly connected to the system boundary, allowing TSA to run on a real-time platform. The achieved time-steps vary by simulation type, with 25-50 µs for EMT simulations and 1-2 ms for TSA solutions.

### B. FDNE

An FDNE reduces the computational burden of transient simulations of large networks by dividing the network under study into two zones, the study zone and the external zone, under the assumption that the external zone has a minor impact on a given transient study occurring within the study zone. An FDNE model consists of a rational model [16] whose coefficients are calculated to match the frequency response of the subnetwork to be replaced (the external zone) over a finite frequency band:

$$\mathbf{Y}(s) \cong \mathbf{Y}_{fitted}(s) = \mathbf{G}_0 + s\mathbf{E} + \sum_{k=1}^n \frac{\mathbf{R}_k}{s - p_k}$$ (1)

where $\mathbf{Y}(s)$ is a $p \times p$ frequency-dependent admittance matrix; $p_k$ and $\mathbf{R}_k$ are the poles and residue matrices, respectively; $n$ denotes the number of poles of the model; and $\mathbf{G}_0$ and $\mathbf{E}$ are constant matrices ($\mathbf{E}$ is typically a zero matrix). The proposed method is sensitive to the quality of the rational fitting of the FDNE, as poor fitting can lead to inaccuracies in the simulation results.

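As a minimal sketch of how the rational model in (1) is evaluated, the Python snippet below computes one scalar entry of $\mathbf{Y}_{fitted}(j\omega)$. The function name `fitted_admittance` and the series R-L branch used as an example (2 Ω, 10 mH) are illustrative assumptions, not values from the test case. An R-L branch is chosen because its admittance $1/(R + sL)$ is exactly a one-pole rational function, with pole $-R/L$ and residue $1/L$, so the result can be checked against the closed form.

```python
import math


def fitted_admittance(s, g0, e, poles, residues):
    """Evaluate one scalar entry of (1): G0 + s*E + sum_k R_k / (s - p_k)."""
    return g0 + s * e + sum(r / (s - p) for p, r in zip(poles, residues))


# Hypothetical example: series R-L branch, Y(s) = 1 / (R + s*L).
# Dividing numerator and denominator by L gives (1/L) / (s + R/L),
# i.e. a single pole p = -R/L with residue 1/L and G0 = E = 0.
R, L = 2.0, 0.01
poles = [-R / L]
residues = [1.0 / L]

for f_hz in (1.0, 60.0, 1e3, 1e6):
    s = 2j * math.pi * f_hz
    y_fit = fitted_admittance(s, 0.0, 0.0, poles, residues)
    y_ref = 1.0 / (R + s * L)
    print(f"{f_hz:>9.1f} Hz  |Y_fit - Y_ref| = {abs(y_fit - y_ref):.2e}")
```

In a real FDNE, the poles and residues come from a fitting procedure rather than a closed form, and complex poles appear in conjugate pairs so that the time-domain response is real.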
The FDNE model can be expressed using state-space equations as follows:

$$\begin{bmatrix} \dot{\mathbf{x}}(t) \\ \mathbf{i}_F(t) \end{bmatrix} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix} \begin{bmatrix} \mathbf{x}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (2)

where $\mathbf{x}(t)$ is the FDNE state vector. The matrices $\mathbf{A}$ and $\mathbf{C}$ contain the FDNE model poles and residues, with dimensions $(pn) \times (pn)$ and $p \times (pn)$, respectively. The matrix $\mathbf{B}$ consists of ones and zeros, with a size of $(pn) \times p$. The matrix $\mathbf{D}$, with dimensions $p \times p$, corresponds to the constant term $\mathbf{G}_0$ in (1). The input and output vectors $\mathbf{v}(t)$ and $\mathbf{i}_F(t)$ represent voltages and currents, respectively. The Backward Euler method is applied to solve the state-space equations, so that each time-step is computed from precomputed system matrices, reducing computational complexity while preserving accuracy:

$$\begin{bmatrix} \mathbf{x}(t + \Delta t) \\ \mathbf{i}_{F}(t) \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{d} & \mathbf{B}_{d} \\ \mathbf{C}_{d} & \mathbf{D}_{d} \end{bmatrix} \begin{bmatrix} \mathbf{x}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (3)

where $\mathbf{A}_d$, $\mathbf{B}_d$, $\mathbf{C}_d$, and $\mathbf{D}_d$ are the discrete versions of matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$, respectively.

# C. FPGA-based FDNE Simulation

FPGA-based FDNE simulations have gained prominence for achieving ultra-low latency and high parallelism, enabling sub-microsecond time steps. In [17], the potential of real-time FDNE models was explored on both CPUs and FPGAs, focusing on parallelizing the FDNE algorithm and proposing a partitioning method to optimize computations. Moreover, an FPGA-based FDNE model for accurate real-time simulation of aircraft power systems was presented in [18], employing FDNE models to precisely represent aircraft power cables within an HIL simulation environment.
Our work achieves sub-microsecond time-steps for a more intricate test case, highlighting the capability of FPGA-based simulations to efficiently model complex FDNE systems with excellent accuracy and precision for both real-time and FTRT applications.

### D. HLS FPGA Programming

HLS bridges the gap between software-oriented design and hardware-level implementation, offering a streamlined approach to developing complex systems [19]. By enabling developers to describe digital system behavior using high-level programming languages like C or C++, HLS tools simplify FPGA programming. Furthermore, HLS tools automate numerous optimization tasks, reducing manual effort and significantly accelerating the development process.

The CuFP library enhances FPGA resource efficiency while ensuring scalability and computational accuracy, making it a powerful tool for future advancements in grid-tied converter simulations [20]. This work demonstrates how CuFP leverages high-level descriptions to reduce computational latency, optimizing performance while maintaining precision and adaptability in complex scenarios.

Fig. 1. A +100 Mvar/-100 Mvar STATCOM implemented by two-level VSC.

Fig. 2. Detailed view of the STATCOM.

# III. IMPLEMENTATION METHODOLOGY

# A. Brief Test Case Presentation

Fig. 1 illustrates the test circuit implemented in EMTP® and used for voltage regulation in electrical power systems. This model represents a 500 kV, 100 MVA STATCOM connected to a 500 kV bus, typically used to stabilize the voltage at a high-voltage transmission line. The STATCOM includes a two-level voltage source converter (VSC) block, a power transformer, and associated control and protection systems. The detailed two-level topology is used for the VSC, and each valve comprises one IGBT switch, two non-ideal (series and anti-parallel) diodes, and a snubber circuit, as shown in Fig. 2. Each diode is represented by a non-ideal switch.
The figure displays the configuration of the STATCOM and its integration within the power system, highlighting the components such as the coupling transformer, AC filters, and the control mechanism that manages the reactive power injection or absorption.

# B. Integration of STATCOM and FDNE

This section examines two distinct approaches for integrating the STATCOM with the FDNE. While the equations presented in this work are explicitly derived for the STATCOM test case, the underlying methodology is broadly applicable and can be easily generalized to other grid-tied converter applications.

1) Approach 1: To characterize the dynamic behavior of the STATCOM, we start by defining its mathematical formulation as outlined in [21]:

$$\begin{bmatrix} \mathbf{i}_h(t+\Delta t) \\ \mathbf{i}_C(t) \end{bmatrix} = \begin{bmatrix} \mathbf{H}_{11}^{\sigma} & \mathbf{H}_{12}^{\sigma} & \mathbf{H}_{13}^{\sigma} \\ \mathbf{H}_{21}^{\sigma} & \mathbf{H}_{22}^{\sigma} & \mathbf{H}_{23}^{\sigma} \end{bmatrix} \begin{bmatrix} \mathbf{i}_h(t) \\ \mathbf{v}_{in}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (4)

where $\mathbf{H}_{ij}^{\sigma}$ are the matrices associated with switch combinations that result from the algebraic manipulations of the MANA matrix, the superscript $\sigma$ refers to the switch combination, $\mathbf{i}_h$ are the history terms of the STATCOM, $\mathbf{v}_{in}$ are the internal voltage sources, $\mathbf{v}$ are the external voltage sources, and $\mathbf{i}_C$ is the current drawn from the external sources (FDNE). The size of the matrix $\mathbf{H}$ depends on the number of history terms and the number of internal and external voltages.

The state update and output expressions in (4) can be separated, as shown in (5) and (6):

$$\mathbf{i}_{h}(t + \Delta t) = \begin{bmatrix} \mathbf{H}_{11}^{\sigma} & \mathbf{H}_{12}^{\sigma} & \mathbf{H}_{13}^{\sigma} \end{bmatrix} \begin{bmatrix} \mathbf{i}_{h}(t) \\ \mathbf{v}_{in}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (5)

$$\mathbf{i}_{C}(t) = \begin{bmatrix} \mathbf{H}_{21}^{\sigma} & \mathbf{H}_{22}^{\sigma} & \mathbf{H}_{23}^{\sigma} \end{bmatrix} \begin{bmatrix} \mathbf{i}_{h}(t) \\ \mathbf{v}_{in}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (6)

The same separation of state update and output expressions can be applied to the FDNE in (3), giving (7) and (8):

$$\mathbf{x}(t + \Delta t) = \begin{bmatrix} \mathbf{A}_d & \mathbf{B}_d \end{bmatrix} \begin{bmatrix} \mathbf{x}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (7)

$$\mathbf{i}_F(t) = \begin{bmatrix} \mathbf{C}_d & \mathbf{D}_d \end{bmatrix} \begin{bmatrix} \mathbf{x}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (8)

When the STATCOM and FDNE are integrated into the system, the following relationship is established: $\mathbf{i}_F(t) = -\mathbf{i}_C(t)$. Substituting (6) and (8) into this constraint gives $(\mathbf{H}_{23}^{\sigma} + \mathbf{D}_d)\,\mathbf{v}(t) = -\left(\mathbf{H}_{21}^{\sigma}\mathbf{i}_h(t) + \mathbf{H}_{22}^{\sigma}\mathbf{v}_{in}(t) + \mathbf{C}_d\mathbf{x}(t)\right)$, and solving for the voltage yields:

$$\mathbf{v}(t) = \begin{bmatrix} \mathbf{W}_{1}^{\sigma} & \mathbf{W}_{2}^{\sigma} & \mathbf{W}_{3}^{\sigma} \end{bmatrix} \begin{bmatrix} \mathbf{i}_{h}(t) \\ \mathbf{v}_{in}(t) \\ \mathbf{x}(t) \end{bmatrix}$$ (9)

where

$$\begin{cases} \mathbf{W}_{1}^{\sigma} = -(\mathbf{H}_{23}^{\sigma} + \mathbf{D}_{d})^{-1} \mathbf{H}_{21}^{\sigma} \\ \mathbf{W}_{2}^{\sigma} = -(\mathbf{H}_{23}^{\sigma} + \mathbf{D}_{d})^{-1} \mathbf{H}_{22}^{\sigma} \\ \mathbf{W}_{3}^{\sigma} = -(\mathbf{H}_{23}^{\sigma} + \mathbf{D}_{d})^{-1} \mathbf{C}_{d} \end{cases}$$ (10)

The following algorithm simulates the model in the time domain with a fixed time-step $\Delta t$ for Approach 1.

**Algorithm 1** Simulation Procedure for Approach 1

- 1: Precompute matrices $\mathbf{A}_d, \mathbf{B}_d, \mathbf{H}_{ij}^{\sigma}, \mathbf{W}_1^{\sigma}, \mathbf{W}_2^{\sigma}$, and $\mathbf{W}_3^{\sigma}$.
- 2: **for** each time point $t$ **do**
- 3: Solve for $\mathbf{v}(t)$ using (9).
- 4: Optional: compute any output needed, e.g., using (6).
- 5: Update states of the STATCOM $\mathbf{i}_h(t)$ using (5).
- 6: Update states of the FDNE $\mathbf{x}(t)$ using (7).
- 7: **end for**

2) Approach 2: The purpose of Approach 2 is to reduce the computation latency at each time-point with a few additional precomputations. The idea here is to reduce matrix sizes in (9). Let us define $\boldsymbol{\xi}(t)$ as follows:

$$\boldsymbol{\xi}(t) = \mathbf{C}_d \mathbf{x}(t) \tag{11}$$

This allows (9) to be rewritten as:

$$\mathbf{v}(t) = \begin{bmatrix} \mathbf{W}_{1}^{\sigma} & \mathbf{W}_{2}^{\sigma} & \mathbf{W}_{4}^{\sigma} \end{bmatrix} \begin{bmatrix} \mathbf{i}_{h}(t) \\ \mathbf{v}_{in}(t) \\ \boldsymbol{\xi}(t) \end{bmatrix}$$ (12)

where

$$\mathbf{W}_4^{\sigma} = -(\mathbf{H}_{23}^{\sigma} + \mathbf{D}_d)^{-1} \tag{13}$$

Advancing (11) by one time-step gives:

$$\boldsymbol{\xi}(t + \Delta t) = \mathbf{C}_d \mathbf{x}(t + \Delta t) \tag{14}$$

Combining with (7), we have:

$$\boldsymbol{\xi}(t + \Delta t) = \begin{bmatrix} \mathbf{C}_d \mathbf{A}_d & \mathbf{C}_d \mathbf{B}_d \end{bmatrix} \begin{bmatrix} \mathbf{x}(t) \\ \mathbf{v}(t) \end{bmatrix}$$ (15)

The simulation algorithm becomes as follows:

**Algorithm 2** Simulation Procedure for Approach 2

- 1: Precompute matrices $\mathbf{C}_d\mathbf{A}_d, \mathbf{C}_d\mathbf{B}_d, \mathbf{H}_{ij}^{\sigma}, \mathbf{W}_1^{\sigma}, \mathbf{W}_2^{\sigma}$, and $\mathbf{W}_4^{\sigma}$.
- 2: **for** each time point $t$ **do**
- 3: Solve for $\mathbf{v}(t)$ using (12).
- 4: Optional: compute any output needed, e.g., using (6).
- 5: Update states of the STATCOM $\mathbf{i}_h(t)$ using (5).
- 6: Update states of the FDNE $\mathbf{x}(t)$ using (7).
- 7: Update variable $\boldsymbol{\xi}(t)$ using (15).
- 8: **end for**

The CPU is responsible for the precomputation of the matrices $(\mathbf{A}_d, \mathbf{B}_d, \mathbf{H}_{ij}^{\sigma}, \mathbf{W}_1^{\sigma}, \mathbf{W}_2^{\sigma}, \text{ and } \mathbf{W}_3^{\sigma})$ in Approach 1 and $(\mathbf{C}_d\mathbf{A}_d, \mathbf{C}_d\mathbf{B}_d, \mathbf{H}_{ij}^{\sigma}, \mathbf{W}_1^{\sigma}, \mathbf{W}_2^{\sigma}, \text{ and } \mathbf{W}_4^{\sigma})$ in Approach 2, ensuring efficient handling of computationally intensive operations. Once precomputed, these matrices are transferred to the FPGA, which performs real-time execution, solving the system equations and updating states at each time step.

Fig. 3. Integration of STATCOM and FDNE according to Approach 1. $z^{-1}$ denotes a single time-step delay.

Fig. 4. Integration of STATCOM and FDNE according to Approach 2. $z^{-1}$ denotes a single time-step delay.

Fig. 5. Comparison of latency for different approaches.

# C. Latency Analysis for Approach 1 and Approach 2

Fig. 3 illustrates the datapath resulting from the integration of the STATCOM and the FDNE models within a unified simulation framework, according to Approach 1, whereas Fig. 4 shows the integration according to Approach 2.

Fig. 5 illustrates the computational latency for Approach 1 and Approach 2. As shown in Fig. 5a, the maximum latency in Approach 1 is governed by two factors: (1) the latency of module $V_{A1}$, and (2) the maximum latency of the modules I, Y, and X, which are executed in parallel. Since these modules operate concurrently, their collective latency is determined by the module with the highest computational delay:

$$\ell_{\text{Approach 1}} = \ell_{\mathbf{V}_{A1}} + \max(\ell_{\mathbf{I}}, \ell_{\mathbf{Y}}, \ell_{\mathbf{X}}) \tag{16}$$

On the other hand, Fig. 5b illustrates the computational latency of Approach 2, which is determined by the latency of module $V_{A2}$ and the maximum latency of the modules I, Y, X, and E, which also operate in parallel:

$$\ell_{\text{Approach 2}} = \ell_{\mathbf{V}_{A2}} + \max(\ell_{\mathbf{I}}, \ell_{\mathbf{Y}}, \ell_{\mathbf{X}}, \ell_{\mathbf{E}})$$ (17)

A key observation is that the latency of module $V_{A1}$ is higher than that of $V_{A2}$, which is consistent with the smaller matrix product in (12) compared with (9). Hence, by introducing module E into the parallel pipeline, Approach 2 further distributes the workload and reduces the overall latency.

### IV. TEST CASE

Fig. 1 illustrates the test circuit implemented in EMTP® and used for voltage regulation in electrical power systems. This model represents a 500 kV, 100 MVA STATCOM connected to a 500 kV bus, typically used to stabilize the voltage at a high-voltage transmission line. The figure displays the configuration of the STATCOM and its integration within the power system, highlighting the components such as the coupling transformer, AC filters, and the control mechanism that manages the reactive power injection or absorption.

Fig. 6. Comparison of the computed current with the reference current from the EMTP® model; (a) Close-up view of phase-currents during fault initiation; and (b) Close-up view of phase-currents during fault clearing.

The FDNE model contains an ideal current source to represent the steady-state conditions of the original circuit, connected in parallel to the state-space model described by (2). EMTP computes the parameters of the FDNE by employing the vector fitting method combined with the Loewner-Matrix method [22] to determine the model order automatically. The frequency range spans from 1 Hz to 1 MHz, with a tolerance set to $10^{-6}$, which results in 52 poles in the model. With $p = 3$ ports and $n = 52$ poles, the dimensions of the state-space matrices are as follows: $\mathbf{A}$ is $156 \times 156$, $\mathbf{B}$ is $156 \times 3$, $\mathbf{C}$ is $3 \times 156$, and $\mathbf{D}$ is $3 \times 3$.

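The update order of Algorithms 1 and 2 in Section III-B can be illustrated on a toy system far smaller than the 156-state FDNE described above. The Python sketch below assumes a single port ($p = 1$), two FDNE states with a diagonal $\mathbf{A}_d$, one STATCOM history term, one internal source, and a single fixed switch combination. All numerical values and the names `approach_1`, `approach_2`, and `v_source` are hypothetical and chosen only for illustration. It checks that both approaches produce the same $\mathbf{v}(t)$.

```python
import math

# Hypothetical toy values (not taken from the test case).
A_d = [0.9, 0.5]            # diagonal entries of A_d (two FDNE states)
B_d = [1.0, 1.0]
C_d = [0.2, 0.1]
D_d = 0.5
H11, H12, H13 = 0.8, 0.1, 0.05
H21, H22, H23 = 1.0, 0.3, 0.7

# Precomputation, following (10) and (13); scalar inverse because p = 1.
inv = 1.0 / (H23 + D_d)
W1, W2, W4 = -inv * H21, -inv * H22, -inv
W3 = [-inv * c for c in C_d]
CA = [c * a for c, a in zip(C_d, A_d)]       # C_d A_d (A_d is diagonal)
CB = sum(c * b for c, b in zip(C_d, B_d))    # C_d B_d


def v_source(k):
    """Internal source sample at step k (arbitrary sinusoid)."""
    return math.sin(0.1 * k)


def approach_1(steps):
    i_h, x, out = 0.0, [0.0, 0.0], []
    for k in range(steps):
        v_in = v_source(k)
        v = W1 * i_h + W2 * v_in + sum(w * s for w, s in zip(W3, x))  # (9)
        out.append(v)
        i_h = H11 * i_h + H12 * v_in + H13 * v                        # (5)
        x = [a * s + b * v for a, s, b in zip(A_d, x, B_d)]           # (7)
    return out


def approach_2(steps):
    i_h, x, xi, out = 0.0, [0.0, 0.0], 0.0, []
    for k in range(steps):
        v_in = v_source(k)
        v = W1 * i_h + W2 * v_in + W4 * xi                            # (12)
        out.append(v)
        i_h = H11 * i_h + H12 * v_in + H13 * v                        # (5)
        xi = sum(c * s for c, s in zip(CA, x)) + CB * v               # (15), uses x(t)
        x = [a * s + b * v for a, s, b in zip(A_d, x, B_d)]           # (7)
    return out


v1, v2 = approach_1(200), approach_2(200)
print(max(abs(a - b) for a, b in zip(v1, v2)))  # difference is at rounding level
```

In Approach 2 the voltage solve multiplies $\mathbf{W}_4^{\sigma}$ by a vector of size $p$ instead of multiplying $\mathbf{W}_3^{\sigma}$ by the full state of size $pn$; the extra product in (15) depends only on $\mathbf{x}(t)$ and $\mathbf{v}(t)$, so it can run alongside the other state updates.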
To evaluate the dynamic response of the STATCOM, a three-phase fault is applied at Bus 1 at t = 1 s and lasts for 200 ms. During the fault, the voltage at Bus 1 drops below the reference voltage, $V_{\rm ref}=1$ pu. The STATCOM reacts to the event by generating reactive power to raise the bus voltage.

# V. EXPERIMENTAL RESULTS

### A. Simulation Accuracy

To validate the accuracy and fidelity of the proposed implementation, the results are compared with a reference model developed in EMTP®. Fig. 6a presents the superimposed phase currents at the receiving end, as computed by the proposed implementation and the EMTP® model, over the interval from 0.9 s to 1.3 s of the simulation. Detailed close-up views at critical events are shown in Fig. 6b and 6c, illustrating the system's behavior during fault initiation and clearing, respectively. As observed, the proposed implementation closely matches the reference model, confirming its accuracy and reliability.

# B. Area Occupation and Speed Performance

This section details the experimental results of implementing the proposed model on an Alveo U280 FPGA using Vitis HLS 2023.2. The selected FPGA platform, the xcu280-fsvh2892-2L-e, enables high-speed, real-time operation, with results reported at the RTL simulation level after post-routing, to reflect actual hardware performance rather than simulation estimates.

TABLE I. COMPARISON OF THE PROPOSED MODEL'S PERFORMANCE IN TERMS OF NUMBER OF CYCLES, RESOURCE UTILIZATION, AND ACCURACY FOR APPROACH 1 AND APPROACH 2.

| Approach | Data Type | Frequency (MHz) | # of Cycles | Latency (ns) | BRAM | DSPs | LUTs | Registers | $RE_{2\text{-norm}}$ (%) |
|------------|------------------------------------|-----------------|-------------|--------------|-----------------|-------------------|---------------------|---------------------|-----------------------|
| Approach 1 | Single-precision floating-point | 250 | 145 | 580 | 260 (6.44%) | 470 (5.20%) | 72,232 (5.57%) | 86,106 (3.32%) | $7.82 \times 10^{-2}$ |
| | Double-precision floating-point | 250 | 180 | 720 | 403 (9.99%) | 777 (8.43%) | 163,013 (12.57%) | 206,860 (7.98%) | $1.03 \times 10^{-2}$ |
| | CuFP (8, 24) [20] | 250 | 102 | 408 | 254 (6.30%) | 343 (3.80%) | 122,036 (9.36%) | 98,478 (3.78%) | $8.7 \times 10^{-2}$ |
| | CuFP (8, 34) [20] | 250 | 116 | 464 | 300 (7.45%) | 671 (7.44%) | 182,332 (13.99%) | 141,551 (5.43%) | $5.07 \times 10^{-2}$ |
| | CuFP (11, 53) [20] | 250 | 139 | 556 | 326 (8.09%) | 756 (10.61%) | 227,619 (17.35%) | 183,480 (6.75%) | $1.08 \times 10^{-2}$ |
| Approach 2 | Single-precision floating-point | 250 | 123 | 492 | 263 (6.52%) | 874 (9.68%) | 90,303 (6.92%) | 121,277 (4.65%) | $7.81 \times 10^{-2}$ |
| | Double-precision floating-point | 250 | 157 | 628 | 486 (12.05%) | 1,676 (18.57%) | 218,044 (16.72%) | 270,270 (10.36%) | $1.03 \times 10^{-2}$ |
| | CuFP (8, 24) [20] | 250 | 78 | 312 | 264 (6.61%) | 374 (4.14%) | 116,534 (8.94%) | 87,178 (3.23%) | $8.68 \times 10^{-2}$ |
| | CuFP (8, 34) [20] | 250 | 86 | 344 | 359 (8.92%) | 780 (8.64%) | 185,911 (14.26%) | 140,549 (5.39%) | $5.06 \times 10^{-2}$ |
| | CuFP (11, 53) [20] | 250 | 107 | 428 | 335 (8.31%) | 957 (10.61%) | 226,217 (17.35%) | 175,961 (6.75%) | $1.07 \times 10^{-2}$ |

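The latency column of Table I follows from the cycle count and the 250 MHz clock, since one cycle lasts 4 ns. The short Python sketch below recomputes it as a consistency check; the helper name `latency_ns` and the dictionary layout are illustrative assumptions, while the cycle counts and latencies are copied from Table I.

```python
def latency_ns(cycles, clock_mhz=250.0):
    """Latency in nanoseconds of `cycles` clock cycles at `clock_mhz` MHz."""
    return cycles * 1e3 / clock_mhz


# (number of cycles, reported latency in ns), copied from Table I.
TABLE_I = {
    ("Approach 1", "single"): (145, 580),
    ("Approach 1", "double"): (180, 720),
    ("Approach 1", "CuFP(8,24)"): (102, 408),
    ("Approach 1", "CuFP(8,34)"): (116, 464),
    ("Approach 1", "CuFP(11,53)"): (139, 556),
    ("Approach 2", "single"): (123, 492),
    ("Approach 2", "double"): (157, 628),
    ("Approach 2", "CuFP(8,24)"): (78, 312),
    ("Approach 2", "CuFP(8,34)"): (86, 344),
    ("Approach 2", "CuFP(11,53)"): (107, 428),
}

# Collect any row whose reported latency differs from cycles / frequency.
mismatches = [
    key for key, (cycles, reported) in TABLE_I.items()
    if latency_ns(cycles) != reported
]
print("rows inconsistent with cycles / frequency:", mismatches)
```

Every configuration stays below 1 µs per time point, which is the basis for the sub-microsecond claim.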
The offline simulations are carried out on EMTP® v4.5 to validate the FPGA implementation. The simulations are run on a laptop with an 11th Gen Intel i9-11950H processor and 64 GB DDR5 RAM.

The FDNE model size directly impacts FPGA resource utilization and execution time. As the number of poles in the admittance matrix increases, the state-space representation expands, leading to a larger system matrix. This results in higher memory usage and increased computational load due to additional state updates, affecting both area occupation and execution time. However, since our FPGA implementation leverages parallel computation and precomputed matrices, the latency increase remains moderate.

The proposed model is evaluated based on execution cycles, FPGA resource utilization, and accuracy. The model operates at a clock frequency of 250 MHz, and its performance is assessed using different data types: single-precision, double-precision, and customized floating-point formats provided by the CuFP library [20]. For the CuFP data type, three different configurations are chosen for comparison. In this library, the notation ${\rm CuFP}(w_e,\ w_m)$ represents a floating-point number with an exponent bit width of $w_e$ and a mantissa bit width of $w_m$. To enable meaningful comparisons with IEEE 754 single- and double-precision formats, CuFP (8, 24) and CuFP (11, 53) were specifically selected for this study. While fixed-point arithmetic could reduce FPGA resource usage, particularly for DSPs and LUTs, it requires careful bit-width optimization to avoid numerical instability due to its limited dynamic range. This is particularly challenging in power system simulations, where wide dynamic ranges are common.

To assess the accuracy of the proposed model, the 2-norm relative error (RE<sub>2-norm</sub>) is employed as the primary metric.
This error metric, shown in (18), quantifies the relative discrepancy between the computed output from the FPGA $(out_{\rm fpga})$ and the reference output from EMTP® $(out_{\rm emtp})$:

$$RE_{2-\text{norm}}(\%) = \frac{\|out_{\text{emtp}} - out_{\text{fpga}}\|_{2}}{\|out_{\text{emtp}}\|_{2}} \times 100$$ (18)

where $\|\cdot\|_2$ denotes the 2-norm.

Table I presents a comprehensive analysis of the performance metrics, including resource utilization and accuracy, of the proposed model for Approach 1 and Approach 2, evaluated across various data types. Key parameters such as latency, FPGA resource usage (BRAM, DSPs, LUTs, and Registers), and accuracy are compared.

In Approach 1, the single-precision floating-point implementation achieves a latency of 580 ns and an RE<sub>2-norm</sub> of 0.0782%. This represents a relatively low latency but with limited accuracy. CuFP (8, 24) is a promising alternative: it achieves a lower latency of 408 ns with a slightly higher error (0.087%), offering accuracy comparable to single-precision while using fewer BRAMs and DSPs, although more LUTs and registers. This demonstrates the efficiency of CuFP (8, 24) in scenarios where DSP budget and latency are critical. The CuFP (8, 34) configuration provides higher accuracy, as it uses more bits for the mantissa. However, relative to CuFP (8, 24), this comes with an increase in latency and higher resource utilization. This configuration is suitable for applications where accuracy is more critical than latency, but the higher resource requirements must be considered when working within resource-constrained environments. Finally, the double-precision floating-point implementation comes with a higher latency of 720 ns and higher resource consumption.
In contrast, CuFP (11, 53) achieves a similar level of accuracy while maintaining a lower latency of 556 ns and using fewer BRAMs, DSPs, and registers than double-precision, although more LUTs. This data type balances performance and efficiency, making it well suited to real-time simulation.

Moving to Approach 2, the single-precision floating-point implementation offers a latency of 492 ns and an RE<sub>2-norm</sub> of 0.0781%. Resource utilization includes 263 BRAMs, 874 DSPs, 90,303 LUTs, and 121,277 registers. Compared to Approach 1, the latency is reduced, but the resource utilization is higher, reflecting the increased demand on resources in this approach. For double-precision floating-point, the latency is 628 ns, which is lower than in Approach 1, but the area is larger. When CuFP (8, 24) is utilized, the latency drops to 312 ns, with an RE<sub>2-norm</sub> of 0.0868%, making it a highly efficient choice for applications requiring low latency and moderate accuracy. This configuration uses 264 BRAMs, 374 DSPs, 116,534 LUTs, and 87,178 registers, that is, markedly fewer DSPs and registers than the single-precision implementation at the cost of more LUTs. For applications that need more accuracy, the CuFP (8, 34) configuration can be a better option than CuFP (8, 24). This configuration offers an improved RE<sub>2-norm</sub> of 0.0506%, with a latency of 344 ns. Although the increased accuracy comes with higher resource utilization, the configuration offers a good balance between precision and performance. Similarly, CuFP (11, 53) achieves a latency of 428 ns, closely matching the accuracy of double-precision floating-point with an RE<sub>2-norm</sub> of 0.0107%. Moreover, it consumes fewer BRAMs, DSPs, and registers than the double-precision implementation, demonstrating a balance between accuracy and resource utilization.

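The RE<sub>2-norm</sub> values quoted above follow (18). As a minimal sketch of that metric, the Python function below computes it for two equally long sample sequences. The function name `re2_norm_percent` and the synthetic waveforms are illustrative assumptions; the actual comparison in this work uses EMTP® and FPGA outputs that are not reproduced here.

```python
import math


def re2_norm_percent(out_emtp, out_fpga):
    """2-norm relative error of (18), in percent."""
    if len(out_emtp) != len(out_fpga):
        raise ValueError("both outputs must have the same number of samples")
    num = math.sqrt(sum((r - c) ** 2 for r, c in zip(out_emtp, out_fpga)))
    den = math.sqrt(sum(r ** 2 for r in out_emtp))
    return 100.0 * num / den


# Synthetic example: the "computed" waveform is the reference scaled by
# 1.001, so the relative error is 0.1 % by construction.
reference = [math.sin(2.0 * math.pi * k / 100.0) for k in range(1000)]
computed = [1.001 * r for r in reference]
print(f"RE_2-norm = {re2_norm_percent(reference, computed):.4f} %")
```

Because the error is normalized by the 2-norm of the reference, the metric summarizes the whole waveform in one number and does not show where in time the largest deviations occur.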
The proposed model demonstrates efficient resource utilization, low latency, and high accuracy across different data types and configurations. In particular, Approach 2 with CuFP configurations offers a promising trade-off between latency, accuracy, and resource efficiency, making it an excellent choice for real-time applications. Additionally, the flexibility of the CuFP library allows users to customize the floating-point configuration to meet specific needs, providing enhanced control over performance and resource usage.

The reduction in latency in Approach 2, however, comes at the expense of increased resource utilization. The additional computational resources required to support the expanded parallel pipeline result in a larger hardware footprint. This trade-off reflects a deliberate design choice to prioritize latency reduction over resource constraints.

The comparative analysis highlights the benefits of leveraging parallelism to optimize latency. Approach 2 significantly reduces computational delay, making it suitable for latency-sensitive applications, although with higher area requirements. The choice between these approaches depends on the specific application requirements and the hardware constraints of the target system. For applications where accuracy is the priority, CuFP (11, 53) or double-precision floating-point should be used to minimize numerical errors. If latency is the primary concern, particularly for real-time control applications, configurations in Approach 2, such as CuFP (8, 24), provide a good balance between resource efficiency and numerical precision, significantly reducing execution time while maintaining acceptable accuracy. CuFP (8, 34) serves as a middle ground, offering improved accuracy over CuFP (8, 24) while keeping latency lower than full double-precision arithmetic.
These results demonstrate that CuFP gives the flexibility to balance precision and computational efficiency, making it well suited to various power system applications.

# C. Speed-up Performance

A simulation runs in real time if its execution time per step does not exceed the simulation time step ($\Delta t$). In our work, the latency per time step is in the nanosecond range (as shown in Table I), well below the microsecond-level time steps required for power electronics simulations, which makes the implementation well suited to real-time applications.

In comparison, the reference model, developed using EMTP® and executed on a system with the previously described configuration, was evaluated over a fixed number of time points (50,000) with a fixed time step. Under these conditions, the reference model required 6.0625 s to complete. By contrast, the proposed model completed the same number of time points in just 24.6 ms using the single-precision data type, as detailed in (19).

$$\text{Total latency} = 492 \times 10^{-9}\ \text{s} \times 50{,}000 = 24.6\ \text{ms} \tag{19}$$

This corresponds to a speed-up of approximately 246 times over the reference model (6.0625 s / 24.6 ms ≈ 246), showing that the proposed implementation is suitable for real-time applications and capable of operating faster than real time.

## VI. CONCLUSIONS

This paper presented an efficient FPGA-based real-time simulation framework for grid-connected converters, integrating a STATCOM model with an FDNE approach. The proposed implementation was validated against a high-fidelity reference model developed in EMTP®, demonstrating strong agreement in waveform accuracy and achieving minimal relative error across various floating-point formats. The results confirm the effectiveness of the proposed framework in replicating dynamic system behaviors with high precision.
The FPGA implementation, tested on the Alveo U280 platform, showcases the potential for scalable and efficient real-time simulation systems. With sub-microsecond latencies and low resource utilization, the model is well suited for integration into real-time control systems for power networks, where speed and accuracy are critical. Notably, the proposed approach ran approximately 246 times faster than the reference model, demonstrating its capability to perform faster-than-real-time simulations without compromising accuracy. Furthermore, the flexibility of the CuFP library enables future adaptations to different precision requirements and hardware configurations.

Future work will focus on extending the methodology to multi-converter systems, such as multiple STATCOMs or active power filters, to assess scalability and performance trade-offs. Additionally, hardware-in-the-loop (HIL) validation will be conducted to confirm real-time execution in practical scenarios. Another research direction is adaptive precision selection, in which CuFP bit-widths adjust dynamically to system operating conditions to further optimize computational efficiency.