LU Factorization Algorithm with Minimum Degree Ordering in Power Distribution Network Problems

Power systems computations for nowadays common large distributed systems typically involve the usage of very large sparse matrices, whose analysis and verification is very time and memory consuming. When blocked, sparse matrices can be processed much more efficiently, and this made blocked sparse matrices widely used in acquiring solutions for power system problems. The established sparse matrix storage and reordering techniques however do not fully utilize the existing computer architecture, thus search for efficient sparse system solution is ongoing. This paper presents adjustments of well-known LU factorization algorithm suitable for use in power distribution network applications. LU factorization algorithm processes data in blocks and uses minimum degree ordering to accelerate the computations.


INTRODUCTION
With the emergence of multi-core processors and general purpose GPUs, processing power of computing devices is increasing rapidly. Processing capabilities of devices and their performance when it comes to data computation is often very hard to match. The main reasons behind this performance mismatch are inability to adapt existing algorithms to parallel computing features of devices, and memory bottleneck.
In multi-core processors, all algorithms need to be parallel in order to fully utilize their computational power [2]. This, often referred as "the end of the free lunch" shows that improvement rate of algorithms performance in relation to transistor densities of computer processors is not constant. In other words, as transistor density in computer processors increase, the performance of numerical algorithms does not increase in constant rate [2]. To accommodate the technological change of multiprocessor and GPU systems, the change in underlying architecture was necessary. Single-Instruction Multiple-Data architecture (SIMD) is ideal to use for applications that require intensive computations of sparse matrices.
In addition to the above, memory limitations, as predicted by Wulf in 1995, have been reached [6]. This means that, regardless of the increase in processing power or increase in memory size, the computational performance with the existing architecture is about to reach its peak. Memory optimizations on algorithms that involve processing of large data sets are inevitable. This is especially true for applications that use large sparse matrices. These matrices are characterized by low degree of data locality, which result in many cache misses that severely detriment computational performance of devices.
Power distribution networks (PDN)use very large and very sparse matrices, which require sophisticated algorithms to deliver reliable and efficient functionality of their systems.
State estimationapproximates power flow using real-time PDN measurements. It maintains the information about the status of every flow and every voltage in the system at all times. Based on this information, state estimator predicts the PDN behavior. It determines the optimal system state, and provides the best estimates for complete PDN (all line flows, loads, generator outputs, etc.) [14]. State estimator is using power flow calculations repeatedly, which determine exact power flow of profiled network information.
VVC manages controllable network equipment in order to provide optimal network state, based on the information gathered from the state estimator. OFR optimizes the network topology by adding/removing network feeders to/from the network. This is very useful, especially in situations when sections of the network fail or get overloaded.
All these tasks need to deal with calculations of sparse matrices, and inevitably factorization of sparse matrices. But general purpose sparse matrix solutions cannot be applied for PDN application, as this is real-time application with complex structure and data. Designing dedicated power system solution that will contribute fast, reliable and robust analysis of the PDN is big challenge, especially considering the fact that these networks are constantly growing. This paper describes matrices used in PDN and presents the LU factorization algorithm with Minimum Degree ordering, which processes PDN matrices in blocks.

POWER DISTRIBUTION NETWORKS MATRICES
Power distribution networks are hierarchical networks, with high voltage lines transferring the electricity to local networks, which further distribute the power to the end users. As any other network, PDN can be represented as graph and expressed as matrix.

Power Distribution Network Graphs and Matrices
In power distribution networks graphs, nodes represent electrical busses or the diagonal elements of incidence matrices, and edges represent transmission lines or offdiagonal nonzero matrix elements [1]. Figure 1 shows IEEE 13-bust test feeder network [14], and its equivalent graph. Depending on the problem being analyzed, different incidence matrices exist in power system network analysis, such as nodal admittance incidence matrix, branch path incidence matrix, basic cut set incidence matrix, etc. [6]. In this report we are analyzing nodal admittance matrix, and, given the 3-phase network, graph presented in Figure 1 would produce 39x39 nodal admittance matrix (13x13 block matrix with 3x3 sized blocks).
Nodal admittance matrix represents coefficient matrix in Kirchhoff's linear system of equations, where, given the nodal impedance values and current flowing between the nodes, corresponding voltage needs to be obtained. Eq. 1 shows the Kirchhoff's linear system of equations for network presented in Figure 1, and each impedance element Y i,j in the nodal admittance matrix is a 3x3 matrix block. (Kirchhoff was actually the first one to develop theory of graph-trees for applications in electrical networks [6]).  Solving this problem requires elimination of nodes within the graph, or factorization of coefficient (nodal admittance) matrix in the process of gaining the solution.
Node elimination (matrix factorization) can be done using different algorithms, and in this work we opted for LU factorization algorithm, with minimum degree ordering, which will be explained in following text.

Characteristics of Power Distribution Network Matrices
PDNmatrices are irregular, indefinite and very sparse. These matrices are the sparsest matrices encountered in real life applications [1]. Implementing solutions for regular sparse linear systems is somehow straightforward [3], but in irregular matrices, this task becomes very difficult [1,3,8,13]. This is the reason why solutions provided by current power system solvers have not been as efficient as those designed for other types of sparse matrices [8,9].
Most of the nonzero elements in PDN matrices are concentrated along the diagonal, however nonzero elements exist off-diagonal as well. Number of nonzero elements per row is indefinite, and matrices reflect symmetry. Operations on sparse matrices should take time proportional to the size of data and the of nonzero arithmetic operations on it [4].  From figures 1and 2 it can be seen that PDN matrix is block matrix (a matrix whose elements are other smaller matrices). Nodal admittance matrix used in the analysis of PDN maycontain blocks sized 1x1, 2x2 or 3x3, depending on the electric power phase. Figure 2 shows that each element in nodal admittance matrix Y is a block of size 3x3, indicating that this network is a 3 phase network.
Some blocks can be reduced to 2x2 or 1x1 sized matrices. For example, branch BC, indicated by Y 2,3 is a 1-phase branch, and it contain only 1 nonzero element, and branch BD, indicated by Y 2,4 is a 2-phase branch, and it contains4 nonzero elements (matrix 2x2).

METHODOLOGY: LU FACTORIZATION BASED ON MINIMUM DEGREE ORDERING
The main focus of this ongoing research is to implement direct method for solution of system of linear equations in the area of power distribution networks efficiently. Linear system of equations can be presented as Ax = b, where A is sparse matrix, representing coefficients in the linear system of equations, b is vector of right hand-side values, and x is a vector of unknowns. Sparse matrix A can be factored into two separate triangular sparse matrices L, lower triangular sparse matrix with 1's on the diagonal, and U, upper triangular sparse matrix. LU and Cholesky factorizations are most commonly used direct methods in power distribution system network analysis. In this work, we will focus on LU factorization algorithm.
In power systems applications, LU factorization is used to solve the system of Kirchhoff's equations: Yv =i, where Y is nodal admittance matrix, i is vector of currents flowing between corresponding nodes, and v is a vector representing branch voltages between corresponding nodes.
Classical LU factorization algorithm can be done in three distinct steps: 1. Matrix factorization or forward elimination stage: in this step, triangular matrices L and U are produced from the original matrix A, such that A = L*U. Calculation of elements of triangular matrices L and U can be done using the following equations: Substituting L and U for A, and Ux for vector v, we can perform further LU factorization algorithm steps, as presented below.
The classical LU factorization algorithm for dense systems is presented in equations Eq. 2-5. For sparse matrix LU factorization algorithm, number of computations is reduced due to the large number of zero elements in the matrix, and the sparse matrix LU factorization algorithm can be viewed as having one explicit for loop, with additional calculations for only those indices that represent nonzero element in original matrix, or fill-in nonzero element that results from factorization process [1]. During the process of factorization, elements of the original sparse matrix below the diagonal must be eliminated using row operations, and in this process, some elements with value zero may become non zero elements, called fill-in elements.
The amount of fill-ins may be controlled by ordering the matrix before its factorization [10]. Different ordering algorithms exist, and in this research we are using minimum degree ordering algorithm or Markowitz ordering algorithm.
Minimum Degree ordering is greedy algorithm that reorders the rows/columns of symmetric sparse matrix so that the row/column with fewest non-zero elements at a given factorization stage is the next one to be eliminated [7]. Physical reordering of the matrix wastes time, so Markowitz ordering strategy is actually to choose pivots with the added constraint of minimum fill-in. From graph point of view, a node with minimum degree, or the least number of edges is to be eliminated first. This way, the amount of work and storage needed during the elimination of nodes in a graph is reduced [5,10], and less fill-ins are generated (fill-ins would represent the creation of new edges that did not exist in the original matrix). This is demonstrated in Figure 3, where elimination of node B results in creation of two new edges, AC and AD. When node with higher degree is eliminated first, the amount of computations is increased. This is obvious from matrix shown in Figure 4, where all fill-ins are shaded. The pseudo-code of the three steps of the LU factorization algorithm with minimum degree ordering is presented in Figures5-7. Figure 5 shows the matrix elimination process based on minimum degree ordering. The pivots for the elimination process are chosen based on minimum degree ordering and are done in place. Once eliminated, the node is removed from the graph, and its matrix's row and column are marked as done. The elimination order is being recorded in order vector, and this order will be used in forward and backward elimination process.

Minimum Degree Ordered LU Factorization Based on Blocks
Blocking of sparse matrices improves their processing performance for three reasons: only one index per block is needed, and number of indices used in coding is reduced from one per non zero entry to one per block; multiplier vector blocks are loaded once, and used m number of times for matrix of m by n blocks; and in processors with load floating point quad instructions, less load instructions is needed to load nonzero entries into two registers [12]. LU factorization with blocks is shown in equations 6 and 7, where each factor used in calculation (L ij , A ij , U ij ) is block matrix.φ represents electric power phase, so matrix blocks A, U and L of size φ φ. Equations 4 and 5 can be restructured in similar way.
Implementing LU factorization presented in Figures 5-7would be more efficient if blocks are used. The process of factorization will then contain matrix-matrix and matrix-vector operations, which are computationally very demanding. These computations can be accelerated through SIMD-ization or parallelization.

Example: π equivalent using LU factorization with Minimum Degree Ordering
Generalized nodal current injection equation for thekphase branch connected between the m-phase node i and then-phase node j is calledπ equivalent, and is shown in Figure 8. This is the most common way in which PDN branch is modeled.
Kirchhoff's set of equationsfor Figure 8 is shown in Eq. 8. Sub-matrix elements (I and V sub-matrices) for nodesi and j are defined based on the number ofnode phases, and impedance sub-matrices of branch k are defined based on the number of phases of branch k.
Nodal admittance matrix requires less storage compared to phase admittance matrix [40], and its LU factorization is therefore faster.
This example shows the simplest possible matrix used in PDN, because it represents a single network branch, or physically a single wire. PDN consists of hundreds of such wires/branches, and for this reason its computation is very complex and time consuming. Using Blocked LU factorization with minimum degree ordering as an algorithm in PDN system solution should accelerate computations. With code optimization using SIMD instruction, we believe that this acceleration will be even more.

CONCLUSION AND FUTURE WORK
The analysis of minimum degree ordered LU factorization algorithm on very sparse matrices specific for PDN has been presented. Sparse matrices are first blocked based on the number of phases used in network sections, and the algorithm is modified to run on these fixed-size blocked matrices.
Our future work would consider processing of matrices with blocks of different size as shown in Figure 9, where the original Y matrix is of size 12x12. Since originally 3phase power comes from the slack, 3x3 blocks are applied and original Y matrix can be seen as 4x4 blocked matrix. The number of power phases changes as the network branches out, so rows and columns (rows 8-10) in the Y matrix have all zero elements, as that particular phase is not used in that branch. Removing 0's from the matrix would result changing the block size for different impedance matrix elements. After the LU factorization with minimum degree ordering is done on blocks with various size, the SIMD-ization of the code will be implemented.