Difference between revisions of "Repeated Thomas algorithm, pointwise version"
[unchecked revision] | [unchecked revision] |
Line 182: | Line 182: | ||
Thus, in terms of serial complexity, the two-sided elimination method is qualified as a ''linear complexity'' algorithm. | Thus, in terms of serial complexity, the two-sided elimination method is qualified as a ''linear complexity'' algorithm. | ||
+ | |||
+ | |||
+ | '''1.7 Information graph''' | ||
+ | |||
+ | The information graph of the two-sided elimination method is shown in fig. 1. It is obvious that the graph is parallel, the degree of parallelism being at most four and two at its forward and backward paths, respectively. During the forward path, the two branches and also two of their sub-branches (namely, the left sub-branch, corresponding to the matrix decomposition, and the right branch, responsible for the solution of the first bidiagonal system) can be executed in parallel.The right sub-branches correspond to the backward path. It is evident from the figure that not only the mathematical content of this process but also the structure of the algorithm graph and the direction of the corresponding data flows are in complete agreement with the appellation "backward path." The variant with divisions replaced by reciprocal number calculations is illustrated by the graph in fig. 2. | ||
+ | |||
+ | '''1.8 Parallelization resource of the algorithm''' |
Revision as of 20:41, 31 January 2016
Primary authors of this description: A.V.Frolov, Vad.V.Voevodin (Section 2.2), A.M.Teplov (Section 2.4)
Contents
1 Properties and structure of the algorithm 1.1 General description of the algorithm 1.2 Mathematical description of the algorithm 1.3 Computational kernel of the algorithm 1.4 Macro structure of the algorithm 1.5 Implementation scheme of the serial algorithm 1.6 Serial complexity of the algorithm 1.7 Information graph 1.8 Parallelization resource of the algorithm 1.9 Input and output data of the algorithm 1.10 Properties of the algorithm 2 Software implementation of the algorithm 2.1 Implementation peculiarities of the serial algorithm 2.2 Locality of data and computations 2.2.1 Locality of implementation 2.2.1.1 Structure of memory access and a qualitative estimation of locality 2.2.1.2 Quantitative estimation of locality 2.3 Possible methods and considerations for parallel implementation of the algorithm 2.4 Scalability of the algorithm and its implementations 2.4.1 Scalability of the algorithm 2.4.2 Scalability of of the algorithm implementation 2.5 Dynamic characteristics and efficiency of the algorithm implementation 2.6 Conclusions for different classes of computer architecture 2.7 Existing implementations of the algorithm 3 References
1 Properties and structure of the algorithm
1.1 General description of the algorithm
The two-sided elimination method is a variant of Gaussian elimination used for solving a system of linear algebraic equations (SLAE) [1][2] Ax = b, where
A = \begin{bmatrix}
a_{11} & a_{12} & 0 & \cdots & \cdots & 0 \\ a_{21} & a_{22} & a_{23}& \cdots & \cdots & 0 \\ 0 & a_{32} & a_{33} & \cdots & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & a_{n-1 n-2} & a_{n-1 n-1} & a_{n-1 n} \\ 0 & \cdots & \cdots & 0 & a_{n n-1} & a_{n n} \\
\end{bmatrix}, x = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \\ \end{bmatrix}, b = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{n} \\ \end{bmatrix}
However, presentations of the elimination method [3] often use a different notation and a numbering for the right-hand side and matrix of the system. For instance, the above SLAE can be written as
A = \begin{bmatrix}
c_{0} & -b_{0} & 0 & \cdots & \cdots & 0 \\ -a_{1} & c_{1} & -b_{1} & \cdots & \cdots & 0 \\ 0 & -a_{2} & c_{2} & \cdots & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & -a_{N-1} & c_{N-1} & -b_{N-1} \\ 0 & \cdots & \cdots & 0 & -a_{N} & c_{N} \\
\end{bmatrix}\begin{bmatrix} y_{0} \\ y_{1} \\ \vdots \\ y_{N} \\ \end{bmatrix} = \begin{bmatrix} f_{0} \\ f_{1} \\ \vdots \\ f_{N} \\ \end{bmatrix}
(here, N=n-1). If each equation is written separately, then we have
c_{0} y_{0} - b_{0} y_{1} = f_{0}, -a_{i} y_{i-1} + c_{i} y_{i} - b_{i} y_{i+1} = f_{i}, 1 \le i \le N-1, -a_{N} y_{N-1} + c_{N} y_{N} = f_{N}.
Similarly to the classical (monotone) elimination method, the two-sided variant eliminates unknowns in a system; however, unlike the former, it simultaneously performs elimination from both corners (the upper and lower ones) of the SLAE. In principle, the two-sided elimination can be regarded as the simplest version of the reduction method (in which m = 1 and monotone eliminations are conducted in opposite directions).
1.2 Mathematical description of the algorithm
Let m be the index of an equation which is the meeting point of two branches (the upper and lower ones) of the forward elimination path.
In the notation introduced above, the forward elimination path consists in calculating the elimination coefficients top down
[math]\alpha_{1} = b_{0}/c_{0}[/math],
[math]\beta_{1} = f_{0}/c_{0}, [/math]
[math]\alpha_{i+1} = b_{i}/(c_{i}-a_{i}\alpha_{i}), \quad i = 1, 2, \cdots , m-1[/math],
[math]\beta_{i+1} = (f_{i}+a_{i}\beta_{i})/(c_{i}-a_{i}\alpha_{i}), \quad i = 1, 2, \cdots , m-1. [/math]
and bottom up:
[math]\xi_{N} = a_{N}/c_{N}[/math],
[math]\eta_{N} = f_{N}/c_{N}[/math],
[math]\xi_{i} = a_{i}/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math],
[math]\eta_{i} = (f_{i}+b_{i}\eta_{i+1})/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m. [/math]
Conventional descriptions of two-sided elimination (see [3]) do not usually give a formula for [math]y_{m-1}[/math]. This component is calculated later in the backward elimination path. However, by postponing the calculation of [math]y_{m-1}[/math] till the moment when [math]y_{m}[/math] is already calculated, we extend the critical path of the graph. Meanwhile, both components can be calculated simultaneously and almost indepependently one from the other.
Figure 1. Graph of the two-sided elimination method for n=14. The input and output data are not shown. The symbols / and f denote division and operation a+bc or a-bc, respectively.
1.3 Computational kernel of the algorithm
The computational kernel of this algorithm can be thought of as compiled of two parts (forward and backward elimination paths), as was also the case with the classical monotone elimination method. However, the width of both is twice as large compared to the monotone variant. The kernel of the forward path consists of two independent sequences of divisions, multiplications, and additions/subtractions. The kernel of the backward path contains only two independent sequences of multiplications and additions.
1.4 Macro structure of the algorithm
The macro structure of this algorithm can be represented as the combination of the forward and backward elimination paths. In addition, the forward path can be split into two macro units, namely, the forward paths of both right- and left-eliminations executed simultaneously, that is, in parallel, for two halves of the SLAE. The backward path can also be split into two macro units, namely, the backward paths of both right- and left-eliminations executed simultaneously, that is, in parallel, for two halves of the SLAE.
1.5 Implementation scheme of the serial algorithm
The method is executed as the following sequence of steps:
1. Initialize the forward elimination path:
[math]\alpha_{1} = b_{0}/c_{0}[/math],
[math]\beta_{1} = f_{0}/c_{0}[/math],
[math]\xi_{N} = a_{N}/c_{N}[/math],
[math]\eta_{1} = f_{N}/c_{N}[/math].
2. Execute the forward path formulas:
[math]\alpha_{i+1} = b_{i}/(c_{i}-a_{i}\alpha_{i})[/math], [math]\quad i = 1, 2, \cdots , m-1[/math],
[math]\beta_{i+1} = (f_{i}+a_{i}\beta_{i})/(c_{i}-a_{i}\alpha_{i})[/math], [math]\quad i = 1, 2, \cdots , m-1[/math],
[math]\xi_{i} = a_{i}/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math],
[math]\eta_{i} = (f_{i}+b_{i}\eta_{i+1})/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math].
3. Initialize the backward elimination path:
[math]y_{m-1} = (\beta_{m}+\alpha_{m}\eta_{m})/(1-\xi_{m}\alpha_{m})[/math],
[math]y_{m} = (\eta_{m}+\xi_{m}\beta_{m})/(1-\xi_{m}\alpha_{m})[/math].
4. Execute the backward path formulas:
[math]y_{i} = \alpha_{i+1} y_{i+1} + \beta_{i+1}[/math], [math]\quad i = m-1, m-2, \cdots , 1, 0[/math],
[math]y_{i+1} = \xi_{i+1} y_{i} + \eta_{i+1}[/math], [math]\quad i = m, m+1, \cdots , N-1[/math].
The formulas of the forward path contain double divisions by the same expressions. These divisions can be replaced by the calculation of reciprocal numbers succeeded by the multiplication by these numbers (see fig. 2).
Figure 2. Detailed graph of the two-sided elimination method for n=6 (each reciprocal number is only once calculated). The input and output data are not shown. Each calculation of a reciprocal number is marked by inv, and each multiplication is marked by mult. The operations repeated after the replacement of the right-hand side of SLAE are set out in grey.
1.6 Serial complexity of the algorithm
Consider a tridiagonal SLAE consisting of n equations with n unknowns. The two-sided elimination method as applied to solving such a SLAE in its (fastest) serial form requires
2n+2 divisions,
3n-2 additions/subtractions, and
3n-2 multiplications.
Thus, in terms of serial complexity, the two-sided elimination method is qualified as a linear complexity algorithm.
1.7 Information graph
The information graph of the two-sided elimination method is shown in fig. 1. It is obvious that the graph is parallel, the degree of parallelism being at most four and two at its forward and backward paths, respectively. During the forward path, the two branches and also two of their sub-branches (namely, the left sub-branch, corresponding to the matrix decomposition, and the right branch, responsible for the solution of the first bidiagonal system) can be executed in parallel.The right sub-branches correspond to the backward path. It is evident from the figure that not only the mathematical content of this process but also the structure of the algorithm graph and the direction of the corresponding data flows are in complete agreement with the appellation "backward path." The variant with divisions replaced by reciprocal number calculations is illustrated by the graph in fig. 2.
1.8 Parallelization resource of the algorithm