Repeated Thomas algorithm, pointwise version
Primary authors of this description: A.V.Frolov, Vad.V.Voevodin (Section 2.2), A.M.Teplov (Section 2.4)
Contents

1 Properties and structure of the algorithm
  1.1 General description of the algorithm
  1.2 Mathematical description of the algorithm
  1.3 Computational kernel of the algorithm
  1.4 Macro structure of the algorithm
  1.5 Implementation scheme of the serial algorithm
  1.6 Serial complexity of the algorithm
  1.7 Information graph
  1.8 Parallelization resource of the algorithm
  1.9 Input and output data of the algorithm
  1.10 Properties of the algorithm
2 Software implementation of the algorithm
  2.1 Implementation peculiarities of the serial algorithm
  2.2 Locality of data and computations
    2.2.1 Locality of implementation
      2.2.1.1 Structure of memory access and a qualitative estimation of locality
      2.2.1.2 Quantitative estimation of locality
  2.3 Possible methods and considerations for parallel implementation of the algorithm
  2.4 Scalability of the algorithm and its implementations
    2.4.1 Scalability of the algorithm
    2.4.2 Scalability of the algorithm implementation
  2.5 Dynamic characteristics and efficiency of the algorithm implementation
  2.6 Conclusions for different classes of computer architecture
  2.7 Existing implementations of the algorithm
3 References
1 Properties and structure of the algorithm
1.1 General description of the algorithm
The two-sided elimination method is a variant of Gaussian elimination used for solving a system of linear algebraic equations (SLAE) [1][2] Ax = b, where
[math]
A = \begin{bmatrix}
a_{11} & a_{12} & 0 & \cdots & \cdots & 0 \\
a_{21} & a_{22} & a_{23} & \cdots & \cdots & 0 \\
0 & a_{32} & a_{33} & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & \cdots & a_{n-1\,n-2} & a_{n-1\,n-1} & a_{n-1\,n} \\
0 & \cdots & \cdots & 0 & a_{n\,n-1} & a_{n\,n} \\
\end{bmatrix}, \quad
x = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}, \quad
b = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{n} \end{bmatrix}
[/math]
However, presentations of the elimination method [3] often use a different notation and numbering for the matrix and the right-hand side of the system. For instance, the above SLAE can be written as
[math]
\begin{bmatrix}
c_{0} & -b_{0} & 0 & \cdots & \cdots & 0 \\
-a_{1} & c_{1} & -b_{1} & \cdots & \cdots & 0 \\
0 & -a_{2} & c_{2} & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & \cdots & -a_{N-1} & c_{N-1} & -b_{N-1} \\
0 & \cdots & \cdots & 0 & -a_{N} & c_{N} \\
\end{bmatrix}
\begin{bmatrix} y_{0} \\ y_{1} \\ \vdots \\ y_{N} \end{bmatrix} =
\begin{bmatrix} f_{0} \\ f_{1} \\ \vdots \\ f_{N} \end{bmatrix}
[/math]
(here, [math]N=n-1[/math]; the unknowns and equations are renumbered from zero, so that [math]y_{i}[/math] corresponds to [math]x_{i+1}[/math], and the off-diagonal entries are written with minus signs). If each equation is written out separately, then we have
[math]c_{0} y_{0} - b_{0} y_{1} = f_{0},[/math]

[math]-a_{i} y_{i-1} + c_{i} y_{i} - b_{i} y_{i+1} = f_{i}, \quad 1 \le i \le N-1,[/math]

[math]-a_{N} y_{N-1} + c_{N} y_{N} = f_{N}.[/math]
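As an illustration (the storage layout below is an assumption made for this sketch, not part of the cited descriptions), the coefficients of such a system are conveniently kept in four arrays that share the 0..N indexing of the formulas above:

<source lang="c">
/* A possible storage layout for the system above (N = n - 1):
 *   c[0]*y[0] - b[0]*y[1]                    = f[0]
 *  -a[i]*y[i-1] + c[i]*y[i] - b[i]*y[i+1]    = f[i],  1 <= i <= N-1
 *  -a[N]*y[N-1] + c[N]*y[N]                  = f[N]
 * The entries a[0] and b[N] are never referenced; they are kept only so
 * that all arrays share the same 0..N indexing. */
enum { N = 7 };            /* example size, chosen arbitrarily */

double a[N + 1];           /* sub-diagonal coefficients,   used for i = 1..N   */
double c[N + 1];           /* diagonal coefficients,       used for i = 0..N   */
double b[N + 1];           /* super-diagonal coefficients, used for i = 0..N-1 */
double f[N + 1];           /* right-hand side,             used for i = 0..N   */
double y[N + 1];           /* solution vector,             used for i = 0..N   */
</source>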
Like the classical (monotone) elimination method, the two-sided variant eliminates unknowns of the system; unlike the former, however, it performs the elimination simultaneously from both corners (the upper and lower ones) of the SLAE. In principle, the two-sided elimination can be regarded as the simplest version of the reduction method (in which m = 1 and monotone eliminations are conducted in opposite directions).
1.2 Mathematical description of the algorithm
Let m be the index of the equation at which the two branches (the upper and lower ones) of the forward elimination path meet.
In the notation introduced above, the forward elimination path consists in calculating the elimination coefficients top down
[math]\alpha_{1} = b_{0}/c_{0}[/math],
[math]\beta_{1} = f_{0}/c_{0}, [/math]
[math]\alpha_{i+1} = b_{i}/(c_{i}-a_{i}\alpha_{i}), \quad i = 1, 2, \cdots , m-1[/math],
[math]\beta_{i+1} = (f_{i}+a_{i}\beta_{i})/(c_{i}-a_{i}\alpha_{i}), \quad i = 1, 2, \cdots , m-1. [/math]
and bottom up:
[math]\xi_{N} = a_{N}/c_{N}[/math],
[math]\eta_{N} = f_{N}/c_{N}[/math],
[math]\xi_{i} = a_{i}/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math],
[math]\eta_{i} = (f_{i}+b_{i}\eta_{i+1})/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m. [/math]
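For reference (this derivation is standard and not spelled out in the cited description), the top-down coefficients express the two-term relation [math]y_{i-1} = \alpha_{i} y_{i} + \beta_{i}[/math]: substituting it into the [math]i[/math]-th equation [math]-a_{i} y_{i-1} + c_{i} y_{i} - b_{i} y_{i+1} = f_{i}[/math] yields

[math](c_{i}-a_{i}\alpha_{i}) y_{i} = b_{i} y_{i+1} + f_{i} + a_{i}\beta_{i},[/math]

that is, [math]y_{i} = \alpha_{i+1} y_{i+1} + \beta_{i+1}[/math] with exactly the coefficients written above. The bottom-up coefficients express the symmetric relation [math]y_{i+1} = \xi_{i+1} y_{i} + \eta_{i+1}[/math] and are obtained in the same way, starting from the last equation of the system.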
Conventional descriptions of two-sided elimination (see [3]) usually do not give a formula for [math]y_{m-1}[/math]; this component is calculated later, in the backward elimination path. However, postponing the calculation of [math]y_{m-1}[/math] until [math]y_{m}[/math] has been computed extends the critical path of the graph. Meanwhile, both components can be calculated simultaneously and almost independently of each other.
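Indeed, at the meeting point the two branches supply the pair of relations [math]y_{m-1} = \alpha_{m} y_{m} + \beta_{m}[/math] and [math]y_{m} = \xi_{m} y_{m-1} + \eta_{m}[/math]. Solving this 2×2 system gives

[math]y_{m-1} = \frac{\beta_{m}+\alpha_{m}\eta_{m}}{1-\xi_{m}\alpha_{m}}, \quad y_{m} = \frac{\eta_{m}+\xi_{m}\beta_{m}}{1-\xi_{m}\alpha_{m}},[/math]

which are exactly the formulas used below to initialize the backward elimination path. Both values depend only on data already available when the two forward branches meet, so they can indeed be computed at the same time.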
1.3 Computational kernel of the algorithm
The computational kernel of this algorithm, as in the classical monotone elimination method, can be regarded as composed of two parts (the forward and backward elimination paths). However, each part is twice as wide as in the monotone variant. The kernel of the forward path consists of two independent sequences of divisions, multiplications, and additions/subtractions. The kernel of the backward path contains only two independent sequences of multiplications and additions.
1.4 Macro structure of the algorithm
The macro structure of this algorithm can be represented as the combination of the forward and backward elimination paths. Each of these paths, in turn, splits into two macro units, namely, the forward (respectively, backward) paths of the right- and left-eliminations, executed simultaneously, that is, in parallel, on the two halves of the SLAE.
1.5 Implementation scheme of the serial algorithm
The method is executed as the following sequence of steps (an illustrative C sketch of the complete sequence is given after the list):
1. Initialize the forward elimination path:
[math]\alpha_{1} = b_{0}/c_{0}[/math],
[math]\beta_{1} = f_{0}/c_{0}[/math],
[math]\xi_{N} = a_{N}/c_{N}[/math],
[math]\eta_{N} = f_{N}/c_{N}[/math].
2. Execute the forward path formulas:
[math]\alpha_{i+1} = b_{i}/(c_{i}-a_{i}\alpha_{i})[/math], [math]\quad i = 1, 2, \cdots , m-1[/math],
[math]\beta_{i+1} = (f_{i}+a_{i}\beta_{i})/(c_{i}-a_{i}\alpha_{i})[/math], [math]\quad i = 1, 2, \cdots , m-1[/math],
[math]\xi_{i} = a_{i}/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math],
[math]\eta_{i} = (f_{i}+b_{i}\eta_{i+1})/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math].
3. Initialize the backward elimination path:
[math]y_{m-1} = (\beta_{m}+\alpha_{m}\eta_{m})/(1-\xi_{m}\alpha_{m})[/math],
[math]y_{m} = (\eta_{m}+\xi_{m}\beta_{m})/(1-\xi_{m}\alpha_{m})[/math].
4. Execute the backward path formulas:
[math]y_{i} = \alpha_{i+1} y_{i+1} + \beta_{i+1}[/math], [math]\quad i = m-1, m-2, \cdots , 1, 0[/math],
[math]y_{i+1} = \xi_{i+1} y_{i} + \eta_{i+1}[/math], [math]\quad i = m, m+1, \cdots , N-1[/math].
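Below is a minimal serial sketch of steps 1-4 in C. It is illustrative only: the function name two_sided_elimination, the work arrays alpha, beta, xi, eta, and the argument layout are assumptions rather than part of the reference description; the coefficient arrays follow the 0..N storage convention shown above, and the meeting index m is assumed to satisfy 1 ≤ m ≤ N.

<source lang="c">
#include <stddef.h>

/* Two-sided elimination for a tridiagonal SLAE written as
 *   c[0]*y[0] - b[0]*y[1] = f[0],
 *  -a[i]*y[i-1] + c[i]*y[i] - b[i]*y[i+1] = f[i],  1 <= i <= N-1,
 *  -a[N]*y[N-1] + c[N]*y[N] = f[N].
 * All arrays are indexed 0..N; alpha, beta, xi, eta are work arrays of the
 * same size.  The meeting index m must satisfy 1 <= m <= N (typically m is
 * about N/2).  No pivoting is performed, so the usual diagonal-dominance
 * assumptions are required for stability. */
void two_sided_elimination(size_t N, size_t m,
                           const double *a, const double *b,
                           const double *c, const double *f,
                           double *alpha, double *beta,
                           double *xi, double *eta, double *y)
{
    /* Step 1: initialize the forward path at both corners. */
    alpha[1] = b[0] / c[0];
    beta[1]  = f[0] / c[0];
    xi[N]    = a[N] / c[N];
    eta[N]   = f[N] / c[N];

    /* Step 2: forward path, top-down branch ... */
    for (size_t i = 1; i <= m - 1; i++) {
        double denom = c[i] - a[i] * alpha[i];
        alpha[i + 1] = b[i] / denom;
        beta[i + 1]  = (f[i] + a[i] * beta[i]) / denom;
    }
    /* ... and bottom-up branch (independent of the previous loop). */
    for (size_t i = N - 1; i >= m; i--) {
        double denom = c[i] - b[i] * xi[i + 1];
        xi[i]  = a[i] / denom;
        eta[i] = (f[i] + b[i] * eta[i + 1]) / denom;
    }

    /* Step 3: meeting point -- solve the 2x2 system for y[m-1] and y[m]. */
    double det = 1.0 - xi[m] * alpha[m];
    y[m - 1] = (beta[m] + alpha[m] * eta[m]) / det;
    y[m]     = (eta[m]  + xi[m]   * beta[m]) / det;

    /* Step 4: backward path, again two independent branches.
     * The first iteration of the first loop recomputes y[m-1], matching the
     * range i = m-1, ..., 0 of the formulas above (identical up to rounding). */
    for (size_t i = m; i-- > 0; )
        y[i] = alpha[i + 1] * y[i + 1] + beta[i + 1];
    for (size_t i = m; i <= N - 1; i++)
        y[i + 1] = xi[i + 1] * y[i] + eta[i + 1];
}
</source>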
The forward-path formulas contain pairs of divisions by the same expression. Each such pair can be replaced by computing the reciprocal of that expression once and then multiplying by it (see fig. 2).
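In the sketch above, this replacement changes the top-down loop of the forward path as shown below (the bottom-up loop is modified in the same way); one division and two multiplications are performed per step instead of two divisions:

<source lang="c">
/* Top-down branch of the forward path with the pair of divisions replaced
 * by one reciprocal and two multiplications (drop-in replacement for the
 * corresponding loop in two_sided_elimination above). */
for (size_t i = 1; i <= m - 1; i++) {
    double r = 1.0 / (c[i] - a[i] * alpha[i]);   /* reciprocal computed once */
    alpha[i + 1] = b[i] * r;
    beta[i + 1]  = (f[i] + a[i] * beta[i]) * r;
}
</source>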