Repeated Thomas algorithm, pointwise version

From Algowiki
Revision as of 21:28, 31 January 2016 by Икрамов (talk | contribs)
Jump to navigation Jump to search

Primary authors of this description: A.V.Frolov, Vad.V.Voevodin (Section 2.2), A.M.Teplov (Section 2.4)

Contents

1 Properties and structure of the algorithm 1.1 General description of the algorithm 1.2 Mathematical description of the algorithm 1.3 Computational kernel of the algorithm 1.4 Macro structure of the algorithm 1.5 Implementation scheme of the serial algorithm 1.6 Serial complexity of the algorithm 1.7 Information graph 1.8 Parallelization resource of the algorithm 1.9 Input and output data of the algorithm 1.10 Properties of the algorithm 2 Software implementation of the algorithm 2.1 Implementation peculiarities of the serial algorithm 2.2 Locality of data and computations 2.2.1 Locality of implementation 2.2.1.1 Structure of memory access and a qualitative estimation of locality 2.2.1.2 Quantitative estimation of locality 2.3 Possible methods and considerations for parallel implementation of the algorithm 2.4 Scalability of the algorithm and its implementations 2.4.1 Scalability of the algorithm 2.4.2 Scalability of of the algorithm implementation 2.5 Dynamic characteristics and efficiency of the algorithm implementation 2.6 Conclusions for different classes of computer architecture 2.7 Existing implementations of the algorithm 3 References


1 Properties and structure of the algorithm

1.1 General description of the algorithm

The two-sided elimination method is a variant of Gaussian elimination used for solving a system of linear algebraic equations (SLAE) [1][2] Ax = b, where

A = \begin{bmatrix}

a_{11} & a_{12}  & 0 &    \cdots & \cdots & 0 \\
a_{21} & a_{22}  & a_{23}&  \cdots & \cdots & 0 \\
0 &  a_{32} & a_{33}  &     \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & \cdots & a_{n-1 n-2} & a_{n-1 n-1}  & a_{n-1 n} \\
0 & \cdots & \cdots & 0 & a_{n n-1}  & a_{n n} \\

\end{bmatrix}, x = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \\ \end{bmatrix}, b = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{n} \\ \end{bmatrix}

However, presentations of the elimination method [3] often use a different notation and a numbering for the right-hand side and matrix of the system. For instance, the above SLAE can be written as

A = \begin{bmatrix}

c_{0} & -b_{0}  & 0      &  \cdots & \cdots & 0 \\
-a_{1} & c_{1}  & -b_{1} &  \cdots & \cdots & 0 \\
0      &  -a_{2} & c_{2} &  \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & \cdots & -a_{N-1} & c_{N-1}  & -b_{N-1} \\
0 & \cdots & \cdots & 0 & -a_{N}  & c_{N} \\

\end{bmatrix}\begin{bmatrix} y_{0} \\ y_{1} \\ \vdots \\ y_{N} \\ \end{bmatrix} = \begin{bmatrix} f_{0} \\ f_{1} \\ \vdots \\ f_{N} \\ \end{bmatrix}

(here, N=n-1). If each equation is written separately, then we have

c_{0} y_{0} - b_{0} y_{1} = f_{0}, -a_{i} y_{i-1} + c_{i} y_{i} - b_{i} y_{i+1} = f_{i}, 1 \le i \le N-1, -a_{N} y_{N-1} + c_{N} y_{N} = f_{N}.

Similarly to the classical (monotone) elimination method, the two-sided variant eliminates unknowns in a system; however, unlike the former, it simultaneously performs elimination from both corners (the upper and lower ones) of the SLAE. In principle, the two-sided elimination can be regarded as the simplest version of the reduction method (in which m = 1 and monotone eliminations are conducted in opposite directions).


1.2 Mathematical description of the algorithm

Let m be the index of an equation which is the meeting point of two branches (the upper and lower ones) of the forward elimination path.

In the notation introduced above, the forward elimination path consists in calculating the elimination coefficients top down

[math]\alpha_{1} = b_{0}/c_{0}[/math],

[math]\beta_{1} = f_{0}/c_{0}, [/math]

[math]\alpha_{i+1} = b_{i}/(c_{i}-a_{i}\alpha_{i}), \quad i = 1, 2, \cdots , m-1[/math],

[math]\beta_{i+1} = (f_{i}+a_{i}\beta_{i})/(c_{i}-a_{i}\alpha_{i}), \quad i = 1, 2, \cdots , m-1. [/math]

and bottom up:

[math]\xi_{N} = a_{N}/c_{N}[/math],

[math]\eta_{N} = f_{N}/c_{N}[/math],

[math]\xi_{i} = a_{i}/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math],

[math]\eta_{i} = (f_{i}+b_{i}\eta_{i+1})/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m. [/math]

Conventional descriptions of two-sided elimination (see [3]) do not usually give a formula for [math]y_{m-1}[/math]. This component is calculated later in the backward elimination path. However, by postponing the calculation of [math]y_{m-1}[/math] till the moment when [math]y_{m}[/math] is already calculated, we extend the critical path of the graph. Meanwhile, both components can be calculated simultaneously and almost indepependently one from the other.

Figure 1. Graph of the two-sided elimination method for n=14. The input and output data are not shown. The symbols / and f denote division and operation a+bc or a-bc, respectively.


1.3 Computational kernel of the algorithm

The computational kernel of this algorithm can be thought of as compiled of two parts (forward and backward elimination paths), as was also the case with the classical monotone elimination method. However, the width of both is twice as large compared to the monotone variant. The kernel of the forward path consists of two independent sequences of divisions, multiplications, and additions/subtractions. The kernel of the backward path contains only two independent sequences of multiplications and additions.


1.4 Macro structure of the algorithm

The macro structure of this algorithm can be represented as the combination of the forward and backward elimination paths. In addition, the forward path can be split into two macro units, namely, the forward paths of both right- and left-eliminations executed simultaneously, that is, in parallel, for two halves of the SLAE. The backward path can also be split into two macro units, namely, the backward paths of both right- and left-eliminations executed simultaneously, that is, in parallel, for two halves of the SLAE.


1.5 Implementation scheme of the serial algorithm

The method is executed as the following sequence of steps:

1. Initialize the forward elimination path:

[math]\alpha_{1} = b_{0}/c_{0}[/math],

[math]\beta_{1} = f_{0}/c_{0}[/math],

[math]\xi_{N} = a_{N}/c_{N}[/math],

[math]\eta_{1} = f_{N}/c_{N}[/math].

2. Execute the forward path formulas:

[math]\alpha_{i+1} = b_{i}/(c_{i}-a_{i}\alpha_{i})[/math], [math]\quad i = 1, 2, \cdots , m-1[/math],

[math]\beta_{i+1} = (f_{i}+a_{i}\beta_{i})/(c_{i}-a_{i}\alpha_{i})[/math], [math]\quad i = 1, 2, \cdots , m-1[/math],

[math]\xi_{i} = a_{i}/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math],

[math]\eta_{i} = (f_{i}+b_{i}\eta_{i+1})/(c_{i}-b_{i}\xi_{i+1})[/math], [math]\quad i = N-1, N-2, \cdots , m[/math].

3. Initialize the backward elimination path:

[math]y_{m-1} = (\beta_{m}+\alpha_{m}\eta_{m})/(1-\xi_{m}\alpha_{m})[/math],

[math]y_{m} = (\eta_{m}+\xi_{m}\beta_{m})/(1-\xi_{m}\alpha_{m})[/math].

4. Execute the backward path formulas:

[math]y_{i} = \alpha_{i+1} y_{i+1} + \beta_{i+1}[/math], [math]\quad i = m-1, m-2, \cdots , 1, 0[/math],

[math]y_{i+1} = \xi_{i+1} y_{i} + \eta_{i+1}[/math], [math]\quad i = m, m+1, \cdots , N-1[/math].

The formulas of the forward path contain double divisions by the same expressions. These divisions can be replaced by the calculation of reciprocal numbers succeeded by the multiplication by these numbers (see fig. 2).


Figure 2. Detailed graph of the two-sided elimination method for n=6 (each reciprocal number is only once calculated). The input and output data are not shown. Each calculation of a reciprocal number is marked by inv, and each multiplication is marked by mult. The operations repeated after the replacement of the right-hand side of SLAE are set out in grey.


1.6 Serial complexity of the algorithm

Consider a tridiagonal SLAE consisting of n equations with n unknowns. The two-sided elimination method as applied to solving such a SLAE in its (fastest) serial form requires

2n+2 divisions,

3n-2 additions/subtractions, and

3n-2 multiplications.

Thus, in terms of serial complexity, the two-sided elimination method is qualified as a linear complexity algorithm.


1.7 Information graph

The information graph of the two-sided elimination method is shown in fig. 1. It is obvious that the graph is parallel, the degree of parallelism being at most four and two at its forward and backward paths, respectively. During the forward path, the two branches and also two of their sub-branches (namely, the left sub-branch, corresponding to the matrix decomposition, and the right branch, responsible for the solution of the first bidiagonal system) can be executed in parallel.The right sub-branches correspond to the backward path. It is evident from the figure that not only the mathematical content of this process but also the structure of the algorithm graph and the direction of the corresponding data flows are in complete agreement with the appellation "backward path." The variant with divisions replaced by reciprocal number calculations is illustrated by the graph in fig. 2.


1.8 Parallelization resource of the algorithm

Both branches of the forward elimination path are executed in parallel when [math]N=2m-1[/math], that is, when [math]n=2m[/math]. Then the two-sided elimination method requires that the following layers be executed:

[math]m+1[/math] division layers (all the layers except for one contain 4 divisions, and the remaining layer contains 2 divisions),

[math]2m-1[/math] multiplication layers and the same number of addition/subtraction layers ([math]m-1[/math] layers contain four operations, another [math]m-1[/math] layers contain two operations, and the remaining layer contains three operations).

Thus, in terms of the parallel form height, the two-sided elimination method is qualified as an algorithm of complexity O(n). In terms of the parallel form width, its complexity is 4.

Two branches cannot be synchronized if [math]n[/math] is odd. It is therefore preferable to choose problems of even dimensions.


1.9 Input and output data of the algorithm

Input data: tridiagonal matrix A (with entries a_{ij}), vector b (with components b_{i}).

Output data: vector x (with components x_{i}).

Size of the output data: n.


1.10 Properties of the algorithm

It is clearly seen that the ratio of the serial to parallel complexity is a constant (which is less than 4).

The computational power of the algorithm, understood as the ratio of the number of operations to the total size of the input and output data, is also a constant.

Within the framework of the chosen version, the algorithm is completely determined.