Properties of the algorithm:
 Sequential complexity: [math]O(n^3)[/math]
 Height of the parallel form: [math]O(n)[/math]
 Width of the parallel form: [math]O(n^2)[/math]
 Amount of input data: [math]\frac{n (n + 1)}{2}[/math]
 Amount of output data: [math]\frac{n (n + 1)}{2}[/math]

1 Properties and structure of the algorithm
1.1 General description
The Cholesky decomposition algorithm was first proposed by AndreLouis Cholesky (October 15, 1875  August 31, 1918) at the end of the First World War shortly before he was killed in battle. He was a French military officer and mathematician. The idea of this algorithm was published in 1924 by his fellow officer and, later, was used by Banachiewicz in 1938 [7]. In the Russian mathematical literature, the Cholesky decomposition is also known as the squareroot method [13] due to the square root operations used in this decomposition and not used in Gaussian elimination.
Originally, the Cholesky decomposition was used only for dense real symmetric positive definite matrices. At present, the application of this decomposition is much wider. For example, it can also be employed for the case of Hermitian matrices. In order to increase the computing performance, its block versions are often applied.
In the case of sparse matrices, the Cholesky decomposition is also widely used as the main stage of a direct method for solving linear systems. In order to reduce the memory requirements and the profile of the matrix, special reordering strategies are applied to minimize the number of arithmetic operations. A number of reordering strategies are used to identify the independent matrix blocks for parallel computing systems.
1.2 Mathematical description
Input data: a symmetric positive definite matrix [math]A[/math] whose elements are denoted by [math]a_{ij}[/math]).
Output data: the lower triangular matrix [math]L[/math] whose elements are denoted by [math]l_{ij}[/math]).
The Cholesky algorithm can be represented in the form
 [math]
\begin{align}
l_{11} & = \sqrt{a_{11}}, \\
l_{j1} & = \frac{a_{j1}}{l_{11}}, \quad j \in [2, n], \\
l_{ii} & = \sqrt{a_{ii}  \sum_{p = 1}^{i  1} l_{ip}^2}, \quad i \in [2, n], \\
l_{ji} & = \left (a_{ji}  \sum_{p = 1}^{i  1} l_{ip} l_{jp} \right ) / l_{ii}, \quad i \in [2, n  1], j \in [i + 1, n].
\end{align}
[/math]
There exist block versions of this algorithm; however, here we consider only its “dot” version.
In a number of implementations, the division by the diagonal element [math]l_{ii}[/math] is made in the following two steps: the computation of [math]\frac{1}{l_{ii}}[/math] and, then, the multiplication of the result by the modified values of [math]a_{ji}[/math] . Here we do not consider this computational scheme, since this scheme has worse parallel characteristics than that given above.
 Read more…