Algorithm level

Difference between revisions of "Dense matrix multiplication (serial version for real matrices)"

From Algowiki
Jump to navigation Jump to search
[quality revision][checked revision]
 
(11 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Primary authors of this description: [[:ru:Участник:Frolov|A.V.Frolov]], [[:ru:Участник:VadimVV|Vad.V.Voevodin]] ([[#Locality of data and computations|Section 2.2]]), [[:ru:Участник:Teplov|A.M.Teplov]] (раздел [[#Scalability of the algorithm and its implementations|Section 2.4]])
+
{{level-a}}
  
== Description of the properties and structure of the algorithm ==
+
Primary authors of this description: [[:ru:Участник:Frolov|A.V.Frolov]].
  
=== General description ===
+
== Properties and structure of the algorithm ==
  
'''Matrix multiplication''' is one of the basic procedures in algorithmic linear algebra, which is widely used in a number of different methods. Here, we consider the product <math>С = AВ</math>&nbsp; of dense real matrices (serial real version), that is, the version in which neither a special form of matrices nor the associative properties of the addition operation are used.
+
=== General description of the algorithm ===
  
=== Mathematical description ===
+
'''Matrix multiplication''' is one of the basic procedures in algorithmic linear algebra, which is widely used in a number of different methods. Here, we consider the product <math>C = AB</math>&nbsp; of dense real matrices (serial real version), that is, the version in which neither a special form of matrices nor the associative properties of the addition operation are used<ref>Voevodin V.V., Kuznetsov Yu.A. Matrices and computations, Moscow: Nauka, 1984.</ref>.
  
Input data: dense matrix <math>A</math> of size m-by-n (with entries <math>a_{ij}</math>), dense matrix <math>B</math> of size n-by-l (with entries <math>b_{ij}</math>).
+
=== Mathematical description of the algorithm ===
 +
 
 +
Input data: dense matrix <math>A</math> of size <math>m</math>-by-<math>n</math> (with entries <math>a_{ij}</math>), dense matrix <math>B</math> of size <math>n</math>-by-<math>l</math> (with entries <math>b_{ij}</math>).
  
 
Output data: dense matrix <math>C</math> (with entries <math>c_{ij}</math>).
 
Output data: dense matrix <math>C</math> (with entries <math>c_{ij}</math>).
Line 30: Line 32:
 
Depending on the problem requirements, these sums are calculated with or without using the accumulation mode.  
 
Depending on the problem requirements, these sums are calculated with or without using the accumulation mode.  
  
=== Macrostructure of the algorithm ===
+
=== Macro structure of the algorithm ===
  
 
As already noted in [[#Computational kernel of the algorithm|computational kernel of the algorithm description]], the basic part of the dense matrix multiplication is compiled of repeated dot products of the rows of <math>A</math> and the columns of <math>B</math> (there are on the whole <math>ml</math> such products):
 
As already noted in [[#Computational kernel of the algorithm|computational kernel of the algorithm description]], the basic part of the dense matrix multiplication is compiled of repeated dot products of the rows of <math>A</math> and the columns of <math>B</math> (there are on the whole <math>ml</math> such products):
Line 56: Line 58:
 
* <math>mnl</math> multiplications and the same number of additions.  
 
* <math>mnl</math> multiplications and the same number of additions.  
  
The use of accumulation requires that multiplications and subtractions be done in the double precision mode (or such procedures as the Fortran function DPROD be used). This increases the computation time for performing the matrix multiplication.  
+
The use of accumulation requires that multiplications and additions be done in the double precision mode (or such procedures as the Fortran function DPROD be used). This increases the computation time for performing the matrix multiplication.  
  
In terms of serial complexity, the forward substitution is qualified as a ''cubic complexity'' algorithm (or a ''trilinear complexity'' algorithm if the matrices are rectangular).
+
In terms of serial complexity, the dense matrix multiplication is qualified as a ''cubic complexity'' algorithm (or a ''trilinear complexity'' algorithm if the matrices are rectangular).
  
 
=== Information graph ===
 
=== Information graph ===
  
We describe the [[глоссарий#Граф алгоритма|algorithm graph]] both analytically and graphically.  
+
We describe the [[Glossary#Algorithm graph|algorithm graph]] both analytically and graphically.  
  
 
The algorithm graph of multiplying dense matrices consists of a single group of vertices placed at integer nodes of a three-dimensional domain. The corresponding operation is  
 
The algorithm graph of multiplying dense matrices consists of a single group of vertices placed at integer nodes of a three-dimensional domain. The corresponding operation is  
Line 83: Line 85:
 
* an output data item <math>c_{ij}</math> if <math>k = n</math>.
 
* an output data item <math>c_{ij}</math> if <math>k = n</math>.
  
[[file:Dense mtrx product.png|thumb|center|800px|Multiplication of dense matrices with demonstration of the output data]]
+
[[file:Dense mtrx product.png|thumb|center|800px|Figure 1. Multiplication of dense matrices with demonstration of the output data]]
  
=== Parallelism resource of the algorithm ===
+
=== Parallelization resource of the algorithm ===
  
 
The parallel version of the algorithm for multiplying square matrices of order n requires that the following layers be successively performed:  
 
The parallel version of the algorithm for multiplying square matrices of order n requires that the following layers be successively performed:  
Line 99: Line 101:
 
In terms of the parallel form height, the dense matrix multiplication is qualified as a quadratic complexity algorithm. In terms of the parallel form width, its complexity is also ''quadratic'' (for square matrices) or ''bilinear'' (for general rectangular matrices).
 
In terms of the parallel form height, the dense matrix multiplication is qualified as a quadratic complexity algorithm. In terms of the parallel form width, its complexity is also ''quadratic'' (for square matrices) or ''bilinear'' (for general rectangular matrices).
  
=== Description of the input and output data ===
+
=== Input and output data of the algorithm ===
  
 
'''Input data''': matrix <math>A</math> (with entries <math>a_{ij}</math>), matrix <math>B</math> (with entries <math>b_{ij}</math>)).
 
'''Input data''': matrix <math>A</math> (with entries <math>a_{ij}</math>), matrix <math>B</math> (with entries <math>b_{ij}</math>)).
Line 107: Line 109:
 
'''Output data ''': matrix <math>C</math> (with entries <math>c_{ij}</math>).
 
'''Output data ''': matrix <math>C</math> (with entries <math>c_{ij}</math>).
  
''''''Size of the output data''': <math>ml</math>
+
'''Size of the output data''': <math>ml</math>
  
 
=== Properties of the algorithm ===
 
=== Properties of the algorithm ===
Line 116: Line 118:
  
 
The algorithm of dense matrix multiplication is completely determined. We do not consider any other order of performing associative operations in this version.  
 
The algorithm of dense matrix multiplication is completely determined. We do not consider any other order of performing associative operations in this version.  
 +
 +
== Software implementation of the algorithm ==
 +
 +
=== Implementation peculiarities of the serial algorithm ===
 +
 +
In its simplest form, the matrix multiplication algorithm in Fortran can be written as follows:
 +
 +
<source lang="fortran">
 +
       
 +
DO  I = 1, M
 +
        DO  J = 1, L
 +
S = 0.
 +
DO  K = 1, N
 +
S = S + DPROD(A(I,K), B(K,J))
 +
END DO
 +
        C(I, J) = S
 +
        END DO
 +
END DO
 +
</source>
 +
In this case the <math>S</math> variable must be double precision to implement the accumulation mode.
 +
 +
=== Possible methods and considerations for parallel implementation of the algorithm ===
 +
=== Run results ===
 +
=== Conclusions for different classes of computer architecture ===
 +
 +
== References ==
 +
 +
<references />
  
 
[[Ru:Перемножение плотных неособенных матриц (последовательный вещественный вариант)]]
 
[[Ru:Перемножение плотных неособенных матриц (последовательный вещественный вариант)]]
  
[[Category:Started articles]]
+
[[Category:Finished articles]]

Latest revision as of 10:55, 8 July 2022


Primary authors of this description: A.V.Frolov.

1 Properties and structure of the algorithm

1.1 General description of the algorithm

Matrix multiplication is one of the basic procedures in algorithmic linear algebra, which is widely used in a number of different methods. Here, we consider the product [math]C = AB[/math]  of dense real matrices (serial real version), that is, the version in which neither a special form of matrices nor the associative properties of the addition operation are used[1].

1.2 Mathematical description of the algorithm

Input data: dense matrix [math]A[/math] of size [math]m[/math]-by-[math]n[/math] (with entries [math]a_{ij}[/math]), dense matrix [math]B[/math] of size [math]n[/math]-by-[math]l[/math] (with entries [math]b_{ij}[/math]).

Output data: dense matrix [math]C[/math] (with entries [math]c_{ij}[/math]).

Formulas of the method:

[math] \begin{align} c_{ij} = \sum_{k = 1}^{n} a_{ik} b_{kj}, \quad i \in [1, m], \quad i \in [1, l]. \end{align} [/math]

There exists also a block version of the method; however, this description treats only the pointwise version.

1.3 Computational kernel of the algorithm

The computational kernel of the dense matrix multiplication can be compiled of repeated products of [math]A[/math] with the columns of [math]B[/math] (there are on the whole [math]l[/math] such products) or (upon more detailed inspection) of repeated dot products of the rows of [math]A[/math] and the columns of [math]B[/math] (there are on the whole [math]ml[/math] such products):

[math]\sum_{k = 1}^{n} a_{ik} b_{kj}[/math]

Depending on the problem requirements, these sums are calculated with or without using the accumulation mode.

1.4 Macro structure of the algorithm

As already noted in computational kernel of the algorithm description, the basic part of the dense matrix multiplication is compiled of repeated dot products of the rows of [math]A[/math] and the columns of [math]B[/math] (there are on the whole [math]ml[/math] such products):

[math]\sum_{k = 1}^{n} a_{ik} b_{kj}[/math]

These calculations are performed with or without using the accumulation mode.

1.5 Implementation scheme of the serial algorithm

For all [math]i[/math] from [math]1[/math] to [math]m[/math] and for all [math]j[/math] from [math]1[/math] to [math]l[/math], do

[math]c_{ij} = \sum_{k = 1}^{n} a_{ik} b_{kj}[/math]

We emphasize that sums of the form [math]\sum_{k = 1}^{n} a_{ik} b_{kj}[/math] are calculated in the accumulation mode by adding the products [math]a_{ik} b_{kj}[/math] to the current (temporary) value of [math]c_{ij}[/math].The index [math]k[/math] increases from [math]1[/math] to [math]n[/math]. All the sums are initialized to zero. The general scheme is virtually the same if summations are performed in the decreasing order of indices; therefore, this case is not considered. Other orders of summation change the parallel properties of the algorithm and are considered in separate descriptions.

1.6 Serial complexity of the algorithm

The multiplication of two square matrices of order [math]n[/math] (that is, [math]m=n=l[/math]) in the (fastest) serial version requires

  • [math]n^3[/math] multiplications and the same number of additions.

The multiplication of an [math]m[/math]-by-[math]n[/math] matrix by an [math]n[/math]-by-[math]l[/math] matrix in the (fastest) serial version requires

  • [math]mnl[/math] multiplications and the same number of additions.

The use of accumulation requires that multiplications and additions be done in the double precision mode (or such procedures as the Fortran function DPROD be used). This increases the computation time for performing the matrix multiplication.

In terms of serial complexity, the dense matrix multiplication is qualified as a cubic complexity algorithm (or a trilinear complexity algorithm if the matrices are rectangular).

1.7 Information graph

We describe the algorithm graph both analytically and graphically.

The algorithm graph of multiplying dense matrices consists of a single group of vertices placed at integer nodes of a three-dimensional domain. The corresponding operation is [math]a+bc[/math].

The natural coordinates of this domain are as follows:

  • [math]i[/math] varies from [math]1[/math] to [math]m[/math], taking all the integer values in this range;
  • [math]j[/math] varies from [math]1[/math] to [math]l[/math], taking all the integer values in this range;
  • [math]k[/math] varies from [math]1[/math] to [math]n[/math], taking all the integer values in this range.

The arguments of the operation are as follows:

  • [math]a[/math] is:
    • the constant [math]0[/math] if [math]k = 1[/math];
    • the result of performing the operation corresponding to the vertex with coordinates [math]i, j, k-1[/math] if [math]k \gt 1[/math];
  • [math]b[/math] is the element [math]a_{ik}[/math] of the input data;
  • [math]c[/math] is the element [math]b_{kj}[/math] of the input data;

The result of performing the operation is:

  • an intermediate data item if [math]k \lt n[/math];
  • an output data item [math]c_{ij}[/math] if [math]k = n[/math].
Figure 1. Multiplication of dense matrices with demonstration of the output data

1.8 Parallelization resource of the algorithm

The parallel version of the algorithm for multiplying square matrices of order n requires that the following layers be successively performed:

  • [math]n[/math] layers of multiplications and the same numbers of layers for addition (there are [math]n^2[/math] operations in each layer).

The multiplication of an [math]m[/math]-by-[math]n[/math] matrix by an [math]n[/math]-by-[math]l[/math] matrix in the (fastest) serial version requires

  • [math]n[/math] layers of multiplications and the same numbers of layers for addition (there are [math]ml[/math] operations in each layer).

The use of accumulation requires that multiplications and subtractions be done in the double precision mode. For the parallel version, this implies that virtually all the intermediate calculations in the algorithm must be performed in double precision if the accumulation option is used. Unlike the situation with the serial version, this results in a certain increase in the required memory size.

In terms of the parallel form height, the dense matrix multiplication is qualified as a quadratic complexity algorithm. In terms of the parallel form width, its complexity is also quadratic (for square matrices) or bilinear (for general rectangular matrices).

1.9 Input and output data of the algorithm

Input data: matrix [math]A[/math] (with entries [math]a_{ij}[/math]), matrix [math]B[/math] (with entries [math]b_{ij}[/math])).

Size of the input data : [math]mn+nl[/math]

Output data : matrix [math]C[/math] (with entries [math]c_{ij}[/math]).

Size of the output data: [math]ml[/math]

1.10 Properties of the algorithm

It is clear that, in the case of unlimited resources, the ratio of the serial to parallel complexity is quadratic or bilinear (since this is the ratio of the cubic or trilinear complexity to linear one).

The computational power of the dense matrix multiplication, understood as the ratio of the number of operations to the total size of the input and output data, is linear.

The algorithm of dense matrix multiplication is completely determined. We do not consider any other order of performing associative operations in this version.

2 Software implementation of the algorithm

2.1 Implementation peculiarities of the serial algorithm

In its simplest form, the matrix multiplication algorithm in Fortran can be written as follows:

         
	DO  I = 1, M
        DO  J = 1, L
		S = 0.
		DO  K = 1, N
			S = S + DPROD(A(I,K), B(K,J))
		END DO	
	        C(I, J) = S
        END DO
	END DO

In this case the [math]S[/math] variable must be double precision to implement the accumulation mode.

2.2 Possible methods and considerations for parallel implementation of the algorithm

2.3 Run results

2.4 Conclusions for different classes of computer architecture

3 References

  1. Voevodin V.V., Kuznetsov Yu.A. Matrices and computations, Moscow: Nauka, 1984.