Difference between revisions of "Pairwise summation of numbers"

[quality revision]

Revision as of 13:42, 30 July 2015

Primary authors of this description: A.V.Frolov, Vad.V.Voevodin (Section 2.2), A.M.Teplov (Section 2.4)

1 Properties and structure of the algorithm

1.1 General description of the algorithm

The technique of pairwise execution is used to accelerate the calculation of long sequences of associative operations (for instance, mass summations). It is widely used because of its computational characteristics and the fact that the height of this algorithm is minimum possible. Its popularity among the non-numerical algorithms is related to its recursive nature, which simplifies its formulation.

1.2 Mathematical description of the algorithm

Input data: one-dimensional numeric array of size [math]n[/math].

Output data: sum of the array's elements.

Formulas of the method: At each stage, the current array is first partitioned into pairs, and then the elements in each pair are summed. The resulting sums (and the elements not used at this stage) form an array that is partitioned into pairs at the next stage, etc.

1.3 Computational kernel of the algorithm

The computational kernel of the serial-parallel summation method can be compiled either of elementary pairwise additions (there are on the whole [math]n - 1[/math] additions) or in a recursive manner, that is, as a set of implementations of the pairwise summation for shorter arrays.

1.4 Macro structure of the algorithm

As already noted in the description of the computational kernel, the basic part of the method is compiled of recursive calls of the summation procedure for shorter arrays.

1.5 Implementation scheme of the serial algorithm

In its pure form, the pairwise summation is rarely used in a serial implementation because this complicates the general scheme of the algorithm and sharply increases the size of memory for storing intermediate data.

1.6 Serial complexity of the algorithm

Various ways of summation of an array of size [math]n[/math] differ from each other only by the arrangement of parentheses in the sum. The number of additions is the same for all arrangements, namely, [math]n - 1[/math]. Thus, in terms of serial complexity, the method should be qualified as a linear complexity algorithm.

1.7 Information graph

The algorithm graph is shown in the fig.1 in the case where an array of size 16 is summed. The vertices corresponding to the input data are depicted in blue, while those that correspond to output data are depicted in red.

Figure 1. Pairwise summation of an array

1.8 Parallelization resource of the algorithm

The parallel version of the pairwise summation for an array of size [math]n[/math] requires that [math]\lceil \log_2 n \rceil[/math] layers be successively performed. The number of additions per layer decreases from [math]\frac{n}{2}[/math] to [math]1[/math]). In terms of the parallel form height, the pairwise summation is qualified as a logarithmic complexity algorithm. Its complexity is linear in terms of the parallel form width.

1.9 Input and output data of the algorithm

Input data: array [math]x[/math] (with elements [math]x_i[/math]).

Further restrictions: none.

Size of the input data: [math]N[/math].

Output data: sum of the array's elements.

Size of the output data: single scalar.

1.10 Properties of the algorithm

It is clear that, in the case of unlimited resources, the ratio of the serial to parallel complexity has the order [math]\frac{n}{\log_2 n}[/math] (the ratio of the linear to logarithmic complexity). The computational power of the algorithm, understood as the ratio of the number of operations to the total size of the input and output data, is only 1 (the size of the input and output data is almost equal to the number of operations). The algorithm is completely determined. The edges of the information graph are nonlocal. For any placing of graph vertices, the length of the edges grows exponentially from a layer to the next layer.

2 Software implementation of the algorithm

2.1 Implementation peculiarities of the serial algorithm

2.2 Locality of data and computations

2.2.1 Locality of implementation

2.2.1.1 Structure of memory access and a qualitative estimation of locality

2.2.1.2 Quantitative estimation of locality

2.3 Possible methods and considerations for parallel implementation of the algorithm

2.4 Scalability of the algorithm and its implementations

2.4.1 Scalability of the algorithm

2.4.2 Scalability of of the algorithm implementation

2.5 Dynamic characteristics and efficiency of the algorithm implementation

2.6 Conclusions for different classes of computer architecture

2.7 Existing implementations of the algorithm

3 References

@@ Line 1: / Line 1: @@
 Primary authors of this description: [[:ru:Участник:Frolov|A.V.Frolov]], [[:ru:Участник:VadimVV|Vad.V.Voevodin]] ([[#Locality of data and computations|Section 2.2]]), [[:ru:Участник:Teplov|A.M.Teplov]] ([[#Scalability of the algorithm and its implementations|Section 2.4]])
-== Description of the properties and structure of the algorithm ==
+== Properties and structure of the algorithm ==
-=== General description ===
+=== General description of the algorithm ===
 The technique of '''pairwise execution''' is used to accelerate the calculation of long sequences of associative operations (for instance, mass summations). It is widely used because of its computational characteristics and the fact that the height of this algorithm is minimum possible. Its popularity among the non-numerical algorithms is related to its recursive nature, which simplifies its formulation.
-=== Mathematical description ===
+=== Mathematical description of the algorithm ===
 Input data: one-dimensional numeric array of size <math>n</math>.
@@ Line 19: / Line 19: @@
 The computational kernel of the serial-parallel summation method can be compiled either of elementary pairwise additions (there are on the whole <math>n - 1</math> additions) or in a recursive manner, that is, as a set of implementations of the pairwise summation for shorter arrays.
-=== Macrostructure of the algorithm ===
+=== Macro structure of the algorithm ===
 As already noted in the description of the computational kernel, the basic part of the method is compiled of recursive calls of the summation procedure for shorter arrays.
@@ Line 33: / Line 33: @@
 === Information graph ===
-The algorithm graph is shown in the figure in the case where an array of size 16 is summed. The vertices corresponding to the input data are depicted in blue, while those that correspond to output data are depicted in red.
+The algorithm graph is shown in the fig.1 in the case where an array of size 16 is summed. The vertices corresponding to the input data are depicted in blue, while those that correspond to output data are depicted in red.
-[[file:binary-tree-based summation graph.png|center|thumb|500px|Pairwise summation of an array]]
+[[file:binary-tree-based summation graph.png|center|thumb|500px|Figure 1. Pairwise summation of an array]]
-=== Parallelism resource of the algorithm ===
+=== Parallelization resource of the algorithm ===
 The parallel version of the pairwise summation for an array of size <math>n</math> requires that <math>\lceil \log_2 n \rceil</math> layers be successively performed. The number of additions per layer decreases from <math>\frac{n}{2}</math> to <math>1</math>). In terms of the parallel form height, the pairwise summation is qualified as a ''logarithmic complexity'' algorithm. Its complexity is ''linear'' in terms of the parallel form width.
-=== Description of the input and output data ===
+=== Input and output data of the algorithm ===
 Input data: array <math>x</math> (with elements <math>x_i</math>).
@@ Line 56: / Line 56: @@
 It is clear that, in the case of unlimited resources, the ratio of the serial to parallel complexity has the order <math>\frac{n}{\log_2 n}</math>  (the ratio of the linear to logarithmic complexity). The computational power of the algorithm, understood as the ratio of the number of operations to the total size of the input and output data, is only 1 (the size of the input and output data is almost equal to the number of operations). The algorithm is completely determined. The edges of the information graph are nonlocal. For any placing of graph vertices, the length of the edges grows exponentially from a layer to the next layer.
+== Software implementation of the algorithm ==
+=== Implementation peculiarities of the serial algorithm ===
+=== Locality of data and computations ===
+==== Locality of implementation ====
+===== Structure of memory access and a qualitative estimation of locality =====
+===== Quantitative estimation of locality =====
+=== Possible methods and considerations for parallel implementation of the algorithm  ===
+=== Scalability of the algorithm and its implementations ===
+==== Scalability of the algorithm ====
+==== Scalability of of the algorithm implementation ====
+=== Dynamic characteristics and efficiency of the algorithm implementation ===
+=== Conclusions for different classes of computer architecture ===
+=== Existing implementations of the algorithm ===
+== References ==
+<references />
 [[Ru:Нахождение суммы элементов массива сдваиванием]]
 [[Category:Articles in progress]]

Difference between revisions of "Pairwise summation of numbers"

Revision as of 13:42, 30 July 2015

Contents

1 Properties and structure of the algorithm

1.1 General description of the algorithm

1.2 Mathematical description of the algorithm

1.3 Computational kernel of the algorithm

1.4 Macro structure of the algorithm

1.5 Implementation scheme of the serial algorithm

1.6 Serial complexity of the algorithm

1.7 Information graph

1.8 Parallelization resource of the algorithm

1.9 Input and output data of the algorithm

1.10 Properties of the algorithm

2 Software implementation of the algorithm

2.1 Implementation peculiarities of the serial algorithm

2.2 Locality of data and computations

2.2.1 Locality of implementation

2.2.1.1 Structure of memory access and a qualitative estimation of locality

2.2.1.2 Quantitative estimation of locality

2.3 Possible methods and considerations for parallel implementation of the algorithm

2.4 Scalability of the algorithm and its implementations

2.4.1 Scalability of the algorithm

2.4.2 Scalability of of the algorithm implementation

2.5 Dynamic characteristics and efficiency of the algorithm implementation

2.6 Conclusions for different classes of computer architecture

2.7 Existing implementations of the algorithm

3 References

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

File storage

Tools

In other languages