Difference between revisions of "Dot product"

[unchecked revision]

Revision as of 16:48, 13 July 2015

Primary authors of this description: A.V.Frolov, Vad.V.Voevodin (Section 2.2), A.M.Teplov (раздел Section 2.4)

1 Description of the properties and structure of the algorithm

1.1 General description

The dot product of vectors is one of the basic operations in a number of methods. It is used in two versions: as the proper dot product of [math]n[/math]-dimensional vectors (one-dimensional arrays of size [math]n[/math]) and as the scalar product of rows, columns, and other linear subsets of multidimensional arrays. The latter differs from the former in that the corresponding subroutine receives not only the initial addresses of vectors but also parameters of the displacement of the subsequent components with respect to the preceding ones (for the former version, all the displacements are equal to one). Formulas defining the dot product are different for real and complex arithmetic. Here, we deal only with real arithmetic and the serial-parallel implementation.

1.2 Mathematical description

Input data: two one-dimensional numeric arrays of size n.

Output data: the sum of pairwise products of the corresponding elements of both arrays.

Formulas of the method: the number [math]n[/math] is expressed in the form [math]n = (p - 1) k + q[/math], where [math]p[/math] is the number of processors, [math]k = \lceil \frac{n}{p} \rceil[/math], and [math]q = n - k (p - 1)[/math]. Then the [math]i[/math]-th processor ([math]i \lt p[/math]) calculates in the serial mode the partial dot product of the subarrays formed of the elements with indices from [math](i - 1) k + 1[/math] to [math]ik[/math].

[math]S_i = \sum_{j = 1}^k a_{k (i - 1) + j} b_{k (i - 1) + j}[/math]

The [math]p[/math]-th processor calculates in the serial mode the partial dot product of the subarrays formed of the elements with indices from [math](p - 1) k + 1[/math] to [math](p - 1) k + q[/math].

[math]S_p = \sum_{j = 1}^q a_{k (p - 1) + j} b_{k (p - 1) + j}[/math]

Upon ending this process, the processors exchange their data, and then one of them (or all of them simultaneously if they need the result later on) add the partial sums in the serial mode.

[math]\sum_{i = 1}^p S_i[/math]

In the serial-parallel version, all the sums are calculated in the serial mode (usually in the increasing order of indices).

1.3 Computational kernel of the algorithm

The computational kernel of the dot product in the serial-parallel version can be represented as [math]p[/math] calculations of partial dot products with the subsequent serial summation of [math]p[/math] partial results.

1.4 Macro structure of the algorithm

As already noted in the description of the kernel, the basic part of the algorithm consists of the parallel computation of partial dot products, calculated in the serial mode, and the subsequent summation of these partial dot products.

1.5 Implementation scheme of the serial algorithm

The formulas of the method are given above. The order of summation can be different, both in the increasing and decreasing order of indices. If no special reason is pursued, then, usually, the natural order of increasing indices is used.

1.6 Serial complexity of the algorithm

In any implementation of the dot product for arrays consisting of [math]n[/math] elements the number of scalar multiplications is invariably equal to [math]n[/math], while the number of additions is [math]n - 1[/math]. Consequently, in terms of the number of serial operations, the algorithm should be qualified as a linear complexity algorithm.

1.7 Information graph

We describe the graph of the algorithm graphically (see the first figure). It should be noted, however, that, in most cases, programmers prefer to assign zero initial values to variables chosen for summation, which increases the number of additions by one. In such cases, the graph is as shown in the second figure (n=24).

Serial-parallel calculation of the dot product with saving in the number of additions

Serial-parallel calculation of the dot product without saving in the number of additions

1.8 Parallelism resource of the algorithm

The parallel version of the serial-parallel method for calculating the dot product of arrays of size [math]n[/math] requires that the following layers be successively executed:

1 layer of calculating pairwise products,
[math]k - 1[/math] layers of summation for partial dot products ([math]p[/math] branches),
[math]p - 1[/math] layers of summation (one serial branch).

Therefore, the critical path of the algorithm in the parallel version (and the corresponding height of the CPF) depends on the chosen partition of the arrays. In the optimal case ([math]p = \sqrt{n}[/math]), the height of the CPF is [math] 2 \sqrt{n} - 1[/math]. Thus, in terms of the height of the CPF, the serial-parallel method is qualified as a square-root complexity algorithm. In terms of the width of the CPF, it is also a square-root complexity algorithm.

1.9 Description of the input and output data

Input data: array[math]a[/math] (with elements [math]a_i[/math]), array [math]b[/math] (with elements [math]b_i[/math]).

Further restrictions: none.

Size of the input data: [math]2 n[/math].

Output data: the sum of pairwise products of the corresponding elements of two arrays.

Size of the output data: single scalar.

1.10 Properties of the algorithm

It is clear that, in the case of unlimited resources, the ratio of the serial to parallel complexity is expressed by the square root of n (since this is the ratio of the linear to square-root complexity). The computational power of the algorithm, understood as the ratio of the number of operations to the total size of the input and output data, is only 1 (the size of the input and output data is almost equal to the number of operations; more exactly, the former number exceeds the latter by 2). For a given partition of [math]n[/math], the algorithm is completely determined. The edges of the information graph are local. In a number of algorithms using dot products of single precision and the accumulation mode, these products are calculated in double precision in order to reduce round-off errors. Для уменьшения ошибок округления режимом накопления в ряде алгоритмов, использующих скалярное произведение одинарной точности, оно вычисляется с двойной точностью. However, even without accumulation mode, the serial-parallel method for calculating the dot product reduces the effect of round-off errors [math]\sqrt{n}[/math]-fold "on the average".

@@ Line 1: / Line 1: @@
-Основные авторы описания: [[Участник:Frolov|А.В.Фролов]], [[Участник:VadimVV|Вад.В.Воеводин]] ([[#Описание локальности данных и вычислений|раздел 2.2]]), [[Участник:Teplov|А.М.Теплов]] (раздел [[#Масштабируемость алгоритма и его реализации|2.4]])
+Primary authors of this description: [[:ru:Участник:Frolov|A.V.Frolov]], [[:ru:Участник:VadimVV|Vad.V.Voevodin]] ([[#Описание локальности данных и вычислений|Section 2.2]]), [[:ru:Участник:Teplov|A.M.Teplov]] (раздел [[#Масштабируемость алгоритма и его реализации|Section 2.4]])
-== Описание свойств и структуры алгоритма ==
+== Description of the properties and structure of the algorithm ==
-=== Общее описание алгоритма ===
+=== General description ===
-'''Скалярное произведение векторов''' используется в качестве одной из базовых операций в широком круге методов. При этом используется как в версии скалярного произведения собственно <math>n</math>-мерных векторов (одномерных массивов размера <math>n</math>), так и в версии скалярного произведения строк, столбцов и других линейных подмножеств массивов большей размерности. Последняя отличается от первой тем, что соответствующая подпрограмма получает, кроме стартовых адресов векторов, также и параметры смещения следующих элементов относительно предыдущих (в первой версии эти смещения равны 1). Разные формулы существуют для скалярных произведений в вещественной арифметике и для комплексных векторов. Здесь мы рассматриваем только вещественную арифметику и последовательно-параллельную реализацию.
+'''The dot product of vectors''' is one of the basic operations in a number of methods. It is used in two versions: as the proper dot product of <math>n</math>-dimensional vectors (one-dimensional arrays of size <math>n</math>) and as the scalar product of rows, columns, and other linear subsets of multidimensional arrays. The latter differs from the former in that the corresponding subroutine receives not only the initial addresses of vectors but also parameters of the displacement of the subsequent components with respect to the preceding ones (for the former version, all the displacements are equal to one). Formulas defining the dot product are different for real and complex arithmetic. Here, we deal only with real arithmetic and the serial-parallel implementation.
-=== Математическое описание ===
+=== Mathematical description ===
-Исходные данные: два одномерных массива n чисел.
+Input data: two one-dimensional numeric arrays of size n.
-Вычисляемые данные: сумма попарных произведений элементов массива.
+Output data: the sum of pairwise products of the corresponding elements of both arrays.
-Формулы метода:
+Formulas of the method:
-число <math>n</math> разлагается в выражение типа
+the number <math>n</math> is expressed in the form
 <math>n = (p - 1) k + q</math>,
-где <math>p</math> — количество процессоров, <math>k = \lceil \frac{n}{p} \rceil</math>,
+where <math>p</math> is the number of processors, <math>k = \lceil \frac{n}{p} \rceil</math>, and
-<math>q = n - k (p - 1)</math>.
+<math>q = n - k (p - 1)</math>. Then the <math>i</math>-th processor (<math>i < p</math>) calculates in the serial mode the partial dot product of the subarrays formed of the elements with indices from <math>(i - 1) k + 1</math> to <math>ik</math>.
-После этого на <math>i</math>-м процессоре (<math>i < p</math>) последовательно вычисляется «частичное» скалярное произведение подмассивов, начиная с <math>(i - 1) k + 1</math>-го номера элемента, до <math>ik</math>-го номера.
 :<math>S_i = \sum_{j = 1}^k a_{k (i - 1) + j} b_{k (i - 1) + j}</math>
-На <math>p</math>-м процессоре последовательно вычисляется «частичное» скалярное произведение подмассивов, начиная с <math>(p - 1) k + 1</math>-го номера элемента до <math>(p - 1) k + q</math>-го номера.
+The <math>p</math>-th processor calculates in the serial mode the partial dot product of the subarrays formed of the elements with indices from <math>(p - 1) k + 1</math> to <math>(p - 1) k + q</math>.
 :<math>S_p = \sum_{j = 1}^q a_{k (p - 1) + j} b_{k (p - 1) + j}</math>
-По окончании этого процесса процессоры обмениваются данными и на одном из них (либо на всех одновременно, если результат нужен далее на всех процессорах) получившиеся суммы суммируются последовательно друг с другом.
+Upon ending this process, the processors exchange their data, and then one of them (or all of them simultaneously if they need the result later on) add the partial sums in the serial mode.
 :<math>\sum_{i = 1}^p S_i</math>
-При этом в последовательно-параллельном варианте при вычислений сумм из формул используется последовательный порядок суммирования (обычно от меньших индексов к большим).
+In the serial-parallel version, all the sums are calculated in the serial mode (usually in the increasing order of indices).
-=== Вычислительное ядро алгоритма ===
+=== Computational kernel of the algorithm ===
-Вычислительное ядро скалярного произведения в последовательно-параллельном варианте можно представить как <math>p</math> вычислений «частных» скалярных произведений c последующим последовательным суммированием получившихся <math>p</math> чисел.
+The computational kernel of the dot product in the serial-parallel version can be represented as <math>p</math> calculations of partial dot products with the subsequent serial summation of <math>p</math> partial results.
-=== Макроструктура алгоритма ===
+=== Macro structure of the algorithm ===
-Как уже записано в описании ядра алгоритма, основную часть вычисления скалярного произведения составляют параллельное вычисление скалярных произведений меньшей размерности последовательным методом и последовательное вычисление суммы получившихся «частных» скалярных произведений подмассивов.
+As already noted in the description of the kernel, the basic part of the algorithm consists of the parallel computation of partial dot products, calculated in the serial mode, and the subsequent summation of these partial dot products.
-=== Описание схемы реализации последовательного алгоритма ===
+=== Implementation scheme of the serial algorithm ===
-Формулы метода описаны выше. Последовательность исполнения суммирования может быть разная — как по возрастанию, так и по убыванию индексов. Обычно без особых причин порядок не меняют, используя естественный (возрастание индексов).
+The formulas of the method are given above. The order of summation can be different, both in the increasing and decreasing order of indices. If no special reason is pursued, then, usually, the natural order of increasing indices is used.
-=== Последовательная сложность алгоритма ===
+=== Serial complexity of the algorithm ===
-Для вычисления скалярного произведения массивов, состоящих из <math>n</math> элементов, при любых разложениях количество операций умножения неизменно и равно <math>n</math>, а количество операций сложения равно <math>n - 1</math>. Поэтому алгоритм должен быть отнесён к алгоритмам ''линейной'' сложности по количеству последовательных операций.
+In any implementation of the dot product for arrays consisting of <math>n</math> elements the number of scalar multiplications is invariably equal to <math>n</math>, while the number of additions is <math>n - 1</math>. Consequently, in terms of the number of serial operations, the algorithm should be qualified as a ''linear'' complexity algorithm.
-=== Информационный граф ===
+=== Information graph ===
-Опишем граф алгоритма в виде рисунка. Однако следует отметить, что в большинстве случаев программисты не экономят на одном вызове операции сложения, а инициализируют начальное значение переменной нулём. В этом случае граф становится таким, как на втором рисунке (n=24).
+We describe the graph of the algorithm graphically (see the first figure). It should be noted, however, that, in most cases, programmers prefer to assign zero initial values to variables chosen for summation, which increases the number of additions by one. In such cases, the graph is as shown in the second figure (n=24).
 {| align="left"
      |- valign="top"
-     | [[file:series-parallel_dot_product_graph.png|thumb|750px|Последовательно-параллельное вычисление скалярного произведения с экономией операций сложения]]
+     | [[file:series-parallel_dot_product_graph.png|thumb|750px|Serial-parallel calculation of the dot product with saving in the number of additions]]
-     | [[file:Series-parallel_dot_product_graph_straight.png|thumb|790px|Последовательно-параллельное вычисление скалярного произведения без экономии операций сложения]]
+     | [[file:Series-parallel_dot_product_graph_straight.png|thumb|790px| Serial-parallel calculation of the dot product without saving in the number of additions]]
 |}
-=== Описание ресурса параллелизма алгоритма ===
+=== Parallelism resource of the algorithm ===
-Для вычисления скалярного произведения массивов порядка <math>n</math> последовательно-параллельным методом в параллельном варианте требуется последовательно выполнить следующие ярусы:
+The parallel version of the serial-parallel method for calculating the dot product of arrays of size <math>n</math> requires that the following layers be successively executed:
-* 1 ярус вычисления произведений,
+* 1 layer of calculating pairwise products,
-* <math>k - 1</math> ярусов суммирования по частям массивов (<math>p</math> ветвей),
+* <math>k - 1</math> layers of summation for partial dot products (<math>p</math> branches),
-* <math>p - 1</math> ярусов суммирования (одна последовательная ветвь).
+* <math>p - 1</math> layers of summation (one serial branch).
-Таким образом, в параллельном варианте критический путь алгоритма  (и соответствующая ему высота ЯПФ) будет зависеть от произведённого разбиения массива на части. В оптимальном случае (<math>p = \sqrt{n}</math>) высота ЯПФ будет равна <math> 2 \sqrt{n} - 1</math>. При классификации по высоте ЯПФ, таким образом, последовательно-параллельный метод относится к алгоритмам ''со сложностью «корень квадратный»''. При классификации по ширине ЯПФ его сложность также будет ''«корень квадратный»''.
+Therefore, the critical path of the algorithm in the parallel version (and the corresponding height of the CPF) depends on the chosen partition of the arrays. In the optimal case (<math>p = \sqrt{n}</math>), the height of the CPF is <math> 2 \sqrt{n} - 1</math>. Thus, in terms of the height of the CPF, the serial-parallel method is qualified as a square-root complexity algorithm. In terms of the width of the CPF, it is also a square-root complexity algorithm.
-=== Описание входных и выходных данных ===
+=== Description of the input and output data ===
-Входные данные: массивы <math>a</math> (элементы <math>a_i</math>), <math>b</math> (элементы <math>b_i</math>).
+Input data: array<math>a</math> (with elements <math>a_i</math>), array <math>b</math> (with elements <math>b_i</math>).
-Дополнительные ограничения: отсутствуют.
+Further restrictions: none.
-Объём входных данных: <nowiki/><math>2 n</math>.
+Size of the input data: <nowiki/><math>2 n</math>.
-Выходные данные: сумма попарных произведений элементов массивов.
+Output data: the sum of pairwise products of the corresponding elements of two arrays.
-Объём выходных данных: один скаляр.
+Size of the output data: single scalar.
-=== Свойства алгоритма ===
+=== Properties of the algorithm ===
-Соотношение последовательной и параллельной сложности в случае неограниченных ресурсов, как хорошо видно, является ''корнем квадратным'' (отношение линейной к корню квадратному). При этом вычислительная мощность алгоритма, как отношение числа операций к суммарному объему входных и выходных данных — всего-навсего ''1 (входных и выходных данных почти столько же, сколько операций; если точнее - даже больше на 2)''. При этом алгоритм полностью детерминирован при заданном разложении <math>n</math>. Дуги информационного графа локальны. Для уменьшения ошибок округления режимом накопления в ряде алгоритмов, использующих скалярное произведение одинарной точности, оно вычисляется с двойной точностью. Впрочем, у последовательно-параллельного способа вычисления скалярного произведения и без режима накопления влияние ошибок округления «в среднем» меньше в <math>\sqrt{n}</math> раз.
+It is clear that, in the case of unlimited resources, the ratio of the serial to parallel complexity is expressed by the ''square root'' of n (since this is the ratio of the linear to square-root complexity). The computational power of the algorithm, understood as the ratio of the number of operations to the total size of the input and output data, is only ''1 (the size of the input and output data is almost equal to the number of operations; more exactly, the former number exceeds the latter by 2)''. For a given partition of <math>n</math>, the algorithm is completely determined. The edges of the information graph are local. In a number of algorithms using dot products of single precision and the accumulation mode, these products are calculated in double precision in order to reduce round-off errors. Для уменьшения ошибок округления режимом накопления в ряде алгоритмов, использующих скалярное произведение одинарной точности, оно вычисляется с двойной точностью. However, even without accumulation mode, the serial-parallel method for calculating the dot product reduces the effect of round-off errors <math>\sqrt{n}</math>-fold "on the average".
 [[Category:Started articles]]
 [[Ru:Скалярное произведение векторов, вещественная версия, последовательно-параллельный вариант]]

Difference between revisions of "Dot product"

Revision as of 16:48, 13 July 2015

Contents

1 Description of the properties and structure of the algorithm

1.1 General description

1.2 Mathematical description

1.3 Computational kernel of the algorithm

1.4 Macro structure of the algorithm

1.5 Implementation scheme of the serial algorithm

1.6 Serial complexity of the algorithm

1.7 Information graph

1.8 Parallelism resource of the algorithm

1.9 Description of the input and output data

1.10 Properties of the algorithm

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

File storage

Tools

In other languages