Implementation level

DCSC for finding the strongly connected components, C++, MPI, Parallel Boost Graph Library

From Algowiki
Jump to navigation Jump to search


Primary author of this description: I.V.Afanasyev.

1 Links

Parallel Boost Graph Library (function strong_components), the distributed DCSC algorithm is combined with the local search for strongly connected components performed by Tarjan's algorithm.

2 Locality of data and computations

2.1 Locality of implementation

2.1.1 Structure of memory access and a qualitative estimation of locality

2.1.2 Quantitative estimation of locality

3 Scalability of the algorithm and its implementations

3.1 Scalability of the algorithm

The DCSC algorithm has a considerable scalability potential because the underlying breadth-first search can be efficiently parallelized using either quadratic implementation or parallel queues (one queue per processor).

An additional resource of parallelism is provided by the three sets of nodes obtained at each step of the algorithm. These sets can be processed in parallel by using such mechanisms as "tasks" and "nested parallelism."

3.2 Scalability of of the algorithm implementation

Let us study the scalability of the parallel implementation of the Forward-Backward-Trim algorithm in accordance with the scalability methodology. This study was conducted using the Lomonosov-2 supercomputer of the Moscow University Supercomputing Center.

Variable parameters for the start-up of the algorithm implementation and the limits of parameter variations:

number of processors [1 : 28] with step 1; graph size [2^20 : 2^27].

A separate study was conducted to examine the strong scaling-out of this implementation.

The performance was defined as TEPS (Traversed Edges Per Second), which means the number of graph edges processed by the algorithm in a second. This characteristic allows one to compare performances for various graph sizes and estimate how the efficiency of processing reduces with the growth in the graph size.

Figure 1. Parallel implementation of the Forward-Backward-Trim algorithm, scalability of different variants of implementation: performance as a function of the graph size.

4 Dynamic characteristics and efficiency of the algorithm implementation

5 Run results