# Breadth-first search (BFS)

Алгоритм поиска в ширину (BFS) | |

Sequential algorithm | |

Serial complexity | [math]O(|V| + |E|)[/math] |

Input data | [math]O(|V| + |E|)[/math] |

Output data | [math]O(|V|)[/math] |

Parallel algorithm | |

Parallel form height | [math]N/A, \max O(|V|) [/math] |

Parallel form width | [math]N/A, \max O(|E|) [/math] |

Primary author of this description: I.V.Afanasyev.

## Contents

- 1 Properties and structure of the algorithm
- 1.1 General description of the algorithm
- 1.2 Mathematical description of the algorithm
- 1.3 Computational kernel of the algorithm
- 1.4 Macro structure of the algorithm
- 1.5 Implementation scheme of the serial algorithm
- 1.6 Serial complexity of the algorithm
- 1.7 Information graph
- 1.8 Parallelization resource of the algorithm
- 1.9 Input and output data of the algorithm
- 1.10 Properties of the algorithm

- 2 Software implementation of the algorithm
- 2.1 Implementation peculiarities of the serial algorithm
- 2.2 Locality of data and computations
- 2.3 Possible methods and considerations for parallel implementation of the algorithm
- 2.4 Scalability of the algorithm and its implementations
- 2.5 Dynamic characteristics and efficiency of the algorithm implementation
- 2.6 Conclusions for different classes of computer architecture
- 2.7 Existing implementations of the algorithm

- 3 References

## 1 Properties and structure of the algorithm

### 1.1 General description of the algorithm

The **Breadth-First Search (BFS)** makes it possible to calculate the shortest distances (measured in terms of the number of arcs) between a selected node of a directed graph and all the other its nodes and/or build up the directed rooted tree with the distances equal to those in the original graph. Moreover, the breadth-first search allows one to solve the reachability problem (that is, to find out where there exist paths leading from the source node to other nodes of the graph). The algorithm was first presented in the following publications: ^{[1]} and ^{[2]}.

The algorithm traverses the nodes layer-wise. At each step, there is a set of "advanced" nodes, and their neighbor nodes are checked whether they were not yet. The not yet visited nodes are added to the new level of advanced nodes, which are processed at the next step. At first, the set of advanced nodes consists of the source node only, and the graph traversal begins from this node.

The serial complexity of the algorithm is [math]O(|V| + |E|)[/math], where [math]|V|[/math] is the number of nodes and [math]|E|[/math] is the number of arcs in the graph.

### 1.2 Mathematical description of the algorithm

Let [math]G = (V, E)[/math] be an unweighted graph with a singled-out source node [math]u[/math]. The path [math]P(u,v)[/math] between the nodes [math]u[/math] and [math]v[/math] is the set of arcs [math](u, v_1), (v_1, v_2), ... (v_{n-1}, v)[/math]. The length [math]d(u,v)[/math] of a path between the nodes [math]u[/math] and [math]v[/math] is the number of its arcs. The breadth-first search finds the shortest paths [math]d(u,v)[/math] from the node [math]u[/math] to all the other nodes in the manner described below.

At the start of its execution, the algorithm sets [math]d(u)=0[/math] as the distance to the source-node and [math]d(v) = \infty, \forall v \neq u [/math], as the distances to the other nodes. Also, at the start, the algorithm initializes the set [math]F = \{u\}[/math].

Then, at each step, the algorithm forms the set of nodes [math]P = {w} [/math] such that [math]\forall v \in F \exists (v, w) \in E | d(w) = \infty [/math]. In addition, the algorithm updates the distances [math]d(w)=d(v)+1[/math], [math]\forall w \in P [/math]. After this, the algorithm goes to the next step until [math]P \neq \emptyset[/math]. At the beginning of each step, the set F is replaced by P.

### 1.3 Computational kernel of the algorithm

The computational kernel of the algorithm is the traversal of nodes adjacent to a single-out node [math]v[/math] with the subsequent addition of not yet visited nodes to the set [math]P[/math]. This operation is performed at each step for each node [math]v \in F[/math].

### 1.4 Macro structure of the algorithm

The algorithm successively refines the values of the function [math]d(v)[/math].

The structure of the algorithm can be described as follows:

1. Inizialization: all the nodes are assigned the tentative distance [math]d(v)=\infty[/math]; the only exception is the source node for which [math]d(u)=0[/math] .

2. The source node [math]v[/math] is placed to the set of "advanced" nodes [math]F[/math].

3. A traversal of the nodes of the set [math]F[/math].

(a) Inizialization of the set [math]P=\emptyset[/math].

(b) For each node [math]v \in F[/math], a traversal of all the neighbor nodes [math]w | \exists (v, w)[/math]. The nodes [math]v[/math] such that [math]d(w)=\infty[/math] are placed to the set [math]P[/math].

(c) The set [math]F[/math] is replaced by [math]P[/math]. If [math]F \neq \emptyset[/math], the algorithm goes to Step 3.

### 1.5 Implementation scheme of the serial algorithm

The simplest version of the breadth-first search can be implemented in C++ using queues. The following code is composed under the assumption that the graph is stored in the format of an adjacency list; that is, for each node, the indices of the beginning and end of its adjacency list are stored in the array vertices_to_edges_ptrs.

```
// init distances
for(int i = 0; i < vertices_count; i++)
_result[i] = MAX_INT;
// init queue and first vertex
std::queue<int> vertex_queue;
vertex_queue.push(_source_vertex);
_result[_source_vertex] = 1;
// do bfs
while(vertex_queue.size() > 0)
{
int cur_vertex = vertex_queue.front();
vertex_queue.pop();
long long first = vertices_to_edges_ptrs[cur_vertex];
long long last = vertices_to_edges_ptrs[cur_vertex + 1];
for(long long i = first; i < last; i++)
{
int dst_id = dst_ids[i];
if(_result[dst_id] == MAX_INT)
{
_result[dst_id] = _result[src_id] + 1;
vertex_queue.push(dst_id);
}
}
}
```

### 1.6 Serial complexity of the algorithm

The serial complexity of the algorithm is [math]O(|V| + |E|)[/math], where [math]|V|[/math] and [math]|E|[/math] are the number of nodes and the number of arcs, respectively: the algorithm initializes the array of distances, which costs [math]O(|V|)[/math] operations, and then traverses all the nodes with a single visit to each node, which costs [math]O(|E|)[/math] operations. This estimate is valid if the scheme for storing graphs allows one to traverse the nodes adjacent to a chosen node. (This is true, for instance, of adjacent lists and condensed adjacent lists.) For other formats, the estimate for complexity may be greater.

### 1.7 Information graph

The information graph of the classical breadth-first search is shown in figure 1.

thumb|center|500px|Figure 1. Information graph of the BFS algorithm.

The following notation is used in figure 1:

[1] - addition of the source node [math]u[/math] to the set [math]F[/math].

[2] - retrieval of a node [math]v[/math] from the set [math]F[/math].

[3] - inspection of the distances to the neighbor nodes of [math]v[/math].

[4] - addition of not yet visited nodes to the set [math]P[/math].

[5] - replacement of the set [math]F[/math] by [math]P[/math] and checking whether the latter is empty. If this set is not empty, the algorithm goes to the next iteration; otherwise, it terminates.

An implementation of this algorithm comes across its important drawback: operation [4] requires that the addition of nodes to the set P be conflict-free. In practice, this requirement always reduces to a serialization of accesses to the data structure that simulates the given set.

For this reason, one often uses the following modification of the above algorithm (which thereafter is called algorithm-2). It exploits independent data structures for parallel processes. The information graph of this approach is shown in figure 2.

thumb|center|500px|Figure 2. Information graph of the BFS algorithm (independent data structures).

The notation used in figure 2:

[1] - addition of the source node to the set [math]F[/math].

[2] - partition of the data in the set [math]F[/math] between processes.

[3] - loading of data from [math]F[/math] to the set [math]F_i[/math] for the i-th process.

[4] - retrieval of the next node from the set [math]F_i[/math], traversal of its neighbors, and their addition to the set [math]P_i[/math] if they were not yet visited.

[5] - pairwise merging of the sets [math]P_i[/math] for different processes and their final transformation into the set [math]F[/math].

[6] - test of the loop exit condition.

If the data structures simulating the sets [math]F[/math] and [math]P[/math] cannot be implemented, one may use an algorithm of quadratic complexity similar to the Bellman-Ford algorithm. The basic idea is that, at each step, all the arcs of the graph are traversed and the current distance array is updated. The information graph of this modification is shown in figure 3.

The notation used in figure 3:

[1] - initialization of the distances to the source node.

[2] - initialization of the distances to the other nodes.

[3] - loading information on the next arc and updating the distances to the corresponding nodes.

[4] - test of the loop exit condition.

### 1.8 Parallelization resource of the algorithm

In the course of execution, the algorithm in its classical version traverses the graph layer-wise. The current layer is amplified by the not yet visited nodes that are reachable from the preceding layer. A traversal of each layer, as well as of its neighbors, can be performed in a parallel manner. It is impossible to exactly estimate the number of nodes in each layer because this number depends on the connectedness structure of the input graph. Similarly, it is not possible to estimate the number of steps required for the algorithm to find all the shortest distances.

Let us estimate the width of the algorithm parallel form in terms of the maximum number of nodes p in a layer over all the steps. Then the number of parallel operations performed at the current layer is equal to the sum of the numbers of neighbor nodes taken over the nodes of this layer: [math]\sum_{n=1}^{p} degree(v_i)[/math]. This value is different for different layers. The parallel form height is equal to the number of steps in the algorithm. It can only be estimated from above; namely, it does not exceed [math]|V|[/math]).

In the case of quadratic approach to the parallel implementation, [math]O(|E|)[/math] operations are done at each step. They can be performed in parallel; thus, the parallel form width of this modification is [math]O(|E|)[/math]. Similarly to the classical case, the number of steps in the algorithm depends on the structure of graph and can be estimated from above by the value [math]O(|V|)[/math].

### 1.9 Input and output data of the algorithm

**Input data**: graph [math]G(V, E)[/math] with [math]|V|[/math] nodes [math]v_i[/math] and [math]|E|[/math] arcs [math]e_j = (v^{(1)}_{j}, v^{(2)}_{j})[/math], source node [math]u[/math].

**Size of the input data**: [math]O(|V| + |E|)[/math].

**Output data** (possible variants):

- for each node [math]v[/math] of the input graph, the distance [math]d(v)[/math] defined as the number of arcs in the shortest path from [math]u[/math] to [math]v[/math].
- for each node [math]v[/math] of the input graph, the indicator of reachability from the source node [math]u[/math] (shows whether or not the node [math]v[/math] is reachable from [math]u[/math] ).

**Size of the output data**: [math]O(|V|)[/math].

### 1.10 Properties of the algorithm

## 2 Software implementation of the algorithm

### 2.1 Implementation peculiarities of the serial algorithm

### 2.2 Locality of data and computations

#### 2.2.1 Locality of implementation

##### 2.2.1.1 Structure of memory access and a qualitative estimation of locality

##### 2.2.1.2 Quantitative estimation of locality

### 2.3 Possible methods and considerations for parallel implementation of the algorithm

### 2.4 Scalability of the algorithm and its implementations

#### 2.4.1 Scalability of the algorithm

#### 2.4.2 Scalability of of the algorithm implementation

### 2.5 Dynamic characteristics and efficiency of the algorithm implementation

### 2.6 Conclusions for different classes of computer architecture

### 2.7 Existing implementations of the algorithm

- Graph500.
- C++: Boost Graph Library (функции
`breadth_first_search`

,`breadth_first_visit`

). - C++, MPI: Parallel Boost Graph Library (функция
`breadth_first_search`

) - Java: WebGraph (класс
`ParallelBreadthFirstVisit`

), многопоточная реализация. - Python: NetworkX (функция
`bfs_edges`

). - Python/C++: NetworKit (класс
`networkit.graph.BFS`

). - RCC for CPU
- RCC for GPU
- Gap
- Litra