# Construction of the minimum spanning tree (MST)

## 1 Formulation of the problem

Let $G = (V, E)$ be a given connected, undirected graph with edge weights $f(e)$. A subgraph that is a tree and connects all the vertices of $G$ is called a spanning tree. A spanning tree is said to be minimum if the total weight of its edges is minimal among all of the spanning trees.

The formulation of the problem and an algorithm for its solution appeared first in the paper [1].

## 2 Variants of the problem

If the original graph $G$ is disconnected, then a set composed of minimum spanning trees for all the connected components is called a minimum spanning forest (MSF).

For a graph with integer weights, special techniques such as digit-wise sorting can be used, which produces algorithms solving the problem in linear time.

A possible additional requirement in a distributed problem is that data exchanges occur only along the edges of the graph.

## 3 Properties of the problem

Weights. The solution to the problem depends on the relative rather than absolute values of the weights. Thus, instead of assigning the weights, one can define an ordering (that is, a "less than or equal to" predicate) on the set of edge pairs of the graph. Hence, without loss of generality, one can assume that the edge weights are pairwise distinct. To this end, the edges should be ordered first with respect to weights and then with respect to indices. Moreover, any increasing function can be applied to the original weights, and this will not change the solution. It follows that all the weights can be assumed to belong to a given interval, for instance, the interval $[0, 1]$.

Existence and uniqueness. The minimum spanning forest exists for every graph. It is unique if all the edge weights are pairwise distinct. As already noted, the edge weights can always be made distinct. In the case where this condition of distinct weights is not fulfilled, one can easily construct an example with more than one minimum spanning tree.

Merging fragments. Let $F$ be a fragment of the minimum spanning tree of a graph $G$, and let $G'$ be the graph obtained from $G$ by merging the vertices that belong to $F$. Then the union of $F$ and a minimum spanning tree of the graph $G'$ is a minimum spanning tree of the original graph $G$.

Minimum edge of a fragment. Let $F$ be a fragment of the minimum spanning tree, and let $e_F$ be the edge with a minimum weight outgoing from $F$ (that is, exactly one end of this edge is a vertex of $F$). If such an edge $e_F$ is unique, then it belongs to the minimum spanning tree. This property underlies Boruvka's algorithm and Prim's algorithm.

'Minimum edge of a graph. If $e^*$ is the only edge in a graph that has the minimum weight, then it belongs to the minimum spanning tree. This property underlies Kruskal's algorithm.

Associativity with respect to edges. Let $MSF(E)$ be a minimum spanning tree of the graph with edges $E$. Then

$MSF(E_1 \cup E_2 \cup \dots \cup E_k) = MSF(MSF(E_1) \cup MSF(E_2) \cup \dots \cup MSF(E_k)).$

The number of edges in a spanning forest of a graph with $n$ vertices consisting of $c$ connected components is $n-c$. If the number of connected components is known a priori, then this property may be used for an earlier termination of the algorithms.

## 4 Input and output data

Input data: weighted graph $(V, E, W)$ ($n$ vertices $v_i$ and $m$ edges $e_j = (v^{(1)}_{j}, v^{(2)}_{j})$ with weights $f_j$).

Size of the input data: $O(m + n)$.

Output data: the list of edges of a minimum spanning tree (for a disconnected graph, the list of minimum spanning trees for all the connected components).

Size of the output data: $O(n)$.

## 5 Algorithms for solving the problem

There are three classical approaches to solving this problem:

If usual data structures are used, then the serial complexity is $O(m \ln m)$ for all the three algorithms. (Notation: $m$ is the number of edges and $n$ is the number of vertices.)

All the other algorithms tend to be a variation of the above approaches or their combination.

• The GHS algorithm (Gallager, Humblet, Spira)[6] and its subsequent versions [7][8] are distributed variants of Boruvka's algorithm. This algorithm is usually used for the automatic distributed construction of a spanning tree by a commutator network.
• Gabow's algorithm [9] uses a Fibonacci heap for ordering the edges in Boruvka's algorithm. The complexity is $O(m \ln \beta (m, n))$.
• The Fredman-Willard algorithm[10] is designed for graphs with integer weights and has a linear complexity estimate $O(m)$. It combines Prim's algorithm with a specially developed heap algorithm (AF-heap).
• Karger's algorithm [11] solves the problem in linear average-time $O(m)$. (The existence of a determinate algorithm with a linear complexity estimate is an open problem.)

It should be noted that the algorithms whose asymptotic complexity is better than $O(m \ln n)$ usually work slower in practice than the classical algorithms. The constants in the complexity estimates are so large that the better asymptotics gives a noticeable gain only for very large-scale graphs.

## 6 Parallelization resource

1. The properties of '"merging fragments"' and "'the minimum edge of a fragment"' make it possible to process the fragments independently of each other. Among the three classical algorithms, Boruvka's algorithm based on these properties has the maximum parallelization resource.
2. Associativity with respect to edges can be used to parallelize Kruskal's and Prim's algorithms, whose original versions are substantially serial.
3. The use of parallel algorithms for sorting the edges of the entire graph or the parallel sorting of edges incident to each vertex or the parallel sorting inside each fragment.

## 7 References

1. Borůvka, Otakar. “O Jistém Problému Minimálním.” Práce Moravské Přírodovědecké Společnosti III, no. 3 (1926): 37–58.
2. Jarník, Vojtěch. “O Jistém Problému Minimálním (Z Dopisu Panu O. Borůvkovi).” Práce Moravské Přírodovědecké Společnosti 6, no. 4 (1930): 57–63.
3. Kruskal, Joseph B. “On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem.” Proceedings of the American Mathematical Society 7, no. 1 (January 1956): 48–50. doi:10.1090/S0002-9939-1956-0078686-7.
4. Prim, R C. “Shortest Connection Networks and Some Generalizations.” Bell System Technical Journal 36, no. 6 (November 1957): 1389–1401. doi:10.1002/j.1538-7305.1957.tb01515.x.
5. Dijkstra, E W. “A Note on Two Problems in Connexion with Graphs.” Numerische Mathematik 1, no. 1 (December 1959): 269–71. doi:10.1007/BF01386390.
6. Gallager, Robert G, P A Humblet, and P M Spira. “A Distributed Algorithm for Minimum-Weight Spanning Trees.” ACM Transactions on Programming Languages and Systems 5, no. 1 (1983): 66–77. doi:10.1145/357195.357200.
7. Gafni, Eli. “Improvements in the Time Complexity of Two Message-Optimal Election Algorithms,” 175–85, New York, New York, USA: ACM Press, 1985. doi:10.1145/323596.323612.
8. Awerbuch, B. “Optimal Distributed Algorithms for Minimum Weight Spanning Tree, Counting, Leader Election, and Related Problems,” 230–40, New York, New York, USA: ACM Press, 1987. doi:10.1145/28395.28421.
9. Gabow, Harold N, Zvi Galil, Thomas H Spencer, and Robert Endre Tarjan. “Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and Directed Graphs.” Combinatorica 6, no. 2 (June 1986): 109–22. doi:10.1007/BF02579168.
10. Fredman, Michael L, and Dan E Willard. “Trans-Dichotomous Algorithms for Minimum Spanning Trees and Shortest Paths.” Journal of Computer and System Sciences 48, no. 3 (June 1994): 533–51. doi:10.1016/S0022-0000(05)80064-9.
11. Karger, David R, Philip N Klein, and Robert Endre Tarjan. “A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees.” Journal of the ACM 42, no. 2 (March 1, 1995): 321–28. doi:10.1145/201019.201022.