Graph implementation and representation
A graph is, in essence, a binary relation. It can be visualized as a set of points (called nodes) connected by lines (called edges) or by arrows (called arcs). In this regard, the graph is a generalization of the tree data model. Like trees, graphs come in several forms: directed, undirected, and labeled.
Directed vs Undirected Graphs
An undirected graph is one in which the relationship along an edge between two nodes is bidirectional, i.e., the edge can be traversed in either direction. A directed graph is one in which the relationship only goes one way.
Minimal Description of the Graph Data Type
As an abstract type, we can view a graph as a collection of elements that are stored at the graph's edges and nodes (nodes are also sometimes referred to as vertices). The essential methods we need for dealing with graphs will be the following:
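A minimal sketch of such an interface in Python might look like the following; the method names (add_node, add_edge, has_edge, and neighbors) are illustrative choices rather than a fixed standard:

```python
class Graph:
    """Abstract interface for a graph; concrete representations fill in the methods."""

    def add_node(self, v):
        """Insert the node v into the graph."""
        raise NotImplementedError

    def add_edge(self, u, v):
        """Insert an edge between nodes u and v."""
        raise NotImplementedError

    def has_edge(self, u, v):
        """Return True if there is an edge between u and v."""
        raise NotImplementedError

    def neighbors(self, u):
        """Return the nodes adjacent to u (its successors, in a directed graph)."""
        raise NotImplementedError
```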
As discussed before, there are two standard ways of representing a graph: the adjacency list and the adjacency matrix implementation. We shall consider these representations in this section.
Adjacency List Implementation
A common way to implement a graph using an adjacency list is to use a hashtable whose values are either arrays or linked lists of neighbors. Since Python's built-in list serves either purpose, we can easily implement this representation using a dictionary with nodes as keys and a list of adjacent nodes as each value.
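A small sketch of this idea in Python, for an undirected graph, might look like the following (the class name AdjacencyListGraph and its methods are illustrative):

```python
class AdjacencyListGraph:
    """Undirected graph stored as a dictionary mapping each node to a list of neighbors."""

    def __init__(self):
        self.adj = {}

    def add_node(self, v):
        # setdefault creates an empty neighbor list the first time v is seen
        self.adj.setdefault(v, [])

    def add_edge(self, u, v):
        # store the edge in both directions, since the graph is undirected
        self.add_node(u)
        self.add_node(v)
        self.adj[u].append(v)
        self.adj[v].append(u)

    def has_edge(self, u, v):
        return v in self.adj.get(u, [])

    def neighbors(self, u):
        return list(self.adj.get(u, []))

    def nodes(self):
        return list(self.adj.keys())


# Example usage: a small triangle graph
g = AdjacencyListGraph()
g.add_edge("A", "B")
g.add_edge("B", "C")
g.add_edge("A", "C")
print(g.neighbors("A"))      # ['B', 'C']
print(g.has_edge("B", "C"))  # True
```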
Performance Considerations
As we have discussed, the two most common ways of implementing graphs are using adjacency matrices and using adjacency lists. We tend to prefer adjacency matrices when the graphs are dense, that is, when the number of edges is near the maximum possible number, which is \(n^2\) for a graph of \(n\) nodes. However, if the graphs are sparse, that is, if most of the possible edges are not present, then the adjacency list representation can save space.
Show that the statement above, "if most of the possible edges are not present, then the adjacency list representation can save space," is correct.
To see why, note that an adjacency matrix for an \(n\) node graph has \(n^2\) bits, and therefore could be packed into \(\frac{n^2}{32} \) \(32\)-bit words. The common adjacency list cell will consist of two words, one for the node and one for the pointer to the next cell. Thus, if the number of edges is \(a\), we need about \(2a\) words for the lists, and \(n\) words for the array of headers. The adjacency list will use less space than the adjacency matrix if \[ n + 2a < \frac{n^2}{32}.\ _\square \]
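For a concrete sense of scale (the value of \(n\) here is chosen purely for illustration), take \(n = 1024\): the packed adjacency matrix occupies \(\frac{1024^2}{32} = 32768\) words, so the adjacency lists use less space whenever \(1024 + 2a < 32768\), that is, whenever the graph has fewer than \(15872\) edges out of the roughly \(10^6\) possible.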
The following table summarizes the preferred representation for various operations:
\[\begin{array}{l|l|l} \textbf{Operation} & \textbf{Dense Graphs} & \textbf{Sparse Graphs} \\ \hline \text{Lookup edge} & \text{Adjacency matrix} & \text{Either} \\ \text{Find successors} & \text{Either} & \text{Adjacency lists} \\ \text{Find predecessors} & \text{Adjacency matrix} & \text{Either} \end{array}\]
Minimum Spanning Trees
A spanning tree for an undirected graph \(G\) is the nodes of \(G\) together with a subset of the edges of \(G\) that
- connect the nodes, that is, there is a path between any two nodes using only edges in the spanning tree;
- form an unrooted, unordered tree, that is, there are no (simple) cycles.
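For example, in a triangle (three nodes with all three pairs joined by edges), deleting any one edge leaves a spanning tree, so the triangle has exactly three spanning trees.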
Find all the spanning trees of the graph below.
By simple observation, we find that there are 4 spanning trees in total.
Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of the weights of its edges. Different spanning trees will, in general, have different weights. Here is the problem: "How can you find the minimum weight spanning tree?"
Find the minimum spanning tree of the weighted graph below.
Remember that the minimum spanning tree is the spanning tree that minimizes the sum of its edge weights. By observation we see that the following is the minimum spanning tree:
Kruskal's Minimal Spanning Tree Algorithm
There are several algorithms for finding minimal spanning trees. We shall consider a specific one called Kruskal's algorithm. The algorithm is executed as follows:
Kruskal's Algorithm
- Set \(A = \{\}\) and \(F = E\), where \(E\) is the set of all edges.
- Choose an edge \(e\) in \(F\) of minimum weight, and check whether adding \(e\) to \(A\) creates a cycle.
  - If it does, remove \(e\) from \(F\).
  - If it doesn't, move \(e\) from \(F\) to \(A\).
- If \(F = \{\}\), stop and output the minimal spanning tree \((V, A)\). Otherwise, repeat from the second step.
Kruskal's algorithm is a good example of a greedy algorithm, in which we make a series of decisions, each doing what seems best at the time. The local decision is which edge to add to the spanning tree being formed: at each step we pick the lowest-weight edge that does not violate the definition of a spanning tree by completing a cycle. Often the overall effect of locally optimal decisions is not globally optimal; however, Kruskal's algorithm is a case where the greedy strategy does produce an optimal result.
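As a rough Python sketch of the algorithm above (the names kruskal and find are illustrative; cycles are detected with a simple union-find structure):

```python
def kruskal(nodes, edges):
    """Return the edges of a minimal spanning tree.

    nodes: iterable of node labels
    edges: list of (weight, u, v) tuples
    """
    # union-find (disjoint-set) structure used to detect cycles
    parent = {v: v for v in nodes}

    def find(v):
        # follow parent pointers to the representative of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    A = []  # edges accepted into the spanning tree
    for weight, u, v in sorted(edges):  # consider edges in order of weight
        ru, rv = find(u), find(v)
        if ru != rv:         # adding (u, v) does not create a cycle
            parent[ru] = rv  # merge the two components
            A.append((u, v, weight))
    return A


# Example usage on a small weighted graph
nodes = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (3, "B", "C"), (2, "A", "C"), (4, "C", "D")]
print(kruskal(nodes, edges))  # [('A', 'B', 1), ('A', 'C', 2), ('C', 'D', 4)]
```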
Show that the running time of Kruskal's algorithm is \(O(m \log n)\) on a graph with \(n\) nodes, where \(m\) is the larger of the number of nodes and the number of edges.
Let us assume the graph is represented by adjacency lists, so we can find all edges in \(O(m)\) time.
We must first sort the edges by weight, which takes \(O(m \log m)\) time if we use merge sort. Next, we consider the edges in this order, taking \(O(m \log n)\) time to do all the merge and find operations needed to detect cycles. The total time for Kruskal's algorithm thus appears to be \[ O\big(m( \log n + \log m ) \big). \]
Notice, however, that \(m \leq n^2 \) because there are only \(\frac{n(n-1)}{2} \) pairs of nodes. Thus, \(\log m \leq 2\log n \) and \(m(\log n + \log m) \leq 3m \log n \). Since constant factors can be neglected, we conclude that Kruskal's algorithm indeed takes \[O(m \log n ) \] time. \(_\square\)