Graph implementation and representation

A graph is a binary relation. It provides a powerful visualization as a set of points (called nodes) connected by lines (called edges) or arrows (called arcs). In this regard, the graph is a generalization of the tree data model. Like trees, graphs come in several forms: directed, undirected and labeled.

Directed vs Undirected Graphs

A undirected graph means that the relationship along an edge between two nodes is bidirectional, i.e. it can go either way. A directed graph means that the relationship only goes one way. Undirected graphs are typically represented by a line with no arrows, which imply a bidirectional relationship between node A and node B. Directed graphs use an arrow to show the relationship from A to B.

Minimal Description of the Graph Data Type

As an abstract type, we can view a graph as a collection of elements that are stored at the graph's edges and nodes (nodes are also sometimes referred to as vertices). The essential methods we need for dealing with graphs will be the following:

nodes(): Returns all the nodes of the graph

edges(): Returns all the edges of a graph

insert_node(value, x): Inserts a node of value v which as a neighbor of x

remove_node(node): Removes the given node of the graph

As discussed before, there are two standard ways of representing a graph: the adjacency list and the adjacency matrix implementation. We shall consider these representations in this section.

Adjacency List Implementation

A common way to implement a graph using an adjacency list is to use either a hashtable with an array as values or use a hashtable with linked lists as a value. Since Python combines the idea of arrays and linked lists, we can easily implement this representation using a dictionary with nodes as keys and a list as a value.

class Node(object):
    def __init__(self, value):
        self.value = value

class Graph(object):
    def __init__(self):
        """Node to list of neighbors hashtable (dict/dictionary in python)"""
        self.nodes = {}

    def nodes(self):
        """Returns all the nodes of the graph"""
        return self.nodes.keys()

    def edges(self):
        """Returns all the edges of a graph"""
        edges = []
        for node in self.nodes:
            for incident in self.nodes[node]:
                edge = (node, incident)
                if edge not in edges:
                    edges.append(edge)
        return edges

    def insert_node(self, value, x):
        """Insert a node with a value which is a neighbor of x"""
        node = Node(value)
        self.nodes[node] = [x]
        self.nodes[x].append(node)

    def remove_node(self, node):
        """Removes a given node  of the graph"""
        for node in self.nodes:
            if node in self.nodes[node]:
                self.nodes[node].remove(node)
        del self.nodes[node]

Performance Considerations

As we have discussed, the two most common ways of implementing graphs are using adjacency matrices and using adjacency lists. We tend to prefer adjacency matrices when the graphs are dense, that is, when the number of edges is near the maximum possible number, which is \(n^2\) for a graph of \(n\) nodes. However, if the graphs are sparse, that is, if most of the possible edges are not present, then the adjacency list representation can save space.

Show that the statement above 'if most of the possible edges are not present, then the adjacency list representation may save space' is correct.

To see why, note that an adjacency matrix for an \(n\) node graph has \(n^2\) bits, and therefore could be packed into \(\frac{n^2}{32} \) \(32\)-bit words. The common adjacency list cell will consist of two words, one for the node and one for the pointer to the next cell. Thus, if the number of edges is \(a\), we need about \(2a\) words for the lists, and \(n\) words for the array of headers. The adjacency list will use less space than the adjacency matrix if \[ n + 2a < \frac{n^2}{32}.\ _\square \]

The following table summarizes the preferred representation for various operations:

\[\begin{array}{l|l|l} \textbf{Operation} & \textbf{Dense Graphs} & \textbf{Sparse Graphs} \\ \hline \text{Lookup edge} & \text{Adjacency matrix} & \text{Either} \\ \text{Find succesor} & \text{Either} & \text{Adjacency lists} \\ \text{Find predecessors} & \text{Adjacency matrix} & \text{Either} \end{array}\]

Minimum Spanning Trees

A spanning tree for an undirected graph \(G\) is the nodes of \(G\) together with a subset of the edges of \(G\) that

connect the nodes, that is, there is a path between any two nodes using only edges in the spanning tree;
form an unrooted, unordered tree, that is, there are no (simple) cycles.

Find all the spanning trees of the graph below. image

By simple observation, we find 4 total spanning trees. image

Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of weights of its edges. Obviously, different spanning trees have different lengths. Here is the problem: "How can you find the minimum length spanning tree?"

Find the minimum spanning tree of the weighted graph below. image

Remember that the minimum spanning tree is the spanning tree that minimizes the sum of the edges. By observation we see that the following is the minimum spanning tree: image

Kruskal's Minimal Spanning Tree Algorithm

There are several algorithms for finding minimal spanning trees. We shall consider a specific one called Kruskal's algorithm. The algorithm is executed as follows:

Kruskal's Algorithm

Set an empty set A={} and F = E, where E is the set of all edges.

Choose an edge e in F of minimum weight, and check whether adding e to A creates a cycle.

If it does, remove e from F.

If it doesn't, move e from F to A.

If F={}, stop and output the minimal spanning tree (V,A).

Kruskal's algorithm is a good example of a greedy algorithm, in which we make a series of decisions, each doing what seems best at the time. The local decisions are which edge to add to the spanning tree formed. In each case, we pick the edge with the least label that does not violate the definition of a spanning tree by completing a cycle. Often the overall effect of a locally optimal solution is not globally optimal. However, Kruskal's algorithm is a case where this is not true.

Show that the running time of Kruskal's algorithm is \(O(m \log n ) \) on a graph with \(n\) nodes and \(m\) is the larger of either the nodes or edges.

Let us assume the graph is represented by adjacency lists, so we can find all edges in \(O(m)\) time.

We must first sort the edges by label, which takes \(O(m \log m ) \) time, if we use merge sort. Next, we consider the edges, taking \(O(m \log n) \) time to do all the merges and finds. It appears that the total time for Kruskal's algorithm is thus \[ O\big(m( \log n + \log m ) \big). \]

Notice however that \(m \leq n^2 \) because there are only \(\frac{n(n-1)}{2} \) pairs of nodes. Thus, \(\log m \leq 2\log n \) and \(m(\log n + \log m) \leq 3m \log n \). Since the constants can be neglected, we conclude that indeed takes \[O(m \log n ) \] time. \(_\square\)

Contents

Directed vs Undirected Graphs

Minimal Description of the Graph Data Type

Performance Considerations

Minimum Spanning Trees

Kruskal's Minimal Spanning Tree Algorithm

See Also