Graphs

There are many systems in the natural world and in society that are amenable to mathematical and computational modeling. However, not everything is easily codified as a system of particles with coordinates and momenta. Some systems and problems such as social networks, ecologies, and genetic regulatory schemes are intrinsically divorced from spacetime descriptions, and instead are more naturally expressed as graphs that reflect their topological properties. At their simplest, graphs are simply collections of nodes – representing some class of objects like people, corporate boards, proteins, or destinations on the globe – and edges, which serve to represent connections like friendships, bridges, or molecular binding interactions.

What is a graph?

Consider the highway system of the eastern coast of the United States. A road inspector is given the task of writing reports about the current condition of each highway. What would be the most economical way for him to traverse all the cities? The problem can be modeled as a graph.

In fact, since graphs are dots and lines , they look like road maps. The dots are called vertices or nodes and the lines are called edges. They may have a value assigned to them (weighted) or they may just be a mere indicator of the existence of a path (unweighted). More formally, a graph can be defined as follows:

A graph \(G\) consists of a set of \(V\) of vertices (or nodes) and a set \(E\) of edges (or arcs) such that each edge \(e \in E\) is associated with a pair of vertices \(\in V\). A graph \(G\) with vertices \(V\) and edges \(E\) is written as \(G=(V,E)\).

Because graphs are so pervasive, it is useful to define different types of graphs. The following are the most common types of graphs:

Undirected graph: An undirected graph is one in which its edges have no orientations, i.e. no direction is associated with any edge. The edges \((x,y)\) and \((y,x)\) are equivalent.

Directed graph: A directed graph or digraph \(G\) consists of a set \(V\) of vertices (or nodes) and a set of edges (or arcs) such that each edge \(e \in E\) is associated with an ordered pair of vertices. If there is an edge \((x,y)\), it is completely distinct from the edge \((y,x)\).

Undirected graphs are typically represented by a line with no arrows, which implies a bidirectional relationship between node A and node B. Directed graphs use an arrow to show the relationship from A to B.

Directed acyclic graph: A directed acyclic graph (commonly abbreviated as DAG) is a directed graph with no directed cycles. A cycle is any path \(\{A_1, \ldots, A_n\}\) such that the edges \(A_1\rightarrow A_2\), \(A_2\rightarrow A_3\), \(\ldots\), and \(A_n\rightarrow A_1\) all exist, thus forming a loop. A DAG is a graph without a single cycle.

List all the edges and vertices of the undirected graph \(G\) in the figure above.

The graph \(G\) consists of the set of vertices \(V\) = {Massachusetts, Maine, Connecticut, New York, Maryland, New Jersey}.

Its edges are \(E =\) {(Maine,Massachusetts) , (Massachusetts, Connecticut) , (Connecticut,New York), (New York,Maine), (New York,Massachusetts), (New Jersey, Maine),(Maryland, New York), (Maine, Maryland)}.

Note that since the graph is undirected, the order of the tuples in denoting the edges is unimportant. \(_\square\)

Representation of Graphs

Above we represented a graph by drawing it. To represent it in a computer, however, we need more formal ways of representing graphs. Here we discuss the two most common ways of representing a graph: the adjacency matrix and the adjacency list.

The adjacency matrix

Represent the graph above using an adjacency matrix.

To obtain the adjacency matrix of the graph, we first label the rows and columns with the corresponding ordered vertices. If there exists an edge between two vertices \(i\) and \(j\), then their corresponding cell in the matrix will be assigned \(1\). If there does not exist an edge, then the cell will be assigned the value \(0\). The adjacency matrix for the graph above is thus

\[\quad \begin{bmatrix} & a & b & c & d & e\\ a & 0 & 1 & 1 & 0 & 1 \\ b & 1 & 0 & 0 & 0 & 0\\ c & 1 & 0 & 0 & 1 & 1\\ d & 0 & 0 & 1 & 0 & 0\\ e & 1 & 0 & 1 & 0 & 0\\ \end{bmatrix}. \ _\square \quad\]

Adjacency list

An adjacency list representation of a graph is a way of associating each vertex (or node) in the graph with its respective list of neighboring vertices. A common way to do this is to create a Hash table. This table will contain each vertice as a key and the list of adjacent vertices of that vertices as a value.

For our example above, the adjacency list representation will look as follows:

We can see that the adjacency list is much less expensive on memory as the adjacency matrix is very sparse.

Most graph algorithms involve visiting each vertex in \(V\), starting from a root node \(v_0\). There are several ways of achieving this. The two most common traversal algorithms are breadth-first search and depth-first search.

Breadth-first Search

In a breadth-first search, we start with the start node, followed by its adjacent nodes, then all nodes that can be reached by a path from the start node containing two edges, three edges, and so on. Formally the BFS algorithm visits all vertices in a graph \(G\), that are \(k\) edges away from the source vertex \(s\) before visiting any vertex \(k+1\) edges away. This is done until no more vertices are reachable from \(s\). The image below demonstrates exactly how this traversal proceeds:

For a graph \(G = (V,E)\) and a source vertex \(v\), breadth-first search traverses the edges of \(G\) to find all reachable vertices from \(v\). It also computes the shortest distance to any reachable vertex. Any path between two points in a breadth-first search tree corresponds to the shortest path from the root \(v\) to any other node \(s\).

BFS(v){
    {add v to queue and mark it}
    Add(Q, v)
    Mark v as visited
    while (not IsEmpty(Q)) do
    begin
        w = QueueFront(Q)
        Remove(Q)
        {loop invariant : there is a path from vertex w to every vertex in the queue Q}
        for each unvisited vertex u adjacent to w do
        begin
            Mark u as visited
            Add(Q , u)
        end { for }
    end{ while
}

We may think of three types of vertices in BFS as tree verties, those that have been taken of the data structure. fringe vertices, those adjacent to tree vertices but not yet visited, and undiscovered vertices, those that we have not encountered yet. If each visited vertex is connected to the edge that caused it to be added to the data structure, then these edges form a tree.

To search a connected component of a graph systematically, we begin with one vertex on the fringe, all others unseen, and perform the following step until all vertices have been visited: "move one vertex \(x\) from the fringe to the tree, and put any unseen vertices adjacent to \(x\) on the fringe." Graph traversal methods differ in how it is decided which vertex should be moved from the fringe to the tree. For breadth-first search we want to choose the vertex from the fringe that was least recently encountered; this corresponds to using a queue to hold vertices on the fringe.

What is the state of the queue at each iteration of BFS, if it is called from node 'a'?

PP

The table below shows the contents of the queue as the procedure. BFS visits vertices in the graph above. BFS will visit the same vertices as DFS. In this example all of them.

\[\begin{array}{l|r} \textbf{Node Visited} & \textbf{Queue} \\ \hline \text{a} & \text{a} \\ \text{ } & \text{(empty)} \\ \text{b } & \text{b} \\ \text{f } & \text{b f} \\ \text{i} & \text{b f i} \\ \text{ } & \text{f i} \\ \text{c} & \text{f i c} \\ \text{e} & \text{f i c e} \\ \text{ } & \text{ i c e} \\ \text{g } & \text{ i c e g} \\ \text{ } & \text{ c e g} \\ \text{ } & \text{ e g} \\ \text{d } & \text{ e g d} \\ \text{ } & \text{ g d} \\ \text{ } & \text{ d} \\ \text{ } & \text{ (empty)} \\ \text{ h} & \text{ h} \\ \text{ } & \text{ (empty) } \end{array}\]

Depth-first Search

Depth-ﬁrst search explores edges out of the most recently discovered vertex \(s\) that still has unexplored edges leaving it. Once all of ’s edges have been explored, the search “backtracks” to explore edges leaving the vertex from which was discovered. This process continues until we have discovered all the vertices that are reachable from the original source vertex. If any undiscovered vertices remain, then depth-ﬁrst search selects one of them as a new source, and it repeats the search from that source. The algorithm repeats this entire process until it has discovered every vertex:

Visit a vertex \(s\).
Mark \(s\) as visited.
Recursively visit each unvisited vertex attached to \(s\).

A recursive implementation of DFS:

procedure DFS(G,v):
    label v as discovered
    for all edges from v to w in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            recursively call DFS(G,w)

A non-recursive implementation of DFS, it delays whether a vertex has been discovered until the vertex has been popped from the stack.

procedure DFS-iterative(G,v):
    let S be a stack
    S.push(v)
    while S is not empty
        v = S.pop() 
        if v is not labeled as discovered:
            label v as discovered
            for all edges from v to w in G.adjacentEdges(v) do
                S.push(w)

Contrasting Traversals

Similar to tree traversal, the code for breadth-first search is slightly different from depth-first search. The most commonly mentioned difference is that BFS uses a queue to store alternative choices instead of a stack. This small change however leads to a classical graph traversal algorithm. Depth-first search goes down a single path until the path leads to the goal or we reach a limit. When a path is completely explored we back track. BFS however explores all paths from the starting location at the same time.

As we increase the size of our graph, the contrast between depth-first and breadth-first search is quite evident. Depth-first search explores the graph by looking for new vertices far away from the start point, taking closer vertices only when dead ends are encountered; breadth-first search completely covers the area close to the starting point, moving farther away only when everything close has been looked at. Again, the order in which the nodes are visited depends largely upon the effects of this ordering on the order in which vertices appear on the adjacency lists.

Contents

What is a graph?

Representation of Graphs

Breadth-first Search

Depth-first Search

Contrasting Traversals

Additional Problems

Contents

Tagged under #ComputerScience as this problem is quite tedious to do without it, though not impossible.

Image Credit: http://hdscreen.me/wallpaper/2645876-bridges-fantasy-art-landscapes-mountains