Graphs
There are many systems in the natural world and in society that are amenable to mathematical and computational modeling. However, not everything is easily codified as a system of particles with coordinates and momenta. Some systems and problems such as social networks, ecologies, and genetic regulatory schemes are intrinsically divorced from spacetime descriptions, and instead are more naturally expressed as graphs that reflect their topological properties. At their simplest, graphs are simply collections of nodes – representing some class of objects like people, corporate boards, proteins, or destinations on the globe – and edges, which serve to represent connections like friendships, bridges, or molecular binding interactions.
Contents
What is a graph?
Consider the highway system of the eastern coast of the United States. A road inspector is given the task of writing reports about the current condition of each highway. What would be the most economical way for him to traverse all the cities? The problem can be modeled as a graph.
In fact, since graphs are dots and lines , they look like road maps. The dots are called vertices or nodes and the lines are called edges. They may have a value assigned to them (weighted) or they may just be a mere indicator of the existence of a path (unweighted). More formally, a graph can be defined as follows:
A graph \(G\) consists of a set of \(V\) of vertices (or nodes) and a set \(E\) of edges (or arcs) such that each edge \(e \in E\) is associated with a pair of vertices \(\in V\). A graph \(G\) with vertices \(V\) and edges \(E\) is written as \(G=(V,E)\).
Because graphs are so pervasive, it is useful to define different types of graphs. The following are the most common types of graphs:
Undirected graph: An undirected graph is one in which its edges have no orientations, i.e. no direction is associated with any edge. The edges \((x,y)\) and \((y,x)\) are equivalent.
Directed graph: A directed graph or digraph \(G\) consists of a set \(V\) of vertices (or nodes) and a set of edges (or arcs) such that each edge \(e \in E\) is associated with an ordered pair of vertices. If there is an edge \((x,y)\), it is completely distinct from the edge \((y,x)\).
Directed acyclic graph: A directed acyclic graph (commonly abbreviated as DAG) is a directed graph with no directed cycles. A cycle is any path \(\{A_1, \ldots, A_n\}\) such that the edges \(A_1\rightarrow A_2\), \(A_2\rightarrow A_3\), \(\ldots\), and \(A_n\rightarrow A_1\) all exist, thus forming a loop. A DAG is a graph without a single cycle.
List all the edges and vertices of the undirected graph \(G\) in the figure above.
The graph \(G\) consists of the set of vertices \(V\) = {Massachusetts, Maine, Connecticut, New York, Maryland, New Jersey}.
Its edges are \(E =\) {(Maine,Massachusetts) , (Massachusetts, Connecticut) , (Connecticut,New York), (New York,Maine), (New York,Massachusetts), (New Jersey, Maine),(Maryland, New York), (Maine, Maryland)}.
Note that since the graph is undirected, the order of the tuples in denoting the edges is unimportant. \(_\square\)
Government surveillance agencies have a tendency to accumulate strange new powers during times of panic. The US National Security Agency (NSA) now has the ability to monitor the communications of suspected individuals as well as the communications of people within some number of hops of any suspect. In the communication network above, which person is connected to the greatest number of people through 1 hop or less?
Details and Assumptions:
- Each dot represents a person.
- Each line represents communication between the people on either end.
- If X communicates with Y, and Y communicates with Z, we say that X and Z have a 1-hop connection, and that X has a 0-hop connection with Y.
Representation of Graphs
Above we represented a graph by drawing it. To represent it in a computer, however, we need more formal ways of representing graphs. Here we discuss the two most common ways of representing a graph: the adjacency matrix and the adjacency list.
The adjacency matrix
Represent the graph above using an adjacency matrix.
To obtain the adjacency matrix of the graph, we first label the rows and columns with the corresponding ordered vertices. If there exists an edge between two vertices \(i\) and \(j\), then their corresponding cell in the matrix will be assigned \(1\). If there does not exist an edge, then the cell will be assigned the value \(0\). The adjacency matrix for the graph above is thus
\[\quad \begin{bmatrix} & a & b & c & d & e\\ a & 0 & 1 & 1 & 0 & 1 \\ b & 1 & 0 & 0 & 0 & 0\\ c & 1 & 0 & 0 & 1 & 1\\ d & 0 & 0 & 1 & 0 & 0\\ e & 1 & 0 & 1 & 0 & 0\\ \end{bmatrix}. \ _\square \quad\]
Adjacency list
An adjacency list representation of a graph is a way of associating each vertex (or node) in the graph with its respective list of neighboring vertices. A common way to do this is to create a Hash table. This table will contain each vertice as a key and the list of adjacent vertices of that vertices as a value.
For our example above, the adjacency list representation will look as follows:
We can see that the adjacency list is much less expensive on memory as the adjacency matrix is very sparse.
Most graph algorithms involve visiting each vertex in \(V\), starting from a root node \(v_0\). There are several ways of achieving this. The two most common traversal algorithms are breadth-first search and depth-first search.
Breadth-first Search
In a breadth-first search, we start with the start node, followed by its adjacent nodes, then all nodes that can be reached by a path from the start node containing two edges, three edges, and so on. Formally the BFS algorithm visits all vertices in a graph \(G\), that are \(k\) edges away from the source vertex \(s\) before visiting any vertex \(k+1\) edges away. This is done until no more vertices are reachable from \(s\). The image below demonstrates exactly how this traversal proceeds:
For a graph \(G = (V,E)\) and a source vertex \(v\), breadth-first search traverses the edges of \(G\) to find all reachable vertices from \(v\). It also computes the shortest distance to any reachable vertex. Any path between two points in a breadth-first search tree corresponds to the shortest path from the root \(v\) to any other node \(s\).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
We may think of three types of vertices in BFS as tree verties, those that have been taken of the data structure. fringe vertices, those adjacent to tree vertices but not yet visited, and undiscovered vertices, those that we have not encountered yet. If each visited vertex is connected to the edge that caused it to be added to the data structure, then these edges form a tree.
To search a connected component of a graph systematically, we begin with one vertex on the fringe, all others unseen, and perform the following step until all vertices have been visited: "move one vertex \(x\) from the fringe to the tree, and put any unseen vertices adjacent to \(x\) on the fringe." Graph traversal methods differ in how it is decided which vertex should be moved from the fringe to the tree. For breadth-first search we want to choose the vertex from the fringe that was least recently encountered; this corresponds to using a queue to hold vertices on the fringe.
What is the state of the queue at each iteration of BFS, if it is called from node 'a'?
The table below shows the contents of the queue as the procedure. BFS visits vertices in the graph above. BFS will visit the same vertices as DFS. In this example all of them.
\[\begin{array}{l|r} \textbf{Node Visited} & \textbf{Queue} \\ \hline \text{a} & \text{a} \\ \text{ } & \text{(empty)} \\ \text{b } & \text{b} \\ \text{f } & \text{b f} \\ \text{i} & \text{b f i} \\ \text{ } & \text{f i} \\ \text{c} & \text{f i c} \\ \text{e} & \text{f i c e} \\ \text{ } & \text{ i c e} \\ \text{g } & \text{ i c e g} \\ \text{ } & \text{ c e g} \\ \text{ } & \text{ e g} \\ \text{d } & \text{ e g d} \\ \text{ } & \text{ g d} \\ \text{ } & \text{ d} \\ \text{ } & \text{ (empty)} \\ \text{ h} & \text{ h} \\ \text{ } & \text{ (empty) } \end{array}\]
Depth-first Search
Depth-first search explores edges out of the most recently discovered vertex \(s\) that still has unexplored edges leaving it. Once all of ’s edges have been explored, the search “backtracks” to explore edges leaving the vertex from which was discovered. This process continues until we have discovered all the vertices that are reachable from the original source vertex. If any undiscovered vertices remain, then depth-first search selects one of them as a new source, and it repeats the search from that source. The algorithm repeats this entire process until it has discovered every vertex:
- Visit a vertex \(s\).
- Mark \(s\) as visited.
- Recursively visit each unvisited vertex attached to \(s\).
A recursive implementation of DFS:
1 2 3 4 5 |
|
A non-recursive implementation of DFS, it delays whether a vertex has been discovered until the vertex has been popped from the stack.
1 2 3 4 5 6 7 8 9 |
|
Contrasting Traversals
Similar to tree traversal, the code for breadth-first search is slightly different from depth-first search. The most commonly mentioned difference is that BFS uses a queue to store alternative choices instead of a stack. This small change however leads to a classical graph traversal algorithm. Depth-first search goes down a single path until the path leads to the goal or we reach a limit. When a path is completely explored we back track. BFS however explores all paths from the starting location at the same time.
As we increase the size of our graph, the contrast between depth-first and breadth-first search is quite evident. Depth-first search explores the graph by looking for new vertices far away from the start point, taking closer vertices only when dead ends are encountered; breadth-first search completely covers the area close to the starting point, moving farther away only when everything close has been looked at. Again, the order in which the nodes are visited depends largely upon the effects of this ordering on the order in which vertices appear on the adjacency lists.
Additional Problems
John lives in the Trees of Ten Houses, and it is a most ideal and idyllic place for him and the other dwellers up in the canopy. They have invested a tremendous amount of time in engineering these houses, and to ensure no house felt isolated from the others, they built a fresh, finely crafted bridge between each and every house!
Unfortunately, the Trees of Ten Houses were not immune to thunderstorms, nor were the bridges well engineered. The night was treacherous, howling with wind and freezing with rain, so the odds for the bridges were not good--each bridge seemed just as likely to survive as to be shattered!
Fortunately, as there were so very many bridges in the Trees of Ten Houses, when John did wake the following morning, he found he was able to make his way to each and every house using only the existing bridges, though round-about routes may have been necessary. As they began rebuilding, John became curious... what were the chances that they'd all be so lucky?
More formally, if \(P\) is the probability that, after the storm, John is able to traverse to each and every house, what is \(\big\lfloor 10^{10} P \big\rfloor?\)
Details and Assumptions:
- The Trees of Ten Houses do, in fact, contain precisely 10 houses.
- Before the storm, there exists a single bridge between each and every unique pair of houses.
- The storm destroys each bridge with independent probability \(\frac{1}{2}\).
- John is allowed to traverse through others' houses to try to reach all of them, but he must only use the surviving bridges to get there. No vine swinging allowed.