Floyd-Warshall Algorithm

The Floyd-Warshall algorithm is a shortest path algorithm for graphs. Like the Bellman-Ford algorithm or the Dijkstra's algorithm, it computes the shortest path in a graph. However, Bellman-Ford and Dijkstra are both single-source, shortest-path algorithms. This means they only compute the shortest path from a single source. Floyd-Warshall, on the other hand, computes the shortest distances between every pair of vertices in the input graph.

Imagine that you have 5 friends: Billy, Jenna, Cassie, Alyssa, and Harry. You know a few roads that connect some of their houses, and you know the lengths of those roads. But, Floyd-Warshall can take what you know and give you the optimal route given that information. For example, look at the graph below, it shows paths from one friend to another with corresponding distances.

Graph of friends Graph of friends

Floyd-Warshall will tell the optimal distance between each pair of friends. It will clearly tell you that the quickest path from Alyssa's house to Harry's house is the connecting edge that has a weight of 1. But, it will also tell you that the quickest way to get from Billy's house to Jenna's house is to first go through Cassie's, then Alyssa's, then Harry's house before ending at Jenna's. This is the power of Floyd-Warshall; no matter what house you're currently in, it will tell the fastest way to get to every other house.

The Floyd-Warshall algorithm is an example of dynamic programming. It breaks the problem down into smaller subproblems, then combines the answers to those subproblems to solve the big, initial problem. The idea is this: either the quickest path from A to C is the quickest path found so far from A to C, or it's the quickest path from A to B plus the quickest path from B to C.

Floyd-Warshall is extremely useful in networking, similar to solutions to the shortest path problem. However, it is more effective at managing multiple stops on the route because it can calculate the shortest paths between all relevant nodes. In fact, one run of Floyd-Warshall can give you all the information you need to know about a static network to optimize most types of paths. It is also useful in computing matrix inversions.

Overview

At the heart of Floyd-Warshall is this function:

$\text{ShortestPath}(i, j, k).$

This function returns the shortest path from $A$ to $C$ using the vertices from 1 to $k$ in the graph. The vertices are individually numbered ${1, 2, ..., k}$ .

There is a base case and a recursive case. The base case is that the shortest path is simply the weight of the edge connecting $A$ and $C:$

$\text{ShortestPath}(i, j, 0) = \text{weight}(i, j).$

The recursive case will take advantage of the dynamic programming nature of this problem. There are two possible answers for this function. Either the shortest path between $i$ and $j$ is the shortest known path, or it is the shortest known path from $i$ to some vertex (let's call it $z$ ) plus the shortest known path from $z$ to $j:$

$\text{ShortestPath}(i, j, k) = \text{min}\big(\text{ShortestPath}(i, j, k-1), \text{ShortestPath}(i, k, k-1) + \text{ShortestPath}(k, j, k-1)\big).$

Basically, what this function setup is asking this: "Is the vertex $k$ an intermediate of our shortest path (any vertex in the path besides the first or the last)?"

If $k$ is not an intermediate vertex, then the shortest path from $i$ to $j$ using the vertices in $\{1, 2, ..., k-1\}$ is also the shortest path using the vertices in $\{1, 2, ..., k\}.$

If $k$ is an intermediate vertex, then the path can be broken down into two paths, each of which uses the vertices in $\{1, 2, ..., k-1\}$ to make a path that uses all vertices in $\{1, 2, ..., k\}.$ That is because the vertex $k$ is the middle point. This is illustrated in the image below.

Dynamic programming decomposition Dynamic programming decomposition

Algorithm

The Floyd-Warshall algorithm can be described by the following pseudo code:

Create a |V| x |V| matrix, M, that will describe the distances between vertices
For each cell (i, j) in M:
    if i == j:
        M[i][j] = 0
    if (i, j) is an edge in E:
        M[i][j] = weight(i, j)
    else:
        M[i][j] = infinity
for k from 1 to |V|:
    for i from 1 to |V|:
        for j from 1 to |V|:
            if M[i][j] > M[i][k] + M[k][j]:
                M[i][j] = M[i][k] + M[k][j]

Complexity

The Floyd-Warshall algorithm runs in $O\big(|V|^{3}\big)$ time. This is because of the three nested for loops that are run after the initialization and population of the distance matrix, M.

See also: big O notation.

	Average
Floyd-Warshall	$O(\|V\|^{3})$

Floyd-Warshall is completely dependent on the number of vertices in the graph. As you might guess, this makes it especially useful for a certain kind of graph, and not as useful for other kinds.

Is the Floyd-Warshall algorithm better for sparse graphs or dense graphs? (A sparse graph is one that does not have many edges connecting its vertices, and a dense graph has many edges.)

The Floyd-Warshall algorithm is best suited for dense graphs since it is not at all dependent on the number of edges. Performing Floyd-Warshall on a sparse graph erases its main benefit. For sparse graphs, Johnson's algorithm is more suitable. $_\square$

Sample Python Implementation

The following implementation of Floyd-Warshall is written in Python.

In this implementation, infinity is represented by a really large integer.

class Edge:
    def __init__(self, start, end, weight):
        self.start = start
        self.end = end
        self.weight = weight

class Graph:
    def __init__(self):
        self.adj = {} #Adjacency matrix that holds graph data
        self.vertexCount = 0

    def addVertex(self, vertex):
        if vertex in self.adj:
            return "Vertex already exists"
        if vertex != self.vertexCount:
            return "Don't skip a vertex"
        self.adj[vertex] = []
        self.vertexCount += 1

    def addEdge(self, start, end, weight):
        if start not in self.adj:
            return "Starting vertex not in graph"
        if end not in self.adj:
            return "Ending vertex not in graph"
        if start == end:
            return "Cannot have same start and end vertex"
        edge = Edge(start, end, weight)
        self.adj[start].append(edge)

    def doesEdgeExist(self, start, end):
        for vertex in self.adj:
            for edge in self.adj[vertex]:
                if edge.start == start and edge.end == end:
                    return (True, edge)
        return (False, None)

    def floydwarshall(self):
        M = [[9999999 for x in range(len(self.adj))] for y in range(len(self.adj))]
        for x in range(len(M)):
            for y in range(len(M[0])):
                if x == y:
                    M[x][y] = 0
                exists, edge = self.doesEdgeExist(x, y)
                if exists:
                    M[x][y] = edge.weight
        for k in range(len(M)):
            for i in range(len(M)):
                for j in range(len(M)):
                    newDistance = M[i][k] + M[k][j]
                    if newDistance < M[i][j]:
                        M[i][j] = newDistance
        return M

The Edge class on line 1 is a simple object that holds information about the edge such as endpoints and weight. The vertex is just a simple integer for this implementation. The Graph class uses a dictionary--initialized on line 9--to represent the graph. Keys in this dictionary are vertex numbers and the values are a list of edges.

The floydwarshall() function on line 33 creates a matrix M. It populates this matrix with shortest path information for each vertex. For example, the shortest path distance from vertex 0 to vertex 2 can be found at M[0][2].

Path Reconstruction

In general, Floyd-Warshall, at its most basic, only provides the distances between vertices in the resulting matrix. However, a simple change can allow the algorithm to reconstruct the shortest path as well. There are many different ways to do this, and all of them have their costs in memory. Speed is not a factor with path reconstruction because any time it takes to reconstruct the path will pale in comparison to the basic algorithm itself.

The most common way is to compute a sequence of predecessor matrices. During path calculation, even the matrices

$P^{(0)}, P^{(1)}, ..., P^{(n)}$

can be computed. $P^{(k)}_{ij}$ is defined as the predecessor of vertex $j$ on a shortest path from vertex $i$ with all intermediate vertices in the set $1, 2, ... , k$ . So, for each iteration of the main loop, a new predecessor matrix is created.

The recursive formula for this predecessor matrix is as follows:

If $i = j$ or $\text{weight}(i, j) = \infty, P^{0}_{ij} = 0.$
If $i \neq j$ and $\text{weight}(i, j) \lt \infty, P^{0}_{ij} = i.$

Contents