Combinatorial Optimization
Combinatorial optimization is a field at the intersection of combinatorics and theoretical computer science that aims to use combinatorial techniques to solve discrete optimization problems. A discrete optimization problem seeks to determine the best possible solution from a finite set of possibilities.
From a computer science perspective, combinatorial optimization seeks to improve an algorithm by using mathematical methods either to reduce the size of the set of possible solutions or to make the search itself faster. From a combinatorics perspective, it interprets complicated questions in terms of a fixed set of objects about which much is already known: sets, graphs, polytopes, and matroids. Combinatorial optimization refers primarily to the methods used to approach such problems and, for the most part, does not provide guidelines on how to turn real-world problems into abstract mathematical questions, or vice versa.
This subject originally grew out of graph theoretic concerns like edge colorings in undirected graphs and matchings in bipartite graphs. Much of the initial progress is due entirely to theorems in graph theory and their duals. With the advent of linear programming, these methods were applied to problems including assignment, maximal flow, and transportation. In the modern era, combinatorial optimization is useful for the study of algorithms, with special relevance to artificial intelligence, machine learning, and operations research.
The Marriage Problem, in its traditional formulation, seeks to find a set of \(n\) couples in a set of \(n\) men and \(n\) women such that each couple contains one man and one woman and no two people would prefer each other to their own partners. Each woman has an ordering of preferences for the men, and each man has an ordering of preferences for the women.
Given a set of preferences for the \(n\) men and \(n\) women, does there exist a solution? What is a good way to find such a solution?
It turns out a solution always exists: Gale and Shapley proved this by exhibiting an algorithm (deferred acceptance) that constructs a stable matching from any set of preferences. This general sort of problem is known as a matching problem, and it is one of a few central types of problems in combinatorial optimization.
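A stable matching can in fact be built greedily. The sketch below implements the Gale-Shapley deferred-acceptance procedure; the names and preference lists are hypothetical, chosen only for illustration.

```python
# A sketch of the Gale-Shapley deferred-acceptance algorithm.
# The preference lists below are hypothetical, for illustration only.

def stable_matching(men_prefs, women_prefs):
    """Match each man to a woman so that no blocking pair exists.

    men_prefs[m] lists m's choices, best first; likewise women_prefs.
    """
    # rank[w][m] = position of m in w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    free = list(men_prefs)                # men without a partner
    next_choice = {m: 0 for m in men_prefs}
    engaged = {}                          # woman -> man

    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]  # m proposes to his best untried woman
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m                # w accepts provisionally
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])       # w trades up; her old partner is free again
            engaged[w] = m
        else:
            free.append(m)                # w rejects m

    return {m: w for w, m in engaged.items()}

men = {"a": ["x", "y"], "b": ["y", "x"]}
women = {"x": ["b", "a"], "y": ["a", "b"]}
print(stable_matching(men, women))  # matches a with x and b with y
```

Each man proposes in decreasing order of preference, and each woman holds the best proposal seen so far; the resulting matching is stable because any man a woman rejected ended up with someone he prefers less only after she had someone she prefers more.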
Fundamentals
This subject encompasses the study of partially ordered sets, graphs, and matroids.
A partially ordered set, abbreviated to poset, is a nonempty set \(P\) together with a partial order \(\preceq\) on \(P\) such that
- For all \(x \in P\), \(x \preceq x\).
- For all \(x, y \in P\), \(x \preceq y\) and \(y \preceq x\) implies \(x = y\).
- For all \(x, y, z \in P\), \(x \preceq y\) and \(y \preceq z\) implies \(x \preceq z\).
In a poset \(P\), two distinct \(x, y \in P\) are comparable if \(x \preceq y\) or \(y \preceq x\). A chain of \(P\) is a subset of \(P\) in which any two elements are comparable. An antichain of \(P\) is a subset of \(P\) in which no two distinct elements are comparable.
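These definitions are easy to make concrete. The sketch below tests chains and antichains by brute force on a small poset (the divisors of 12 under divisibility, an arbitrary choice):

```python
from itertools import combinations

# The divisors of 12 ordered by divisibility -- a small poset for illustration.
P = [1, 2, 3, 4, 6, 12]
leq = lambda x, y: y % x == 0        # x precedes y iff x divides y

def is_chain(S):
    # every pair of elements is comparable
    return all(leq(x, y) or leq(y, x) for x, y in combinations(S, 2))

def is_antichain(S):
    # no pair of distinct elements is comparable
    return all(not leq(x, y) and not leq(y, x) for x, y in combinations(S, 2))

print(is_chain([1, 2, 4, 12]))   # True: each element divides the next
print(is_antichain([4, 6]))      # True: 4 and 6 do not divide each other
```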
A graph \(G = (V, \, E)\) is a set \(V\) (denoted vertices) together with a set \(E \subseteq \{ \{v_1, \, v_2\} \mid v_1, v_2 \in V, \, v_1 \ne v_2\}\) (denoted edges) of \(2\) element subsets of \(V\). A bipartite graph is a graph \(G = (V, \, E)\) such that there exist \(V_1\) and \(V_2\) satisfying \(V = V_1 \sqcup V_2\) and for all \(v_1, v_1' \in V_1\), \(v_2, v_2' \in V_2\), \(\{v_1, \, v_1'\} \not\in E\) and \(\{v_2, \, v_2'\} \not\in E\). Vertex \(v\) and edge \(e\) are said to be incident if \(v \in e\).
A vertex matching of graph \(G\) is a set of edges such that no two edges are incident to the same vertex. A vertex cover of \(G\) is a set of vertices such that every edge is incident to at least one element of the vertex cover.
A path is a finite sequence of distinct edges \((e_1, \, \dots, e_n)\) such that for each \(k \in \{1, \dots, \, n-1\}\), \(e_k \cap e_{k+1} \ne \emptyset\). The path is said to start at a vertex \(v \in e_1 \setminus (e_1 \cap e_2)\) and end at a vertex \(w \in e_n \setminus (e_n \cap e_{n-1})\).
A collection of paths is edge-disjoint if no pair of them contain the same edge. An edge-separating set of two vertices \(a\) and \(b\) (or more generally, sets of vertices \(A\) and \(B\)) is a set of edges such that any path from \(a\) to \(b\) (or, from a vertex of \(A\) to a vertex of \(B\)) contains at least one of those edges.
A matroid is a pair of sets \((E, \, \mathcal{F})\) with \(\mathcal{F} \subseteq \mathcal{P}(E)\) such that
- \(\mathcal{F}\) contains the empty set \(\emptyset\).
- If \(X \subseteq Y\) and \(Y \in \mathcal{F}\), then \(X \in \mathcal{F}\).
- If \(X, \, Y \in \mathcal{F}\) and \(|X| > |Y|\), then there is an \(x\) in \(X \setminus Y\) such that \(Y \cup \{x\} \in \mathcal{F}\).
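The three axioms can be verified directly for a small example. The sketch below checks them for the uniform matroid \(U_{2,4}\), whose independent sets are the subsets of a four-element set with at most two elements:

```python
from itertools import combinations

# The uniform matroid U_{2,4}: E has four elements, and the independent
# sets F are exactly the subsets of E with at most two elements.
E = {1, 2, 3, 4}
F = [frozenset(s) for k in range(3) for s in combinations(E, k)]

def subsets(S):
    return (frozenset(c) for k in range(len(S) + 1) for c in combinations(S, k))

# Axiom 1: F contains the empty set.
assert frozenset() in F
# Axiom 2: F is closed under taking subsets.
assert all(X in F for Y in F for X in subsets(Y))
# Axiom 3 (exchange): a larger independent set can extend a smaller one.
assert all(any(Y | {x} in F for x in X - Y)
           for X in F for Y in F if len(X) > len(Y))
print("all three axioms hold")
```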
Min-Max Relations
Many central results of combinatorial optimization pair a maximization problem with a minimization problem. Intuitively speaking, this interplay between minimality and maximality arises from the construction of the objects themselves. For instance, in graph theory, trees are the graphs that are, equivalently, maximally acyclic and minimally connected. Computational problems on the defining features of graphs may often be solved through the use of a greedy algorithm (and such an algorithm inherently exploits some maximal or minimal feature); for instance, in solving the minimum spanning tree problem, Kruskal's algorithm encodes the definition of a tree as "maximally acyclic," while Prim's algorithm takes advantage of the characterization as "minimally connected."
\(\textbf{Dilworth's Theorem.}\)
Given a poset \(P\), the maximal size of an antichain of \(P\) is equal to the minimal size of a partition of \(P\) into chains.
Proceed by induction on the size of the poset: the theorem clearly holds for a one-element poset.
Now suppose it holds for all posets of size at most \(n\), and let \(P\) be an \((n+1)\)-element poset with some maximal element \(x \in P\). Then \(P \setminus \{x\}\) (with the inherited partial order) has a partition of minimal size into \(k\) chains \(C_1, \dots, \, C_k\), and by the inductive hypothesis the maximal size of an antichain of \(P \setminus \{x\}\) is \(k\). Since no two elements of an antichain may lie in the same chain, every antichain of size \(k\) contains exactly one element from each \(C_i\). For each \(i\), let \(a_i\) be the maximal element of \(C_i\) that belongs to some antichain of size \(k\) (well-defined, since all elements of a chain are comparable), and set \(A = \{a_1, \dots, \, a_k\}\). Then \(A\) is itself an antichain: if \(a_i \preceq a_j\) with \(i \ne j\), then an antichain of size \(k\) containing \(a_j\) also contains some \(y \in C_i\) with \(y \preceq a_i\), whence \(y \preceq a_j\) by transitivity, a contradiction. Now either \(x\) is comparable to no element of \(A\), in which case \(\{x\}, \, C_1, \dots, \, C_k\) is a partition of \(P\) into \(k+1\) chains and \(\{x\} \cup A\) is an antichain of size \(k+1\). Or \(x \succeq a_j\) for some fixed \(j\); then \(K = \{x\} \cup \{z \in C_j \mid z \preceq a_j\}\) is a chain, and \(P \setminus K\) contains no antichain of size \(k\), since such an antichain would have to contain an element of \(C_j\) strictly above \(a_j\), contradicting the choice of \(a_j\). By the inductive hypothesis, \(P \setminus K\) can be partitioned into \(k-1\) chains, which together with \(K\) partition \(P\) into \(k\) chains, while \(A\) remains an antichain of size \(k\) in \(P\). So \(P\) satisfies Dilworth's Theorem, and by the principle of mathematical induction, the proof is complete.
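The theorem can also be confirmed by exhaustive search on a small poset. The sketch below (the four-element "N"-shaped poset is an arbitrary choice) compares the largest antichain with the smallest partition into chains:

```python
from itertools import combinations, product

# A four-element "N"-shaped poset: a <= b, c <= b, c <= d (plus reflexivity).
elems = "abcd"
below = {("a", "b"), ("c", "b"), ("c", "d")} | {(x, x) for x in elems}
comparable = lambda x, y: (x, y) in below or (y, x) in below

def is_chain(S):
    return all(comparable(x, y) for x, y in combinations(S, 2))

def is_antichain(S):
    return not any(comparable(x, y) for x, y in combinations(S, 2))

# largest antichain, by exhaustive search
max_antichain = max(k for k in range(1, len(elems) + 1)
                    for S in combinations(elems, k) if is_antichain(S))

# smallest partition into chains, by trying every assignment of elements to parts
def min_chain_partition():
    for k in range(1, len(elems) + 1):
        for labels in product(range(k), repeat=len(elems)):
            parts = [[e for e, l in zip(elems, labels) if l == i] for i in range(k)]
            if all(is_chain(p) for p in parts if p):
                return k

print(max_antichain, min_chain_partition())  # the two numbers agree
```

Here the antichain \(\{a, c\}\) has size \(2\), matched by the partition into the chains \(\{a, b\}\) and \(\{c, d\}\).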
\(\textbf{Mirsky's Theorem.}\)
Given a poset \(P\), the maximal size of a chain of \(P\) is equal to the minimal size of a partition of \(P\) into antichains.
Suppose \(C\) is a chain of maximal size in \(P\) and \(\{ A_1, \dots, \, A_n \}\) is a cover of \(P\) by antichains. Then, since \(|C \cap A_i| \le 1\) for all \(i\), it follows that \[|C| = |C \cap P| = |C \cap (A_1 \cup \dots \cup A_n)| \le n,\] so every cover by antichains has at least \(|C|\) parts. Conversely, let \(m = |C|\) and for each \(k \in \{1, \dots, \, m\}\) define \[A_k' := \{x \in P \mid \text{the chain of maximal size with maximal element } x \text{ has length } k \}.\] Each \(A_k'\) is indeed an antichain; for if distinct \(x, y \in A_k'\) were comparable (say, \(x \preceq y\)), then \(y\) would be the maximal element of a chain of length \(k+1\), given by \(y\) together with a chain of length \(k\) having maximal element \(x\) (a chain by transitivity), contradicting \(y \in A_k'\). Additionally, the sets \(\{A_k'\}_{k \in [m]}\) form a partition of \(P\): since the maximal length of a chain is \(m\), every element of \(P\) is the maximal element of a longest chain of length \(1, \, 2, \dots,\) or \(m\). Thus \(\{A_k'\}_{k \in [m]}\) is a cover of \(P\) by \(m\) antichains, and the minimal size of such a cover equals \(|C|\), as claimed.
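The construction of the antichains \(A_k'\) is effectively an algorithm: compute, for each element, the length of the longest chain of which it is the maximal element, and group elements by that length. A sketch on the divisors-of-12 poset (an arbitrary example):

```python
from collections import defaultdict

# Mirsky's construction on the divisors of 12 under divisibility:
# height(x) = length of the longest chain whose maximal element is x.
P = [1, 2, 3, 4, 6, 12]
strictly_below = lambda y, x: y != x and x % y == 0   # y strictly divides x

height = {}
for x in sorted(P):            # every strict divisor of x is numerically smaller
    height[x] = 1 + max((height[y] for y in P if strictly_below(y, x)), default=0)

# A'_k collects the elements of height k; these sets partition P into antichains.
antichains = defaultdict(list)
for x, h in height.items():
    antichains[h].append(x)

print(dict(antichains))        # {1: [1], 2: [2, 3], 3: [4, 6], 4: [12]}
print(max(height.values()))    # 4, the length of the longest chain 1, 2, 4, 12
```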
\(\textbf{König's Theorem.}\)
Given a bipartite graph \(G\), the maximal size of a vertex matching of \(G\) is equal to the minimal size of a vertex cover of \(G\).
Suppose \(G = (V, \, E)\) is a bipartite graph with bipartition \(V = V_1 \sqcup V_2\). Consider the poset \(P\) on \(V\) with partial order \(\preceq\) in which \(v \preceq v\) for every vertex \(v\) and, for \(v_1 \in V_1\) and \(v_2 \in V_2\), \(v_1 \preceq v_2\) iff \(\{v_1, \, v_2\} \in E\). In this poset, chains are single vertices or edges, so a chain partition using \(m\) edges has size \(|V| - m\); likewise, antichains are exactly the independent sets of \(G\), whose complements are the vertex covers. Applying Dilworth's Theorem to \(P\) then gives \(|V| - (\text{minimal vertex cover}) = |V| - (\text{maximal matching})\), which is precisely the claim.
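König's Theorem can be checked by brute force on a small bipartite graph (the graph below is an arbitrary example):

```python
from itertools import combinations

# A small bipartite graph: left part {a, b, c}, right part {x, y}.
edges = [("a", "x"), ("b", "x"), ("b", "y"), ("c", "y")]
vertices = sorted({v for e in edges for v in e})

def is_matching(M):
    ends = [v for e in M for v in e]
    return len(ends) == len(set(ends))   # no vertex used twice

def is_cover(C):
    return all(u in C or v in C for u, v in edges)

max_matching = max(len(M) for k in range(len(edges) + 1)
                   for M in combinations(edges, k) if is_matching(M))
min_cover = min(len(C) for k in range(len(vertices) + 1)
                for C in combinations(vertices, k) if is_cover(C))

print(max_matching, min_cover)  # the two numbers agree
```

Here the matching \(\{a, x\}, \{b, y\}\) has size \(2\), matched by the cover \(\{x, y\}\).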
\(\textbf{Hall's Theorem.}\)
Given a set \(X\) and a collection \(\mathcal{S} = \{S_1, \dots, \, S_n\} \subseteq \mathcal{P}(X)\), there exists a set of \(n\) distinct elements \(\{x_1, \dots, \, x_n\} \subseteq X\) such that \(x_i \in S_i\) for each \(i \in \{1, \dots, \, n\}\) iff for every \(\mathcal{T} \subseteq \mathcal{S}\), the inequality \(|\mathcal{T}| \le |\bigcup \mathcal{T}|\) holds.
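Both sides of this equivalence are finitely checkable. The sketch below tests the condition and searches for a system of distinct representatives on two hand-picked families of sets:

```python
from itertools import combinations, product

# Two hand-picked families of subsets of X = {1, 2, 3}.
S_good = [{1, 2}, {2, 3}, {1, 3}]
S_bad = [{1}, {1}, {1, 2}]       # the first two sets pool only one element

def hall_condition(sets):
    idx = range(len(sets))
    return all(len(set().union(*(sets[i] for i in T))) >= len(T)
               for k in range(1, len(sets) + 1)
               for T in combinations(idx, k))

def has_transversal(sets):
    # brute force: try every way of picking one element from each set
    return any(len(set(choice)) == len(sets) for choice in product(*sets))

print(hall_condition(S_good), has_transversal(S_good))  # True True
print(hall_condition(S_bad), has_transversal(S_bad))    # False False
```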
\(\textbf{Menger's Theorem.}\)
Given a graph \(G\) and vertices \(v\) and \(w\), the maximal size of a set of edge-disjoint paths from \(v\) to \(w\) is equal to the minimal size of an edge-separating set of \(v\) and \(w\).
Linear Programming
In its mathematical formulation, a linear program takes as input an \(m \times n\) matrix \(A\) and column vectors \(b\) of length \(m\) and \(c\) of length \(n\). Its goal is the following:
Find a column vector \(x\) of length \(n\) satisfying \(Ax \le b\) such that \(c^T x\) is maximal. Otherwise, decide either that no \(x\) satisfies the constraint, or that \(c^T x\) can be made arbitrarily large, so no maximum exists.
If such an \(x\) exists, it is known as an optimal solution.
The problem is known as infeasible if there is no \(x\) satisfying \(A x \le b\), and it is known as unbounded if for every real number \(r\), there is a column vector \(x\) satisfying \(Ax \le b\) such that \(c^T x > r\).
Any linear program that is not infeasible and not unbounded must have an optimal solution.
In essence, this follows from the fact that a linear function bounded above on a nonempty polyhedron attains its maximum on a face of that polyhedron.
Linear programs are often discussed in terms of polyhedra and, when the polyhedron is bounded, polytopes. Graphically, the points \(x\) satisfying \(A x \le b\) form a polyhedron, and its dimension is \[n - \text{rank}(A_{=}),\] where \(A_{=}\) is the submatrix of rows \(a_i^T\) of \(A\) for which \(a_i^T x = b_i\) holds at every point \(x\) of the polyhedron. By moving along the faces of the polyhedron, it is possible to find an optimal solution. This gives rise to the simplex method, which is effective in practice but has very poor (exponential) performance in the worst case.
All linear programs can be reduced to the "standard form": maximize \(c^T x\) subject to \(A x = b\) and \(x \ge 0\).
For instance, suppose a linear program has the following constraints on nonnegative variables \(x_1\), \(x_2\), and \(x_3\):
\[\begin{align*} x_1 + x_2 &\le 10 \\ 2x_1 + x_2 + x_3 &\le 21. \end{align*}\]
Then, the "slack variables" \(y_1\) and \(y_2\) (each \(\ge 0\)) should be introduced such that
\[\begin{align*} x_1 + x_2 + y_1 &= 10 \\ 2x_1 + x_2 + x_3 + y_2 &= 21. \end{align*}\]
It follows that whatever the linear program seeks to maximize can be maximized with the introduction of the \(y_i\)'s as well.
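The slack-variable construction is mechanical enough to code directly. The sketch below reuses the example constraints (the feasible point is an arbitrary choice), appends an identity block to \(A\), and verifies that the equalities hold:

```python
# Slack variables for the example above: A x <= b with x >= 0 becomes
# [A | I] (x, y) = b with (x, y) >= 0.
A = [[1, 1, 0],
     [2, 1, 1]]
b = [10, 21]

def with_slack_columns(A):
    m = len(A)
    # append an m x m identity block: one slack variable per inequality
    return [row + [1 if i == j else 0 for j in range(m)]
            for i, row in enumerate(A)]

A_std = with_slack_columns(A)

# Any feasible point of the original program, padded with its slack values,
# satisfies the equalities exactly. The point below is an arbitrary choice.
x = [3, 4, 2]
y = [bi - sum(a * xi for a, xi in zip(row, x)) for row, bi in zip(A, b)]
point = x + y
print(y)  # [3, 9]: both slacks are nonnegative, so x is feasible
print([sum(a * p for a, p in zip(row, point)) for row in A_std])  # [10, 21]
```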
Algorithms
Prim's algorithm provides a method for solving one of the simplest problems of combinatorial optimization: finding a minimum spanning tree of a (weighted) graph. It takes advantage of the fact that trees are minimally connected graphs and that the edges of a graph carry a matroid structure (and therefore are susceptible to certain implementations of the greedy algorithm). It works as follows:
- Select a vertex on the graph.
- Of the edges connecting the selected vertices to the rest of the graph, select the one with minimal weight (and the associated vertex).
- Repeat step 2 until all vertices are in the tree.
The algorithm iteratively takes a local feature (the minimal-weight edge leaving the current tree) and turns it into a global feature (a minimum spanning tree of the graph); this is a greedy algorithm. With a proper implementation (using a Fibonacci heap), the algorithm has complexity \(O(E + V \log V)\).
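The steps above can be sketched with a binary heap (which gives \(O(E \log V)\) rather than the Fibonacci-heap bound \(O(E + V \log V)\)); the graph below is an arbitrary example:

```python
import heapq

def prim(graph, start):
    """Minimum spanning tree of a connected weighted graph.

    graph maps each vertex to a list of (weight, neighbor) pairs.
    Returns the tree edges as (weight, u, v) triples.
    """
    visited = {start}
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)   # lightest edge leaving the tree
        if v in visited:
            continue                        # stale heap entry; skip it
        visited.add(v)
        tree.append((w, u, v))
        for w2, v2 in graph[v]:
            if v2 not in visited:
                heapq.heappush(frontier, (w2, v, v2))
    return tree

# an arbitrary example graph (adjacency lists with weights)
g = {
    "a": [(1, "b"), (4, "c")],
    "b": [(1, "a"), (2, "c"), (7, "d")],
    "c": [(4, "a"), (2, "b"), (3, "d")],
    "d": [(7, "b"), (3, "c")],
}
tree = prim(g, "a")
print(sum(w for w, _, _ in tree))  # 6, the total weight of the tree
```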
Kruskal's algorithm tackles the same problem and takes advantage of the fact that trees are maximally acyclic graphs. It works as follows:
- Order the edges from minimal weight to maximal weight, and start with an empty set of edges.
- Add the edge of minimal weight that does not make the set contain any cycles.
- Repeat step 2 until the set determines a spanning tree.
This algorithm works because spanning trees are the bases of the graph's matroid, and on a matroid the greedy algorithm always produces a basis of minimum total weight; it follows that the spanning tree produced is minimal. The algorithm has complexity \(O(E \log V)\).
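The steps above can be sketched with a simple union-find structure, which detects cycles by tracking connected components (the edge list below is an arbitrary example):

```python
def kruskal(n, edges):
    """Minimum spanning tree via Kruskal's algorithm.

    n: number of vertices, labeled 0 .. n-1
    edges: list of (weight, u, v) triples
    """
    parent = list(range(n))                # union-find forest

    def find(x):
        while parent[x] != x:              # path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):          # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                       # joining two components adds no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# an arbitrary example on four vertices
edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2), (3, 2, 3), (7, 1, 3)]
tree = kruskal(4, edges)
print(sum(w for w, _, _ in tree))  # 6, the total weight of the tree
```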