Dilworth's Theorem
Dilworth's Theorem is a result about the width of partially ordered sets. It is equivalent to (and hence can be used to prove) several beautiful theorems in combinatorics, including Hall's marriage theorem. One well-known corollary of Dilworth's theorem is a result of Erdős and Szekeres on sequences of real numbers: every sequence of rs+1 real numbers has an increasing subsequence of length r+1 or a decreasing subsequence of length s+1.
Contents
Definitions
A partially ordered set is a set \( S \) with a relation \( \le \) on \( S \) satisfying:
(1) \( a \le a \) for all \( a \in S\) (reflexivity)
(2) if \( a\le b \) and \( b\le a \), then \( a=b \) (antisymmetry)
(3) if \( a \le b \) and \( b \le c \), then \( a \le c \) (transitivity)
Note that for two given elements \( a \) and \( b\), it may not be the case that \( a \) and \( b \) are comparable, that is, \( a \le b \) or \( b \le a \). If this is true for all pairs \( a \) and \( b \) in \( S \), we say that \( S \) is totally ordered.
The set \(S\) of people in the world, with \( \le \) defined by "is a direct descendant of," is a partially ordered set. The set \( T \) of positive integers, with \( \le \) defined by "divides," is a partially ordered set.
Note that neither of these sets are totally ordered. A brother and sister are not comparable in \( S\), since they are not direct descendants of each other; and \( 2 \) and \( 3 \) are not comparable in \( T \), since neither divides the other.
There are two natural ways to define the width of a finite partially ordered set, which can be thought of as measuring how far away from totally ordered it is; the width is defined to be a certain positive integer, which is \( 1\) if and only if the set is totally ordered.
A chain in a partially ordered set is a subset of elements which are all comparable to each other. An antichain is a subset of elements, no two of which are comparable to each other.
(Definition 1) The width of a finite partially ordered set \( S\) is the minimum number of chains needed to cover \( S \), i.e. the minimum number of chains such that any element of \( S \) is in at least one of the chains.
(Definition 2) The width of a finite partially ordered set \( S \) is the maximum size of an antichain in \( S\).
As mentioned above, both of these definitions indicate in some sense how far away \( S\) is from being totally ordered. Note that a chain is a totally ordered subset of \( S \), so both definitions give width \( 1 \) for a totally ordered set.
Let \( S \) be the set of divisors of \( 30 \), with divisibility as the partial order. Then the following chains cover \( S \): \[ \{1,2,6,30\}, \{3,15\},\{5,10\}. \] And \( \{2,3,5\} \) is an antichain of length \( 3\). It is not immediately obvious, but the chain cover is minimal (though not unique), and the antichain is maximal (though not unique). So both definitions of width give \( 3\) for this partially ordered set.
Hasse diagrams
Partially ordered sets are often visualized via Hasse diagrams, which are arrangements of the elements of the set with edges connecting any two comparable elements that have no elements in between them, that is: \( x \) and \( y \) are connected if \( x \le y \) and there is no element \( z \ne x,y \) such that \( x\le z \le y \). The diagram puts \( y \) above \(x\) in this case.
Here are two typical examples.
Note that chains are represented in the diagram as series of points linked pairwise by edges. The widths of these sets are \( 3 \) and \( 5 \), respectively, using either definition, which corresponds to the "width" of the diagram. Note also that these particular diagrams are drawn so that reading horizontally on any given level produces an antichain. The diagram also helps clarify the fact we cited earlier, that the size of an antichain is at least as large as the number of chains required to cover the set.
Statement of Dilworth's Theorem and Mirsky's Theorem
So Dilworth's Theorem says that the two definitions of width given above coincide with each other.Dilworth's Theorem: Let \( S \) be a finite partially ordered set. The size of a maximal antichain equals the size of a minimal chain cover of \( S \).
As is often the case, when proving that two quantities are equal, the proof shows that they are less than or equal to each other. One of these inequalities is much easier than the other: Suppose the maximal antichain has length \( d \); then any chain cover has to have at least \( d \) chains, because no two members of the antichain can be in the same chain, but they each have to be in one of the chains because the chains cover.
The other half of the proof of the theorem is given at the end of this wiki. The following "dual" theorem is easier to prove:
Mirsky's Theorem: Let \( S \) be a finite partially ordered set. The size of a maximal chain in \( S\) equals the size of a minimal antichain cover of \( S \).
Proof of Mirsky's Theorem: Again, one half is easy. Suppose the maximal chain has length \( d \); then any antichain cover has at least \( d \) antichains, since no two elements of the chain can be in the same antichain.
Now define a function \( f \colon S \to {\mathbb N} \) by \( f(s) = \) the size of a maximal chain whose largest element is \( s \). Note that if \( f(s) = f(t) \), then \( s \) and \( t \) cannot be comparable (why?). So, for any positive integer \(n \), the set \( f^{-1}(n) \) of all the elements \( s\) such that \( f(s) = n \) is an antichain. If the maximal chain has length \( d \), then the sets \( f^{-1}(1), f^{-1}(2), \ldots, f^{-1}(d) \) form an antichain cover of \( S \). It is not immediately obvious that these sets are all nonempty (though they are), but this shows that there is an antichain cover of size at most \( d \), which is the second half of the proof of the theorem. \( \square\)
One way to view the sets \( f^{-1}(k)\) constructed in the proof are as the horizontal "strips" of the Hasse diagram, if it is drawn in such a way that a point is put on level \( k\) if the longest chain ending in that point has length \( k \). It is straightforward to check that the two Hasse diagrams in the above examples are constructed in that way.
Let \( X \) be a set of subsets of \( \{1,2,3,4\} \) such that no element of \( X \) is completely contained in any other element of \( X \): that is, for any two distinct subsets \( A,B \in X\), \( A \nsubseteq B \) and \( B \nsubseteq A \).
What is the maximum possible value of \( |X| \)?
(Bonus: generalize to \( \{1,2,\ldots,n\} \).)
Applications
Erdős-Szekeres Theorem: A sequence of at least \( rs+1 \) real numbers has an increasing subsequence of length \( r+1 \) or a decreasing subsequence of length \( s+1 \).
For instance: any sequence of \( 7 \) real numbers has an increasing subsequence of length \( 4 \) or a decreasing subsequence of length \( 3 \). Or: any sequence of \( 101 \) real numbers has a monotone subsequence of length \( 11 \).
Proof: Suppose we are given a sequence of \( rs+1 \) real numbers. Define an ordering \( \preceq \) on this sequence by \( a_i \preceq a_j \) if \(i \le j \) and \( a_i \le a_j \). A chain with respect to this ordering is just an increasing subsequence, and an antichain is a decreasing subsequence. Suppose there is no antichain of length \( s+1 \). Then the width of the set is at most \( s \), so by Dilworth's Theorem the set can be covered by \( \le s \) chains. If all of these chains have length \( \le r \), then the set has at most \( rs \) elements, which is impossible. So at least one of the chains has length \( \ge r+1 \), so there is an increasing subsequence of length \( r+1 \). \( \square\)
This is a special case of a general fact about partially ordered sets, which follows from Dilworth's Theorem in exactly the same way:
A partially ordered set of size \( \ge rs+1 \) has a chain of length \( r+1 \) or an antichain of length \( s+1 \).
Let \( I_1, I_2, \ldots, I_{10} \) be intervals on the real number line. Suppose that no four of the intervals are disjoint. Show that there must be four intervals that share a common point.
Solution: Define a partial order by \( I \le I' \) if and only if \( I \) and \( I' \) are disjoint and \( I \) is to the left of \( I' \). The assumption is that there is no chain of length \( 3+1 \), so there must be an antichain of length \( 3 +1 \), which is a subset of four intervals, no two of which are disjoint. Hence they share a common point. (Why?)
Proof of Dilworth's Theorem
The easiest proof is by induction on the size of the set. Let \( d \) be the size of the largest antichain of \( S \). The proof will show that \(S \) can be covered by \( d\) chains. The base case is trivial. So suppose the result has been proven for all sets smaller than \( S \).
First, if no two elements of \( S\) are comparable, then \( S \) itself is an antichain and it can be covered by \( d=|S|\) chains each of length \( 1 \), so the result holds. Otherwise, consider the set of elements of \(S\) which are comparable to at least one other element of \(S.\) Let \(m\) be a minimal element of this set (\(m \le z\) for all comparable \(z\)), and choose \( M \) to be a maximal element of the nonempty subset \(\{x \in S | x \ne m, m \le x\}.\) Then \(M\) is clearly a maximal element (\(M \ge z\) for all comparable \(z\)). Let \( T = S - \{m,M\} \). If the largest antichain in \( T \) has size \( \le d-1 \), then \( T \) can be covered by \( d-1 \) chains, and so \( S \) can be covered by those plus the chain \( \{m,M\}\), and the result will be proven for \( S \).
So now suppose that the largest antichain in \( T \) has size \( d \) (it can't be larger because \( T \) is a subset of \( S \)). Call this antichain \( A\).
The idea of the rest of the proof is: picture the Hasse diagram for \( S \) where the largest antichain consists of a horizontal strip. Take everything below the strip and everything above the strip, use induction to cover these by chains, and then link the chains together by connecting them across the strip.
That is, construct the two sets \[ \begin{align} S^+ &= \{ x\in S\colon x \ge a \text{ for some } a \in A\} \\ S^- &= \{ x\in S\colon x \le a \text{ for some } a \in A\} \\ \end{align} \] Then \( S^+ \cup S^- \) must be all of \(S\), because if it weren't then \( A \) would not be a maximal antichain in \( S \). And \( S^+ \cap S^- = A \), because if \( x \) is in the intersection, then \(a \le x \le b \) for some elements \( a,b \in A \), so \( a \) and \( b\) are comparable by transitivity, so the only possibility is that \( a=b \) and they both equal \( x \).
Since \(m\) and \(M\) are not in \( A\), it must be the case that \( m \notin S^+ \) and \( M \notin S^- \), so both sets \( S^+ \) and \( S^- \) are strictly smaller than \( S \). The inductive hypothesis applies to both \( S^-\) and \( S^+ \), so they are both covered by \( d \) chains, each of which must contain exactly one element of \( A \). Call them \( C_a^-\) and \( C_a^+\). Now we can stitch together these covers to get a cover of all of \( S \), by the chains \( C_a^- \cup \{a\} \cup C_a^+\). This cover has \(d \) chains, so the result follows by induction. \(\square\)