Schwartz-Zippel Lemma

The Schwartz-Zippel lemma is a result that helps detect when a polynomial is identically zero. It is useful in graph theory and theoretical computer science.

Motivation

The motivation for the lemma is the following problem:

Given a polynomial \( f(x_1,x_2,\ldots,x_n)\) with coefficients in a field \(F,\) is \( f \) the zero polynomial?

Depending on how \(f\) is defined, it is sometimes easier to explicitly plug in values for the variables than it is to expand the polynomial completely and check that at least one of the coefficients is nonzero. For example, if \( f\) is a sum of products of a large number of terms (e.g. the determinant of a matrix), expanding the product and evaluating the coefficients has exponential complexity in the degree of \( f.\)

If evaluating at a certain set of values gives a nonzero answer, the polynomial is not the zero polynomial; but if the evaluation at a randomly chosen set of values gives \( 0, \) the polynomial may or may not be zero. The Schwartz-Zippel lemma gives an upper bound on the probability that the polynomial is zero in the latter case, which makes it quite useful for applications where a probabilistic test is sufficient.

Statement of the Lemma

Let \( F \) be a field. Let \( f(x_1,x_2,\ldots,x_n) \) be a polynomial of total degree \( d\), and suppose that \( f \) is not the zero polynomial. Let \( S \) be a finite subset of \( F.\) Let \( r_1,r_2,\ldots,r_n\) be chosen at random uniformly and independently from \( S.\) Then the probability that \( f(r_1,r_2,\ldots,r_n) = 0 \) is \( \le \frac{d}{|S|}.\)

Remarks:

(1) If \( n=1,\) then the result follows from the fact that a polynomial of degree \( d\) over a field has at most \( d\) roots in the field. \((\)This follows from the division algorithm for \( F[x]\) and the existence of unique factorization into irreducibles in \( F[x];\) note that this fact is not true if \( F\) is replaced by, say, the integers mod \( 8.)\) Equality holds in the inequality of the lemma if and only if \(f\) has \( d\) roots and \( S \) contains all of them. The proof of the lemma, given in the last section, is an induction on \( n,\) so this argument for \( n=1\) establishes the base case.

(2) Repeating the process of choosing elements of \( S\), say \(k\) times, the probability that \( f\) evaluates to \( 0 \) every time is \( \le \left( \frac{d}{|S|}\right)^k.\) If \( d <|S| \) and \( k \) is large enough, this probability becomes vanishingly small. That is the point: if the process is repeated enough times--say, so that the probability that a nonzero polynomial will evaluate to zero every time becomes less than \( \frac1{10^6}\)--and the given polynomial evaluates to zero every time, then we are quite confident that the polynomial is identically zero. This is not a deterministic algorithm--it is of course possible that \( f\) is not identically zero and the process happened to make a one-in-a-million set of choices--but it is fast and practical.

(3) There are other, similar versions of the lemma. For instance, if each \( r_i \) is chosen from a subset \( S_i\) of \( F,\) and the degree of \( f \) as a polynomial in \( x_i \) is \( d_i,\) then another form of the lemma states that the probability that \( f(r_1,r_2,\ldots,r_n)=0 \) is at most \( \sum\limits_{i=1}^n \dfrac{d_i}{|S_i|}.\)

(4) If \( S\) is a finite field, there is the added wrinkle that some polynomials which are not the zero polynomial nevertheless evaluate to \( 0 \) everywhere. For instance, Fermat's little theorem implies that \( x^p-x\) is one such polynomial in \( {\mathbb F}_p[x].\) For polynomials like this, Schwartz-Zippel cannot be expected to give any meaningful information, and indeed this is because the degree \( d=p \) is \( \ge |S| \) for any subset \( S\) of \( {\mathbb F}_p.\) That is, Schwartz-Zippel is only useful when the degree d of the polynomial is smaller than the size of the field \(F.\)

Let \( F = {\mathbb F}_3,\) the integers mod \( 3.\) Let \( f(x,y) = x(x+y). \) Let \( S = F.\) Then Schwartz-Zippel says that the probability that \( (r,s) \) is a zero of \( f\) is \( \le \frac23.\) Of the nine ordered pairs \( (r,s) \) to consider, it turns out that five of them are zeroes: \( (0,0), (0,1), (0,2), (1,2), (2,1).\) Indeed, \( \frac59 \le \frac23.\) So Schwartz-Zippel gives a nontrivial upper bound on the number of zeroes of a nonzero polynomial; for a degree-2 polynomial over \( {\mathbb F}_3,\) there are at most \( 6.\)

Application to Graph Theory

Let \( G \) be a graph with an even number of vertices, say \( 2n.\) A perfect matching on \( G \) is a choice of \( n\) edges such that every vertex of \( G \) is an endpoint of exactly one of the edges.

The red edges are a perfect matching on this graph.

The natural computational problems about perfect matchings are as follows:

Given a graph \( G,\) how efficiently can we
(1) determine whether there is a perfect matching on \( G?\)
(2) find a perfect matching if one exists?

This section will describe a probabilistic algorithm that solves problem (1) using the Schwartz-Zippel lemma. The polynomial that governs the existence of a perfect matching turns out to be the determinant of a certain matrix defined below.

Let \( G \) be a graph with vertices \( v_1, v_2, \ldots, v_{2n}.\) Let \( A \) be the \( 2n \times 2n \) matrix with polynomial entries defined as follows: the \( ij \) entry is \[ A_{ij} = \begin{cases} x_{ij} &\text{if } v_i \text{ and } v_j \text{ share an edge and } i < j \\ -x_{ji} &\text{if } v_i \text{ and } v_j \text{ share an edge and } i > j \\ 0&\text{otherwise}. \end{cases} \] Then \( A \) is called the Tutte matrix of \( G.\)

Note that the determinant of the Tutte matrix is a polynomial in the variables \( x_{ij}.\)

Let \(G \) be a square, with vertices labeled \(v_1,v_2,v_3,v_4\) going around the square. Then the Tutte matrix of \( G \) is \[ \begin{pmatrix} 0&x_{12}&0&x_{14} \\ -x_{12}&0&x_{23}&0 \\ 0&-x_{23}&0&x_{34} \\ -x_{14}&0&-x_{34}&0 \end{pmatrix}. \]

(Tutte, 1947) There is a perfect matching on \( G \) if and only if the determinant of the Tutte matrix is not the zero polynomial.

Here is an outline of the proof. The details are left to the interested reader.

First, if there is a perfect matching on \( G,\) plug in \( x_{ij} = 1 \) if \(v_i\) and \( v_j\) are matched, and \( x_{ij} = 0\) otherwise. It is not hard to show that the determinant evaluates to \( \pm 1 \) for those values of the variables, so it cannot be the zero polynomial. On the other hand, the proof that a perfect matching exists if the determinant is nonzero proceeds by writing the determinant as a signed sum of products of the entries and pairing off terms that cancel, until the only terms left correspond to groupings of the vertices into paths of even length that start and end at the same point. Any such path admits a perfect matching. \(_\square\)

The determinant of the Tutte matrix of the square above is \[ x_{12}^2x_{34}^2 + x_{14}^2x_{23}^2 + 2x_{12}x_{23}x_{34}x_{14} \ne 0. \]

So this polynomial can be checked probabilistically to be nonzero by Schwartz-Zippel over \( {\mathbb F}_p\) \((\)one can check that if it is nonzero over the rationals, it is nonzero over any finite field, because it contains a monomial with coefficient \( 1).\) This solves problem (1).

For problem (2), the procedure in (1) can be converted into an inefficient algorithm: start with an edge, check whether the graph with that edge removed has a perfect matching (using Tutte matrices); if not, then that edge is part of any matching, so write it down and then delete those two vertices and all the associated edges and keep going; if so, then delete that edge and keep going.

This is obviously going to be very slow if the graph has a lot of edges. There is a version of this idea that is efficiently parallelized, but the details are beyond the scope of this wiki.

Proof of the Lemma

As discussed above, the proof is by induction on \( n,\) with the base case \( n=1\) having been established.

Now suppose the result is true for \( n-1\) variables. Write \( f\) as a polynomial in the first variable: \[ f(x_1,\ldots,x_n) = x_1^k c_k(x_2,\ldots,x_n) + x_1^{k-1} c_{k-1}(x_2,\ldots,x_n) + \cdots + c_0(x_2,\ldots,x_n), \] with \( c_k(x_2,\ldots,x_n)\) not the zero polynomial.

By the inductive hypothesis, the probability that \( c_k(r_2,\ldots,r_n) =0 \) for a randomly uniformly independently chosen set of \( r_i \in S \) is \( \le \frac{d-k}{|S|}.\) If it is not zero, the polynomial \( f(x_1,r_2,r_3,\ldots,r_n) \) will be nonzero, since the coefficient of \( x_1^k \) is nonzero. By the base case, the probability that plugging in \( r_1 \) will give \( 0 \) for a random uniform choice of \( r_1 \in S \) is \( \le \frac{k}{|S|}. \)

To summarize, if \( f(r_1,r_2,\ldots,r_n) =0,\) then either \( c_k(r_2,\ldots,r_n) = 0,\) or the nonzero degree-\( k\) single-variable polynomial \( f(x_1,r_2,r_3,\ldots,r_n) \) evaluates to \( 0 \) at \( r_1.\) The probability of the first case is \( \le \frac{d-k}{|S|} \) and the probability of the second case is \( \le \frac{k}{|S|} \cdot p,\) where \( p\) is the probability that \( c_k(r_2,\ldots,r_n)\) is nonzero. So the probability that \( f(r_1,r_2,\ldots,r_n)=0\) is \[ \le \frac{d-k}{|S|} + \frac{k}{|S|} \cdot p \le \frac{d-k}{|S|} + \frac{k}{|S|} = \frac{d}{|S|}, \] as desired. \(_\square\)

Contents