The code accompanying this post is available here.
The Schwartz-Zippel lemma is an important result in real algebraic geometry. It bounds the probability that a polynomial evaluates to zero on a random point. It provides a probabilistic method to solve the problem of deciding whether a polynomial is identically zero, or, equivalently, to determine if two polynomials are identical.
Why would we want an algorithm to do this? If you've looked at polynomials in school, it's likely pretty obvious that two polynomials are identical -- just look at them! But, polynomials do not always come in a nice form like that: they may be described as an arithmetic circuit, or they may be factored and expanding may be too expensive. This problem, called polynomial identity testing, or PIT, is a fundamental problem in computer science, and a bunch of interesting questions and applications arise from it (see Wikipedia for some).
Let \(P(x_1, \ldots, x_n)\) be a polynomial in \(n\) variables with coefficients in an integral domain \(F\). Let \(S \subseteq F\) be a finite set. Choose coordinates \(x_1, \ldots, x_n\) from \(S\) independently and uniformly at random. The Schwartz-Zippel lemma states that if \(P\) is not identically zero, then the probability that \(P(x_1, \ldots, x_n) = 0\) on a random point \((x_1, \ldots, x_n) \in S\) is at most \(\frac{d}{|S|}\), where \(d\) is the total degree of \(P\) (the maximum degree of any monomial) and \(|S|\) is the size of the set \(S\).
As an example, suppose we fix the integral domain \(F\) to be \(\mathbb{Z}\), and we consider the polynomial \(P(x) = x^2 - 1\). The total degree of \(P\) is 2, and let the set \(S\) be \(\{0, 1, 2, 3\}\). Then, the probability that \(P(x) = 0\) on a random point \(x \in S\) is at most \(\frac{2}{4} = \frac{1}{2}\). Importantly, this upper bound depends only on \(|S|\), not on which elements make up the set \(S\).
A major point of confusion I had when learning this lemma is the direction of the implication here. The lemma tells you: given a non-zero polynomial, what is the probability that it evaluates to zero on a random point? It does not tell you: given a polynomial that evaluates to zero on a random point, what is the probability that it is identically zero? This is a subtle point, especially because I previously said that the lemma provides a probabilistic test for checking if a polynomial is identically zero. So, how can we apply the lemma to test if a polynomial is identically zero if it assumes the polynomial is non-zero?
The algorithm is simple. Sample the coordinates \(x_1, \ldots, x_n\) from \(S\) independently and uniformly at random. If \(P(x_1, \ldots, x_n) \neq 0\), then return "NONZERO". If \(P(x_1, \ldots, x_n) = 0\), then return "ZERO".
Observe that if \(P\) is non-zero on any point, then it is certainly not identically zero, so the algorithm returns "NONZERO". In the second case, if \(P\) evaluates to zero on a random point, then it is identically zero, so the algorithm returns "ZERO". But -- and this is where the lemma comes in -- this step may be wrong! If \(P\) is not zero, then, with the probability bound given by the lemma, it will evaluate to zero on that random point, and so the algorithm will erroneously return "ZERO". Hence, we can say that the algorithm returns "ZERO" with an error bound of \(\frac{d}{|S|}\). This also allows us to bound the probability that the algorithm is incorrect overall, by conditioning on the event that \(P\) is non-zero. Namely: \[ \begin{aligned} &\mathbb{P}[\text{wrong}] \\ &= \mathbb{P}[\text{wrong} \mid P \text{ is non-zero}] \, \mathbb{P}[P \text{ is non-zero}] + \mathbb{P}[\text{wrong} \mid P \text{ is zero}] \, \mathbb{P}[P \text{ is zero}] \\ &\leq \frac{d}{|S|} \, \mathbb{P}[P \text{ is non-zero}] + 0 \, \mathbb{P}[P \text{ is zero}] \\ &\leq \frac{d}{|S|} \end{aligned} \] Above, we used the fact that if \(P\) is zero, then the algorithm will always return "ZERO" without error.
This bound is quite good, as, for example, we can choose \(S\) to be large enough to make the probability of error arbitrarily small. Another way is to run the algorithm multiple times. Indeed, if we run the algorithm \(k\) times, each time independently sampling a new random point, then the probability of error is at most \((\frac{d}{|S|})^k\).
You can see my implementation of the algorithm here. It's only a few lines of code. A key fact to note about the implementation is that I don't convert the input to a SymPy Poly object, because SymPy tries to simplify the polynomial, and may just simplify it to zero, which doesn't help with testing the algorithm. So, instead, I keep the polynomial as a SymPy Expr object, and I walk the expression tree to compute a bound on the total degree of the polynomial. Or, you can pass your own total degree bound if you have some additional information.
Many problems can be formulated as polynomial identity testing. An interesting such example is the problem of determining whether a graph has a perfect matching. For a graph \(G = (V, E)\), we can construct a matrix called the Tutte matrix of \(G\), which is a square matrix \(T\) of size \(|V| \times |V|\) with entries given by: \[ T_{i,j} = \begin{cases} x_{i,j} & \text{if } (i,j) \in E \text{ and } i < j \\ -x_{j,i} & \text{if } (i,j) \in E \text{ and } i > j \\ 0 & \text{otherwise} \end{cases} \] where \(x_{i,j}\) are variables. The determinant of this matrix is a polynomial in the variables \(x_{i,j}\), of degree \(|V|\). A theorem of Tutte states that \(G\) has a perfect matching if and only if the determinant of the Tutte matrix is not identically zero. We can do this check with the algorithm we described above.
Now, if implementing this test, we could construct the Tutte matrix and the determinant symbolically, e.g., using a SymPy Matrix object. However, based on the size of the matrix, the determinant expression might be quite large. Instead, we can create a \(|V| \times |V|\) matrix with random values drawn uniformly and independently from \(S\), representing the variables \(x_{i,j}\). Then, we construct the Tutte matrix using these values, and compute the determinant numerically. In my implementation, this is what I do.
Remarkably, you can use polynomial identity testing to test whether a number is prime. Namely, \(n\) is prime if and only if the polynomial \((1+z)^n - 1 - z^n = 0 \mod n\). Note that the variable of this polynomial is only \(z\), but the degree of the polynomial is \(n\), i.e., the number whose primality we want to check. In order to check if this zero, we could expand the polynomial and check if it is \(0 \mod n\), but this is clearly infeasible for even moderately large \(n\). So, we might want to use the Schwartz-Zippel lemma and work over the field \(\mathbb{Z}_n\). However, we don't even know if \(n\) is prime, so \(\mathbb{Z}_n\) may not even be a field. So, we have a catch-22: we need to know if \(n\) is prime to use the Schwartz-Zippel lemma, but we were trying to use the Schwartz-Zippel lemma to check if \(n\) is prime! The paper of Agrawal and Biswas (2003) shows how to get around this issue, by testing the divisibility of \((1+z)^n - 1 - z^n\) by a random polynomial. I plan to write a post about this soon.