Computer verification of Wiles' proof of Fermat's Last Theorem

Er is ook een Nederlandstalige versie beschikbaar.

On 5 July 2005, Prof. Dr. Jan Bergstra proposed a list of ten challenging research problems for computer science in [NOAG-ict] p. 52. The first problem of the list reads as follows:

"Formalize and verify by computer a proof of Fermat's Last Theorem, as proved by A. Wiles in 1995. This problem is in the spirit of a long and strong Dutch tradition."

This is indeed a challenge. I expect the problem to be solved in around fifty years, with a very wide margin. On this page, I try to explain some aspects of the problem, and to give relevant links to the literature and to other web pages.


What is Fermat's Last Theorem?

Around 1640, Pierre de Fermat wrote in the margin of his Latin translation of Diophantos' Arithmetica a remark that we can render in present terms as follows:

Theorem. For integer variables a, b, c, different from 0, and n > 2, the equation an + bn = cn has no solution.

As an aside, Diophantos' discussion was about the case n=2. Then there are infinitely many solutions, e.g. 42 + 32 = 52 and 122 + 52 = 132.

After his death, Fermat's marginal notes were published by his son. It is likely that Fermat's remark of having a beautiful proof was based on a mistake, cf. [Struik]. According to Stevens [CSS], Fermat did have a proof for the case n = 4. The case n = 3 was proved by Leonhard Euler around 1760. In 1825, the case n=5 was proved by Legendre and Dirichlet. In 1847, Kummer proved all cases where n is a so-called regular prime (the smallest nonregular prime is 37). See [Edwards] for the proofs for n = 3 and 4 , and for Kummer's theory. In 1976, Fermat's Last Theorem was valid for all cases with n less than a million [Wagstaff].

In 1993, Andrew Wiles announced a proof a Fermat's Last Theorem. It turned out that the proof still had a gap, but Wiles repaired this gap together with Taylor in 1995. This work appeared in [Wiles, TaWi]. Currently, the main source is the book [CSS]. Much more readable is [vdP]. Indeed, for a mathematician, this book almost reads as well as Dan Brown's Da Vinci Code.


Back to the aspects.

Why bother?

In some sense, Fermat's Last Theorem is not much more than a curiosity. Yet it played an important role in the history of mathematics as a touchstone for mathematical theories. It is a part of number theory, a branch of mathematics that before 1980 was regarded as useless for society but good for mathematician to learn the trade.

Nowadays, number theory is highly relevant because of its applications in cryptography, the science of secure transmission of vulnerable and valuable data, e.g. on the internet.

Wiles' proof is more important than Fermat's Last Theorem itself. This is because Wiles proves the theorem by proving a weak version of the so-called Modularity Conjecture of Shimura, Taniyama, and Weil. This conjecture, developed between 1957 and 1967, asserts that every elliptic curve with rational coefficients is a modular curve. Such a result is important since it yields a new connection between different branches of mathematics. As yet, I do not know, however, of cryptological applications of the Modularity Conjecture.
Back to the aspects.

How does Wiles prove Fermat's Last Theorem?

I am following the sketch of Stevens (or Serre) in [CSS]. The proof consists of an elementary part, theorems 1 up to 4 below, and a part that requires much deeper mathematics. I give the proofs of theorems 1 and 4. See e.g. [Edwards] for the proofs of theorems 2 and 3. In theorems 5, 6, and 7, many concepts occur that I cannot explain. These are set in italics. First, two definitions, probably known, followed by a convention that serves to shorten the sentences.

A prime number is an integer number > 1 that has no other divisors than 1 and itself.
Two integer numbers are called coprime if they have greatest common divisor 1 (i.e. they have no divisors > 1).

A triple of nonzero integers (a,b,c) is called a Fermat triple for exponent n if an + bn = cn and n > 2. We want to prove that Fermat triples do not exist. This is done by assuming the existence of a Fermat triple for exponent n, and deriving a contradiction from this assumption.

Theorem 1. If there is a Fermat triple for exponent n, there is a Fermat triple (a,b,c) for exponent n with a and b coprime.

Proof. Let (x,y,z) be some Fermat triple for exponent n. Let d be the greatest common divisor of x and y. Since xn + yn = zn, the number dn divides zn; therefore d divides z. Put a = x/d, b = y/d, c = z/d. Then (a,b,c) is a Fermat triple for exponent n with a and b coprime.

Theorem 2. There is no Fermat triple for exponent 4.

Theorem 3. There is no Fermat triple for exponent 3.

Theorem 4. If there is a Fermat triple for an exponent > 4, then there is a Fermat triple for an exponent which is a prime number > 4.

Proof. Let (x, y, z) be some Fermat triple for an exponent n > 4. First assume that n is not a power of 2. Since n > 2, it has a divisor p which is a prime number > 2. Write n = qp. Then we have (xq)p + (yq)p = (zq)p. Therefore (xq,yq,zq) is a Fermat triple for exponent p. Theorem 3 implies that p > 3. Since p is prime, it is > 4. Next assume that n is a power of 2. Since n > 2, we can write n = 4q. In the same way as above, we get that (xq,yq,zq) is a Fermat triple for exponent 4, contradicting Theorem 2.

Theorem 5 (Gerhart Frey 1986, Jean Pierre Serre, 1987) Let (a,b,c) be a Fermat triple for an exponent which is prime number p > 4. Assume that a and b are coprime, that b is even, and that a-3 is a multiple of 4. Then the projective curve given by the equation y2 = x(x-ap)(x+bp) is elliptic and semistable and remarkable.

The word remarkable here is an abbreviation of properties of the twodimensional representation of the Galois group of the field of the algebraic numbers that is associated to the curve. See [CSS] for the details.

Theorem 6 (Ribet, 1990) A remarkable elliptic curve cannot be modular.

Theorem 7 (Wiles and Taylor, 1995) Every semistable elliptic curve is modular.

Remark. Since 1967, the insiders suspect that every elliptic curve is modular. This is the so-called Taniyama-Shimura-Weil conjecture. Wiles (and Taylor) thus proved a special case of this conjecture. The full conjecture has finally been proven in [BCDT].

Summary. In view of theorems 1, 2, 3, 4, we may assume that (a,b,c) is a Fermat triple for an exponent which is a prime number p > 4, and that a and b are coprime. It follows that a and b are not both even. If a and b are both odd, c is even. It follows that precisely one of the triple is even. By permuting the three numbers and optionally giving some an opposite sign, we can arrange that b is even while a and c are odd. Then a-1 or a-3 is a multiple of 4. If a-1 is a multiple of 4, we can arrange that a-3 is a multiple of 4 by giving all three opposite signs. Theorem 5 then gives us a remarkable, semistable, elliptic curve. This curve is not modular because of Theorem 6, but it is modular because of Theorem 7. This is a contradiction. Therefore Fermat triples do not exist. This proves Fermat's Last Theorem.
Back to the aspects.

A proof is a proof. What more do we need?

Every mathematician occasionally makes a mistake in a proof. Many mathematicians occasionally find mistakes in published proofs of other people. Hence a continuous effort to clarify and simplify all proofs of important theorems. On the other hand, the late Van Lint from Eindhoven University once said that one does not verify a proof, since a good mathematician recognizes an erroneous theorem by smelling (a poor translation for the Dutch expression).

So the question is whether we can safely believe Wiles' proof, e.g. see [vdP] Lecture V. On a conference in Boston in 1995, the insiders in the field scrutinized Wiles' proof. They concluded that it was correct, but when I try to read the proceedings [CSS] of this meeting, I continuously stumble over steps that I cannot follow. It is known that even the insiders can be wrong. So there is room for a more objective validation of this proof.
Back to the aspects.

How to verify a proof by computer?

In the Netherlands, computer verification of mathematical proofs started in 1967 in the Automath project of N.G. de Bruijn, cf. [NGdV]. Landau's book "Grundlagen der Analyse" was verified completely in the doctoral thesis of L.S. van Benthem Jutting in 1977.

In the Computational Logic Project of R.S. Boyer and J S. Moore, the theorem provers THM and NQTHM were used to verify several mathematical theories. Among them: every natural number is a product of prime numbers essentially in a unique way (1972), Gauss' Law of Quadratic Reciprocity (1984), and Goedel's Incompleteness Theorem (1986).

In 1973, the Mizar project started with the aim of formalizing and verifying all mathematics.

By now, there are computer verifications of Theorem 2 and Theorem 3, i.e. Fermat's Last Theorem for exponents 4 and 3. This is done for exponent 4 by Delahaye and Mayero by means of the theorem prover Coq (see [BeC]), and for both exponents by Roelof Oosterhuis with the theorem prover Isabelle.

Indeed, there are several different theorem provers, computer applications that can be used to formalize mathematical theories and to verify their proofs. One first has to encode the theory into a language that the proof tool can interpret. In a second stage, one has to inform the tool how validity of the theorem is proved. In fact, the word "theorem prover" is inadequate. The tool is almost never able to prove a theorem independently. It can however verify the validity of the arguments a human user offers to it. Working with a theorem prover feels like convincing the prover.

One distinguishes different quality criteria for theorem provers. A prover is called sound if it does not allow invalid theorems to be proved. The intelligence of the prover is a measure of its ability to find proofs or proof steps independently. A third important criterion is user friendliness. This covers several aspects. For mathemematicians, the language of the prover should be easy to learn and indeed close to the mathematical language they are acquainted with. On the other hand, it should be relatively easy to guide the prover with correct arguments towards correct conclusions. Thirdly, when the prover is unable to accept a step, it should give a clear indication of the reason of failure. According to an estimate of J Moore, in more than half of the cases, his theorem prover was asked to prove theorems that were actually invalid. Every sound prover rejects invalid or insufficient arguments, but user friendliness is also about how to reject them.

Every prover has a type system to distinguish e.g. real numbers from other things like functions or vector spaces. Rigidity and expressiveness of the type system form quality attributes of theorem provers, but what is best depends on the kind of proofs you need to verify. One needs a rich type system to express complicated matters. Theorem provers with a rich type system typically have less intelligence. The more possibilities you have, the harder it is to choose between them. The theorem prover NQTHM has a poor type system but is very ingeneous in using mathematical induction. The prover PVS has a richer type system, is much more user friendly, but its release before the current one was unsound.

The mathematics used in Wiles' proof of Fermat's Last Theorem is very complicated. The type system of NQTHM is definitely not rich enough to express all this in a convenient manner. I assume that the same holds for PVS. I do not know the provers Coq and Mizar good enough, but I think they are adequate to express the mathematics. I also expect them to be sound, but with less inbuilt intelligence than PVS.

Theorem provers should not be confused with computer algebra packages like Maple and Mathematica. Since the nineties, many mathematicians use such tools from Computer Algebra to investigate and manipulate their formulas. These tools usually do not offer soundness. Their manipulations require side conditions that cannot be checked by the tool. To apply such a tool in computer verification, one needs a form of integration of the tool in the prover. Indeed, I think that computer verification of Wiles' proof will be very difficult without an integrated computer algebra tool in the theorem prover.
Back to the aspects.

What guarantees are offered by a computer verification?

In principle, there are two things that can go wrong: the coding of the theory into the language of the prover, and unsoundness of the prover. Ultimately, computer verification of mathematical theories is not more than a considerable strengthening of the social process of acceptance of the result with its accompanying proof by the mathematical community.

Human verification will be needed to ensure that the theory under consideration is correctly encoded into the language of the prover. This is very well possible, when the language of the prover is reasonably readable and understandable. The point of unsoundness is even less problematic. An unsound release of a theorem prover will soon be replaced, hopefully by a sound one. It is difficult and risky to make use of an unsoundness of a prover to validate unsound theorems.

There is a more fundamental ways to forestall the unsoundness problem. In principle, a theorem prover should deliver a certificate for each proof. This is a file or set of files that contains the assertion that is proved, with all definitions and all axioms that are used in the proof, and with a witness of the proof that can be verified by a certifier. A certifier is a much simpler tool than a theorem prover. It needs no intelligence and its soundness should be much easier to verify. Integration of computer algebra into a theorem prover, however, may give problems for the certificates or the certifiers.
Back to the aspects.

Does the problem satisfy Bergstra's requirement that it is easy to decide whether some proposed solution indeed solves the problem?

I regard the points raised in the previous item as minor, and therefore answer this question with yes.
Back to the aspects.

What is to be done? Is it possible? Within reach?

Computer verification requires a single tool for the whole project. Indeed, if some theorem is proved with one theorem prover and then transferred to a different prover, the risks of applying the theorem in a slightly different context would undo the whole purpose of the project. It is therefore important to choose a theorem prover that can support the whole project. This requires a very rich type system, and possibly also an integrated computer algebra tool. As far as I know, such a theorem prover is not yet available.

On the other hand, I do think that the challenge is doable, and that it will be done in the coming fifty years. It is definitely not within reach. The project will have to cover large parts of current pure mathematics. Its scope is comparable with the Bourbaki project, where between 1950 and 1970 almost all of pure mathematics got a new foundation by the efforts of an originally French group of mathematicians who published some 20 books under the pseudonym N. Bourbaki.

The Fermat-Wiles project could be adopted by the Mizar group. This group has been active for 32 years. They have developed many theories. I need to look into their work more seriously.

Let me draw a fanciful sketch. If I were to start the project, I would first investigate the limits of the type system by trying to prove the first ten pages of [EGA1]. This amounts to a proof of Yoneda's Lemma for big categories. Yoneda's Lemma itself has been treated in Mizar and Coq, but I don't know about the applicability of these versions to big categories. I would not start building a new theorem prover with integrated computer algebra facilities. Yet, the next part of the project would be the treatment of complex function theory and modular forms, in particular Weierstrass p-function and Eisenstein series. If these preliminary theories have been built, we turn to elliptic curves over the rational numbers with good and bad reduction, and the action of the Galois group on the Tate module. All this should be doable within ten years. Still, we are only in the foothills, real mountaineering is yet to come. Do we need Deligne's proof of the Weil conjectures?
To be corrected, and to be continued....
Back to the aspects.

Are there people working on the problem as posed?

At this moment, one can say that all groups working on the formalization of mathematics work in this direction, as well as all groups who try to improve theorem provers and computer algebra systems. World wide, this is quite a large community. When I focus on the verification of Wiles' proof of Fermat's Theorem, however, the simple answer is no.
Back to the aspects.

What is the strong Dutch tradition alluded to?

As mentioned above, computer verification of mathematical proofs started in the Netherlands in 1967 in the Automath project of N.G. de Bruijn, cf. [NGdV]. Landau's book "Grundlagen der Analyse" was verified completely in the doctoral thesis of L.S. van Benthem Jutting in 1977. The current Dutch work on formalizing mathematics is centered in Nijmegen, see the sites of Barendregt and Geuvers.

A related field is computer verification of computer algorithms. This did not start in the Netherlands, but significant work has been done here. At least four PhD theses were written about it, by Wishnu Prasetya (Utrecht, 1995), Tanja Vos (Utrecht, 2000), Marieke Huisman (Nijmegen, 2001), Gao Hui (Groningen, 2005), see e.g. the sites of Swierstra (Utrecht), Jacobs (Nijmegen), and myself (Groningen).
Back to the aspects.

Is this a problem for computer science or for mathematics?

Fermat's Last Theorem as such is just Arithmetic (number theory). Due to the work of Frey, Serre, and Ribet, the problem is transferred to Algebraic Geometry and Analysis (elliptic curves and modular forms). Wiles' proof belongs to these fields. Computer verification of mathematical proofs belongs to Computational Logic, a joint offspring of Mathematical Logic and Computer Science. Computer algebra or symbolic computing is also a new field somewhere in between mathematics and compter science.

Since Wiles' proof uses many results from various branches of mathematics, computer verification may need semantic database technology to search for relevant theorems. The total body of knowledge to be encoded is huge. The project may therefore be regarded as an exercise in software engineering. Indeed, one can say that it is a matter of refactoring half of pure mathematics. Information systems, databases, and software engineering belong to computer science. Yet, all in all, I tend to regard the problem discussed in this page as a mathematical one, but mathematicians are generally more interested in the development of new mathematics than in the formalization and verification of "old" mathematics.
Back to the aspects.

Would it not be easier to find a simpler proof of Fermat's Last Theorem, formalize that, and verify that by computer?

Eventually a simpler proof of Fermat's Last Theorem may be found. That is a challenge for mathematicians, not for computer scientists. Of course, were it to happen, then we would preferably treat the simpler proof first.

Yet, for me, Bergstra's problem is to formalise and verify the theorem of Wiles that every semistable elliptic curve is modular. Treatment of this theorem requires formalisation of large parts of the theories of elliptic curves and of modular curves. Of course, the mathematicians will try to smoothen Wiles' proof as much as possible, but whether this will yield sizeable simplifications remains to be seen.
Back to the aspects.

Would not it be preferable to have the computer itself interpret, formalize, and verify Wiles' proof?

Well, perhaps, but this is an even bigger challenge, and perhaps more for the field of artificial intelligence than for computer science proper.
Back to the aspects.

Who am I to report on this?

You may have a look at my home page. I did my PhD thesis in 1975 in the field of Algebraic Geometry (singularities and actions of linear algebraic groups). In the eighties, I developed special purpose computer algebra programs to classify upper triangular matrices. While doing this I moved to Computer Science. In the nineties, I started to use the theorem prover NQTHM to verify the correctness of concurrent algorithms. The last years, I turned to the theorem prover PVS to develop and prove the semantics of a refinement theory for concurrency. So, I am not a specialist in the field of this virtuel project, but I have experience in several related fields.
Back to the top of the page.

I do not strive for completeness, but I want to make this page as readable, representative, correct, and unbiased as possible. Therefore, all questions and comments are welcome.

Copyright © 2005 Wim H. Hesselink
Last modified: Thu Apr 3 17:11:58 CEST 2008