Side note: The organization that maintains Lean is a "Focused Research Organization", which is a new model for running a science/discovery based nonprofit. This might be useful knowledge for founder types who are interested in research. For more information, see: https://www.convergentresearch.org
The concept of trying new science orgs is noble, but this is the typical Schmidt BS of saying every previous academic consortium is totally incompetent and I'm the only one who can inject the magic sauce of focus and coordination.
Unfortunately, being noble or self-righteous or whatever emotion you choose has nothing to do with it. If there is a pool of grant money available only to “Focused Research Organizations,” and you want some of it for your work, then you open one and do your work under that umbrella. Academic institutions themselves do this all the time. It looks politically and morally sketchy, and maybe it often is, but it’s the way it works.
To me, it seems like coming up with something more coordinated than a consortium and more flexible than a single lab or a research corporation funded by multiple universities makes sense.
It's probably only a narrow set of problems that has the right constraints and scale for this to be a win.
Having an organization maintain a software tool seems pretty unsurprising. There’s a well-defined problem with easily visible deliverables, relatively little research risk, and small organizations routinely maintain software tools. Whereas broader research is full of risk and requires funders to be enormously patient and willing to fund crazy ideas that don’t make sense.
Hmm. I don't know very much about Lean, and it definitely feels smaller in scope and coordination risk than the kinds of things that would generally benefit from this.
(OTOH, within the community they're effectively trying to build a massive, modern Principia Mathematica, so maybe they would...)
> Whereas broader research is full of risk and requires funders to be enormously patient and willing to fund crazy ideas that don’t make sense.
Yeah. I'm not a researcher, but I keep ending up tangentially involved in research communities. I've seen university labs, loose research networks, loose consortia funding research centers, FFRDCs, etc.
What I’ve noticed is that a lot of these consortia or networks struggle to deliver anything cohesive. There are too many stakeholders, limited bandwidth, and nobody quite empowered to say “we’re building this.”
In the cases where there’s a clearly scoped, tractable problem that’s bigger than what a single lab can handle, and a group of stakeholders agrees it’s worth a visionary push, something like an FRO might make a lot of sense.
This is an incredibly bad take on a hard social problem, one that is hard for well-understood reasons.
Scientific research is often not immediately applicable, but can still be valuable. The number of people who can tell you if it's valuable is small, and as our scientific knowledge improves, the number of people who know what's going on shrinks and shrinks.
Separately, it's possible to spend many years researching something, and have very little to show for it. The scientists in that situation also want some kind of assurance that they will be able to pay their bills.
Between the high rate of failure, the conflicts of interest, and the inscrutability of the research topics, it's very hard to fund science efficiently, and all the current ways of doing it are far from optimal. There is waste, there is grift, there is politics. Any improvement here is welcome, and decreasing the dollar cost per scientific discovery is more important than the research itself in any single field.
I love that they want to formalize this proof, and I understand why they're using Lean.
But part of me feels like if they are going to spend the massive effort to formalize Fermat's Last Theorem it would be better to use a language where quotient types aren't kind of a hack.
Lean introduces an extra axiom as a kind of cheat code to make quotients work. That makes it nicer from a software dev perspective but IMO less nice from a mathematical perspective.
Yup, Lean's quotient induction breaks subject reduction, which is an important type-theoretic principle. It means you can write a Lean development where t has type A, and t reduces (i.e. computes, as part of the Lean kernel) to u, but u doesn't have type A, and may not even type check. (See https://github.com/digama0/lean-type-theory/releases/downloa... Sec. 3.1 for a detailed discussion of this issue.) This is obviously quite bad, and it goes far beyond the usual drawback of adding axioms to a theory, including the quotient axiom. (Namely, the loss of canonicity.)
How much does this leak into typical math-related proofs?
If someone would create LeanQ where quotient types are built in nicely, how much work would it be to port the Fermat project from Lean to LeanQ?
AIUI, this cannot lead to inconsistency or a "wrong" proof. So if a proof checks out in Lean that's good enough, you might not even need a separate LeanQ.
Does this mean that most of the proofs in Lean and LeanQ would look exactly the same, it's just that the proofs of some technical low-level lemmas around quotient types (which I guess mathematicians are not really interested in anyway) look different?
For example, if I want to prove that a+b=b+a, I wouldn't care if I'm working directly in Peano arithmetic or just have a construction of the Peano axioms in ZFC, as in both cases the proofs would be identical (some axioms in PA would be lemmas in ZFC).
If that's the case with quotients, I wonder why it's such a big deal for some.
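The PA-vs-ZFC point above can be made concrete in Lean itself. A proof of commutativity appeals to a named lemma; nothing in the proof text reveals whether that lemma is an axiom of the theory (as in PA) or a theorem of the ambient foundation (here, Lean's type theory playing the role of ZFC). A minimal sketch, using only core Lean:

```lean
-- `Nat.add_comm` is a theorem of Lean's foundation, but the proof below
-- would read exactly the same if commutativity were instead an axiom.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```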
AFAICT, this issue only comes up if you form the quotient of a proposition by a relation. But there is no point in doing that (all the proofs of a proposition are already equal!) so it's not an issue in practice and it wouldn't be difficult to fix.
However, Lean's SR is broken in other ways which do show up in practice: definitional proof irrelevance is incompatible with Lean's computation rule for Acc. That one is much harder to fix.
It’s less nice from a computational perspective. But almost no mathematicians care about computation – they care about semantics, and there is no sense in which Lean’s quotients are a hack from a semantic perspective. I’m sure it makes computer scientists unhappy, but this isn’t a project for computer scientists.
This article is about a team of mathematicians digitizing a proof that will take several years to realize. Lean itself is about computation.
So all the libraries they build up will have these holes in them that make it harder to do things like treat two isomorphic objects as the same -- something mathematicians do implicitly on a daily basis.
You can probably get a long way in Lean with the soundness axiom. But what I don't know is what happens when you build up a decade of software libraries in a system that adds a lot of manual and cognitive overhead when you want to use them.
My gut instinct is that by cutting corners now, they're creating a form of technical debt that could possibly escalate quickly and force mathematicians to reformulate their tools in a nicer way.
This actually happens continuously throughout the history of math. Sometimes it leads to errors, as with the so-called Italian school of algebraic geometry. Sometimes it just leads to pauses while we go back and figure out what the objects we're working with actually are before we can make more progress.
Take all this with a grain of salt: I haven't worked with Lean so I don't know how much this crops up in practice, and I don't know how large Lean libraries are at this point. This is all gut feeling.
But my sense is that what you really want is to get the foundations right, then build abstraction layers on those foundations that are nicer to use. Lean tries to build a "good enough" foundation and historically the gap between what we know is correct and what is seen to be "good enough" tends to show itself sooner or later in math. If you are just working in natural language then you can just forget it was a problem as soon as a fix is found. If you're working in software, though, you'll likely need to do a major rewrite or refactoring.
> I haven't worked with Lean so I don't know how much this crops up in practice
It really doesn't. I've been using Lean and Mathlib for about five years now, and Fermat's Last Theorem is definitely not going to depend on the reduction properties of quotient types at any large scale.
Mathematical reasoning in Lean is almost universally done with rewriting, not reduction. People have found reduction-based proofs (colloquially "heavy rfls") to be difficult to maintain: they expose internal details of definitions. It's better to use the "public API" for mathematical definitions to be sure things can be refactored.
Really, quotients should almost never use the actual `Quot` type unless there is no better choice. In mathematics we like working with objects via universal properties ("public API"); a quotient type is any type that satisfies the universal property of a quotient. All `Quot` does is guarantee that quotients exist, with reasonable computation properties if we ever need them (and in the kind of math that goes into FLT we often don't). We don't even need `Quot` for Lean to have quotient types, since the classic construction of a set of equivalence classes works. (Though proving that this construction is correct surely uses function extensionality, which is proved using `Quot` in some way, but that's an implementation detail of `funext`.)
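The rewriting-vs-reduction contrast can be seen in a toy example. Both proofs below are valid, but the first goes through the public API (a named lemma), while the second is a small "heavy rfl" that silently depends on `+` being defined by recursion on its second argument:

```lean
-- Rewriting: uses the named lemma `Nat.add_zero`; survives refactoring.
example (n : Nat) : n + 0 = n := by rw [Nat.add_zero]

-- Reduction: `n + 0` computes to `n` by the definition of `+`.
-- This breaks if the internal definition of addition changes.
example (n : Nat) : n + 0 = n := rfl
```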
What’s mathematically questionable about the quotient soundness axiom? It’s justifiable metamathematically. What real difference does baking it into the proof kernel make? I’d rather such independent properties be modeled as axioms. The quotient automation I’m familiar with in other theorem provers is typically far more (untrusted!) machinery than just stating quot.sound.
Computation is the difference. In Lean, applying the universal property of the quotient (`Quotient.lift f Hf`) to an element that is of the form `Quotient.mk a` reduces to `f a`.
This rule is fine in itself, but the Lean developers were not sufficiently careful and allowed it to apply for quotients of propositions, where it interferes with the computation rules for proof irrelevance and ends up breaking subject reduction (SR is deeply linked to computation when you have dependent types!) [0]. It is not really a problem in practice though, since there is no point in quotienting a proposition.
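The reduction rule being discussed can be exhibited directly. A minimal sketch, where the relation `r` (parity of naturals) and the names `isEven` are illustrative, not from any library:

```lean
-- An illustrative relation: two naturals are related iff they have the
-- same parity. `abbrev` keeps it transparent to the elaborator.
abbrev r (a b : Nat) : Prop := a % 2 = b % 2

-- The universal property: a function that respects `r` descends to `Quot r`.
def isEven : Quot r → Bool :=
  Quot.lift (fun n => n % 2 == 0) (fun _ _ h => congrArg (· == 0) h)

-- The computation rule: `Quot.lift f H (Quot.mk r a)` reduces to `f a`,
-- so this equation holds definitionally and `rfl` closes it.
example : isEven (Quot.mk r 4) = true := rfl
```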
By the time they are hoping to finish in 2029, I bet LLMs are capable of translating the proof from Lean into the alternate theorem proving language of your choice with only a small amount of human assistance.
If this does end up being the case, that translation becomes easy, then essentially all theorem proving efforts should be conducted in the language that is the easiest to work in. You can translate into the "mathematically superior" languages later.
In particular, note that a key lemma of crystalline cohomology rests on a mistake. Experts think it is fixable, on the grounds that results have depended on it for a long time and no issue has been found, but it has not been fixed.
My understanding is that the proof doesn't exist in written form in its entirety.
Plus, Kevin Buzzard is a world expert with some ideas for how to better organize the proof. In general, formalization leads to new understanding about mathematics.
Something people outside of mathematics don't tend to appreciate is that mathematicians are usually thinking deeply about what we already know, and that work reveals new structures and connections that clarify existing knowledge. The new understanding reveals new gaps in understanding, which are filled in, and the process continues. It's not just about collecting verifiably true things.
Even if somehow the OpenAI algorithm could apply here, we'd get less value out of this whole formalization exercise than from having researchers methodically go through our best understanding of our best proof of FLT again.
"The International Mathematical Olympiad (IMO) is the World Championship Mathematics Competition for High School students", so not to undermine it but it's below university or graduate level.
Research-level mathematics like this is as hard as it gets, and this proof is famously difficult: it uses many branches of advanced mathematics, required thousands of pages of proofs, and took years of work.
Human readable (informal) proofs are full of gaps that all have to be traced back to axioms e.g. gaps that rely on shared intuition, background knowledge and other informal proofs.
It's somewhat like taking rough pseudocode (the informal proof, a mixture of maths and English) and translating it into a bullet-proof production app (the formal proof, in Lean), where you're going to have to specify every step precisely, traced back to axioms, handle all the edge cases, fix incorrect assumptions, and fill in the missing parts that were assumed to be straightforward but might not be.
A major part is you also have to formalise all the proofs your informal proof relied on so everything is traced back to the initial axioms, e.g. you can't just cite Pythagoras' theorem, you have to formalise that too.
So it's an order of magnitude more difficult to write a formal proof than an informal one, and even when you have the informal proof it can take teams many years of effort.
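To make the target concrete, here is one way the statement of FLT can be written down in Lean. This is a hedged sketch of a common phrasing, not the project's actual definition (Mathlib's `FermatLastTheorem` is stated slightly differently, via `FermatLastTheoremWith`):

```lean
-- One common way to state Fermat's Last Theorem over the naturals:
-- no positive a, b, c satisfy a^n + b^n = c^n once n exceeds 2.
def FLTStatement : Prop :=
  ∀ a b c n : Nat, 0 < a → 0 < b → 0 < c → 2 < n → a ^ n + b ^ n ≠ c ^ n
```

The statement fits on one line; the point of the comments above is that the proof behind it is what takes the thousands of pages.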
I’m almost certain this is ignorance on my part, but it seems like this would mean the proof is… possibly wrong? I mean if there are gaps and other informal proofs in there?
Kevin Buzzard told me that the worry that it might in fact be wrong is a huge motivator for him. I also once asked Serge Lang why so much mathematics is correct (which surprised me, coming from programming where everything has bugs), and he said: “people do a large number of consistency checks beyond what is in the published proofs, which makes the chances the claimed results are correct much, much higher.” Another related quote, from Bryan Birch: “it is always a good idea to prove true theorems.”
> “people do a large number of consistency checks beyond what is in the published proofs, which makes the chances the claimed results are correct much, much higher.”
I imagine one bias is because formal verification is such a huge effort, you're only going to do it for really interesting and impactful proofs, which means the proofs that get formal verified will already have been reviewed and critiqued a lot, and will be less likely to have glaring critical errors.
> Kevin Buzzard told me that the worry that it might in fact be wrong is a huge motivator for him.
And then, when I raised concerns in Zulip about Lean's metaprogramming facilities being used to trick the pipeline into accepting false proofs, he said the opposite. He even emphasized that the formalization efforts are not for checking proof correctness, but for cataloguing truths we believe in.
This kind of equivocation turned me away from that community, to be honest. That was an extremely frustrating experience.
For what it's worth, I don't think that Kevin Buzzard is the person you should talk to if you are interested in proof assistant design. As far as I know, Buzzard does not consider himself to be an expert in type theory or in proof assistants, and claims to be a mere user.
I think it was Terence Tao on the Lex Fridman podcast recently who said that there are very often little mistakes in big proofs, but they can almost always be patched around. It's like mathematicians' intuition is tracking some underlying reality and the actual formalization is flexible. Yes, sometimes digging down into a small mistake leads to an unbridgeable gap and that route has to be abandoned, but uncannily often such issues have nearby solutions.
Also, a lot of errors would be called "typos" rather than errors, such as some edge cases missing from the theorem statement which technically make the theorem false. As long as there's a similar theorem in the same spirit that can be proven, that's what the original was all along.
> I’m almost certain this is ignorance on my part, but it seems like this would mean the proof is… possibly wrong?
That's part of the motivation to formalise it. When a proof gets really long and complex, relies on lots of other complex proofs, and barely any single expert has enough knowledge to understand all the branches of maths it covers, there's a greater chance of a mistake.
There are a few examples here of errors/gaps/flaws found while formalizing proofs that were then patched up:
My understanding is it's common to find a few problems that then need to be patched or worked around. It's a little like wanting to know if a huge codebase is bug-free or not: you might find some bugs if you formalized the code, but you can probably fix the bugs during the process because it's generally correct. There can be cases where it's not fixable though.
The level of rigor used in math is sometimes characterized as "sufficient to convince other mathematicians of correctness." So, yeah, possibly, but not in a willy-nilly way. It's not a proof sketch, it's a proof. It's just written in human language designed for communication.
The thing is though that MANY bugs slip through even the most thorough code reviews. As a security researcher, I can tell you there is literally no system out there that doesn't have such a bug in it.
The systems we deal with in software are massive compared with your typical mathematical framework, though. But FLT is probably of similar scope.
I have little knowledge in this area, but my understanding is it's like this:
There is a pseudocode app that depends on a bunch of pseudocode libraries. They want to translate that pseudocode app into a real runnable app. They can do that, and it's a good amount of work, but reasonable. The problem is that to get the app to run they also need to translate the hundreds or thousands of pseudocode libraries into actual libraries. Everything from OS APIs, networking libs, rendering libs, and language standard libs needs to be converted from specs and pseudocode to real code to actually run the app. And that's a ton of work.
No, some of the harder work has been done. Translating human-readable proofs into machine-readable ones is also very hard work and an area of active research.
This is a significantly harder problem than winning gold at the IMO. A large part of it is figuring out how to represent some of the relevant ideas in Lean at all.
Formalizing Wiles' proof requires translating hundreds of pages of sophisticated mathematics with implicit reasoning steps into a precise logical framework, which is fundamentally different from the pattern-matching AI uses to solve competition problems.
All three claims of gold medal performance on IMO 2025 that I'm aware of solved the first 5 problems, that were designed to be solvable by application of standard techniques, but got stumped on the sixth problem that was a bit more unusual. So it does seem like state-of-the-art models solve competition problems by recognizing which kind of problem it is and applying a corresponding solution template. Which is not too different from human competitors exploiting common question patterns, but humans seem to be able to degrade more gracefully by falling back to a more explorative mode when none of the standard tricks seem to apply.
Proof assistant code is high-reliability; there is no room to fudge it. This is perhaps the one place where you can really see how bad LLMs are when you care about reliability.
Why? Coding assistants tend to do even better in contexts where they have tools like type checkers and linters to verify their implementations. This area seems actually uniquely well suited to LLM usage.
When I asked experts on formal proofs a year ago, their intuition was that there aren't enough formal proofs out there for LLMs to be very good at the syntax.
It's, as far as I know, quite hard to teach an LLM things it doesn't know.
I see people on Zulip using Copilot to write Lean proofs, and they have some success, but the quality is really bad right now, creating long, unmaintainable proofs. New users get stuck, thinking they're 90% of the way to the end, but really the whole thing probably should be scrapped and they should start over.
It's a bit frustrating because, before Copilot, new users would come with proofs and you could spend some time helping them write better proofs and they'd learn things and gain skills, but now it's not clear that this is time well spent on my part. Copilot is not going to learn from my feedback.
Actually, some proofs take longer for the computer to verify than for a human (even an unskilled typist) to type out. Several hours to evaluate a two-page document isn't unusual. (I prefer to write optimised proofs, but it can take me weeks to find the correct abstractions for that.)
Are you talking about Lean 4? Lean 4 is usually pretty fast at verifying proofs and most proofs are created in interactive mode so are checked as they are typed.
A "two-page document" (perhaps 200-300 lines of code) would typically check in a matter of seconds to, at most, a few minutes. If it took hours, it would mean the file contains a pathological case that a developer would be expected to identify and fix. It is absolutely not a normal or "not unusual" occurrence.
"In constructive mathematics, proof by contradiction, while not universally rejected, is treated with caution and often replaced with direct or constructive proofs."
(gemini llm answer to google query: constructive math contradiction)
"Wiles proved the modularity theorem for semistable elliptic curves, from which Fermat’s last theorem follows using proof by contradiction."
So, will the Lean formalization of FLT involve translation to a direct or constructive proof? It seems not, I gather the proof will rely on classical not constructive logic.
"3. Proof by Contradiction:
The core of the formal proof involves assuming ¬Fermat_Last_Theorem and deriving a contradiction. This contradiction usually arises from building a mathematical structure (like an elliptic curve) based on the assumed solution and then demonstrating that this structure must possess contradictory properties, violating established theorems.
4. Formalizing Contradiction:
The contradiction is formalized in Lean by deriving two conflicting statements, often denoted as Q and ¬Q, within the context of the assumed ¬Fermat_Last_Theorem. Since Lean adheres to classical logic, the existence of these conflicting statements implies that the initial assumption (¬Fermat_Last_Theorem) must be false."
(gemini llm answer to google query: Lean formalization of fermat's last theorem "proof by contradiction")
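The shape the quoted answer describes (assume the negation, derive conflicting statements `Q` and `¬Q`) can be sketched abstractly in core Lean, with no FLT-specific content. The proposition names here are placeholders, and `Classical.byContradiction` is the core-library form of the classical step:

```lean
-- Classical shape: assume ¬P, derive both Q and ¬Q, and conclude P.
example (P Q : Prop) (h : ¬P → Q) (h' : ¬P → ¬Q) : P :=
  Classical.byContradiction fun hnp =>
    (h' hnp) (h hnp)   -- ¬Q applied to Q yields False
```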
As far as I understand, the lead of this project, Kevin Buzzard, is a mathematician first. And the majority of mathematicians are untroubled by non-constructive proofs. I would imagine that the proof directions that result in the most interesting additions to Mathlib would be chosen.
FLT is a negative statement ("there are no nonzero integers x, y, z such that..."), and proofs by contradiction are constructively valid for proving negative statements.
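This point is visible in Lean's definitions: `¬P` unfolds to `P → False`, so "assume the statement and derive a contradiction" is exactly how a negation is proved, with no classical axiom involved. A toy stand-in with the same nonexistence shape as FLT:

```lean
-- Constructively valid: proving a negative statement just means
-- building a function from the assumed witness to False.
example : ¬ ∃ n : Nat, n + 1 = 0 :=
  fun ⟨n, h⟩ => Nat.succ_ne_zero n h
```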
I have no idea why Gemini is saying that. Proof by contradiction is totally fine. Sure, many people prefer a more direct proof, as they are nicer to read, but proof by contradiction is sometimes the only way to prove important results.
If you want to read why we need additional types of science organizations, see "A Vision of Metascience" (https://scienceplusplus.org/metascience/)
Sorry if this is a silly question, but I came across a "white lie" in a paper(https://arxiv.org/html/2502.06137v2#S3.SS2) yesterday:
> Throughout this section, we use A≈B to mean that A and B are essentially equal, in the sense that B is a suitable approximation of A in some sense that we will formalize in a later section. The reader may feel free to assume A=B when verifying estimates, even though A=B is generally false.
Is that when this would be needed?
No: you would want to formalise (or axiomatise) the notion of A≈B: assuming A=B when A≠B lets you prove basically whatever you like.
How much does this leak into typical math-related proofs? If someone would create LeanQ where quotient types are built in nicely, how much work would it be to port the Fermat project from Lean to LeanQ?
AIUI, this cannot lead to inconsistency or a "wrong" proof. So if a proof checks out in Lean that's good enough, you might not even need a separate LeanQ.
Does this mean that most of the proofs in Lean and LeanQ would look exactly the same, it's just that the proofs of some technical low-level lemmas around quotient types (which I guess mathematicians are not really interested in anyway) look different?
For example, if I want to prove that a+b=b+a, I wouldn't care if I'm directly in peano arithmetic or just have a construction of the peano axioms in ZFC, as in both cases the proofs would be identical (some axioms in PA would be lemmas in ZFC).
If that's the case with quotients, I wonder why it's such a big deal for some.
AFAICT, this issue only comes up if you form the quotient of a proposition by a relation. But there is no point in doing that (all the proofs of a proposition are already equal!) so it's not an issue in practice and it wouldn't be difficult to fix.
However, Lean's SR is broken in other ways which do show up in practice: definitional proof irrelevance is incompatible with Lean's computation rule for Acc. That one is much harder to fix.
It’s less nice from a computational perspective. But almost no mathematicians care about computation – they care about semantics, and there is no sense in which Lean’s quotients are a hack from a semantic perspective. I’m sure it makes computer scientists unhappy, but this isn’t a project for computer scientists.
This article is about a team of mathematicians digitizing a proof that will take several years to realize. Lean itself is about computation.
So all the libraries they build up will have these holes in them that make it harder to do things like treat two isomorphic objects as the same -- something mathematicians do implicitly on a daily basis.
You can probably get a long way in Lean with the soundness axiom. But what I don't know is what happens when you build up a decade of software libraries in a system that adds a lot of manual and cognitive overhead when you want to use them.
My gut instinct is that by cutting corners now, they're creating a form of technical debt that could possibly escalate quickly and force mathematicians to reformulate their tools in a nicer way.
This actually happens continuously throughout the history of math. Sometimes it leads to errors like the so-called Italian school of algebra. Sometimes it just leads to pauses while we go back and figure out what the objects we're working with actually are before we can make more progress.
Take all this with a grain of salt: I haven't worked with Lean so I don't know how much this crops up in practice, and I don't know how large Lean libraries are at this point. This is all gut feeling.
But my sense is that what you really want is to get the foundations right, then build abstraction layers on those foundations that are nicer to use. Lean tries to build a "good enough" foundation and historically the gap between what we know is correct and what is seen to be "good enough" tends to show itself sooner or later in math. If you are just working in natural language then you can just forget it was a problem as soon as a fix is found. If you're working in software, though, you'll likely need to do a major rewrite or refactoring.
> I haven't worked with Lean so I don't know how much this crops up in practice
It really doesn't. I've been using Lean and Mathlib for about five years now, and Fermat's Last Theorem is definitely not going to depend on the reduction properties of quotient types at any large scale.
Mathematical reasoning in Lean is almost universally done with rewriting, not reduction. People have found reduction-based proofs (colloquially "heavy rfls") difficult to maintain: they expose internal details of definitions. It's better to use the "public API" for mathematical definitions so that things can be refactored.
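The contrast can be made concrete. A minimal sketch (the lemma `List.length_append` is from Lean 4 core):

```lean
-- Reduction-based proof: forces the checker to unfold append and length
-- all the way down. Works here, but breaks if the definitions change.
example : ([1, 2] ++ [3]).length = 3 := rfl

-- Rewriting-based proof: uses a public API lemma, robust to refactoring
-- of how List and append are implemented internally.
example (xs ys : List Nat) : (xs ++ ys).length = xs.length + ys.length := by
  rw [List.length_append]
```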
Really, you should almost never use the actual `Quot` type for quotients unless you have no better choice. In mathematics we like working with objects via universal properties ("public API"). A quotient type is any type that satisfies the universal property of a quotient. All `Quot` does is guarantee that quotients with reasonable computation properties exist, should we ever need them, and in the kind of math that goes into FLT we often don't need those computation properties. We don't even need `Quot` for Lean to have quotient types, since the classic construction of a set of equivalence classes works. (Though proving that this construction is correct surely uses function extensionality, which is itself proved using `Quot` in some way, but that's an implementation detail of `funext`.)
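The "quotient as universal property" view can be written down directly in Lean. This is a sketch with made-up names (`IsQuotient` is not a Mathlib definition): any type `Q` with a map `mk` that respects the relation and through which every respecting map factors uniquely qualifies as "the" quotient, regardless of how it was built.

```lean
-- A type Q together with mk : α → Q is "a quotient of α by r" when:
structure IsQuotient {α : Type} (r : α → α → Prop) (Q : Type) (mk : α → Q) : Prop where
  -- mk identifies related elements...
  sound : ∀ a b, r a b → mk a = mk b
  -- ...and every r-respecting map out of α factors uniquely through mk.
  lift : ∀ {β : Type} (f : α → β), (∀ a b, r a b → f a = f b) →
    ∃! g : Q → β, ∀ a, g (mk a) = f a
```

Both `Quot r` and the set-of-equivalence-classes construction would satisfy this predicate; the "public API" point is that downstream proofs should only need `sound` and `lift`, never the construction itself.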
What’s mathematically questionable about the quotient soundness axiom? It’s justifiable metamathematically. What’s the real difference in baking it into the proof kernel? I’d rather such independent properties be modeled as an axiom. The quotient automation I’m familiar with in other theorem provers is typically way more (untrusted!) machinery than just stating quot.sound.
Computation is the difference. In Lean, applying the universal property of the quotient (`Quotient.lift f Hf`) to an element that is of the form `Quotient.mk a` reduces to `f a`.
This rule is fine in itself, but the Lean developers were not sufficiently careful and allowed it to apply for quotients of propositions, where it interferes with the computation rules for proof irrelevance and ends up breaking subject reduction (SR is deeply linked to computation when you have dependent types!) [0]. It is not really a problem in practice though, since there is no point in quotienting a proposition.
[0] see the end of section 3.1 in https://github.com/digama0/lean-type-theory/releases/downloa...
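The reduction rule itself is easy to demonstrate. A minimal sketch (names `sameParity` and `parity` are made up for illustration; `Quot.mk` and `Quot.lift` are Lean 4 core):

```lean
-- A relation on Nat and a function that respects it.
def sameParity (a b : Nat) : Prop := a % 2 = b % 2

def parity (n : Nat) : Nat := n % 2

-- The respect proof is just h, since the two statements are definitionally equal.
theorem parity_respects (a b : Nat) (h : sameParity a b) : parity a = parity b := h

-- Quot.lift f H (Quot.mk r a) reduces to f a, so this closes with rfl:
example : Quot.lift parity parity_respects (Quot.mk sameParity 5) = parity 5 := rfl
```

The subtlety described in [0] is what happens when this rule is allowed to fire on quotients of propositions, where it collides with definitional proof irrelevance.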
What does it mean to quotient a proposition?
By the time they are hoping to finish in 2029, I bet LLMs are capable of translating the proof from Lean into the alternate theorem proving language of your choice with only a small amount of human assistance.
If this does end up being the case, and translation becomes easy, then essentially all theorem-proving efforts should be conducted in whichever language is easiest to work in. You can translate into the "mathematically superior" languages later.
This is a nice overview of what this is, why they're doing it and why it's many years of work:
https://github.com/ImperialCollegeLondon/FLT/blob/main/GENER...
Link added to top text. Thanks!
Are there any graphics that show the massive progress to date in some symbolic form?
There are 13 blueprint graphs (one for each chapter of the blueprint) so far, all incomplete. https://imperialcollegelondon.github.io/FLT/blueprint/dep_gr...
Since the proof already exists in human-written form, I'm wondering: couldn't OpenAI's IMO gold-winning algorithm translate the blueprint to Lean?
In addition to other comments, see https://xenaproject.wordpress.com/2024/12/11/fermats-last-th...
In particular, note that a key lemma of crystalline cohomology rests on a mistake. Experts think that it is fixable, in part because results have depended on it for a long time and no issue has been found, but it has not yet been fixed.
> and my understanding of Maria Ines’ talk is that these issues have now been sorted out
My understanding is that the proof doesn't exist in written form in its entirety.
Plus, Kevin Buzzard is a world expert with some ideas for how to better organize the proof. In general, formalization leads to new understanding about mathematics.
Something people outside of mathematics don't tend to appreciate is that mathematicians are usually thinking deeply about what we already know, and that work reveals new structures and connections that clarify existing knowledge. The new understanding reveals new gaps in understanding, which are filled in, and the process continues. It's not just about collecting verifiably true things.
Even if the OpenAI algorithm could somehow apply here, we'd get less value out of this whole formalization exercise than from having researchers methodically go through our best understanding of our best proof of FLT again.
"The International Mathematical Olympiad (IMO) is the World Championship Mathematics Competition for High School students", so not to undermine it but it's below university or graduate level.
Research-level mathematics like this is as hard as it gets, and this proof is famously difficult: it uses many branches of advanced mathematics and required thousands of pages of proofs and years of work.
Yes but the hard work (coming up with a human-readable proof) has already been done.
Human readable (informal) proofs are full of gaps that all have to be traced back to axioms e.g. gaps that rely on shared intuition, background knowledge and other informal proofs.
It's somewhat like taking rough pseudocode (the informal proof, a mixture of maths and English) and translating it into a bullet-proof production app (the formal proof, in Lean), where you're going to have to specify every step precisely, traced back to axioms, handle all the edge cases, fix incorrect assumptions, and fill in the missing parts that were assumed to be straightforward but might not be.
A major part is that you also have to formalise all the proofs your informal proof relied on, so everything is traced back to the initial axioms. E.g. you can't just cite Pythagoras' theorem; you have to formalise that too.
So it's an order of magnitude more difficult to write a formal proof compared to an informal one, and even when you have the informal proof it can take teams many years of effort.
> It's somewhat like taking rough pseudo code and translating that into a bullet-proof production app
That's actually something LLMs are already quite good at.
/s
I’m almost certain this is ignorance on my part, but it seems like this would mean the proof is… possibly wrong? I mean if there are gaps and other informal proofs in there?
But I thought it was a widely celebrated result.
Kevin Buzzard told me that the worry that it might in fact be wrong is a huge motivator for him. I also once asked Serge Lang why so much mathematics is correct (which surprised me, coming from programming where everything has bugs), and he said: “people do a large number of consistency checks beyond what is in the published proofs, which makes the chances the claimed results are correct much, much higher.” Another related quote Bryan Birch told me once: “it is always a good idea to prove true theorems.”
> “people do a large number of consistency checks beyond what is in the published proofs, which makes the chances the claimed results are correct much, much higher.”
I imagine one bias is that because formal verification is such a huge effort, you're only going to do it for really interesting and impactful proofs, which means the proofs that get formally verified will already have been reviewed and critiqued a lot, and will be less likely to have glaring critical errors.
> Kevin Buzzard told me that the worry that it might in fact be wrong is a huge motivator for him.
And then, when I raised concerns in Zulip about Lean's metaprogramming facilities being used to trick the pipeline into accepting false proofs, he said the opposite. He even emphasized that the formalization efforts are not for checking proof correctness, but for cataloguing truths we believe in.
This kind of equivocation turned me away from that community, to be honest. That was an extremely frustrating experience.
For what it's worth, I don't think that Kevin Buzzard is the person you should talk to if you are interested in proof assistant design. As far as I know, Buzzard does not consider himself to be an expert in type theory or in proof assistants, and claims to be a mere user.
I suspect he is massively overestimating the reliability of obscure mathematics.
I think it was Terence Tao on the Lex Fridman podcast recently who said that there are very often little mistakes in big proofs, but they can almost always be patched around. It's as if mathematicians' intuition is tracking some underlying reality and the actual formalization is flexible. Yes, sometimes digging down into a small mistake leads to an unbridgeable gap and that route has to be abandoned, but uncannily often such issues have nearby solutions.
Also, a lot of errors would be called "typos" rather than errors, such as some edge cases missing from the theorem statement which technically make the theorem false. As long as there's a similar theorem in the same spirit that can be proven, that's what the original was all along.
> I’m almost certain this is ignorance on my part, but it seems like this would mean the proof is… possibly wrong?
That's part of the motivation to formalise it. When a proof gets really long and complex, relies on lots of other complex proofs, and there's barely any single expert who has enough knowledge to understand all the branches of maths it covers, there's more chance there's a mistake.
There's a few examples here of errors/gaps/flaws found while formalizing proofs that were patched up:
https://mathoverflow.net/questions/291158/proofs-shown-to-be...
My understanding is it's common to find a few problems that then need to be patched or worked around. It's a little like wanting to know if a huge codebase is bug-free or not: you might find some bugs if you formalized the code, but you can probably fix the bugs during the process because it's generally correct. There can be cases where it's not fixable though.
The level of rigor used in math is sometimes characterized as "sufficient to convince other mathematicians of correctness." So, yeah, possibly, but not in a willy-nilly way. It's not a proof sketch, it's a proof. It's just written in human language designed for communication rather than for machine verification.
Yes, and when it was first published it was wrong (it made a leap of logic).
It takes thorough review by advanced mathematicians to verify correctness.
This is not unlike a code review.
Most people vastly underestimate how complex and esoteric modern research mathematics are.
The thing is though that MANY bugs slip through even the most thorough code reviews. As a security researcher, I can tell you there is literally no system out there that doesn't have such a bug in it.
The systems we deal with in software are massive compared with your typical mathematical framework though. But FLT is probably on similar scope.
I have little knowledge in this area, but my understanding is it's like this:
There is a pseudocode app that depends on a bunch of pseudocode libraries. They want to translate that pseudocode app into a real runnable app. They can do that, and it's a good amount of work, but reasonable. The problem is that to get the app to run they also need to translate the hundreds or thousands of pseudocode libraries into actual libraries. Everything from OS APIs, networking libs, rendering libs, and language standard libs needs to be converted from specs and pseudocode to real code to actually run the app. And that's a ton of work.
No, some of the harder work has been done. Translating human-readable proofs into machine-readable ones is also very hard work and an area of active research.
This is a significantly harder problem than winning gold at the IMO. A large part of it is figuring out how to represent some of the relevant ideas in Lean at all.
I disagree. I think it's a sequence of a huge number of modular moderately hard tasks each much easier than a hard IMO question.
Formalizing Wiles' proof requires translating hundreds of pages of sophisticated mathematics with implicit reasoning steps into a precise logical framework, which is fundamentally different from the pattern-matching AI uses to solve competition problems.
That's not how state of the art models work.
All three claims of gold medal performance on IMO 2025 that I'm aware of solved the first 5 problems, that were designed to be solvable by application of standard techniques, but got stumped on the sixth problem that was a bit more unusual. So it does seem like state-of-the-art models solve competition problems by recognizing which kind of problem it is and applying a corresponding solution template. Which is not too different from human competitors exploiting common question patterns, but humans seem to be able to degrade more gracefully by falling back to a more explorative mode when none of the standard tricks seem to apply.
You're either underestimating the length of the proof, or overestimating the length of tasks that models can currently accomplish.
The blueprint is a step-by-step outline.
If the goal is to formalize the proof, you would need more than an outline.
Proof assistant code is high-reliability; there is no room to fudge it. This is perhaps the one place where you can really see how bad LLMs are when you care about reliability.
Why? Coding assistants tend to do even better in contexts where they have tools like type checkers and linters to verify their implementations. This area seems actually uniquely well suited to LLM usage.
When I asked experts on formal proofs a year ago, their intuition was that there aren't enough formal proofs out there for LLMs to be very good at the syntax.
It's, as far as I know, quite hard to teach an LLM things it doesn't know.
He is right, and it doesn't matter, because you can instantly tell whether the proof the LLM generates is valid or not.
I see people on Zulip using Copilot to write Lean proofs, and they have some success, but the quality is really bad right now, creating long, unmaintainable proofs. New users get stuck, thinking they're 90% of the way to the end, but really the whole thing probably should be scrapped and they should start over.
It's a bit frustrating because, before Copilot, new users would come with proofs and you could spend some time helping them write better proofs and they'd learn things and gain skills, but now it's not clear that this is time well spent on my part. Copilot is not going to learn from my feedback.
Actually automated theorem provers like Lean are the PERFECT use for LLMs because you can instantly determine if the proof it generated is correct.
The problem is if they don't produce any correct ones then that doesn't matter
Actually, some proofs take longer for the computer to verify than for a human (even an unskilled typist) to type out. Several hours to evaluate a two-page document isn't unusual. (I prefer to write optimised proofs, but it can take me weeks to find the correct abstractions for that.)
Are you talking about Lean 4? Lean 4 is usually pretty fast at verifying proofs and most proofs are created in interactive mode so are checked as they are typed.
A "two-page document" (perhaps 200-300 lines of code) would typically check in a matter of seconds to, at most, a few minutes. If it took hours, it would mean the file contains a pathological case that a developer would be expected to identify and fix. It is absolutely not a normal or "not unusual" occurrence.
"In constructive mathematics, proof by contradiction, while not universally rejected, is treated with caution and often replaced with direct or constructive proofs."

"Wiles proved the modularity theorem for semistable elliptic curves, from which Fermat’s last theorem follows using proof by contradiction."

So, will the Lean formalization of FLT involve translation to a direct or constructive proof? It seems not; I gather the proof will rely on classical rather than constructive logic:

"3. Proof by Contradiction: The core of the formal proof involves assuming ¬Fermat_Last_Theorem and deriving a contradiction. This contradiction usually arises from building a mathematical structure (like an elliptic curve) based on the assumed solution and then demonstrating that this structure must possess contradictory properties, violating established theorems. 4. Formalizing Contradiction: The contradiction is formalized in Lean by deriving two conflicting statements, often denoted as Q and ¬Q, within the context of the assumed ¬Fermat_Last_Theorem. Since Lean adheres to classical logic, the existence of these conflicting statements implies that the initial assumption (¬Fermat_Last_Theorem) must be false."

(Gemini LLM answer to the Google query: Lean formalization of Fermat's last theorem "proof by contradiction")
As far as I understand, the lead of this project, Kevin Buzzard, is a mathematician first. And the majority of mathematicians are untroubled by non-constructive proofs. I would imagine that whichever proof directions result in the most interesting additions to Mathlib would be chosen.
FLT is a negative statement ("there are no nonzero integers x, y, z such that..."), and proofs by contradiction are constructively valid for proving negative statements.
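In Lean this distinction is visible directly, because `¬P` is definitionally `P → False`: proving a negation "by contradiction" means assuming `P` and deriving `False`, which needs no classical axioms, whereas discharging a double negation does. A minimal sketch:

```lean
-- Proving a negation "by contradiction" is constructive: ¬(P ∧ ¬P) is
-- literally the function that feeds the proof of P to the proof of ¬P.
example (P : Prop) : ¬(P ∧ ¬P) := fun h => h.2 h.1

-- By contrast, extracting P from ¬¬P genuinely requires classical logic:
example (P : Prop) (h : ¬¬P) : P := Classical.byContradiction h
```

Since FLT asserts the nonexistence of solutions, its proof by contradiction falls into the first, constructively unobjectionable pattern.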
A purely universal statement, to be more clear.
I have no idea why Gemini is saying that. Proof by contradiction is totally fine. Sure, many people prefer a more direct proof, as they are nicer to read, but proof by contradiction is perfectly valid and sometimes the only way to prove important results.
I'm not disagreeing (I'm on the fence, and also a bit of a newbie). I thought this was a good read and on topic.
https://www.quora.com/In-math-are-there-any-proofs-that-can-...