AlphaProof's Greatest Hits

(rishimehta.xyz)

111 points | by rishicomplex 6 hours ago

51 comments

  • sega_sai an hour ago

    I think the interface of LLMs with formal languages is really the future, because there you can formally verify every statement and deal with hallucinations.

    • raincole 26 minutes ago

      It's obviously not the future (outside of mathematics research). The whole LLM boom we've seen in the past two years comes from one single fact: people don't need to learn a new language to use them.

      • seizethecheese 22 minutes ago

        Both comments can be right. People don’t need to know HTML to use the internet.

  • throwaway713 an hour ago

    Anyone else feel like mathematics is sort of the endgame? I.e., once ML can do it better than humans, that’s basically it?

    • raincole 22 minutes ago

      If "better than humans" means when you give it a real world problem, it gives you a mathematical model to describe it (and does it better than human experts), then yes, it's the end game.

      If it just solves a few formalized problems with formalized theorems, not so much. You can write a program that solves ALL the problems under formalized theorems already. It just runs very slowly.

    • margorczynski 32 minutes ago

      I doubt it. Math has the property that you can verify, with 100% certainty and at little cost, that what you're doing is correct (as is done with Lean). Most problems don't have anything close to that.

      • exe34 20 minutes ago

        To be fair, humans also have to run experiments to discover whether their models fit nature; AI will have to do that too.

        • margorczynski 13 minutes ago

          These kinds of experiments are often orders of magnitude more costly (in time, energy, money, safety, etc.) than verifying a mathematical proof with something like Lean. That's why many think math will be one of the first fields AI cracks: there is a relatively cheap and fast feedback loop available.

  • sbierwagen 5 hours ago

    More information about the language used in the proofs: https://en.wikipedia.org/wiki/Lean_(proof_assistant)
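
    For a flavor of what the language looks like, here is a tiny Lean 4 example (a sketch; Nat.add_comm is a lemma from Lean's standard library, so the proof is just a one-line appeal to it):

      -- Addition on the natural numbers is commutative,
      -- proved by applying the library lemma Nat.add_comm.
      theorem myAddComm (a b : Nat) : a + b = b + a :=
        Nat.add_comm a b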

  • sincerely 5 hours ago

    in the first question, why do they even specify ⌊n⌋ (and ⌊2n⌋ and so on) when n is an integer?

    • rishicomplex 5 hours ago

      Alpha need not be an integer; we have to prove that it is.
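
      For reference, the problem in question (IMO 2024 P1; quoting from memory, so treat the exact wording as an assumption) asks for all real numbers α such that

        $$n \mid \lfloor \alpha \rfloor + \lfloor 2\alpha \rfloor + \cdots + \lfloor n\alpha \rfloor \quad \text{for every positive integer } n,$$

      and the answer turns out to be exactly the even integers, so integrality is something to prove rather than assume.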

      • sincerely 5 hours ago

        Should have read more carefully, thank you!

  • Robotenomics 4 hours ago

    “Only 5/509 participants solved P6”

    • nybsjytm 3 hours ago

      This has to come with an asterisk, which is that participants had approximately 90 minutes to work on each problem while AlphaProof computed for three days for each of the ones it solved. Looking at this problem specifically, I think that many participants could have solved P6 without the time limit.

      (I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)

      • auggierose 3 hours ago

        It's certainly interesting information that AlphaProof needed three days. But does it matter for evaluating the importance of this result? No.

        • nybsjytm 2 hours ago

          I agree that the result is important regardless. But the tradeoff of computing time/cost with problem complexity is hugely important to think about. Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.

          Three days per problem is, by many standards, a 'reasonable' amount of time. However, there are still unanswered questions; notably, 'three days' is not really meaningful in and of itself. How parallelized was the computation, and what was the hardware capacity? And how optimized is AlphaProof for IMO-type problems (problems which, among other things, all have short solutions using elementary tools)? These are standard kinds of critical questions to ask.

          • dash2 2 hours ago

            Though, if you start solving problems that humans can't or haven't solved, then questions of capacity won't matter much. A speedup in the movement of the maths frontier would be worth many power stations.

            • nybsjytm 2 hours ago

              For some time a 'superhuman math AI' could be useful for company advertising and getting the attention of VCs. But eventually it would be pretty clear that innovative math research, with vanishingly few exceptions, isn't very useful for making revenue. (I am a mathematician and this is meant with nothing but respect for math research.)

              • airstrike an hour ago

                "Making revenue" is far from being the only metric by which we deem something worthy.

                • nybsjytm 25 minutes ago

                  As a mathematician, of course I agree. But in a sentence like:

                  > A speedup in the movement of the maths frontier would be worth many power stations

                  who is it 'worth' it to? And to what end? I can say with some confidence that many (likely most, albeit certainly not all) mathematicians do not want data centers and power stations to guzzle energy and do their math for them. It's largely a vision imposed from without by Silicon Valley and Google research teams. What do they want it for and why is it (at least for now) "worth" it to them?

                  Personally, I don't believe for a second that they want it for the good of the mathematical community. Of course, a few of their individual researchers might have their own personal and altruistic motivations; however I don't think this is so relevant.

        • andrepd 35 minutes ago

          More or less. Modern theorem provers, even fully automatic ones, can prove incredibly difficult theorems if given enough time. With 3 days and terabytes of memory, perhaps they could have? It would be interesting to compare AlphaProof with a standard theorem prover given similarly astronomical computing resources.

          • auggierose 14 minutes ago

            That is an interesting thought. But I doubt that standard automated theorem proving techniques would have solved this in 100 years, even with immense resources, as all they do is (basically) brute-force search, and it remains an exponential search space. Therefore the 3 days does not really matter; indeed, 3 months would still be a result that 10 years ago I would have thought impossible.

  • wslh 5 hours ago

    If you were to bet on solving problems like "P versus NP" using these technologies combined with human augmentation (or vice versa), what would be a plausible time horizon for achieving such a solution? I think we should assume that the solution is also expressible in the current language of math/logic.

    • uptownfunk 4 hours ago

      The hard part is the creation of new math to solve these problems, not the use of existing mathematics. New objects (groups, rings, fields, etc.) have to be theorized, their properties understood, and then that new machinery used to crack the existing problems. I think we will get to a place (in around 5 years) where AI will be able to solve these problems and create these new objects.

      I don't think the obstacle is one of technology; I think it's more financial. Meaning, there isn't much money to be made doing this (try to justify it for yourself), hence the lack of focus here. I think this is a red herring and there is a gold mine in there somewhere, but it will likely take someone with a lot of cash to fund it out of passion (Vlad Tenev / Harmonic, or Zuck and Meta AI, or the Google / AlphaProof guys); in the big tech world, though, these are just minnow projects in a sea of competing initiatives.

      And so that leaves us at the mercy of open research, which, if this is a compute-bound problem, may take 10-20 years to crack. I hope I see a solution to RH in my lifetime (and in language that I can understand).

      • wslh 4 hours ago

        I understand that a group of motivated individuals, even without significant financial resources, could attempt to tackle these challenges, much like the way free and open-source software (FOSS) is developed. The key ingredients would be motivation and intelligence, as well as a shared passion for advancing mathematics and solving foundational problems.

        • uptownfunk 2 hours ago

          Ok, but how do you get around needing a 10k or 100k H100 cluster?

          • wslh 2 hours ago

            It is well known that cloud services like Google Cloud subsidize some projects, and we don't even know what improvements will arise in a few years.

    • zarzavat 5 hours ago

      Probably a bad example: P vs NP is the most likely of the Millennium Prize problems to be unsolvable, so the answer may be "never".

      I'll bet the most technical open problems will be the ones to fall first. What AIs lack in creativity they make up for in ability to absorb a large quantity of technical concepts.

      • wslh 4 hours ago

        Thank you for the response. I have a follow-up question: Could these AIs contribute to advancements in resolving the P vs NP problem? I recall that the solution to Fermat’s Last Theorem relied on significant progress in elliptic curves. Could we now say that these AI systems might play a similar role in advancing our understanding of P vs NP?

        • ccppurcell an hour ago

          Just my guess as a mathematician. But if LLMs are good for anything it will be for finding surprising connections and applying our existing tools in ways beyond human search. There's a huge space of tools and problems, and human intuition and brute force searching can only go so far. I can imagine that LLMs might start to find combinatorial proofs of topological theorems, maybe even novel theorems. Or vice versa. But I find it difficult to imagine them inventing new tools and objects that are really useful.

          • nemonemo an hour ago

            Do you have a past example of an already-proven theorem, or of new tools/objects, that would have been possible only for a human but not an AI? Any such example would make your argument much more approachable for non-mathematicians.

    • hiddencost 5 hours ago

      No one is focused on those; the labs are much more focused on rote problems.

      You might find them used to accelerate research math by helping mathematicians with lemmas, checking for errors, and formalizing proofs. That seems realistic in the next couple of years.

      • nybsjytm 3 hours ago

        There are some AI guys like Christian Szegedy who predict that AI will be a "superhuman mathematician," solving problems like the Riemann hypothesis, by the end of 2026. I don't take it very seriously, but that kind of prognostication is definitely out there.

  • nybsjytm 3 hours ago

    Why have they still not released a paper aside from a press release? I have to admit I still don't know how auspicious it is that running Google hardware for three days apiece was able to find half-page-long solutions, given that the promise has always been to solve the Riemann hypothesis with the click of a button. But of course I do recognize that it's a big achievement relative to previous work in automated theorem proving.

    • whatshisface 3 hours ago

      I don't know why so few people realize this, but by solving any of these problems at all, its performance is superhuman for most reasonable definitions of "human".

      Talking about things like solving the Riemann hypothesis in so many years assumes a little too much about the difficulty of problems that we can't even begin to conceive of a solution for. A better question is what can happen when everybody has access to above-average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.

      • zeroonetwothree 2 hours ago

        Well, multiplying two large numbers instantly is a superhuman feat a calculator can do. I would hope we are going for a higher bar, like "useful". Let's see if this can provide proofs of novel results.

        • whatshisface 2 hours ago

          The ability to multiply two numbers superhumanly has already transformed society.

        • nybsjytm 2 hours ago

          It's worth emphasizing that it's been possible for years to use an automatic theorem prover to prove novel results. The whole problem is to get novel interesting results.

        • rowanG077 an hour ago

          That superhuman capability of "multiplying two large numbers instantly" has transformed human society like not even plumbing has. I really can't see how you could not consider this useful.

      • GregarianChild 2 hours ago

        We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs. So, in order to evaluate AlphaProof's achievements, we'd need to know how much of a shortcut AlphaProof achieved. A good proxy for that would be the total energy usage for training and running AlphaProof. A moderate proxy for that would be the number of GPUs / TPUs that were run for 3 days. If it's somebody's laptop, it would be super impressive. If it's 1000s of TPUs, then less so.

        • Onavo 2 hours ago

          > We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs.

          Which computer science theorem is this from?

          • kadoban 2 hours ago

            It's a direct consequence of the format of a proof. Proofs are finite-length sequences over a finite alphabet of symbols, so of course they're enumerable (the same algorithm you use to count works to enumerate them).

            If you want a computer to be able to tell that it found a correct proof once it enumerates it, you need a bit more than that (really just the existence of automated proof checkers is enough for that).
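
            A minimal sketch of that enumeration in Python (is_valid_proof is a hypothetical stand-in for a real proof checker such as Lean's kernel; this illustrates the brute-force idea being discussed, not AlphaProof's method):

              from itertools import count, product

              # Hypothetical stand-in for a real proof checker: decides
              # whether `candidate` is a valid proof of `goal`.
              def is_valid_proof(candidate: str, goal: str) -> bool:
                  raise NotImplementedError  # assumed to be supplied externally

              def find_proof(alphabet: str, goal: str) -> str:
                  # Enumerate every finite string over `alphabet`, shortest
                  # first, so each candidate is reached in finitely many steps
                  # (a depth-first walk would get stuck on one infinite branch).
                  # If `goal` is provable at all, this eventually returns a
                  # proof; the catch is the exponentially growing search space.
                  for length in count(1):
                      for symbols in product(alphabet, repeat=length):
                          candidate = "".join(symbols)
                          if is_valid_proof(candidate, goal):
                              return candidate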

          • zeroonetwothree 2 hours ago

            It’s just an obvious statement. If a proof exists, you will eventually get to it.

          • E_Bfx an hour ago

            I guess it is tautological from the definition of "provable". A theorem is provable by definition if there is a finite sequence of well-formed formulas that has the theorem as its consequence (see https://en.wikipedia.org/wiki/Theorem, section "Theorem in logic").

            • Xcelerate 44 minutes ago

              Not sure it’s a tautology. It’s not obvious that a recursively enumerable procedure exists for arbitrary formal systems that will eventually reach all theorems derivable via the axioms and transformation rules. For example, if you perform depth-first traversal, you will not reach all theorems.

              Hilbert’s program was a (failed) attempt to determine, loosely speaking, whether there was a process or procedure that could discover all mathematical truths. Any theorem depends on the formal system you start with, but the deeper implicit question is: where do the axioms come from and can we discover all of them (answer: “unknown” and “no”)?

          • Xcelerate an hour ago

            Imagine thinking this is a recent “theorem” of computer science rather than a well-known result in mathematics from very long ago. People thought about the nature of computation long before CS was even a thing.

      • nybsjytm 2 hours ago

        > A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.

        What does this have to do with a hypothetical automatic theorem prover?

        • whatshisface 2 hours ago

          Logic is pretty much absent from our culture and daily life, but that could be due to its limited supply.

          • nybsjytm an hour ago

            Being logical in social life is pretty much completely different from being logical in a mathematical argument, especially in a formal theorem proving environment. (Just try to write any kind of cultural proposition in a formal language!)

            • whatshisface 41 minutes ago

              That's the way things are now, but this regime came about when proving things took intense concentration and specialized skills that very few people had. Contrast going to look something up in a library with googling something during a conversation.

    • GregarianChild 2 hours ago

      Google stopped publishing interesting AI work after OpenAI took away their AI lead, mostly using tech that Google pioneered but didn't monetise, like transformers.

      I imagine they are under pressure not to make this mistake again.