New proof dramatically compresses space needed for computation

(scientificamerican.com)

163 points | by baruchel 3 days ago

90 comments

  • zerof1l 12 hours ago

    Here's the gist:

    For nearly 50 years, theorists believed that if solving a problem takes t steps, it should also need roughly t bits of memory: 100 steps, 100 bits. To be exact, t/log(t).

    Ryan Williams found that any problem solvable in time t needs only about sqrt(t) bits of memory: a 100-step computation could be compressed and solved with something on the order of 10 bits.
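
    To see the size of the gap, a rough comparison in Python (constants ignored; the paper's precise bound is on the order of sqrt(t log t)):

        import math

        for t in (10**6, 10**9, 10**12):
            old = t / math.log2(t)   # the 1975-era bound
            new = math.sqrt(t)       # the new bound, up to log factors
            print(f"t={t:.0e}  t/log(t)={old:.2e}  sqrt(t)={new:.2e}")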

    • svat 5 hours ago

      To be clear, this was/is a statement about the worst case, not every problem. So it may be clearer to rephrase your sentence as:

      For nearly 50 years theorists believed that there exist problems taking t steps that also need roughly t/log(t) bits of memory.

      (The Quanta magazine article https://www.quantamagazine.org/for-algorithms-a-little-memor... gives some context: for a long time, the best result was t/log(t) as proved in "On Time Versus Space" https://dl.acm.org/doi/10.1145/322003.322015 by Hopcroft, Paul and Valiant in 1975.)

    • heavenlyblue 10 hours ago

      Why does it need the same bits of memory as steps?

      • mjburgess 9 hours ago

        The basic intuition of the two extremes are:

        * High space requirement: we need a bucket per step to track the state

        * Low space requirement: we only need a single bucket, if we can mix/unmix the various components of the state

        High-space requirement algorithms will be those with a large amount of "unmixable" complex state. Low-space reqs will be those with either a small state, or an easily mixed/unmixed state.

        Example of mixing: suppose we need an odd number plus a sign (+/-) -- then we can store a single odd-or-even number: an even number n means -(n-1), and an odd number means itself.

        So 7 = +7, 8 = -7

        I don't know if this example is really that good, but I believe the very basics of the intuition are correct -- it's to do with how we can trade "interpretation" of data (computation) for the amount of data, in such a way that data can kinda be mixed/unmixed in the interpretation.
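
        A minimal sketch of that packing idea in Python (pack/unpack are names I just made up):

            def pack(sign, odd):
                # store a signed odd number in one non-negative slot:
                # +k stays as k (odd), -k becomes k + 1 (even)
                return odd if sign > 0 else odd + 1

            def unpack(n):
                # odd slots decode to +n, even slots decode to -(n - 1)
                return (+1, n) if n % 2 == 1 else (-1, n - 1)

            assert unpack(pack(+1, 7)) == (+1, 7)   # 7 stores +7
            assert unpack(pack(-1, 7)) == (-1, 7)   # 8 stores -7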

      • TrueDuality 8 hours ago

        Bits are a bit misleading here; it would be more accurate to say "units of computation". If your problem space operates on 32-bit integers, this would be 32 bits * number of steps. These papers work with individual bits as the smallest unit of computation we can commonly reason about.

        • mort96 8 hours ago

          Wait but if "bits of memory" here was supposed to be "units of computation", that means that:

          * The old assumption was that if you require t steps to complete, you need t/log(t) units of computation

          * This new proof shows that if you require t steps to complete, you need sqrt(t) units of computation

          Surely this doesn't make sense? Using any definition of "unit of computation" I would intuitively assume, computing t steps requires something proportional to t units of computation...

          • nyrikki 8 hours ago

            There is a bit of nuance here that is difficult to explain fully, but note this from the paper:

            > Our simulation reduces the problem of simulating time-bounded multitape Turing machines to a series of implicitly-defined Tree Evaluation instances with nice parameters, leveraging the remarkable space-efficient algorithm for Tree Evaluation recently found by Cook and Mertz [STOC 2024].

            The "time-bounded multitape Turing machines" with bounded fan-in means that that particular abstract model has access to the bits of those tapes current head position.

            Mapping the quirks of the various abstract models can be tricky, but remember having access to the symbol currently under the tape head is 'free' in TMs.

            It is a useful abstraction but doesn't directly map to a physically realizable machine.

            It is still an interesting result for trees in physically realizable machines.

    • cubefox 9 hours ago

      > Ryan Williams found that any problem solvable in time t needs only about sqrt(t) bits of memory

      At least or at most or on average?

      • user070223 8 hours ago

        I skimmed the result a couple of weeks ago, and if I remember correctly, it states that among all machines that take t time to accomplish something, there is one that uses only sqrt(t) memory. So it's "at most", not on average nor amortized [not sure amortization even makes sense in the space where this problem lies] (you take the least memory-hungry machine).

        • cma 4 hours ago

          There must be more constraints. What if the problem is copying a string? Or is it only for Turing tape machines, where the head has to backtrack during the copy, increasing the number of steps?

          • jasperry 2 hours ago

            That's a good example! So the sqrt(n) space in the result must refer to space beyond that needed to write down the output--basically a working memory scratchpad. In which case, a string-copying function could use just a constant amount of scratch space.

            I mean, for all I know, the result may have been proved in the model of decision problems where the output is always one bit. But I'd guess that it generalizes just fine to the case where you have to write longer outputs.

            However, your question does make me wonder if the result still applies in the RAM model, where you can move to any spot on the tape in constant time. My theory knowledge is getting rusty...

            • jasperry 2 hours ago

              Update after watching the YouTube video: It was already known that a time-t function can be computed in sqrt(t) space on a single-tape Turing machine. It's exactly like @cma said: a single-tape Turing machine wastes a lot of time moving the head back and forth. Williams' result shows sqrt(t) space for multi-tape Turing machines. Those are much more time-efficient in regard to head movement, so until Williams' proof it was not believed possible to make any algorithm use only sqrt(t) space.
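
              To make the copy example concrete (just plugging numbers into the bounds):

                  n = 1_000_000      # symbols to copy
                  t = n * n          # single-tape copying: ~n round trips of length ~n
                  s = int(t ** 0.5)  # the sqrt(t) space bound
                  assert s == n      # sqrt(n^2) = n: just enough cells to hold the string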

      • jeanlucas 8 hours ago

        In the best case

    • taneq 9 hours ago

      Huh? What did any of those theorists think about the phrase "time/space tradeoff"?

      • bubblyworld 9 hours ago

        That phrase is usually about a specific problem, right? This is more like: given an arbitrary algorithm (you don't know which up front), can you optimise its memory usage? Seems kinda surprising to me that you can.

      • mhuffman 9 hours ago

        They were likely the ones that invented the phrase.

        • mort96 8 hours ago

          Really? But the whole idea behind the space/time trade-off is that in a bunch of cases, if you want to compute something in less memory, you need to solve it in more steps; or if you want to solve something in fewer steps, you need to use more memory.

          This seems to wildly contradict the idea that the amount of memory needed is roughly proportional to the number of steps.

          • svat 5 hours ago

            > in a bunch of cases

            That's precisely the issue: of course for many problems there are obvious ways to trade off space and time. Smart people who have spent their lives thinking about this are aware of this. :-) The issue here is: can this always be done? For an arbitrary computation that takes t time, can it always be simulated in much less space (using more time of course), no matter what the computation is? Lots of people have tried, and the best result until now was t/log t in the 1970s.

            Edit: A comment on a previous discussion of this result by the author of the Quanta article, which substantiates and links to researchers' opinion that it's the universality (works for every problem) that's the surprising part: https://news.ycombinator.com/item?id=44075191

          • eru 6 hours ago

            The amount of steps is an (over)estimate on the amount of memory needed.

            You can write some trivial programs that need this much memory. But most of them can be optimised to use less space (and still compute the same answer).

            The big question was: can you always optimise the space usage down from the ridiculous maximum and to what extent?

            • mort96 6 hours ago

              But plenty of programs can only be optimized to take less space by making them run slower (i.e take more steps)...

              • eru 6 hours ago

                Yes. But even with infinite time there's a limit to how much you can compress the space usage. And that's an interesting property to study.

                In addition you can look at how much extra time you actually need: infinite time is a vast overestimate. The new proof gives much more reasonable time bounds.

              • Ar-Curunir 6 hours ago

                That is also what is happening here. When you reduce space to sqrt(t) you are increasing time complexity. The interesting part of the result is that you can make this tradeoff for _any_ problem.

        • taneq 8 hours ago

          I'm not convinced of that, if they thought that s=O(t/log(t)), where s = space (bytes or similar) and t = time (clocks, steps, etc.). I dunno about theoretical limits, but in practice there's at least 1-2 orders of magnitude to be exchanged between space and processing time.

          • eru 6 hours ago

            That's in practice for practical problems. But they were interested in the worst cases.

    • m3kw9 8 hours ago

      It doesn’t make practical sense why they would even assign a number to problems which could have unknown dependencies, unless you are talking about bounded math problems.

    • andsoitis 7 hours ago

      > Ryan Williams found that any problem solvable in time t needs only about sqrt(t) bits of memory: a 100-step computation could be compressed and solved with something on the order of 10 bits.

      Does it require one to first have solved the problem in uncompressed space?

      • jasperry an hour ago

        As I understand it, this result is about algorithms that solve any instance of a given computational problem (like finding the shortest paths through a graph). The specific problem instance (the graph) is the input to the algorithm. The result shows how to take an algorithm and construct another algorithm that solves the same class of problems in less space, by transforming how the algorithm reads and writes memory. Now you have an algorithm that still solves any instance of that problem but uses only sqrt(t) space.

        So, to answer your question more directly: no, you don't have to have solved a specific instance of the problem, but you have to already have an algorithm that solves it in general.

    • zombot 10 hours ago

      > log(t)

      log to what base? 2 or e or 10 or...

      Why do programmers have to be so sloppy?

      • anonymous_sorry 9 hours ago

        It's not sloppiness, it's economy.

        You can convert between log bases by multiplying by a constant factor. But any real-world program will also have a constant factor associated with it, depending on the specific work it is doing and the hardware it is running on. So it is usually pointless to consider constant factors when theorising about computer programs in general.
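
        For instance, the change-of-base identity in action:

            import math

            x = 1_000_000.0
            # log_b(x) = ln(x) / ln(b): switching bases only multiplies
            # by the constant factor 1 / ln(b)
            assert abs(math.log2(x) - math.log(x) / math.log(2)) < 1e-9
            assert abs(math.log10(x) - math.log(x) / math.log(10)) < 1e-9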

      • 12345ieee 10 hours ago

        It doesn't matter in O() notation.

      • derbOac 9 hours ago

        One answer, from a certain perspective, is that it's relative to your encoding base. It's logarithmic in operations, with the base depending on the encoding.

      • eviks 9 hours ago

        Another reason is that the notation for base e is ln, not log.

        • tzs 7 hours ago

          It depends on the field.

          In programming, for example, base e is more common: log is base e in C/C++, JavaScript, Java, Python, Perl, Mathematica, Fortran, and Rust.

          Another example is number theory. I just checked a few number theory books from my library and most used base e: Hardy & Wright's "An Introduction to the Theory of Numbers", Apostol's "Introduction to Analytic Number Theory", Baker's "A Comprehensive Course in Number Theory", Ingham's "The Distribution of Prime Numbers", Baker's "Transcendental Number Theory", Ribenboim's "The Book of Prime Number Records", Kumanduri & Romero's "Number Theory with Computer Applications", and Niven's "Irrational Numbers".

          The only number theory book I found using ln rather than log was Gelfond's "Transcendental & Algebraic Numbers".

          • eviks 6 hours ago

            That confusion is unfortunate, thanks for checking the books!

            Am now a bit curious as to what the country/scientific field table with most common log notation would look like as it seems there is indeed a lot of variance...

            https://mathworld.wolfram.com/Ln.html

            > The United States Department of Commerce recommends that the notation ln x be used in this way to refer to the natural logarithm (Taylor 1995, p. 33).

            > Unfortunately, mathematicians in the United States commonly use the symbol log x to refer to the natural logarithm, as does TraditionalForm typesetting in the Wolfram Language

        • tempodox 7 hours ago

          If only! The math libs I know use the name `log` for base e logarithm, not `ln`.

          • eviks 7 hours ago

            hm, you're right, this unfortunately proves the original point...

      • Tarq0n 9 hours ago

        This is very common. Log without further specification can be assumed to be the natural log (base e).

        • holowoodman 9 hours ago

          No. Usually log without further specification is base10.

          Except in mathematics and physics, where it usually is base e.

          Except sometimes in computer science, where it can be base 2.

          But there are more of those weirdnesses: "ld" can be "log decimal", so base 10; or "logarithmus dualis", so base 2. Base 2 is also sometimes written as "lb" (log binary). You really need to know the conventions of the field/subfield to know which is which.

          • eru 6 hours ago

            Log without a base (at least to me) usually indicates that the base doesn't matter in this context (or is obvious).

            I usually see either base e or base 2 as the default. In which applications do people use both logarithms and base 10 as the default? I mean, especially these days. In the olden days of the slide rule, base 10 was probably more common.

            • holowoodman 3 hours ago

              > In which applications do people use both logarithms and base 10 as the default?

              Engineering. For example dB is defined in base 10.

          • gbacon 2 hours ago

            Base 2 is also sometimes abbreviated lg.

        • mort96 8 hours ago

          In this case, and in many other cases, log without further specification is meant to be understood as just "it is logarithmic" without further specification. With big O, we don't differentiate between log bases, just as we don't differentiate between scale factors. In fact, converting from one log base to another is just multiplying by a constant.

        • griffzhowl 9 hours ago

          In information theory it usually means log base 2

        • eviks 9 hours ago

          no, that's what ln is for

          • thaumasiotes an hour ago

            Well, you're right that that's what "ln" is for. But more specifically "ln" is for indicating the natural log on calculators that already have another button labeled "log". Tarq0n is correct that "log" without further specification can be assumed to be the natural log.

  • cyrillite 12 hours ago

    One of my favourite sci comms YouTubers explained this in great detail https://youtu.be/8JuWdXrCmWg

    Highly recommend

    • teekert 9 hours ago

      Yeah, whereas Hossenfelder is getting more and more dramatic (though I can still appreciate her), she has a refreshingly calm and intelligent tone. Highly recommended indeed.

      • waynecochran 2 hours ago

        She has taken to clickbait -- e.g., using the word "shocking" and showing pictures with her mouth open (actually she can't really open her mouth, but you get the idea).

      • eru 6 hours ago

        Sabine Hossenfelder also has very 'interesting' ideas about determinism.

        • teekert 5 hours ago

          Which I share. It’s haunted me for a long time, but I’ve accepted it. Much like Sabine.

          We can’t predict the future, but we do not have free will as most people think we have, imho. Many of those split-brain cases seem to confirm this stance.

          • hnuser123456 5 hours ago

            Sure, but if someone finds themselves feeling incredibly defeated by the thought, then how can we call it productive philosophy? I went too far down this rabbit hole about 8 years ago, and built up a strong mindset of all the things I wanted to be that I couldn't because I wasn't born in the right circumstances. Much better to feel like I can absolutely be those things at least in spirit, and maybe talk to other people about it and find people who are willing to see the unexpected parts of me.

            Yes, we have enough accurate theories now that we can predict parts of the future with incredible accuracy in ways that weren't possible 100+ years ago, but we don't have a bulletproof theory of everything, much less a bulletproof theory of everything about humans.

            Championing superdeterminism is like being the smart alec who says they can predict anything if you give them enough initial context. Sure, now go make risky investments that will only go up.

            The Heisenberg uncertainty principle itself shows that it is not worth fretting too much about superdeterminism.

            We will never be able to replace every theory that uses probabilities with ones that don't.

          • griffzhowl 4 hours ago

            We can evaluate various courses of action, and pick one based on that evaluation. I think something like that is what most people think of as having free will. If we discover that evaluation function is deterministic, it shouldn't change our attitudes to it imo. This is how we normally think of the past: what we did in the past is now determined, and yet we were just as free then as we are in the present. And we presently either suffer or enjoy the consequences of those past decisions, so we better take our present decisions reasonably seriously.

            In general I'm quite sceptical of taking physical theories as guidance for life, even more so speculative interpretations of physical theories. Physics can tell us, probabilistically, how relatively simple systems will evolve in time, but most events in our life are way beyond what it can predict, so that should caution us against extrapolating its concepts to these more complicated phenomena. We don't know what conceptual innovations might be required to have a fully coherent and precise account of them, or if that is even possible. The best insight we can get on our life is to actually live it and reflect on those experiences.

            Other considerations are that we don't currently have a fundamental physical theory, since general relativity and the standard model don't work together. Even when they do apply, the equations can't be solved exactly for systems of more than two particles. Everything else involves approximations and simplifications, and in fact even the concept of particle is only valid in a non-relativistic approximation. That suggests to me that physical theories are best thought of at the moment as tools that have a limited domain of effectiveness, though of course within that domain they're extremely effective.

    • lherron 9 hours ago

      Thanks for this! Subscribed.

  • JohnKemeny 10 hours ago

    Related:

    For algorithms, a little memory outweighs a lot of time. 343 points, 139 comments, 39 days ago. https://news.ycombinator.com/item?id=44055347

    You need much less memory than time. 126 points, 11 comments, 22 days ago. https://news.ycombinator.com/item?id=44212855

    • ALLTaken an hour ago

      I am trying to understand how the paper OP posted can be translated into anything that improves current algorithms.

      Can any current algorithms actually be improved by this? Or is that only the case if the hardware itself is improved?

      I am baffled

      • JohnKemeny 33 minutes ago

        This is blue skies research and is not expected to have any immediate practical impact whatsoever.

  • mikewarot 11 hours ago

    I get that this is an interesting theoretical proof. I'm more interested in the inverse, trading memory for time, to make things faster, even if they take more space. It seems to me the halting problem is almost a proof that the inverse of this is impossible.

    Memoization is likely the best you can do.
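
    The classic toy example of that trade, for anyone who hasn't seen it (nothing to do with the paper):

        from functools import lru_cache

        @lru_cache(maxsize=None)    # spend O(n) memory...
        def fib(n):
            return n if n < 2 else fib(n - 1) + fib(n - 2)

        print(fib(200))             # ...to go from exponential to linear time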

    • LegionMammal978 9 hours ago

      The author of the paper also notes how the result gives an upper bound for how well we can 'trade space for time' in the worst case:

      > In this paper, we make another step towards separating P from PSPACE, by showing that there are problems solvable in O(n) space that cannot be solved in n^2/log^c n time for some constant c > 0. [0]

      Of course, this is all specifically in the context of multitape Turing machines, which have very different complexity characteristics than the RAM machines we're more used to.

      [0] https://arxiv.org/abs/2502.17779

    • burnt-resistor 7 hours ago

      Dynamic programming and divide-and-conquer with multiple cores on a diagonalized problem space of independent problems (CPU and memory hierarchy) with (hopefully) a final reduce step at the end. Real-world problems are seldom so neat, hence the investment in Infiniband gear.
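
      Shape-wise, something like this generic map/reduce sketch (my own toy example; solve() is a stand-in for whatever the independent subproblem is):

          from functools import reduce
          from multiprocessing import Pool

          def solve(chunk):                    # one independent subproblem
              return sum(x * x for x in chunk)

          if __name__ == "__main__":
              chunks = [range(i, i + 1000) for i in range(0, 8000, 1000)]
              with Pool() as pool:
                  partials = pool.map(solve, chunks)       # fan out across cores
              print(reduce(lambda a, b: a + b, partials))  # final reduce step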

    • 8 hours ago
      [deleted]
  • trueismywork 13 hours ago
  • krackers 3 hours ago
  • gwbas1c 7 hours ago

    One thing that's surprised me throughout my career is just how inefficient most of the code that I've inherited is. I suspect we could do a lot more with the hardware we have if we simply became better at programming.

    (Perhaps more AI coaching could help?)

    • jiehong 6 hours ago

      I'm also surprised by how many developers just don't seem to care much.

      Yet, the general population isn't behaving like scouts most of the time, so I suppose it's only fair that developers also aren't.

      "Leave the world a better place than you found it" (s/world/code/g)

      • Agingcoder 3 hours ago

        I’ve spent many years working on HPC problems, and most people simply don’t have performance problems, or don’t care about them, or don’t know what to do with them.

        Don’t have them, because they usually have a full machine, and unless they saturate it, the problem doesn’t exist.

        Don’t care, because many people think it’s alright if things are slow, or inefficient (the user can wait, or will live with 100ms instead of 1ms).

        Don’t know what to do - this one is interesting: many people don’t realize it’s actually possible to get the same result, but faster.

        • gwbas1c 3 hours ago

          > Don’t care, because many people think it’s alright if things are slow, or inefficient (the user can wait, or will live with 100ms instead of 1ms).

          It's not that.

          I've encountered code where someone wrote complicated "scale out" code instead of learning how to use a database and fixing the N+1 problem; and another case where someone wrote complicated caching code instead of learning how to avoid Cartesian explosion.

    • HideousKojima 7 hours ago

      AI is trained on the same shitty code you're inheriting, so probably not.

      • humanfromearth9 7 hours ago

        It's also trained on all the best practices and algorithms that you don't know exist, so it is able to do better -- provided you know to ask, and how and what to ask.

        • HideousKojima 6 hours ago

          It's not simply a matter of knowing what/how to ask. LLMs are essentially statistical regressions on crack. This is a gross oversimplification, but the point is that what they generate is based on statistical likelihoods, and if 90%+ of the code they were trained on was shit you're not going to get the good stuff very often. And if you need an AI to help you do it you won't even be able to recognize the good stuff when it does get generated.

  • gweinberg 3 hours ago

    I don't understand what they are saying at all. If I have a computation that requires going through a loop n times, why should the memory requirements increase with n at all?

    • svat 37 minutes ago

      Of course in many cases (such as the one you mentioned), the amount of memory needed does not grow with N. But there are other programs/computation that seem to require more memory. For example, consider the following (Python) program:

          def intpow(n, r): return int(n**r)
          N = 10000
          a = [0] * N
          a[1] = 1
          for m in range(2, N):
              a[m] = a[m-1] + a[intpow(m, 0.6)] + a[intpow(m, 0.3)] + a[intpow(m, 0.1)]
      
      It populates an array of size N by, for each index m, accessing array elements at smaller values m-1, m^0.6, m^0.3 and m^0.1 (just arbitrary values I made up). So this is a program that runs in time about N, and as it can depend on array elements that go very far back, it may not be immediately obvious how to make it run with space significantly less than N^0.9 (say), let alone √N. But the result shows that it can be done, and for any program no matter how weird.
    • jasperry an hour ago

      Sure, there are plenty of computations where you don't need much space no matter how many times you loop--like if you're just updating a single sum. But there are other problems where each time through the loop you might need to add an element to a big array or matrix, and then you go back and do more computation on the saved data elements to get the answer.

      Before this result, there were believed to be at least some problems where you need (almost) as much space as time, but Williams shows that, theoretically at least, that's never necessarily the case.
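
      Toy versions of the two cases (made-up examples, just to illustrate):

          # O(1) extra space: the loop only keeps one running value
          def total(xs):
              s = 0
              for x in xs:
                  s += x
              return s

          # O(n) space the obvious way: each step re-reads what was saved so far
          def weird(n):
              a = [1]
              for m in range(1, n):
                  a.append(sum(a[k] for k in range(0, m, 2)))
              return a[-1]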

  • snickerbockers 23 minutes ago

    >(Technically, that equation was t/log(t), but for the numbers involved log(t) is typically negligibly small.)

    My dude, that is NOT how rational numbers work.

  • kristianp 11 hours ago

    This seems very theoretical, just a lower bound on space required, without talking about what is being computed. Does it have any import on real algorithms?

    • wiz21c 11 hours ago

      Lower bound tells you how much it's worth to improve the SOTA. It gives you a hint that you can do better.

      So it's more like a pole star: maybe not directly practical, but it will lead tons of people in the right direction.

      • Gravityloss 11 hours ago

        Thanks. I think the title is quite misleading then? I would have expected better from Scientific American.

    • wasabi991011 9 hours ago

      It's a reduction from multitape Turing machines to Tree Evaluation instances. If your "real algorithm" is straightforwardly modeled by a multitape TM, then it might be possible to use this proof to reduce the space. Otherwise, I don't think so.
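
      For reference, a Tree Evaluation instance looks roughly like this (a naive recursive evaluator, not the space-efficient Cook-Mertz algorithm):

          # leaves hold values; internal nodes hold functions of their children
          def evaluate(node):
              if isinstance(node, int):   # leaf
                  return node
              f, left, right = node       # internal node
              return f(evaluate(left), evaluate(right))

          xor = lambda a, b: a ^ b
          tree = (xor, (xor, 1, 0), (xor, 1, 1))
          print(evaluate(tree))           # prints 1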

    • CommenterPerson 10 hours ago

      It does have an impact. It won't give you Stack Overflow-like code to copy-paste, though.

      The analogy is thermodynamics, which says how efficient a heat engine _could_ be. If your engine is way below that, you know how much of an improvement there _could_ be, and that it's not an impossible problem. That will get people to build better stuff.

    • bmacho 10 hours ago

      Upper bound on the space required, for anything that is computable.

      > Does it have any import on real algorithms?

      It depends on the constants, but if the constants are good, you can have VMs for example that make memory usage smaller.

    • kadoban 11 hours ago

      > Does it have any import on real algorithms?

      As-in algorithms you'll care about if you're a programmer and not overly interested in theory? Not really, no.

    • 11 hours ago
      [deleted]
  • 7 hours ago
    [deleted]
  • baruchel 3 days ago
  • bluenose69 9 hours ago

    Here's a quote from the SciAm article: "Technically, that equation was t/log(t), but for the numbers involved log(t) is typically negligibly small."

    Huh?

    • asimpletune 6 hours ago

      I think this means that while log grows to infinity, it does so slowly enough that it can often be treated as if it were a coefficient. Coefficients are ignored in big O notation, hence the "negligibly small" comment.

    • burnt-resistor 7 hours ago

      Maybe I'm missing context, but that sounds like O(n) or Ω(n).

    • fwip 6 hours ago

      t/log(t) is 'closer to' t than it is to sqrt(t) as t heads toward infinity.

      e.g:

          4/log2(4) = 4/2 = 2
          sqrt(4) = 2
      
          2^32/log2(2^32) = 2^32/32 = 2^27
          sqrt(2^32) = 2^16
      • tgv 4 hours ago

        In case someone doesn't like the proof by example, here's a hint: sqrt(t) = t / sqrt(t).