Why tail-recursive functions are loops

(kmicinski.com)

58 points | by speckx 3 days ago

47 comments

  • LegionMammal978 28 minutes ago

    Several people in this thread are saying that tail recursion is superior to imperative iteration in that it explicitly specifies which variables may change in each iteration (assuming a broadly immutable language).

    To the contrary, I'd argue that immutability isn't the only alternative to universal mutability: many newer imperative languages (such as Rust) have various forms of controlled mutability, where one must explicitly declare which variables may be modified by imperative assignments.

    IME, controlled mutability captures just about all the benefit of immutable languages, without requiring any special data structures or sufficiently-smart compiler analyses to ensure good performance. I've never really understood the desire for tail-recursive versions of iterative algorithms, except for a prior commitment to functional programming.

  • hinkley 3 hours ago

    Practically the day after I learned about tail recursion in CS class, I learned that almost all recursive calls can be translated to iteration, that in many cases the iterative version is easier to scan and as fast if not faster, and that it can usually handle much, much larger inputs than recursion by avoiding stack overflow.

    Tail recursion is meant to fix the latter. But what we mean to happen and what actually happens ain't ever exactly similar.

    Tail recursion IME is a bigger foot gun than relying on someone to add a new conditional branch at the end of a block in an iterative algorithm without fucking it up in the process. And iteration generally responds better to Extract Function. And while I can think of counter-cases easily enough, in the large, iteration is less work and less vigilance. And you cannot scale a project up without the vigilance requirement amortizing basically to 0 per line of code.

    • tombert 3 hours ago

      In my mind, the biggest advantage to using tail recursion over vanilla loops is the ability to keep using persistent data structures without any (apparent) mutation.

      At least in theory, a tail recursive call will be converted into a dumb jump, so there shouldn't be a performance penalty, but since from your code's perspective you're passing in the stuff for the next iteration, you can keep using the pretty and easy-to-reason-about structures without creating any kind of mutable reference.

      I'm not 100% sold on tail recursion in a broader sense, but at least with Clojure's loop/recur stuff it is kind of cool to be able to keep using persistent data structures across iterations of loops.

      • weitendorf 2 hours ago

        What makes tail recursion "special" is that there exists a semantically equivalent mutable/iterative implementation to something expressed logically as immutable/recursive. [0]

        Of course, this means that the same implementation could also be directly expressed logically in a way that is mutable/iterative.

        func pow(uint base, uint n, uint acc = 1): return n == 0 ? acc : pow(base, n - 1, acc * base)

        is just

        func pow(uint base, uint n): uint res = 1; for (uint i = 0; i < n; i++) { res *= base }; return res

        There is no real "advantage" to, or reason to "sell" anybody on, tail call recursion if you are able to easily and clearly represent both implementations, IMO. It is just a compiler/runtime optimization, which might make your code more "elegant" at the cost of obfuscating how it actually runs + new footguns from the possibility that code you think should use TCO actually does not (because not all immutable + recursive functions can use TCO, only certain kinds, and your runtime may not even implement TCO in all cases where it theoretically should).

        As an aside, in C++ there is something very similar to TCO called copy-elision/return-value-optimization (RVO): [1]. As with TCO, it is IMO not something to "buy into" or sell yourself on; it is just an optimization you can leverage when structuring your code in a way similar to what the article calls "continuation passing style". And just like TCO, RVO is neat but IMO slightly dangerous because it relies on implicit compiler/runtime optimizations that can be accidentally disabled or made non-applicable as code changes: if someone wanders in and makes small semantic changes to my code relying on RVO/TCO for performance, they could silently break something important.

        [0] EXCEPT in practice all implementation differences/optimizations introduce observable side effects that can otherwise impact program correctness or semantics. For example, a program could (perhaps implicitly) rely on the fact that it errors out due to stack overflow when recursing > X times, and so enabling TCO could cause the program to enter new/undesirable states; or a program could rely on a function F performing X floating point operations in at least Z microseconds, and not function properly when F takes less than Z microseconds after enabling vectorization. This is Hyrum's Law [2].

        [1] https://en.wikipedia.org/wiki/Copy_elision#Return_value_opti...

        [2] https://www.hyrumslaw.com/

        • gleenn an hour ago

          I would argue that having the parameters that change during the loop be explicit is a very nice advantage. Agreed that the two can be equivalent in terms of execution, but the readability and explicitness, and the fact that all the parameters are listed in the same place, are great.

          • weitendorf 38 minutes ago

            Agreed. Some people really like FP a lot, and I think it's underrated that the kinds of functions where TCO is applicable tend to be so simple that they are not really that inelegant when expressed imperatively. My true opinion is that relying on TCO is usually choosing ideological adherence to FP (or "code that looks cooler") over reliability/performance/communication.

            That said, just as I'd expect experienced C++ programmers to be able to recognize others' code using RVO and be careful not to restructure things to break it, I'd concede that experienced FP programmers might be unlikely to accidentally break others' TCO. It's just that ad absurdum you cannot expect every developer to be able to read every other developers' mind and recognize/workaround all implicit behavior they encounter.

      • ramchip an hour ago

        I think you're confusing mutation and variable reassignment?

        • tombert 44 minutes ago

          I'm saying that if I do a regular loop, something needs to explicitly mutate, either a reference or a value itself.

          `for (var i = 0; i < n; i++)`, for example, requires that `i` mutate in order to keep track of the value so that the loop can eventually terminate. You could also do something like:

              var i = new Obj(); 
              while (!i.isDone()) {
                  //do some stuff
                  i = new Obj(); 
              }
          
          There, the reference to `i` is being mutated.

          For tail recursion, it's a little different. If I did something like Factorial:

              function factorial(n, acc) {
                  if (n <= 1) return acc;
                  return factorial(n - 1, acc * n);
              }
              
          Doing this, there's no explicit mutation. It might be happening behind the scenes, but everything there from the code perspective is creating a new value.

          In something like Clojure, you might do something like

              (defn vrange [n]
                (loop [i 0 v []]
                  (if (< i n)
                    (recur (inc i) (conj v i))
                    v)))
          
          (stolen from [1])

          In this case, we're creating "new" structures on every iteration of the loop, so our code is "more pure", in that there's no mutation. It might be mutated in the implementation, but not from the author's perspective.

          I don't think I'm confusing anything.

          [1] https://clojure.org/reference/transients

      • charcircuit an hour ago

        >without creating any kind of mutable reference.

        The parameter essentially becomes a mutable reference.

        • tombert 44 minutes ago

          No disagreement on that, but that's an implementation detail; from the coder's perspective there are no mutable references.

          • weitendorf 36 minutes ago

            From the coder's perspective, there are no mutable references only if the coder does not really rely on or care about the possibility that their code uses TCO. If they actively want TCO then they definitely care about the performance benefits they get from underlying mutability/memory reuse/frame elision.

          • charcircuit 42 minutes ago

            From the coder's perspective it is. There's just different syntax for assigning to them.

            • tombert 36 minutes ago

              No, from the coder's perspective it looks like I'm passing an argument into a function. I suppose you could say that arguments in general are "mutable references", and I guess that's not necessarily "untrue", but that's not generally how I visualize it.

              If I pass a persistent structure into a tail call, it looks like I'm passing a copy into there, in the same way that I might if I did something like `Math.abs(myValue * 3)`; the compiler converts it to a mutable reference but I see it as the same as passing a value.

    • bjoli 2 hours ago

      I am in the other camp. I prefer tail recursion and recursion over loops. However: for the simple cases it can and should probably be abstracted away, like Racket's for loops or my own goof-loop [1].

      I just feel that recursive calls make state much clearer, but then again I am no fan of mutation in general. In my old Python days I think a good 60% of my bugs were due to me being bad at managing state.

      [1] https://rikspucko.koketteriet.se/bjoli/goof-loop

      • iamevn 6 minutes ago

        I'm in the same boat, recursion tends to be easier for me to reason about because I'm expressing the problem in terms of some base case that incoming parameters are being reduced to rather than some condition that an iterative approach is working towards.

    • javcasas 2 hours ago

      Recursion deals with recursive data structures, and iteration deals with "plain old" data structures.

      When you use one to process the other, you get into trouble. For example, you need to manage a stack to do iteration on your binary tree. Or you need to construct slices to recurse on your arrays.

      It's all about ergonomics.
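
      To make the array side of that concrete, here is a minimal Python sketch (my own illustration, not from the comment; sum is just a stand-in): recursing over a flat array means manufacturing sub-slices, while iterating over it is direct.

          def sum_rec(xs):
              # recursion on a "plain old" array: slices all the way down
              return 0 if not xs else xs[0] + sum_rec(xs[1:])

          def sum_iter(xs):
              # iteration fits the flat structure directly
              total = 0
              for x in xs:
                  total += x
              return total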

      • dreamcompiler 2 hours ago

        This is overly simplistic.

        Is a binary tree recursive from the perspective of a type declaration? Yes.

        Is it recursive from the point of view of search? Depends. If you're doing depth-first search, then you need a stack and a recursive algorithm seems natural. If you're doing breadth-first search, then you need a queue and the algorithm is less obviously recursive.

        In either case the program could be expressed recursively or as an explicit loop. When a stack is needed, the underlying hardware always provides automatic stack management so recursion feels natural in that case.

        When a queue is needed it's more of a tossup since hardware and programming languages don't generally provide automatic queue management, so you're going to need to manage that data structure manually anyway. In which case whether you choose to use recursion or not is more of a personal preference.

        • javcasas an hour ago

          So a tree is easier to process with recursion than with iteration in some cases, whereas in other cases recursion and iteration have roughly the same level of complexity.

          Kinda shows that recursive data structures are usually better dealt with via recursion. Worst case you end up with the same level of difficulty as iteration.

          • busterarm 35 minutes ago

            My career is practically built on the fact that 99% of the time, other people miss the simplest representation of the data/problem.

            Just look at myriad ways that people implement something like checking win conditions in Rock, Paper, Scissors. Switch statements, conditional hell, etc.

            ...you can just do something as simple as sticking Rock, Scissors, Paper into a circular list and comparing to the next element. It's practically readable as plain English. No programming skill required.

            If you need to code golf you can use modulo arithmetic, for a different kind of "simple", but then you lose readability.
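
            A minimal sketch of the circular-list idea (Python; the names and the index-plus-wraparound lookup are illustrative, just one way to realize the ring):

                # each move beats the move that follows it in the ring
                MOVES = ["rock", "scissors", "paper"]

                def beats(a, b):
                    return MOVES[(MOVES.index(a) + 1) % len(MOVES)] == b

                assert beats("rock", "scissors")
                assert beats("scissors", "paper")
                assert beats("paper", "rock")  # the list wraps around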

    • CalChris 2 hours ago

      I must have missed this class. How does one convert a recursive descent parser into an iterative one?

      • Jtsummers 2 hours ago

        Every recursive algorithm can be made into an iterative algorithm if you use an explicit stack instead of the implicit call stack. It's not always cleaner.

        In tail-recursive algorithms, there is no stack; it's just a straight loop.

          def foo(state, acc): # if acc is needed
            if base_condition(state): return acc
            return foo(next_state(state), f(state, acc))
        
        Is simply:

          def foo(state):
            acc = initial  # whatever the recursive version was first called with
            while not base_condition(state):
              acc = f(state, acc)
              state = next_state(state)
            return acc
        
        If it's not tail recursive you introduce a stack, for instance a DFS on a binary tree:

          def search(node, val):
            if node is None: return False # empty tree, only need this check once
            stack = [node]
            while stack:
              n = stack.pop()
              if n.val == val: return True
              if n.right: stack.append(n.right)
              if n.left:  stack.append(n.left)
            return False
        
        Note the insertion order is reversed from the recursive calls in a typical DFS. We want left to be searched first and then its children and then we "recur" back to right and deal with its children, so we need to push right into the stack first and then left.

        When you have multiple mutually recursive functions (as is likely the case in a recursive descent parser) then things become more complicated, but it's still feasible.
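
        For the mutually tail-recursive case specifically, one standard trick is a trampoline (a sketch of mine with illustrative names, not taken from the parser discussion): each function returns a description of its next call instead of making it, and a single loop dispatches. Non-tail mutual recursion still needs an explicit stack of pending work.

          def is_even(n):
            # return either a final answer or the next call to make
            return ("done", True) if n == 0 else ("call", is_odd, n - 1)

          def is_odd(n):
            return ("done", False) if n == 0 else ("call", is_even, n - 1)

          def trampoline(f, *args):
            result = f(*args)
            while result[0] == "call":
              _, f, *args = result  # unpack the next function and its args
              result = f(*args)
            return result[1]

          trampoline(is_even, 100_000)  # constant stack depth, no overflow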

        • fellowniusmonk an hour ago

          When TCO recursion was first developed it was very explicitly called out as a syntactically and structurally improved GOTO, but still fundamentally a GOTO that could take params.

          Recursion isn't physically real; any book that teaches the abstraction before explaining either the call stack (for non-TCO recursion) or the GOTO context is a book actively trying to gatekeep CS and weed out readers. (Not that any of those old Turbo Pascal/Borland books from the 90s actually shipped code that compiled.)

          I had several people on HN of all places try to "teach me" recursion after this simple line inside a larger comment:

          > It's like acting like recursion is physically real (it's not) when it smuggles in the call stack.

          Recursion IS real abstractly. It's just not physically real, it was developed before we knew how DNA/RNA encoding handles the same issues in a more performant way.

          • achierius 44 minutes ago

            > It's just not physically real, it was developed before we knew how DNA/RNA encoding handles the same issues in a more performant way.

            That was a sharp left turn -- how do you figure DNA/RNA are relevant here? I feel like iteration pre-dates our modern understanding of RNA in particular (though I could be mistaken) so I struggle to imagine how DNA/RNA were particularly informative in this regard.

        • LegionMammal978 an hour ago

          Sometimes the messy translation into an explicit stack and dispatch loop is necessary, if you want to pause the calculation, serialize the current state, and reconstitute it later. (E.g., if you want to add disk-saved checkpoints, a frequent hassle in some of my projects.) Coroutines can get you the equivalent of pausing and resuming for a recursive routine, but I'm not aware of any language that lets you serialize the call stack.
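
          As a sketch of what the explicit-stack form buys (my illustration, assuming the stack's items are picklable): once the traversal state is a plain Python list, a checkpoint is just serializing that list.

              import pickle

              def save_checkpoint(stack, path):
                  # the explicit stack *is* the resumable state
                  with open(path, "wb") as f:
                      pickle.dump(stack, f)

              def load_checkpoint(path):
                  with open(path, "rb") as f:
                      return pickle.load(f)  # hand this back to the dispatch loop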

          • Jtsummers an hour ago

            > I'm not aware of any language that lets you serialize the call stack.

            That's basically what continuations provide. Scheme, SML, and others provide them.

            • LegionMammal978 an hour ago

              Continuations allow an inactive call stack to sit around in memory. But do any of those languages let you save a continuation to a file and resume it in a different execution of the program, without massive contortions to the code? That's what I mean by serialization.

      • kylec 2 hours ago

        You can do it by managing your own stack, but I would argue that doing so makes the algorithm LESS easy to scan and has its own sets of problems.

      • electroly 2 hours ago

        Same way as any other algorithm: with an explicit heap-allocated stack. I have a naive parser where I built operator precedence into the grammar as nested productions instead of using an algorithm like shunting yard. An input of ((((((1)))))) blew the stack. Converted it to an explicit stack and it was fine; what was deep for the call stack was not deep for my own heap-allocated stack. Nasty code, though--I think this serves as one of the counterexamples to the idea that the recursive code gets simpler when turning it into iteration. The OP was talking more specifically about tail recursion, which a recursive descent parser won't be.

        • bjoli an hour ago

          There are many languages out there that can grow the stack these days. If you have a language that can do tail recursion, a growable stack is a very neat complement: in Lisps it means being able to write a list-building function in a direct consing way and still be faster than the TCO way.

      • jandrese 2 hours ago

        By making your own stack and using loops.

    • Spivak 2 hours ago

      I'm in your camp: recursive code is hard for the next reader, who might be you, to understand. It's a bit too magic-looking for my taste. If you're looping it should look like a loop; if you're pushing onto a stack you should get to see the stack. It's also safe against modifications which might silently break the tail-call nature of it until you blow out the stack later. It also gives you much saner and more debuggable stack traces, because all the state is in the current frame.

      • busterarm 41 minutes ago

        I never quite understood this recursion is hard/magic sentiment. Maybe it's because I started my CS education when it was still taught out of math departments or because it started with actually using programming languages to do algebra. Then again, the FP Kool-Aid is practically the blood in my veins at this point I guess.

  • tombert 3 hours ago

    I'm a pretty big functional programming nerd and I want to like tail recursion, but I honestly kind of feel like I agree with Guido on it, which is that it kind of breaks the typical stack-trace patterns.

    I have kind of grown to prefer Clojure's loop/recur construct, since it gives you something more or less akin to tail recursion but it doesn't pretend to be actually recursive.

    • dreamcompiler 2 hours ago

      Clojure does it this way because the JVM stupidly doesn't support tail call optimization.

      It is true that TCO messes up your stack traces. It is also true that loops mess up your stack traces, because they don't even create a stack trace. In many languages that support TCO, TCO can be turned off for debugging and enabled for production, so Guido's characterization of the issue is ridiculous.

      • titzer 15 minutes ago

        > It is also true that loops mess up your stack traces

        No, because an indirect tail-call eliminates the calling frame. Afterwards, it looks like the caller of that function called a function it did not call. In fact, with tail-calls a whole pile of steps gets telescoped down to nothing [1]. This is not the case with a loop: it doesn't jump to itself indirectly. Loops don't jump into the middle of other loops in other functions. We can still follow the chain of control flow from the start of the main function, through (non-tail) calls, i.e. the stack, through the start of the loop, and then from some backedge in the loop body we're looking at.

        Tail-calls are absolutely harder to debug in general than loops.

        [1] In the limit, if the entire program was transformed into CPS with tail-calls everywhere, the original execution stack is now strewn throughout the heap as a chain of closures.

      • tombert 40 minutes ago

        I know that's the origin of why Clojure does its manual loop/recur thing, but I've grown to kind of prefer its explicitness. There's no ambiguity about whether it will TCO; it simply won't compile if it's not a tail call.

        I don't agree that being able to turn off TCO in debugging is enough to call Guido's characterization "ridiculous". I often have to debug running production systems, and those are (hopefully) not deployed with debug compilations.

        You're not wrong about loops not creating a stack trace; I guess what bothers me (and maybe Guido) is that with loops there's no expectation of a stack trace, while with a function call there is.

  • munchler 2 hours ago

    Every developer should know how to write a tail-recursive function and understand why it is equivalent to a loop. That said, tail recursion is rarely needed in modern functional programming. For example, why write out a recursive function when a call to `fold` or `map` will do the same thing?
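
    For instance, in Python terms (my sketch; reduce is Python's fold), the hand-rolled tail recursion and the fold express the same loop, but the fold names the pattern:

        from functools import reduce

        def total_rec(xs, acc=0):
            # explicit tail recursion with an accumulator
            return acc if not xs else total_rec(xs[1:], acc + xs[0])

        def total_fold(xs):
            # the same loop, via a standard recursor
            return reduce(lambda acc, x: acc + x, xs, 0)

        assert total_rec([1, 2, 3]) == total_fold([1, 2, 3]) == 6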

    • kmicinski 2 hours ago

      I agree with you--that's a topic I will definitely cover in my blog, too. You make a good point: I know some folks who worked at big financial orgs, writing hundreds of thousands of lines of code, and never wrote general-recursive functions (only used simple recursors like foldl).

    • Eji1700 2 hours ago

      > For example, why write out a recursive function when a call to `fold` or `map` will do the same thing?

      Yeah, this was a big help when I started F#. Basically "if you're using the rec keyword, you're probably missing something", and hell, that even goes for a lot of uses of fold, from the beginner's perspective.

    • xdavidliu 2 hours ago

      It's not entirely true that it does the same thing, even if it gives the same result. For many programming languages, fold and map can only act on non-lazy data structures, so they require O(N) memory for the data that needs to be folded over, while tail-recursive functions usually only use O(1) memory, even stack memory.

      Notable exceptions to the above are python3 with generators, which I believe truly use O(1) memory with map and fold. Haskell has lists that are lazy by default, but if you fold or map over them, it still "forces the thunk" for each element and thus you still end up using O(N) memory.
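
      To illustrate the Python case (my example, not the commenter's; reduce is Python's fold): folding over a generator never materializes the sequence, so memory stays flat.

          from functools import reduce

          lazy = (i * i for i in range(10**6))  # no million-element list is built
          total = reduce(lambda acc, x: acc + x, lazy, 0)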

  • mehulashah an hour ago

    Tail recursion is beautifully deep and simple. It (and as a corollary CPS) makes clear what state matters to your loop (and function) and avoids extraneous variables in loops as well as implicit unneeded state in functions. It also makes it easier to show the correctness of your loops. Sure, there are other functional language constructs like fold and map, if your problem is amenable to them. Tail recursion is more general and simpler.
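
    A small illustration of that point (my sketch, in Python): in a tail-recursive gcd, the parameter list is exactly the loop state, and the tail call is the loop's update step.

        def gcd(a, b):
            # (a, b) is the entire loop state; nothing else can change
            return a if b == 0 else gcd(b, a % b)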

  • odyssey7 16 minutes ago

    High-profile places where this is sadly not implemented include:

    - V8 JavaScript Engine

    - Python

  • wagwang an hour ago

    Normal recursion is just a loop and a stack. Turns out, if you can optimize away the stack, the recursion just becomes a loop.

  • username3 an hour ago

    > I wrote up a detailed blog post about tail call optimization in Elixir/Erlang and its performance. The TLDR sort of is that non-tail-call-optimized recursive functions (body-recursive) can be faster and more memory efficient than TCO functions. This is something that I never thought before; that TCO is always faster seems to be a common misconception. My example is also not contrived - it's an implementation of map.

    https://pragtob.wordpress.com/2016/06/16/tail-call-optimizat...

  • aryonoco an hour ago

    For what it’s worth, in F#, tail recursive functions are turned into equivalent loops by the .NET CLR.

    I actually like the compromise. I get to write safe functional code while getting all the benefits of a highly optimised iterative operation.

  • tylerhou an hour ago

    I disagree with the title; loops are tail-recursive functions, but tail-recursive functions are not loops (in the sense that squares are rectangles, but rectangles are not squares).

    It is true that every tail recursive function can be converted into a semantically equivalent loop via a transformation like CPS (which the author mentions). However, for mutually tail-recursive functions, this conversion loses control flow information. This is because after the CPS transformation, calls to the other function become calls to a continuation; this call usually must be implemented as an indirect jump. On the other hand, mutually tail-recursive functions can call each other with direct/statically-known jumps.

    This loss of information might appear trivial, but in practice it has some important consequences. Classic examples are interpreter loops. It is well-known that computed gotos can result in modest to large speedups for interpreters [1]. The reason why is that computed gotos create an indirect jump per opcode, so a branch predictor can take advantage of correlations between opcodes. For example, looking at Python disassembly, the header of a standard range for loop compiles down to three opcodes: GET_ITER, FOR_ITER, STORE_FAST in sequence [2]. A branch predictor can recognize that the target of the "FOR_ITER" indirect jump will likely be the "STORE_FAST" instruction pointer; it cannot predict this in the naive implementation where jumps for all instructions are "merged" into a single indirect jump / switch at the top of the loop body. In this case, computed goto is effectively equivalent to a CPS transformation whose closures require no storage on the heap.

    Suppose, however, we know even more information about the instruction sequence; for example, we know ahead of time that every FOR_ITER opcode will be followed by a STORE_FAST opcode. We could completely replace the indirect jump with a direct jump to the instruction pointer for the STORE_FAST opcode. Because modern branch predictors are very good, this will have about the same performance in practice as the computed goto loop.

    However, consider the limiting case where we know the entire instruction sequence beforehand. If we write our interpreter as many mutually tail-recursive functions, with one function for every instruction, an optimizing compiler can replace every indirect call with a direct (tail-recursive) call to the function that implements the next instruction's opcode. With a good enough optimizer / partial evaluator, you can turn an interpreter into a compiler! This is known as the first Futamura projection [3].

    To see an example of this in action, I wrote a prototype of a Brainfuck compiler via the Futamura projection; it uses LLVM as a partial evaluator [4]. The main interesting function is `interpret`, which is templated on the program counter / instruction. That is, `interpret` is really a family of mutually tail-recursive functions which statically call each other as described above. For short Brainfuck programs, the LLVM optimizer is able to statically compute the output of the Brainfuck program. (The one in the Godbolt link compiles to a loop, likely because LLVM does not want to unroll the mutual recursion too much.) You can play around with different Brainfuck programs by modifying the `program` string on line 5.

    [1] https://eli.thegreenplace.net/2012/07/12/computed-goto-for-e...

    [2] https://godbolt.org/z/rdhMvPo36

    [3] https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...

    [4] https://godbolt.org/z/WY4j931jf