V8 Garbage Collector

(wingolog.org)

60 points | by swah 4 hours ago

20 comments

  • ZeroConcerns 3 hours ago

    Interesting article! One thing that made me literally LOL was the fact that several exploits were enabled via a Google "style recommendation" that caused on-heap length fields to be signed and thus subject to sign-extension attacks.

    The conversation-leading-up-to-that played out a bit like this in my head:

    Google Engineer #1: Hey, shouldn't that length field be unsigned? Not like a negative value ever makes sense there?

    GE#2: Style guide says no

    GE#1: Yeah, but that could easily be exploited, right?

    GE#2: Maybe, but at least I won't get dinged on code review: my metrics are already really lagging this quarter

    GE#1: Good point! In fact, I'll pre-prepare an emergency patch for that whole thing, as my team lead indicated I've been a bit slow on the turnaround lately...
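
To make the sign-extension point concrete, here is a minimal C++ sketch. It is not V8's actual code: the function names are hypothetical, and it assumes a 32-bit on-heap length field that gets widened to a 64-bit size before use. A corrupted signed length of -1 slips past a naive `<=` bounds check and then sign-extends into an enormous size, while the same bit pattern in an unsigned field zero-extends to a bounded value.

```cpp
#include <cstdint>

// Hypothetical helpers (not V8 code) illustrating how a 32-bit length field
// behaves when widened to 64 bits.

// Signed field: converting to int64_t sign-extends, so a corrupted value of
// -1 (bit pattern 0xFFFFFFFF) becomes 0xFFFFFFFFFFFFFFFF as an unsigned size.
uint64_t widen_signed_length(int32_t length) {
    int64_t widened = length;               // sign-extension
    return static_cast<uint64_t>(widened);  // huge if length < 0
}

// Unsigned field: converting zero-extends, so the result never exceeds
// UINT32_MAX.
uint64_t widen_unsigned_length(uint32_t length) {
    return length;  // zero-extension
}

// A naive signed bounds check that lets negative lengths through.
bool naive_in_bounds(int32_t length, int32_t capacity) {
    return length <= capacity;  // -1 <= capacity is always true here
}
```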

    • dbdr 2 hours ago

      Quote from their style guide:

      > The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler.

      Fair enough, but signed arithmetic doesn't model the behavior of a "simple integer" (supposedly the mathematical concept) either. Instead, overflow in signed arithmetic is undefined behavior. Does that actually lead to the compiler being able to diagnose bugs? What's the claimed benefit exactly?

      • dzaima 6 minutes ago

        A sanitizer or static analysis or any other tool can unconditionally give you a warning/error on signed integer overflow. Whereas that's invalid for unsigned integers as they have well-defined behavior, and things depend on said overflow (hashing, bitwise magic, temporary wrapping that unwraps later, etc).

        Ideally there'd be a third type for unsigned-non-wrapping-integer (and llvm even supports a UB-on-unsigned-wrapping flag for arith ops in its IR that largely goes unused for C/C++), but alas such doesn't exist. Half-relatedly, this previously appeared as a discussion point on Linux (though Linus really did not like the concept of multiple unsigned types and as such it didn't go anywhere iirc).
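
As an illustration of code that depends on unsigned wraparound (the hashing case dzaima mentions), here is a minimal C++17 sketch of 32-bit FNV-1a. FNV-1a is just one example hash chosen for brevity. The multiply is meant to wrap modulo 2^32, so a tool that reported every unsigned wrap as an error would produce false positives on exactly this kind of code.

```cpp
#include <cstdint>
#include <string_view>

// 32-bit FNV-1a. The multiplication intentionally wraps modulo 2^32, which is
// well-defined for uint32_t; the same overflow on a signed type would be UB.
uint32_t fnv1a32(std::string_view data) {
    uint32_t hash = 2166136261u;  // FNV-1a 32-bit offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 16777619u;        // FNV 32-bit prime; wraps on overflow
    }
    return hash;
}
```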

      • sltkr 8 minutes ago

        Tools like UBSan [1] can detect integer overflow in debug builds, and are used internally at Google to run automated tests.

        So if you use a signed integer, there is a chance that overflows are caught in tests.

        1. https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

      • gf000 2 hours ago

        I believe part of the logic behind it may be that with unsigned you can't recognize that an overflow has happened, but with signed you can recognize overflows and underflows in certain cases by simply checking whether the result is still non-negative.

        At least I believe Java decided on signed integers for similar reasons. But if it's indeed UB in C++, it doesn't make sense.
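
A minimal C++ sketch of the distinction gf000 raises, under standard C++ rules: because signed overflow is undefined behavior, "add, then check whether the result went negative" is not a valid test in C++, so the check has to happen before the addition. Unsigned addition wraps modulo 2^N, so wraparound can be detected after the fact. The function names are hypothetical; GCC and Clang also provide `__builtin_add_overflow` for the same purpose.

```cpp
#include <climits>

// Signed: check *before* adding, since the overflowing addition itself is UB.
// Neither INT_MAX - b (for b > 0) nor INT_MIN - b (for b < 0) can overflow.
bool add_would_overflow(int a, int b) {
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

// Unsigned: the addition is well-defined (it wraps), and the wrapped sum is
// smaller than an operand exactly when wraparound occurred.
bool unsigned_add_wrapped(unsigned a, unsigned b) {
    return a + b < a;
}
```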

        • alpinisme 42 minutes ago

          It’s the opposite in cpp: unsigned integer overflow is undefined but signed overflow is defined as wrapping

          • sltkr 2 minutes ago

            No, it's the opposite. UNSIGNED overflow wraps around. SIGNED overflow is undefined behavior.

            This leads to fun behavior. Consider these functions which differ only in the type of the loop variable:

                int foo() {
                    for (int i = 1; i > 0; ++i) {}
                    return 42;
                }
                
                int bar() {
                    for (unsigned i = 1; i > 0; ++i) {}
                    return 42;
                }
            
            If you compile these with GCC with optimization enabled, the result is:

                foo():
                .L2:
                    jmp     .L2
            
                bar():
                    mov     eax, 42
                    ret
            
            That is, foo() gets compiled into an infinite loop, while the loop in bar() is eliminated. This is because only in the first case may the compiler assume that i never overflows.

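
A scaled-down sketch of bar() that makes the wraparound observable. It swaps `unsigned` for an 8-bit `uint8_t` (an assumption made so the loop finishes quickly), so the loop runs 255 times instead of 2^32 - 1 times with a 32-bit `unsigned`. The function name is hypothetical. There is no equivalent "safe" version of foo() to run, since its behavior is undefined.

```cpp
#include <cstdint>

// Scaled-down bar(): ++i promotes i to int, computes 256 at the top of the
// range, and converting 256 back to uint8_t yields 0 (modulo 256, which is
// well-defined). That ends the loop.
int count_unsigned_iterations() {
    int iterations = 0;
    for (uint8_t i = 1; i > 0; ++i) {
        ++iterations;
    }
    return iterations;  // i took the values 1..255
}
```
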
          • debugnik 14 minutes ago

            Did you mix up unsigned and signed by mistake? Because in C and C++, the wrapping one is unsigned and the here-be-dragons-on-overflow one is signed.

    • Leszek 2 hours ago

      The signed length fields pre-date the sandbox, and at that point being able to corrupt the string length meant you already had an OOB write primitive and didn't need to get one via strings. The sandbox is the new weird thing, where now these in-sandbox corruptions can sometimes be promoted into out-of-sandbox corruptions if code on the boundary doesn't handle these sorts of edge cases.
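
Leszek's point about code on the sandbox boundary can be sketched as follows. This assumes a signed 32-bit length field read from in-sandbox (attacker-writable) memory. The function name and the `max_length` parameter are hypothetical, not V8's API. The idea is to reject negative values before any widening to `size_t`, then bound the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical boundary helper (not V8's API): validate a length that code
// inside the sandbox may have corrupted before using it outside the sandbox.
std::optional<std::size_t> validate_untrusted_length(int32_t raw_length,
                                                     std::size_t max_length) {
    if (raw_length < 0) {
        return std::nullopt;  // would otherwise sign-extend to a huge size
    }
    std::size_t length = static_cast<std::size_t>(raw_length);
    if (length > max_length) {
        return std::nullopt;  // larger than the backing store allows
    }
    return length;
}
```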

  • whizzter an hour ago

    I don't envy these engineers having to trace through corruptions and other issues related to moving GCs; just keeping a simple, regular toy GC from blowing up can be hard enough sometimes. (Maybe they have better tools, but memory corruptions are inherently prickly to debug.)

  • NeutralForest 2 hours ago

    It's an interesting article because tech articles rarely revisit past decisions and explain why they were made. Thanks! Also, it's always cool to see a Wingo article, because I get exposed to a field I know very little about (how garbage collection works).

  • maartin0 3 hours ago

    What does FTE stand for?

    > From what I can tell, there have been about 4 FTE from Google over this period

    • jlokier a few seconds ago

      It stands for "Full Time Equivalent".

      It's a measure of time spent working on something, used to standardise comparisons of work capacity and to acknowledge that the work is not always full time, especially when aggregating the time of different people. One full-time person == 1 FTE.

      For example if you work 20 hours a week on project A and 20 hours on project B, then project A will count your contribution as 0.5 FTE while you're assigned to that project.

      If you also have two other people working on it full time, and a project manager working 1 day a week on it, then project A will count the contribution from all four of you as 2.7 FTE (2.7 = 0.5 + 2 + 0.2).
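
jlokier's arithmetic as a small C++ sketch, assuming a 40-hour full-time week (so one 8-hour day a week is 0.2 FTE). The function name and the 40-hour constant are hypothetical choices for illustration.

```cpp
#include <vector>

// Assumed length of a full-time week, in hours.
constexpr double kFullTimeHoursPerWeek = 40.0;

// Sum everyone's weekly hours on the project and express the total in FTE.
double project_fte(const std::vector<double>& weekly_hours_per_person) {
    double total_hours = 0.0;
    for (double hours : weekly_hours_per_person) {
        total_hours += hours;
    }
    return total_hours / kFullTimeHoursPerWeek;
}
```

With the example above, {20, 40, 40, 8} hours sums to 108, and 108 / 40 = 2.7 FTE.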

    • kannanvijayan 3 hours ago

      Full Time Employee

      • DanielHB 36 minutes ago

        Is this a codeword for "not contractor"? I heard that at Google contractors are second-class citizens.

        • ColonelPhantom 4 minutes ago

          I think FTE is mostly used as a 'unit'. E.g. if two people work on something 50% of the time, you get one "FTE-equivalent", as there is roughly one full-time employee of effort put in.

          Though in this context it just seems to be the number of people working on the code on a consistent basis.

          • layer8 3 minutes ago

            The “E” in “FTE” actually already stands for “equivalent”.

    • comonoid an hour ago

      FTE is a TLA.