Mark–Scavenge: Waiting for Trash to Take Itself Out

(inside.java)

166 points | by vips7L 8 months ago ago

48 comments

kibwen 8 months ago

> Modern garbage collection assumes that the weak generational hypothesis holds and that most objects die young

Aside, I'm curious how first-class support for value types and Go-style stack allocation via escape analysis changes the value proposition of the generational hypothesis. If we hypothesize that short-lived objects are local to a single function scope (and thus eligible for stack allocation either explicitly via a value type or heuristically via escape analysis) then it might completely upend the generational hypothesis and make it so that relatively more long-lived objects are getting heap-allocated. Surely someone's done some studies on this?

[-]

masklinn 8 months ago

> it might completely upend the generational hypothesis

The generational hypothesis is about object lifetime, and that doesn't change.

It does change the relevance of the generational hypothesis to garbage collection.

> Surely someone's done some studies on this?

The go team has, and that's why go doesn't have a generational GC. The complexity of adding generational support, especially in a mutation-based language (so needing memory barriers and the like) was found not to benefit when a significant fraction of the newborn objects don't even reach the youngen. See https://github.com/golang/go/discussions/70257#discussioncom... from the current discussion of adding opt-in ad-hoc support for memory regions.

pron 8 months ago

Stack allocation doesn't make a big difference. To see this, consider the working set of a server application vs. the number of threads. Assuming a stack size circa 1MB, thousands of threads are needed to account for a significant portion of the working set. Go and Java's user-mode threads certainly make it more interesting, but not enough to dominate heap objects (arena allocation, which is particularly nice in Zig) is a different matter, and can have a bigger impact).

The main reason for Java's Project Valhalla is not stack allocation, as it has too little impact, but arrays-of-structs on the heap. These matter not so much for their memory-management aspect, but the rate of cache-misses.

ithkuil 8 months ago

If you flip the argument on its head you can frame it as: since most objects die young it's very likely they will stay on the stack and thus it makes sense to invest in an allocation-site optimizer that will put the object on the heap only if static escape analysis says it may escape the lexical scope.

uluyol 8 months ago

You might be interested in this talk: https://go.dev/blog/ismmkeynote

DarkNova6 8 months ago

But stack allocated objects are not part of the heap and therefore not even part of Garbage Collection? And afaik stack allocation is already done for objects which don't escape a method.

[-]

masklinn 8 months ago

Yes, but that’s the point: objects which don’t escape are pretty much all young objects. So by this process the stack captures a significant fraction of the young generation, that young generation never reaches the heap and this is never under consideration by the GC.

Essentially the stack is a form of younggen. It is not as complete (as there are things which must be heap allocated) but because it is, it reduces the benefits of a generational GC… without having much impact on its costs and complexity.

Depending on work load, that competition can be sufficient to make a generational GC net negative.

[-]

pron 8 months ago

1. How significant that portion is may be lower than you think.

2. Stack allocation adds complexity that actually can adversely affect performance. The main problem is that for stack objects to behave more like heap objects, you need to be able to reference them. References into the stack make user-mode threads less flexible. For example, Go takes a significant hit for Go-native interop in goroutines, whereas Java doesn't pay that cost.

3. Why do you, as a user, care if the GC is more complex?

[-]

masklinn 8 months ago

> How significant that portion is may be lower than you think.

And it’s probably higher than you think.

> References into the stack make user-mode threads less flexible. For example, Go takes a significant hit for Go-native interop in goroutines, whereas Java doesn't pay that cost.

This has nothing to do with on-stack structure, it has to do with Go not using C-compatible stacks. If this was an actual issue C would take a hit when calling C.

> Why do you, as a user, care if the GC is more complex?

GC complexity impacts its cost and performances. Generational GCs have more overhead than non-generational ones. That cost needs to be reclaimed by avoiding full collections or the generational GC is a net loss.

And that is what the go team observed on many workloads when they experimented with making Go’s GC generational.

[-]

pron 8 months ago

> And it’s probably higher than you think.

Possibly, but the question is does it negate the need for a generational GC? Judging by Go's poor GC performance compared to Java -- it doesn't.

> This has nothing to do with on-stack structure, it has to do with Go not using C-compatible stacks

Sure, but any user-mode thread implementation will not be using C-compatible stacks (if it wants to be efficient). Java doesn't use C-compatible stacks yet it doesn't take the hit Go does.

> GC complexity impacts its cost and performances. Generational GCs have more overhead than non-generational ones.

But in this case (ZGC) it doesn't come at a cost.

> And that is what the go team observed on many workloads when they experimented with making Go’s GC generational.

Go's GC is several generations behind the GCs in Java. It is also isn't compacting. Java had such a GC -- CMS, which served the JDK well for many years -- until the next generation (G1) and the next-next generation (ZGC) were developed, at which point it made little sense to keep a non-moving collector.

DarkNova6 8 months ago

Thanks for the answer. But is this actual behaviour for the GCs of the JDK? I was certain that at the very least Hotspot makes use of stack allocation as much as possible.

But perhaps the JDK GCs don't care so much about the stack because that is already dealt by the JVM a step prior? In any case, there will likely still be young objects allocated in the heap and this new algorithm might prove useful.

But you can tell I am far from an expert here.

[-]

masklinn 8 months ago

> Thanks for the answer. But is this actual behaviour for the GCs of the JDK? I was certain that at the very least Hotspot makes use of stack allocation as much as possible.

Not really, java has some escape analysis but it's very limited in its ability to stack allocate as it can't put entire structures (objects) on the stack, it only works if the compiler manages to scalar-replace the object (https://shipilev.net/jvm/anatomy-quarks/18-scalar-replacemen...) and that has somewhat restricted applicability (https://pkolaczk.github.io/overhead-of-optional/). The behaviour I'm talking about is mostly that of Go, as it is much more capable of stack allocating, and specifically has a non-generational GC because in testing they found generational GCs had a very variable impact depending on workload (rather than a universally or near universally positive impact).

[-]

jerven 8 months ago

There was a microsoft prototype for more stack allocation in OpenJDK (https://archive.fosdem.org/2020/schedule/event/reducing_gc_t...). I recall that being put on hold because of how it would interact with project Loom fast stack copying. But I don't know the current status.

GO has a non moving GC and I understand, that the cost of introducing safe moving GC is considered high. If one has a moving GC which the serious java one's are read/write barriers are already required, especially if they are concurrent like ZGC, C4 or Shenadoah. ZGc, C4 and Shenadoah all started out as non generational GC implementations, which gained them later, because in most cases they do increase performance/reduce overhead.

Valhalla makes objects denser, and reduces overhead of identity which is great. Reducing the difference in memory layout between java objects and nested go structs.

Go with arena's reduce the GC de-allocation costs. Something that the ZGC team is looking at in relation to loom/virtual threads. (but I can't find the reference for that right now)

DarkNova6 8 months ago

Thank you for these excellent sources!

pfdietz 8 months ago

I wonder if architectural support could be added to reduce the cost of recording modification information.

DamonHD 8 months ago

AFAIK not nearly enough stuff gets caught by escape analysis - and thus stack allocated - to make a difference.

[-]

masklinn 8 months ago

Go has much more significant stack allocation capabilities, most notably it has no problem allocating entire structs on the stack so doesn't need scalar replacement, which falls over if you breathe on it (https://pkolaczk.github.io/overhead-of-optional/).

According to https://github.com/golang/go/discussions/70257#discussioncom...

> the weak generational hypothesis does not hold up well with respect to heap-allocated memory in many real-world Go programs (think 60-70% young object mortality vs. the 95% typically expected)

fweimer 8 months ago

Historically, Hotspot's escape analysis only resulted in avoided heap allocations (via scalar replacement) if all uses were inlined. I don't think this has changed.

kibwen 8 months ago

I'm open to believing that this is true, but some real numbers would be nice. Surely it wouldn't be a hugely invasive change to fork the Go compiler, change the stack allocation check to `return false`, and then measure the overhead of the garbage collector on real Go programs with stack allocation both enabled and disabled.

[-]

DarkNova6 8 months ago

The reason escape analysis is not "good enough" is why we have project Valhalla trying to bring Value Types into the JVM.

I don't have numbers at hand, but I remember the JDK Expert Group talking about this extensively in the past and why they deferred bringing Value Types for such a long time. They hoped complex enough EA can get rid of indirections and heap allocations but it just wasn't powerful enough, even with all advances throughout the years.

DamonHD 8 months ago

I may have been answering past you - I am thinking of Java running on the JDK here. And indeed I may be out of date also.

[-]

cempaka 8 months ago

Yeah in Java land specifically I think the question would become, "does the generational hypothesis still hold up once we have Valhalla and a much larger share of short-lived objects are stack allocated as value types?" but of course it may be years until the ecosystem reaches that point, if ever.

[-]

neonsunset 8 months ago

As shown by C#, it will generally continue to be relevant since both primarily use JIT compilation with ability to modify code at runtime which can violate inter-procedural escape analysis assumptions leading to heap allocations of the objects that are passed down to the callees (there is work scheduled for .NET 10 to address this, at least for AOT compilation where interproc analysis conclusions will be impossible to violate).

You can craft a workload which violates the hypothesis by only allocating objects that live for a long time but both JVM and .NET GC implementations are still much faster designs than Go's GC which prioritizes small memory footprint and consistent latency on low allocation traffic (though as of .NET 9, SRV GC puts much more priority on this, making similar tradeoffs).

[-]

cempaka 8 months ago

> ability to modify code at runtime

Would Java's moves towards "integrity by default" mean that this could be ruled out in more cases?

[-]

neonsunset 8 months ago

Reading through the JEP again it does not seem to be related - it is about deprecating unsafe APIs that the executed code itself uses. OpenJDK also has "partial escape analysis" where the object that only conditionally escapes can still be placed on the stack/scalar replaced.

I'm not privy to the exact APIs that OpenJDK exposes but in .NET the main limitation around escape analysis that spans multiple methods is the fact that CoreCLR has re-JIT API which allows to perform a multitude of actions like attaching a profiler or a debugger to a live application and forcing the runtime to deoptimize a particular method for debugging, or modifying the implementation and re-JITting the result. Debug codegen is very different especially around GC liveness tracking and escape analysis that builds on top of it - it means that even debug code would have to uphold stack-allocated nature of such object in some way, complicating the design significantly. In addition to that, debuggers and other instrumentation may observe object references that would have otherwise not escaped local scope.

This creates an unfortunate situation where the above almost never happens in production, but ignoring it and "just stack-allocating anyway" would lead to disastrous breakage for all sorts of existing instrumentation. Because Go does not have to deal with this constraint, it can perform interproc escape analysis without risk - whether a pointer escapes or not can be statically proven. For NativeAOT, .NET could approach this problem in the same way, but paraphrasing compiler team: "We would like to avoid optimizations only available for one of the target types be it JIT or AOT, and only supporting AOT would not benefit the majority of the .NET users".

There is, however, recognition that more comprehensive interproc analysis could be very beneficial, including the EA which is why it is planned to work on it in .NET 10:

- https://github.com/dotnet/runtime/issues/108931 IPA framework

- https://github.com/dotnet/runtime/issues/104936 Stack allocation enhancements

[-]

pjmlp 8 months ago

Integrity by default is what the OpenJDK folks are pushing for so that any API that can break runtime assumptions, has to be explicitly allowed, so that they can actually make use of performance optimizations that would otherwise be too risky if anyone at any time could violate them.

cempaka 8 months ago

Yeah there's a JEP around deprecating access to sun.misc.Unsafe, but that's part of a larger effort including Jigsaw to push the Java ecosystem in the direction of modular builds, where more invariants are assumed to hold (e.g. " 'final' fields are actually final") unless explicitly opted out for each module. I would assume the lack of such guarantees in the status quo wreaks a lot of havoc with EA.

Profiling and debugging would be separate considerations -- I'm really not sure what limitations those impose on the JVM JIT.

DamonHD 8 months ago

Ahem --- JDK => JVM!

eikenberry 8 months ago

Is there a language that makes this explicit, allocates the variables on the stack via compiler enforced notation?

[-]

fanf2 8 months ago

C, C++, Rust, Zig, …

[-]

im3w1l 8 months ago

I'm not sure about Zig, but what strikes me about the others are that they change the type of the object.

T vs T*.

It would be kind of neat if you could have an annotation on the variable instead that didn't change the type.

You could in C++ make a reference T& which is almost that - references behave identically to the real thing. But I think freeing the memory backing a reference is probably quite questionable?

eikenberry 8 months ago

What's Zig's notation for it?

[-]

masklinn 8 months ago

Not doing anything, same as the other 3.

Heap allocation is what requires requesting memory from an allocator.

[-]

eikenberry 8 months ago

Right.. been using GC languages to long. Everything allocated to the stack unless it is specifically allocated to the heap. I was stuck thinking about what the keywords were. But I'm still curious.. are there any GC languages that have a way to specify if something should go on the stack or the heap?

[-]

masklinn 8 months ago

> are there any GC languages that have a way to specify if something should go on the stack or the heap?

I think only in the sense that some GCd language have value (stack) types as a separate hierarchy from heap types? E.g. structs in C# or Swift are stack-allocated (and value-identity, and copied) whereas classes are heap-allocated.

Adding that for java is one of the goals of Project Valhalla I believe.

stassats 8 months ago

Common Lisp. The dynamic-extent declaration allows for stack allocation.

neonsunset 8 months ago

C# (.NET in general) :)

Well, variables cannot be forced to stack specifically. They are placed in the "local scope". And that would usually be either stack or CPU registers - thinking in stack only is a somewhat flawed mental model.

Both C# and F# complicate this by supporting closures, iterator and async methods which capture variables placing them in a state machine box / display class instead which would be located on the heap, unless stack-allocated by escape analysis (unlikely because these usually cross method boundaries).

However, .NET has `ref structs` (or, in F#, [<Struct; IsByRefLike>] types) which are subject to lifetime analysis and can never be placed on the heap directly or otherwise.

8 months ago

[deleted]

m463 8 months ago

This is fascinating.

The idea of mark/scavenge is pretty cool for page-based allocation and deallocation.

Also real-time or maybe latency-intolerant systems, which basically can't block for garbage collection. You could probably keep scavenging and stay ahead of things without impacting things in a black and white way.

[-]

AtlasBarfed 8 months ago

This may be naive, but gc has now been around since the java days at mass scale (that is I'm calling java the first mass market GC computing system, just go with it for now) for about 30 years.

The big lie of GC is that you never need to worry about memory management. The insidious lie is that you have no control when you do need it

Rust went with a new extreme: no GC, but the compiler does a memory verification and a LOT of obscure complex notation and keywords

So ... I'm wondering if there is a happy medium, where you can have keywords that strongly hint to a GC how to manage memory (eg arena/permanent, thread group bound, local/young, etc) that will be easier than rust.

Maybe we could have multiple heaps defined (a big problem with GC heaps is the bigger they are the harder it is for the bigO of the management algorithms to scale, so nice they aren't linear bigO).

I'm thinking of something like Cassandra where tables and even subsections of tables don't need their bloom filters, commit logs, atc all intermingled in a single heap, to the great detriment of heao performance.

Instead, provide annotations to partition the heaps or run multiple heaps

Of course this is a massive complexity rabbit hole/slippery slide.

[-]

m463 8 months ago

I remember a long time ago using the apache portable runtime and pool malloc was pretty interesting. I believe it probably was designed around web requests - which could disappear at any time. The idea was that you allocated a memory pool, and did all your mallocs within that pool. If there was a problem, you could easily clean up and free the entire pool.

I think this idea could work really well the the memory system - a pool could be a number of underlying pages and gc would be easy, and letting go of the pages would be too.

funcDropShadow 8 months ago

> Maybe we could have multiple heaps defined

As far as I understand, Erlang has one heap per actor. The cost is that messages have to be serialized to pass from one actor to another. But it means there are lots of independent, often tiny heaps which don't need very complicated GCs.

jddj 8 months ago

Zig's allocators run along this line of thought

codetrotter 8 months ago

I was wondering if the .java TLD belonged to the Java island, or to Oracle or to someone else.

https://icannwiki.org/.java

It’s Oracle.

fweimer 8 months ago

How does this relate to Shenandoah's region selection logic? Doesn't it have similar behavior?

A4ET8a8uTh0 8 months ago

What a time to be alive, I read it the opening fully expecting to see an open source automated trashcan that takes itself to the curb each Monday. I was disappointed to find out it is about an actual garbage collection algorithm.

[-]

garblegarble 8 months ago

I know this is completely off-topic, but you might be interested in this[1] YouTuber who did something not far off that...

1: https://www.youtube.com/watch?v=VhYEOG9LOIk

throwaway81523 8 months ago

I thought comparing reachability to "true" liveness had been studied in the past. Ben Zorn may have worked on it in the 1980s: https://www2.eecs.berkeley.edu/Pubs/TechRpts/1989/5313.html

Also: https://people.cs.umass.edu/~emery/pubs/04-17.pdf