This article is a fun read.
If you enjoyed this, or if you need more control over some memory allocations in Go, please have a look at this package I wrote. I would love to have some feedback or have someone else use it.
https://github.com/fmstephe/memorymanager
It bypasses the GC altogether by allocating its own memory separately from the runtime. It also disallows pointer types in allocations, but replaces them with a Reference[T] type, which offers the same functionality. Freeing memory is manual though - so you can't rely on anything being garbage collected.
These custom allocators in Go tend to be arenas intended to support groups of allocations which live and die together. But the offheap package was intended for building large, long-lived data structures with zero garbage collection cost - things like large in-memory caches or databases.
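To make the Reference[T] idea concrete, here is a minimal sketch of the general handle-instead-of-pointer pattern. The names here (slab, Alloc, Get, Free) are invented for illustration and are not the package's actual API; a plain slice stands in for memory managed outside the Go heap just to keep the sketch runnable.

    package main

    import "fmt"

    // slab stands in for memory the allocator manages itself (e.g. mmap'd,
    // outside the Go heap).
    type slab[T any] struct {
        items []T
        free  []int32
    }

    // Reference identifies an allocation by index rather than by Go pointer,
    // so the GC never has to scan it.
    type Reference[T any] struct{ idx int32 }

    func (s *slab[T]) Alloc(v T) Reference[T] {
        if n := len(s.free); n > 0 {
            idx := s.free[n-1]
            s.free = s.free[:n-1]
            s.items[idx] = v
            return Reference[T]{idx}
        }
        s.items = append(s.items, v)
        return Reference[T]{int32(len(s.items) - 1)}
    }

    func (s *slab[T]) Get(r Reference[T]) *T { return &s.items[r.idx] }

    // Freeing is manual - nothing here is garbage collected for you.
    func (s *slab[T]) Free(r Reference[T]) { s.free = append(s.free, r.idx) }

    func main() {
        var s slab[int]
        r := s.Alloc(42)
        fmt.Println(*s.Get(r)) // 42
        s.Free(r)
    }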
I've been doing some performance tuning in Go lately to really squeeze out performance, and ended up with a very similar arena design, except using byte slices for buf and chunks instead of unsafe pointers.
I think I tried that too and it wasn't any faster and a whole lot uglier, but I'll have to double check before saying that with 100% confidence.
A couple other easy wins -
If you start with a small slice and find that some payloads append large amounts, write your own append that bumps the capacity more aggressively up front before calling the builtin append.
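A hedged sketch of that preemptive cap bump, with a made-up growAppend helper and an arbitrary 4x growth factor:

    package main

    import "fmt"

    // growAppend appends src to dst, but if dst would have to grow, it reserves
    // 4x the required capacity up front so later appends stay allocation-free.
    func growAppend[T any](dst, src []T) []T {
        need := len(dst) + len(src)
        if need > cap(dst) {
            bigger := make([]T, len(dst), 4*need)
            copy(bigger, dst)
            dst = bigger
        }
        return append(dst, src...)
    }

    func main() {
        buf := make([]byte, 0, 8)
        buf = growAppend(buf, []byte("a large payload..."))
        fmt.Println(len(buf), cap(buf)) // capacity reserved well beyond len
    }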
unsafe.String is rather new and great for passing strings out of byte slices without allocating. Just read the warnings carefully and understand what you're doing.
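For reference, a minimal example of the unsafe.String (Go 1.20+) pattern, with the usual caveat from the docs: the backing bytes must not be modified while the string is in use.

    package main

    import (
        "fmt"
        "unsafe"
    )

    // bytesToString views b as a string without copying.
    func bytesToString(b []byte) string {
        if len(b) == 0 {
            return ""
        }
        return unsafe.String(unsafe.SliceData(b), len(b))
    }

    func main() {
        b := []byte("no copy, no allocation")
        s := bytesToString(b)
        fmt.Println(s)
        // Mutating b here would also change s - that's the footgun to understand.
    }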
The append(slice, slice2...) pattern is all well and good, but it's going to hit the expansion path quite often. When you know the second slice is going to be large, it's often faster to allocate a new slice with the right capacity and zero length and then append both slices to it. Then there are no expansion costs - the values just get copied in - and it also produces less garbage to be collected.
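A small sketch of that pre-sized approach, with a generic concat helper invented for illustration:

    package main

    import "fmt"

    // concat allocates the result once at its final capacity, then appends
    // both inputs into it, so there is no repeated growth and no intermediate
    // backing arrays left for the GC.
    func concat[T any](a, b []T) []T {
        out := make([]T, 0, len(a)+len(b)) // right size, zero length
        out = append(out, a...)
        out = append(out, b...)
        return out
    }

    func main() {
        a := []int{1, 2, 3}
        b := make([]int, 1000)
        c := concat(a, b)
        fmt.Println(len(c), cap(c)) // 1003 1003 - exactly one allocation
    }

(Since Go 1.22 the standard library's slices.Concat does essentially this for you.)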
I have done a few other things in the past where I had slice-likes which took two slices, pointed to one and then the other, and mapped a function over the indices as if they were appended. It costs a bit on access, but saves the initial allocation if you don't intend to iterate through the entire thing, or only do so once.
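A rough sketch of that two-slices-as-one view; the concatView name is made up for illustration:

    package main

    import "fmt"

    // concatView presents two backing slices as if they were one appended
    // slice, trading a branch per access for skipping the copy entirely.
    type concatView[T any] struct {
        first, second []T
    }

    func (v concatView[T]) Len() int { return len(v.first) + len(v.second) }

    func (v concatView[T]) At(i int) T {
        if i < len(v.first) {
            return v.first[i]
        }
        return v.second[i-len(v.first)]
    }

    func main() {
        v := concatView[string]{first: []string{"a", "b"}, second: []string{"c"}}
        for i := 0; i < v.Len(); i++ {
            fmt.Print(v.At(i), " ") // a b c - no combined slice ever allocated
        }
        fmt.Println()
    }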
The standard library in Go does not do much for optimising this sort of thing; it's not a dominant operation in most applications, so I can see why we don't have more advanced data structures and algorithms. You have to be quite heavily into needing different performance characteristics to outperform the built-ins with custom code or a library. It's all part of Go's simplicity push, which seems to assume people don't need anything other than array lists and hash maps.
> It's all part of Go's simplicity push, which seems to assume people don't need anything other than array lists and hash maps.
You can see some of this in the work of the progenitors of Go.
Quoth pikestyle, from Rob Pike:
Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
The following data structures are a complete list for almost all practical programs:
array
linked list
hash table
binary tree
Of course, you must also be prepared to collect these into compound data structures. For instance, a symbol table might be implemented as a hash table containing linked lists of arrays of characters.
https://doc.cat-v.org/bell_labs/pikestyle
Off topic, but I love the minimap on the side -- for pages where I might be jumping around the content (long, technical articles, to refer back to something I read earlier but forgot) -- how can I get that on my site? Way cool.
Funnily enough, this isn’t a real minimap: it’s actually a copy of the main content, zoomed out to look tiny and scroll-synced.
https://github.com/lrsjng/pagemap looks viable.
But it uses a canvas and redraws.
Whereas the post's website renders a copy of the page in a <div> and scrolls it, as you can check by inspecting the div.
No idea how plug-n-playable it is, but the source seems to be self contained: https://github.com/mcy/mcy.github.io/blob/master/public/js/m...
tl;dr for anyone who may be put off by the article length:
OP built an arena allocator in Go using unsafe to speed allocator operations up, especially for cases when you're allocating a bunch of stuff that you know lives and dies together. The main issue they ran into is that Go's GC needs to know the layout of your data (specifically, where pointers are) to work correctly, and if you just allocate raw bytes with unsafe.Pointer, the GC might mistakenly free things pointed to from your arena because it can't see those pointers properly. But to make it work even with pointers (as long as they point to other stuff in the same arena), you keep the whole arena alive if any part of it is still referenced. That means (1) keeping a slice (chunks) pointing to all the big memory blocks the arena got from the system, and (2) using reflect.StructOf to create new types for these blocks that include an extra pointer field at the end (pointing back to the Arena). So if the GC finds any pointer into a chunk, it'll also find the back-pointer, therefore mark the arena as alive, and therefore keep the chunks slice alive. Then they get into a bunch of really interesting optimizations to remove various internal checks and write barriers using funky techniques you might not've seen before.
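For the curious, here is a rough, simplified sketch of that reflect.StructOf back-pointer trick as described above - not the article's actual code, and the Arena/chunk layout here is just a placeholder:

    package main

    import (
        "fmt"
        "reflect"
        "unsafe"
    )

    const chunkSize = 64 << 10 // arbitrary: 64 KiB of raw bytes per chunk

    type Arena struct {
        chunks []unsafe.Pointer // keeps every chunk reachable while the arena lives
    }

    // newChunk allocates a block laid out as
    //   struct { Data [chunkSize]byte; Arena *Arena }
    // The trailing pointer is the only thing the GC scans in the chunk:
    // finding it marks the Arena, which keeps a.chunks (and so all chunks) alive.
    func (a *Arena) newChunk() unsafe.Pointer {
        chunkType := reflect.StructOf([]reflect.StructField{
            {Name: "Data", Type: reflect.ArrayOf(chunkSize, reflect.TypeOf(byte(0)))},
            {Name: "Arena", Type: reflect.TypeOf((*Arena)(nil))},
        })
        v := reflect.New(chunkType)               // heap-allocate one zeroed chunk
        v.Elem().Field(1).Set(reflect.ValueOf(a)) // install the back-pointer
        p := v.UnsafePointer()
        a.chunks = append(a.chunks, p)
        return p
    }

    func main() {
        a := &Arena{}
        p := a.newChunk()
        fmt.Println(p != nil, len(a.chunks)) // true 1
    }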
Interesting stuff! For folks building off-heap or arena-style allocators in Go—how do you usually test or benchmark memory safety and GC interactions in practice?
> Go prioritizes not breaking the ecosystem; this allows to assume that Hyrum’s Law will protect certain observable behaviors of the runtime, from which we may infer what can or cannot break easily.
If this assertion is correct, then effectively Go as a language is an evolutionary dead end. Not sure I would find Go fascinating in that case.
They also changed maps' iteration order to be deliberately randomized, rather than the stable (but unspecified) order it happened to have before.
It's quite a leap from "certain observable behaviors of the runtime cannot change" to "Go is a dead end".
Go regularly makes runtime changes and language changes, see https://go.dev/blog/. Some highlights:
- Iterators, i.e., range-over-function (see the sketch after this list)
- Generics
- For loops: fixed variable capture
- Optimized execution tracing
- Changing the ABI from stack-based to register-based.
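To illustrate the range-over-function item from the list above: since Go 1.23, any function with the shape func(yield func(T) bool) can be ranged over directly. A small example:

    package main

    import "fmt"

    // countdown yields n, n-1, ..., 1, stopping early if the loop body breaks.
    func countdown(n int) func(yield func(int) bool) {
        return func(yield func(int) bool) {
            for i := n; i > 0; i-- {
                if !yield(i) {
                    return
                }
            }
        }
    }

    func main() {
        for x := range countdown(3) {
            fmt.Println(x) // 3, then 2, then 1
        }
    }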
They introduced generics into the language whilst maintaining compatibility, and breaking changes between language versions are painful in large code bases.
They broke the for-loop variable capture behavior in 1.22, mainly to make it match what people expected.
https://go.dev/blog/loopvar-preview
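A minimal sketch of the behavioral difference; which semantics you get is selected by the go directive in your go.mod:

    package main

    import "fmt"

    func main() {
        var funcs []func()
        for i := 0; i < 3; i++ {
            // Before Go 1.22 every closure captured the one shared variable i;
            // from 1.22 on, each iteration gets its own i.
            funcs = append(funcs, func() { fmt.Println(i) })
        }
        for _, f := range funcs {
            f() // go < 1.22: prints 3 3 3; go >= 1.22: prints 0 1 2
        }
    }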
Small, but significant point: you can easily avoid the new behavior. IIRC, if you had a pre-1.22 project, and didn't change anything, it still compiles as before. So if you relied on that behavior (which would be very weird, but who knows), backwards compatibility is still there for you.
Just a quick meta note. This article is really lengthy, I don't have time to read this level of detail for the background. For example the "Mark and Sweep" section takes up more than 4 pages on my laptop screen. That section starts more than 5 pages into the article. Is this the result of having AI help to write sections, and as a result making it too comprehensive? It's easy to generate content, but the editing decisions to keep the important parts haven't been made. I just want to know the part about the Arena allocator, I don't need a tutorial on garbage collection as well.
This is an interesting comment. The author has been consistently making lengthy posts since 2021 - there's no reason to believe he is using AI, as it doesn't look like his writing style has changed.
However, the reader has changed, and readers are notoriously lazy. Now, instead of asking for a "tl;dr", the reader might incredulously assume the writer is using AI. This is an interesting side effect.
FWIW: The Mark and Sweep section is specifically about Go's internal implementation of Mark and Sweep and is relevant context for the design decisions made in his arena. It is not generic AI slop about Mark and Sweep GCs.