phkmalloc

(phk.freebsd.dk)

70 points | by fanf2 4 days ago

9 comments

  • Tuna-Fish 3 minutes ago

    > Because I kept the “metadata” away from the chunks themselves, and because I used a binary “buddy” layout for sub-page-sized allocations, I could detect some of the most common mistakes.

    > First I thought “We’re not having any of that” and made phkmalloc abort(2) on any wrong usage. Next time I rebooted my laptop fsck(8) aborted, and left me in single user mode until I could fix things with a floppy disk.

    I love everything about this anecdote.
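
    The misuse detection the quote describes works because a buddy layout heavily constrains which pointers can be valid: a sub-page chunk of size 2^k can only start at a page offset that is a multiple of 2^k. A minimal sketch of that check (the function name and layout are illustrative, not phkmalloc's actual code):

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hedged sketch, not phkmalloc's actual code: in a binary buddy
     * layout, every sub-page chunk of size 2^k starts at a page
     * offset that is a multiple of 2^k.  A pointer whose offset is
     * not aligned to its chunk size cannot be a valid allocation,
     * so free() can reject it -- without trusting any metadata
     * stored next to the chunk itself. */
    static int plausible_chunk(uintptr_t page_base, uintptr_t p, size_t chunk_size)
    {
        uintptr_t off = p - page_base;
        return off % chunk_size == 0;
    }
    ```

    With 64-byte chunks, a pointer 0x40 bytes into the page passes, while one 0x50 bytes in is rejected as it cannot be a chunk start.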

  • throw0101d 9 hours ago

    For those unaware, "PHK" is:

    * https://en.wikipedia.org/wiki/Poul-Henning_Kamp

    Amongst other things (including jails), he invented the MD5crypt algorithm (originally for FreeBSD) as an alternative to the original DEScrypt of Unix:

    * https://en.wikipedia.org/wiki/Crypt_(C)#MD5-based_scheme

    Nowadays probably most well-known for creating Varnish:

    * https://en.wikipedia.org/wiki/Varnish_(software)

  • bogeholm 6 hours ago

    Nice guy by the way! Met him on a train home from work once. I was working on my computer, glanced left and saw someone with a red beard running a tiling WM on some real boy system. Since we were in the silent zone, I wrote

        phk?
    
    in a text editor; got a nod, and we shook hands :)

  • grogers an hour ago

    > Reasonable people whose opinions I respect, have called this hack anything from “brilliant” to “an affront to all morals”. I think it is OK.

    It's definitely a clever hack given the constraints of malloc, but this anecdote made me smile very widely.

    Besides multi-core becoming the norm and making it less performant than alternatives, I imagine the "sanity checking" aspects of phkmalloc were subsumed by things like ASAN.

  • elteto 3 days ago

    "... spending an hour over breakfast, chatting with Dennis Ritchie about device nodes and timekeeping in early UNIX kernels"

    Wow, what an incredible experience!

  • nasretdinov 3 days ago

    Nice article! I wonder if now, with all the NUMA stuff and processors with hundreds of cores, something has changed enough to warrant another complete redesign like the one described in the article.

    • karmakaze 3 days ago

      A lot of the article talked about swap, which wouldn't be a concern in normal operation of most production servers; cache/memory locality still matters, but not as dramatically. Back when I was managing bare-metal MySQL servers we were working on scaling to NUMA memory (with jemalloc/tcmalloc). There was an initial performance degradation that required a lot of fine-tuning, even working around how the same motherboard/CPUs would initialize core affinities differently. A new problem was deadlocking of large transactions that touched multiple buffer instances. Mind you, this wasn't a clean codebase that had put a lot of thought into avoiding deadlocks (up until then).

      At the time I didn't think much about how the allocators could help, as they're constrained to the ABI; writing in Zig with custom allocators for everything would have. The only mysql NUMA setting was innodb_numa_interleave=ON, which wasn't very good, but not a lot worse than trying harder.

    • toast0 9 hours ago

      Much of that is well addressed by one allocator arena per cpu, and either pinning threads to cpus or at least having a high threshold to move threads across NUMA boundaries.

      If you have a lot of cross-thread memory use, maybe you need something to handle allocate-on-core-X, free-on-core-Y patterns and the cross-core communication they cause (maybe that's already in place?).

      There's more memory overhead that way, but large core count systems tend to have a lot of memory too.
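
      The one-arena-per-CPU idea can be sketched as a CPU-indexed table of independent pools; everything here (the names arena_t, NARENAS, cpu_to_arena, the bump allocation) is illustrative, not any real allocator's API:

      ```c
      #include <stddef.h>

      #define NARENAS 8  /* assumption: one arena per CPU, 8 CPUs */

      /* One fixed pool per CPU; real allocators grow arenas on demand. */
      typedef struct { unsigned char pool[1 << 16]; size_t used; } arena_t;
      static arena_t arenas[NARENAS];

      /* Map a CPU id to its arena; a thread pinned to one CPU always
       * hits the same arena, so there is no cross-socket contention. */
      static size_t cpu_to_arena(int cpu) { return (size_t)cpu % NARENAS; }

      /* Bump-allocate from the given CPU's arena (no locking shown:
       * with true per-CPU arenas and pinned threads none is needed). */
      static void *arena_alloc(int cpu, size_t n)
      {
          arena_t *a = &arenas[cpu_to_arena(cpu)];
          if (a->used + n > sizeof a->pool)
              return NULL;           /* pool exhausted */
          void *p = a->pool + a->used;
          a->used += n;
          return p;
      }
      ```

      The memory-overhead point above falls out of this shape: each CPU reserves its own pool whether or not its threads ever fill it.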

      • masklinn 7 hours ago

        > Much of that is well addressed by one allocator arena per cpu, and either pinning threads to cpus or at least having a high threshold to move threads across NUMA boundaries.

        Note that this can have an awkward effect: if the thread gets parked (either entirely, or it just stops calling the allocator because it has reached its steady state), the allocator may never have the opportunity to release that thread's memory. IIRC mimalloc suffers from this issue: you need to call an allocator-specific API to tell it about the regime change.