Memory Subsystem Optimizations

(johnnysswlab.com)

48 points | by mfiguiere 4 days ago

15 comments

  • jeffbee 4 days ago

    I find this site interesting because of its mixture of good topic choice and inaccurate details. I think it's generated by LLMs.

    Specifically catching my eye in this collection of articles is the highly misleading one about huge pages. All recent Linux distributions have THP set to "madvise" by default. Many programs exploit THP automatically, including any Go program and any JVM program with a flag set. The tcmalloc shared library that comes with Ubuntu is probably the single worst way to experience huge pages. Mi-malloc is the better choice if you must preload a library, but there are even better choices. Explicit huge pages are little-used because managing them is annoying. Finally, the latest Linux kernels have features called "folios" and "mTHP" that make THP even smoother.
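
    For anyone who has not seen what opting in under the "madvise" setting looks like, here is a minimal sketch, assuming Linux with glibc. The helper name alloc_thp_hinted is made up for illustration. A zero return from madvise only means the hint was recorded; whether huge pages actually back the region is still up to the kernel and the system-wide THP setting.

```cpp
#include <sys/mman.h>  // mmap, madvise, munmap (Linux-specific)
#include <cassert>
#include <cstddef>

// Hypothetical helper: map `bytes` of anonymous memory and ask the kernel to
// consider transparent huge pages for it. Returns nullptr if mmap fails.
// `hint_accepted` (optional) reports whether madvise returned success; that
// is not a guarantee that huge pages are actually in use.
void* alloc_thp_hinted(std::size_t bytes, bool* hint_accepted) {
    assert(bytes > 0);
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;

    // Best-effort hint. If it fails (for example THP is not built into the
    // kernel), the mapping stays usable on regular pages.
    const int rc = madvise(p, bytes, MADV_HUGEPAGE);
    if (hint_accepted != nullptr) *hint_accepted = (rc == 0);
    return p;
}
```

    The caller releases the region with munmap(p, bytes). Mappings of several MiB are the interesting case, since a huge page needs a 2 MiB-aligned range inside the mapping.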

    • hairband_dude 4 days ago

      It's been around for a while: https://web.archive.org/web/20230602031306/https://johnnyssw.... Not sure if the newer articles are LLM/AI assisted though.

    • kev009 4 days ago

      The huge page article is sequitur with official documentation like https://docs.redhat.com/en/documentation/red_hat_enterprise_.... THP can only issue up to 2MB pages on amd64 so it's not necessarily a silver bullet for large persistent consumers like a DB or GC language and worth knowing about the older methods.
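
      To put rough numbers on the 2MB limit, a small sketch of the arithmetic. The 64 GiB heap is an assumed example size, not a figure from the article, and the 1 GiB row stands for the explicit huge pages of the older methods.

```cpp
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// Number of pages, and therefore distinct address translations, needed to
// map `bytes` with a given page size (rounded up).
constexpr std::uint64_t pages_needed(std::uint64_t bytes, std::uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

// Assumed example: a 64 GiB heap.
static_assert(pages_needed(64 * GiB, 4 * KiB) == 16777216, "4 KiB base pages");
static_assert(pages_needed(64 * GiB, 2 * MiB) == 32768, "2 MiB huge pages");
static_assert(pages_needed(64 * GiB, 1 * GiB) == 64, "1 GiB explicit huge pages");
```

      Going from 4 KiB to 2 MiB pages cuts the count by a factor of 512, but tens of thousands of translations is still far more than a TLB holds, which is the case for knowing about 1 GiB pages.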

      To me they look like marketing posts, but they aren't void of effort or meaning as a quick intro to various topics.

      • almostgotcaught 4 days ago

        > sequitur

        I love the malapropisms on hn because they always reek of "I'm trying so hard to sound smart" lol. FYI non-sequitur doesn't mean "non-sequential", it means "illogical" (and thus sequitur doesn't mean "in sequence"). Also both of these words are nouns, not adjectives.

        • kev009 3 days ago

          Is this a performance art where you do the thing you accuse? "malapropism" is a five dollar word if "sequitur" is. The use tracks with the Latin or English definitions, what does any of this have to do with sequential? I imply the article is probably not simple AI slop because it follows official documentation. Add "a" in front of it if your worth is determined by neckbearding a borrowed verb that can only noun in the least.

          • almostgotcaught 3 days ago

            > The use tracks with the Latin or English definitions

            No it doesn't

            > sequitur noun : the conclusion of an inference : consequence

            https://www.merriam-webster.com/dictionary/sequitur

            > "malapropism" is a five dollar word

            It is of course but I spent my $5 wisely because my use is syntactically and semantically correct.

            • kev009 3 days ago

              > "the conclusion of an inference"

              Inference: article tracks accurately to other sources and reality. Conclusion: no indication of simple AI slop.

              Fail and derail which has no bearing on the original topic of memory management nor whether AI is in play. Take the neckbeard behavior back to reddit.

              • almostgotcaught 3 days ago

                > neckbeard

                Hmmm didn't realize proper grammar and syntax was neckbeard territory. TIL!

    • foltik 4 days ago

      > Mi-malloc is the better choice if you must preload a library, but there are even better choices.

      What’s a better choice?

      • jeffbee 4 days ago

        Linking the allocator into your program when you build it, instead of overriding just malloc and free at runtime. Then you can choose between jemalloc, mi-malloc, TCMalloc, or whatever you please, and get better features such as C++ sized delete. Rust makes this easy with for example "use tcmalloc_better::TCMalloc".
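
        To show what sized delete buys the allocator, a minimal C++ sketch. It uses class-scoped operators so the behavior does not depend on compiler flags; an allocator linked in at build time would provide the global equivalents. Packet and g_last_freed_size are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical bookkeeping, standing in for an allocator's size-class lookup.
static std::size_t g_last_freed_size = 0;

struct Packet {
    char payload[48];

    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (p == nullptr) std::abort();  // sketch only: no bad_alloc path
        return p;
    }

    // Sized deallocation: the compiler passes the object size, so the
    // allocator does not have to recover it from the pointer.
    static void operator delete(void* p, std::size_t size) noexcept {
        g_last_freed_size = size;
        std::free(p);
    }
};

// Allocates and frees one Packet, then returns the size seen by delete.
std::size_t free_one_packet() {
    g_last_freed_size = 0;
    delete new Packet;
    return g_last_freed_size;
}
```

        A runtime override of just malloc and free only ever sees free(ptr), so it has to look the size up itself.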

  • grayxu 3 days ago

    While this guide covers roughly 80% of the material, it remains a high-level overview that lacks depth. I can't confirm if it was LLM-generated, but the content is undeniably superficial. Real-world production environments are far more complex; for instance, despite other users mentioning hugepages and TLB, there is no discussion of critical issues like TLB shootdown.

  • adsharma 4 days ago

    18 blog posts and very limited mention of NUMA and HT?

    https://adsharma.github.io/more-performance-hints/

  • matu3ba 4 days ago

    The blog looks nice, especially with its simple-to-understand numbers. To me the memory subsystem articles are missing the spicier pieces like platform semantics, barriers, and de-virtualization (the latter is discussed in an article separate from the series). In the other articles I'd also expect debugging format trade-offs (DWARF vs ORC vs alternatives), virtualization performance, and relocation effects to be briefly discussed, but could not find them. There are a few C++ articles missing: 1. cache-friendly structures in C++, ideally with performance numbers, because standard containers such as std::map are unfortunately not written to be cache-friendly (only std::vector and std::deque<T> with high enough block_size), 2. what to use for destructive moves or how to roll your own (they did not make it into C++26).
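
    As an illustration of point 1, a minimal sketch of the usual cache-friendly alternative to std::map: sorted key/value pairs in one contiguous std::vector, looked up by binary search. FlatMap is a hypothetical toy, fixed to int keys and values, and no performance numbers are claimed for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal "flat map": entries live contiguously and stay sorted
// by key, so a lookup touches a few adjacent cache lines instead of chasing
// node pointers as a tree-based map does.
class FlatMap {
public:
    // Inserts or overwrites. Insertion shifts later elements, so it is O(n).
    void insert(int key, int value) {
        auto it = std::lower_bound(items_.begin(), items_.end(), key, key_less);
        if (it != items_.end() && it->first == key) {
            it->second = value;
        } else {
            items_.emplace(it, key, value);
        }
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    // The pointer is invalidated by the next insert.
    const int* find(int key) const {
        auto it = std::lower_bound(items_.begin(), items_.end(), key, key_less);
        return (it != items_.end() && it->first == key) ? &it->second : nullptr;
    }

    std::size_t size() const { return items_.size(); }

private:
    static bool key_less(const std::pair<int, int>& entry, int key) {
        return entry.first < key;
    }
    std::vector<std::pair<int, int>> items_;
};
```

    The trade-off is O(n) insertion, so this shape suits read-mostly tables.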