> Ideally in a web application, aside from some in-memory caches, no object allocated as part of a request should survive longer than the request itself.
This is one of those areas where out-of-process caching wins. In-process caching has a nasty habit of putting freshly created objects into collections that have survived for hours or days, creating writes in the old generation and back references from old to new.
Going out of process makes it someone else's problem. And if the cache lives in a compiled language with no GC, or a better one, all the better.
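A rough Ruby sketch of the difference, assuming a node-local memcached and the `dalli` gem (the keys, TTL, and `expensive_lookup` helper are illustrative, not from the article):

```ruby
require "dalli"

# In-process cache: entries accumulate in a long-lived Hash, so every insert
# writes into an old-generation collection and creates old-to-new references.
APP_CACHE = {}
APP_CACHE["product:42"] = expensive_lookup(42)

# Out-of-process cache: the value is serialized out to a node-local memcached,
# and the Ruby objects involved can all die with the request.
CACHE = Dalli::Client.new("localhost:11211", namespace: "app")
CACHE.set("product:42", expensive_lookup(42), 300) # 300s TTL
product = CACHE.get("product:42")
```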
One of the challenges I see in general is that languages don't have enough capabilities to express the intent of lifetimes and control flow. What I mean by that is that there is a significant difference between spawning a thread with the intention of joining it, or allocating memory intended to last only until the end of the request, etc., versus spawning a permanent background thread or stashing an object away into a global cache.
This is really starting to become a problem in the observability space with async locals. Node.js, for instance, currently keeps async locals around for far too long because they are propagated everywhere. For instance, if you call `console.log` in a promise you will leak an async local forever.
Next.js famously keeps way too many async locals around past the request boundary for caching-related reasons.
A solution would be to have a trampoline to call things through that makes it explicit that everything happening past that point is supposed to "detach" from the current flow. An allocator or a context-local system can then use that information to change behavior.
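A minimal sketch of that trampoline idea, written in Ruby to match the article's stack (the `Detached` module, the `:request_context` key, and `expensive_background_refresh` are all hypothetical names, not an existing API):

```ruby
# Hypothetical sketch: run a block outside the current request's flow so it
# does not pin request-scoped state (and everything it references) in memory.
module Detached
  def self.run(&block)
    Thread.new do
      # A fresh Ruby thread does not inherit the caller's fiber/thread locals;
      # clearing our own hypothetical request-context key just makes that explicit.
      Thread.current[:request_context] = nil
      block.call
    end
  end
end

# Anything scheduled through the trampoline explicitly detaches from the request.
Detached.run { expensive_background_refresh }
```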
You could do something like that in Rust with a request-scoped arena. But then you'd have to do Rust.
In node you could use worker threads (which create a new V8 instance in a separate OS thread) but that's probably too heavy handed.
Author here.
Agreed. We have some facility for out of process caching (node local memcached), and I frequently have to argue with colleagues that it's generally preferable to in-process caching.
A similar strategy is to serialize the object and store it in-process but off heap. This is useful when the values are private to the process, and/or they don't need to survive a crash, and/or you need to avoid the network overhead. Access times are often 100x-1000x faster.
This is something I've been thinking about for a while. What if we create a language where all object references are explicitly specified to be either request-scoped or application-scoped? Don't allow application-scoped objects to reference request-scoped objects. Allow manually upgrading a reference from request-scoped to application-scoped if needed.
That would allow us to have ephemeral per-request heaps which are torn down all at once after every request. In-request garbage collections are super fast. Application-scoped objects are never collected (i.e. no major collections).
Wouldn't this simple model solve most problems? Basically, a very simple equivalent to Rust's lifetimes tailored to web services without all the complexity, and much less GC overhead than in traditional GC systems.
You could call them "arenas" to be consistent with the prior art. Yes, if you can partition the heaps into ones with distinct lifetimes, good plan.
"What if we create a language where all object references are explicitly specified to be either request-scoped or application-scoped."
I've done this in both C and C++.
The downside of automatic memory management is you have to accept the decisions the memory manager makes.
Still, generational GC like in Ruby and Python essentially attempts to discern the lifetime of allocations, and it gets it right most of the time.
Given that Rust seems to be the generalized solution to this problem, would a viable prototype just be a Rust HTTP server with a Ruby interpreter embedded in it? Write the code in some kind of Ruby DSL which then feeds back into Rust?
I ask because I have embedded Ruby in applications before, and I'm looking for an excuse to do it in Rust.
I accept that it is true but I bristle at the fact of it. It shouldn’t be true.
Depends, it's not just about access time and GC pressure, it's also about sharing that cache with other processes on the node.
This could also be a case for non-managed objects in the same process. APIs aren't typically very friendly, but I would expect they could be made so, especially if it was for a dedicated use-case like caching.
Rust is emerging as a major contender for HTTP/gRPC backend services.
Actix, Axum, sqlx, diesel, and a whole host of other utilities and frameworks make writing Rust for HTTP just as easy and developer efficient as Golang or Java, but the code will never have to deal with GC.
It's easy to pull request scoped objects into durable caches.
This is from several years ago (2017), but it has a very similar vibe to Instagram disabling Python GC - https://instagram-engineering.com/dismissing-python-garbage-...
A 10% performance improvement on Python code is laughable. You can get a 5000% performance improvement if you switch to a better language.
I bet the engineers at Instagram were unaware of Python's performance profile when they chose it; you should let them know that they should just switch to a different language.
Meta is just a small startup though, they probably don't have enough resources nor the skills to switch to a better language even after they've heard the gospel.
I don't know if you're joking or not, but that is exactly true. Meta went as far as creating their own PHP engine and then a new PHP-compatible language because they didn't have the resources to switch from PHP.
Instagram is presumably in the same position. Switching language is basically impossible once you have a certain amount of code. I'm sure they were aware of the performance issues with Python but they probably said "we'll worry about it later" when they were a small startup and now it's too late.
Well, Facebook also created their own hacked up version of PHP (called Hack) that's presumably easier to migrate PHP to.
Hack is actually surprisingly pleasant, basically about the best language they could have made starting from PHP. (I know, that's damning with faint praise. But I actually mean this unironically. It has TypeScript vibes.)
I was excited about Hack when it came out. Unfortunately PHP took just enough from it to kill it. I gave up on it once Composer stopped supporting it, after backwards compatibility with PHP was no longer a goal.
IMHO Hack's best feature was native support for XHP... which (also unfortunately) isn't something PHP decided to take.
Hack really gave PHP/Zend the kick up the arse that it seemed to need.
I guess like Scala (et al) and Java? C++ also seems to be learning things from Rust these days.
I only used Hack when I was very briefly working for Facebook. (And I used PHP once before nearly 20 years ago by now for some web site I had 'inherited', back when PHP was truly an awful language)
Yes I mentioned that.
The joke is that Facebook literally did, right?
Did what? Rewrite Instagram into another language? Do you have any source on this?
Last time I checked they're working on improving Python performance instead (yes I know they forked it into Cinder, but they're trying to upstream their optimizations [0]). Which is very similar to what we're doing at Shopify.
Of course 100% of Instagram isn't in Python, I'm certain there's lots of supporting services in C++ etc, but AFAIK the Instagram "frontend" is still largely a Python/Django app.
The joke is that if Meta thought that replacing all the Python code they have with something else was worth it, they'd have done it already.
[0] https://github.com/facebookincubator/cinder
> The joke is that if Meta thought that replacing all the Python code they have with something else was worth it, they'd have done it already.
"Worth it" depends on both how much performance improvement you get, and how hard it is to replace. Did you consider maybe the rewriting effort is so humongous that it is not worth doing despite large performance improvements? Thus making the joke not funny at all...
That's exactly the joke though. Every time Ruby (or Python) is discussed on HN we get the same old tired question of "why don't they just rewrite in Rust".
But that's some silly engineer tunnel vision; squeezing the very last bit of performance out of a system isn't a goal in itself. You just need it to be efficient enough that it costs you significantly less to run than the amount of revenue it brings in.
I can bet you that moving off Python must have been pitched dozens and dozens of times by Meta engineers, but deemed not worth it, because execution speed isn't the only important characteristic.
So yes, I find it hilarious when HN commenters suggest companies should rewrite all their software into whatever is seen as the most performant one.
It's usually dismissed because companies think short term, and switching languages is a project with huge short term disadvantages and huge long term advantages.
It's usually dismissed because it's almost always a huge strategic blunder.
I think they meant Facebook switched their PHP code to Hack and HHVM, their own PHP-like language and implementation.
They may well have been initially; it's a pretty puzzling choice.
> I bet the engineers at Instagram were unaware of pythons performance profile when they chose it,
Is the Instagram stack Python? I doubt it, but stranger things have happened
I suspect it is actually some derivative of Apache, or Nginx. Something sensible
Instagram is built with Django.
You expect that the instagram stack is something sensible, such as only a web server? What?
Very naive take.
1. 10% performance improvement at Instagram could lead to many millions of revenue "instantly". It is not laughable at any company.
2. It won't be a 5000% performance improvement. Facebook uses its own fork of Python that is heavily optimized. Probably still far from C++, but you should be thinking about languages like Java when talking about performance.
"Better" is a very subjective term when discussing languages, and I hope such discussions can be more productive and meaningful.
Cinder's benchmarks don't seem "like Java" performance, given they aren't that far off cython.
https://github.com/facebookincubator/cinder/blob/cinder/3.8/...
CPython itself has seen lots of performance improvements recently. Benchmarks on CPython 3.12 take about half the time they took on CPython 3.9.
Yeah it's definitely welcome, but even if it is double the performance (doesn't seem to be quite there in my experience) fast languages are still 25-50x faster. It's like walking twice as fast when the alternative is driving.
Yes.
Well, it really depends on whether that alternative is open to you, and at what cost.
So eg lots of machine learning code is held together by duct tape and Python. Most of the heavy lifting is done by Python modules implemented in (faster) non-Python languages.
The parts that remain in Python could potentially be sped up by migrating them, too. But that migration would likely not do much for the overall performance, while still being pretty expensive (in terms of engineering effort).
For organisations in these kinds of situations, it makes a lot of sense to hope for / contribute to a faster Python. Especially if it's a drop-in replacement (like Python 3.12 is for 3.9).
What really makes me hopeful is actually JavaScript: on the face of it, JavaScript is about the worst language to build a fast implementation for. But thanks to advances in clever compiler and interpreter techniques, JavaScript is one of the decently fast languages these days. Especially if you are willing to work in a restricted subset of the language for substantial parts of your code.
I'm hoping Python can benefit from similar efforts. Especially since they don't need to re-invent the wheel, but can learn from the earlier and ongoing JavaScript efforts.
(I myself made some tiny efforts for CPython performance and correctness. Some of them were even accepted into their repository.)
> Facebook uses its own fork of Python that is heavily optimized.
So likely the 5000% improvement is no longer possible because they already did multiple 10% improvements? I don't know how this counters the original point.
All clues point to FB going this route because they had too much code already in PHP, and not because the performance improvement would be small.
In any case, "facebook does it" is not a good argument that something is the right thing to do. Might be, might not be. FB isn't above wrong decisions. Else we should buy "real estate" in the metaverse.
> no object allocated as part of a request should survive longer than the request itself
So I've spent a lot of time doing Hack (and PHP) as well as Java, Python and other languages. For me, as far as serving HTTP requests goes, Hack/PHP are almost the perfect language. Why?
1. A stateless functional core. There's no loading of large libraries, which is an issue with Python and Java in certain paradigms. The core API is just functions, which means startup costs for a non-stateful service are near zero;
2. The model, as alluded to in the above quote, basically creates temporary objects and then tears everything down at the end of the request. It's so much more difficult to leak resources this way as opposed to, say, a stateful Java or C++ server. PHP got a lot of hate unjustly for its "global" scope when in fact it's not global at all. "Global" in PHP/Hack is simply request-scoped, and pretty much every language offers request-scoping;
3. There's no threading. Hack, in particular, uses a cooperative async/await model. Where you'd normally create threads (eg making a network request), that's handled by the runtime to make an async/await call out of non-blocking I/O. You never have to deal with mutexes, thread starvation, thread pools, lock ups, etc. You never want to deal with that in "application" or "product" code. Never.
So this article is specific to Ruby on Rails, which obviously still has persistent objects, hence the continued need for GC.
How Facebook deals with this is kinda interesting. Most FB product code uses an in-memory write-through graph database (called TAO, backed by MySQL). There is an entity model in Hack on top of this that does a whole bunch of stuff like enforcing privacy (ie you basically never talk to TAO directly, and if you do, you're going to have to explain why that's necessary, and you absolutely never talk to MySQL directly).
But the point is that persistent entities are request-scoped as well (unlike RoR I guess?).
If the core API is just functions, how do stateful applications handle connections to this persistent storage? Can you still have a connection pool, or does every request pay the extra latency to start a new connection and re-authenticate?
I think they do now, but originally one of the reasons people used MySQL with connection-per-request languages was that MySQL connections were very cheap.
And why pgbouncer used to be considered an essential part of a Postgres web-app-backend deployment — if your business layer didn’t pool and reuse connections, then having an external shim component that pools and reuses connections would solve a lot of the impedance mismatch.
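For the Ruby equivalent of keeping those connections warm inside the process (a sketch, assuming the `connection_pool` and `pg` gems; the connection details and query are illustrative), a process-level pool created at boot lets every request borrow an already-authenticated connection instead of reconnecting:

```ruby
require "connection_pool"
require "pg"

# Created once per process at boot; connections authenticate once and are reused.
DB_POOL = ConnectionPool.new(size: 5, timeout: 5) do
  PG.connect(host: "localhost", dbname: "app")
end

# Inside a request handler: borrow a connection, run the query, give it back.
def shop_name(shop_id)
  DB_POOL.with do |conn|
    conn.exec_params("SELECT name FROM shops WHERE id = $1", [shop_id]).first&.fetch("name")
  end
end
```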
All the other virtual machines that support GC need to look at the JVM's ZGC and Shenandoah. Sub-millisecond pause times with terabyte heaps.
We're actually hoping to get MMTk included in Ruby to be able to use more advanced GCs. Medium term we hope to use Immix: https://bugs.ruby-lang.org/issues/20860
And yes, we're aware of ZGC &co https://www.eightbitraptor.com/presentations/RubyKaigi2023-m...
Amazing to see this happening. I wrote the non-moving Immix collector at Twitter for our Ruby runtime, Kiji.
Good luck!
I think we should be careful when correlating heap size with how long the collection should take.
Also, I really want ZGC in .NET runtime, but I don't think I'll ever get support for it first party. There's some kind of principled ideologue holdout situation going on over at Microsoft. Every time I get into it with one of their engineers I'm sent to some impotent "please may I have a temporary GC exemption" API. All I want is it to do nothing. How hard is it to just not clean up the goddamn garbage? Give me a registry flag + env variable + cli arg all required at the same time if you're so worried someone might trip over it.
ZGC does not stand for zero. It stands for Z Garbage Collector. It's a next-generation GC implementation for OpenJDK that focuses on low pause time while supporting very large heap sizes. It does not "not collect garbage".
You could try using https://github.com/kkokosa/UpsilonGC and seeing if it still works.
At the end of the day, for anything performance-related you can just write code with manual memory management, using RAII patterns via IDisposable on structs, and get code that performs close to C++ or Rust. It's also necessary to understand whether this is a good idea at all - most of the time you do want to just rely on the GC.
> ZGC does not stand for zero. It stands for Z Garbage Collector.
Apologies - I was attempting to refer to an "absolutely no" garbage collection path. I was thinking of Epsilon [0].
> It's also necessary to understand if this is a good idea at all - most of the time you do want to just rely on GC.
Assume we are building a cruise missile flight computer. I have enough RAM for ~100 hours of flight if we never clean up any allocations. I only have enough fuel for 8 hours of flight on a good day. Why do I still need a garbage collector? All I need is a garbage generator. The terminal ballistics and warhead are the "out of band" aspects in this arrangement.
> You could try using https://github.com/kkokosa/UpsilonGC and seeing if it still works.
I've spent weeks on this exact thing. I cannot get it to work. This gets me back to the first party support aspect.
[0] https://openjdk.org/jeps/318
Would GC.TryStartNoGCRegion work for you? https://learn.microsoft.com/en-us/dotnet/api/system.gc.tryst...
How much of a throughput penalty do those options incur on the application?
I think it only affects throughput at the limit, at around the 5% level. All the portfolio companies that implemented it got a net increase in performance, as they avoid redlining their servers.
C4 still smokes them both, doesn't it?
Hard to smoke sub-millisecond pauses, but there may be other axes where it is better. It used to be that people thought Azul was better because it was generational, but now ZGC is as well. My guess is that C4 doesn't have enough of an edge at this point, but happy to see benchmarks that prove otherwise.
> Ideally in a web application, aside from some in-memory caches, no object allocated as part of a request should survive longer than the request itself. Any object that does is probably something that should be eagerly loaded during boot, or some state that is leaking between requests. As such, any object promoted to the old generation during a request cycle is very unlikely to be immortal, so promoting it is wasteful.
So could each request clean up its own garbage when it finishes, so that they would never need any global garbage collection?
That's pretty much what I'm hinting at at the end when I mention minor GC.
I don't think doing it after each request would be sensible, but counter-intuitively, the time it takes to run GC isn't proportional to the amount of garbage to collect, but to the number of live objects left (ignoring some minor things like finalizers).
So on paper at least we could run a minor GC very cheaply after each request, but there are likely better heuristics, given that the median request currently already spends less than 1ms in GC, so doing it after every request might be overdoing it.
Also, even if we were doing that, many requests would still have to run GC because they allocate more than the memory available, so they need to clean up their own garbage to continue; you can't delay GC indefinitely.
But at least now, endpoints that spend too much time in GC are responsible for their own demise, so engineers responsible for a given endpoint's performance have a clear signal that they should allocate less, whereas before it could easily be discounted as being caused by lots of garbage left over by another collocated endpoint.
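One way that per-endpoint signal can be collected (a sketch only; the middleware and the logging line are assumptions, not Shopify's actual setup) is with the GC timing API available since Ruby 3.1:

```ruby
# Rack middleware sketch: attribute GC time to the request that incurred it.
class GcTimePerRequest
  def initialize(app)
    @app = app
    GC.measure_total_time = true # Ruby 3.1+; enables GC.total_time
  end

  def call(env)
    before = GC.total_time # nanoseconds spent in GC so far
    response = @app.call(env)
    gc_ms = (GC.total_time - before) / 1_000_000.0
    # Emit to whatever metrics pipeline you use; this logger call is a placeholder.
    env["rack.logger"]&.info("gc_time_ms=#{gc_ms.round(2)} path=#{env['PATH_INFO']}")
    response
  end
end
```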
Would it be possible for the allocator/GC to know what allocations are made within a request and make a generation specifically for it? Allocations too big to fit would be made as usual.
That's already what we effectively have.
Since objects cannot be promoted to the old generation inside the request cycle, objects in the new gen are request-allocated objects.
So if we were to eagerly trigger a minor GC after a request, we'd have very few objects to scan, and would only need to sweep garbage, which is only a small fraction of time spent in GC.
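For what that eager trigger could look like (a sketch using plain Ruby GC APIs, not anything Shopify-specific), a minor GC can be requested explicitly once the response is built:

```ruby
# Rack middleware sketch: run a cheap minor GC while the new generation holds
# little besides the request's own garbage.
class MinorGcAfterRequest
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  ensure
    # full_mark: false asks for a minor (young-generation only) collection.
    GC.start(full_mark: false, immediate_sweep: true)
  end
end
```

In a real deployment you would hook this into the application server's out-of-band or after-request callback instead, so the client isn't kept waiting while the GC runs; the middleware form is just the simplest place to show the call.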
What dashboard software is that?
Looks like Grafana.
They built a large codebase on a language that doesn't let you control memory, because that makes you "more productive". So having Rails allocate a per-request arena that is asynchronously freed (which would force the programmer not to keep any objects that outlive the request), or pre-allocating memory for a fixed amount of request handling per server instance, or whatever allocation behavior you want, all of which is generally possible in C/C++/Zig/Rust/Odin/etc, requires hacking on the language itself. Which means your changes have to go through the Ruby team first. Any additional changes would also need to go through them, which increases the cost of change. Then there is a permanent layer of indirection between your GC callbacks and the semantics of what those callbacks do. Instead of just writing out the custom allocators you want, because that's impossible. How depressing.
> Which means your changes have to go through the Ruby team first. Any additional changes would also need to go through them …
I do want to pick on this specifically - people can and should be patching open source projects they depend on and deploying them to production (exactly as described in the article). Something being in the language vs in “user” code should be no barrier to improving it.
There's a pretty huge difference between implementing a performance optimization that works in your use case, and upstreaming that optimization to be generally usable.
The latter is often orders of magnitude more work, and the existing solution is probably chosen to be well suited in general.
Patching a dependency comes with significant downstream costs. You need to carry the patch forward to new upstream versions. This implies remembering that the dependency was patched, extracting a patch from the existing changed code, reapplying the patch, fixing conflicts, recompiling the now-special version of that dependency, running tests, and checking/updating required license notices accordingly.
This is in essence another form of technical debt.
I'm guessing that Zig, Rust, Odin, and "etc." didn't exist when they started the codebase. Now they need to keep moving in their imperfect state. I don't think anyone would start a large company on Ruby today. (They would on Python, though, which is equally unfortunate.)
I don't see how it is imperfect.
Per-request arenas sound super cool on paper, and work very well on systems with clear constraints. But if suddenly a request starts allocating more than the arena can accommodate, you're in a bit of a pickle. They're absolutely not a panacea.
Setting aside the challenge of refactoring the Ruby VM to allow this sort of arena, they'd be a terrible fit for Shopify's monolith.
Ultimately, while it's a bit counter-intuitive, GCs can perform extremely well in terms of throughput. Ruby's GC isn't quite there yet, but it still performs quite well and is improving with every version.
> allocating more than the arena can accommodate
In Zig, at least, this isn't how arenas work. They're a wrapper around a backing allocator, so if the arena runs out of memory, then that means the process is out of memory, something no allocation strategy can fix (ignoring the fact that Zig returns a specific error when that happens, and maybe you can trigger some cache eviction or something like that).
It's easy to set them to retain a 'reasonable' allocated capacity when they get reset, for whatever value of reasonable, so big allocation spikes get actually freed, but normal use just moves a pointer back and reuses that memory.
I don't see Shopify harvesting a lot of value from a complete Zig rewrite, no. But arenas are basically ideal for the sort of memory use which web servers typically exhibit.
> something no allocation strategy can fix
Well, yes, with a GC when your heap is full, you make space by getting rid of the garbage.
Also, with a good GC, allocating is most of the time just bumping a pointer, exactly like an arena, and the collection time is proportional to the number of live objects, which when triggered out of band is basically 0.
Hence why I think a well tuned GC really isn't that far off.
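A quick standalone way to convince yourself of the "proportional to live objects" point (illustrative only; numbers will vary by machine and Ruby version): allocate a pile of short-lived garbage and time a minor collection; since almost everything dies young, the collection stays cheap.

```ruby
require "benchmark"

GC.start # start from a settled heap

x = nil
alloc = Benchmark.realtime { 1_000_000.times { x = [Object.new] } } # mostly garbage
minor = Benchmark.realtime { GC.start(full_mark: false, immediate_sweep: true) }

puts "allocating: #{(alloc * 1000).round(1)}ms, minor GC: #{(minor * 1000).round(1)}ms"
p GC.stat.slice(:minor_gc_count, :major_gc_count, :heap_live_slots)
```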
I really like Zig, and I’ve been thinking about using it for developing servers offering APIs over HTTP, using arenas bound to request lifetime. I think I would be comfortable developing like this myself. But all the devs in my org have only ever used managed-memory languages such as Java, C#, Python or JavaScript, which makes me hesitant, as I’m wondering about the learning curve, and of course the risk of use-after-free. Not something I would do anyway before Zig reaches 1.0.
I wrote substantial amounts of C, and Pascal/Delphi before that, before learning Zig, so you and I wouldn't see the same learning curve. That said, I found it straightforward to take up. Andrew Kelley places a great emphasis on simplicity in the sense Rich Hickey uses the term, so Zig has a small collection of complete solutions which compose well.
Now is a great time to pick up the language, but I would say that production is not the right place to do that for a programmer learning memory management for the first time. Right now we're late in the release cycle, so I'd download a nightly rather than use 0.13, if you wanted to try it out. Advent of Code is coming up, so that's an option.
Using a language where you manage memory yourself means you need to design a memory policy for the code. Zig's GeneralPurposeAllocator will catch use-after-free and double-free in debug mode, but that can only create confidence in memory-handling code if and when you can be sure that there aren't latent bugs waiting to trigger in production.
Arenas help with that a lot, because they reduce N allocations and frees to 1, for any given set of allocations. But one still has to make sure that the lifetime of allocations within the arena doesn't outlast the round, and you can only get that by design in Zig; lifetimes and ownership aren't part of the type system like they are in Rust. In practice, or I should say with practice, this is readily achievable.
At current levels of language maturity, small teams of experienced Zig developers can and do put servers into production with good results. But it's probably not time for larger teams to learn as they go and try the same thing.
I started programming in Pascal, C and C++, so personally I'm fine with manual memory management, especially with a language like Zig. I actually find it quite refreshing. I'm just wondering if it's possible to "scale" this approach to a team of developers who may not have that past experience (having only worked with GCed languages) without ending up with a code base littered with use-after-free errors.
And when the default arena size is often outgrown, you'll know from whatever diagnostics/logging/dashboard solution you are using. Which is incidentally also a great tool when optimizing per-request memory usage.
Being explicit about memory has many advantages, and is a strict requirement when scaling.
I don't think using Zig over Python is gonna have the biggest impact in making your next big company successful. It's a drop in the ocean compared to the quality of people you have to actually design and build it.
It doesn't matter for a startup trying to get acquired.
It does matter for a company trying to scale its user base while keeping costs down.
That is nonsense. If you can run your code on 10 servers instead of 1k servers, that is an insane time and money saver that could make or break a company.
For most startups the difference between Rust and Ruby is not 10 servers vs 1000 but rather between using 0.1% of a CPU or 1% of a CPU. A single server running Rails will easily scale to hundreds of thousands of daily users. Most companies never get that many users in the first place, and those that do will have the funds to afford rewriting the hottest paths in a more performant language.
I certainly don't claim to be an expert, but I have a hunch that getting to the point where performance becomes a significant factor (in the success or failure of a product) isn't going to be about the choice of language. I also think you're vastly underestimating the performance that good architects can get out of ANY (primary) language through good system design. Good design vs bad design makes the biggest difference in my experience, at least from a technical standpoint. Probably just nonsense though as you say.
Startups are not large companies in the beginning.
Although I'm not sure what the preferred language for quickly getting a startup up and running would be these days.
Whatever works and you can find enough developers for.
The language (or the rest of the stack even) is rarely a barrier to success. What matters are a good idea, good motivation, and decent availability of competence.
JMHO.