How we made JSON.stringify more than twice as fast

(v8.dev)

345 points | by emschwartz 21 hours ago

110 comments

  • deathwarmedover 22 minutes ago

    Not that I doubt the value of the work, and the reasoning that its performance directly affects common operations makes intuitive sense, but I would have liked to hear more about what concrete problems were being solved. Was there any interesting data across the V8 ecosystem about `JSON.stringify` dominating runtimes?

    • abdulhaq 9 minutes ago

      it doesn't need to dominate run times when it's being called by hundreds of millions of pages every day. The power saving worldwide will be considerable.

  • hinkley 18 hours ago

    JSON encoding is a huge impediment to interprocess communication in NodeJS.

    Sooner or later it seems like everyone gets the idea of reducing event loop stalls in their NodeJS code by offloading work to another thread, only to discover they’ve tripled the CPU load in the main thread.

    I’ve seen people stringify arrays one entry at a time. Sounds like maybe they are doing that internally now.
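
    For illustration, a rough sketch of the one-entry-at-a-time idea (assuming Node; the helper name and chunk size are made up, and it ignores toJSON/undefined edge cases):

      // Serialize a large array in chunks, yielding between chunks so the
      // event loop isn't blocked for the whole duration (Node-style sketch).
      async function stringifyArrayInChunks(items: unknown[], chunkSize = 1000): Promise<string> {
        const parts: string[] = [];
        for (let i = 0; i < items.length; i += chunkSize) {
          parts.push(items.slice(i, i + chunkSize).map((x) => JSON.stringify(x)).join(','));
          await new Promise((resolve) => setImmediate(resolve)); // yield to the event loop
        }
        return '[' + parts.join(',') + ']';
      }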

    If anything I would encourage the V8 team to go farther with this. Can you avoid bailing out for subsets of data? What about the CString issue? Does this bring faststr back from the dead?

    • jcdavis 14 hours ago

      Based off of my first ever forays into node performance analysis last year, JSON.stringify was one of the biggest impediments to just about everything around performant node services: the fact that everyone uses stringify for dict keys, the fact that apollo/express just serializes the entire response into a string instead of incrementally streaming it back (I think there are some possible workarounds for this, but they seemed very hacky).

      As someone who has come from a JVM/go background, I was kinda shocked how amateur hour it felt tbh.
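
      (The dict-key pattern mentioned above is roughly this; the names are made up:)

        // Common pattern: composite cache/map keys built with JSON.stringify
        const cache = new Map<string, unknown>();
        const key = JSON.stringify({ userId: 42, page: 3 }); // '{"userId":42,"page":3}'
        cache.set(key, { items: [] });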

      • MehdiHK 14 hours ago

        > JSON.stringify was one of the biggest impediments to just about everything around performant node services

        That's what I experienced too. But I think the deeper problem is Node's cooperative multitasking model. A preemptive multitasking model (like Go's) wouldn't block the whole event loop (i.e. other concurrent tasks) while serializing a large response (often the case with GraphQL, but possible with any other API too). Yeah, it does kinda feel like amateur hour.
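
        A quick way to see the blocking (a sketch, assuming Node):

          // A timer that should fire every 10ms stalls while stringify runs,
          // because serialization happens synchronously on the event loop.
          setInterval(() => console.log('tick', Date.now()), 10);

          const big = Array.from({ length: 1_000_000 }, (_, i) => ({ i, label: 'x'.repeat(20) }));
          console.time('stringify');
          JSON.stringify(big); // no ticks are logged while this runs
          console.timeEnd('stringify');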

        • Rohansi 3 hours ago

          That's not really a Node problem but a JavaScript problem. Nothing about it was built to support parallel execution like Go and other languages. That's why they use web workers, separate processes, etc. to make use of more than a single core. But then you'll probably be dependent on JSON serialization to send data between those event loops.

      • hinkley 13 hours ago

        > Based off of my first ever forays into node performance analysis last year, JSON.stringify was one of the biggest impediments to just about everything around performant node services

        Just so. It is, or at least can be, the plurality of the sequential part of any Amdahl's Law calculation for Nodejs.

        I'm curious if any of the 'side effect free' commentary in this post is about moving parts of the JSON calculation off of the event loop. That would certainly be very interesting if true.

        However for concurrency reasons I suspect it could never be fully off. The best you could likely do is have multiple threads converting the object while the event loop remains blocked. Not entirely unlike concurrent marking in the JVM.

      • dmit 13 hours ago

        Node is the biggest impediment to performant Node services. The entire value proposition is "What if you could hire people who write code in the most popular programming language in the world?" Well, guess what

        • hinkley 11 hours ago

          Nodejs will never be as bad as VB was.

          • germandiago 3 hours ago

            I know Javascript is fast but...

            I cannot go with such a messy ecosystem. I find Python highly preferable for my backend code for more or less low and middle traffic stuff.

            I know Python is not that good deployment-wise, but the language is really understandable, I have tools for every use case, I can easily provide bindings from C++ code and it is a joy to work with.

            If, on top of that, they keep increasing its performance, I think I will stick to it for lots of backend tasks (except for high performance, where I have lately been working with C++ and Capnproto RPC for distributed stuff).

            • m-schuetz 3 hours ago

              It comes down to language preferences. I find Python to be the worst thing computer science has to offer. No nested scoping in functions, variables leak through branches and loops due to lack of scopes, no classic for loops, but worst of all, installing python packages and frameworks never ever goes smoothly.

              I would like to love Jupyter notebooks because Notebooks are great for prototyping, but Jupyter and Python plotting libs are so clunky and slow, I always have to fall back to Node or writing a web page with JS and svg for plotting and prototyping.

          • jk3000 3 hours ago

            More or less accidentally I turned a simple Excel spreadsheet into a sizable data management system. Once you learn where the bottlenecks are, it is surprising how fast VB is nowadays.

          • hnlmorg an hour ago

            That depends entirely on what you measure:

            - Rapid application development

            VB was easier and quicker

            - GUI development

            At least on Windows, in my opinion, VB is still the best language ever created for that. Borland had a good stab at it with their IDEs but nothing really came close to VB6 in terms of speed and ease of development.

            Granted this isn't JS's fault, but CSS et al is just a mess in comparison.

            - Cross-platform development

            You have a point there. VB6 was a different era though.

            - Type safety

            VB6 wins here again

            - Readability

            This is admittedly subjective, but I personally don't find idiomatic node.js code all that readable. VB's ALGOL-inspired roots aren't for everyone, but I personally don't mind Begin/End blocks.

            - Consistency

            JS has so many weird edge cases. That's not to say that VB didn't have its own quirks. However they were less numerous in my experience.

            Then you have inconsistencies between different JS implementations too.

            - Concurrency

            Both languages fail badly here. Yeah, node has async/await but I personally hate that design and, ultimately, node.js is still single-threaded at its core. So while JS is technically better, it's still so bad that I cannot justify giving it the win here.

            - Developer familiarity

            JS is used by more people.

            - Code longevity

            Does this metric even deserve a rebuttal given the known problem of Javascript framework churn? You can't even recompile a sizable two-year-old Javascript project without running into problems. Literally every other popular language trumps Javascript in that regard.

            - Developer tooling

            VB6 came with everything you needed and worked from the moment you finished the VB Visual Studio install.

            With node.js you have a plethora of different moving parts you need to manually configure just to get started.

            ---

            I'm not suggesting people should write new software in VB. But it was unironically a good language for what it was designed for.

            Node/JS isn't even a good language for its intended purpose. It's just a clusterfuck of an ecosystem. Even people who maintain core JS components know this -- which is why tooling is constantly being migrated to other languages like Rust and Go, and why so many people are creating businesses around their bespoke JS runtimes aiming to solve the issues that node.js creates (and thus creating more problems due to ever-increasing numbers of "standards").

            Literally the only redeemable factor of node.js is the network effect of everyone using it. But to me that feels more like Stockholm Syndrome than a ringing endorsement.

            And if the best compliment you can give node.js is "it's better than this other ecosystem that died 2 decades ago" then you must realise yourself just how bad things really are.

          • schrodinger 6 hours ago

            Low bar!

        • teaearlgraycold 11 hours ago

          I'd say the value prop is you can share code (and with TS, types as well) between your web front end and back end.
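
          A minimal sketch of what that sharing can look like (the file name and `User` shape are made up):

            // shared/types.ts -- imported by both the API server and the browser bundle
            export interface User {
              id: string;
              email: string;
            }

            // server-side handler (sketch):   res.json(user satisfies User);
            // client-side fetch (sketch):     const user: User = await (await fetch('/api/user/42')).json();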

          • Lutger an hour ago

            That is useful, but you can achieve a similar benefit if you manage to spec out your api with openapi, and then generate the typescript api client. A lot of web frameworks make it easy to generate openapi spec from code.

            The maintenance burden shifts from hand syncing types, to setting up and maintaining the often quite complex codegen steps. Once you have it configured and working smoothly, it is a nice system and often worth it in my experience.

            The biggest benefit is not the productivity increase when creating new types, but the overall reliability and ease of changing stuff around that already exists.

          • schrodinger 6 hours ago

            I haven't found this to pay off in reality as much as I'd hoped… have you?

            • bubblyworld 5 hours ago

              Me neither, for multiple reasons, especially if you are bundling your frontend. Very easy to accidentally include browser-incompatible code, becomes a bit of a cat and mouse game.

              On types, I think the real value proposition is having a single source of truth for all domain types but because there's a serialisation layer in the way (http) it's rarely that simple. I've fallen back to typing my frontend explicitly where I need to, way simpler and not that much work.

              (basically as soon as you have any kind of context-specific serialisation, maybe excluding or transforming a field, maybe you have "populated" options in your API for relations, etc - you end up writing type mapping code between the BE and FE that tends to become brittle fast)

            • ChocolateGod 4 hours ago

              It's useful for running things like Zod validators on both the client and server, since you can have realtime validation to changes that doesn't require pinging the server.
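
              For example, a sketch assuming Zod (the schema itself is made up):

                import { z } from 'zod';

                // Defined once, imported by both the browser form and the server route.
                export const SignupSchema = z.object({
                  email: z.string().email(),
                  password: z.string().min(12),
                });

                // Client: validate on every keystroke without a round trip.
                // Server: validate the same payload again before trusting it.
                const result = SignupSchema.safeParse({ email: 'a@b.co', password: 'hunter2hunter2' });
                if (!result.success) console.log(result.error.issues);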

            • strken 6 hours ago

              It's been very useful for specific things: unified, complicated domain logic that benefits from running faster than it would take to do a round trip to the server and back.

              I've only rarely needed to do this. The two examples that stick in my mind are firstly event and calendar logic, and secondly implementing protocols that wrap webrtc.

            • bavell 6 hours ago

              Yes, incredible productivity gains from using a single language in frontend and backend.

              • makeitdouble 5 hours ago

                That's Java's story.

                A single language to rule them all: on the server, on the client, in the browser, in appliances. It truly was everywhere at some point.

                Then people massively wish for something better and move to dedicated languages.

                Put another way, for most shops the productivity gains of having a single language are far from incredible, and can even turn into negatives in the most typical settings.

                • ifwinterco 5 hours ago

                  Java applets were never ubiquitous the way JS is on the web though - there's literally a JS environment always available on every page unless the user explicitly disables it, which very few people do.

                  JS is here to stay as the main scripting language for the web, which means there will probably be a place for node as a back end scripting language. A lot of back ends are relatively simple CRUD API stuff where using node is completely feasible, and there are real benefits to being able to share type definitions etc across front end and back end.

                  • makeitdouble 5 hours ago

                    > there are real benefits to being able to share type definitions etc across front end and back end

                    There are benefits, but cons as well. As you point out, if the backend is only straight proxying the DB, any language will do so you might as well use the same as the frontend.

                    I think very few companies running for a few years still have backends that simple. At some point you'll want to hide or abstract things from the frontend. Your backend will do more and more processing, more validation, it will handle more and more domain specific logic (tax/money, auditing, scheduling etc). It becomes more and more of a beast on its own, and you won't stay stuck with a language whose only real benefit is partially sharing types with the frontend.

                • lmm 4 hours ago

                  Java never ran well on the desktop or in the browser (arguably it never truly ran in the browser at all), and it was an extremely low-productivity language in general in that era.

                  There is a significant gain from running a single language everywhere. Not enough to completely overwhelm everything else - using two good languages will still beat one bad language - but all else being equal, using a single language will do a lot better.

                  • makeitdouble 2 hours ago

                    Yes, Java was never really good (I'd argue on any platform. Server side is fine, but not "really good" IMHO)

                    It made me think about the amount of work that went into JS to make it the powerhouse it is today.

                    Even in the browser, we're only able to do all these crazy things because of herculean efforts from Google, Apple and Firefox to optimize every corner and build runtimes that have basically the same complexity as the OS they run on, to the point we got Chrome OS as a side product.

                    From that POV, we could probably take any language, pour that much effort into it and make it a more than decently performing platform. Java could have been that, if we really wanted it hard enough. There just was no incentive to do so for any of the bigger players outside of Sun and Oracle.

                    > all else being equal, using a single language will do a lot better.

                    Yes, there will be specific cases where a dedicated server stack is more of a liability. I still haven't found many, tbh. In the most extreme cases, people will turn to platforms like Firebase, and throw money at the problem to completely abstract the server side.

            • teaearlgraycold 6 hours ago

              I’ve used the ability to import back end types into the front end to get a zero-cost no-file-watcher API validator.

              My blog post here isn’t as good as it should be, but hopefully it gets the point across

              https://danangell.com/blog/posts/type-level-api-client/

          • Cthulhu_ 3 hours ago

            But that's only really relevant in the last layer, the backend-for-frontend pattern; as an organization or domain expands, more layers can be added. e.g. in my current job, the back-end is a SAP system that has been around for a long time. The Typescript API layer has a heap of annotations and whatnots on each parameter, which means it's less useful to be directly used in the front-end. What happens instead is that an OpenAPI spec is generated based on the TS / annotations, and that is used to generate an API client or front-end/simplified TS types.

            TL;DR, this value prop is limited.

    • nijave 11 hours ago

      Same problem in Python. It'd be nice to have good/efficient IPC primitives with higher level APIs on top for common patterns

    • userbinator 7 hours ago

      You mean:

      > JSON encoding is a huge impediment to communication

      I wonder how much computational overhead JSON'ing adds to communications at a global scale, in contrast to just sending the bytes directly in a fixed format or something far more efficient to parse like ASN.1.

      • hinkley 5 hours ago

        No. Because painful code never gets optimized as much as less painful code. People convince themselves to look elsewhere, and an incomplete picture leads to local optima.

    • tgv 4 hours ago

      > If anything I would encourage the V8 team to go farther with this.

      That feels the wrong way to go. I would encourage the people that have this problem to look elsewhere. Node/V8 isn't well suited to backend or the heavier computational problems. Javascript is shaped by web usage, and it will stay like that for some time. You can't expect the V8 team to bail them out.

      The Typescript team switched to Go because it's similar enough to TS/JS to do part of the translation automatically. I'm no AI enthusiast, but they are quite good at doing idiomatic translations too.

      • com2kid 3 hours ago

        > Node/V8 isn't well suited to backend

        Node was literally designed to be good for one thing - backend web service development.

        It is exceptionally good at it. The runtime overhead is tiny compared to the JVM, the async model is simple as hell to wrap your head around and has a fraction of the complexity of what other languages are doing in this space, and Node running on a potato of a CPU can handle thousands of requests per second w/o breaking a sweat using the most naively written code.

        Also the compactness of the language is incredible, you can get a full ExpressJS service up and running, including auth, in less than a dozen lines of code. The amount of magic that happens is almost zero, especially compared to other languages and frameworks. I know some people like their magic BS (and some of the stuff FastAPI does is nifty), but Express is "what you see is what you get" by default.

        > The Typescript team switched to Go, because it's similar enough to TS/JS to do part of the translation automatically.

        The TS team switched to Go because JS is horrible at anything that isn't strings or doubles. The lack of an int type hinders the language, so runtimes do a lot of work to try to determine when a number can be treated like an int.

        JS's type system is both absurdly flexible and limiting. Because JS basically allows you to do anything with types, Typescript ends up being one of the most powerful type systems that has seen mass adoption. (Yes, other languages have more powerful type systems, but none of them have the widespread adoption TS does.)

        If I need to model a problem domain, TS is an excellent tool for doing so. If I need to respond to thousands of small requests, Node is an excellent tool for doing so. If I need to do some actual computation on those incoming requests, eh, maybe pick another tech stack.

        But for the majority of service endpoints that consist of "get message from user, query DB, reformat DB response, send to user"? Node is incredible at solving that problem.

        • tgv 2 hours ago

          > Node was literally designed to be good for one thing - backend web service development.

          I don't think it was, at least not originally. But even if it was, that doesn't mean it actually is good, and certainly not for all cases.

          > Node running on a potato of a CPU can handle thousands of requests per second w/o breaking a sweat using the most naively written code.

          The parent comment is specifically about this. It breaks down at a certain point.

          > you can get a full ExpressJS service up and running, including auth, in less than a dozen lines of code

          Ease of use is nice for a start, but usually becomes technical debt. E.g., you can write a pretty small search algorithm, but it will perform terribly. Not a problem at the start. You can set up a service with just a little bit of code in any major language using some framework. Heck, there are code-free servers. But you will have to add more and more work-arounds as the application grows. There's no free lunch.

          > The TS team switched to Go because JS is horrible at anything that isn't strings or doubles.

          They switched because V8 is too slow and uses quite a bit of memory. At least, that's what they wrote. But that was not what I wanted to address. I was trying to say that if you have to switch, Go is a decent option, because it's so close to JS/TS.

          > But for the majority of service endpoints ...

          Because they are simple, as you say. But when you run into problems, asking the V8 team to bail you out with a few more hacks doesn't seem right.

    • brundolf 14 hours ago

      Yeah. I think I've only ever found one situation where offloading work to a worker saved more time than was lost through serializing/deserializing. Doing heavy work often means working with a huge set of data, which means the cost of passing that data via messages scales with the benefits of parallelizing the work.

      • hinkley 13 hours ago

        I think the clues are all there in the MDN docs for web workers. Having a worker act as a forward proxy for services; you send it a URL, it decides if it needs to make a network request, it cooks down the response for you and sends you the condensed result.

        Most tasks take more memory in the middle than at the beginning and end. And if you're sharing memory between processes that can only communicate by setting bytes, then the memory at the beginning and end represents the communication overhead. The latency.

        But this is also why things like p-limit work - they pause an array of arbitrary tasks during the induction phase, before the data expands into a complex state that has to be retained in memory concurrently with all of its peers. By partially linearizing, you put a clamp on peak memory usage in a way that Promise.all(arr.map(...)) does not; it's not just a thundering-herd fix.
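
        Roughly, the difference (a sketch assuming the p-limit package; `urls` and `fetchAndProcess` are placeholders):

          import pLimit from 'p-limit';

          declare const urls: string[];                                   // placeholder inputs
          declare function fetchAndProcess(url: string): Promise<unknown>; // placeholder work

          // Unbounded: every task starts at once, so every task's "expanded"
          // middle state is alive in memory at the same time:
          //   await Promise.all(urls.map((url) => fetchAndProcess(url)));

          // Bounded: at most 8 tasks are past their induction phase at any moment.
          const limit = pLimit(8);
          const results = await Promise.all(urls.map((url) => limit(() => fetchAndProcess(url))));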

    • dwattttt 13 hours ago

      Now to just write the processing code in something that compiles to WebAssembly, and you can start copying and sending ArrayBuffers to your workers!

      Or I guess you can do it without the WebAssembly step.
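
      A sketch of the transferable-buffer part (assuming Node's worker_threads; the worker script path is hypothetical):

        import { Worker } from 'node:worker_threads';

        // Hypothetical worker script that parses/processes the raw bytes itself.
        const worker = new Worker('./process-buffer.js');

        const buf = new ArrayBuffer(16 * 1024 * 1024); // e.g. bytes straight off the wire

        // Passing buf in the transfer list moves ownership instead of copying or
        // serializing; after this call the buffer is detached in the main thread.
        worker.postMessage(buf, [buf]);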

      • hinkley 5 hours ago

        A JSON.toBuffer would be another good addition to V8. There are a couple code paths that look like they might do this, but from all accounts it goes Object->String->Buffer, and for speed you want to skip the intermediate.
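
        For reference, the intermediate being described is the full string materialized between the object and the bytes, roughly (a sketch; `payload` is a placeholder):

          declare const payload: Record<string, unknown>;

          // Object -> String -> Buffer: the whole string exists before the bytes do
          const str = JSON.stringify(payload);
          const buf = Buffer.from(str, 'utf8');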

        • dwattttt 2 hours ago

          I was actually imagining skipping the Object step; if you go from wire -> buffer, and only ever work on it in buffer form (i.e. in WebAssembly, in a language more amenable to working on buffers/bytes), you skip needing the Object -> JSON step. Notwithstanding whatever you need to do in the wire -> buffer step.

  • jonas21 14 hours ago

    The part that was most surprising to me was how much the performance of serializing floating-point numbers has improved, even just in the past decade [1].

    [1] https://github.com/jk-jeon/dragonbox?tab=readme-ov-file#perf...

    • jameshart 11 hours ago

      Roundtripping IEEE floating point values via conversion to decimal UTF-8 strings and back is a ridiculously fragile process, too, not just slow.

      The difference between which values are precisely representable in binary and which are precisely representable in decimal means small errors can creep in.

      • jk-jeon 11 hours ago

        A way to achieve perfect round-tripping was proposed back in 1990, by Steele and White (and likely they are not the first ones who came up with a similar idea). I guess their proposal probably wasn't extremely popular at least until 2000's, compared to more classical `printf`-like rounding methods, but it seems many languages and platforms these days do provide such round-tripping formatting algorithms as the default option. So, I guess nowadays roundtripping isn't that hard, unless people do something sophisticated without really understanding what they're doing.

        • lifthrasiir 11 hours ago

          I do think the OP was worrying about such people. Now a performant and correctly rounded JSON library is reasonably common, but it was not the case a decade ago (I think).

        • kccqzy 11 hours ago

          Interesting! I didn't know about Steele and White's 1990 method. I did however remember Burger and Dybvig's method from 1996.

      • gugagore 11 hours ago

        You don't have to precisely represent the float in decimal. You just have to have each float have a unique decimal representation, which you can guarantee if you include enough digits: 9 for 32-bit floats, and 17 for 64-bit floats.

        https://randomascii.wordpress.com/2012/02/11/they-sure-look-...
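
        In JS terms, a quick sketch of both points:

          // Modern JS engines already emit the shortest string that round-trips,
          // so stringify -> parse preserves every double exactly:
          const x = 0.1 + 0.2;                               // 0.30000000000000004
          console.log(JSON.parse(JSON.stringify(x)) === x);  // true

          // Forcing 17 significant digits is the "enough digits" fallback:
          console.log(Number(x.toPrecision(17)) === x);      // true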

        • jameshart 11 hours ago

          And you need to trust that whoever is generating the JSON you’re consuming, or will consume the JSON you generate, is using a library which agrees about what those representations round to.

          • jk-jeon 11 hours ago

            Note that the consumer side doesn't really have a lot of ambiguity. You just read the number, compute its precise value as written, round it to the closest binary representation with banker's rounding. You do anything other than this only under very special circumstances. Virtually all ambiguity lies on the producer side, which can be cleared out by using any of the formatting algorithms with the roundtripping-guarantee.

            EDIT: If you're talking about decimal->binary->decimal round-tripping, it's a completely different story though.

          • kccqzy 11 hours ago

            JSON itself doesn't mandate that IEEE754 numbers be used.

            • Waterluvian 10 hours ago

              This is one of those really common misunderstandings in my experience. Indeed JSON doesn’t encode any specific precision at all. It’s just a decimal number of any length you possibly want, knowing that parsing libraries will likely decode it into something like IEEE754. This is why libraries like Python’s json will let you give it a custom parser, if, say, you wanted a Decimal object for numbers.

            • MobiusHorizons 9 hours ago

              Like it or not, JSON data types are inherently linked to the primitives available in JavaScript. You can, of course, write JSON that can’t be handled with the native types available in JavaScript, but the native parser will always deserialize to a native type. Until very recently all numbers were IEEE 754 doubles in JavaScript, although arbitrary-precision bignums do exist now. So the de facto precision limit of a number in JSON that needs to be compatible is an IEEE 754 double. If you control your clients you can do whatever you want though.

              • jameshart 8 hours ago

                The standard definitely limits what precision you should expect to be handled.

                But how JSON numbers are handled by different parsers might surprise you. This blog post actually does a good job of detailing the subtleties and the choices made in a few standard languages and libraries: https://github.com/bterlson/blog/blob/main/content/blog/what...

                I think one particular surprise is that C# and Java standard parsers both use openAPI schema hints that a piece of data is of type ‘number’ to map the value to a decimal floating point type, not a binary one.

                • xxs 2 hours ago

                  >C# and Java standard parsers

                  Not sure which parser you consider standard, as Java doesn't have one at all (in the standard libraries). Other than that, the existing ones just take the target type (not JSON) to deserialize into, e.g. int, long, etc.

            • jameshart 10 hours ago

              Indeed - you could be serializing to or from JSON where the in-memory representation you're aiming for is actually a floating point decimal. JSON doesn't care.

      • kccqzy 11 hours ago

        Most languages in use (such as Python) have solved this problem ages ago. Take any floating point value other than NaN, convert it to string and convert the string back. It will compare exactly equal. Not only that, they are able to produce the shortest string representation.

        • jameshart 11 hours ago

          Maybe 'ridiculously fragile' is the wrong word. Perhaps 'needlessly' fragile would be better.

          The point is that it takes application of algorithms that need to be provably correctly implemented on both ends of any JSON serialization/deserialization. And if one implementation can roundtrip its own floating point values, that's great - but JSON is an interop format, so does it roundtrip if you send it to another system and back?

          It's just an unnecessary layer of complexity that binary floating point serializers do not have to worry about.

        • aardvark179 9 hours ago

          True, but many of them have had bugs in printing or parsing such numbers, and once those creep in they can cause real long term problems. I remember having to maintain alternative datums and projections in GIS software because of a parser error that had been introduced in the late 80s.

  • ot 12 hours ago

    The SWAR escaping algorithm [1] is very similar to the one I implemented in Folly JSON a few years ago [2]. The latter works on 8 byte words instead of 4 bytes, and it also returns the position of the first byte that needs escaping, so that the fast path does not add noticeable overhead on escape-heavy strings.

    [1] https://source.chromium.org/chromium/_/chromium/v8/v8/+/5cbc...

    [2] https://github.com/facebook/folly/commit/2f0cabfb48b8a8df84f...
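
    For readers unfamiliar with the trick, a rough illustrative sketch of SWAR escape detection on 4-byte words (not the linked implementations; it uses the classic bit-twiddling byte tests):

      // Classic SWAR byte tests on a 32-bit word.
      const ONES = 0x01010101;
      const HIGH = 0x80808080;

      const hasZeroByte = (w: number) => (((w - ONES) | 0) & ~w & HIGH) !== 0;
      const hasByte = (w: number, b: number) => hasZeroByte(w ^ (b * ONES));
      // Existence test for "any byte < n" (n <= 128); good enough to pick fast vs slow path.
      const hasByteLessThan = (w: number, n: number) =>
        (((w - n * ONES) | 0) & ~w & HIGH) !== 0;

      // Does any of the 4 bytes in `w` need escaping inside a JSON string?
      function wordNeedsEscape(w: number): boolean {
        return hasByte(w, 0x22) || hasByte(w, 0x5c) || hasByteLessThan(w, 0x20);
      }

      // Scan a buffer 4 bytes at a time; fall back to a per-byte check at the tail.
      function needsEscape(bytes: Uint8Array): boolean {
        const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
        let i = 0;
        for (; i + 4 <= bytes.length; i += 4) {
          if (wordNeedsEscape(view.getUint32(i, true))) return true;
        }
        for (; i < bytes.length; i++) {
          const b = bytes[i];
          if (b === 0x22 || b === 0x5c || b < 0x20) return true;
        }
        return false;
      }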

  • monster_truck 13 hours ago

    I don't think v8 gets enough praise. It is fucking insane how fast javascript can be these days

    • andyferris 13 hours ago

      Yeah, it is quite impressive!

      It's a real example of "you can solve just about anything with a billion dollars" though :)

      I'd prefer JavaScript kept evolving (think "strict", but "stricter", "stricter still", ...) to a simpler and easier to compile/JIT language.

      • fngjdflmdflg 11 hours ago

        I want JS with sound types. It's interesting how sound types can't be added to JS because runtime checks would be too expensive, but then so much of what makes JS slow is having to check types all the time anyway, and the only way to speed it up is to retroactively infer the types. I want types plus a "use typechecked" that tells the VM I already did some agreed upon level of compile-time checks and now it only needs to do true runtime checks that can't be done at compile time.

        • rictic 9 hours ago

          Tricky thing there is that the VM would still need to do checks at all the boundary points, including all of the language-level and runtime-level APIs. At what point along such a migration would it become net faster?

          It'd need a ton of buy-in from the whole community and all VM implementors to have a chance at pencilling out in any reasonable time span. Not saying I'm against it, just noting.

          • fngjdflmdflg 7 hours ago

            Agreed, but I do think all major libraries would be rewritten in any soundly typed version of JS pretty quickly, especially if it was faster (assuming that is the case).

            >runtime-level APIs

            like the Document APIs?

        • teaearlgraycold 11 hours ago

          The most likely path forward on that would be a Typescript AOT compiler, maybe with some limitations on the code you write.

          • samwillis 4 hours ago

            Porffor is doing that: JS -> WASM (as an IR) -> C -> Native

            For TypeScript it uses the types as hints to the compiler, for example it has int types that alias number.

            Very early still, but very cool.

            https://porffor.dev/

          • fngjdflmdflg 11 hours ago

            Compiled to what, wasm?

            • teaearlgraycold 9 hours ago

              I wasn't thinking about browser runtimes. But maybe one day browsers will have native TS support?

              • Cthulhu_ 3 hours ago

                > But maybe one day browsers will have native TS support?

                This may already be a thing; node has native TS support by just ignoring or stripping types from the code. TS features that can't be easily stripped (iirc namespaces, enums) are deprecated and discouraged nowadays.

                TS is not actually that special in terms of running it. TS types are for type checking which isn't done at runtime, running TS is just running the JS parts of the code as-is.

              • fngjdflmdflg 7 hours ago

                For non-browser runtimes, react native is actually working on an AOT TS/Flow engine. But they ship a bytecode binary rather than what I'm proposing, although what I'm proposing might also be infeasible.

      • Cthulhu_ 3 hours ago

        > I'd prefer JavaScript kept evolving (think "strict", but "stricter", "stricter still", ...) to a simpler and easier to compile/JIT language.

        This is / was ASM.js, a limited subset of JS without all the dynamic behaviour which allowed the interpreter to skip a lot of checks and assumptions. This was deprecated in favor of WASM - basically communicating that if you need the strictness or performance, use a different language.

        As for JS strictness, eslint / biome with all the rules engaged will also make it strict.

        • Timwi an hour ago

          ASM.js and 'use strict' have completely different purposes. One is a performance thing and is (or was supposed to be) used as a compiler target. The other is all about making the programmer’s life easier by disabling features that conflict with principles of maintainability.

      • ihuman 7 hours ago

        Like asm.js was, before webassembly replaced it?

        • Timwi an hour ago

          No, the purposes of asm.js versus 'use strict' are completely different and in conflict with one another.

      • ayaros 12 hours ago

        Yes, this is what I want too. Give me "stricter" mode.

    • shivawu 11 hours ago

      On the other hand, I consider v8 the most extremely optimized runtime in a weird way, in that there’re like 100 people on the planet who understand how it works, while the rest of us be like “why my JS not fast”

  • bob1029 an hour ago

    How is this compared to other ecosystems? I've been serializing JSON for about a decade and it's been so fast I haven't really thought about it. Simdjson can do gigabytes per second per core.

    Once you factor in prefetching, branch prediction, etc., a highly optimized JSON serializer should be effectively free for most real world workloads.

    The part where json sucks is IO overhead when modifying blobs. It doesn't matter how fast your serializer is if you have to push 100 megabytes to block storage every time a user changes a boolean preference.

  • MutedEstate45 14 hours ago

    I really like seeing the segmented buffer approach. It's basically the rope data structure trick I used to hand-roll in userland with libraries like fast-json-stringify, now native and way cleaner. Have you run into the bailout conditions much? Any replacer, space, or custom .toJSON() kicks you back to the slow path?
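
    (For context, the kind of userland approach being referred to looks roughly like this, assuming fast-json-stringify; the schema is made up:)

      import fastJson from 'fast-json-stringify';

      // A schema-specialized serializer: the library compiles a stringify
      // function for this exact shape ahead of time.
      const stringifyUser = fastJson({
        type: 'object',
        properties: {
          id: { type: 'integer' },
          name: { type: 'string' },
          tags: { type: 'array', items: { type: 'string' } },
        },
      });

      stringifyUser({ id: 1, name: 'Ada', tags: ['a', 'b'] });
      // roughly '{"id":1,"name":"Ada","tags":["a","b"]}'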

  • Tiberium 10 hours ago

    V8 is extremely good, but (maybe due to JS itself?) it still falls short of LuaJIT and even JVM performance. Although, at least for the JVM, it takes way longer to warm up than the other two.

    • mhh__ 35 minutes ago

      It's JS; V8 is afaict much more advanced than LuaJIT and the JVM.

      Although Java also has the advantage of not having to be ~real time (i.e. it has a compiler)

    • Cthulhu_ 3 hours ago

      > maybe due to JS itself?

      Nail on the head; a lot of JS overhead is due to its dynamic nature. asm.js disallowed some of this dynamic behaviour (like changing the shape of objects, iirc), meaning they could skip a lot of these checks.

    • MrBuddyCasino 9 hours ago

      „even“

      Hombre, you’re about the best there is.

      • xxs 2 hours ago

        that 'even' made me chuckle - exactly the same 'reaction'

  • smagin 2 hours ago

    Unrelated but I have the easiest design improvement for your website: change <html> background color to #4285f4 (same as in topbar) so it looks like a sheet of paper on a blue base

  • bgdkbtv 7 hours ago

    Unrelated, but v8.dev website is incredibly fast! Thought it would be content preloading with link hovering but no. Refreshing

    • hereonout2 an hour ago

      It's also very simple and free of ads or any other extraneous clutter, a bit like hacker news, which is also fast.

      There's probably a lesson in there somewhere.

    • Culonavirus 7 hours ago

      Speaking of websites, does anyone know when this will land in Node? Node 24 has v8 13.6, and this is 13.8... I mean, this seems like too big of a performance upgrade to just put in the next release, especially since Node 24 will be the next LTS version.

  • notpushkin 5 hours ago

    > No indexed properties on objects: The fast path is optimized for objects with regular, string-based keys. If an object contains array-like indexed properties (e.g., '0', '1', ...), it will be handled by the slower, more general serializer.

    Any idea why?

    • Timwi an hour ago

      I wonder that too. Are they saying that objects with integer-looking keys are serialized as JSON arrays?? Surely not...?

  • tisdadd 9 hours ago

    I will need to check this against the safe-stable-stringify package I usually use in my projects when I'm back at the computer, but it is always nice to see the speedup.

  • dvrp 7 hours ago

    Woah, I didn't know about Packed SIMD/SIMD within a register (SWAR).

  • lifthrasiir 11 hours ago

    As usual, the advancement in double-to-string algorithms is driven by JSON (this time Dragonbox).

  • taeric 13 hours ago

    I confess that I'm at a bit of a loss to know what sort of side effects would be common when serializing something? Is there an obvious class of reasons for this that I'm just accidentally ignoring right off?

    • vinkelhake 13 hours ago

      A simple example is `toJSON`. If an object defines that method, it'll get invoked automatically by JSON.stringify and it could have arbitrary side effects.

      I think it's less about side effects being common when serializing, just that their fast path avoids anything that could have side effects (like toJSON).

      The article touches briefly on this.
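
      For example, a minimal sketch:

        let calls = 0;

        const order = {
          total: 99,
          // Invoked automatically by JSON.stringify; it can run arbitrary code.
          toJSON() {
            calls++;                     // observable side effect
            return { total: this.total };
          },
        };

        JSON.stringify(order);           // '{"total":99}'
        console.log(calls);              // 1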

    • kevingadd 12 hours ago

      Calling a property getter can have side effects, so if you serialize an object with a getter you have to be very cautious to make sure nothing weird happens underneath you during serialization.

      People have exploited this sort of side effect to get bug bounties before via type confusion attacks, iirc.

  • seanwilson 5 hours ago

    Slightly related: I die a little inside each time I see `JSON.parse(JSON.stringify(object))` thinking about the inefficiencies involved compared to how you'd do this in a more efficient language.

    There's structuredClone (https://developer.mozilla.org/en-US/docs/Web/API/Window/stru... https://caniuse.com/?search=structuredClone) with baseline support (93% of users), but it doesn't work if fields contain DOM objects or functions, meaning you might have to iterate over and preprocess objects before cloning, which is more error-prone, manual, and again inefficient?
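
    A sketch of the trade-off (field names made up):

      const state = {
        when: new Date(),
        seen: new Set([1, 2, 3]),
        onClick: () => {},
      };

      // JSON round-trip: silently lossy (Date becomes a string, Set becomes {},
      // the function disappears), plus a full serialize + parse pass.
      const viaJson = JSON.parse(JSON.stringify(state));

      // structuredClone: preserves Date, Set, Map, typed arrays, cycles...
      // but throws a DataCloneError here because of the function field.
      try {
        const viaClone = structuredClone(state);
      } catch (e) {
        console.log((e as Error).name); // 'DataCloneError'
      }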

    • remify 2 hours ago

      March 2022 is not that long ago for a codebase. It takes time but Javascript has come a long way and it's definitely going in the right direction.

  • 65 8 hours ago

    I'd like to see how object duplication now compares in speed when using JSON.parse(JSON.stringify()) vs. recursive duplication.

    • Cthulhu_ 2 hours ago

      Also include `structuredClone()` in this comparison; I don't understand why parse/stringify is still a thing, it's a workaround. I'm sure there could be a special code path to make this particular use case super fast or infer that it needs to create a full copy of an object, but there shouldn't be one.

  • greatgib 12 hours ago

    An important question that was not addressed is whether the general path will be slower to account for what is needed to check first if the fast path can be used.

    • Leszek 6 hours ago

      It's not - the general path is a bailout for the fast path which continues where the fast path stopped, so you don't have to check the whole object for fastness (and you get the fast path up until the bailout)

  • jongjong 9 hours ago

    JSON serialization is already quite fast IMO so this is quite good. I think last time I compared JSON serialization to Protocol Buffers, JSON was just a little bit slower for typical scenarios but not materially so. These kinds of optimizations can shift the balance in terms of performance.

    JSON is a great minimalist format which is both human and machine readable. I never quite understood the popularity of ProtoBuf; the binary format is a major sacrifice to readability. I get that some people appreciate the type validation but it adds a lot of complexity and friction to the transport protocol layer.

    • xxs 2 hours ago

      > machine readable

      It is readable but it's not a good/fast format. IEEE754<->string is just expensive even w/ all the shortcuts and improvements, and byte[]s have no good way to be represented either.

    • hamandcheese 6 hours ago

      For me the appeal of protobuf is the wire-format forward-backward compatibility.

      It's hard enough to not break logical compatibility, so I appreciate not having to think too hard about wire compat. You can of course solve the same thing with JSON, but, well, YOU have to solve it.

      (Also worth noting, there are a lot of things I don't like about the grpc ecosystem so I don't actually use it that much. But this is one of the pieces I really like a lot).

      • nasretdinov 3 hours ago

        Arguably JSON doesn't have this problem at all since it encodes the field names too. The only thing it doesn't handle is field renames, but I mean, come on, you know you can't rename a field in public API anyways :)

    • DanielHB 3 hours ago

      I imagine compression of JSON also adds significant overhead compared to ProtoBuf on top of extra memory usage.

      I don't disagree that people go for ProtoBuf a bit too eagerly though.

    • secondcoming 2 hours ago

      A format cannot be both human and machine readable. JSON is human readable; that's the point of it. Human readability is great for debugging only, but it has an overhead because it's not machine friendly. Protobuf messages are both smaller and quicker to decode. If you're in an environment where you're handling millions of messages per second, binary formats pay dividends. The number of messages viewed by a human is minuscule, so there's no real gain to having that slow path. Just write a message dump tool.

  • t1234s 11 hours ago

    speed is always good!

  • stephenlf 11 hours ago

    Yo nice

  • pyrolistical 12 hours ago

    > Optimizing the underlying temporary buffer

    So array list instead of array?

  • iouser 13 hours ago

    Did you run any tests/regressions against the security problems that are common with parsers? Seems like the solution might be at risk of creating CVEs later

    • rpearl 11 hours ago

      ...Do you think that v8 doesn't have tests for what might be one of the most executed userspace codepaths in the world?