JSON for Classic C++

(github.com)

94 points | by davikr 20 hours ago

74 comments

  • conradev 18 hours ago

    I remember searching for a JSON library with minimal dependencies a while ago, and came across this:

    https://rawgit.com/miloyip/nativejson-benchmark/master/sampl...

    The variance in feature set, design and performance is huge across all of them. I ultimately landed on libjson, written in C: https://github.com/vincenthz/libjson

    It does a lot for you, but it notably does not build a tree for you and does not try to interpret numbers, which I found perfect for adding to languages with C FFI that have their own collection and number types. It’s also great for partial parsing if you need to do any sort of streaming.

    It looks like this one can’t currently do partial parsing, but it looks great if C++ maps/vectors are your target.
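    Leaving numbers uninterpreted means the host decides what a number becomes. Below is a minimal sketch of that host-side step. It assumes the tokenizer hands over the raw number text as a std::string; the Number struct and convert_number are hypothetical names, not part of libjson:

```cpp
#include <charconv>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <system_error>

// Hypothetical host-side number: an exact integer when possible, else a double.
struct Number {
    bool is_int;
    std::int64_t i;
    double d;
};

// Turn raw JSON number text (e.g. "42", "-1.5e3") into a Number.
Number convert_number(const std::string& text) {
    Number n = {false, 0, 0.0};
    const char* first = text.data();
    const char* last = first + text.size();
    std::int64_t value = 0;
    std::from_chars_result r = std::from_chars(first, last, value);
    if (r.ec == std::errc() && r.ptr == last) {
        // The whole token was an integer that fits in 64 bits.
        n.is_int = true;
        n.i = value;
        n.d = static_cast<double>(value);
        return n;
    }
    // Fractions, exponents and out-of-range integers fall back to double.
    n.d = std::strtod(text.c_str(), nullptr);
    return n;
}
```

    Integers that fit stay exact in an int64. Everything else becomes a double. A host language with bignums could branch at that point instead.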

    • msarnoff 16 hours ago

      If you want to go extremely lightweight, there’s jsmn: https://github.com/zserge/jsmn

      It does no dynamic memory allocation, which is a plus in constrained IoT/embedded applications. But it’s really only a tokenizer. For example, if you want to parse fields out of a map, you have to write your own wrappers to iterate over key/value pairs. Since no data is copied out of the original buffer, all the “tokens” are given as byte offsets and lengths, not null-terminated strings, so you can’t just do printf("%s").

      If you can’t (or don’t want to) malloc, it gets the job done. Not sure I’d recommend it for other applications though.
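      The “write your own wrappers” part usually comes down to a couple of helpers like these. This sketch assumes a hypothetical Token that stores a byte range [start, end) into the original buffer (jsmn's real jsmntok_t has more fields). The names token_equals and print_token are illustrative:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical token: a byte range [start, end) into the caller's buffer.
struct Token {
    int start;
    int end;
};

// Compare a token's bytes with a null-terminated key, without copying.
bool token_equals(const char* json, const Token& tok, const char* key) {
    assert(tok.start >= 0 && tok.end >= tok.start);
    std::size_t len = static_cast<std::size_t>(tok.end - tok.start);
    return std::strlen(key) == len && std::memcmp(json + tok.start, key, len) == 0;
}

// Print a token; the %.*s precision bounds the read, so no '\0' is needed.
void print_token(const char* json, const Token& tok) {
    std::printf("%.*s\n", tok.end - tok.start, json + tok.start);
}
```

      For the buffer {"name":"jsmn"}, the key token would be {2, 6} and the value token {9, 13}. Neither needs a terminating null.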

      • conradev 6 hours ago

        I actually evaluated and used jsmn and almost mentioned it in my comment. It was really quite cool, but I believe I couldn’t use it due to the lack of UTF-8 validation. Because UTF-8 validation is in the state machine for libjson, I can actually ignore incomplete UTF-8 escape sequences in incomplete JSON strings when streaming.

      • jdougan 16 hours ago

        Is there a reason you can't do printf("%.*s", strlen, strptr); ?

        • deathanatos 4 hours ago

          A non-allocating library would be forced to return to you the unparsed string literal, since returning a parsed string can require an allocation. It might tell you that the literal is valid.

          E.g., take the JSON:

            "\"\uD83D\uDCA9\""
          
          There's no (pointer, length) into that that you can then printf(…, ptr, len); you'd get the escapes, raw.

          Ofc., there might be situations like debugging where that's fine.
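          Decoding needs somewhere to write the result. In the example above, the two escapes form a UTF-16 surrogate pair. That pair has to be combined into U+1F4A9 and then re-encoded as four UTF-8 bytes. Below is a minimal sketch of that step. The function names are illustrative, and unpaired surrogates are only caught by assert:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Combine a UTF-16 surrogate pair (as in "\uD83D\uDCA9") into one code point.
std::uint32_t combine_surrogates(std::uint32_t high, std::uint32_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// Encode a code point (<= U+10FFFF, not a surrogate) as UTF-8 bytes.
std::string to_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

          The twelve escape characters collapse to four bytes (F0 9F 92 A9). Those bytes do not appear anywhere in the input, so a (pointer, length) pair into the input cannot describe them.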

        • jeffreygoesto 15 hours ago

          The terminating character is still the closing double quote and not a null, since the library neither copies out nor alters the input. For example, tiny_json replaces the closing quotes to create C strings, but that needs the full file to be in a mutable buffer, which can be prohibitive for small controllers reading some config from flash only.

          • jdougan 13 hours ago

            With the "%.*s" format you need no null at the end. It just counts out the characters:

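                #include <stdio.h>
            
                int main(void)
                {
                  /* Exactly 10 chars: no room for a terminating '\0'. */
                  char buff[10] = {'R', 'o', 'b', 'o', 't', 't', 'y', 'p', 'e', 's'};
                  /* The precision (4) caps how many bytes are read from buff + 3. */
                  printf("=>%.*s<=\n", 4, buff + 3);
                  return 0;
                }
            
            prints

                =>otty<=
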
            • jeffreygoesto 10 hours ago

              Ah. Ok. Scanning the length before printing is mandatory then.

              • jdougan 3 hours ago

                @msarnoff had already stipulated that the json lib is returning lengths:

                    all the “tokens” are given as byte offsets and lengths

              • ska 7 hours ago

                You have to do (a variant of) one or the other, no?

        • msarnoff 7 hours ago

          Yes, that’s exactly what I’ve done.

      • rurban 16 hours ago

        I settled on this one too.

        Far better than nlohmann's monster build times.

    • nwpierce 17 hours ago

      Somehow I didn't run across that one in my searching - I'll check it out. I've been working on a json C library myself:

      https://github.com/nwpierce/jsb

      My goal was to convert a stream of JSON to/from a binary stream that is easier to traverse and manipulate.

    • alex_suzuki 17 hours ago

      Did you try cJSON? Works well for me. https://github.com/DaveGamble/cJSON

  • darknavi 18 hours ago

    Compile time is largely a "developer problem", but so is the usability of a library. nlohmann/json's main selling point is that its interface is usable. Whether a developer values usability at typing time over compile time is an interesting thing to ponder for sure.

    • jart 16 hours ago

      Compile time is a collective problem and usability is an individual problem. I work with llama.cpp. The files in that codebase that were made using nlohmann json take about a minute to compile using g++ -O3 -g, all because one guy who originally wrote it wanted to type fewer keystrokes on his keyboard by using a more magical library, and the rest of us have to suffer for it every time we experiment with a one-line code change to those files.

      • chipdart 16 hours ago

        > (...) and the rest of us have to suffer for it every time we experiment with a 1 line of code change to those files.

        If you feel this is an issue then why don't you move it to an independent submodule that can be compiled independently? That means you can build it in parallel along with the whole project, and in the end you just link the resulting binaries.

        • fsloth 15 hours ago

          ” If you feel this is an issue then why don't you move it to an independent submodule that can be compiled independently?”

          If it’s a header, you necessarily can’t. A header gets included every time you compile code that depends on it.

          Compilers may offer precompiled headers etc., but if the code you want to change has a direct dependency on a large header, you need to recompile everything that depends on it.

          This is one of the pain points of C++.

          • pjmlp 15 hours ago

            It is a pain point of build management regardless of the language. Even with a language that has proper modules, one can get a cascade build if the public interface or module ABI is impacted.

            C++ modules are here; unfortunately, outside the latest VC++ and clang, plus MSBuild or CMake/Ninja, they are not an option.

            • papichulo2023 13 hours ago

              Are they? According to some people (in the GitHub issue about supporting C++ modules in VSCode), the standard is a mess and is likely to go away. VSCode doesn't support modules at the moment.

              • int_19h 5 hours ago

                Assuming that you're referring to https://github.com/microsoft/vscode-cpptools/issues/6302, I see two comments along these lines, neither of which is from actual implementers. That is evidence neither that the standard is a mess nor that it's likely to go away.

                The reason why this is taking such a long time is because the entire approach is a rather drastic change to how C++ compilers usually work, and C++ compilers (or even frontends, such as the stuff used by IDEs) are complicated things that aren't trivial to make major changes to.

                • papichulo2023 2 hours ago

                  Yeah, but the people implementing it are probably more interested in you moving to VS, so there's a slight conflict of interest. I appreciate their work, but I'm also a bit sceptical and think this is the main reason behind it. Six years is a long time for such an important feature. They don't need to support it on all compilers/frontends in order to release it.

              • pjmlp 9 hours ago

                Visual Studio is what matters.

                VSCode is never going to be as good; you are better off with CLion then.

                • fsloth 8 hours ago

                  This!

                  As a Windows/Mac user, Visual Studio is my preferred tool, but on macOS CLion (with VSCode for a few random workflow things not supported in CLion) is an adequate replacement (though Visual Studio remains king).

                  VSCode can be used as an industrial editor if one likes to, but if it does not feel right, it’s not a skill issue.

        • jart 16 hours ago

          I just wrote a new server instead. There's nothing I won't do, no lengths I'm not willing to go, when it comes to cutting back on build latency.

          • gary_0 16 hours ago

            I follow the same philosophy, to the point where I barely use the STL anymore; most of that template-heavy junk has been replaced in most of my projects. For instance, most of what I typically used <iostream> for was replaced with a 150-line .h (plus a 50-line .cpp that uses explicit template instantiation and a <charconv> include). {fmt} was too heavy for me. And I'm locked into C++17 because C++20 seems to double down on the 20k-line header madness.

            When I was stuck with C++ codebases that forced me to take a mandatory coffee break every time I needed to run a bit of new code, it made me a little bit insane! Never again.
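            For anyone unfamiliar with the explicit-instantiation trick, here is the idea. The header exposes only a declaration. One .cpp includes the heavy header (<charconv> here) and instantiates the template for the types you actually use. The sketch below uses a hypothetical to_str and shows both files in one listing, with the includes at the top so it compiles on its own:

```cpp
#include <charconv>
#include <string>
#include <system_error>

// ---- to_str.h (hypothetical): declaration only, no <charconv> needed ----
template <typename T>
std::string to_str(T value);

// ---- to_str.cpp (hypothetical): the one place that pays for <charconv> ----
template <typename T>
std::string to_str(T value) {
    char buf[32];
    std::to_chars_result res = std::to_chars(buf, buf + sizeof buf, value);
    if (res.ec != std::errc()) return std::string();
    return std::string(buf, res.ptr);
}

// Explicit instantiation definitions for the types the project uses.
template std::string to_str<int>(int);
template std::string to_str<long long>(long long);
```

            Callers that include only the header never parse <charconv>. The linker resolves to_str<int> and to_str<long long> against the instantiations in the .cpp. Calling it with a type that wasn't instantiated fails at link time rather than at compile time.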

          • chipdart 16 hours ago

            > I just wrote a new server instead.

            I'm sorry, this makes no sense at all. Why would anyone write a new server just because a small component was taking a minute to build?

            • wiseowise 14 hours ago

              It makes no sense at all that a person concerned with slow build times rewrote a slow component so it compiles faster?

      • hnlmorg 14 hours ago

        As a prolific contributor to open source yourself, I’d have expected you to be a little more sympathetic to other open source developers giving up their time freely.

        Some contributors have a day job, a family and other personal commitments, so writing open source code is a luxury they don’t have a lot of time for. I know this because I fall exactly into that camp myself.

        • jart 14 hours ago

          I'm defending open source developers. We can't freely modify open source code if it has glacial build times. It's specifically because people are volunteering that we should aim to be as conscientious as possible when it comes to build latency. Someone who volunteers to contribute code that compiles slowly is not being respectful of the time of all the other volunteers, which is like pumping the brakes on the open source movement. So I will make my views clear that development practices need to improve.

        • wiseowise 14 hours ago

          Just because they give up their time freely it makes their decisions immune to criticism?

          • hnlmorg 14 hours ago

            Constructive feedback is fine. jart's comment wasn’t that.

            • wiseowise 14 hours ago

              We will never know whether jart's comment was constructive until we know the original developer's decision process.

              If the original decision process was indeed “less keystrokes”, then how is that not constructive criticism?

              • gr4vityWall 6 hours ago

                I don't think a supposedly bad decision has to be answered with snark. A pull request, or a fork focused on reducing build times, is an actual net gain. From that poster's other comments, it seems like they went on and did just that, which I believe is great.

                At the very least, giving the original developer the benefit of doubt, or assuming their decision made sense under the circumstances they were in at the time, is IMO a better start than just public criticism.

              • hnlmorg 14 hours ago

                The developer's motives don’t change the snarky way jart wrote their comment.

                And if you felt their comment was acceptable, then I question how much you’ve contributed to open source yourself. Snarky comments like jart's are all too common and really demotivate people from maintaining popular projects.

                But don’t just take my word on it, there’s a plethora of other contributors who’ve talked about this topic as well.

                • wiseowise 24 minutes ago

                  What difference does it even make if it’s an open source project or not? Compile times are a big deal.

                • TeMPOraL 14 hours ago

                  Where's the snark though? jart's comment reads true literally.

                  Compile times are a big deal, and 'jart is right about individual vs. collective problems. And unlike most other critics on the Internet, 'jart actually provided a solution along with the criticism. If that kind of behavior "demotivates [some] people from maintaining popular projects", I still feel it's a net win.

      • occz 15 hours ago

        While the pursuit of faster build times is definitely a worthy cause, I feel like there's something I'm not quite seeing here. Does the JSON-code change frequently enough to incur build cache misses and the full minute penalty? Is there something inherent about the structure of the library that makes it unable to have its compilation be cached? Is the code structured in such a way that editing other code requires also invalidating the cache for the JSON-related code? I guess one way would be to break out the JSON parsing code to its own module and have it produce language-specific structs to be interacted with by the rest of the program.
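        One common way to do that in C++ is a compilation firewall (the pimpl idiom). The public header declares only an opaque Impl, so the only translation unit that includes the heavy JSON header is the one that defines Impl. The sketch below uses hypothetical names, and a std::map stands in for the JSON library so it compiles on its own:

```cpp
#include <map>
#include <memory>
#include <string>

// ---- document.h (hypothetical): no heavy JSON include here ----
class Document {
  public:
    Document();
    ~Document();  // defined where Impl is a complete type
    void set(const std::string& key, const std::string& value);
    std::string get(const std::string& key) const;

  private:
    struct Impl;  // opaque to every file that includes document.h
    std::unique_ptr<Impl> impl_;
};

// ---- document.cpp (hypothetical): the only TU that sees the heavy type ----
// In a real project the header-only JSON library would be included here;
// a std::map stands in for it in this sketch.
struct Document::Impl {
    std::map<std::string, std::string> values;
};

Document::Document() : impl_(new Impl) {}
Document::~Document() = default;

void Document::set(const std::string& key, const std::string& value) {
    impl_->values[key] = value;
}

std::string Document::get(const std::string& key) const {
    auto it = impl_->values.find(key);
    return it == impl_->values.end() ? std::string() : it->second;
}
```

        Editing code that includes document.h no longer re-parses the JSON library, and changes inside document.cpp recompile only that one file. The cost is one heap allocation and an extra indirection per Document.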

        • jart 15 hours ago

          Programming is the process of manipulating data structures, so if you're building a JSON server, then every piece of code in your server is going to be dealing with and operating on JSON data structures. It can't be neatly tucked away in a corner. Because it would be foolish to design a server that makes needless copies of all its inputs and outputs. This truth would be the same if you were using something like protobuf instead. Therefore it's important that your fundamental data structures be something that (a) you can control, and (b) doesn't make everything it touches take forever to build. Do you feel in control of someone else's 24000 line header full of template magic? If that thing is sitting between me and my data structures, then I will wipe it out of existence.

        • mort96 15 hours ago

          It seems like nlohmann/json is a header-only library, meaning the entire library has to be compiled once for every source file which uses it any time that source file or its includes has updated.

          So I guess in a JSON-heavy code base or a code base where nlohmann/json has leaked into common headers, you may end up recompiling the library a few dozen times per build where a few dozen of your C++ source files must be recompiled (e.g., due to common header changes)...

          (But don't worry, the linker will then spend a bunch of time throwing away almost all of that work so you only get one copy of the library in your binary)

          • occz 14 hours ago

            I missed that part. That is a pretty significant downside in that case.

        • wiseowise 14 hours ago

          > Does the JSON-code change frequently enough to incur build cache misses and the full minute penalty?

          The moment you switch branches - it changes.

          If you develop for Android, it generates a build directory with a hashed name derived from some CMake/Gradle variables; the moment one of those changes (like the AGP version), you get a new build dir and essentially have to compile from scratch.

          • occz 14 hours ago

            If you're on something reasonably smart like Bazel it will be able to determine whether the module itself has been changed and requires recompilation instead of running from cache.

            • wiseowise 8 hours ago

              Nice.

              We, and the majority of Android projects, aren’t on Bazel, though.

              • occz 7 hours ago

                This is true, and it's kind of a bummer to be honest. There's some serious time being wasted on recompilation that could be avoided with a really sharp build system.

                Bazel comes with its own bag of sharp edges though so it's unfortunately not like you can just adopt it and be on your merry way.

    • chipdart 16 hours ago

      > Compile time is largely a "developer problem", but so is the usability of a library.

      Compile time is way more than a "developer problem". It's an operational problem that ends up permeating software architecture and development practices, and ultimately affects how the whole project is delivered and deployed.

    • marmakoide 12 hours ago

      Significantly faster compilation means less friction to iterate on ideas and try things, which in the end leads to more polished results.

      A nice interface is agreeable, but maybe there are diminishing returns when you pay for it with long compile times. I remember pondering that when working with the Eigen math library, which is very nice but such a resource hog when you compile a project using it.

  • henshao 18 hours ago

    Really interesting that nlohmann isn't fully compliant. What cases are these?

    It seems to me though that if you're encountering the edges of json where nlohmann or simple parsing doesn't work properly, a binary format might be better. And if you're trying to serialize so much data that speed actually becomes an issue, then again, binary format might be what you really want.

    The killer features of nlohmann are the NLOHMANN_DEFINE_TYPE_INTRUSIVE and NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE macros that handle all of the ??? -> json -> ??? steps for you. That alone makes it my default go-to unless the above reasons force me in another direction.
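    As far as I understand nlohmann's docs, those macros generate to_json/from_json functions for the listed members, and the library finds them through argument-dependent lookup. Below is a library-free sketch of that hook pattern. It uses a hypothetical Object type (a string-to-string map) in place of a real JSON value, and serialize/deserialize are illustrative names, not nlohmann's API:

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a JSON object: keys map to scalar text.
using Object = std::map<std::string, std::string>;

namespace app {

struct Point {
    int x = 0;
    int y = 0;
};

// Per-type hooks, found by argument-dependent lookup because Point lives in app.
inline void to_json(Object& j, const Point& p) {
    j["x"] = std::to_string(p.x);
    j["y"] = std::to_string(p.y);
}

inline void from_json(const Object& j, Point& p) {
    p.x = std::stoi(j.at("x"));
    p.y = std::stoi(j.at("y"));
}

}  // namespace app

// Generic entry points: the unqualified calls let ADL pick T's hooks.
template <typename T>
Object serialize(const T& value) {
    Object j;
    to_json(j, value);
    return j;
}

template <typename T>
T deserialize(const Object& j) {
    T value;
    from_json(j, value);
    return value;
}
```

    The generic serialize/deserialize never mention Point. Supporting a new type only means writing its two hooks in that type's own namespace.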

  • leni536 15 hours ago

    On the other end of the spectrum there is [1]. It's both performance and usability oriented, although compile times are probably higher.

    Nlohmann is the slowest out of the popular libraries, AFAIK, and not particularly more usable than rapidjson, in my experience. So "better than nlohmann" is not very novel.

    [1] https://github.com/beached/daw_json_link

  • psyclobe 16 hours ago

    The moment nlohmann's library came out, I switched to it and I never looked back.

    I loved the interface, and it's exactly how I would've designed a JSON library with modern C++.

    Just maybe turn off the implicit conversion option, that can get a bit messy ;)

  • zeroq 16 hours ago

    "This project is a reaction agains..." is such a punk move that I can't help but appreciate it.

  • cod1r 18 hours ago

    jart is such a good programmer. a lot of people already know this but i just have to give props where it's due.

  • makz 18 hours ago

    What does “Classic C++” mean?

    • jll29 17 hours ago

      This library is nicely concise, and the code is mostly readable (although there are some non-obvious tricks that could be better documented).

      The Makefile could use some work:

        json_test.cpp:360:23: warning: missing terminating '"' character [-Winvalid-pp-token]
        { Json::success, R"({
                              ^
        fatal error: too many errors emitted, stopping now [-ferror-limit=]
        9 warnings and 20 errors generated.
        make: *** [json_test.o] Error 1
        % c++ --version                   
        Apple clang version 15.0.0 (clang-1500.1.0.2.5)
        Target: arm64-apple-darwin22.6.0
        Thread model: posix
        InstalledDir: /Library/Developer/CommandLineTools/usr/bin
      
      Compiling directly with

        c++ --std=c++11 -c json.cpp
      
      works fine, though.

    • jandrewrogers 17 hours ago

      There are approximately three major dialects of C++. They are distinguished by major changes in what idiomatic code looks like, enabled by the addition of core features to the language that made it more efficient and type-safe to express many things.

      The era of so-called “modern” C++ started with C++11, which was a radical reworking of the language. All prior versions of C++ are “legacy” or “classic”. Idiomatic code in “modern” and “classic” dialects almost look like different languages.

      C++20 arguably marks a new dialect break but it doesn’t have a colloquial label to distinguish it from “legacy” and “modern” AFAIK. Idiomatic C++20 looks pretty foreign from a C++11 perspective (but is unambiguously an improvement).
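      To make the “different languages” point concrete, here is the same count written both ways, following the pre-C++11 vs C++11 split described above. The function and functor names are made up for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// C++98/03 style: a hand-written function object passed to an algorithm.
struct LongerThan {
    explicit LongerThan(std::size_t n) : n_(n) {}
    bool operator()(const std::string& s) const { return s.size() > n_; }
    std::size_t n_;
};

std::ptrdiff_t count_long_classic(const std::vector<std::string>& words, std::size_t n) {
    return std::count_if(words.begin(), words.end(), LongerThan(n));
}

// C++11 style: a lambda captures n directly, no separate type needed.
std::ptrdiff_t count_long_modern(const std::vector<std::string>& words, std::size_t n) {
    return std::count_if(words.begin(), words.end(),
                         [n](const std::string& s) { return s.size() > n; });
}
```

      Both return the same result. The C++11 version mostly trades the named functor for a lambda capture, and that substitution repeated across a whole codebase is a large part of why the two dialects read so differently.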

      • int_19h 5 hours ago

        I don't think it's entirely accurate. "Modern idiomatic C++" was a thing already before C++11 - that would be the kind of code that heavily used the standard library and especially STL containers, iterators etc (but also stuff like auto_ptr etc; and yes, for all its flaws, it was actually used).

        And don't forget that C++03 TR1 also added a bunch of very useful stuff - most notably, std::shared_ptr and std::function. And, of course, Boost has been a thing long before C++11, filling many gaps for "modern C++" projects of the time.

        "classic C++" from that perspective is C++ written more or less Java-style.

      • jart 17 hours ago

        This library supports building with C++11. I haven't tried compiling it with an older standard, but I imagine it might work. One thing I like about the C++11 compilers like GCC 4.9 is they build code magnificently faster than recent editions. See https://x.com/JustineTunney/status/1795427808631758936

        • aninteger 16 hours ago

          > This library supports building with C++11. I haven't tried compiling it with an older standard, but I imagine it might work.

          I believe it does require C++11, due to std::nullptr_t and r-value references (&&), but that might be it. It's not a show stopper though since everyone should have a c++11 compiler now (even Ubuntu 14.04 LTS, which still has paid support I believe).

          > One thing I like about the C++11 compilers like GCC 4.9 is they build code magnificently faster than recent editions

          Kind of reminds me of gcc 2.95 which people kept around for the compiler speed. They would use gcc 3.x for the warning support and then compile with gcc 2.95 after fixing the warnings :).

          • jart 16 hours ago

            Yes they'd be very trivial to remove locally. It might also be nice to have #ifdef statements around them like we're already doing for std::string_view. If we consider that many big name C projects like curl are still on C89 then there's surely got to be people still out there using 2000's era C++.

          • chipdart 16 hours ago

            > It's not a show stopper though since everyone should have a c++11 compiler now (...)

            I think the point of pointing out it's C++11 is that it's not "classic C++" as it's using "modern C++" features. Thus it's a mystery why it would be referred to as classic C++.

            • jart 16 hours ago

              Just because I included an rvalue constructor doesn't make it C++11. This library was originally written in C. It hasn't changed a whole lot since Gautham and I originally wrote it: https://github.com/jart/cosmopolitan/blob/master/tool/net/lj... I feel perfectly comfortable calling C++11 "classic" or even "baroque" compared to what people are doing with C++ in 2024. However if you disagree with me, and feel that classic means C++03, then I've made certain that your preferences are supported by this library too. Just remove the rvalue and nullptr_t constructors. I'll probably add #ifdefs soon to automate that too.
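              The #ifdef approach could look roughly like the sketch below. It uses a hypothetical Value class, not json.cpp's actual Json type, and tests the standard __cplusplus macro. Note that MSVC only reports the real value of __cplusplus when built with /Zc:__cplusplus:

```cpp
#include <cstddef>  // std::nullptr_t when compiling as C++11 or later
#include <string>
#include <utility>  // std::move

// Hypothetical value type for illustration; not json.cpp's Json class.
class Value {
  public:
    enum Kind { Null, String };

    Value() : kind_(Null) {}
    Value(const char* s) : kind_(String), str_(s) {}
    Value(const Value& other) : kind_(other.kind_), str_(other.str_) {}
    Value& operator=(const Value& other) {
        kind_ = other.kind_;
        str_ = other.str_;
        return *this;
    }

#if __cplusplus >= 201103L
    // C++11-only conveniences; a C++98/03 build never sees them.
    Value(std::nullptr_t) : kind_(Null) {}
    Value(Value&& other) noexcept : kind_(other.kind_), str_(std::move(other.str_)) {
        other.kind_ = Null;
        other.str_.clear();
    }
#endif

    Kind kind() const { return kind_; }
    const std::string& str() const { return str_; }

  private:
    Kind kind_;
    std::string str_;
};
```

              With -std=c++98 the guarded constructors disappear and the class still compiles. With C++11 or later, Value(nullptr) and moves work.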

              • chipdart 3 hours ago

                > Just because I included an rvalue constructor doesn't make it C++11.

                Actually, it does. I mean, does it compile when you pass -std=c++98?

                > This library was originally written in C.

                Doesn't matter. If it uses C++11 features, it's C++11.

                > I feel perfectly comfortable calling C++11 "classic" or even "baroque" compared to what people are doing with C++ in 2024.

                Irrelevant. You can go the Humpty Dumpty way as far as you want to go and call anything any way. It doesn't matter. If you use C++11 features, it's C++11. If it's C++11 then you're discussing modern C++. You don't need to use all the bells and whistles to qualify.

    • pbrowne011 18 hours ago

      "Classic C++" and "Modern C++" refer to the language before and after C++11, respectively.

      Some of the key differences are use of standard library and its containers, smart pointers, and other language features that look less like C. In this specific library, this refers to some of the techniques like bit manipulation, manual memory management and string parsing, and using things like enums to improve speed and reduce complexity.

      An example of a more robust (but still "classic") library would be something like https://github.com/Tencent/rapidjson.

  • 0x1ceb00da 18 hours ago

    https://github.com/jart/json.cpp/blob/4f0a02dab1af7d81888cf5...

    The response doesn't tell you the location of the problem in the input.

    • jart 17 hours ago

      That might actually be the explanation for why json.cpp benchmarks 39x faster than nlohmann's library if I include the failure test cases.
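      For what it's worth, once a parser reports the byte offset where it stopped, turning that offset into a line and column is cheap. The sketch below assumes such an offset is available. Location and locate are hypothetical names, not part of json.cpp:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Location {
    std::size_t line;    // 1-based
    std::size_t column;  // 1-based, counted in bytes
};

// Translate a byte offset into the input into a line/column pair.
Location locate(const std::string& input, std::size_t offset) {
    assert(offset <= input.size());
    Location loc = {1, 1};
    for (std::size_t i = 0; i < offset; ++i) {
        if (input[i] == '\n') {
            ++loc.line;
            loc.column = 1;
        } else {
            ++loc.column;
        }
    }
    return loc;
}
```

      Columns here count bytes. If multi-byte UTF-8 appears earlier on the same line, the reported column will be larger than the number of visible characters before the error.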

  • ur-whale 18 hours ago

    Code in jart's version is refreshingly clean and easy to read compared to nlohmann's version.

    As an aside, I wonder: what is the ThomPike* set of macros actually doing in jart's implementation?

    Also, a speed comparison of this vs the other one would be very welcome: conformance and simplicity are certainly important criteria when picking a JSON parser, but speed is rather crucial.

    • jart 17 hours ago

      Thompson Pike encoding. It predates the UTF-8 standard and was invented on a napkin in a New Jersey diner. It allows the full spectrum of 32-bit numbers to be encoded, rather than restricting characters to only those also present in UTF-16. The json.cpp library enforces UTF-8 restrictions on parsing, because we have no choice. But you're allowed to serialize anything you want, thanks to the ThomPike macros.
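      For the curious: before RFC 3629 restricted UTF-8 to U+10FFFF, the scheme (as specified in RFC 2279) allowed sequences of up to six bytes covering 31 bits. Below is a sketch of that unrestricted encoder. It is not the ThomPike* macros themselves, and I haven't checked how they treat the 32nd bit:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode a value of up to 31 bits with the pre-RFC 3629 multi-byte scheme.
std::string thompson_pike_encode(std::uint32_t v) {
    assert(v <= 0x7FFFFFFF);
    std::string out;
    if (v < 0x80) {
        out += static_cast<char>(v);
        return out;
    }
    int n;  // total number of bytes in the sequence
    if (v < 0x800) n = 2;
    else if (v < 0x10000) n = 3;
    else if (v < 0x200000) n = 4;
    else if (v < 0x4000000) n = 5;
    else n = 6;
    // Lead byte: n high bits set, a zero bit, then the top payload bits.
    unsigned char lead_marker = static_cast<unsigned char>(0xFF << (8 - n));
    out += static_cast<char>(lead_marker | (v >> (6 * (n - 1))));
    // Continuation bytes: 10xxxxxx, six payload bits each, high to low.
    for (int i = n - 2; i >= 0; --i)
        out += static_cast<char>(0x80 | ((v >> (6 * i)) & 0x3F));
    return out;
}
```

      For values up to U+10FFFF this produces ordinary UTF-8. Above that it keeps going: 0x110000 becomes F4 90 80 80, which a strict UTF-8 decoder must reject.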

  • tsurnyc 18 hours ago

    What are the performance numbers? nlohmann/json is no speed demon.

  • UncleOxidant 17 hours ago

    Sounds like there's a backlash to modern C++.

  • nurettin 14 hours ago

    This is a fine library, but I use nlohmann extensively and haven't experienced any considerable compilation slowdown once I added it to the project.

    Overloading from_json to modularize parsing is really useful, I think that should be a part of every templated C++ json parser library.

    That said, I have seen these ThomPike* macros in cosmopolitan.h before, I wonder what the origin is.

  • madduci 16 hours ago

    Interesting approach, but not providing a Conan/vcpkg package at (the end of) 2024 only creates friction.

    We are not living in the 90s anymore.

    • epcoa 16 hours ago

      Dunking on nlohmann for performance is pretty easy. I’m interested in what the value proposition is over one of rapidjson, glaze, or simdjson (all of which have some amount of SIMD or SWAR optimization, and more importantly SAX and the use of something other than std::map)
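      On the “something other than std::map” point, one common alternative is to store an object's members in a flat vector, in insertion order, and look keys up with a linear scan. That tends to be cheap for the small objects typical of JSON, and it also preserves key order. Below is a sketch with string-only values and a hypothetical FlatObject name:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A JSON object stored as a flat, insertion-ordered vector of members.
// Values are plain strings to keep the sketch short.
class FlatObject {
  public:
    // Overwrite an existing key, or append a new member at the end.
    void set(const std::string& key, const std::string& value) {
        for (std::size_t i = 0; i < members_.size(); ++i) {
            if (members_[i].first == key) {
                members_[i].second = value;
                return;
            }
        }
        members_.push_back(std::make_pair(key, value));
    }

    // Linear lookup; returns nullptr when the key is absent.
    const std::string* find(const std::string& key) const {
        for (std::size_t i = 0; i < members_.size(); ++i)
            if (members_[i].first == key) return &members_[i].second;
        return nullptr;
    }

    // Members come back in insertion order (std::map would sort them).
    const std::vector<std::pair<std::string, std::string> >& members() const {
        return members_;
    }

  private:
    std::vector<std::pair<std::string, std::string> > members_;
};
```

      Objects with many keys would want a hash index on top. That tradeoff is left out here.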