A New ASN.1 API for Python

(blog.trailofbits.com)

160 points | by woodruffw 21 hours ago

78 comments

  • ChuckMcM 18 hours ago

    A small bit of historical context. When I was participating in the PKP meetings at RSADSI, I believe it was Ron who insisted that DER was the only reasonable choice if we were going to encode things with ASN.1 (which we were, because both DEC and RSA had already insisted that it had to be OSI compatible or they wouldn't support it; my suggestion that we use Sun's XDR was soundly rebuked, but hey, I had to offer).

    Generally it was presumed that because these were 'handshake' type steps (which is to say the prelude to establishing a cryptographic context for what would happen next) performance wasn't as important as determinism.

    • OhMeadhbh 13 hours ago

      oh. did i meet you there? i was contracting at RSADSI at the time and argued w/ Burt K. about how easy it was to mess up a general DER parser, much less an ASN.1 compiler. I remember we found about two bugs per week in ICL's compiler. Burt and Ron were BIG ASN.1 fans at the time and I could never figure out why. Ron kept pushing Burt and Bob Baldwin to include more generic ASN.1 features in BSAFE. Part of my misery during SET development can be directly traced to ICL's crappy ASN.1 compiler, yet it was probably the best one on the market at the time.

      Anywho... XDR isn't my favourite, but I would have definitely preferred it to DER/BER/ASN.1.

      Stop me before I make a CORBA reference.

      • ChuckMcM 11 hours ago

        > oh. did i meet you there?

        Probably :-). Ron was a huge fan of Roger Needham's (and, ngl, I was too) and Roger along with Andy Birrell and others were on a kick to make RPCs "seamless" so that you could reason about them like you did computer programs that were all local. Roger and I debated whether or not it was achievable (vs. desirable) at Cambridge when we had the PKI meeting there. We both agreed that computers would get fast and cheap enough that the value of having a canonical form on the wire vastly outweighed any disadvantage that "some" clients would have to conversion to put things in a native format they understood. (Andy wasn't convinced of that, at least at that time). But I believe that was the principle behind the insistence on ASN.1, determinism and canonical formats. Once you built the marshalling/unmarshalling libraries you could treat them as a constant tax on latency. That made analyzing state machines easier and debugging race conditions. Plus when they improved you could just replace the constants you used for the time it would take.

        • cryptonector 6 hours ago

          I wonder how much Needham had to do with Sun's AUTH_DH. It must have been Whit Diffie's baby, but if Needham was pushing RPC then I imagine there must have been interactions with Diffie.

          It turns out that one should not design protocols to require canonical encoding for things like signature verification. Just verify the signature over the blob being signed as it is, and only then decode. Much like nowadays we understand that encrypt-then-MAC is better than MAC-then-encrypt. (Kerberos gets away with MAC-then-encrypt because nowadays its cryptosystems use AES in ciphertext stealing mode and with confounders, so it never needs padding, so there's no padding oracle in that MAC-then-encrypt construction. Speaking of Kerberos, it's based on Needham-Schroeder... Sun must have been a fun place back then. It still was when I was there much much later.)

          • ChuckMcM 5 hours ago

            As I recall not much. (I wrote much of the original AUTH_DH code with Whit's help, if you're wondering, and a bunch of NIS+.)

            • cryptonector 5 hours ago

              Oh man, I touched mech_dh occasionally, and u/lukeh and I have talked about doing a modern version that uses DNSSEC and/or certificates.

    • tptacek 13 hours ago

      One of the few concessions I'll make to Sun: XDR was under-appreciated.

      • cryptonector 6 hours ago

        XDR is like a four-octet aligned version of PER for a cut-down version of ASN.1. It's really neat.

        XDR would not need much work to be a full-blown ER for ASN.1... But XDR is extremely inefficient as to booleans (4 bytes per!) and optional fields (since they are encoded as a 4-byte boolean followed by the value if the field is present).
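
        A minimal sketch of those two rules from RFC 4506 (helper names are mine; `xdr_optional` takes an already-encoded value):

        ```python
        import struct

        def xdr_bool(b):
            # XDR pads everything to 4-octet units: a boolean is a full
            # 4-byte big-endian integer, 0 or 1.
            return struct.pack(">I", 1 if b else 0)

        def xdr_optional(encoded_value):
            # An optional field is a 4-byte "present" boolean, followed by
            # the encoded value only when it is present.
            if encoded_value is None:
                return xdr_bool(False)
            return xdr_bool(True) + encoded_value

        assert xdr_bool(True) == b"\x00\x00\x00\x01"      # 4 bytes for one bit
        assert xdr_optional(None) == b"\x00\x00\x00\x00"  # 4 bytes to say "absent"
        ```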

      • ChuckMcM 11 hours ago

        You can thank Tom Lyon for it. Tom pretty much did the entire RPC/XDR/NFS stack to kick things off.

  • time4tea an hour ago

    20+ years ago we used ASN.1 for talking between microservices ("HTTP services", as they were called then). Very performant. Had to buy an OSS tools licence, but other than that quite nice.

  • orthecreedence 15 hours ago

    I was writing a cryptographically-inclined system with serialization in msgpack. At one point, I upgraded the libraries I was using and all my signatures started breaking because the msgpack library started using a different representation under the hood for some of my data structures. That's when I did some research and found ASN.1 DER and haven't really looked back since switching over to it. If you plan on signing your data structures and don't want to implement your own serialization format, give ASN.1 DER a look.

    • amluto 7 hours ago

      If you are planning to sign your data structures, IMO your first choice should be to sign byte strings: be explicit that the thing that is signed is a specific string of bytes (which cryptographic protocol people love to call octets). Anything interpreting the signed data needs to start with those bytes and interpret them — do NOT assume that, just because you have some data structure that you think serializes to those bytes, then that data structure is authentic.

      Many, many cryptographic disasters would have been avoided by following the advice above.
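
      A sketch of that discipline (the names and the HMAC construction are illustrative, not from any particular library):

      ```python
      import hmac, hashlib, json

      KEY = b"demo-key"  # illustrative shared MAC key

      def verify_then_decode(blob, tag):
          # The authenticated object is the byte string itself: verify over
          # exactly the bytes received (no re-serialization, no canonical
          # form), and parse only after verification succeeds.
          expected = hmac.new(KEY, blob, hashlib.sha256).digest()
          if not hmac.compare_digest(expected, tag):
              raise ValueError("bad MAC")
          return json.loads(blob)

      blob = b'{"user": "alice", "admin": false}'
      tag = hmac.new(KEY, blob, hashlib.sha256).digest()
      assert verify_then_decode(blob, tag) == {"user": "alice", "admin": False}
      ```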

      • RainyDayTmrw 5 hours ago

        That matches the advice from Latacora[1]. That advice makes a lot of sense from a security correctness and surface area perspective.

        There's a potential developer experience and efficiency concern, though. This likely forces two deserialization operations, and therefore two big memory copies, once for deserializing the envelope and once for deserializing the inner message. If we assume that most of the outer message is the inner message, and relatively little of it is the signature or MAC, then our extra memory copy is for almost the full length of the full message.

        [1]: https://www.latacora.com/blog/2019/07/24/how-not-to/
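
        One way to avoid the second full copy in Python, assuming a simple hypothetical envelope of [4-byte length][inner message][32-byte MAC], is to hand the inner parser a zero-copy view:

        ```python
        import struct

        def split_envelope(buf):
            # Hypothetical envelope: 4-byte big-endian inner length,
            # then the inner message, then a 32-byte MAC.
            mv = memoryview(buf)
            (inner_len,) = struct.unpack_from(">I", mv, 0)
            inner = mv[4:4 + inner_len]   # zero-copy view of the inner message
            mac = mv[4 + inner_len:]
            return inner, mac

        inner_msg = b"payload" * 3
        envelope = struct.pack(">I", len(inner_msg)) + inner_msg + b"\x00" * 32
        inner, mac = split_envelope(envelope)
        assert inner.tobytes() == inner_msg and len(mac) == 32
        ```

        Whether the saving is realizable depends on the inner parser accepting a view without materializing a fresh bytes object.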

  • nicce 20 hours ago

    There is also rasn library for Rust that now supports most of the codecs (BER/CER/DER/PER/APER/OER/COER/JER/XER).

    Disclaimer: I have contributed a lot recently. The OER codec (a modern flavor of ASN.1) is very optimized (almost as much as it can be with safe Rust and without CPU-specific stuff). I am still working on benchmarking results, which I plan to share in the near future, but it is starting to be the fastest there is in the open-source world. It is also faster than Google's Protobuf libraries or any Protobuf library in Rust (naive comparison, no reflection support). Hopefully the other codecs can be optimized too.

    [1] https://github.com/librasn/rasn

    • cryptonector 6 hours ago

      Neat!

      I do object to the idea that one should manually map ASN.1 to Rust (or any other language) type definitions because that conversion is going to be error-prone. I'd rather have a compiler that generates everything. It seems that rasn has that, yes? https://github.com/librasn/compiler

      • XAMPPRocky 4 hours ago

        Correct, the compiler allows you to generate the Rust bindings automatically. Worth noting that the compiler is at an earlier stage of development (the library was started six years ago, the compiler started roughly two years ago). So there are features that aren't used or supported by the compiler that are available in the library.

        Yes, writing the definitions by hand can be time-consuming and error-prone, but I designed the library around the declarative API to make it easy to both write manually and generate. I also personally prefer writing Rust whenever possible, so nowadays I would sooner write an ASN.1 module in Rust and then, if needed, build a generator for the ASN.1 textual representation than write ASN.1 directly, since I get access to much better and stronger tooling.

        Also, in my research when designing the crate, I found that other ASN.1 or X.509 libraries often get requests to allow decoding semantically invalid messages, because in the wild there are often services sending incorrect data. So I designed rasn to let you mix and match and easily build your own ASN.1 types from definitions, so that when you do need something bespoke, it's easy and safe.

    • lilyball 10 hours ago

      This one looks interesting. A few years ago I looked at all of the Rust ASN.1 libraries I could find and they all had various issues. I'm a little surprised I didn't find this one.

  • flowerthoughts 17 hours ago

    Related: if you ever want to create your own serialization format, please at least have a cursory look at the basics of ASN.1. It's very complete, both in terms of textual descriptions (how it started) and breadth of encoding rules (because it's practical).

    (You can skip the classes and macros, though they are indeed cool...)

    • tptacek 15 hours ago

      This sounds dangerously like a suggestion that more people use ASN.1.

      • Ekaros 3 hours ago

        Understanding prior art and getting more comprehensive list of things that need to be considered is always good.

        Not doing it is like inventing a new programming language after just learning one.

      • cryptonector 6 hours ago

        Would you rather they reinvent the wheel badly? That's what ProtocolBuffers is: badly reinvented ASN.1/DER!

        PB is:

          - TLV (tag-length-value), like DER
          - you have to explicitly list the tags in the IDL as if it was ASN.1 in 1984 (but actually, worse, because even back then tags were not always required in ASN.1, only for disambiguation)
          - it's super similar to DER, yet not the same
          - PB was created in part because ASN.1 had so little open source tooling, but PB had none until they wrote it, so they could just have written the ASN.1 tooling they'd wished they had

        smh

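
        To make the TLV comparison concrete, here is a toy DER-style encoder (definite lengths only; real DER has more rules, e.g. high tag numbers and constructed types):

        ```python
        def der_tlv(tag, content):
            # DER, like the Protocol Buffers wire format, frames everything as
            # tag-length-value. Short-form length below 128, long-form above.
            if len(content) < 128:
                length = bytes([len(content)])
            else:
                n = len(content).to_bytes((len(content).bit_length() + 7) // 8, "big")
                length = bytes([0x80 | len(n)]) + n
            return bytes([tag]) + length + content

        # INTEGER 5 -> tag 0x02, length 1, value 0x05
        assert der_tlv(0x02, b"\x05") == b"\x02\x01\x05"
        # A SEQUENCE (tag 0x30) just nests other TLVs as its content.
        seq = der_tlv(0x30, der_tlv(0x02, b"\x05") + der_tlv(0x02, b"\x07"))
        assert seq == b"\x30\x06\x02\x01\x05\x02\x01\x07"
        ```
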
        • RainyDayTmrw 5 hours ago

          In complete fairness to PBs, PBs have a heck of a lot less surface area than ASN.1. You could argue, why not use a subset of ASN.1, but it seems people have trouble agreeing on which subset to use.

        • mort96 2 hours ago

          Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields. Implicit field numbers sound like an excellent reason to not use ASN.1.

          This shilling for an over-engineered 80s encoding ecosystem that nobody uses is really putting me off.

        • flowerthoughts 5 hours ago

          The one thing that grinds my gears about BER/CER/DER is that they managed to come up with two different varint encoding schemes for the tag and length.
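
          For reference, the two schemes side by side (high-tag-number form vs. long-form length), sketched in Python:

          ```python
          def ber_high_tag(tag_number):
              # High tag numbers (>= 31): base-128 octets, big-endian, with the
              # continuation bit 0x80 set on every octet except the last.
              out = [tag_number & 0x7F]
              tag_number >>= 7
              while tag_number:
                  out.append((tag_number & 0x7F) | 0x80)
                  tag_number >>= 7
              return bytes(reversed(out))

          def ber_long_length(length):
              # Long-form length: a count-of-octets byte (high bit set), then
              # the length as a plain big-endian integer -- a different
              # variable-length scheme entirely.
              n = length.to_bytes((length.bit_length() + 7) // 8, "big")
              return bytes([0x80 | len(n)]) + n

          assert ber_high_tag(300) == b"\x82\x2c"         # 300 = 2*128 + 44
          assert ber_long_length(300) == b"\x82\x01\x2c"  # 2 octets: 0x012C
          ```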

          • cryptonector 5 hours ago

            Meh. One rarely ever needs tags larger than 30, and even more seldom tags larger than twice that, say.

      • RainyDayTmrw 12 hours ago

        What should people use today, given the choice, that isn't ASN.1?

        Edited to add: If they need something with a canonical byte representation, for example for hashing or MAC purposes?

        • viraptor 9 hours ago

          How much of it do you need in that representation? Usually I see that need in either X.509, where you're already using DER, or tiny fragments where a custom tag-length-value would cover almost every usage without having to touch ASN.1.

          • RainyDayTmrw 5 hours ago

            All I really need is serialization for structs. I'm trying to avoid inventing my own format, because it seems to be footgun-prone.

        • cryptonector 6 hours ago

          First of all you should never need a canonical representation. If you think you do, you're almost certainly wrong. In particular you should not design protocols so that you have to re-encode things in order to validate signatures.

          So then you don't need DER or anything like it.

          Second, ASN.1 is fantastic. You should at least study it a bit before you pick something else.

          Third, pick something you have good tooling for. I don't care if it's ASN.1, XDR, DCE RPC / MSRPC, JSON, CBOR, etc. Just make sure you have good tooling. And don't pick XML unless you really need it to interop with things that are already using XML.

          EDIT: I generally don't care about downvotes, but in this case I do. Which part of the above was objectionable? Point 1, 2, or 3? My negativity as to XML for protocols? XML for docs is alright.

  • johnisgood 20 hours ago

    Erlang also has great ASN.1 support. For the rest, I hope OSS Nokalva's proprietary solutions will go away, eventually.

    For Java I used yafred's asn1-tool, which is apparently not available anymore. Other than that, it worked well.

    Originally it was available here: https://github.com/yafred/asn1-tool (archived: https://web.archive.org/web/20240416031004/https://github.co...)

    Any recommendations?

  • dikei 20 hours ago

    DER is still easy; UPER (Unaligned Packed Encoding Rules) is so much harder, yet it's prevalent in the telecom industry. Last I checked, there was no freely available tool that can handle UPER 100%.

    • bryancoxwell 20 hours ago

      Not only is UPER hard to parse, but (I believe) 3GPP ASN.1 definitions are provided only in .docx files, which aren't exactly the easiest to work with. It's just really not a fun domain.

      • jeroenhd 20 hours ago

        The ASN.1 format itself isn't too bad. It shows its age and has some very weird decisions behind it in places, but it's not that difficult to encode and is quite efficient.

        Unfortunately, the protocols themselves can be confusing, badly (or worse: inconsistently) documented, and the binary formats often lack sensible backwards compatibility (or, even worse: optional backwards compatibility). Definitions are spread across different protocols (and versions thereof) and vendors within the space like to make their own custom protocols that are like the official standardised protocols, but slightly different in infuriating ways.

        If your parser works (something open source rarely cares about, so good luck finding one for your platform), the definitions extracted from those DOCX files are probably the least of your challenges.

      • sobkas 13 hours ago

        First, you can download the specifications as either PDF or doc(x). Second, doc(x) is simple enough that a simple doc(x)-to-ASCII/text conversion is good enough to produce a working ASN.1 definition. Copy&paste is also an option.

    • masklinn 20 hours ago

      FWIW rasn linked above claims to support UPER, but I couldn't tell you how completely.

      • nicce 19 hours ago

        There are many tools that can handle UPER up to a certain level (some rare ASN.1 types might not be supported). I think the main issue is not in the codec, rather the lack of compilers that can create a correct language-level representation of the ASN.1 definitions. 3GPP specifications are enormous and you don't want to create them by hand. ASN.1 has some very difficult notations, e.g. inner subtype constraints and information object classes. Subtype constraints may affect the encoding output in UPER, and if you do not represent them correctly, you are not compatible.

    • flowerthoughts 17 hours ago

      How come they don't (just) apply zlib on DER? Is telco equipment able to stream process UPER without buffering more than non-constructed values?

      • eqvinox 16 hours ago

        PER were defined in 1994; back then applying zlib wasn't something you "just" do. Modern use is backwards compatibility (or cargo cult.)

        • nicce 16 hours ago

          UPER is an extremely compact encoding format. It still makes sense to use UPER because, after all, it is an international standard, and telecommunication protocols themselves are supposed to add as little overhead on top of the actual payload as possible.

          For example, if you have an ASN.1 UTF-8 string that is constrained to 52 specific characters, UPER encoding can represent every character with 6 bits (not bytes).
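
          A toy sketch of just that bit-packing idea (a hypothetical 52-letter alphabet; real UPER also encodes a length determinant and orders the effective alphabet per the constraint):

          ```python
          ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # 52 chars

          def pack_6bit(s):
              # 52 <= 2**6, so each character needs only 6 bits, not a byte.
              bits = "".join(format(ALPHABET.index(ch), "06b") for ch in s)
              bits += "0" * (-len(bits) % 8)  # pad to whole octets
              return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

          # "Hello" is 5 octets in UTF-8 but 30 bits (4 padded octets) here.
          assert len(pack_6bit("Hello")) == 4
          ```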

          In modern world you can apply zlib on top of UPER encoding or internal payload, however, depending on the use case.

      • userbinator 15 hours ago

        When every bit passing through the network gets charged (if not to the customer, then it's taking up capacity that could otherwise be charged to the customer), and the software in the endpoints needs to be as low-power as possible, zlib is additional overhead you definitely don't want.

    • nly 13 hours ago

      asn1c claims to support "unaligned basic PER"

  • venamresm__ 20 hours ago

    In the ASN.1 space everyone hopes that someone can dethrone OSS Nokalva's proprietary solutions

    • woodruffw 20 hours ago

      I think it's context-dependent: I don't have insight into OSS Nokalva's use inside big companies, but in the Open Source world it certainly isn't dominant.

      In Open Source, I think the biggest ASN.1 implementations I come across are OpenSSL's, libtasn1, asn1c, and then per-language implementations like pyasn1.

      • venamresm__ 2 hours ago

        Most of the open source tools need patching to properly support certain scenarios (been there, done that). They also lack support for parsing the ASN.1 Value Notation format (textual), which is used everywhere in specifications; OSS Nokalva offers the full set of tools to handle this, even with a playground and an ASN.1 editor, and this is non-existent in open source right now. For now the open-source tools only focus on the crypto aspect and don't really dive into telco, banking, biometrics, and others.

      • nicce 20 hours ago

        Basically any commercial ASN.1 compiler prevents usage of the output in any open-source project. There is that.

        • sobkas 13 hours ago

          The licence also prevents you from modifying the generated code.

    • cryptonector 6 hours ago

      I don't have the time, though I do have the inclination, to finish Heimdal's ASN.1 compiler, which is already quite awesome. u/lukeh used Heimdal's ASN.1 compiler's JSON transformation of modules output to build an ASN.1 compiler and runtime for Swift.

    • memling 19 hours ago

      > In the ASN.1 space everyone hopes that someone can dethrone OSS Nokalva's proprietary solutions

      You're buying more than a compiler and runtime, though: you're also getting an SLA and stricter guarantees about interoperability and bugs and so forth. I have no idea how good their support is (maybe it's atrocious?), but these are important. I once had a client who relied on the open-source asn1c and complained about some of the bugs they found in it; they got pushed into buying commercial when the cost-benefit outweighed the software licensing issues.

      • cryptonector 6 hours ago

        Meh. After all, if you're not using ASN.1 you're using something like ProtocolBuffers or FlatBuffers or whatever and all open source tooling.

  • woodrowbarlow 21 hours ago

    neat!

    related: you can also create wireshark dissectors from ASN.1 files

    https://www.wireshark.org/docs/wsdg_html_chunked/ASN1StepByS...

  • mootptr 10 hours ago

    Parser differential exploits are an understated problem, especially with ASN.1, which I didn't expect to see anyone thinking about. Kudos on this initiative!

    • kccqzy 8 hours ago

      I understand that it is a problem but I'm more used to seeing arguments that monocultures and single implementations are bad: WebSQL for example didn't become a standard because there was only a single implementation.

  • lilyball 10 hours ago

    Oh right, the asn1 crate, which supports CHOICE but only up to 3 alternatives, which means it can't even be used to implement X.509 certificate decoding. Makes me wonder what they're going to do when they get that far.

  • benatkin 6 hours ago

    From the post

    > with the help of funding from Alpha-Omega

    From the site:

    > funded by Microsoft, Google, and Amazon

    Also it's a Linux Foundation project.

    Interesting. Python's a big community, and there's some disagreement here over whether this would be better done in pure python. I think it's good that there's a rust/cloud contingent in python land but hope pure python remains popular.

  • dec0dedab0de 16 hours ago

    Does anyone miss when "pure python" was a selling point of your library? I understand the need for speed, but I wish it were more common to ship a compiled binary that does the thing fast, as well as the same algorithm in python to fall back on if it's not available.

    • smnrchrds 15 hours ago

      Pure Python was a huge selling point back when using a compiled library involved downloading and running random .exe files from someone's personal page on a university website. It is much less of a selling point now that we have binary wheels and uv/Poetry/etc. that create cross-platform lock files.

      I feel nostalgic seeing (a mirror of) that download page again, but that era was such a pain.

      Mirror: http://qiniucdn.star-lai.cn/portal/2016/09/05/tu3p2vd4znn

      • dec0dedab0de 15 hours ago

        I always thought the selling point of Pure Python was that you might be running on some esoteric implementation of python, or hardware that the library maintainer didn't include binaries for.

        I mean, I am glad wheels exist, they make things a lot easier for a lot of people. I just believed in the dream of targeting the language for maximum compatibility and hoping a better implementation would eventually run it faster.

        • kccqzy 8 hours ago

          Indeed. I was an early adopter of Google's App Engine using Python, and at that time pure Python was a must.

    • pjmlp 15 hours ago

      I find it rather tragic that, contrary to other dynamic languages, Python seems to fall under the curse of rewriting bindings in C and C++, or nowadays the more fashionable Rust.

      And yes, Smalltalk, Self and various Lisp variants are just as dynamic.

      • woodruffw 15 hours ago

        Why is it tragic? It's more or less idiomatic in Python to put the hot or performance-sensitive paths of a package in native code; Rust has arguably made that into a much safer practice.

        • pjmlp 5 hours ago

          Because it forces you to master two languages, or to depend on third-party developers for anything that matters beyond basic OS scripting tasks.

          It became idiomatic as there was no other alternative.

      • foolswisdom 14 hours ago

        It's part of the original selling points of python, so it's not surprising that we've never stopped doing it.

        • pjmlp 5 hours ago

          As someone who has been using Python since version 1.6, I can say that was certainly not one of the original selling points.

          Rather, it was a better Perl for UNIX scripts and the Zope CMS; there was no other reason to use Python in 2000.

    • sharperguy 15 hours ago

      They should just distribute it in some kind of bytecode-compiled language with a JIT VM, like Java. Then at least it will be cross-platform.

      • pjmlp 4 hours ago

        Yeah, I heard about this new one taken from browsers.

        It is supposed to fix everything that previous ones never achieved.

    • pphysch 15 hours ago

      I certainly don't miss needing to install additional system libraries in addition to my pip install.

      • pjmlp 4 hours ago

        In what way does having to compile a Rust library prevent this?