A New ASN.1 API for Python

(blog.trailofbits.com)

160 points | by woodruffw 21 hours ago

78 comments

  • ChuckMcM 18 hours ago

    A small bit of historical context. When I was participating in the PKP meetings at RSADSI, I believe it was Ron who insisted that DER was the only reasonable choice if we were going to encode things with ASN.1 (which we were, because both DEC and RSA had already insisted that it had to be OSI compatible or they wouldn't support it; my suggestion that we use Sun's XDR was soundly rebuked, but hey, I had to offer).

    Generally it was presumed that because these were 'handshake' type steps (which is to say the prelude to establishing a cryptographic context for what would happen next) performance wasn't as important as determinism.

    • OhMeadhbh 13 hours ago

      oh. did i meet you there? i was contracting at RSADSI at the time and argued w/ Burt K. about how easy it was to mess up a general DER parser, much less an ASN.1 compiler. I remember we found about two bugs per week in ICL's compiler. Burt and Ron were BIG ASN.1 fans at the time and I could never figure out why. Ron kept pushing Burt and Bob Baldwin to include more generic ASN.1 features in BSAFE. Part of my misery during SET development can be directly traced to ICL's crappy ASN.1 compiler, yet it was probably the best one on the market at the time.

      Anywho... XDR isn't my favourite, but I would have definitely preferred it to DER/BER/ASN.1.

      Stop me before I make a CORBA reference.

      • ChuckMcM 11 hours ago

        > oh. did i meet you there?

        Probably :-). Ron was a huge fan of Roger Needham's (and, ngl, I was too) and Roger along with Andy Birrell and others were on a kick to make RPCs "seamless" so that you could reason about them like you did computer programs that were all local. Roger and I debated whether or not it was achievable (vs. desirable) at Cambridge when we had the PKI meeting there. We both agreed that computers would get fast and cheap enough that the value of having a canonical form on the wire vastly outweighed any disadvantage that "some" clients would have to conversion to put things in a native format they understood. (Andy wasn't convinced of that, at least at that time). But I believe that was the principle behind the insistence on ASN.1, determinism and canonical formats. Once you built the marshalling/unmarshalling libraries you could treat them as a constant tax on latency. That made analyzing state machines easier and debugging race conditions. Plus when they improved you could just replace the constants you used for the time it would take.

        • cryptonector 6 hours ago

          I wonder how much Needham had to do with Sun's AUTH_DH. It must have been Whit Diffie's baby, but if Needham was pushing RPC then I imagine there must have been interactions with Diffie.

          It turns out that one should not design protocols to require canonical encoding for things like signature verification. Just verify the signature over the blob being signed as it is, and only then decode. Much like nowadays we understand that encrypt-then-MAC is better than MAC-then-encrypt. (Kerberos gets away with MAC-then-encrypt because nowadays its cryptosystems use AES in ciphertext stealing mode and with confounders, so it never needs padding, so there's no padding oracle in that MAC-then-encrypt construction. Speaking of Kerberos, it's based on Needham-Schroeder... Sun must have been a fun place back then. It still was when I was there much much later.)

          • ChuckMcM 5 hours ago

            As I recall not much. (I wrote much of the original AUTH_DH code with Whit's help, if you're wondering, and a bunch of NIS+.)

            • cryptonector 5 hours ago

              Oh man, I touched mech_dh occasionally, and u/lukeh and I have talked about doing a modern version that uses DNSSEC and/or certificates.

    • tptacek 13 hours ago

      One of the few concessions I'll make to Sun: XDR was under-appreciated.

      • cryptonector 6 hours ago

        XDR is like a four-octet aligned version of PER for a cut-down version of ASN.1. It's really neat.

        XDR would not need much work to be a full-blown ER for ASN.1... But XDR is extremely inefficient as to booleans (4 bytes per!) and optional fields (since they are encoded as a 4-byte boolean followed by the value if the field is present).
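
        A minimal sketch of those two rules from RFC 4506 (helper names are mine; `xdr_optional` takes an already-encoded value):

        ```python
        import struct

        def xdr_bool(b):
            # XDR pads everything to 4-octet units: a boolean is a full
            # 4-byte big-endian integer, 0 or 1.
            return struct.pack(">I", 1 if b else 0)

        def xdr_optional(encoded_value):
            # An optional field is a 4-byte "present" boolean, followed by
            # the encoded value only when it is present.
            if encoded_value is None:
                return xdr_bool(False)
            return xdr_bool(True) + encoded_value

        assert xdr_bool(True) == b"\x00\x00\x00\x01"      # 4 bytes for one bit
        assert xdr_optional(None) == b"\x00\x00\x00\x00"  # 4 bytes to say "absent"
        ```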

      • ChuckMcM 11 hours ago

        You can thank Tom Lyon for it. Tom pretty much did the entire RPC/XDR/NFS stack to kick things off.

  • time4tea an hour ago

    20+ years ago we used ASN.1 for talking between microservices ("HTTP services", as they were called then). Very performant. Had to buy an OSS tools licence, but other than that quite nice.

  • orthecreedence 15 hours ago

    I was writing a cryptographically-inclined system with serialization in msgpack. At one point, I upgraded the libraries I was using and all my signatures started breaking because the msgpack library started using a different representation under the hood for some of my data structures. That's when I did some research and found ASN.1 DER and haven't really looked back since switching over to it. If you plan on signing your data structures and don't want to implement your own serialization format, give ASN.1 DER a look.

    • amluto 7 hours ago

      If you are planning to sign your data structures, IMO your first choice should be to sign byte strings: be explicit that the thing that is signed is a specific string of bytes (which cryptographic protocol people love to call octets). Anything interpreting the signed data needs to start with those bytes and interpret them — do NOT assume that, just because you have some data structure that you think serializes to those bytes, then that data structure is authentic.

      Many, many cryptographic disasters would have been avoided by following the advice above.
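
      A sketch of that discipline (the names and the HMAC construction are illustrative, not from any particular library):

      ```python
      import hmac, hashlib, json

      KEY = b"demo-key"  # illustrative shared MAC key

      def verify_then_decode(blob, tag):
          # The authenticated object is the byte string itself: verify over
          # exactly the bytes received (no re-serialization, no canonical
          # form), and parse only after verification succeeds.
          expected = hmac.new(KEY, blob, hashlib.sha256).digest()
          if not hmac.compare_digest(expected, tag):
              raise ValueError("bad MAC")
          return json.loads(blob)

      blob = b'{"user": "alice", "admin": false}'
      tag = hmac.new(KEY, blob, hashlib.sha256).digest()
      assert verify_then_decode(blob, tag) == {"user": "alice", "admin": False}
      ```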

      • RainyDayTmrw 5 hours ago

        That matches the advice from Latacora[1]. That advice makes a lot of sense from a security correctness and surface area perspective.

        There's a potential developer experience and efficiency concern, though. This likely forces two deserialization operations, and therefore two big memory copies, once for deserializing the envelope and once for deserializing the inner message. If we assume that most of the outer message is the inner message, and relatively little of it is the signature or MAC, then our extra memory copy is for almost the full length of the full message.

        [1]: https://www.latacora.com/blog/2019/07/24/how-not-to/
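
        One way to avoid the second full copy in Python, assuming a simple hypothetical envelope of [4-byte length][inner message][32-byte MAC], is to hand the inner parser a zero-copy view:

        ```python
        import struct

        def split_envelope(buf):
            # Hypothetical envelope: 4-byte big-endian inner length,
            # then the inner message, then a 32-byte MAC.
            mv = memoryview(buf)
            (inner_len,) = struct.unpack_from(">I", mv, 0)
            inner = mv[4:4 + inner_len]   # zero-copy view of the inner message
            mac = mv[4 + inner_len:]
            return inner, mac

        inner_msg = b"payload" * 3
        envelope = struct.pack(">I", len(inner_msg)) + inner_msg + b"\x00" * 32
        inner, mac = split_envelope(envelope)
        assert inner.tobytes() == inner_msg and len(mac) == 32
        ```

        Whether the saving is realizable depends on the inner parser accepting a view without materializing a fresh bytes object.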

  • nicce 20 hours ago

    There is also rasn library for Rust that now supports most of the codecs (BER/CER/DER/PER/APER/OER/COER/JER/XER).

    Disclaimer: I have contributed a lot recently. The OER codec (a modern flavor of ASN.1) is very optimized (almost as much as it can be with safe Rust and without CPU-specific stuff). I am still working on benchmarking results, which I plan to share in the near future, but it is starting to be the fastest there is in the open-source world. It is also faster than Google's Protobuf libraries or any Protobuf library in Rust (naive comparison, no reflection support). Hopefully the other codecs can be optimized too.

    [1] https://github.com/librasn/rasn

    • cryptonector 6 hours ago

      Neat!

      I do object to the idea that one should manually map ASN.1 to Rust (or any other language) type definitions because that conversion is going to be error-prone. I'd rather have a compiler that generates everything. It seems that rasn has that, yes? https://github.com/librasn/compiler

      • XAMPPRocky 4 hours ago

        Correct, the compiler allows you to generate the Rust bindings automatically. Worth noting that the compiler is at an earlier stage of development (the library was started six years ago, the compiler started roughly two years ago). So there are features that aren't used or supported by the compiler that are available in the library.

        Yes, writing the definitions by hand can be time-consuming and error-prone, but I designed the library around the declarative API to make it easy to both write manually and generate. I also personally prefer writing Rust whenever possible, so nowadays I would sooner write an ASN.1 module in Rust and then, if needed, build a generator for the ASN.1 textual representation than write ASN.1 directly, since I get access to much better and stronger tooling.

        Also, in my research when designing the crate, I found that other ASN.1 or X.509 libraries often get requests to allow decoding semantically invalid messages, because in the wild there are often services sending incorrect data. So I designed rasn to let you mix and match and easily build your own ASN.1 types from definitions, so that when you do need something bespoke, it's easy and safe.

    • lilyball 10 hours ago

      This one looks interesting. A few years ago I looked at all of the Rust ASN.1 libraries I could find and they all had various issues. I'm a little surprised I didn't find this one.

  • flowerthoughts 17 hours ago

    Related: if you ever want to create your own serialization format, please at least have a cursory look at the basics of ASN.1. It's very complete, both in terms of textual descriptions (how it started) and breadth of encoding rules (because it's practical).

    (You can skip the classes and macros, though they are indeed cool...)

    • tptacek 15 hours ago

      This sounds dangerously like a suggestion that more people use ASN.1.

      • Ekaros 3 hours ago

        Understanding prior art and getting more comprehensive list of things that need to be considered is always good.

        Not doing it is like inventing a new programming language after just learning one.

      • cryptonector 6 hours ago

        Would you rather they reinvent the wheel badly? That's what ProtocolBuffers is: badly reinvented ASN.1/DER!

        PB is:

          - TLV (tag-length-value), like DER
          - you have to explicitly list the tags in the IDL as if it was ASN.1 in 1984 (but actually, worse, because even back then tags were not always required in ASN.1, only for disambiguation)
          - it's super similar to DER, yet not the same
          - PB was created in part because ASN.1 had so little open source tooling, but PB had none until they wrote it, so they could just have written the ASN.1 tooling they'd wished they had

        smh

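
        To make the TLV comparison concrete, here is a toy DER-style encoder (definite lengths only; real DER has more rules, e.g. high tag numbers and constructed types):

        ```python
        def der_tlv(tag, content):
            # DER, like the Protocol Buffers wire format, frames everything as
            # tag-length-value. Short-form length below 128, long-form above.
            if len(content) < 128:
                length = bytes([len(content)])
            else:
                n = len(content).to_bytes((len(content).bit_length() + 7) // 8, "big")
                length = bytes([0x80 | len(n)]) + n
            return bytes([tag]) + length + content

        # INTEGER 5 -> tag 0x02, length 1, value 0x05
        assert der_tlv(0x02, b"\x05") == b"\x02\x01\x05"
        # A SEQUENCE (tag 0x30) just nests other TLVs as its content.
        seq = der_tlv(0x30, der_tlv(0x02, b"\x05") + der_tlv(0x02, b"\x07"))
        assert seq == b"\x30\x06\x02\x01\x05\x02\x01\x07"
        ```
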
        • RainyDayTmrw 5 hours ago

          In complete fairness to PBs, PBs have a heck of a lot less surface area than ASN.1. You could argue, why not use a subset of ASN.1, but it seems people have trouble agreeing on which subset to use.

        • mort96 2 hours ago

          Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields. Implicit field numbers sound like an excellent reason to not use ASN.1.

          This shilling for an over-engineered 80s encoding ecosystem that nobody uses is really putting me off.

        • flowerthoughts 5 hours ago

          The one thing that grinds my gears about BER/CER/DER is that they managed to come up with two different varint encoding schemes for the tag and length.
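
          For reference, the two schemes side by side (high-tag-number form vs. long-form length), sketched in Python:

          ```python
          def ber_high_tag(tag_number):
              # High tag numbers (>= 31): base-128 octets, big-endian, with the
              # continuation bit 0x80 set on every octet except the last.
              out = [tag_number & 0x7F]
              tag_number >>= 7
              while tag_number:
                  out.append((tag_number & 0x7F) | 0x80)
                  tag_number >>= 7
              return bytes(reversed(out))

          def ber_long_length(length):
              # Long-form length: a count-of-octets byte (high bit set), then
              # the length as a plain big-endian integer -- a different
              # variable-length scheme entirely.
              n = length.to_bytes((length.bit_length() + 7) // 8, "big")
              return bytes([0x80 | len(n)]) + n

          assert ber_high_tag(300) == b"\x82\x2c"         # 300 = 2*128 + 44
          assert ber_long_length(300) == b"\x82\x01\x2c"  # 2 octets: 0x012C
          ```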

          • cryptonector 5 hours ago

            Meh. One rarely ever needs tags larger than 30, and even more seldom tags larger than twice that, say.

      • RainyDayTmrw 12 hours ago

        What should people use today, given the choice, that isn't ASN.1?

        Edited to add: If they need something with a canonical byte representation, for example for hashing or MAC purposes?

        • viraptor 9 hours ago

          How much of it do you need in that representation? Usually I see that need in either X.509, where you're already using DER, or tiny fragments where a custom tag-length-value would cover almost every usage without having to touch ASN.1.

          • RainyDayTmrw 5 hours ago

            All I really need is serialization for structs. I'm trying to avoid inventing my own format, because it seems to be footgun-prone.

        • cryptonector 6 hours ago

          First of all you should never need a canonical representation. If you think you do, you're almost certainly wrong. In particular you should not design protocols so that you have to re-encode things in order to validate signatures.

          So then you don't need DER or anything like it.

          Second, ASN.1 is fantastic. You should at least study it a bit before you pick something else.

          Third, pick something you have good tooling for. I don't care if it's ASN.1, XDR, DCE RPC / MSRPC, JSON, CBOR, etc. Just make sure you have good tooling. And don't pick XML unless you really need it to interop with things that are already using XML.

          EDIT: I generally don't care about downvotes, but in this case I do. Which part of the above was objectionable? Point 1, 2, or 3? My negativity as to XML for protocols? XML for docs is alright.

  • johnisgood 20 hours ago

    Erlang also has great ASN.1 support. For the rest, I hope OSS Nokalva's proprietary solutions will go away, eventually.

    For Java I used yafred's asn1-tool, which is apparently not available anymore. Other than that, it worked well.

    Originally it was available here: https://github.com/yafred/asn1-tool (archived: https://web.archive.org/web/20240416031004/https://github.co...)

    Any recommendations?

  • dikei 20 hours ago

    DER is still easy; UPER (Unaligned Packed Encoding Rules) is so much harder, yet it's prevalent in the telecom industry. Last I checked, there was no freely available tool that can handle UPER 100%.

    • bryancoxwell 20 hours ago

      Not only is UPER hard to parse, but (I believe) 3GPP ASN.1 definitions are provided only in .docx files, which aren't exactly the easiest to work with. It's just really not a fun domain.

      • jeroenhd 20 hours ago

        The ASN.1 format itself isn't too bad. It shows its age and has some very weird decisions behind it in places, but it's not that difficult to encode and is quite efficient.

        Unfortunately, the protocols themselves can be confusing, badly (or worse: inconsistently) documented, and the binary formats often lack sensible backwards compatibility (or, even worse: optional backwards compatibility). Definitions are spread across different protocols (and versions thereof) and vendors within the space like to make their own custom protocols that are like the official standardised protocols, but slightly different in infuriating ways.

        If your parser works (something open source rarely cares about, so good luck finding one for your platform), the definitions extracted from those DOCX files are probably the least of your challenges.

      • sobkas 13 hours ago

        First, you can download the specifications as either PDF or doc(x). Second, doc(x) is simple enough that a simple doc(x)-to-ASCII/text conversion is good enough to produce a working ASN.1 definition. Copy&paste is also an option.

    • masklinn 20 hours ago

      FWIW rasn linked above claims to support UPER, but I couldn't tell you how completely.

      • nicce 19 hours ago

        There are many tools that can handle UPER up to a certain level (some rare ASN.1 types might not be supported). I think the main issue is not in the codec, rather the lack of compilers that can create a correct language-level representation of the ASN.1 definitions. 3GPP specifications are enormous and you don't want to create them by hand. ASN.1 has some very difficult notations, e.g. inner subtype constraints and information object classes. Subtype constraints may affect the encoding output in UPER, and if you do not represent them correctly, you are not compatible.

    • flowerthoughts 17 hours ago

      How come they don't (just) apply zlib on DER? Is telco equipment able to stream process UPER without buffering more than non-constructed values?

      • eqvinox 16 hours ago

        PER were defined in 1994; back then applying zlib wasn't something you "just" do. Modern use is backwards compatibility (or cargo cult.)

        • nicce 16 hours ago

          UPER is an extremely compact encoding format. It still makes sense to use UPER because, after all, it is an international standard, and telecommunication protocols themselves are supposed to add as little overhead on top of the actual payload as possible.

          For example, if you have an ASN.1 UTF-8 string that is constrained to 52 specific characters, UPER encoding can represent every character with 6 bits (not bytes).
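
          A toy sketch of just that bit-packing idea (a hypothetical 52-letter alphabet; real UPER also encodes a length determinant and orders the effective alphabet per the constraint):

          ```python
          ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # 52 chars

          def pack_6bit(s):
              # 52 <= 2**6, so each character needs only 6 bits, not a byte.
              bits = "".join(format(ALPHABET.index(ch), "06b") for ch in s)
              bits += "0" * (-len(bits) % 8)  # pad to whole octets
              return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

          # "Hello" is 5 octets in UTF-8 but 30 bits (4 padded octets) here.
          assert len(pack_6bit("Hello")) == 4
          ```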

          In modern world you can apply zlib on top of UPER encoding or internal payload, however, depending on the use case.

      • userbinator 15 hours ago

        When every bit passing through the network gets charged (if not to the customer, then it's taking up capacity that could otherwise be charged to the customer), and the software in the endpoints needs to be as low-power as possible, zlib is additional overhead you definitely don't want.

    • nly 13 hours ago

      asn1c claims to support "unaligned basic PER"

  • venamresm__ 20 hours ago

    In the ASN.1 space everyone hopes that someone can dethrone OSS Nokalva's proprietary solutions

    • woodruffw 20 hours ago

      I think it's context-dependent: I don't have insight into OSS Nokalva's use inside big companies, but in the Open Source world it certainly isn't dominant.

      In Open Source, I think the biggest ASN.1 implementations I come across are OpenSSL's, libtasn1, asn1c, and then per-language implementations like pyasn1.

      • venamresm__ 2 hours ago

        Most of the open source tools need patching to properly support certain scenarios (been there, done that). They also lack support for parsing the ASN.1 Value Notation format (textual), which is used everywhere in specifications; OSS Nokalva offers the full set of tools to handle this, even with a playground and an ASN.1 editor, and this is non-existent in open source right now. For now the open-source tools only focus on the crypto aspect and don't really dive into telco, banking, biometrics, and others.

      • nicce 20 hours ago

        Basically any commercial ASN.1 compiler prevents usage of the output in any open-source project. There is that.

        • sobkas 13 hours ago

          The licence also prevents you from modifying the generated code.

    • cryptonector 6 hours ago

      I don't have the time, though I do have the inclination, to finish Heimdal's ASN.1 compiler, which is already quite awesome. u/lukeh used Heimdal's ASN.1 compiler's JSON transformation of modules output to build an ASN.1 compiler and runtime for Swift.

    • memling 19 hours ago

      > In the ASN.1 space everyone hopes that someone can dethrone OSS Nokalva's proprietary solutions

      You're buying more than a compiler and runtime, though: you're also getting an SLA and stricter guarantees about interoperability and bugs and so forth. I have no idea how good their support is (maybe it's atrocious?), but these are important. I once had a client who relied on the open-source asn1c and complained about some of the bugs they found in it; they got pushed into buying commercial when the cost-benefit outweighed the software licensing issues.

      • cryptonector 6 hours ago

        Meh. After all, if you're not using ASN.1 you're using something like ProtocolBuffers or FlatBuffers or whatever and all open source tooling.

  • woodrowbarlow 21 hours ago

    neat!

    related: you can also create wireshark dissectors from ASN.1 files

    https://www.wireshark.org/docs/wsdg_html_chunked/ASN1StepByS...

  • mootptr 10 hours ago

    Parser differential exploits are an understated problem, especially with ASN.1, which I didn't expect to see anyone thinking about. Kudos on this initiative!

    • kccqzy 8 hours ago

      I understand that it is a problem but I'm more used to seeing arguments that monocultures and single implementations are bad: WebSQL for example didn't become a standard because there was only a single implementation.

  • lilyball 10 hours ago

    Oh right, the asn1 crate, which supports CHOICE but only up to 3 alternatives, which means it can't even be used to implement X.509 certificate decoding. Makes me wonder what they're going to do when they get that far.

  • benatkin 6 hours ago

    From the post

    > with the help of funding from Alpha-Omega

    From the site:

    > funded by Microsoft, Google, and Amazon

    Also it's a Linux Foundation project.

    Interesting. Python's a big community, and there's some disagreement here over whether this would be better done in pure python. I think it's good that there's a rust/cloud contingent in python land but hope pure python remains popular.

  • dec0dedab0de 16 hours ago

    Does anyone miss when "pure python" was a selling point of your library? I understand the need for speed, but I wish it were more common to ship a compiled binary that does the thing fast, as well as the same algorithm in python to fall back on if it's not available.

    • smnrchrds 15 hours ago

      Pure Python was a huge selling point back when using a compiled library involved downloading and running random .exe files from someone's personal page on a university website. It is much less of a selling point now that we have binary wheels and uv/Poetry/etc. that create cross-platform lock files.

      I feel nostalgic seeing (a mirror of) that download page again, but that era was such a pain.

      Mirror: http://qiniucdn.star-lai.cn/portal/2016/09/05/tu3p2vd4znn

      • dec0dedab0de 15 hours ago

        I always thought the selling point of Pure Python was that you might be running on some esoteric implementation of python, or hardware that the library maintainer didn't include binaries for.

        I mean, I am glad wheels exist, they make things a lot easier for a lot of people. I just believed in the dream of targeting the language for maximum compatibility and hoping a better implementation would eventually run it faster.

        • kccqzy 8 hours ago

          Indeed. I was an early adopter of Google's App Engine using Python, and at that time pure Python was a must.

    • pjmlp 15 hours ago

      I find it rather tragic that, contrary to other dynamic languages, Python seems to fall under the curse of rewriting bindings in C and C++, or nowadays the more fashionable Rust.

      And yes, Smalltalk, Self and various Lisp variants are just as dynamic.

      • woodruffw 15 hours ago

        Why is it tragic? It's more or less idiomatic in Python to put the hot or performance-sensitive paths of a package in native code; Rust has arguably made that into a much safer practice.

        • pjmlp 5 hours ago

          Because it forces you to master two languages, or to depend on third-party developers for anything that matters beyond basic OS scripting tasks.

          It became idiomatic as there was no other alternative.

      • foolswisdom 14 hours ago

        It's part of the original selling points of python, so it's not surprising that we've never stopped doing it.

        • pjmlp 5 hours ago

          As someone who has been using Python since version 1.6, I can say that was certainly not one of the original selling points.

          Rather, it was a better Perl for UNIX scripts and the Zope CMS; there was no other reason to use Python in 2000.

    • sharperguy 15 hours ago

      They should just distribute it in some kind of bytecode-compiled language with a JIT VM, like Java. Then at least it will be cross-platform.

      • pjmlp 4 hours ago

        Yeah, I heard about this new one taken from browsers.

        It is supposed to fix everything that previous ones never achieved.

    • pphysch 15 hours ago

      I certainly don't miss needing to install additional system libraries in addition to my pip install.

      • pjmlp 4 hours ago

        In what way does having to compile a Rust library prevent this?