In short: I wanted to talk a bit about ASN.1, a bit about D, and a bit about the compiler itself, but couldn't think of any real cohesive format.
So I threw a bunch of semi-related ramblings together and I'm daring to call it a blog post.
Sorry in advance, since I will admit it's not the greatest quality, but it's really not easy to cover so much with such brevity (especially since I've already forgotten a ton of stuff I wanted to talk about more deeply :( )
A small nitpick: I don’t think your intersection example does what you want it to do. Perhaps there’s some obscure difference in “PER-visibility” or whatnot, but at least set-theoretically,
    LegacyFlags2 ::= INTEGER (0 | 2 ^ 4..8) -- as in the article

is exactly equivalent to

    LegacyFlags2 ::= INTEGER (0) -- only a single value allowed
as (using standard mathematical notation and making precedence explicit) {0} ∪ ({2} ∩ {4,5,6,7,8}) = {0} ∪ ∅ = {0}.
As someone who had the displeasure of working with ASN.1 data (yes, certificates) I fully sympathise with the anguish you've gone through (that 6 months of Ansible HR comments cracked me up also :D ).

It makes me laugh that absolutely no one can say "I've worked with ASN.1" in a positive light :D
Bzzt! Wrong! I have worked with ASN.1 for many years, and I love ASN.1. :)
Really, I do.
In particular I like:
- that ASN.1 is generic, not specific to a given set of encoding rules (compare to XDR, which is both a syntax and a codec specification)
- that ASN.1 lets you get quite formal if you want to in your specifications
For example, RFC 5280 is the base PKIX spec, and if you look at RFCs 5911 and 5912 you'll see the same types (and those of other PKIX-related RFCs) with more formalisms. I use those formalisms in the ASN.1 tooling I maintain to implement a recursive, one-shot codec for certificates in all their glory.
- that ASN.1 has been through the whole evolution of "hey, TLV rules are all you need and you get extensibility for free!!1!" through "oh no, no that's not quite right is it" through "we should add extensibility functionality" and "hmm, tags should not really have to appear in modules, so let's add AUTOMATIC tagging" and "well, let's support lots of encoding rules, like non-TLV binary ones (PER, OER) and XML and JSON!".
Protocol Buffers is still stuck on TLV, all done badly by comparison to BER/DER.
As a former PKI enthusiast (tongue firmly in cheek with that description) I can say that if you can limit your exposure to simply issuing certs, so you control the data and thus avoid all the edge cases, quirks, non-canonical encodings, etc., dealing with ASN.1 is "not too terrible." But it is bad. The thing that used to regularly amaze me was the insane depths of complexity the designers went to... back in the '70s! It is astounding to me that they managed to make a system that encapsulated so much complexity and is still in everyday use today.

You are truly a masochist and I salute you.

It's also amazing that we're basically using only a couple of free-form text fields in the WebPKI for the most crucial parts of validation, completely ignoring ASN.1's support for complicated structures, and with more than one CVE linked to incorrect parsing of those text fields.
There was an amusing chain of comments the last time protobuf was mentioned in which some people were arguing that it had been a terrible idea and ASN.1, as a standard, should have been used.
It was hilarious because clearly none of the people who were in favor had ever used ASN.1.
It's not entirely horrible: parsing DER dynamically enough to interpret most common certificates can be done in some 200-300 lines of C#, so I'd take that any day over XML.

The main problem is that to work with the data you need to understand the semantics of the magic object identifiers, and while things like the PKIX module can be found easily, the definitions for other, more obscure extension namespaces can be harder to locate, scattered as they are across documentation from various standardization organizations.

So, protobuf could very well have been transported in DER; the problem was probably more one of Google not seeing any value in interoperability and wanting to keep it simple (or worse, clashes from oblivious users re-using the wrong, less well documented namespaces).
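To give a sense of the scale behind that 200-300 line claim: DER's framing layer is just tag, length, value. Below is a minimal sketch in D (the language the article centers on), handling definite lengths and low tag numbers only; a real parser needs more, but not vastly more.

    import std.exception : enforce;

    // One DER element: tag, content octets, and whatever follows it.
    struct Tlv
    {
        ubyte tag;            // low-tag-number form only (tag number < 31)
        const(ubyte)[] value; // content octets
        const(ubyte)[] rest;  // unconsumed input after this element
    }

    Tlv readTlv(const(ubyte)[] input)
    {
        enforce(input.length >= 2, "truncated header");
        immutable ubyte tag = input[0];
        enforce((tag & 0x1F) != 0x1F, "high tag numbers not handled in this sketch");
        size_t len = input[1];
        size_t offset = 2;
        if (len & 0x80) // long form: low 7 bits give the count of length octets
        {
            immutable n = len & 0x7F;
            enforce(n > 0, "indefinite length is BER, not DER");
            enforce(n <= size_t.sizeof && input.length >= 2 + n, "bad length");
            len = 0;
            foreach (b; input[2 .. 2 + n])
                len = (len << 8) | b;
            offset = 2 + n;
        }
        enforce(input.length - offset >= len, "value runs past end of input");
        return Tlv(tag, input[offset .. offset + len], input[offset + len .. $]);
    }

    unittest
    {
        immutable ubyte[] der = [0x02, 0x01, 0x05]; // INTEGER 5
        auto t = readTlv(der);
        assert(t.tag == 0x02 && t.value.length == 1 && t.value[0] == 5);
    }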
Cryptonector[1] maintains an ASN.1 implementation[2] and usually has good things to say about the language and its specs. (Kind of surprised he's not in the comments here already :) )

[1] https://news.ycombinator.com/user?id=cryptonector

[2] https://github.com/heimdal/heimdal/tree/master/lib/asn1
Thanks for the shout-out! Yes, I do have nice things to say about ASN.1. It's all the others that mostly suck, with a few exceptions like XDR and DCE/Microsoft RPC's IDL.
At least you might be summoning Walter Bright by talking about D. One of my favorite languages, and I wish more companies would use it. Unfortunately for its own sake, Go and Rust are way more popular in the industry.

Unfortunately it lost the opportunity back when Remedy Games and Facebook were betting on it.

The various WIP features, and the switching focus on what might bring more people into the ecosystem, have given way to other languages.

Even C#, Java and C++ have gotten many of the features that were only available in D when Andrei Alexandrescu's book came out in 2011.
I feel like back when D might've been a language worth looking into, it was hampered by the proprietary compilers.
And still today, the first thought that comes to mind when I think D is "that language with proprietary compilers", even though there has apparently been some movement on that front? Not really worth looking into now that we have Go as an excellent GC'd compiled language and Rust as an excellent C++ replacement.
Having two different languages for those purposes seems like a better idea anyway than having one "optionally managed" language. I can't even imagine how that could possibly work in a way that doesn't just fragment the community.
I don't think the proprietary compilers were a true setback; look at C#, for example, before it became as open as .NET is today (MIT licensed!), and yet the industry took it. I think what D needed was what made Ruby mainly relevant: Rails. D needs a community framework that makes it a strong candidate for a specific domain.
I honestly think that if Walter Bright (or anyone within D) invested in having a serious web framework for D, even if it's not part of the standard library, it could be worth its weight in gold. Right now there's only Vibe.d that stands out, but I have not seen it grow very much since its inception; it's very slow-moving. Give me a feature-rich web framework in D comparable to Django or Rails and all my side projects will shift to D. The real issue is it needs to be batteries included, since D does not have dozens of OOTB libraries to fill in gaps with.
Look at Go as an example: a built-in HTTP server library, production ready; it's not ultra fancy but it does the work.
Sounds like you should look into it instead of idly speculating! Also, the funny thing about a divisive feature is that it doesn't matter if it fragments the community if you can use it successfully. There are a lot of loud people in the D community who freak out and whine about the GC, and there are plenty more quiet ones who are happily getting things done without making much noise. It's a great language.
Go is a GC language that has eaten a chunk of the industry (Docker, TypeScript, Kubernetes... Minio... and many more I'm sure) and only some people cry about it, but you know who else owns sizable chunks of the industry? Java and C#, which are both GC languages. While some people waste hours crying about GCs, the rest of us have built the future around them. Hell, all of AI is eaten by Python, another GC language.
Are you saying that if I'm using D-without-GC, I can use any D library, including ones written with the assumption that there is a GC? If not, how does it not fracture the community?
> There are a lot of loud people in the D community who freak out and whine about the GC, and there are plenty more quiet ones who are happily getting things done without making much noise
This sounds like an admission that the community is fractured, except with a weirdly judgemental tone towards those who use D without a GC?
Just wanted to say I enjoyed your post very much. Thank you for writing it. I love D but unfortunately I haven't touched it for several years. I also have some experience writing parsers and implementing protocols.

Don't worry, it's your blog, and your way. Keep it up, if it makes you whole.
OMG ASN.1.

For those of you who missed this, there was a very interesting thing that happened in the growth of the internet.
At the time people were evolving the protocols through the IETF. So all the things that you rely on now - for the most part - just came into being. One day there was email. There was ftp. There was TCP. There were the Van Jacobson TCP mods.
At this time corporate types paid no attention to the internet. Academic types and the IETF were from what I saw the main developers.
Then one day the corporate world realized they might make money. But the development process of the protocols was incomprehensible to (and incompatible with) the corporate culture. TCP was clearly a mess, all these protocols like DNS were a mess. From the corporate perspective.

So began the protocol wars: https://en.wikipedia.org/wiki/Protocol_Wars
Whether ASN.1 was a product of that war or just a product of the corporate mentality, it serves as a powerful instance of what the corporate world looks like vs what the academic world looks like. You can find the wreckage from the war littered around. If you see an X.something protocol, it could well be one of the relics. There were a few X.things that were adopted and useful, but there were others that would haunt your dreams.

Although this is ancient history, and pretty much now told from the corporate perspective, it suggests to us that the corporate process for thinking is not as effective as the alternative - the IETF and academic one.
One is a sort of recipe culture. You write a recipe, everyone follows it and you are happy.
The other is a sort of functional culture. If you can make bread and eat it you are happy. When the bread doesn't taste good you fix it.
Given the kind of bread that is commonly available in the US now, we can draw some conclusions about recipe thinking, recipe culture, corporate culture etc. One could even extend this paradigm of thinking to new things like AI. Or not.
My partner and I were re-watching Father of the Bride the other day (rest in peace, Diane Keaton) and during the early parents meeting the son-in-law to-be describes himself as a communications consultant, working on X.25 networking installations.
I had to pause the movie and explain to my partner just how close the world came to missing out on The Internet, and having instead to suffer the ignominy of visiting sites with addresses like "CN=wikipedia, OU=org, C=US" and god knows what other dreadful protocols underlying them. I think she was surprised how angry and distressed I sounded! It would have been awful!

Poor her!
> how close the world came to missing out on The Internet
Monday-morning-quarterbacking is an unproductive pastime, but I don't think it was very close, on account of the Internet side having developed a bunch of useful (if a bit ramshackle) protocols and applications much faster than the ISO team, because the specs were freely available (not to mention written in a much more understandable manner). I still rue the day the IETF dropped the "distribution of this memo is unlimited" phrase from the RFC preambles. Yeah I understand that it originally had more to do with classification than general availability, but it described the ethos perfectly.
It's not all roses and we're paying for the freewheeling approach to this day in some cases, cf. email spam and BGP hijacking. But it gave results and provided unstoppable momentum.
I get your point and it is reasonable. We are paying today. However, I believe part of the problem is that when you could make money from email, it froze. The evolution stopped. We could easily evolve email if ...
The "if..." is one of the two VERY BIG INTERNET PROBLEMS. How do you pay for things? We have an answer, and it pollutes: ads => enshittification. Like recipes for how to boil an egg that are three pages of ads, and then are wrong. But we now have AI, right?
The other problem is identities on the internet. This is hard. But email? Nope. Login with Apple? Nope. Login with Google? Double, Quadruple Nope.
In the real world we have both privacy AND accountability. And. It is very difficult to maintain two identities in real life.
Privacy on the internet? Nope. Accountability? Only if you are invested in your account. Privacy and accountability together? Nope. Two identities? You can easily do hundreds or more: freg@g*.com, greg33222@g*.com, janesex994@g*.com, dogs4humanity@g*.com, etc.

There would have been a network like the Internet if the "Bellheads" in the ITU had won. It would have been pay-by-the-byte-transferred.
The Protocol Wars are also a story of early enshittification of the Internet, where attempts to push forward with solutions to already-known problems were pushed back because they would require investment on the vendor side, instead of just carrying on using software mostly delivered free of charge because DoD needed a quick replacement for their PDP-10 fleet. (Only slight hyperbole)
A lot of issues also came from ISO standards refusing to ship until every known, anticipated issue was taken care of, or getting stuck with unextendable lock-in when an accidental temporary solution ended up a long-term one, while IETF protocols happily ran forward "because we will fix it later" only to find out that the installed base ossified things - one of the lessons is to add randomness to new protocols so that naive implementations will fail on day one.
Then there were accidental things, like a major ASN.1 implementation for C in 1990 being apparently shit (a tradition picked up in an even worse way by OpenSSL, and by close to most people playing with X.509 IMO), or even complaints about ASN.1 encodings being slow due to CPUs lacking barrel shifters (I figure it must refer to PER somehow).

"OMG ASN.1" is the name of my next band.
I'm confused. Much of your story is correct, but you replace the primary actors (the ITU and ISO) with 'corporate'. This is true inasmuch as the ITU represented telephony culture, but isn't really representative of corporatism as a whole.
There is _another_ 'protocol war', but it was certainly a cold one. Internet companies starting in the late 90's just decided they weren't going to care any more about standardization efforts. They could take existing protocols and warp their intent. They could abandon the goal of universal reachability in order to make a product more consumable by the general public and add 'features'. Basically whatever would stick. The poster child for this division was the development of IPv6 and the multicast protocols. The IETF just assumed that, like in the previous 20 years, they would hash out the solutions and the network would deploy them. Except the rules had changed out from under them: the internet wasn't being run by government and academic agencies anymore, and the new crew just couldn't be bothered.
Two wars. The IETF won the first through rough consensus and running code, but lost the second for nearly the same reason.
There's a Turkish saying "a human will [use] this, a human!" to signify that the thing is so abnormal/out-of-proportion that it doesn't seem to be made for people. The verb changes based on the context. If you had made too much food for example, the verb would be "eat". I think it's a great motto for design.
Remember the Game of Thrones quote, "the man who passes the sentence should swing the sword"? I think it should also be applicable to specs. Anyone who comes up with a spec must be the first responsible party to develop a parser for it. The spec doesn't get ratified unless it comes with working parser code with unit tests.
That kind of requirement might actually improve specs.
Very neat article. I too have spent countless hours (but not as many) hacking on an ASN.1 compiler, adding a subset of X.681, X.682, X.683 functionality to make it possible to decode - in a single codec invocation! - a whole certificate, with all its details like extensions and OtherName SANs and whatnot decoded recursively. So it's very nice to see a post about similar work!
ASN.1 really is quite awesome. It gets a lot of hate, but it's not deserved. ASN.1 is not just a syntax or set of encoding rules -- it's a type system, and a very powerful one at that.
I really love D, it's one of my favorite languages. I've started implementing a vim-like text editor in it from scratch (using only Raylib as a dependency) and was surprised how far I was able to get and how good my test coverage was for it. My personal favorite features of D:

* unit tests anywhere, so I usually write my methods/functions with unit tests following them immediately

* blocks like version(unittest) {}, which make it easy to exclude/include things that should only be compiled for testing (see the sketch below)

* enums, unions, asserts, and contract programming are all great

I would say I didn't have to learn D much. Whatever I wanted to do with it, I would find in its docs or ask ChatGPT, and there would always be a very nice way to do it.
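To illustrate the first two bullets with a toy example (invented names, nothing from the editor project):

    // Compiled only when building with -unittest: test-only fixtures
    // never leak into release builds.
    version (unittest)
    {
        private int[] sampleInput() { return [3, 1, 2]; }
    }

    int[] sortedCopy(const(int)[] xs)
    {
        import std.algorithm.sorting : sort;
        auto result = xs.dup;
        result.sort();
        return result;
    }

    // The test lives right next to the function it exercises.
    unittest
    {
        assert(sortedCopy(sampleInput()) == [1, 2, 3]);
        assert(sortedCopy([]).length == 0);
    }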
D is a bittersweet topic for me.

From a philosophical/language-design standpoint, it ticks so many boxes. It had the potential to be wildly popular, had a few things gone differently.

If the language tooling and library ecosystem were on par with the titans of today, like Rust/Go, it really would be a powerhouse language.

Isn't D supported by the GNU compiler collection? I personally would prefer that type of tooling over what Rust and Go do (I can't even get their compilers to run on my old platform anymore, not to mention all these dependencies on remote resources typical Rust/Go projects seem to have, which seems to be enforced by the ecosystem?)

It is.

It is; however, keeping LDC and GCC up to date is a volunteer effort with not enough people, so they are always a bit behind dmd.

Still much better than GCCGO, which is kind of useless for anything beyond Go 1.18; no one is updating it any longer, and it may as well join gcj.

LDC isn't regularly behind DMD lately; the issue has been more the release process with respect to DMD, and people issues impacting that.

Having written real code in D, I can say that the slight discrepancy between dmd, LDC, and gdc isn't a roadblock in practice.
So, I also write Go and I don't get the part about tooling. I don't need formatters or linters, as I'm adult enough to know how to format my code (in fact I dislike tools doing it for me). D also has dub, which is fine as far as package managers go. The ecosystem is the only issue: Go does arguably have a lot of very cool libraries for virtually anything, but outside of webdev I can't see myself using them. This is why D works a lot better for projects where I don't need all those dependencies and would do better without them.
Yeah, the foundations of the language are incredible. It's just everything else around it that brings it down (and is unfortunately very hard to motivate people to solve).
D definitely missed a critical period, but I love it all the same.
I freely admit to not being a Go or Rust expert, but from what I can tell, using C from D is even easier than in either of those languages. The C++ interop is also decently usable.

IMO, the bigger issue is language tooling.
Wow. I needed to parse just one small piece of ASN.1 with one value (one signature), but I didn't know ASN.1 can have a specification (to generate a parser from). So I ended up untangling it myself, just for those specific 256 bits.

Still, I think it's better to have an overspecified format for security stuff; JSON and XML are just too vague and their parsers are unpredictable.
I worked on a Swift ASN.1 compiler [1] a while back (not swift-asn1, mine used Codable). I saved myself some time by using the Heimdal JSON compiler, which can transform ASN.1 into a much more parseable JSON AST.
| ASN.1 is a... some would say baroque, perhaps obsolete, archaic even, "syntax" for expressing data type schemas, and also a set of "encoding rules" (ERs) that specify many ways to encode values of those types for interchange.

That is not me expressing disdain for ASN.1. On the contrary: I'm saying those who would say that are wrong:

| ASN.1 is a wheel that everyone loves to reinvent, and often badly. It's worth knowing a bit about it before reinventing this wheel badly yet again.
Hey, I love how the author describes ASN.1 as a "syntax" in quotes.

Where I disagree is on the disdain being veiled. It seems very explicit to me.
Anyway, yeah, I hadn't heard about it before either, and it's great to know that somebody out there did solve that horrible problem already, and that we can use the library.
Normally, when implementing some standard, you could say that you get 80% of the functionality in 20% of the planned time. But with ASN.1 the remaining 20% could take the rest of your life.
I have also had to work with this in many contexts... deeply embedded systems with no parsers available, and where no "proper" ones would fit. So I have hand-written basic parsing and generation a few times.

Oh, and there are also non-compliant implementations. E.g. some passports (yes, the passports with chips use tons of ASN.1) even have incorrect encodings of big integers (supposed to be minimal two's complement; as I recall, some passports used a fixed-width non-complement format yanked into the 0x02 INTEGER type... some libraries have special non-compliant parsing modes to deal with it).
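To make the quirk concrete: DER requires INTEGER bodies in minimal two's complement, so there is exactly one valid encoding per value, and checking minimality is tiny. A sketch in D; the padded 5 in the last assert is the fixed-width style described above:

    // True when `octets` is a minimally-encoded DER INTEGER body.
    bool isMinimalDerInteger(const(ubyte)[] octets)
    {
        if (octets.length == 0) return false;  // INTEGER needs at least one octet
        if (octets.length == 1) return true;
        // A leading 0x00 is only allowed to mark a positive value whose next
        // octet has the high bit set; likewise a leading 0xFF for negatives.
        if (octets[0] == 0x00 && (octets[1] & 0x80) == 0) return false;
        if (octets[0] == 0xFF && (octets[1] & 0x80) != 0) return false;
        return true;
    }

    unittest
    {
        assert(isMinimalDerInteger([0x7F]));        // 127
        assert(isMinimalDerInteger([0x00, 0x80]));  // 128 needs the 0x00 pad
        assert(!isMinimalDerInteger([0x00, 0x05])); // 5 padded to fixed width
    }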
According to the ASN.1 Wikipedia entry, most of the tools supporting ASN.1 do the following:

1) parse the ASN.1 files,

2) generate the equivalent declarations in a programming language (like C or C++),

3) generate the encoding and decoding functions based on the previous declarations.

All of these are apparently part of the data engineering process or lifecycle [1]. (A hypothetical sketch of what steps 2 and 3 might emit follows below.)
Back in the early 21st century, Python was just another interpreted general-purpose programming language alternative: not for the web (PHP), not for command tools (TCL), not for systems (C/C++), not for data wrangling (Perl), not for numerics (Matlab/Fortran), not for statistics (R).

D will probably follow a similar trajectory to Python's, but it really needs a special kind of killer application to bring it to the fore.

I'm envisioning that real-time data streaming, processing and engineering can be D's killer utility and the defining moment that D is for data.
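To make steps 2 and 3 above concrete, here is a hypothetical sketch of what a compiler targeting D might emit for a toy `Point ::= SEQUENCE { x INTEGER, y INTEGER }`. The names and shapes are invented for illustration, not what the article's tool (or any particular tool) actually generates, and only the short length form is handled:

    // Step 2: the equivalent declaration.
    struct Point { long x; long y; }

    // Step 3: an encoder per field type (DER INTEGER: minimal two's complement)...
    ubyte[] derInteger(long v)
    {
        immutable neg = v < 0;
        ubyte[] body_;
        do { body_ = cast(ubyte)(v & 0xFF) ~ body_; v >>= 8; } while (v != 0 && v != -1);
        if (!neg && (body_[0] & 0x80)) body_ = cast(ubyte)0x00 ~ body_; // pad positive
        if (neg && !(body_[0] & 0x80)) body_ = cast(ubyte)0xFF ~ body_; // pad negative
        ubyte[] hdr = [0x02, cast(ubyte)body_.length];
        return hdr ~ body_;
    }

    // ...and one for the SEQUENCE itself (short length form: body under 128 bytes).
    ubyte[] encodePoint(Point p)
    {
        ubyte[] body_ = derInteger(p.x) ~ derInteger(p.y);
        ubyte[] hdr = [0x30, cast(ubyte)body_.length];
        return hdr ~ body_;
    }

    unittest
    {
        immutable ubyte[] expected = [0x30, 0x06, 0x02, 0x01, 0x01, 0x02, 0x01, 0xFF];
        assert(encodePoint(Point(1, -1)) == expected);
    }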
Ack. I wrote an ASN.1 compiler in Java in the '90s, mostly just to make sure I understood how it and BER/DER were used in X.509. I think the BER interpretation bits are still being used somewhere.
I'm sorry you had to waste a year of your life.
There are few things I dislike more in the computing world than ASN.1/BER. It seems to encourage over-specification and is suboptimal for loosely coupled systems.
(I worked with asn1c (not sure which fork) and had to hack in a custom allocator and 64-bit support. I shiver every time something needs attention in there.)

I was using asn1c with a Rust project, since there was no Rust ASN.1 compiler at the time. It became a bottleneck, and in profiling I found that the string-copying helper used everywhere was doing bit-level copying even in our byte-aligned mode, which was extra weird because that function had a parameter for byte alignment.
I salute you for the deep dive into this. History would have it that ASN.1 was already there as both an IDL and serialization format when HTTPS certs were defined. If it were today, would it be the same, or would we end up with protobuf or thrift or similar?
> If it were today, would it be the same or would we end up with protobuf or thrift or similar?
The main advantage of ASN.1 (specifically DER) in an HTTPS/PKI context is that it's a canonical encoding. To my understanding Protobuf isn't; I don't know about Thrift.
(A lot of hay is made about ASN.1 being bad, but it's really BER and other non-DER encodings of ASN.1 that make things painful. If you only read and write DER and limit yourself to the set of rules that occur in e.g. the Internet PKI RFCs, it's a relatively tractable and normal looking serialization format.)
I'm hardly a connoisseur of DER implementations, but my understanding is that there are two main problems with DER. The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON. This means your generic DER parser needs to have an ASN.1 schema passed into it to parse the DER, and this leads to the second problem, which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
> The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON.
You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
> which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
Sort of -- DER gets a bad rap for two reasons:
1. OpenSSL had (has?) an exceptionally bad and permissive implementation of a DER parser/serializer.
2. Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER. This has caused an absolutely obscene amount of pain in PKI standards, which is why just about every modern PKI standard that uses ASN.1 bends over backwards to emphasize that all encodings must be DER and not BER.
(2) in particular is pernicious: the public Web PKI has successfully extirpated BER, but it still skulks around in private PKIs and more neglected corners of the Internet (like RFC 3161 TSAs) because of a long tail of OpenSSL (and other misbehaving implementation) usage.
Overall, DER itself is a mostly normal looking TLV encoding; it's not meaningfully more complicated than Protobuf or any other serialization form. The problem is that it gets mashed together with BER, and it has a legacy of buggy implementations. The latter is IMO more of a byproduct of ASN.1's era -- if Protobuf were invented in 1984, I imagine we'd see the same long tail of buggy parsers regardless of the quality of the design itself.
You can parse DER, but you have no idea what you've just parsed without the schema. In a software library, that's often not very useful, but at least you can verify that the message was loaded correctly, and if you're reverse engineering a proprietary protocol you can at least figure out the parts you need without having to understand the entire thing.
Yes, it's like JSON in that regard. But the key part is that the framing of DER doesn't require a schema; that isn't true for all encoding formats (notably protobuf, where types have overlapping encodings that need to be disambiguated through the schema).
I'd argue that JSON is still easier, as it allows you to reason about the structure and build up a (partial) schema at least: you have the keys of the objects you're trying to parse. Something like {"username":"abc","password":"def","userId":1,"admin":false} would end up as something like Utf8String(3){"abc"}+Utf8String(3){"def"}+Integer(1){1}+Integer(1){0} if encoded in DER style.

This has the fun side effect that DER essentially allows you to process data ("give me the 4th integer and the 2nd string of every third optional item within the fifth list") without knowing what you're interpreting.
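Continuing the readTlv sketch from further up the thread (assumed to be in the same module), that "Nth element by tag" style of processing is only a few lines:

    // Split a SEQUENCE body into its child TLVs, then pick elements by tag alone.
    const(ubyte)[][] childrenOf(const(ubyte)[] seqBody)
    {
        const(ubyte)[][] kids;
        while (seqBody.length)
        {
            auto t = readTlv(seqBody);
            kids ~= seqBody[0 .. seqBody.length - t.rest.length]; // whole child TLV
            seqBody = t.rest;
        }
        return kids;
    }

    unittest
    {
        // SEQUENCE { UTF8String "abc", UTF8String "def", INTEGER 1, INTEGER 0 } --
        // roughly the shape of the JSON example above, minus the key names.
        immutable ubyte[] der = [
            0x30, 0x10,
            0x0C, 0x03, 0x61, 0x62, 0x63, // UTF8String "abc"
            0x0C, 0x03, 0x64, 0x65, 0x66, // UTF8String "def"
            0x02, 0x01, 0x01,             // INTEGER 1
            0x02, 0x01, 0x00,             // INTEGER 0
        ];
        size_t nStrings;
        foreach (kid; childrenOf(readTlv(der).value))
        {
            auto t = readTlv(kid);
            if (t.tag == 0x0C && ++nStrings == 2)   // "the 2nd string"
                assert(cast(const(char)[]) t.value == "def");
        }
    }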
> You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
> rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
Almost. The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
> The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
Could you explain what you mean? The tag does indeed encode this: for an integer you'd see `INTEGER`, for a string you'd see `UTF8String` or similar, for an array you'd see `SEQUENCE OF`, etc.
You can verify this for yourself by using a schemaless decoder like Google's der-ascii[1]. For example, here's a decoded certificate[2] -- you get fields and types, you just don't get the semantics (e.g. "this number is a public key") associated with them because there's no schema.
It's been a long time since I last stared at DER, but my recollection was for the ASN.1 schema I was decoding, basically all of the tags ended up not using the universal tag information, so you just had to know what the type was supposed to be. The fact that everything was implicit was why I qualified it with "most of the time"; it was that way in my experience.
Oh, that makes sense. Yeah, I mostly work with DER in contexts that use universal tagging. From what I can tell, IMPLICIT tagging is used somewhat sparingly (but it is used) in the PKI RFCs.
So yeah, in that instance you do need a schema to make progress beyond "an object of some size is here in the stream."
DER is TLV. You don't know the specifics ("this integer is a value between 10 and 53") that the schema contains, but you know it's an integer when you read it.
PER lacks type information, making encoding much more efficient as long as both sides of the connection have access to the schema.
One of my big problems with ASN.1 (and its encodings) is how _crusty_ it is.
You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString. I'll note that three of those types have "Universal" / "General" in their name, and several more imply it.
How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME? Don't be fooled, all those types describe both a date _and_ time, if you just want a time then that's TIME-OF-DAY.
It's understandable how a standard with teletex roots got to this point, but it doesn't lead to good implementations when there is that much surface area to cover.
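For reference, each of those string types has its own universal-class tag number in X.680, which is part of why the surface area is so large. As a D enum:

    // Universal tag numbers for ASN.1's character string and time types (X.680).
    enum UniversalTag : ubyte
    {
        utf8String      = 12,
        numericString   = 18,
        printableString = 19,
        teletexString   = 20, // aka T61String
        videotexString  = 21,
        ia5String       = 22,
        utcTime         = 23,
        generalizedTime = 24,
        graphicString   = 25,
        visibleString   = 26,
        generalString   = 27,
        universalString = 28,
        characterString = 29, // the unrestricted CHARACTER STRING
        bmpString       = 30,
        // TIME, DATE, TIME-OF-DAY and DATE-TIME (from newer editions of the
        // standard) have their own universal tags as well.
    }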
I wrote an ASN.1 decoder, and since the encoding contains type/size info, you can often read a subset and handle the rest as opaque data objects if you need round-tripping. This is required, as there can be plenty of data that is unknown to older consumers (like the ETSI EIDAS/PAdES personal information extensions in PDF signatures).

However, to have a sane interface for actually working with the data, you do need a schema that can be compiled to a language-specific notation.
It is not only that ASN.1 was there before SSL; even the certificate format was there before SSL. The certificate format comes from X.500, which is the "DAP" part of "LDAP"; the L as in "Lightweight" in "LDAP" refers mostly to LDAP not using public key certificates for client authentication, in contrast to X.500 [1]. A bunch of other related stuff comes from RSA's PKCS series of specifications, which also mostly use ASN.1.

[1] The somewhat ironic part is that when it was discovered that using just passwords for authentication is not enough, the so-called "lightweight" LDAP arguably got more complex than X.500. The same thing happened to SNMP (another IETF protocol using ASN.1) being "Simple", for similar reasons.
The IETF has made a bunch of standards lately like COSE for doing certificates and encryption stuff with CBOR. It’s largely for embedded stuff, but I could see it being a modern alternative. I haven’t used it myself yet.
CBOR is self-describing like JSON/XML, meaning you don't need a schema to parse it. It has a better set of specific types for integers and binary data, unlike JSON. It has an IANA database of tags and a canonical serialization form, unlike MsgPack.
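The self-describing part is cheap to see: the first byte of any CBOR item names its type (RFC 8949: the top 3 bits are the major type, the low 5 bits the additional info). A small D sketch:

    // Map a CBOR initial byte to its major type (RFC 8949).
    string cborMajorType(ubyte initial)
    {
        switch (initial >> 5)
        {
            case 0: return "unsigned integer";
            case 1: return "negative integer";
            case 2: return "byte string";
            case 3: return "text string";
            case 4: return "array";
            case 5: return "map";
            case 6: return "tag";
            case 7: return "simple value / float";
            default: assert(0); // unreachable: a ubyte >> 5 is always 0..7
        }
    }

    unittest
    {
        assert(cborMajorType(0xA2) == "map");         // 0xA2 = map of 2 pairs
        assert(cborMajorType(0x64) == "text string"); // 0x64 = 4-char text string
    }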
If it were designed today, I would imagine it could end up looking like JWT (JOSE) and use JSON. I've seen several key exchange formats in JSON beyond JWT/JOSE in the wild today as well, so we may even get there eventually in a future upgrade of TLS.
Yes and no: the JSON handling of things like binary data (hashes) and big ints leaves a bit to be desired (sure, we can use base64 encoding). ASN.1 isn't great by any stretch, but for this JSON really isn't much better, apart from better library support.
Yes, JOSE is still infinitely better than XmlSignatures and the canonical-XML madness required to allow signatures _inside_ the document being signed.
- huge breaking change with the whole cert infrastructure

- this question was asked of the people who chose ASN.1 for X.509, and AFAIK they said today they would use protobuf. But I don't remember where I have that from.

- JOSE/JWT etc. aren't exactly that well regarded in the crypto community AFAIK, nor designed with modern insights about how to best do such things (too much header malleability, too much crypto flexibility, too little deterministic encoding of JSON, too many imprecisely defined corner cases related to JSON, too much encoding overhead for keys and similar (which for some PQ stuff can get into the 100KiB range)). And the argument of it being readable with a text editor falls apart if anything you care about is binary (keys, etc.) and often encrypted (producing binary). (IMHO the plain-text argument also falls apart for most non-crypto stuff: if you add a base64 encoding anyway, you already need tooling to read it, and whether your debug tooling does a base64 decode or a (maybe additional) data decode step isn't really relevant; same for viewing in an IDE, which can handle binary formats just fine, etc. But that's an off-topic discussion.)

- if we look at some modern protocols designed by security specialists/cryptographers that have been standardized, we often find other stuff (e.g. protobuf for some JWT alternatives, or CBOR for HSK/AuthN-related stuff).
> JOSE/JWT etc. aren't exactly that well regarded in the crypto community
That is true, but it's also true that JWT/JOSE is a market winner and "everywhere" today. Obviously, it's not a great one and not without flaws, and its "competition" is things like SAML which even more people hate, so it had a low bar to clear when it was first introduced.
> CBOR
CBOR is a good mention. I have met at least one person hoping a switch to CWT/COSE happens to help somewhat combat JWT bloat in the wild. With WebAuthN requiring CBOR, there's more of a chance to get an official browser CBOR API in JS. If browsers had an out-of-the-box CBOR.parse() and CBOR.stringify(), that would be interesting for a bunch of reasons (including maybe even making CWT more likely).
One of the fun things about CBOR, though, is that it shares the JSON data model and is intended to be a sibling encoding, so I'd also maybe argue that if CBOR ultimately wins, that's still somewhat indirectly a "JSON win".
No, we would use something similar to S-Expressions [1]. Parsing and generation would be at most a few hundred lines of code in almost any language, easily testable, and relatively extensible.
With the top level encoding solved, we could then go back to arguing about all the specific lower level encodings such as compressed vs uncompressed curve points, etc.
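To put a rough number on that claim: a minimal S-expression reader (atoms and lists only, no strings, escapes, or error handling) fits in a few dozen lines of D, which leaves plenty of budget inside "a few hundred" for the hardening:

    import std.ascii : isWhite;

    struct SExpr
    {
        string atom;  // set when this node is an atom
        SExpr[] list; // children when this node is a list
        bool isAtom;
    }

    // Consumes one expression from the front of `s` (a sketch: no error handling).
    SExpr parseSExpr(ref string s)
    {
        void skipWs() { while (s.length && s[0].isWhite) s = s[1 .. $]; }
        skipWs();
        if (s.length && s[0] == '(')
        {
            s = s[1 .. $];
            SExpr node;
            skipWs();
            while (s.length && s[0] != ')')
            {
                node.list ~= parseSExpr(s);
                skipWs();
            }
            if (s.length) s = s[1 .. $]; // consume ')'
            return node;
        }
        size_t i;
        while (i < s.length && !s[i].isWhite && s[i] != '(' && s[i] != ')') i++;
        auto atom = SExpr(s[0 .. i], null, true);
        s = s[i .. $];
        return atom;
    }

    unittest
    {
        string src = "(cert (subject example.org) (key rsa 2048))";
        auto e = parseSExpr(src);
        assert(!e.isAtom && e.list.length == 3);
        assert(e.list[0].atom == "cert");
        assert(e.list[2].list[1].atom == "rsa");
    }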
- ASN.1 is a set of a dozen different binary encodings

- ASN.1's schema language is IMHO way better designed than Protobuf's, but also more complex, as it has more features

- ASN.1 can encode many more different data layouts (e.g. things where in Protobuf you have to use "tricks"), each being laid out in the output differently depending on the specific encoding format, annotations on the schema, and options during serialization

- ASN.1 has many ways to represent things more "compactly", which all come with their own complexity (like bit-mask-encoded boolean maps)

Overall, the problem with ASN.1 is that it's absurdly over-engineered, leading to you needing to know many hundreds of pages across multiple standards documents just to implement one single encoding of the dozen existing ones, and even then you might run into ambiguous, unclear definitions where you have to ask on the internet for clarification.

If we ignore the schema languages for a moment, most senior devs could probably write a crappy protobuf implementation over a weekend, but for ASN.1 you might not even be able to digest all the relevant standards in that time :/

Realistically, if ASN.1 weren't so badly over-engineered and had shipped with only some of the more modern of its encoding formats, we probably would all be using ASN.1 for many things, maybe including your web server responses, and this probably would cut non-image/video network bandwidth by a third or more. But then, the network is overloaded by image/video transmissions and similar, not other stuff, so I guess who cares???!???
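For what it's worth, the "weekend" claim is plausible because Protobuf's wire format reduces to base-128 varints plus field keys of the form (fieldNumber << 3) | wireType. A sketch of the varint half in D:

    // Decode one base-128 varint from the front of `wire` (advances the slice).
    ulong readVarint(ref const(ubyte)[] wire)
    {
        ulong result;
        uint shift;
        while (wire.length)
        {
            immutable b = wire[0];
            wire = wire[1 .. $];
            result |= (cast(ulong)(b & 0x7F)) << shift;
            if ((b & 0x80) == 0) return result; // high bit clear = last byte
            shift += 7;
        }
        assert(0, "truncated varint");
    }

    unittest
    {
        const(ubyte)[] wire = [0x08, 0x96, 0x01]; // field 1, wiretype 0, value 150
        assert(readVarint(wire) == 0x08); // key: (1 << 3) | 0
        assert(readVarint(wire) == 150);  // the classic example from the docs
    }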
Protobuf is pretty much ASN.1 with better tooling, optimized for message exchange protocols rather than files, when it comes down to the details. Without ASN.1 and the lessons learned from it, another binary serialization protocol would probably have taken its place, and I bet Protobuf and similar tools would look and perhaps work quite differently. The same way JSON would look and act quite differently if XML had never been invented.
They share some ideas; that doesn't make it "pretty much ASN.1". It's only "pretty much the same" if you argue all schema-based general-purpose binary encoding formats are "pretty much the same".

ASN.1 also isn't "file"-specific at all; its main use case is, and always has been, message exchange protocols.

(Strictly speaking, ASN.1 is also not a single binary serialization format but 1. one schema language, 2. some rules for mapping things to some intermediate concepts, and 3. a _dozen_ different ways of how "exactly" to serialize things. And in the 3rd point the differences can be pretty huge, from something you can partially read even without a schema (like Protobuf) to more compact representations you can't read without a schema at all.)
> if you argue all schema based general purpose binary encoding formats are "pretty much the same"
At the implementation level they are different, but when integrating these protocols into applications, yeah, pretty much. Schema + data goes in, encoded data comes out, or the other way around. In the same way, YAML and XML are pretty much the same, just different expressions of the same concepts. ASN.1 even comes with multiple expressions of exactly the same grammar, both in text form and binary form.

ASN.1 was one of the early standardised protocols in this space, though, and suffers from being used mostly in obscure or legacy protocols, often with proprietary libraries if you go beyond the PKI side of things.
ASN.1 isn't file specific, it was designed for use in telecoms after all, but encodings like DER work better inside of file formats than Protobuf and many protocols like it. Actually having a formal standard makes including it in file types a lot easier.
Every time I have ever had the displeasure of looking at an X.whatever spec, I always end up coming away with the same conclusion.
Somehow, despite these specifications being 90% metadata by weight, they seem to consistently forget the part of the spec that lets you actually know what something is. And that part is just left up to context.
I could well be missing something, but a majority of the time it feels to me like they set out to make a database schema, and accidentally wrote the sqlite file format spec instead.
Like, thanks, it's nice that I can parse this into a data structure :). It would be nicer, however, if doing so gave me any idea of what I can do with the data I've parsed.

Though to be fair, I feel the same way about the entire concept of XML schemas. The fact that you can theoretically validate an XML document against a schema is honestly completely useless. If I am parsing XML, it's because my code already knows what information it needs from the XML document, and really it should also know where that information is. I don't need a separate schema definition to tell me that! It's already expressed!! In the part where I am parsing out the data I need!!!
> The fact that you theoretically can validate an xml document against a schema is honestly completely useless. If I am parsing XML, its because my code already knows what information it needs from the XML document, and really it should also know where that information is.
You seem to miss the entire point of XML schemas, or any schema really. Validating a document against a schema isn't really for your code. It's for documentation of what can be in a given document and how it needs to be structured, so others don't need to read your code to understand that.
It then allows editing tools to verify generated documents. Or developers to understand how they can structure XML output properly.
Your code could also use it to verify an XML document before handing it to the rest of your code. Then you can inform the user of an invalid document, and why, instead of just crashing at a random point, without rolling your own validation. It can also verify an entire document, whereas code may only parse portions, leading to later corruption.
The only goal of such ridiculous standards is to act as a form of vendor lock-in for the vendors implementing them; the vendors get to tell governments that it is a standard, and the sellers of the standards also get some money.

Anyone designing a system around such standards is basically betraying their client.
I think, if you want to annoy these people maximally, you should write an annotated version of the standard in a mathematical formal language.
I read the table constraints section, which tries to do something simple, but it's written in the most convoluted way possible.

I think I considered ASN.1 for a system once, but rejected it in favor of a more modern, technically superior system.
If the parser for something like ASN.1 doesn't fit in 80 lines of Haskell, perhaps you just shouldn't use it.
I don't know who these assholes are that say "Sure, let's make things slow and buggy, since we all hail Satan after all".
In short: I wanted to talk a bit about ASN.1, a bit about D, and a bit about the compiler itself, but couldn't think of any real cohesive format.
So I threw a bunch of semi-related ramblings together and I'm daring to call it a blog post.
Sorry in advance since I will admit it's not the greatest quality, but it's really not easy to talk about so much with such brevity (especially since I've already forgot a ton of stuff I wanted to talk about more deeply :( )
A small nitpick: I don’t think your intersection example does what you want it to do. Perhaps there’s some obscure difference in “PER-visibility” or whatnot, but at least set-theoretically,
is exactly equivalent to as (using standard mathematical notation and making precedence explicit) {0} ∪ ({2} ∩ {4,5,6,7,8}) = {0} ∪ ∅ = {0}.As someone that had the dis-pleasure to work with Asn.1 data (yes, certificates) I fully symphatise with anguish you've gone through (that 6months of Ansible HR comments cracked me up also :D ).
It makes me laugh that absolutely no one can say "I've worked with ASN.1" in a positive light :D
Bzzt! Wrong! I have worked with ASN.1 for many years, and I love ASN.1. :)
Really, I do.
In particular I like:
- that ASN.1 is generic, not specific to a given encoding rules (compare to XDR, which is both a syntax and a codec specification)
- that ASN.1 lets you get quite formal if you want to in your specifications
For example, RFC 5280 is the base PKIX spec, and if you look at RFCs 5911 and 5912 you'll see the same types (and those of other PKIX-related RFCs) with more formalisms. I use those formalisms in the ASN.1 tooling I maintain to implement a recursive, one-shot codec for certificates in all their glory.
- that ASN.1 has been through the whole evolution of "hey, TLV rules are all you need and you get extensibility for free!!1!" through "oh no, no that's not quite right is it" through "we should add extensibility functionality" and "hmm, tags should not really have to appear in modules, so let's add AUTOMATIC tagging" and "well, let's support lots of encoding rules, like non-TLV binary ones (PER, OER) and XML and JSON!".
Protocol Buffers is still stuck on TLV, all done badly by comparison to BER/DER.
As a former PKI enthusiast (tongue firmly in cheek with that description) I can say if you can limit your exposure to simply issuing certs so you control the data and thus avoid all edge cases, quirks, non-canonical encodings, etc, dealing with ASN.1 is “not too terrible.” But it is bad. The thing that used to regularly amaze me was the insane depths of complexity the designers went to … back in the 70’s! It is astounding to me that they managed to make a system that encapsulated so much complexity and is still in everyday use today.
You are truly a masochist and I salute you.
It's also amazing that we're basically using only a couple of free-form text fields in the WebPKI for the most crucial parts of validation.
Completely ignoring the ASN.1 support for complicated structures, with more than one CVE linked to incorrect parsing of these text fields m
There was an amusing chain of comments the last time protobuf was mentionned in which some people were arguing that it had been a terrible idea and ASN.1, as a standard, should have been used.
It was hilarious because clearly none of the people who were in favor had ever used ASN.1.
It's not entirely horrible, parsing DER dynamically enough to handle interpreting most common certificates can be done in some 200-300 lines of C#, so I'd take that any day over XML.
The main problem is that to work with the data you need to understand the semantics of the magic object identifiers and while things like the PKIX module can be found easily, the definitions for other more obscure namespaces for extensions can be harder to locate as it's scattered in documentation from various standardization organizations.
So, protobuf could very well have been transported in DER, the problem issue was probably more one of Google not seeing any value of interoperability and wanting to keep it simple (or worse, clashing by oblivious users re-using the wrong less well documented namespaces).
Cryptonector[1] maintains an ASN.1 implementation[2] and usually has good things to say about the language and its specs. (Kind of surprised not he’s not in the comments here already :) )
[1] https://news.ycombinator.com/user?id=cryptonector
[2] https://github.com/heimdal/heimdal/tree/master/lib/asn1
Thanks for the shout-out! Yes, I do have nice things to say about ASN.1. It's all the others that mostly suck, with a few exceptions like XDR and DCE/Microsoft RPC's IDL.
At least you might be summoning Walter Bright in talking about D. One of my favorite languages I wish more companies would use. Unfortunately for its own sake, Go and Rust are way more popular in the industry.
Unfortunately it lost the opportunity back when Remedy Games and Facebook were betting on it.
The various WIP features, and switching focus of what might bring more people into the ecosystem, have given away to other languages.
Even C#, Java and C++ have gotten many of features that were only available in D as Andrei Alexandrescu's book came out in 2011.
I feel like back when D might've been a language worth looking into, it was hampered by the proprietary compilers.
And still today, the first thought that comes to mind when I think D is "that language with proprietary compilers", even though there has apparently been some movement on that front? Not really worth looking into now that we have Go as an excellent GC'd compiled language and Rust as an excellent C++ replacement.
Having two different languages for those purposes seems like a better idea anyway than having one "optionally managed" language. I can't even imagine how that could possibly work in a way that doesn't just fragment the community.
I don't think the proprietary compilers is a true set back, look at for example C# before it became as open as .NET has become today (MIT licensed!) and yet the industry took it. I think what D needed was what made Ruby mainly relevant: Rails. D needs a community framework that makes it a strong candidate for a specific domain.
I honestly think if Walter Bright (or anyone within D) invested in having a serious web framework for D even if its not part of the standard library, it could be worth its weight in gold. Right now there's only Vibe.d that stands out but I have not seen it grow very much since its inception, its very slow moving. Give me a feature rich web framework in D comparable to Django or Rails and all my side projects will shift to D. The real issue is it needs to be batteries included since D does not have dozens of OOTB libraries to fill in gaps with.
Look at Go as an example, built-in HTTP server library, production ready, its not ultra fancy but it does the work.
Sounds like you should look into it instead of idly speculating! Also, the funny thing about a divisive feature is that it doesn't matter if it fragments the community if you can use it successfully. There are a lot of loud people in the D community who freak out and whine about the GC, and there are plenty more quiet ones who are happily getting things done without making much noise. It's a great language.
Go is a GC language that has eaten a chunk of the industry (Docker, TypeScript, Kubernetes... Minio... and many more I'm sure) and only some people cry about it, but you know who else owns sizable chunks of the industry? Java and C# which are both GC languages. While some people waste hours crying about GCs the rest of us have built the future around it. Hell, all of AI is eaten by Python another GC language.
Are you saying that if I'm using D-without-GC, I can use any D library, including ones written with the assumption that there is a GC? If not, how does it not fracture the community?
> There are a lot of loud people in the D community who freak out and whine about the GC, and there are plenty more quiet ones who are happily getting things done without making much noise
This sounds like an admission that the community is fractured, except with a weirdly judgemental tone towards those who use D without a GC?
Just wanted to say I enjoyed your post very much. Thank you for writing it. I love D but unfortunately I haven't touched it for several years. I also have some experience writing parsers and implementing protocols.
Don't worry, it's your blog, and your way. Keep it up, if it makes you whole.
OMG ASN.1.
For those of you who missed this, there was a very interesting thing that happened in the growth of the internet.
At the time people were evolving the protocols through the IETF. So all the things that you rely on now - for the most part - just came into being. One day there was email. There was ftp. There was TCP. There were the Van Jacobson TCP mods.
At this time corporate types paid no attention to the internet. Academic types and the IETF were from what I saw the main developers.
Then one day the corporate world realized they might make money. But the development process of the protocols was incomprehensible (and incompatible) with the corporate culture. TCP was clearly a mess, all these protocols like DNS were a mess. From the corporate perspective.
So began the protocol wars https://en.wikipedia.org/wiki/Protocol_Wars.
Whether ASN.1 was a product of that war or just a product of the corporate mentality, it serves as a powerful instance of the what the corporate world looks like vs the academic world looks like. You can find the wreckage from the war littered around. If you see and X.something protocol it could well be one of the relics. There were a few X.things that were adopted and useful, but were others that would haunt your dreams.
Although this is ancient history, and pretty much now told from the corporate perspective, it suggests to us that the corporate process for thinking is not as effective as the alternative - the IETF and Academic.
One is a sort of recipe culture. You write a recipe, everyone follows it and you are happy. The other is a sort of functional culture. If you can make bread and eat it you are happy. When the bread doesn't taste good you fix it.
Given the kind of bread that is commonly available in the US now, we can draw some conclusions about recipe thinking, recipe culture, corporate culture etc. One could even extend this paradigm of thinking to new things like AI. Or not.
My partner and I were re-watching Father of the Bride the other day (rest in peace, Diane Keaton) and during the early parents meeting the son-in-law to-be describes himself as a communications consultant, working on X.25 networking installations.
I had to pause the movie and explain to my partner just how close the world came to missing out on The Internet, and having instead to suffer the ignominy of visiting sites with addresses like “CN=wikipedia, OU=org, C=US” and god knows what other dreadful protocols underlying them. I think she was surprised how angry and distressed I sounded! It would have been awful!
Poor her!
> how close the world came to missing out on The Internet
Monday-morning-quarterbacking is an unproductive pastime, but I don't think it was very close, on account of the Internet side having developed a bunch of useful (if a bit ramshackle) protocols and applications much faster than the ISO team, because the specs were freely available (not to mention written in a much more understandable manner). I still rue the day the IETF dropped the "distribution of this memo is unlimited" phrase from the RFC preambles. Yeah I understand that it originally had more to do with classification than general availability, but it described the ethos perfectly.
It's not all roses and we're paying for the freewheeling approach to this day in some cases, cf. email spam and BGP hijacking. But it gave results and provided unstoppable momentum.
I get your point and it is reasonable. We are paying today. However, I believe part of the problem is that when you could make money from email, it froze. The evolution stopped. We could easily evolve email if ...
The "if..." is one of the two VERY BIG INTERNET PROBLEMS. How do you pay for things? We have an answer which pollutes. Ads => enshitification. Like recipes for how to boil and egg that are three pages of ads, and then are wrong. But we now have AI, right?
The other problem is identities on the internet. This is hard. But email? Nope. Login with Apple? Nope. Login with Google? Double, Quadruple Nope.
In the real world we have both privacy AND accountability. And. It is very difficult to maintain two identities in real life.
Privacy on the internet? Nope. Accountability? Only if you are invested in your account. Privacy and Accountability together? Nope. Two identities? You can easily do 100's or more. freg@g*.com, greg33222@g*.com, janesex994@g*.com, dogs4humanity@g*.com etc.
There would have been a network like the Internet if the "Bellheads" in the ITU won. It would have been pay-by-the-byte-transferred.
Protocol Wars are also a story of early enshittification of Internet, where attempts to push forward with solutions to already known problems were pushed back because they would require investment on vendor side instead of just carrying on using software majorly delivered free of charge because DoD needed a quick replacement for their PDP-10 fleet. (Only slight hyperbole)
A lot of issues also came from ISO standards refusing to get stuck without known anticipated issues taken care of, or with unextendable lockin due to accidental temporary solution ending up long term one, while IETF protocols happily ran forward "because we will fix it later" only to find out that I stalled base ossified things - one of the lessons is to add randomness to new protocols so that naive implementation will fail on day one.
Then there were accidental things, like a major ASN.1 implementation for C in 1990 being apparently shit (a tradition picked up in even worse way by OpenSSL and close to most people playing with X.509 IMO), or even complaints about ASN.1 encodings being slow due to CPU lacking barrel shifters (I figure it must refer to PER somehow)
"OMG ASN.1" is the name of my next band.
I'm confused. much of your story is correct, but you replace the primary actors (the ITU and ISO) with 'corporate'. This is true is inasmuch as the ITU represented telephony culture, but isn't really representative of corporatism as a whole.
there is _another_ 'protocol war', but it was certainly a cold one. Internet companies starting in the late 90's just decided they weren't going to care any more about standardization efforts. They could take existing protocols and warp their intent. they could abandon the goal of universal reachability in order to make a product more consumable by the general public and add 'features'. basically whatever would stick. the poster child for this division was the development of IPv6 and the multicast protocols. The IETF just assumed that like the last 20 years, they would hash out the solutions and the network would deploy them. Except the rules had changed out from under them, the internet wasn't being run by government and academic agencies anymore, and the new crew just couldn't be bothered.
two wars. the IETF won the first through rough consensus and running code, but lost the second for nearly the same reason.
There's a Turkish saying "a human will [use] this, a human!" to signify that the thing is so abnormal/out-of-proportion that it doesn't seem to be made for people. The verb changes based on the context. If you had made too much food for example, the verb would be "eat". I think it's a great motto for design.
Remember the Game of Thrones quote, "the man who passes the sentence should swing the sword"? I think it should also be applicable to specs. Anyone who comes up with a spec must be the first responsible party to develop a parser for it. The spec doesn't get ratified unless it comes with working parser code with unit tests.
That kind of requirement might actually improve specs.
Very neat article. I too have spent countless hours (but not as many) hacking on an ASN.1 compiler, adding a subset of X.681, X.682, and X.683 functionality to make it possible to decode, in a single codec invocation(!), a whole certificate, with all its details like extensions and OtherName SANs and whatnot decoded recursively. So it's very nice to see a post about similar work!
ASN.1 really is quite awesome. It gets a lot of hate, but it's not deserved. ASN.1 is not just a syntax or set of encoding rules -- it's a type system, and a very powerful one at that.
I really love D, it's one of my favorite languages. I've started implementing a vim-like text editor in it from scratch (using only Raylib as a dependency) and was surprised how far I was able to get and how good my test coverage was for it. My personal favorite features of D:
* unit tests anywhere, so I usually write my methods/functions with unit tests following them immediately
* blocks like version(unittest) {} make it easy to exclude/include things that should only be compiled for testing (see the sketch after this list)
* enums, unions, asserts, contract programming are all great
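A minimal sketch of the first two of those, using a toy function of my own invention:

  int clamp(int x, int lo, int hi)
  {
      return x < lo ? lo : x > hi ? hi : x;
  }

  // Lives right next to the function it tests; only compiled with -unittest.
  unittest
  {
      assert(clamp(5, 0, 3) == 3);
      assert(clamp(-1, 0, 3) == 0);
  }

  version (unittest)
  {
      // Test-only helpers go here; excluded from release builds entirely.
      int[] sampleInputs() { return [-10, 0, 42]; }
  }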
I would say I didn't have to learn much D. Whatever I wanted to do with it, I would find in its docs or ask ChatGPT, and there would always be a very nice way to do it.
D is a bittersweet topic for me.
From a philosophical/language-design standpoint, it ticks so many boxes. It had the potential to be wildly popular, had a few things gone differently.
If the language tooling and library ecosystem were on par with the titans of today, like Rust/Go, it really would be a powerhouse language.
Isn't D supported by the GNU Compiler Collection? I personally would prefer that type of tooling over what Rust and Go do (I can't even get their compilers to run on my old platform anymore, not to mention all the dependencies on remote resources typical Rust/Go projects seem to have, which seem to be enforced by the ecosystem?)
It is; however, keeping LDC and GDC up to date is a volunteer effort with not enough people, so they are always a bit behind dmd.
Still much better than gccgo, which is kind of useless for anything beyond Go 1.18; no one is updating it any longer, and it may as well join gcj.
LDC isn't regularly behind DMD lately. The issue has been more the release process with respect to DMD, and the people issues impacting that.
Having written real code in D, I can say that the slight discrepancy between dmd, LDC, and gdc isn't a roadblock in practice.
It is
So, I also write Go and I don't get the part about tooling. I don't need formatters or linters, as I'm adult enough to know how to format my code (in fact I dislike tools doing it for me). D also has dub, which is fine as far as package managers go. The ecosystem is the only issue: Go arguably does have a lot of very cool libraries for virtually anything, but outside of webdev I can't see myself using them. This is why D works a lot better for projects where I don't need all those dependencies and would do better without them.
Yeah, the foundations of the language are incredible. It's just everything else around it that brings it down (and is unfortunately very hard to motivate people to solve).
D definitely missed a critical period, but I love it all the same.
I freely admit to not being a Go or Rust expert, but from what I can tell using C from D is even easier than in either of these languages. The C++ interop is also decently usable.
IMO, the bigger issue is language tooling.
Wow, I needed to parse just one small piece of ASN.1 with one value (one signature), but I didn't know ASN.1 could have a specification (to generate a parser from). So I ended up untangling it myself, just for those specific 256 bits.
Still, I think it's better to have an overspecified format for security stuff; JSON and XML are just too vague and their parsers are unpredictable.
I worked on a Swift ASN.1 compiler [1] a while back (not swift-asn1; mine used Codable). I saved myself some time by using the Heimdal ASN.1 compiler [2], which can transform ASN.1 into a much more parseable JSON AST.
[1] https://github.com/PADL/ASN1Codable
[2] https://github.com/heimdal/heimdal/tree/master/lib/asn1
Not heard of either of those projects before, but I love how libasn1's README has a thinly veiled hint of disdain for ASN.1
> which can transform ASN.1 into a much more parseable JSON AST
The sign of a person who's been hurt, and doesn't want others to feel the same pain :D
I think you're mistaking this:
| ASN.1 is a... some would say baroque, perhaps obsolete, archaic even, "syntax" for expressing data type schemas, and also a set of "encoding rules" (ERs) that specify many ways to encode values of those types for interchange.
for me expressing disdain for ASN.1. On the contrary: I'm saying those who would say that are wrong:
| ASN.1 is a wheel that everyone loves to reinvent, and often badly. It's worth knowing a bit about it before reinventing this wheel badly yet again.
:)
Hey, I love how the author describes ASN.1 as a "syntax" in quotes.
What I disagree on is the disdain being veiled. It seems very explicit to me.
Anyway, yeah, I hadn't heard about it before either, and it's great to know that somebody out there did solve that horrible problem already, and that we can use the library.
Ugh, I did not mean to express disdain.
Normally, when implementing some standard, you could say you get 80% of the functionality in 20% of the planned time. But with ASN.1, the remaining 20% could take the rest of your life.
I have also had to work with this in many contexts... deeply embedded systems with no parsers available, and where no "proper" ones would fit. So I have hand-written basic parsing and generation a few times.
Oh, and there are also non-compliant implementations. E.g. some passports (yes, the passports with chips use tons of ASN.1) even encode big integers incorrectly: an INTEGER is supposed to be minimal two's complement, but as I recall some passports used a fixed, non-complement format yanked into the 0x02 INTEGER type... Some libraries have special non-compliant parsing modes to deal with it.
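For reference (example mine), DER's minimal two's complement rule means an INTEGER uses the fewest octets that still preserve the sign bit:

   127  ->  02 01 7F
   128  ->  02 02 00 80   -- leading 00 required: 0x80 alone would read as -128
  -128  ->  02 01 80

A fixed-width or non-complement rendition like 02 04 00 00 00 80 for 128 is exactly the kind of thing DER forbids, and roughly the kind of thing those passports emit.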
According to the ASN.1 Wikipedia entry, most tools supporting ASN.1 do the following:
1) parse the ASN.1 files, 2) generate equivalent declarations in a programming language (like C or C++), 3) generate the encoding and decoding functions based on those declarations.
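A sketch of what that looks like in practice, in D (the article's language), with a toy type invented for illustration; real compilers emit considerably more:

  // Step 1 input, a toy module (hypothetical):
  //
  //   Point ::= SEQUENCE { x INTEGER, y INTEGER }
  //
  // Step 2 output: the equivalent native declaration.
  struct Point
  {
      long x;
      long y;
  }

  // Step 3 output: codec functions bound to that declaration
  // (signatures invented; real generators add error handling etc.).
  ubyte[] encodePoint(ref const Point p);
  Point decodePoint(const(ubyte)[] der);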
All of this exercise is apparently part of the data engineering process or lifecycle [1].
Back in the early 21st century, Python was just another interpreted general-purpose programming language alternative: not for the web (PHP), not for command tools (Tcl), not for systems (C/C++), not for data wrangling (Perl), not for numerics (Matlab/Fortran), not for statistics (R).
D will probably follow a similar trajectory to Python's, but it really needs a special kind of killer application to bring it to the fore.
I'm envisioning that real-time data streaming, processing, and engineering could be D's killer utility and the defining moment when D becomes "for data".
[1] Fundamentals of Data Engineering:
https://www.oreilly.com/library/view/fundamentals-of-data/97...
> D will probably follow a similar trajectory to Python's
Apologies in advance for being that guy, but D's trajectory seems pretty much locked in by now, while Python has been reborn through machine learning.
Ack. I wrote an ASN.1 compiler in Java in the 90s, mostly just to make sure I understood how it and BER/DER were used in X.509. I think the BER interpretation bits are still being used somewhere.
I'm sorry you had to waste a year of your life.
There are few things I dislike more in the computing world than ASN.1/BER. It seems to encourage over-specification and is suboptimal for loosely coupled systems.
But it looks like you had a decent time...
some people simply like pain :D
(I worked with asn1c (not sure which fork) and had to hack in a custom allocator and 64-bit support. I shiver every time something needs attention in there.)
I was using asn1c with a Rust project, since there was no Rust ASN.1 compiler at the time. It became a bottleneck, and in profiling I found that the string-copying helper used everywhere was doing bit-level copying even in our byte-aligned mode, which was extra weird because that function had a parameter for byte alignment.
One memcpy made it like 30% faster overall.
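The shape of that fix, as a D sketch (names invented; asn1c's real helper is different): honor the alignment parameter with one memcpy instead of always looping bit by bit.

  import core.stdc.string : memcpy;

  // Copy nbits from src to dst.
  void copyBits(ubyte[] dst, const(ubyte)[] src, size_t nbits, bool byteAligned)
  {
      if (byteAligned && nbits % 8 == 0)
      {
          memcpy(dst.ptr, src.ptr, nbits / 8); // fast path: one memcpy
          return;
      }
      foreach (i; 0 .. nbits) // slow path: one bit at a time
      {
          immutable mask = cast(ubyte)(0x80 >> (i % 8));
          if (src[i / 8] & mask) dst[i / 8] |= mask;
          else                   dst[i / 8] &= cast(ubyte)~mask;
      }
  }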
:)
Honestly any compiler project in pure C is pretty hardcore in my eyes, ASN.1 must amplify the sheer horror.
I salute you for this deep dive. History would have it that ASN.1 was already there as both an IDL and a serialization format when HTTPS certs were defined. If it were today, would it be the same or would we end up with protobuf or thrift or similar?
> If it were today, would it be the same or would we end up with protobuf or thrift or similar?
The main advantage of ASN.1 (specifically DER) in an HTTPS/PKI context is that it's a canonical encoding. To my understanding Protobuf isn't; I don't know about Thrift.
(A lot of hay is made about ASN.1 being bad, but it's really BER and other non-DER encodings of ASN.1 that make things painful. If you only read and write DER and limit yourself to the set of rules that occur in e.g. the Internet PKI RFCs, it's a relatively tractable and normal looking serialization format.)
I'm hardly a connoisseur of DER implementations, but my understanding is that there are two main problems with DER. The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON. This means your generic DER parser needs to have an ASN.1 schema passed into it to parse the DER, and this leads to the second problem, which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
> The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON.
You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
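To make "self-describing" concrete, here's a minimal schemaless TLV walker, sketched in D (the article's language). It assumes single-byte tags and definite lengths, which covers most certificate material:

  import std.stdio : write, writefln;

  // Walk a DER stream with no schema: every element is tag, length, value.
  void walk(const(ubyte)[] data, int depth = 0)
  {
      while (data.length >= 2)
      {
          immutable tag = data[0];
          size_t len = data[1];
          size_t hdr = 2;
          if (len & 0x80) // long form: low 7 bits count the length octets
          {
              immutable n = len & 0x7F;
              assert(n >= 1 && n <= size_t.sizeof && 2 + n <= data.length);
              len = 0;
              foreach (b; data[2 .. 2 + n])
                  len = (len << 8) | b;
              hdr = 2 + n;
          }
          assert(hdr + len <= data.length, "truncated element");
          foreach (_; 0 .. depth) write("  ");
          writefln("tag 0x%02X, %s bytes", tag, len);
          if (tag & 0x20) // constructed bit set: recurse into the contents
              walk(data[hdr .. hdr + len], depth + 1);
          data = data[hdr + len .. $];
      }
  }

No ASN.1 module in sight, yet it recovers the full tree structure of any valid DER input.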
> which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
Sort of -- DER gets a bad rap for two reasons:
1. OpenSSL had (has?) an exceptionally bad and permissive implementation of a DER parser/serializer.
2. Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER. This has caused an absolutely obscene amount of pain in PKI standards, which is why just about every modern PKI standard that uses ASN.1 bends over backwards to emphasize that all encodings must be DER and not BER.
(2) in particular is pernicious: the public Web PKI has successfully extirpated BER, but it still skulks around in private PKIs and more neglected corners of the Internet (like RFC 3161 TSAs) because of a long tail of OpenSSL (and other misbehaving implementation) usage.
Overall, DER itself is a mostly normal looking TLV encoding; it's not meaningfully more complicated than Protobuf or any other serialization form. The problem is that it gets mashed together with BER, and it has a legacy of buggy implementations. The latter is IMO more of a byproduct of ASN.1's era -- if Protobuf were invented in 1984, I imagine we'd see the same long tail of buggy parsers regardless of the quality of the design itself.
> You can parse DER perfectly well without a schema, it's a self-describing format.
If the schema uses IMPLICIT tags then - unless I'm missing something - this isn't (easily) possible.
The most you'd be able to tell is whether the TLV contains a primitive or constructed value.
This is a pretty good resource on custom tagging, and goes over how IMPLICIT works: https://www.oss.com/asn1/resources/asn1-made-simple/asn1-qui...
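A tiny illustration (example mine): given

  Name ::= [0] IMPLICIT UTF8String

the encoder replaces the UTF8String tag (0x0C) with the context-specific tag 0x80 on the wire, so a schemaless decoder sees only "context tag 0, primitive, N bytes" and can no longer tell the value was a string.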
> Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER
:sweat: That might explain why some of the root certs on my machine appear to be BER encoded (barring decoder bugs, which is honestly more likely).
Ah yeah, IMPLICIT is the main edge case. That's a good point.
You can parse DER, but you have no idea what you've just parsed without the schema. In a software library, that's often not very useful, but at least you can verify that the message was loaded correctly, and if you're reverse engineering a proprietary protocol you can at least figure out the parts you need without having to understand the entire thing.
Yes, it's like JSON in that regard. But the key part is that the framing of DER doesn't require a schema; that isn't true for all encoding formats (notably protobuf, where types have overlapping encodings that need to be disambiguated through the schema).
I'd argue that JSON is still easier, as it at least allows you to reason about the structure and build up a (partial) schema. You have the keys of the objects you're trying to parse. Something like {"username":"abc","password":"def","userId":1,"admin":false} would end up as something like Utf8String(3){"abc"} + Utf8String(3){"def"} + Integer(1){1} + Boolean(1){0x00} if encoded in DER style, with the keys gone entirely.
This has the fun side effect that DER essentially allows you to process data ("give me the 4th integer and the 2nd string of every third optional item within the fifth list") without knowing what you're interpreting.
> You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
> rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
Almost. The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
> The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
Could you explain what you mean? The tag does indeed encode this: for an integer you'd see `INTEGER`, for a string you'd see `UTF8String` or similar, for an array you'd see `SEQUENCE OF`, etc.
You can verify this for yourself using a schemaless decoder like Google's der-ascii [1]. For example, here's a decoded certificate [2]: you get fields and types, you just don't get the semantics (e.g. "this number is a public key") associated with them, because there's no schema.
[1]: https://github.com/google/der-ascii
[2]: https://github.com/google/der-ascii/blob/main/samples/cert.t...
It's been a long time since I last stared at DER, but my recollection is that for the ASN.1 schema I was decoding, basically all of the tags ended up not using the universal tag information, so you just had to know what the type was supposed to be. The fact that everything was implicit is why I qualified it with "most of the time"; it was that way in my experience.
Oh, that makes sense. Yeah, I mostly work with DER in contexts that use universal tagging. From what I can tell, IMPLICIT tagging is used somewhat sparingly (but it is used) in the PKI RFCs.
So yeah, in that instance you do need a schema to make progress beyond "an object of some size is here in the stream."
DER is TLV. You don't know the specifics ("this integer is a value between 10 and 53") that the schema contains, but you know it's an integer when you read it.
PER lacks type information, making encoding much more efficient as long as both sides of the connection have access to the schema.
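A worked micro-example (values mine): take Small ::= INTEGER (0..7) with the value 5. DER still spends three whole bytes on it:

  02 01 05   -- tag INTEGER, length 1, value 5

Unaligned PER knows from the schema that the value fits the constraint, so it emits just the three bits 101: no tag, no length. That's the trade: without the schema you can't even find the field boundaries.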
One of my big problems with ASN.1 (and its encodings) is how _crusty_ it is.
You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString. I'll note that three of those types have "Universal" / "General" in their name, and several more imply it.
How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME? Don't be fooled, all those types describe both a date _and_ time, if you just want a time then that's TIME-OF-DAY.
It's understandable how a standard with teletex roots got to this point, but it doesn't lead to good implementations when there is that much surface area to cover.
I wrote an ASN.1 decoder, and since the encoding contains type/size info, you can often read a subset and handle the rest as opaque data objects if you need round-tripping. This is required because there can be plenty of data that is unknown to older consumers (like the ETSI eIDAS/PAdES personal information extensions in PDF signatures).
However, to have a sane interface for actually working with the data you do need a schema that can be compiled to a language specific notation.
It is not only that ASN.1 was there before SSL; even the certificate format was there before SSL. The certificate format comes from X.500, which is the "DAP" part of "LDAP"; the L as in "Lightweight" in "LDAP" refers mostly to LDAP not using public key certificates for client authentication, in contrast to X.500 [1]. A bunch of other related stuff comes from RSA's PKCS series of specifications, which also mostly use ASN.1.
[1] The somewhat ironic part is that when it was discovered that using just passwords for authentication is not enough, the so-called "lightweight" LDAP arguably got more complex than X.500. The same thing happened to SNMP (another IETF protocol using ASN.1) being "Simple", for similar reasons.
The IETF has made a bunch of standards lately like COSE for doing certificates and encryption stuff with CBOR. It’s largely for embedded stuff, but I could see it being a modern alternative. I haven’t used it myself yet.
CBOR is self-describing like JSON/XML, meaning you don't need a schema to parse it. It has a better set of specific types for integers and binary data, unlike JSON. It has an IANA database of tags and a canonical serialization form, unlike MsgPack.
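To make that concrete (example mine): the JSON text {"a":1} is 7 bytes; the same value in CBOR is 4 bytes, and each byte announces its own type:

  A1       -- map, 1 pair
  61 61    -- text string, length 1: "a"
  01       -- unsigned integer 1

A decoder can walk that with no schema, just like JSON, but integers and byte strings stay native instead of detouring through decimal text or base64.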
If it were designed today, I would imagine it could end up looking like JWT (JOSE) and using JSON. I've seen several key-exchange formats in JSON beyond JWT/JOSE in the wild today as well, so we may even get there eventually in a future upgrade of TLS.
Yes and no; JSON's handling of things like binary data (hashes) and big ints leaves a bit to be desired (sure, we can use base64 encoding). ASN.1 isn't great by any stretch, but here JSON really isn't much better, apart from better library support.
Yes, JOSE is still infinitely better than XML signatures and the canonical-XML madness needed to allow signatures _inside_ the document being signed.
Possible but unlikely, for multiple reasons:
- a huge breaking change to the whole cert infrastructure
- this question was asked of the people who chose ASN.1 for X.509, and AFAIK they said that today they would use protobuf (but I don't remember where I have that from)
- JOSE/JWT etc. aren't exactly that well regarded in the crypto community AFAIK, nor designed with modern insights about how best to do such things: too much header malleability, too much crypto flexibility, too little deterministic encoding of JSON, too many imprecisely defined corner cases related to JSON, and too much encoding overhead for keys and similar (which for some post-quantum stuff can get into the 100 KiB range). The argument of it being readable with a text editor falls apart if anything you care about is binary (keys, etc.) and often encrypted (producing binary). (IMHO the plain-text argument also falls apart for most non-crypto stuff: if you add base64 encoding anyway, you already need dev tooling to read it, and whether your debug tooling does a base64 decode or a (maybe additional) data-decode step isn't really relevant; same for viewing in an IDE, which can handle binary formats just fine. But that's an off-topic discussion.)
- if we look at some modern protocols designed by security specialists/cryptographers that have been standardized, we often find other stuff (e.g. protobuf for some JWT alternatives, or CBOR for HSK/AuthN-related stuff)
> JOSE/JWT etc. aren't exactly that well regarded in the crypto community
That is true, but it's also true that JWT/JOSE is a market winner and "everywhere" today. Obviously, it's not a great one and not without flaws, and its "competition" is things like SAML which even more people hate, so it had a low bar to clear when it was first introduced.
> CBOR
CBOR is a good mention. I have met at least one person hoping a switch to CWT/COSE happens, to help somewhat combat JWT bloat in the wild. With WebAuthn requiring CBOR, there's more of a chance to get an official browser CBOR API in JS. If browsers had an out-of-the-box CBOR.parse() and CBOR.stringify(), that would be interesting for a bunch of reasons (including maybe even making CWT more likely).
One of the fun things about CBOR, though, is that it shares the JSON data model and is intended to be a sibling encoding, so I'd also maybe argue that if CBOR ultimately wins, that's still somewhat indirectly a "JSON win".
No, we would use something similar to S-Expressions [1]. Parsing and generation would be at most a few hundred lines of code in almost any language, easily testable, and relatively extensible.
With the top level encoding solved, we could then go back to arguing about all the specific lower level encodings such as compressed vs uncompressed curve points, etc.
[1] https://datatracker.ietf.org/doc/rfc9804
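A sketch of what that could look like (shape invented for illustration; see the linked RFC for the exact syntax, including hex #...# and base64 |...| atoms):

  (certificate
    (issuer (common-name "Example CA"))
    (validity (not-before "2025-01-01") (not-after "2026-01-01"))
    (subject-public-key (rsa (e #010001#) (n |...|))))

The canonical form length-prefixes every atom (e.g. 6:issuer), so there is exactly one byte sequence to hash and sign, which is the property DER buys at far higher cost.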
ASN.1 seems orders of magnitude simpler than Protobuf or Thrift.
how did you end up believing that?
- ASN.1 is a set of a docent different binary encodings
- ASN.1's schema language is IMHO way better designed than Protobuf's, but it is also more complex, as it has more features
- ASN.1 can encode many more different data layouts (e.g. things where in Protobuf you have to use "tricks"), each laid out differently in the output depending on the specific encoding format, annotations on the schema, and options during serialization
- ASN.1 has many ways to represent things more "compactly", which all come with their own complexity (like bit-mask-encoded boolean maps; see the example right below)
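One of those compact forms, with a toy example of my own: a named-bit map like

  Flags ::= BIT STRING { admin(0), audit(1), backup(2) }

with admin and backup set encodes in DER as 03 02 05 A0: tag, length 2, one octet announcing that 5 trailing bits are unused, then the bits 101 packed from the most significant bit down. Every such form is another page of rules for an implementer.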
Overall, the problem of ASN.1 is that it's absurdly overengineered, leaving you needing to know many hundreds of pages across multiple standards documents just to implement one single encoding of the docent existing ones, and even then you might run into ambiguous, unclear definitions where you have to ask on the internet for clarification.
If we ignore the schema languages for a moment, most senior devs could probably write a crappy protobuf implementation over a weekend, but for ASN.1 you might not even be able to digest all the relevant standards in that time :/
Realistically, if ASN.1 weren't so badly overengineered and had shipped with only some of its more modern encoding formats, we would probably all be using ASN.1 for many things, maybe including your web server responses, and this would probably cut non-image/video network bandwidth by a third or more. But then, the network is overloaded by image/video transmissions and similar, not other stuff, so I guess who cares?!
for "docent", do you mean "dozen"?
I had to look up https://www.merriam-webster.com/dictionary/docent
Protobuf is pretty much ASN.1 with better tooling, optimized for message-exchange protocols rather than files, when it comes down to the details. Without ASN.1 and the lessons learned from it, another binary serialization protocol would probably have taken its place, and I bet Protobuf and similar tools would look and perhaps work quite differently. The same way JSON would look and act quite differently if XML had never been invented.
> Protobuf is pretty much ASN.1
no, not at all
They share some ideas; that doesn't make it "pretty much ASN.1". It's only "pretty much the same" if you argue all schema based general purpose binary encoding formats are "pretty much the same".
ASN.1 also isn't "file" specific at all it's main use case is and always has been being used as message exchange protocols.
(Strictly speaking, ASN.1 is also not a single binary serialization format but 1. one schema language, 2. some rules for mapping things to some intermediate concepts, and 3. a _docent_ different ways to "exactly" serialize things. And in the 3rd point the differences can be pretty huge, from something you can partially read even without a schema (like Protobuf) to more compact representations you can't read without a schema at all.)
> if you argue all schema based general purpose binary encoding formats are "pretty much the same"
At the implementation level they are different, but when integrating these protocols into applications, yeah, pretty much. Schema plus data goes in, encoded data comes out, or the other way around. In the same way, YAML and XML are pretty much the same: just different expressions of the same concepts. ASN.1 even comes with multiple expressions of exactly the same grammar, in both text and binary form.
ASN.1 was one of the early standardised protocols in this space, though, and suffers from being used mostly in obscure or legacy protocols, often with proprietary libraries, if you go beyond the PKI side of things.
ASN.1 isn't file-specific (it was designed for use in telecoms, after all), but encodings like DER work better inside file formats than Protobuf and many protocols like it do. Actually having a formal standard makes including it in file types a lot easier.
Every time I have ever had the displeasure of looking at an X.whatever spec, I always end up coming away with the same conclusion.
Somehow, despite these specifications being 90% metadata by weight, they seem to consistently forget the part of the spec that lets you actually know what something is, and that part is just left up to context.
I could well be missing something, but a majority of the time it feels to me like they set out to make a database schema, and accidentally wrote the sqlite file format spec instead.
Like, thanks, it's nice that I can parse this into a data structure :). It would be nicer, however, if doing so gave me any idea of what I can do with the data I've parsed.
Though to be fair, I feel the same way about the entire concept of XML schemas. The fact that you can theoretically validate an XML document against a schema is honestly completely useless. If I am parsing XML, it's because my code already knows what information it needs from the XML document, and really it should also know where that information is. I don't need a separate schema definition to tell me that! It's already expressed!! In the part where I am parsing out the data I need!!!
> The fact that you can theoretically validate an XML document against a schema is honestly completely useless. If I am parsing XML, it's because my code already knows what information it needs from the XML document, and really it should also know where that information is.
You seem to miss the entire point of XML schemas, or any schema really. Validating a document against a schema isn't really for your code. It's documentation of what can be in a given document and how it needs to be structured, so others don't need to read your code to understand that.
It then allows editing tools to verify generated documents. Or developers to understand how they can structure XML output properly.
Your code could also use it to verify an XML document before handing it to your parsing code. Then you can inform the user of an invalid document, and why, instead of just crashing at a random point in the code, without rolling your own validation. It can also verify an entire document, whereas code may only parse portions, leading to later corruption.
SNMP MIB files are written in ASN.1. That is the extent of my knowledge about ASN.1, was nice to learn a little more by reading this blog post.
Thank you, now I’m much more disillusioned in asn.1
The only goal of such ridiculous standards is to act as a form of vendor lock-in for the vendors implementing them; the vendors get to tell governments that it is a standard, and the sellers of the standards also get some money.
Anyone designing a system around such standards is basically betraying their client.
I think, if you want to annoy these people maximally, you should write an annotated version of the standard in a formal mathematical language.
I read the table constraints, which try to do something simple, but are written in the most convoluted way possible.
I think I considered ASN.1 for a system once, but rejected it in favor of a more modern, technically superior alternative.
If the parser for something like ASN.1 doesn't fit in 80 lines of Haskell, perhaps you just shouldn't use it.
I don't know who these assholes are that say "Sure, let's make things slow and buggy, since we all hail Satan after all".