Handling cookies is a minefield

(grayduck.mn)

211 points | by todsacerdoti 5 hours ago

99 comments

  • trevor-e a minute ago

    I came across a similar issue when experimenting with the Crystal language. I thought it would be fun to build a simple web scraper to test it out, only to find the default HTTP client fails to parse many cookies set by the response and aborts.

  • maxwellg 13 minutes ago

    Cookies are filled with weird gotchas and uncomfortable behavior that works 99.95% of the time. My favorite cookie minefield is cookie shadowing - if you set cookies with the same name but different key properties (domain, path, etc.) you can get multiple near-identical cookies set at once - with no ability for the backend or JS to tell which is which.

    Try going to https://example.com/somepath and entering the following into the browser console:

      document.cookie = "foo=a"; 
      document.cookie = "foo=b; domain=.example.com";
      document.cookie = "foo=c; path=/somepath";
      document.cookie
    
    I get

      'foo=c; foo=a; foo=b'
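
    A rough sketch of why the backend cannot tell these apart: the Cookie request header carries only name=value pairs, so the path and domain that distinguished the three cookies never reach the server. The helper names parseCookiePairs and valuesFor are made up for illustration and are not from any library.

```javascript
// Parse a Cookie request header into an ordered list of [name, value] pairs,
// keeping duplicates instead of collapsing them into an object.
function parseCookiePairs(header) {
  return header
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const eq = part.indexOf("=");
      // A pair without "=" is treated here as a nameless value (an assumption).
      return eq === -1 ? ["", part] : [part.slice(0, eq), part.slice(eq + 1)];
    });
}

// Collect every value seen for one name, in header order.
function valuesFor(header, name) {
  return parseCookiePairs(header)
    .filter(([n]) => n === name)
    .map(([, v]) => v);
}

const shadowed = valuesFor("foo=c; foo=a; foo=b", "foo");
console.log(shadowed); // [ 'c', 'a', 'b' ]
```

    A parser that stores the pairs in a plain object or map keeps only one of the three values, and which one survives depends on the parser.
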
  • kibwen 4 hours ago

    The article mentions Rust's approach, but note that (unlike the other mentioned languages) Rust doesn't ship any cookie handling facilities in the standard library, so it's actually looking at the behavior of the third-party "cookie" crate (which includes the option to percent-encode as Ruby does): https://docs.rs/cookie/0.18.1/cookie/

    • marumari 4 hours ago

      Thanks for pointing that out -- I've updated the article and given you credit down at the bottom. Let me know if you'd prefer something other than "kibwen."

    • juped 4 hours ago

      De facto standardization by snapping up good names early!

      • echelon 3 hours ago

        Not really. A lot of essential third party Rust crates and projects have "weird" names, eg. "nom", "tokio", etc. You can see that from the list of most downloaded crates [1].

        This one just happens to have been owned and maintained by core Rust folks and used in a lot of larger libraries. This is more the exception than the rule.

        It's a given that you should do due diligence on crates and not just use the first name that matches your use case. There's a lot of crate name squatting and abandonware.

        Rust crates need namespacing to avoid this and similar problems going forward.

        [1] https://crates.io/crates?sort=downloads

        • codetrotter 2 hours ago

          A sibling comment talked about “UwU names”. Not sure exactly if they are referring to “tokio” or something else. But if it’s tokio, they might find this informative:

          > I enjoyed visiting Tokio (Tokyo) the city and I liked the "io" suffix and how it plays w/ Mio as well. I don't know... naming is hard so I didn't spend too much time thinking about it.

          https://www.reddit.com/r/rust/comments/d3ld9z/comment/f03lnm...

          From the original release announcement of tokio on r/rust on Reddit.

          And also to the sibling commenter, if tokio is a problematic name to you:

          Would either of the following names be equally problematic or not?

          - Chicago. Code name for Windows 95, and also the name of a city in the USA. https://en.wikipedia.org/wiki/Development_of_Windows_95 https://en.wikipedia.org/wiki/Chicago

          - Oslo. Name of a team working on OpenStack, and also appears in their package names. Oslo is the capital of Norway. https://wiki.openstack.org/wiki/Oslo https://en.wikipedia.org/wiki/Oslo

          If yes, why? If no, also why?

          • mardef an hour ago

            Just want to point out that location names are used for codenames because they cannot be trademarked

            Big tech uses them instead of wasting legal time and money having to clear a new name that's temporary or non-public.

            Changing the name to Tokio removes this benefit and still leaves it disconnected from its purpose.

          • samatman an hour ago

            You really should learn the difference between [flagged] [dead] and just [dead] if you're going to run [showdead].

            There's no reason to give an account called dangsux any oxygen at all. Not to mention most users of this site are going to have no idea what you're talking about. People get banned for a reason.

        • littlestymaar 2 hours ago

          > Rust crates need namespacing to avoid this and similar problems going forward.

          It hasn't been implemented despite crowd demanding it on HN for years because it won't solve the problem (namespace squatting is going to replace name squatting and tada! you're back to square one with an extra step).

          • Macha 2 hours ago

            I do agree that people will assume xyz/xyz is more authoritative than some-org/xyz, but I think there is benefit to knowing that everything under xyz/* has a single owner. The current approach is to name companion crates like xyz_abc but someone else could come along with xyz_def and it's not immediately obvious that xyz_abc has the same owner as xyz but xyz_def does not.

          • Illniyar an hour ago

            Solved the problem almost completely in npm. Sure you can't search for a name of a company or a project and expect it to be related to the company or project. But there's no way to solve that.

            But once you know a namespace is owned by a company or project, you can know that everything under it is legit. Which solves the vast majority of squatting and impersonation problems.

            Also you know that everything under "node" for example is part of the language.

          • dpcx an hour ago

            php deals with this by using the username/organization name of a repository as the namespace name of packages. At least then you're having to squat something further up the food chain.

  • jeffreyrogers 3 hours ago

    About 10 years ago I implemented cookie based sessions for a project I was working on. I had a terrible time debugging why auth was working in Safari but not Chrome (or vice-versa, can't remember). Turned out that one of the browsers just wouldn't set cookies if they didn't have the right format, and I wasn't doing anything particularly weird, it was a difference of '-' vs '_' if I recall correctly.

    • hombre_fatal 2 hours ago

      IIRC there is (or was?) a difference in case-sensitivity between Safari and Chrome, maybe with the Set-Cookie header? I've run into something before which stopped me from using camelCase as cookie keys.

      Can't seem to find the exact issue from googling it.

  • 0xbadcafebee 2 hours ago

    Did anyone else notice that the HTTP protocol embeds within it ten-thousand different protocols? Browsers and web servers both "add-on" a ton of functionality, which all have specifications and de-facto specifications, and all of it is delivered through the umbrella of basically one generic "HTTP" protocol. You can't have the client specify what version of these ten-thousand non-specifications it is compatible with, and the server can't either. We can't upgrade the "specs" because none of the rest of the clients will understand, and there won't be backwards-compatibility. So we just have this morass of random shit that nobody can agree on and can't fix. And there is no planned obsolescence, so we have to carry forward whatever bad decisions we made in the past.

    • tyleo 2 hours ago

      Tbh I’ve made peace with this world and I might even enjoy it more than the planned obsolescence one.

    • Analemma_ 28 minutes ago

      This is also the fault of shit-tastic middleware boxes which block any protocol they don't understand-- because, hey, it's "more secure" to default-fail, right?-- so every new type of application traffic until the end of time has to be tunneled over HTTP if it wants to work over the real Internet.

  • AshleysBrain 4 hours ago

    Cookies seem to be a big complicated mess, and meanwhile are almost impossible to change for backwards-compatibility reasons. Is this a case to create a new separate mechanism? For example a NewCookie mechanism could be specified instead, and redesigned from the ground-up to work consistently. It could have all the modern security measures built-in, a stricter specification, proper support for unicode, etc.

    • flotwig 4 hours ago

      It's funny that you mention NewCookie, there is actually a deprecated Set-Cookie2 header already: https://stackoverflow.com/q/9462180/3474615

    • RadiozRadioz 4 hours ago

      NewCookie is, roughly, what browser Local Storage is.

      At least for some use cases. Of course, it doesn't directly integrate with headers.

      • graypegg 4 hours ago

        I think one important use case we have for cookies is "Secure; HttpOnly" cookies. Making a token totally inaccessible from JS, but still letting the client handle the session is a use case that localStorage can't help with. (Even if there's a lot of JWTs in localStorage out there.)
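
        A minimal sketch of such a header, assuming a hypothetical buildSessionCookie helper and a token that is already cookie-safe. The Path, Max-Age and SameSite=Lax choices are assumptions made for the example, not something the comment prescribes.

```javascript
// Build a Set-Cookie header value for a session token that page scripts
// cannot read (HttpOnly) and that is only sent over HTTPS (Secure).
function buildSessionCookie(name, token, maxAgeSeconds) {
  return [
    `${name}=${token}`,
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    "Secure",
    "HttpOnly",
    "SameSite=Lax",
  ].join("; ");
}

console.log(buildSessionCookie("sid", "abc123", 3600));
// sid=abc123; Path=/; Max-Age=3600; Secure; HttpOnly; SameSite=Lax
```

        A cookie set this way does not show up in document.cookie, which is the property localStorage has no equivalent for.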

        • emn13 2 hours ago

          However, potentially a localStorage (and sessionStorage!) compatible cookie-replacement api might allow for annotating keys with secure and/or HttpOnly bits? Keeping cookies and localStorage in sync is a hassle anyhow when necessary, so having the apis align a little better would be nice. Not to mention that that would have the advantage of partially heading off an inevitable criticism - that users don't want yet another tracking mechanism. After all, we already have localStorage and sessionStorage, and they're server-readable too now, just indirectly.

          On the other hand; the size constraints on storage will be less severe than those on tags in each http request, so perhaps this is being overly clever with risks of accidentally huge payloads suddenly being sent along with each request.

          • xp84 an hour ago

            I think if I were implementing a webapp from scratch today I'd use one single Session ID cookie, store sessions in Redis (etc) indefinitely (they really aren't that big), and for things meant to be stored/accessed on the frontend (e.g. "has dismissed some dumb popup") just use local storage. Dealing with anything to do with cookies is indeed incredibly painful.

          • mdaniel 2 hours ago

            > and they're server-readable too now, just indirectly.

            Could you point me to more reading about this? It's the first time I've heard of it

            • graypegg 2 hours ago

              I think they mean that you can always send back the content of a localstorage property with javascript grabbing the value and sending another request back with it in the body. Since the front end is going to run any javascript the server sends it (disregarding adblockers at least), it's sort of a more indirect version of Set-Cookie.

              • emn13 an hour ago

                Yeah, that's what I meant. There's no built in support; but it's indirectly readable since client-side JS can read it.

    • notatoad 4 hours ago

      i think the main problem there is that cookies are so inextricably tied up with tracking, any attempt to create better cookies now will get shut down by privacy advocates who simply don't want the whole concept to exist.

      we're stuck with cookies because they exist.

      • doctorpangloss 3 hours ago

        Every privacy advocate I know hands over exquisitely detailed private and personal information to Google and/or Apple. It seems unfair to generalize as “privacy advocates” so much as it is people who are anti-ads.

        Being anti-ads is a valid opinion. It has less intellectual cover than pro “privacy” though.

    • bob1029 3 hours ago

      The DOM & URL are the safest places to store client-side state. This doesn't cover all use cases, but it does cover the space of clicking pre-authorized links in emails, etc.

      I spent a solid month chasing ghosts around iOS Safari arbitrarily eating cookies from domains controlled by our customers. I've never seen Google/Twitter/Facebook/etc domains lose session state like this.

      • marumari an hour ago

        Safari is a lot more strict about cookies than Chromium or Firefox, it will straight up drop or ignore (or, occasionally, truncate) cookies that the other two will happily accept.

        I had hoped when writing this article that Google would look at Safari, see that it was always strict, and feel comfortable about changing to be the same. But doing so now would unfortunately break too many things for too many users.

    • cruffle_duffle 4 hours ago

      Needs a better name than NewCookie though. Suggestions include SuperCookie, UltraCookie or BetterCookie

      Or to be slightly more serious avoid calling it a cookie and call it something else. Too much baggage surrounding the word cookie.

    • pavel_lishin 4 hours ago

      That feels like that XKCD comic about now there being 15 standards.

  • deathanatos an hour ago

    That is a bit of a minefield, I agree…

    The way around this, as a developer, is URL-safe-base64 encode the value. Then you have a bytes primitive & you can use whatever inner representation your heart desires. But the article does also note that you're not 100% in control, either. (Nor should you be, it is a user agent, after all.)
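
    One way this could look, as a sketch only: the value is serialized to bytes and then written with the unpadded base64url alphabet from RFC 4648, so nothing outside A-Z, a-z, 0-9, "-" and "_" ends up in the cookie. The function names are hypothetical, and the code assumes a runtime with global btoa, atob, TextEncoder and TextDecoder (browsers and recent Node.js).

```javascript
// Encode arbitrary text as unpadded base64url (RFC 4648 section 5).
function toBase64Url(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Decode unpadded base64url back to the original text.
function fromBase64Url(encoded) {
  const b64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  // Restore the padding that toBase64Url stripped.
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Example: wrap a JSON payload before placing it in a cookie value.
const wrappedPrefs = toBase64Url(JSON.stringify({ theme: "dark", tags: ["a", "b"] }));
console.log("prefs=" + wrappedPrefs);
```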

    I do wish more UAs opted for "obey the standard" over "bytes and a prayer on the wire". Those 400 responses in the screenshots … they're a conforming response. This would have been better if headers had been either UTF-8 from the start (but there are causality problems with that) or ASCII and then permitted to be UTF-8 later (but that could still cause issues since you're making values that were illegal, legal).

    • johnp_ an hour ago

      > URL-safe-base64

      And make sure to specify what exactly you mean by that. base64url-encoding is incompatible with base64+urlencoding in ~3% of cases, which is easily missed during development, but will surely happen in production.
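
      To make the two readings concrete, here is a small sketch with hypothetical function names. It assumes a runtime with a global btoa. The outputs only differ when the standard base64 text contains "+", "/" or "=", which is why the mismatch can hide during development.

```javascript
// Standard base64 of a byte array.
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Reading 1: base64url alphabet ("-" and "_"), padding stripped.
function base64UrlAlphabet(bytes) {
  return bytesToBase64(bytes).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Reading 2: standard base64, then percent-encoded.
function percentEncodedBase64(bytes) {
  return encodeURIComponent(bytesToBase64(bytes));
}

const sample = [0xfb, 0xff];
console.log(base64UrlAlphabet(sample));    // -_8
console.log(percentEncodedBase64(sample)); // %2B%2F8%3D
```

      For input such as the ASCII bytes of "abc", both functions return "YWJj", so a decoder built for one reading will appear to work with the other until an unlucky value arrives.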

      • Retr0id a minute ago

        Isn't it a lot more than 3%? I don't think I've heard anyone say url-safe-base64 and actually mean urlencode(base64(x))

  • paol 4 hours ago

    Cookie header parsing is a shitshow. The "standards" don't represent what actually exists in the wild, each back-end server and/or library and/or framework accepts something different, and browsers do something else yet.

    If you are in complete control of front-end and back-end it's not a big problem, but as soon as you have to get different stuff to interoperate it gets very stupid very fast.

  • jerf 4 hours ago

    And the article isn't even about the proliferation of attributes cookies have, that browsers honor, and in some cases are just mandatory. I was trying to explain SameSite to a coworker, and scrolled down a bit... https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies#co... wait, cookie prefixes? What the heck are those? The draft appears to date to 2016, but I've been trying to write secure cookie code for longer than that, hadn't heard of it until recently, and I can't really find when they went in to browsers (because there's a lot more drafts than there are implemented drafts and the date doesn't mean much necessarily), replies explaining that welcome.

    Seems like every time I look at cookies they've grown a new wrinkle. They're just a nightmare to keep up with.

    • marcosdumay 4 hours ago

      Well, prefixes are opt-in. You don't have to keep up with them.

      The only recent large problem with cookies was the changes to avoid CSRF; those were opt-out, but they were also extremely overdue.

      All of the web standards are always gaining new random features. You don't have to keep up with most of them. They do look like bad abstractions, but maybe it's just the problem that is hard.

    • minitech 4 hours ago

      > https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies#co... wait, cookie prefixes? What the heck are those?

      https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies#co...

      > For more information about cookie prefixes and the current state of browser support, see the Prefixes section of the Set-Cookie reference article.

      https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Se...

      (Cookie prefixes have been widely supported since 2016 and more or less globally supported since 2019.)

      They’re backwards-compatible, so if your cookie need meets the requirements for the `__Host-` prefix, you should use `__Host-`.
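
      As a rough illustration of what "meets the requirements" means, the sketch below checks the attribute rules for the two prefixes: `__Secure-` needs Secure, and `__Host-` additionally needs Path=/ and no Domain. The function name and the shape of `attrs` are invented for this example, and the check cannot cover the further rule that the cookie must be set from a secure origin.

```javascript
// Check whether a cookie's attributes satisfy the rules browsers enforce
// for the "__Secure-" and "__Host-" name prefixes.
// `attrs` is a hypothetical shape: { secure: boolean, path?: string, domain?: string }.
function prefixViolations(name, attrs) {
  const problems = [];
  if (name.startsWith("__Secure-") || name.startsWith("__Host-")) {
    if (!attrs.secure) problems.push("must set Secure");
  }
  if (name.startsWith("__Host-")) {
    if (attrs.path !== "/") problems.push("must set Path=/");
    if (attrs.domain !== undefined) problems.push("must not set Domain");
  }
  return problems;
}

console.log(prefixViolations("__Host-sid", { secure: true, path: "/" })); // []
console.log(prefixViolations("__Host-sid", { secure: true, path: "/", domain: "example.com" }));
// [ 'must not set Domain' ]
```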

      • jerf 4 hours ago

        Did you just... quote my own link back to me?

        May I suggest that you try a little less hard to "get" people?

        Supported they may have been, but like a lot of things, word did not get around. Find me someone who knows every single last web technology. They exist, there's probably entire dozens of them. I'm not one of them and wouldn't dream of claiming to be.

        • minitech 4 hours ago

          I was answering your question about when they went into browsers with a link, and summarizing it in a parenthetical. So much for “replies explaining that welcome”, I guess.

          • olddustytrail 3 hours ago

            It's the first part of your reply they're responding to, where it looks like you've answered their rhetorical question with the exact link they used to illustrate it.

            I'd guess you just screwed up your copy paste and didn't notice.

  • 0x073 4 hours ago

    IT IS a mess, but I never saw json inside a cookie. For json I use local storage or indexeddb.

    • robgibbons 3 hours ago

      In both cases (cookie vs localStorage) you're really just storing your data as a string value, not truly a JSON object, so whether you use a cookie or localStorage is more dependent on the use case.

      If you only ever need the stored data on the client, localStorage is your pick. If you need to pass it back to the server with each request, cookies.

      • 0x073 2 hours ago

        Combine local storage with service worker, so you pass the data to the server if needed. Completely without setting cookies.

        • withinboredom an hour ago

          And if I don't want any javascript to see my values, ever? Or how do you handle CSRF?

      • recursive 3 hours ago

        JSON is explicitly a string serialization format.

        • robgibbons 2 hours ago

          Right, I meant it's not a JavaScript object. It's serialized into a string in any case, no matter which API you're stuffing it into. So it's a bit of a non-sequitur for the parent to suggest that it's somehow weird to store JSON in a cookie, but not in localStorage. It's all just strings.

          • recursive 36 minutes ago

            My point is that there really is no such thing as "truly a JSON object".

    • hinkley 2 hours ago

      Good way to hit max header length issues. Ask me how I know.

      • osrec 2 hours ago

        How?

        • hinkley an hour ago

          Well you see when a front end developer and a backend developer hate each other very much, they do a special hug and nine days later a 400 request header or cookie too large error is born.

          (Seriously though, someone trying to implement breadcrumbs fe-only)

        • mdaniel 2 hours ago

          I'm not them, but that 419 pattern in the logs is burned into my adrenaline response: https://duckduckgo.com/?t=ffab&q=nginx+419+cookie+header&ia=...

    • lambdaone 2 hours ago

      You're really going to hate it when you learn about JSON Web Tokens, which exist exactly to hack past this sort of problem.

    • ricardo81 3 hours ago

      Are they ubiquitous? I'm no client side guru, I know I could look at makeuseof etc, but why not ask some professionals instead.

      • loa_in_ 23 minutes ago

        At the very least localstorage is supported across the board

  • hinkley 4 hours ago

        Firefox accepts five characters which RFC recommends that servers not send:
    
        0x09 (horizontal tab)
        0x20 (spaces)
        0x22 (double quotes)
        0x2C (commas)
        0x5C (backslashes)
    
    I agree with at least some of these. Cookies without commas? Quotes?
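
    For reference, a sketch of a strict check against the cookie-octet rule in RFC 6265 section 4.1.1, which excludes control characters, whitespace, double quotes, commas, semicolons and backslashes. The name isStrictCookieValue is hypothetical. This mirrors what the RFC asks servers to send, not what any particular browser accepts.

```javascript
// cookie-octet from RFC 6265 section 4.1.1: printable US-ASCII except
// space, double quote, comma, semicolon, and backslash.
const COOKIE_OCTETS = /^[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*$/;

function isStrictCookieValue(value) {
  // The grammar also allows the whole value to be wrapped in one pair of double quotes.
  const inner =
    value.length >= 2 && value.startsWith('"') && value.endsWith('"')
      ? value.slice(1, -1)
      : value;
  return COOKIE_OCTETS.test(inner);
}

console.log(isStrictCookieValue("abc123"));          // true
console.log(isStrictCookieValue('{"a":1,"b":2}'));   // false
```
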
    • remram 4 hours ago

      Quotes in the value when quotes delimit the value? Yeah that seems dangerous to me.

      • anamexis 3 hours ago

        Quotes don't delimit the value.

        • pimlottc 3 hours ago

          Per the section 4.1.1 rules quoted in the article, cookie values can be optionally quoted:

          > cookie-value = cookie-octet / ( DQUOTE cookie-octet DQUOTE )

          • anamexis 3 hours ago

            That is true, but in that case they are part of the value itself, they're not doing anything special:

            > Per the grammar above, the cookie-value MAY be wrapped in DQUOTE characters. Note that in this case, the initial and trailing DQUOTE characters are not stripped. They are part of the cookie-value, and will be included in Cookie header fields sent to the server.

            • pimlottc 2 hours ago

              Ah, thanks for the clarification!

  • AlienRobot 3 hours ago

    >everything behaves differently, and it's a miracle that [it] work at all.

    The web in a nutshell.

    • mdaniel 2 hours ago

      Browsers: what it would look like if Postel's Law were somehow made manifest in C++ and also essential to modern life

  • hinkley 2 hours ago

    One of the things I’ve always found frustrating about cookies is that you have to do your own encoding instead of the API doing it for you. I’m sure someone somewhere does but too often I’m doing my own urlencode calls.
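
    The manual step usually ends up looking something like the sketch below. The helper names are made up, and percent-encoding is just one convention: nothing in the cookie specification says the receiving side will decode it, so both ends have to agree.

```javascript
// Serialize a name/value pair for document.cookie or Set-Cookie, percent-encoding
// the value so it never contains spaces, quotes, commas, semicolons, or backslashes.
function serializeCookiePair(name, value) {
  return `${name}=${encodeURIComponent(value)}`;
}

// Reverse step for one raw value taken from a Cookie header.
function decodeCookieValue(raw) {
  try {
    return decodeURIComponent(raw);
  } catch {
    // Malformed percent-escapes: fall back to the raw text rather than throwing.
    return raw;
  }
}

console.log(serializeCookiePair("prefs", '{"a":1,"b":"x y"}'));
// prefs=%7B%22a%22%3A1%2C%22b%22%3A%22x%20y%22%7D
```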

    • mdaniel 2 hours ago

      Encoding is at least solvable, but every browser having their own cookie length versus some standard value makes that some nonsense. Kong actually has a plugin to split (and, of course, recombine) cookies just to work around this
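
      The general split-and-recombine idea can be sketched as follows. This is not how the Kong plugin is implemented. The chunk naming scheme (name.0, name.1, ...) and the maxChunk budget are assumptions for illustration, and the value is assumed to be ASCII so that string length matches byte length.

```javascript
// Split one long value across several cookies named name.0, name.1, ...
// `maxChunk` is an assumed per-cookie budget, not a number from any standard.
function splitCookie(name, value, maxChunk) {
  const chunks = [];
  for (let i = 0; i < value.length; i += maxChunk) {
    chunks.push([`${name}.${chunks.length}`, value.slice(i, i + maxChunk)]);
  }
  return chunks;
}

// Recombine chunks from a Map of received cookies, stopping at the first gap.
function joinCookie(name, cookies) {
  let out = "";
  for (let i = 0; cookies.has(`${name}.${i}`); i++) {
    out += cookies.get(`${name}.${i}`);
  }
  return out;
}

const pieces = splitCookie("sess", "abcdefghij", 4);
console.log(pieces); // [ [ 'sess.0', 'abcd' ], [ 'sess.1', 'efgh' ], [ 'sess.2', 'ij' ] ]
console.log(joinCookie("sess", new Map(pieces))); // abcdefghij
```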

      • hinkley 35 minutes ago

        But it's so solvable that I shouldn't have to solve it

  • karaterobot 2 hours ago

    > Handling cookies is a minefield

    I know! You gotta let them cool down first. Learned this the hard way.

  • dekhn 2 hours ago

    Everything about the web is a minefield. It's an exercise in "how many unnecessary layers can we put between users and their content"?

    • flockonus 36 minutes ago

      What are you implicitly comparing it against?

      • dekhn 33 minutes ago

        Native desktop development.

    • whatever1 an hour ago

      I have a solution! I just made one more framework!

  • jeffrallen 4 hours ago

    The article mocks Postel's law, but if the setter of the cookie had been conservative in what they sent, there would have been no need for the article...

    • Sohcahtoa82 4 hours ago

      > The article mocks Postel's law

      As they should. Postel's Law was a terrible idea and has created minefields all over the place.

      Sometimes, those mines aren't just bugs, but create gaping security holes.

      If your client is sending data that doesn't conform to spec, you have a bug, and you need to fix it. It should never be up to the server to figure out what you meant and accept it.

      • emn13 2 hours ago

        And yet the html5 syntax variation survived (with all its weird now-codified quirks), and the simpler, stricter xhtml died out. I'm not disagreeing with you; it's just that being flexible, even if it's bad for the ecosystem, is good for surviving in the ecosystem.

        • plorkyeran 2 hours ago

          There was a lot of pain and suffering along the way to html5, and html5 is the logical end state of postel's law: every possible sequence of bytes is a valid html5 document with a well-defined parsing, so there is no longer any room to be more liberal in what you accept than what the standard permits (at least so far as parsing the document).

          • emn13 an hour ago

            Getting slightly off topic, but I think it's hard to find the right terminology to talk about html's complexities. As you point out, it isn't really a syntax anymore now that literally every sequence is valid. Yet the parsing rules are obviously not as simple as a .* regex. It's syntactically simple, but structurally complex? What's the right term for the complexity represented by how the stack of open elements interacts with self-closing or otherwise special elements?

            Anyhow, I can't say I'm thrilled that some deeply nested subtree of divs for instance might be closed by a open-button tag just because they were themselves part of a button, except when... well, lots of exceptions. It's what we have, I guess.

            It's also not a (fully) solved problem; just earlier this year I had to work around an issue in the chromium html parser that caused IIRC quadratic parsing behavior in select items with many options. That's probably the most widely used parser in the world, and a really inanely simple repro. I wonder whether stuff like that would slip through as often were the parsing rules at all sane. And of course encapsulation of a document-fragment is tricky due to the context-sensitivity of the parsing rules; many valid DOM trees don't have an HTML serialization.

      • rendall 3 hours ago

        I interpret the "liberal" part of Postel's Law to mean "do your best to understand it, but that's less important than accepting it, possibly returning a helpful error message" and thus "The Go standard library couldn't parse the cookie, leading to cascading failures all the way up the stack" should never be a thing that happens.

      • SilasX 3 hours ago

        You could split the difference with a 397 TOLERATING response, which lets you say "okay I'll handle that for now, but here's what you were supposed to do, and I'll expect that in the future". (j/k it's an April Fool's parody)

        https://pastebin.com/TPj9RwuZ

    • marcosdumay 4 hours ago

      The problem with Postel's law is exactly that the sender is never conservative, and will tend to use any detail that most receivers accept.

  • ralmidani 3 hours ago

    Wait til you have a legacy system and a newer system and need to, among other things:

    - Implement redirects from the old login screen to the new one
    - Keep sessions in sync
    - Make sure all internal and external users know how to clear cookies
    - Remind everyone to update bookmarks on all devices
    - Troubleshoot edge cases

  • rendall 3 hours ago

    > What servers SHOULD send and what browsers MUST accept are not aligned, a classic example of the tragedy of following Postel's Law.

    "Be liberal in what you accept, and conservative in what you send" is precisely the opposite of "SHOULD send MUST accept". This would be an example of the tragedy of not following Postel's Law.

    If the specs followed Postel's guidance, it would then have read "Servers MUST send x and browsers SHOULD accept y".

  • joshstrange 2 hours ago

    > Apple Support

    Are we sure the website wasn't just broken normally? I kid, a bit, but good lord does Apple _suck_ at websites. Apple Developer and, more often, App Store Connect is broken for no good reason with zero or a confusing error message.

    Note: I'm typing this on an M3 Max MBP (via a Magic Keyboard and Magic Mouse) with an iPhone 16 Pro and iPad Mini (N-1 version) on the desk next to me with an Apple Watch Series 10 on my wrist and AirPods Pro in my pocket. I'm a huge Apple fanboy, but their websites are hot garbage.

    • mdaniel 2 hours ago

      But why wouldn't web pages written in ObjC be just awesome and easy to manage?!

      https://en.wikipedia.org/wiki/WebObjects

      I can still remember when they'd purposefully take down their store page for some godforsaken reason. The mind reels

  • jmull 4 hours ago

    > minefield

    Cookies are a bit of a mess, but if you're going to use them, you can follow the standard and all will be well. Not so much a minefield, but a hammer; you just need to take some care not to hit yourself on the thumb.

    I guess the confusion here is that the browser is taking on the role of the server in setting the cookie value. In doing so it should follow the same rules any server should in setting a cookie value, which don't generally allow for raw JSON (no double-quote! no comma!).

    Either use a decent higher-level API for something like this (which will take care of any necessary encoding/escaping), or learn exactly what low-level encoding/escaping is needed. Pretty much the same thing you face in nearly anything to do with information communication.

    • klysm 3 hours ago

      I don’t understand how that’s not a minefield, it’s easy to go astray?

  • TheRealPomax 4 hours ago

    [comment intended for a different post, but too old to delete]

    • recursive 4 hours ago

      None of this explicitly has anything specifically to do with HTML.

      • TheRealPomax 3 hours ago

        It sure doesn't, that was a comment for a completely different post. I have no idea why HN posted this comment on this article instead of the PHP 8.4 article I thought I was commenting on O_o

        • jdlshore 2 hours ago

          It’s happened enough that I suspect there’s a rarely-seen race condition somewhere in the Arc code that runs HN.