The best worst hack that saved our bacon

(jeffersonheard.ghost.io)

89 points | by mooreds 4 days ago

51 comments

  • CodesInChaos 7 hours ago

    I'd expect negative integer ids in an API to break even more integrations than unexpectedly large integers.

    Though I guess that likelihood is influenced by the choice of protocol. For example, when using protobuf, the client code generated from the specification file will use a 32-bit integer if that's how the field was defined, while in JSON I'd generally assume it's a positive integer smaller than 2^53.

    • swiftcoder 29 minutes ago

      You don’t have to expose the negative to the customer - convert it to unsigned at the API layer, and Bob's your uncle
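
      Something like this (a rough Java sketch, not the article's actual code, assuming the ids are stored internally as signed 32-bit ints):

      ```java
      public class UnsignedIdDemo {
          public static void main(String[] args) {
              // Internally the id has wrapped into the negative range.
              int internalId = -2147483000;

              // Reinterpret the same 32 bits as an unsigned value before it leaves the API layer.
              long asUnsignedNumber = Integer.toUnsignedLong(internalId);     // 2147484296
              String asUnsignedString = Integer.toUnsignedString(internalId); // "2147484296"

              System.out.println(asUnsignedNumber + " " + asUnsignedString);
          }
      }
      ```

      The bits don't change; only the interpretation at the boundary does, so the internal storage can keep using the signed range.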

    • OptionOfT 7 hours ago

      Right. You can have the best documentation:

      If they expose them as strings and mention they're opaque? Then customers who parse them to uint will get bugs and be unhappy.

      Did they expose them as ints? Customers who used uints will be unhappy.

      At `jobs[-2]` the front-end parsed the ids (exposed as strings, but ints under the covers).

      The backend left them alone.

      That caused some issues when building out shared libraries.

      • Demiurge 5 hours ago

        What kind of API specifies that your number is int, uint, or bigint? According to a quick search, the formats for APIs are: JSON ~80%, XML ~15%, ~5% other.

        • lazide 5 hours ago

          Anyone storing them in a DB, or using them in internal fields, will likely have a surprise on their hands. Unless they store them as opaque strings, which is the saner thing to do in these situations anyway.

      • layer8 5 hours ago

        If you expose them as strings, you might as well convert them to unsigned at the conversion point.

  • sheepscreek 25 minutes ago

    This is engineering at its finest. Working within tight constraints to find solutions that minimize impact. An equally important part of the “solution” is communication - to the leadership, departments and customers. Start early, communicate often and you will almost always come out ahead, even if mistakes are made.

  • estimator7292 12 minutes ago

    As my last job was winding down (much to the disbelief and utter denial of the CEO), we'd run out of money for Unity licenses and run out of staff to use Unity. The CEO decided that we absolutely must have a Unity demo that worked with the slightly newer generation of hardware I was wrapping up. Being the only programmer left, it was of course my problem to figure out. Oh, and also this has to be ready for a show next week, so chop-chop.

    I ended up decompiling some android APKs our last Unity dev had built like eight months prior. I figured out how to extract our device driver library, then painstakingly rewrote the entire library to support new hardware while also maintaining a compatible ABI and stuffed it all back into the APK. I think I also had to forge some keys or something? It was a fucking mess. Anyway, that was the last work I ever did for him because he didn't pay me for about two months after that, and I quit the moment he gave me the wages he owed me.

    He's only got one employee and zero customers, but hey his stupid demo worked for all that mattered.

  • zeograd 10 hours ago

    I often see code relying on the increasing property of primary keys (keeping track of processed vs unprocessed rows by the last processed pk only).

    This wrap into the negative domain would wreak havoc for sure.

    • CodesInChaos 7 hours ago

      You generally can't rely on strict monotonicity of primary keys, since the order in which transactions commit isn't necessarily the order in which the ids were generated. But I have relied on primary keys being "monotonic enough" to sort output by creation time for display purposes.

      • OptionOfT 7 hours ago

        I've worked on invoicing software where we had to introduce a public counter that always increments by exactly 1, to ensure there are no gaps between invoices. Not +2, not +5.

        That way you couldn't make them disappear.

        • chiph 4 hours ago

          In the days when you used custom printed forms that came pre-numbered from the printer: when you loaded a new box of paper into your printer, you had to input the first form number into the system so they'd match.

          If you opened boxes in "whatever" order, you'd have invoice numbers that ran contiguously for 150 or so counts (the number of forms in the box), then skipped to the next multiple of 150, corresponding to when the next (or previous!) box had been used.

        • stefs 6 hours ago

          That doesn't have to be the primary key, though, just a serial that counts (and is unique) per customer.

          • OptionOfT 4 hours ago

            This was before the SaaS days.

            On-prem, single company who issued invoices to customers.

            When there was an audit the government could ask to see invoices in a certain range. If some of them were missing, what does that mean? Paid under the table?

            My wife worked at a place where they did manual PDFs, but there they had a tool to change a PDF's properties (creation time / last edit time) for when 'modifications' were needed.

            And this reminds me of the other post here where some people assume cash means shady. Definitely the case there.

    • veyh 9 hours ago

      Well, I'd imagine that before returning the value through their API they could just check whether the number is negative and, if so, add 2^32 to it, which would make it look like an unsigned 32-bit integer.

      • conradfr 3 hours ago

        But isn't that exactly what they were trying not to do, since their problem was the API users and not their internal use?

        • veyh 2 hours ago

          It was definitely a problem with their database, but I suppose it's possible that the customers were also expecting 32-bit signed ints.

      • layer8 5 hours ago

        In most languages that support differently sized integer types and/or unsigned integer types, you wouldn’t have to check, but can just apply the appropriate modulo or bit operation on all values.
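
        For instance, a quick hypothetical Java sketch (not from the article) showing the unconditional mask giving the same result as the add-2^32-if-negative check above:

        ```java
        public class MaskVsBranch {
            public static void main(String[] args) {
                int id = -5;

                // Branching version: add 2^32 only when the value is negative.
                long branched = id < 0 ? id + 4294967296L : id;

                // Branch-free version: masking to the low 32 bits reinterprets the value as unsigned.
                long masked = id & 0xFFFFFFFFL;

                System.out.println(branched + " == " + masked); // 4294967291 == 4294967291
            }
        }
        ```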

  • TheCowboy 11 hours ago

    > No-one really likes engineering war stories

    Is that really true? I did keep reading the entire piece. I think they're often interesting and can contain nuggets of wisdom or insight. Or sometimes they're just funny. When I meet someone who worked on something interesting, I often start trying to pry stories like this post out of them.

    • shermantanktop 10 hours ago

      Everyone likes engineering war stories!!! Never heard of an engineer who didn’t.

    • yk 10 hours ago

      No, but it is an amazing first sentence. Everybody goes: this story is specifically for me, I'm very special.

      • eCa 9 hours ago

        I read the piece (and enjoyed it) despite the first sentence. I’ve become increasingly sensitive to this kind of fluff.

        It’s not a hook, it’s bad read-bait.

        • tclancy 5 hours ago

          Well then you are very special.

          Introverts hate this one weird trick!

        • bobthebuilders 8 hours ago

          Half the time I read these stories they're just a thinly disguised ad for some flavor-of-the-day SaaS, so at least in this instance the hook was somewhat useful. Now if everyone uses this to shill their SaaS, then maybe not.

    • derekcheng08 4 hours ago

      LOL came here to say this exactly. Everyone LOVES war stories in my experience :)

  • Demiurge 6 hours ago

    I don't understand, what was the issue with changing the column type from `int` to `bigint`? What does exposing the IDs have to do with how large those ints can be? This seems like a backend issue, if we're talking about HTTP/REST APIs. Now, if we're talking compiled C-style APIs, then yes, obviously widening the types will cause issues. This is very important context that is missing from this article.

    • icedchai 5 hours ago

      The issue was probably database migration time. I was once at a startup that had over 1 billion rows in MySQL. We were approaching the `int` limit in another year or so. Many tables would need to be migrated due to foreign key constraints. Migrating one of the tables required significant downtime (6 to 8 hours, IIRC) due to slow spinning disks. Some servers didn't have enough space to rebuild the tables, so we'd want to add disks just in case. There were several servers.

      A few "alter table" commands cascade into an operational PITA.

  • sorrythanks 10 hours ago

    maybe i'm too far gone, but this doesn't even feel hacky to me. the key needs to be a unique number, -1 and 1 are two different numbers.

    • slipperybeluga 5 hours ago

      Yeah but how many of those customers were relying on the key not being a negative number?

  • dusted 7 hours ago

    > No-one really likes engineering war stories,

    I love engineering war stories

  • thewisenerd 12 hours ago

    can't wait for solutions of a similar nature around 2038-01-19

    a free 68 more years!

    (hopefully nobody optimized for the one sign bit when allocating memory, tho)

  • IshKebab 8 hours ago

    Hard to believe that all their customers had written their code to work with signed IDs though.

    Honestly I would expect that to break more users' code (and in weirder ways) than just changing the type. It's unclear from the story how the type was exposed, though.

    • gcanyon 7 hours ago

      Yeah, this was my immediate thought as well, but if the spec for the API says signed int, then at least your position is defensible: you haven't broken the letter of the spec, even if you're pounding on the spirit of the spec pretty hard. There's a fairly reasonable likelihood that most or all of your customers have implemented to your spec, and therefore any negative consequences are down to secondary effects of how they handle the negative values, not directly to a failure to store them.

      That said, to your point, there was almost certainly someone comparing IDs to determine recency, and during the transition from large-positive to large-negative, that would absolutely cause havoc.

      I'd be curious whether their API spec actually said anywhere that the IDs increase consistently.

    • hn92726819 5 hours ago

      I'd believe it. Not sure when this was, but if it's a few years old and business software, they could probably assume everyone uses Java, which doesn't even have unsigned integers.

      • swsieber 5 hours ago

        With $MY_JOB in Java, that was my assumption

      • IshKebab 3 hours ago

        Right but just because it's `int id` doesn't mean all code that uses it will still work when it's negative.
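
        For example, a hypothetical client-side sketch (not from the article) that compiles fine against a signed `int id` but quietly misbehaves once the ids wrap negative:

        ```java
        import java.util.List;

        public class NegativeIdPitfalls {
            // Hypothetical helpers for illustration only.

            // Treats negative values as a "not set" sentinel: every wrapped id now looks unset.
            static boolean isPersisted(int id) {
                return id >= 0;
            }

            // Assumes a bigger id means a newer record: breaks once new ids are negative.
            static int newestId(List<Integer> ids) {
                return ids.stream().mapToInt(Integer::intValue).max().orElse(-1);
            }

            public static void main(String[] args) {
                int wrappedId = -2147480000; // a new id after the wrap into the negative range
                System.out.println(isPersisted(wrappedId));                   // false, wrongly
                System.out.println(newestId(List.of(2147483000, wrappedId))); // the old id "wins"
            }
        }
        ```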

    • crazygringo 8 hours ago

      Came here to say exactly this. Programming languages usually default to signed, but if you're storing these things in databases it's common to explicitly choose unsigned, since IDs are virtually always non-negative and it gives you twice the space before you run out.

      Like, instead of using negative primary keys, they could also have just converted to an unsigned int32. I would assume both of those would break a bunch of customer implementations, though.

      • sgarland 8 hours ago

        Postgres doesn’t have unsigned column types out of the box. There’s an extension that enables it, but you’d have to know about that (which you should, if you’re managing a DB, but I digress).

        MySQL does have unsigned ints out of the box, FWIW.

  • 1egg0myegg0 7 hours ago

    Whoever gets that magical -2,147,483,648 is going to be really surprised that things keep working

  • VladVladikoff 7 hours ago

    I wonder how many Unix timestamps are going to wrap around to negative in 2032?

    • OptionOfT 7 hours ago

      None!

      But 2038 is gonna be awesome!

      • 6stringmerc 7 hours ago

        I applaud your enthusiasm for life in the wasteland post US collapse / Balkanization and will meet you at the last functioning terminal in 2042!

  • zmj 2 hours ago

    If you're not doing math with it, it's a string.

  • nikanj 10 hours ago

    I wonder how many API users needed the attribute to be an integer (instead of just treating it as an opaque handle string), but didn't mind the integer turning negative

    • cesaref 9 hours ago

      I think the point is that if the API doesn't specify that the returned integers are positive, or monotonically increasing, then it's fine for the service to return any unique integer.

      If a client application makes an assumption about this, then their engineers will accept this as being their bad and will fix it.

      I'd defend this as being pragmatic - minimising disruption to clients instead of the more 'correct' solution of changing the API. I'm hoping that they managed to roll out the new API update alongside the old one and avoid a 'big bang' API change with this. Sometimes this isn't possible, but it's great when that works out.

      • CodesInChaos 7 hours ago

        I'm far more likely to assume that an integer id I get from an API is non-negative or even positive than to assume that it's always smaller than 2^31. And I'd be far more likely to blame the API provider for violating the former assumption.

    • Kwpolska 8 hours ago

      Probably none needed it to be an integer. At the same time, if the API contract says {id: integer, name: string}, then you are likely to have developers, especially in statically-typed languages, that will create a class with an int32 field, and tell the JSON parsing library to create instances of that class when deserializing the API response.
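
      For instance, a hypothetical Jackson-based Java client (a sketch, not the actual API from the article): a wrapped negative id still deserializes into the int field, but an id that no longer fits in 32 bits makes the parser throw.

      ```java
      import com.fasterxml.jackson.databind.ObjectMapper;

      public class IdContractDemo {
          // Client-side model written against the documented contract {id: integer, name: string}.
          public static class Job {
              public int id;      // fine as long as ids stay within 32 bits
              public String name;
          }

          public static void main(String[] args) throws Exception {
              ObjectMapper mapper = new ObjectMapper();

              // A wrapped, negative id still fits into the int field.
              Job ok = mapper.readValue("{\"id\": -2147480000, \"name\": \"weekly report\"}", Job.class);
              System.out.println(ok.id);

              // Switching the server to 64-bit ids breaks this client: Jackson refuses to
              // coerce a value that no longer fits into int and throws an exception here.
              mapper.readValue("{\"id\": 2147483648, \"name\": \"weekly report\"}", Job.class);
          }
      }
      ```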

  • valicord 8 hours ago

    I don't get it. How would switching to bigint break the existing integrations?

    • masklinn 8 hours ago

      Any call from a typed language distinguishing between 32-bit and 64-bit integers (that being most popular typed languages, I reckon) would break if it had assumed/used the smaller of the two.

      TBF using the negative range could also break callers distinguishing between signed and unsigned, if they'd used the latter on their side, depending on how the API was documented.

    • Kwpolska 8 hours ago

      If the existing code was using int32, a switch to anything larger would cause integer overflows or JSON parsing errors in languages with strongly-typed fixed-width integer types.