Traps to Developers

(qouteall.fun)

143 points | by qouteall 10 hours ago

55 comments

  • mdaniel 3 hours ago

    > A method that returns Optional<T> may return null.

    projects that do this drive me bananas

    If I had the emotional energy, I'd open a JEP for a new @java.lang.NonNullReference, such that assigning null to any type annotated with it would be a compiler error

      public interface Alpha {}
      @java.lang.NonNullReference
      public interface Beta {}
    
      Alpha a = null; // ok
      Beta b = null; // compiler error
    
    javac will tolerate this

      Beta b;
      if (ThreadLocalRandom.current().nextBoolean()) {
        b = getBeta();
      } else {
        b = newBeta();
      }
    
    but I would need to squint at the language specification to see if dead code elimination is a nicety or a formality

      Beta b;
      if (true) {
        b = getBeta();
      } else {
        b = null; // I believe this will be elided and thus technically legal
      }
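    Until something like that exists, one defensive pattern is to normalize a possibly-null Optional at the call boundary. A minimal Java sketch, assuming Java 9+ for Objects.requireNonNullElse; findNickname and safeFind are hypothetical names standing in for a badly behaved API and its wrapper:

```java
import java.util.Objects;
import java.util.Optional;

public class OptionalGuard {
    // Hypothetical badly behaved API: declared to return Optional<String>,
    // but returns null instead of Optional.empty() on a miss.
    static Optional<String> findNickname(String user) {
        return "alice".equals(user) ? Optional.of("al") : null;
    }

    // Wrapper that turns a null Optional into Optional.empty() before anyone uses it.
    static Optional<String> safeFind(String user) {
        return Objects.requireNonNullElse(findNickname(user), Optional.empty());
    }

    public static void main(String[] args) {
        System.out.println(safeFind("alice").orElse("<none>")); // al
        System.out.println(safeFind("bob").orElse("<none>"));   // <none>
    }
}
```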
    • Spivak 3 hours ago

      I question the wisdom of even having Optional<T> in a language with nulls. It would raise some eyebrows if a function in Python returned an Optional type object rather than T | None. You have to do a check either way unless you're doing some cute monad-y stuff.

      • crooked-v 2 hours ago

        There's a lot of quality-of-life stuff enabled by it in Java, since the base language's equivalents to Optional.empty(), Optional.ofNullable(...).orElse(...), etc are painfully verbose by comparison.

      • singron 2 hours ago

        Maybe this is cute monady stuff, but there isn't an equivalent to Optional<Optional<T>> with only null/None. You usually don't directly write that, but you might incidentally instantiate that type when composing generic code, or a container/function won't allow nulls.
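        A small Java sketch of how Optional<Optional<T>> shows up incidentally in generic code. The lookup helper and the middle-name map are hypothetical, and the helper assumes the map holds no null values. The outer Optional says whether the key exists; the inner one says whether a value is known. With plain null, both cases would collapse to the same null:

```java
import java.util.Map;
import java.util.Optional;

public class NestedOptional {
    // Generic lookup: an empty result means "key absent" (assumes no null values in the map).
    static <K, V> Optional<V> lookup(Map<K, V> map, K key) {
        return Optional.ofNullable(map.get(key));
    }

    public static void main(String[] args) {
        // The value type is itself Optional: empty means "known to have no middle name".
        Map<String, Optional<String>> middleNames =
                Map.of("alice", Optional.empty(), "bob", Optional.of("J"));

        Optional<Optional<String>> carol = lookup(middleNames, "carol"); // key not present
        Optional<Optional<String>> alice = lookup(middleNames, "alice"); // present, no value

        System.out.println(carol.isPresent());                          // false
        System.out.println(alice.isPresent() && alice.get().isEmpty()); // true
    }
}
```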

  • Someone 5 hours ago

    > Java, C# and JS use UTF-16-like encoding for in-memory string

    That’s incorrect for Java, possibly also for C# and JS.

    In any language where strings are opaque enough types [1], the in-memory representation is an implementation detail. Java has been such a language since release 9 (https://openjdk.org/jeps/254)

    [1] The ‘enough’ is because some languages have fully opaque types, but specify the efficiency of some operations and, through that, effectively prescribe implementation details. Having a foreign function interface also often means implementation details cannot be changed because doing that would break backwards compatibility.
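    Even with compact strings (JEP 254) changing how the JDK stores text internally, the String API itself still counts in UTF-16 code units, which is what most code actually observes. A minimal sketch of that observable behaviour (it says nothing about the internal storage):

```java
import java.nio.charset.StandardCharsets;

public class Utf16Units {
    public static void main(String[] args) {
        // "a" followed by U+1F600 (grinning face), written as its UTF-16 surrogate pair.
        String s = "a\uD83D\uDE00";
        System.out.println(s.length());                                // 3 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));           // 2 code points
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 5 bytes as UTF-8
    }
}
```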

    > JS use floating point for all numbers. The max accurate integer is 2⁵³−1

    That is incorrect. Much larger integers can be represented exactly, for example 2¹⁰⁰.

    What is true is that 2⁵³−1 is the largest integer n such that n-1, n, and n+1 can be represented exactly in an IEEE double. That, in turn, means n == n-1 and n == n+1 both will evaluate to false, as expected in ‘normal’ arithmetic.
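    Java's double is the same IEEE 754 binary64 type as a JS Number, so a short Java sketch can check both statements. The local name maxSafe simply mirrors JS Number.MAX_SAFE_INTEGER:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DoubleIntegers {
    public static void main(String[] args) {
        double maxSafe = 9007199254740991.0; // 2^53 - 1
        System.out.println(maxSafe + 1 != maxSafe);     // true: 2^53 is exact
        System.out.println(maxSafe + 2 == maxSafe + 1); // true: 2^53 + 1 rounds to 2^53

        // 2^100 is far above 2^53 but still exactly representable
        // (Math.pow is exact when the mathematical result fits in a double).
        double big = Math.pow(2, 100);
        System.out.println(new BigDecimal(big).toBigInteger().equals(BigInteger.TWO.pow(100))); // true
    }
}
```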

    • debugnik 5 hours ago

      > possibly also for C# and JS

      The representation for C# is very much fixed, as it allows, and very commonly uses, direct access into the string buffer as a ReadOnlySpan<char> or a raw char pointer, where char is the type of UTF-16 code units.

      JS could maybe get away with it.

      • hinkley 2 hours ago

        When you have code that works a lot with strings, the overhead of building an app on ISO Latin-1 text but storing it as UTF-16 can be substantial.

        I think Java moved away from this back around 8, or possibly 9.

    • seangrogg 2 hours ago

      Yeah, I think they didn't mean max "accurate" integer and rather meant max "safe" integer.

    • mikojan an hour ago

      > > Java, C# and JS use UTF-16-like encoding for in-memory string

      >

      > That’s incorrect for Java,

      Maybe so, technically, but if you Base64-encode a string in a language that uses UTF-8 (or UTF-16 with the other byte order) and decode it in Java, Java's UTF-16 representation will be the problem you will be dealing with.
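      A minimal Java sketch of that round trip. Base64 decoding yields bytes, and those bytes become a String only through the charset passed to new String(...), so that charset decides whether the text survives. The decode helper is a hypothetical name for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Charset {
    // Decode Base64 text produced elsewhere, interpreting the bytes with the given charset.
    static String decode(String b64, Charset cs) {
        return new String(Base64.getDecoder().decode(b64), cs);
    }

    public static void main(String[] args) {
        // Simulate the other side: "é" (U+00E9) encoded as UTF-8, then Base64.
        String b64 = Base64.getEncoder().encodeToString("\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(b64); // w6k=
        System.out.println(decode(b64, StandardCharsets.UTF_8).equals("\u00e9"));  // true
        // Read as UTF-16, the two bytes C3 A9 become one unrelated code unit.
        System.out.println(decode(b64, StandardCharsets.UTF_16).equals("\u00e9")); // false
    }
}
```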

    • scarface_74 4 hours ago

      I started to say something about C# strings and then I remembered the clusterfuck when it came to Windows development and strings and depending on which API you call, a string is represented by one of a dozen different ways.

      https://stackoverflow.com/questions/689211/interop-sending-s...

  • OptionOfT 3 hours ago

    > Some routers and firewalls silently kill idle TCP connections without telling the application. Some code (like HTTP client libraries and database clients) keeps a pool of TCP connections for reuse, which can be silently invalidated. To solve it you can configure system TCP keepalive. For HTTP you can use the Connection: keep-alive and Keep-Alive: timeout=30, max=1000 headers.

    Once a TCP connection has been established there is no state on routers in between the 2 ends of the connection. The issue here is firewalls / NAT entries timing out. And indeed, no RSTs are sent.

    We had the issue in K8s with the conntrack module set too low.

    Now, you can try to put in an HTTP Keep-Alive, but that will not help you. The HTTP Keep-Alive is merely for connection re-use at the HTTP level, i.e. it doesn't close the connection: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

    An HTTP Keep-Alive does not generate any packets; it merely postpones the close.

    A TCP Keep-Alive generates packets, which reset the timers.
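    A minimal Java sketch of turning on TCP keepalive for a socket, assuming JDK 11+ for the jdk.net.ExtendedSocketOptions timing knobs (guarded, since not every platform supports them). The 30/10/3 values are arbitrary assumptions; the idle time has to be shorter than the idle timeout of whatever firewall, NAT, or conntrack table sits in the path:

```java
import java.io.IOException;
import java.net.Socket;
import jdk.net.ExtendedSocketOptions;

public class KeepAlive {
    // Enable kernel-level TCP keepalive probes and tune their timing where supported.
    static Socket enableKeepAlive(Socket socket) throws IOException {
        socket.setKeepAlive(true); // SO_KEEPALIVE: the kernel sends probes on an idle connection
        if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 30);     // idle seconds before first probe
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 10); // seconds between probes
            socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, 3);     // unanswered probes before reset
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket s = enableKeepAlive(new Socket())) {
            System.out.println(s.getKeepAlive()); // true
        }
    }
}
```

    Unlike the HTTP header, these probes put real packets on the wire, which is what keeps middlebox state from expiring.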

  • andunie 8 hours ago

    That's a nice compendium of tips and useful information.

    I wonder if anyone can learn from this. I feel like I only understood what I already knew, or at least was very close to knowing. That's the same thing that happens with teaching manuals about any topic: they're organized in a way that makes sense and it's easy for people who already know the topics, but often very bad at teaching the same topics to an audience that doesn't know anything.

    • skydhash 8 hours ago

      > with teaching manuals about any topic: they're organized in a way that makes sense and it's easy for people who already know the topic

      I think that's the reason manuals exist: to have a written record so we don't have to trust our memory. This is what most Unix manuals are. You already know what the software can do; you just need to remember the specifics of how to get something done.

      > often very bad at teaching the same topics to an audience that doesn't know anything.

      What you need then is a tutorial (beginner seeking to learn) or a guide (beginner/intermediate seeking to do). Manuals in this case only help you ask better questions (now you know what you don't know).

  • jmull 5 hours ago

    This looks less like a list of traps and more like a list of things the author has learned.

    Much of it would only apply in certain relatively narrow contexts, but the contexts aren't necessarily mentioned.

    Some of it appears to be just wrong.

    I guess I'm saying: I would not take this literally, but as something almost like a stream-of-consciousness.

  • nayuki 2 hours ago

    Largely a good listicle. Some feedback:

    > Unicode unification. Different characters in different languages use the same code point. Different languages' font variants render the same code point differently. 語

    This isn't a trap. The given example character means the same thing in Chinese and Japanese, and the Japanese version was imported from China. People from both languages recognize both font variants as the same conceptual character.

    The author is making it sound like the letter 'A' in English should have a different code point than an 'A' in French. Or that a lowercase 'a' with the top tail should be a different character than a lowercase 'a' without the top tail.

    Anyway, this is discussed at length in https://en.wikipedia.org/wiki/Han_unification

    > There is a negative zero -0.0 which is different from normal zero. The negative zero equals zero when using floating point comparison. Normal zero is treated as "positive zero".

    And there are two ways to distinguish negative zero from normal zero: By their integer bit patterns, or by the fact that 1.0/-0.0 == -Inf vs. 1.0/0.0 == +Inf.
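    A short Java sketch of both checks. Note that boxed Double.equals and Double.compare also tell the two zeros apart, which matters for hash maps and sorting even though == says they are equal:

```java
public class NegativeZero {
    public static void main(String[] args) {
        double nz = -0.0, pz = 0.0;
        System.out.println(nz == pz);       // true: IEEE comparison
        System.out.println(1.0 / nz);       // -Infinity
        System.out.println(1.0 / pz);       // Infinity
        // The bit patterns differ only in the sign bit.
        System.out.println(Double.doubleToRawLongBits(nz) != Double.doubleToRawLongBits(pz)); // true
        System.out.println(Double.valueOf(nz).equals(pz)); // false: equals compares bits
        System.out.println(Double.compare(nz, pz));        // -1: -0.0 orders before 0.0
    }
}
```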

    > It's recommended to configure the server's time zone as UTC.

    Big yes. I use UTC for servers, logs, photos, and anything that is worth archiving and timestamping properly. Local time is only for colloquial use.
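    A minimal java.time sketch of that practice: keep and log Instants (always UTC, ISO-8601 with a trailing Z) and convert to a local zone only for display. The timestamp and zone are arbitrary examples:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class UtcTimestamps {
    public static void main(String[] args) {
        // Store and log the instant in UTC...
        Instant stored = Instant.parse("2024-03-10T12:00:00Z");
        System.out.println(stored); // 2024-03-10T12:00:00Z

        // ...and convert only when presenting it to a person.
        ZonedDateTime ny = stored.atZone(ZoneId.of("America/New_York"));
        System.out.println(ny.getHour()); // 8: US daylight saving time began earlier that day
    }
}
```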

    > For integers, (low + high) / 2 may overflow. A safer way is low + (high - low) / 2

    Yes, but if low and high could be negative numbers, then you've just shifted the overflow to a different range. This matters for general binary search over an integer range, as opposed to unsigned binary search over an array.
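    A Java sketch of the three midpoint formulas on 32-bit ints. The last one is a well-known bit trick rather than anything from the article: since a + b == 2*(a & b) + (a ^ b), the expression (a & b) + ((a ^ b) >> 1) gives floor((a + b) / 2) for any pair of ints without overflowing. Widening to long is the other easy option:

```java
public class Midpoint {
    // Overflows when low + high exceeds Integer.MAX_VALUE.
    static int naive(int low, int high) { return (low + high) / 2; }

    // Safe for non-negative low <= high (array indices), but high - low
    // can overflow when low is negative.
    static int offset(int low, int high) { return low + (high - low) / 2; }

    // floor((low + high) / 2) for any ints, using a + b == 2*(a & b) + (a ^ b).
    static int floorMid(int low, int high) { return (low & high) + ((low ^ high) >> 1); }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(naive(1, max));     // -1073741824: low + high wrapped
        System.out.println(offset(1, max));    // 1073741824
        System.out.println(offset(-2, max));   // -1073741825: high - low wrapped
        System.out.println(floorMid(-2, max)); // 1073741822
        System.out.println(Math.floorDiv((long) -2 + max, 2L)); // 1073741822, via long
    }
}
```

    floorMid rounds toward negative infinity while integer division rounds toward zero, so the two differ by one for negative odd sums; a binary search only needs to use one of them consistently.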

    > C/C++

    I'm going to throw in one of my lists of pitfalls - just using integer types and arithmetic correctly in C/C++ is a massive developer trap. That's like the most basic thing in programming. https://www.nayuki.io/page/summary-of-c-cpp-integer-rules

    > Rebase can rewrite history

    "Can" is a weasel word; rebase does nothing but rewrite history.

  • skobes 8 hours ago

    The first "trap" on the page says "min-width: auto makes min width determined by content", but this is false outside of flex/grid.

    From MDN: "For block boxes, inline boxes, inline blocks, and all table layout boxes auto resolves to 0."

    https://developer.mozilla.org/en-US/docs/Web/CSS/min-width

    • jfengel 6 hours ago

      CSS cascade for text properties more or less makes sense.

      I have been unable to comprehend CSS layout from any perspective: page designer, implementer, user, anything. It must have someone in mind but I have no idea who that is.

      • chrisweekly 6 hours ago

        https://every-layout.dev has by far the best explanations and coherent usage of CSS I've encountered since I started doing webdev for a living in 1998.

        • lemonberry 6 minutes ago

          Every Layout changed how I look at and do CSS. Great resource with a good philosophy behind it: CubeCSS. It really made CSS fun for me again.

      • skobes 6 hours ago

        Layout is more bazaar than cathedral. It has had many ideas mixed in by different contributors over decades.

    • diggan 8 hours ago

      I guess the first trap should really be: "You cannot read any CSS property in isolation: just as the name implies, defaults and the values that end up applying cascade through all the rules your document uses"

  • FFFXXX 7 hours ago

    The part about C# volatile accesses using release-acquire ordering seems to be wrong if I read the C# docs correctly.

    "There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution"

    https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

    • charleslmunger 7 hours ago

      >A volatile write operation prevents earlier memory operations on the thread from being reordered to occur after the volatile write. A volatile read operation prevents later memory operations on the thread from being reordered to occur before the volatile read

      Looks like release/acquire to me? A total ordering would be sequential consistency.

      • FFFXXX 6 hours ago

        I think you are quoting from https://learn.microsoft.com/en-us/dotnet/api/system.threadin...

        "In C#, using the volatile modifier on a field guarantees that every access to that field is a volatile memory operation"

        This makes it sound like you are right and the volatile keyword has the same behaviour as the Volatile class which explicitly says it has acquire-release ordering.

        But that seems to contradict "The volatile keyword doesn't provide atomicity for operations other than assignment, doesn't prevent race conditions, and doesn't provide ordering guarantees for other memory operations." from the volatile keyword documentation?

        • charleslmunger 2 hours ago

          I too interpret those docs as contradictory, and I wonder if, like how Java 5 strengthened volatile semantics, this happened at some point in C# too and the docs weren't updated? Either way the specification, which the docs say is definitive, says it's acquire/release.

          https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

          "When a field_declaration includes a volatile modifier, the fields introduced by that declaration are volatile fields. [...] For volatile fields, such reordering optimizations are restricted:

              A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
          
              A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence."
        • judofyr 3 hours ago

          Acquire-release ordering provides ordering guarantees for all memory operations. If an acquire observes a release, the thread is also guaranteed to see all the previous writes done by the other thread, regardless of the atomicity of those writes. (There still can't be any other data races though.)

          This volatile keyword appears to only consider that specific memory location, whereas the Volatile class seems to implement acquire-release.
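          For comparison, a Java sketch of the release/acquire message-passing guarantee described above, using AtomicBoolean.setRelease/getAcquire (Java 9+). Java's own volatile is stronger (sequentially consistent), so these methods are the closer analogue to the acquire/release semantics in the C# spec. The class and field names are illustrative:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ReleaseAcquire {
    private int payload;                                     // plain, non-volatile field
    private final AtomicBoolean ready = new AtomicBoolean(); // publication flag

    // Runs one producer/consumer hand-off and returns the payload the consumer saw.
    static int handOff() throws InterruptedException {
        ReleaseAcquire box = new ReleaseAcquire();
        int[] seen = new int[1];
        Thread consumer = new Thread(() -> {
            while (!box.ready.getAcquire()) { // acquire load
                Thread.onSpinWait();
            }
            // Having observed the release store, this read must see the earlier write of 42.
            seen[0] = box.payload;
        });
        Thread producer = new Thread(() -> {
            box.payload = 42;           // ordinary write...
            box.ready.setRelease(true); // ...published by the release store
        });
        consumer.start();
        producer.start();
        producer.join();
        consumer.join(); // join makes seen[0] visible to the calling thread
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handOff()); // 42
    }
}
```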

    • dataflow 6 hours ago

      Somewhat off topic, but what is a realistic example of where you need atomics with sequential consistency? Like, what useful data structure or pattern requires it? I feel like I've seen every other ordering except that one (and consume) in real world code.

      • judofyr 3 hours ago

        A mutex would be the most trivial example. I don't believe that is possible to implement, in the general case, with only acquire-release.

        Sequential consistency mostly becomes relevant when you have more than two threads interacting with both reads and writes. However, if you only have a single consumer (i.e. only one thread reading) or a single producer (i.e. only one thread writing), then acquire-release semantics end up being sequential, since the single consumer/producer implicitly enforces a sequential ordering. I can potentially see some multi-producer multi-consumer lock-free queues needing sequentially consistent atomics.

        I think it's rare to see atomics with sequential consistency in practice since you typically either choose (1) a mutex to simplify the code at the expense of locking or (2) acquire-release (or weaker) to minimize the synchronization.
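        A textbook case where acquire-release is not enough is Peterson's two-thread lock. Each thread stores to its own flag and then loads the other thread's flag, and acquire/release does not forbid that store-then-load pair from being reordered, so both threads could enter at once. Java volatile accesses are sequentially consistent, which is why this sketch works. Production mutexes usually rely on atomic read-modify-write (compare-and-set) instead. The contend helper is illustrative:

```java
public class PetersonLock {
    // Sequentially consistent (Java volatile) flags and turn variable.
    private volatile boolean flag0, flag1;
    private volatile int turn;

    void lock(int id) {
        if (id == 0) {
            flag0 = true;                              // store my flag...
            turn = 1;
            while (flag1 && turn == 1) Thread.yield(); // ...then load the other's flag
        } else {
            flag1 = true;
            turn = 0;
            while (flag0 && turn == 0) Thread.yield();
        }
    }

    void unlock(int id) {
        if (id == 0) flag0 = false; else flag1 = false;
    }

    // Two threads each increment a plain counter under the lock; returns the final count.
    static int contend(int iterations) throws InterruptedException {
        PetersonLock lock = new PetersonLock();
        int[] counter = new int[1];
        Thread[] threads = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock(id);
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock(id);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        return counter[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(contend(10_000)); // 20000
    }
}
```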

  • ngruhn 8 hours ago

    A recent trap for me:

    Regex semantics is subtly different across languages. E.g. a{,3} matches between 0 and 3 "a" characters in Python. In JavaScript it matches the literal string "a{,3}".
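    Java adds a third behaviour: as far as I know, java.util.regex requires a digit right after `{` in a quantifier, so `a{,3}` fails to compile ("Illegal repetition") instead of matching anything. A minimal sketch; spelling the lower bound out as `a{0,3}` behaves the same in Python, JavaScript and Java. The compiles helper is a hypothetical name:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BoundedRepeat {
    // Reports whether java.util.regex accepts the pattern at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("a{,3}"));                 // false: rejected by java.util.regex
        System.out.println(Pattern.matches("a{0,3}", "aaa"));  // true: explicit lower bound
        System.out.println(Pattern.matches("a{0,3}", "aaaa")); // false: more than 3
    }
}
```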

    • skydhash 8 hours ago

      Regex is more a technique than an actual specification. It would be best to find the time to go and read an introductory book about Theory of Computation where they explain the underlying mechanism.

      • ryandv 7 hours ago

        > Theory of Computation

        Computer science? Seriously? What a fucking waste of time. Better just take a bootcamp and get the LLM to write your regexes for you. Cut four years into eight weeks.

        Time to get with the times, gramps. The singularity is near.

        • skydhash 7 hours ago

          It's half a chapter in most books I know. Or a subset of this 1-hour MIT video [0], though the instructor also explains finite automata, which are the basic mechanism that does all the work.

          [0]: https://www.youtube.com/watch?v=9syvZr-9xwk

        • jraph 7 hours ago

          I'll assume sarcasm (from your comment history), but for people who actually take this at face value: good luck debugging an incorrect regex if you haven't practiced regexes. Especially if it was generated by an LLM.

    • danhau 8 hours ago

      I always use regex101 to develop my regexes. It allows you to switch between different engines.

    • PhilipRoman 8 hours ago

      Honorable mention to [a-z], gotta be my favorite trap

      • dataflow 6 hours ago

        What's the trap for this one? I can't think of any engine that parses this to mean anything other than the letters a through z.

        • PhilipRoman 5 hours ago

          In some common implementations, if $LANG is set to certain values, it will fail to match some ASCII letters. This is because not all languages that use the Latin alphabet put Z last in the alphabet.

          Try this (you probably need to enable and generate the locale first)

              echo y | LANG=lv_LV.UTF-8 grep '[a-z]'
          
          Locales in general should be considered a "trap", just look at Windows CSV separator handling, etc.
          • 1718627440 21 minutes ago

            Not locales in general, but using locales for anything other than presentation.

        • dpkirchner 6 hours ago

          It depends on its use, ultimately, but if your goal is to find a string of letters (a common use IMO), you'll want to use something like \p{L} to ensure you don't miss non-ASCII characters.

          eta: fixed regex, I had typed \L, shared from my faulty memory.

        • accoil 5 hours ago

          [A-z] is a fun one though, as it includes a few extra symbols between the uppercase and lowercase letters.

          • 1718627440 19 minutes ago

            Does it? I thought regexes are defined on character classes, not on numeric ASCII values. What would a regex do with a different encoding then?
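      For java.util.regex at least, ranges compare code points, so `[A-z]` really does include the six ASCII symbols between 'Z' (0x5A) and 'a' (0x61): [ \ ] ^ _ and the backtick. It also does not apply locale collation the way the grep example above does. A short sketch, with `\p{L}` as the Unicode-aware alternative mentioned earlier:

```java
import java.util.regex.Pattern;

public class CharRanges {
    public static void main(String[] args) {
        // Ranges are by code point: '_' is 0x5F, which lies between 'A' (0x41) and 'z' (0x7A).
        System.out.println(Pattern.matches("[A-z]", "_"));    // true
        System.out.println(Pattern.matches("[A-Za-z]", "_")); // false

        // [a-z] is ASCII-only; \p{L} matches any Unicode letter, e.g. "é" (U+00E9).
        System.out.println(Pattern.matches("[a-z]", "\u00e9"));  // false
        System.out.println(Pattern.matches("\\p{L}", "\u00e9")); // true
    }
}
```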

  • koromak 4 hours ago

    Does anyone truly understand all the little edge cases with CSS?

    I've written tons and tons of CSS, and have done so for a decade. I don't sit and think about the exact interactions; I just know a couple of things that might work if I'm getting something unexpected.

    I don't really see how it's possible to commit that to memory, unless I literally start working on an interpreter myself.

    • yurishimo 3 hours ago

      I think there can be a different way to think about CSS that can help with that feeling of never understanding it all. Recently I’ve heard people influential in the CSS world describe it as a “suggestion” to the browser. The browser has its own styles, the user might have some custom stylesheet on top of the browser’s version, extensions, etc etc and at some point CSS is really more a long list of “suggestions” about how the site should look.

      If you embrace that idea to the fullest, you can create some interesting designs/patterns that can be more resilient. The “downside” is that this way of writing CSS will likely make the pixel-perfect head of the marketing department hate you unless they also write code.

      I think it’s also okay to say that some ways of writing CSS just aren’t relevant anymore. A good parallel that comes to mind is building construction and general carpentry. These days, a quick 2x4 stud wall or insulated concrete forms are fast, cheap, and standardized around the world. However, many craftspeople still exist who will create beautiful joinery for what is ultimately a simple thing, and we can appreciate that art on its own. With CSS, I don’t suspect we will ever need to go back to floats or crazy background images or whatever, but it’s nice that those tools are still there, not only for the sake of backward compatibility but also as a way to tinker and “craft” something bespoke for a special project or just because you like it. Education will eventually catch up, and grid and flexbox will keep gaining popularity until we decide that it’s too complicated and come up with some new algorithm. That can all be true, though, and you can bring value as a developer without knowing every single aspect of the public API.

      • 1718627440 22 minutes ago

        But you need to, you know, actually float something in a text. I think to do it with flexbox/grid you need JS that calculates heights and then manually splits the text into boxes with those heights, so essentially you are doing the rendering yourself.

        Also is there another way to position boxes side-by-side in an inline context without float?

  • upghost 3 hours ago

    > Unset variables. If DIR is unset, rm -rf $DIR/ becomes rm -rf /. Using set -u can make bash error when encountering unset variable.

    sweet mercy :O

    Someone call the Inquisition

    • AnimalMuppet 2 hours ago

      Instead, say

        rm -rf $DIR
      
      That is, skip the trailing slash. Then if $DIR is not set, rm receives no file operands at all, so nothing is deleted.
      • Terr_ an hour ago

        Better to make the requirement explicit, instead of relying on the argument-parsing details of rm or some other command:

            # Default message
            $ rm -rf "${DIR:?}"
            bash: DIR: parameter null or not set
        
            # Custom message
            $ rm -rf "${DIR:?It is not set OMG}"
            bash: DIR: It is not set OMG
  • QuadmasterXLII 8 hours ago

    CSS and C++ both have the “pick a subset and enforce that, or suffer” nature. On my to-do list: make a github action that requires manual override to merge any pull request with a css attribute not already present

    • dschuessler 3 hours ago

      I am unsure how this is supposed to work for CSS. To my knowledge, most CSS properties cannot be substituted for each other. If the subset to be enforced is "CSS properties already present", what is a developer supposed to do if their CSS property is not already present? Change the design?

      • QuadmasterXLII 3 hours ago

        Well, (like C++) new css attributes are constantly added. This means you constantly have to choose between the old way or the new way: either is fine, but “pick old or new at random on a per pull request basis” isn’t.

        • dschuessler 3 hours ago

          You seem to assume that old CSS properties can be substituted for new ones. But as I said, to my knowledge this isn’t possible in most cases. Can you give an example of two CSS properties where 'either is fine, but only one should be used'?

          Or do you mean something else altogether by 'CSS attributes'?

          • QuadmasterXLII 2 hours ago

            The specific case that inspired this comment was a random mix of margin and gap

  • bradfitz 5 hours ago

    > Golang use UTF-8 for in-memory string.

    Nope. It’s just bytes with no encoding.

    https://go.dev/blog/strings

    • ivanjermakov 4 hours ago

      There is no such thing as "just bytes" when it comes to Unicode. UTF-8 is a way to represent Unicode codepoints in binary.

      But I agree that the author's statement is wrong. Go strings are equivalent to byte slices.