Using floating point numbers as hash keys (2017)

(readafterwrite.wordpress.com)

16 points | by jstrieb 8 days ago ago

4 comments

  • baobun 2 days ago

    > it’s hard or impossible to define hash functions that map approximately equal keys to identical hash values. Why? Let’s take a look at what happens with the common epsilon comparison, for example: two numbers a, b are considered equal if |a – b| < ε. With this definition of equality all numbers between -ε/2; and ε/2 are considered equal and therefore must have the same hash value h. But the numbers between 0 and ε are also equal to each other, so their hash value must be h as well.

    > If we continue in this manner we see that all all numbers must have the same hash value h regardless of the choice of ε. The idea of approximate comparison is unfortunately hard to reconcile with the non-approximate nature of hash functions.

    I think there is at least one non-sequitur in there? The only thing you prove is that directly modeling hash keys on top of that notion of equality is not useful / the model is weak. I don't think equality is typically associative with epsilon comparison.

    (The license of this comment forbids it from being used as support in favor of floats as hash keys)

  • Terr_ 2 days ago

    This is calling up dim old memories of Java (which I haven't used in years) and boxing primitive float values into Float objects which provide their own implementation of equals() and hashCode().

    AFAICT the pre-hash values come from this method [0] which returns an integer which is a 1:1 representation of the float bytes except that NaNs are all collapsed to one canonical form. So at least at this phase, it doesn't quantize any virtually-equal floats together.

    Skimming an implementation of HashMap [1], I didn't notice any obvious "do something special for Floats" code.

    [0] https://docs.oracle.com/en/java/javase/17/docs/api/java.base...

    [1] https://github.com/openjdk/jdk/blob/master/src/java.base/sha...

  • anematode 2 days ago

    JavaScript has a whole definition of equality for this particular case called SameValueZero: https://tc39.es/ecma262/multipage/abstract-operations.html#s... Everything is compared bitwise equal (so to speak), except 0 == -0.

        > new Map().set(-0, -0).set(0, 0).set(NaN, NaN)
        Map { 0 → 0, NaN → NaN }
    
    I don't really understand the reason for this instead of Object.is equality (aka SameValue), which would distinguish -0 and 0.
  • advael 2 days ago

    Dang, the author never got to C++, and I was curious. Guess I'll have to dig into that one myself