Show HN: Super Simple CRC32 Implementation

(github.com)

14 points | by dch82 7 days ago

16 comments

  • rurban 3 days ago

    A normal table-based crc32. Since we don't see the generator, we cannot see the polynomial. You usually put the generator in the code.
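
    For reference, the reflected forms of the two most common generator polynomials, spelled out as constants (the names here are just illustrative):

        #define CRC32_POLY_REFLECTED  0xEDB88320u  /* CRC-32 as used by zlib, gzip, PNG */
        #define CRC32C_POLY_REFLECTED 0x82F63B78u  /* CRC-32C (Castagnoli) */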

  • jart 3 days ago

    If you don't want a hard coded table, you could always do this.

        unsigned kCastagnoli[256];

        /* Build the 256-entry lookup table for a reflected CRC with the given
           (reflected) polynomial. */
        void InitializeCrc32(unsigned table[256], unsigned polynomial) {
          unsigned d, i, r;
          for (d = 0; d < 256; ++d) {
            r = d;
            for (i = 0; i < 8; ++i)
              r = r >> 1 ^ (r & 1 ? polynomial : 0);
            table[d] = r;
          }
        }

        /* Fold the low n bytes of w into the running CRC h, one byte at a time. */
        unsigned Castagnoli(unsigned h, unsigned long w, long n) {
          long i;
          static int once;
          if (!once) {
            InitializeCrc32(kCastagnoli, 0x82f63b78);
            once = 1;
          }
          for (i = 0; i < n; ++i) {
            h = h >> 8 ^ kCastagnoli[(h & 255) ^ (w & 255)];
            w >>= 8;
          }
          return h;
        }
    
    That does the same thing as the crc32 instructions on x86.
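
    For comparison, a minimal sketch of the hardware path via the SSE4.2 intrinsics (the helper name is mine; build with -msse4.2 or equivalent):

        #include <nmmintrin.h>
        #include <stddef.h>

        /* CRC-32C over a byte buffer using the crc32 instruction, one byte per step. */
        unsigned CastagnoliHw(unsigned h, const unsigned char *p, size_t n) {
          size_t i;
          for (i = 0; i < n; ++i)
            h = _mm_crc32_u8(h, p[i]);
          return h;
        }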

    • epcoa 3 days ago

      I’d probably just go ahead and use uint32_t in both the original implementation and this one: the submitter’s because you might as well be more cache friendly on 64-bit systems, and this one because unsigned only guarantees 16 bits. People still use AVRs and the like where sizeof(unsigned) == 2. Granted, you’d probably use a 16-bit table there, but the portability costs nothing and gains clarity.

      This version has the caveat that it is technically not thread safe.
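
      A minimal sketch of that change, assuming <stdint.h> is available (same logic as the code above, just with fixed-width types):

          #include <stdint.h>

          static uint32_t kCastagnoli[256];

          /* Identical table construction; the element type is exactly 32 bits
             wide on any platform that provides uint32_t. */
          static void InitializeCrc32(uint32_t table[256], uint32_t polynomial) {
            uint32_t d, r;
            int i;
            for (d = 0; d < 256; ++d) {
              r = d;
              for (i = 0; i < 8; ++i)
                r = (r >> 1) ^ ((r & 1) ? polynomial : 0);
              table[d] = r;
            }
          }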

      • jart 3 days ago

        What if you've got a Symbolics 3600, which has 36-bit words?

        Your suggestion will break on that.

        We clearly want uint_least32_t.

        • epcoa 2 days ago

          Fair enough (though a Unisys 2200 might be a stronger example; people still use them, but neither supports C99 AFAIK). But as for “break”: it's only fair to point out that if such a compiler even existed, the code would definitely fail with an error at compile time because uint32_t doesn't exist; it's not going to compile and then overflow at runtime. (And yes, in practice you will get a warning with the pasted code, and shame on anyone for not using -Werror, but pedantically even that isn't guaranteed, sadly enough.) They are completely different classes of portability defects, and only one is a correctness issue.

          Setting aside whether there is even a C99 compiler for anything Symbolics, or for anything else where uint32_t doesn't exist (already assuming C99 or above here): is there one? Meanwhile, there are a few existing, supported gcc targets where sizeof(int) == 2.

          Pick what level of pragmatism you prefer I guess. I prefer avoiding the standard integer types unless I can comfortably fit in their guaranteed sizes. I don’t really care about obsolete machines that will never have a C99 compiler outside a novelty.

          • jart 2 days ago

            > I don’t really care about obsolete machines that will never have a C99 compiler outside a novelty.

            And you know what? I don't care about people who use C on 16-bit computers, unless I'm being paid to.

            I've written plenty of assembly they can consume, like https://justine.lol/sectorlisp2/

            • epcoa 2 days ago

              Not sure why you're so triggered. I didn't even say you were wrong or anything. I originally thought the more pressing issue was filling half a table up with 0s. It's some idle commentary on a piece of code in a comment, relax?

  • Someone 7 days ago

    I suggest you write a few test cases to verify that it works reliably. I have my doubts about that, given this code fragment:

        // stop hashing when there is nothing left to hash
        while (scanf("%d", &ch) != EOF) {
            // get the current byte
            ch = getchar();

    • dch82 7 days ago

      I replaced that code with

        // loop until an EOF is read
        while ((ch = getchar()), ch != EOF) {
      
      Would this be better?
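
      For reference, the conventional idiom folds the assignment into the test; ch needs to be an int (not a char) so EOF stays distinguishable from every valid byte value:

        int ch;
        // getchar() returns an int: a byte value 0..255, or EOF
        while ((ch = getchar()) != EOF) {
            // ... update the CRC with the byte in ch ...
        }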

    • dch82 7 days ago

      How would you suggest I replace that? I am a C novice but I have heard of the buffer overflows caused by scanf.

  • palsecam 3 days ago

    FTR, an implementation in “low-level” JavaScript:

      /** Precomputed CRC-32 lookup table for half-bytes (aka “nibbles”).
       * Trade more compute time for less memory and less code to transmit.
       * @see https://create.stephan-brumme.com/crc32/#half-byte
       */
      const CRC32_NIBBLE_TABLE = new Uint32Array([
                 0, 0x1DB71064, 0x3B6E20C8, 0x26D930AC, 0x76DC4190, 0x6B6B51F4, 0x4DB26158, 0x5005713C,
        0xEDB88320, 0xF00F9344, 0xD6D6A3E8, 0xCB61B38C, 0x9B64C2B0, 0x86D3D2D4, 0xA00AE278, 0xBDBDF21C
      ]);
    
      /** @return {number} CRC-32 (polynomial 0x04C11DB7) of the input data.
       * @param {!BufferSource} data  The input data.
       * @param {number=} previousValue  The previous CRC value, if resuming a computation.
       * @see https://en.wikipedia.org/wiki/Cyclic_redundancy_check
       */
      function crc32(data, previousValue = 0) {
        const bytes = ArrayBuffer.isView(data)
          ? new Uint8Array(data.buffer, data.byteOffset, data.byteLength)
          : new Uint8Array(data);
        let crc = ~previousValue;
    
        for (let i = 0; i < bytes.length; i++) {
          crc = (crc >>> 4) ^ CRC32_NIBBLE_TABLE[0x0f & (crc ^ bytes[i])];
          crc = (crc >>> 4) ^ CRC32_NIBBLE_TABLE[0x0f & (crc ^ (bytes[i] >> 4))];
        }
        return ~crc;
      }
    
    From https://GitHub.com/PaulCapron/pwa2uwp/blob/master/src/zip.js

  • fargle 3 days ago

    this simply matches the `crc_reflected()` in section 18 of "A Painless Guide to CRC Error Detection Algorithms" [1] with hardcoded init and final XORs to match one specific CRC-32 spec. nothing wrong with that, but it isn't original at all either.

    i recommend the "painless guide" for anyone constructing CRC algorithms in software. it breaks down the entire algorithm, including various trade-offs and choices, as well as for different polynomials and other parameters.

    then you also have the catalogs of parameters [2] and [3]

    [1] https://www.zlib.net/crc_v3.txt
    [2] https://reveng.sourceforge.io/crc-catalogue/
    [3] https://users.ece.cmu.edu/~koopman/crc/

  • tzs 3 days ago

    OT: I copied the error detection code that AcuRite uses in their temperature sensors for use in my rain gauge. (I got the code by taking a look at the decoder for AcuRite temperature sensors that is included in rtl_433).

    At the time I thought it was a CRC, but later I realized that it isn't. I've been trying to find out what the name is for that kind of code but have failed. I wonder if anyone here happens to know?

    Here is the code:

      byte check_code(byte const message[], unsigned nbytes, byte gen, byte key)
      {
        byte sum = 0;
        for (unsigned k = 0; k < nbytes; ++k) {
            byte data = message[k];
            for (int i = 7; i >= 0; --i) {
                if ((data >> i) & 1)
                    sum ^= key;
    
                if (key & 1)
                    key = (key >> 1) ^ gen;
                else
                    key = (key >> 1);
            }
        }
        return sum;
      }
    
    I thought it was a CRC because I saw it shifting and saw it xoring when it shifted out a 1. That's quite reminiscent of the method of CRC computation that treats bit strings as representing polynomials in GF(2) and computes the CRC by doing polynomial division essentially the same way we would do it by hand.

    But when computing a CRC that way what you xor the data with is a constant (the representation of the generator polynomial). In the algorithm above the constant is not xor'ed with the data. In fact nothing is xor'ed with the data.

    The constant, named 'gen' (which is another reason I thought at first it was a CRC), is actually the feedback polynomial for a Galois linear feedback shift register (LFSR), which is seeded with the 'key' parameter. That LFSR generates one byte for every bit of the message; the bytes generated for the message bits that are set are xor'ed together, and the result of that is the check code.

    In higher-level terms, the general approach it is using can be described like this in Pythonish pseudocode, assuming PRNG() is a pseudorandom byte generator and seed_PRNG() is a function to seed that generator:

      def check_code(message, key):
        sum = 0
        seed_PRNG(key)
        foreach bit in message:
          next_random = PRNG()
          if bit == 1:
            sum ^= next_random
        return sum
    
    Pick a Galois LFSR with feedback polynomial 'gen' as your PRNG, and that matches the AcuRite code.
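
    A C rendering of that pseudocode, with the Galois-LFSR step factored out as the PRNG (illustrative only; it mirrors the AcuRite code above):

      #include <stddef.h>
      #include <stdint.h>

      /* Return the current LFSR state as the next "random" byte, then advance
         the register by one Galois step using feedback polynomial gen. */
      static uint8_t lfsr_next(uint8_t *state, uint8_t gen) {
        uint8_t out = *state;
        *state = (*state & 1) ? (uint8_t)((*state >> 1) ^ gen) : (uint8_t)(*state >> 1);
        return out;
      }

      uint8_t check_code_lfsr(const uint8_t *msg, size_t nbytes, uint8_t gen, uint8_t key) {
        uint8_t sum = 0;
        for (size_t k = 0; k < nbytes; ++k) {
          for (int i = 7; i >= 0; --i) {
            uint8_t r = lfsr_next(&key, gen);  /* always advance the PRNG */
            if ((msg[k] >> i) & 1)
              sum ^= r;                        /* mix it in only for set message bits */
          }
        }
        return sum;
      }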

  • Sianko 4 days ago

    So there are many different flavors of CRC-32. The initialization value, for one, can usually be changed. This algorithm is only one of them.

    Check this table out https://www.crccalc.com/?crc=123456789&method=&datatype=0&ou...
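
    To make that concrete, here is a minimal sketch of a bitwise, reflected CRC-32 parameterized by polynomial, initial value, and final XOR; plugging in values from such a table selects the flavor (the function name is illustrative):

      #include <stddef.h>
      #include <stdint.h>

      /* CRC-32/ISO-HDLC (zlib): poly 0xEDB88320, init 0xFFFFFFFF, xorout 0xFFFFFFFF
         CRC-32C (Castagnoli):   poly 0x82F63B78, init 0xFFFFFFFF, xorout 0xFFFFFFFF */
      uint32_t crc32_generic(const uint8_t *data, size_t n,
                             uint32_t poly, uint32_t init, uint32_t xorout) {
        uint32_t r = init;
        for (size_t k = 0; k < n; ++k) {
          r ^= data[k];
          for (int i = 0; i < 8; ++i)
            r = (r >> 1) ^ ((r & 1) ? poly : 0);
        }
        return r ^ xorout;
      }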