Roaring Bitmap Compression

(arxiv.org)

31 points | by hambandit 7 days ago ago

3 comments

  • o11c 4 hours ago

    Title is confusing: this is not about the original "Roaring", but an extension of it called "Roaring+Run".

    Here, "bitmap" = "set of sometimes-compact integers". The "uncompressed" and several "rle" implementations are obvious. Hm, except this only seems to be talking about a particularly-naive RLE approach (suitable for storage but not computation)? If you're doing computation I expect you to use absolute offsets rather than relative ones which means you can just do binary search (the only downside of this is that you can't use variable-length integers now).

    Roaring is just a fixed 2-level trie, where the outer node is always an array of pointers and where the inner nodes can be either uncompressed bitvectors (if dense) or an array of low-half integers (if sparse). Also, it only works for 32-bit integers at a fundamental level; significant changes are needed for 64-bit integers.

    This paper adds a third representation for the inner node, the bsearch'able absolute RLE method I mentioned earlier (before even reading the paper beyond the abstract).

    Overall there's neither anything novel nor anything particularly exciting about this paper, but if you ignore all the self-congratulations it might work as a decent intro to the subject? Except maybe not since there are a lot of things it fails to mention (the ping-pong problem, deduplicated tries, the approach of using a separate set for sparse values in the same range, the "overall sparse but locally semi-dense" representation that uses fixed-size single-word bitsets, ...)

    • Drup 2 hours ago

      You seem well versed into that corner. Do you have a good (and reasonably complete) introduction/exploration for these memory-efficient data-structure for computation ?

      I've been working on memory representation of algebraic data types quite a bit, and I've always wondered if we could combine them with succinct data-structures.

  • pncnmnp 3 hours ago

    Hey everyone, if you're looking for a more approachable guide on bitmap compression, I wrote a blog post on it this year: https://pncnmnp.github.io/blogs/roaring-bitmaps.html. It covers run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.