Packed Data Support in Haskell

(arthi-chaud.github.io)

69 points | by matt_d 20 hours ago

11 comments

nine_k 18 hours ago

> Introducing the ‘packed’ data format, a binary format that allows using data as it is, without the need for a deserialisation step. A notable perk of this format is that traversals on packed trees is proven to be faster than on ‘unpacked’ trees: as the fields of data structures are inlines, there are no pointer jumps, thus making the most of the L1 cache.

That is, a "memory dump -> zero-copy memory read" of a subgraph of Haskell objects, which allows passing such trees / subgraphs directly over a network. Slightly reminiscent of Cap'n Proto.
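
To make the "fields are inlined, no pointer jumps" idea concrete, here is a minimal sketch of a packed pre-order encoding and a traversal that reads it without rebuilding the tree. This is not the article's library or its API: the `Tree` type, the tag values, and the names `pack` and `sumPacked` are all made up for illustration, and a plain list of bytes stands in for what would really be a contiguous buffer.

```haskell
import Data.Word (Word8)

-- A binary tree with one-byte payloads at the leaves (kept tiny so the
-- encoding stays trivial; this type is an assumption of the sketch).
data Tree = Leaf Word8 | Node Tree Tree

-- Hypothetical tag bytes, chosen for this sketch only.
tagLeaf, tagNode :: Word8
tagLeaf = 0
tagNode = 1

-- Pre-order encoding: a tag, then the fields inlined right after it.
-- No offsets or pointers are stored.
pack :: Tree -> [Word8]
pack t = go t []
  where
    go (Leaf n)   rest = tagLeaf : n : rest
    go (Node l r) rest = tagNode : go l (go r rest)

-- Sum the leaves by walking the packed bytes directly, without building
-- a Tree. Returns the sum and the unconsumed remainder of the input,
-- or Nothing if the input is malformed or truncated.
sumPacked :: [Word8] -> Maybe (Int, [Word8])
sumPacked (tag : rest)
  | tag == tagLeaf = case rest of
      n : rest' -> Just (fromIntegral n, rest')
      []        -> Nothing
  | tag == tagNode = do
      (a, rest')  <- sumPacked rest
      (b, rest'') <- sumPacked rest'
      Just (a + b, rest'')
sumPacked _ = Nothing
```

The traversal only ever moves forward through the bytes, which is where the cache-friendliness would come from once the list is replaced by a real contiguous buffer.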

carterschonwald 3 hours ago

One thing that sometimes gets tricky in these things is handling subterm sharing. I wonder how they implemented it.

Zolomon 13 hours ago

They mention this in the article.

spockz 12 hours ago

It reminds me more of FlatBuffers though. Does protobuf also have zero allocation (beyond initial ingestion) and no pointer jumps?

90s_dev 14 hours ago

We are always reinventing wheels. If we didn't, they'd all still be made of wood.

NetOpWibby 2 hours ago

Is this like MessagePack for Haskell?

tlb 9 hours ago

> the serialised version of the data is usually bigger than its in-memory representation

I don’t think this is common. Perhaps for arrays of floats serialized as JSON or something. But I can’t think of a case where binary serialization is bigger. Data types like maps are necessarily larger in memory to support fast lookup and mutability.

IsTom 5 hours ago

If you use a lot of sharing in immutable data, it can grow a lot when serialized. A simple pathological example would be a tree in which every left subtree is the same object as the right one. It takes O(height) space in memory, but O(2^height) when serialized.
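
A minimal sketch of that pathological case, with made-up names (`Shape`, `shared`, `serialisedNodes`), assuming a naive serialiser that writes every node it visits and has no way to detect sharing:

```haskell
-- A tree without payloads; only the shape matters here.
data Shape = Tip | Fork Shape Shape

-- Build a tree of the given height whose left and right children are the
-- same heap object: `sub` is bound once and referenced twice, so each
-- level adds only one new node in memory.
shared :: Int -> Shape
shared h
  | h <= 0    = Tip
  | otherwise = let sub = shared (h - 1) in Fork sub sub

-- Number of nodes a naive pre-order serialiser would write. It visits
-- both children separately, so the shared subtree is counted twice at
-- every level.
serialisedNodes :: Shape -> Integer
serialisedNodes Tip        = 1
serialisedNodes (Fork l r) = 1 + serialisedNodes l + serialisedNodes r
```

For example, `serialisedNodes (shared 20)` is 2^21 - 1 = 2097151, while the heap holds only about 20 distinct nodes.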

nine_k 8 hours ago

I suppose all self-describing formats, like protobuf, Thrift, or, well, JSON, are bigger than the efficient machine representation, because they carry the schema in every message, one way or another.

lordleft 4 hours ago

This was very well written. Excellent article!

gitroom 7 hours ago

honestly i wish more stuff worked this way - fewer hops in memory always makes me happy