I used these in a distributed sync system around 2013-2014. When Postgres got JSON support we ended up going centralized for that product, and a simple logical clock was all we needed. Nevertheless, I still think they're very cool.
Thanks for sharing. I liked the short video.
These have been around for a few years now. I wonder what people have built with them since. Anyone know of any popular distributed systems using them nowadays? (I noticed the company the author mentioned has shut down: https://medium.com/@MeetLima/this-time-its-goodbye-5573a97be...)
As an additional resource I liked the description of ITC by Fred Hebert: https://ferd.ca/interval-tree-clocks.html. He is the author of the wonderful Learn You Some Erlang book. There is an ITC library he maintains as well: https://github.com/ferd/interval-tree-clocks, started by Paulo Sergio Almeida.
Lima did not use ITC in its product anyway; we used classic version vectors. We considered them for future evolutions but never implemented it. It's funny that Fred Hebert's use case was a "peer-to-peer Dropbox", because that is basically what Lima was.
I don't know many systems that use ITC. Version vectors are simpler and sufficient in most cases. One of the authors of the original paper mentioned in a later presentation (https://cbaquero.github.io/web/pdf/SDLtime2021.pdf) that they were used for tracing, in particular in a 2015 system called Pivot Tracing.
Thank you for answering, that's very interesting! Yeah, I recall being excited about ITCs, but it doesn't seem like they made their way into many practical distributed systems.
Also I had forgotten but here is a more recent follow-up to Fred Hebert's post: https://ferd.ca/a-bridge-over-a-river-never-crossed.html
Several things remain unclear:
"The beauty of this scheme is that a node only has to know about its share of the interval"
The article doesn't explain curve changes in much detail, but I assume an event increases the portion of the curve "owned" by the node.
With unique identifiers, all a node needs to know is its identifier. So that can't be what's interesting about these interval portions.
Also:
- How that curve is initially drawn isn't clear at all. Is it flat and becomes complex over time by forking (+ data modification)?
- Why are interval boundaries real-value in a system that cannot actually express real numbers?
- How are the intervals / portions decided? Is that simpler than generating UUIDs?
- How does comparison work?
"Comparison works similarly to Version Vectors: if the curve of a stamp is above the other one, it descends it, otherwise the curves intersect and the stamps are concurrent."
But now you have events with curves and intervals where one event might miss a portion. It's not immediately clear what happens in that comparison. It may be obvious to some readers, but clearly not to an audience that needed an introduction to the other concepts the article explains.
In terms of conclusions:
My understanding is that the main benefit is that the merging mechanism keeps the overall complexity of the "vector" manageable in the face of actor explosion, whereas vectors keyed by UUIDs (or even by monotonically increasing indices) would grow indefinitely, making tracking them on events a challenge.
This intro article fails to make this stuff clear.
Hello, author here. Sorry about the lack of clarity, this article is the transcript of a 5 min lightning talk and it was really hard fitting all the relevant content in that little time :) (In retrospect that was a poorly chosen topic. When I picked it I thought the talk would be 10 min, 5 min is too short to explain a subject like this.)
> How that curve is initially drawn isn't clear at all. Is it flat and becomes complex over time by forking (+ data modification)?
Yes, the initial curve is typically constant 0.
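In the nested-pair encoding discussed further down the thread, that starting point is the paper's seed stamp. A one-line sketch (the name SEED is mine):

```python
# Seed stamp from the ITC paper: ID = 1 (ownership of the whole
# interval) paired with event tree = 0 (the constant-zero curve).
SEED = (1, 0)
```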
> Why are interval boundaries real-value in a system that cannot actually express real numbers?
Like snthpy said, "real" is a shortcut for infinitely subdivisible. The numbers themselves are actually rationals.
> How are the intervals / portions decided? Is that simpler than generating UUIDs?
Nodes are forked from an existing node; that node decides which portion of its interval it gives to the new node. You pick the splitting point to keep complexity low.
Regarding comparison: you always know the values of the whole curve. When I say "a node only has to know about its share of the interval", I only mean the ID space. In a version vector there is a direct link between identifiers and counters, whereas here, outside of your share of the interval, you don't know who owns what or how many devices there are at any given point.
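To make the comparison concrete, here is a small Python sketch of the event-tree comparison as defined in the original paper, assuming the usual encoding where an event tree is either an integer n (a flat curve) or a triple (n, left, right) whose subtrees are relative to the base n; the function names are mine:

```python
def lift(n, e):
    """Raise every point of event tree e by n."""
    if isinstance(e, int):
        return e + n
    base, left, right = e
    return (base + n, left, right)

def leq(e1, e2):
    """True iff the curve of e1 is nowhere above the curve of e2.
    Assumes normalized trees, whose base is the curve's minimum."""
    if isinstance(e1, int):
        return e1 <= (e2 if isinstance(e2, int) else e2[0])
    n1, l1, r1 = e1
    if isinstance(e2, int):
        return leq(lift(n1, l1), e2) and leq(lift(n1, r1), e2)
    n2, l2, r2 = e2
    return (n1 <= n2
            and leq(lift(n1, l1), lift(n2, l2))
            and leq(lift(n1, r1), lift(n2, r2)))

def concurrent(e1, e2):
    """Neither curve dominates: the stamps are concurrent."""
    return not leq(e1, e2) and not leq(e2, e1)
```

One stamp descends another exactly when leq holds in one direction and not the other.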
> Why are interval boundaries real-value in a system that cannot actually express real numbers?
They can be represented as a binary tree, that is, as nested lists or tuples, for instance. The whole interval (1) could be split into (1,0) and (0,1). Then (1,0) splits into ((1,0),0) and ((0,1),0), and (0,1) into (0,(1,0)) and (0,(0,1)), and so on.
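Those splits can be computed mechanically; here is a minimal Python sketch of the fork/split rules from the paper, using the same nested-pair encoding (an illustration, not Lima's code):

```python
def split(i):
    """Split an ID tree (0, 1, or a pair) into two disjoint halves."""
    if i == 0:
        return (0, 0)
    if i == 1:
        # Whole interval: each side gets one half.
        return ((1, 0), (0, 1))
    l, r = i
    if l == 0:
        # Only the right subtree is owned: split it recursively.
        r1, r2 = split(r)
        return ((0, r1), (0, r2))
    if r == 0:
        l1, l2 = split(l)
        return ((l1, 0), (l2, 0))
    # Both subtrees owned: hand one whole subtree to each side.
    return ((l, 0), (0, r))

def fork(stamp):
    """Fork a stamp: split the ID, both sides keep the event tree."""
    i, e = stamp
    i1, i2 = split(i)
    return (i1, e), (i2, e)

# Forking the seed yields the two halves quoted above.
a, b = fork((1, 0))   # IDs (1, 0) and (0, 1), both with event tree 0
```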
I guess his point remains, though, that if all you need is infinite divisibility then the rational numbers between 0 and 1 would be sufficient. I take it that was what was meant, and "real" numbers was just shorthand for that.
Interval tree clocks seem really cool, and I’d love to use them for something. But unfortunately I can’t think of a way to reliably have a system automatically subdivide an interval to bring new servers up without getting pathological behaviour sometimes. Or, if an identifier (interval) goes offline indefinitely, I can’t think of any good way to clear it.
In practice, vector clocks / version vectors built using UUID identifiers for machines seem easier to work with, and they're very predictable. Slightly less efficient, but versions take up such a tiny percentage of network traffic and disk space that I don't think it matters.
I really want to use interval tree clocks for something. It’s such a cool algorithm. But I don’t know what I’d use them for - beyond maybe a set of manually provisioned servers syncing data or something.
I don't really know what you mean regarding pathological subdivisions, but the case where nodes go offline without merging first is a real issue. There are ways to work around it but if you have to do so you lose a lot of its advantage compared to other schemes.
When I did that talk I was at the point you are now, trying to find a real use case. I had considered them for Lima but we were actually fine with version vectors.
I think the case where ITCs can really shine is when you have a lot of short-lived nodes. That could be containers or serverless functions, for instance. I didn't think about the tracing use case, but it makes a lot of sense: you can fork a new node per request.
> I don't really know what you mean regarding pathological subdivisions
I mean, let's say I'm making a collaborative text editor. If I use a CRDT, I need clients to be able to emit changes with unique IDs. The obvious way to do that with ITC is to have a server (or set of servers) hand out part of its interval to clients. And the normal way to hand out part of the server's interval is to split it in two each time.
If you have n clients join the server, the server will end up with 1/2^n of the keyspace, since it keeps halving each time a client joins. And some of those clients will just go away, indefinitely, without "giving the key back". So the server's interval gets smaller and smaller, and the key (interval) that new clients are given will take more and more bits over time. Linearly more bits.
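That growth is easy to demonstrate with the splitting rules from the paper (the same hypothetical encoding as in the sketches above, repeated here so this runs standalone):

```python
def split(i):
    # ID-splitting rules from the ITC paper (IDs are 0, 1, or pairs).
    if i == 0:
        return (0, 0)
    if i == 1:
        return ((1, 0), (0, 1))
    l, r = i
    if l == 0:
        r1, r2 = split(r)
        return ((0, r1), (0, r2))
    if r == 0:
        l1, l2 = split(l)
        return ((l1, 0), (l2, 0))
    return ((l, 0), (0, r))

server = 1
for n in range(1, 5):
    server, client = split(server)  # server keeps one half per join
    print(n, server)
# 1 (1, 0)
# 2 ((1, 0), 0)
# 3 (((1, 0), 0), 0)
# 4 ((((1, 0), 0), 0), 0)  <- one level deeper per client: linear growth
```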
With random IDs, the client never needs to give the ID back. You still have problems that the version grows over time - but I can think of a bunch of ways to deal with that. I haven't figured out how to solve that problem with interval tree clocks.
Merging actually seems quite simple given an external coordinator. All you need is some periodic process that can observe the state of the clock, find gaps, and communicate the merge to one of the adjacent nodes. It is something that can be eventually consistent. Splits could occur similarly. Sure, you have to ensure single writes from the coordinating process, but that's easier to solve because of the relative infrequency with which the coordinator needs to run, leaving lots of time for consistency between updates.
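For what it's worth, the mechanical part of that merge is the ID join from the paper; a sketch of what such a coordinator (or the adjacent node itself) would apply, again in the hypothetical nested-pair encoding used above:

```python
def norm(i):
    """Collapse a pair of identical leaves back into one leaf."""
    if i == (0, 0):
        return 0
    if i == (1, 1):
        return 1
    return i

def join(i1, i2):
    """Merge two ID trees, returning a departed node's share to a
    neighbour. Assumes the IDs are disjoint, which ITC guarantees
    for IDs produced by forking."""
    if i1 == 0:
        return i2
    if i2 == 0:
        return i1
    (l1, r1), (l2, r2) = i1, i2
    return norm((join(l1, l2), join(r1, r2)))

# Undoing a fork of the whole interval restores full ownership.
assert join((1, 0), (0, 1)) == 1
```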
The key assumption I’m making here is that nodes communicate much more frequently than nodes are added and removed from the system. We’re optimizing for the latency of the former at the cost of latency of the latter.
I'm thinking about this in the context of an eventually consistent CRDT or something. In that case, you need to handle the case of a client going offline (e.g. the person is on holiday), making changes, and not coming back online for 2 weeks.
If a peer isn't emitting changes, I don't see how you can safely merge their keyspace - since they might still be using that keyspace, but be offline, and you can't know.
The special thing about ITC is that you can do the splitting (to add a member) and the joining (to remove one) without any external coordination. They are otherwise used in the same way as vector clocks. I find the linked post more confusing than I remember the original paper being (I wrote an implementation):
http://hydra.azilian.net/Papers/Interval%20Tree%20Clocks.pdf
The link to the paper referenced in the article goes to some auto-scrolling page with all kinds of content. I found a direct link here:
http://hydra.azilian.net/Papers/Interval%20Tree%20Clocks.pdf
Yes, the original link is dead, and that one doesn't work for me currently. I'll edit the post if I find a working one.
EDIT: I found where the paper has moved, I have updated the article.