In rsync the entire diff is sent in one round trip, but that diff can be massive if there are lots of gaps in the data. Negentropy uses a divide-and-conquer strategy instead: it recursively splits ranges in half and reconciles mismatched subranges in batches. It's more efficient, and the number of communication rounds scales as O(log n), which is why it grows so slowly from 1 million to 1 billion items.
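A minimal sketch of the divide-and-conquer idea, not the actual negentropy wire protocol: each side fingerprints a key range; if the fingerprints match, the range is skipped, otherwise it is split in half and recursed until small ranges are exchanged directly. The `fingerprint` helper and the range bounds are illustrative assumptions (the real protocol uses an incremental fingerprint and batches subranges per network round, which is where the O(log n) round count comes from).

```python
import hashlib

def fingerprint(items):
    # Stand-in for negentropy's incremental range fingerprint:
    # hash the sorted items in the range.
    h = hashlib.sha256()
    for x in sorted(items):
        h.update(x.to_bytes(8, "big"))
    return h.digest()

def reconcile(a, b, lo, hi):
    """Find elements each side is missing within the key range [lo, hi).
    Returns (missing_from_a, missing_from_b)."""
    ra = {x for x in a if lo <= x < hi}
    rb = {x for x in b if lo <= x < hi}
    if fingerprint(ra) == fingerprint(rb):
        return set(), set()              # ranges identical, skip entirely
    if len(ra) <= 2 or len(rb) <= 2:     # small range: exchange items directly
        return rb - ra, ra - rb
    mid = (lo + hi) // 2                 # otherwise split in half and recurse
    la, lb = reconcile(a, b, lo, mid)
    ua, ub = reconcile(a, b, mid, hi)
    return la | ua, lb | ub

a = set(range(0, 100, 2))                # evens
b = set(range(0, 100, 3))                # multiples of 3
need_a, need_b = reconcile(a, b, 0, 100)
assert a | need_a == b | need_b          # both sides converge to the union
```

In this sketch the recursion depth is O(log of the key-space size), and only ranges that actually differ are descended into, so identical regions cost a single fingerprint comparison regardless of how many items they contain.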

The article explains this in more detail, with visualizations and everything.

Discussion

Thanks! Will read the article.

It's slightly nerdy 🤓

I do data engineering/analytics in Python and many flavors of SQL… but I never actually studied low level Comp Sci or algorithms, outside of stats. Starting to get more interested in this stuff 🤓🤙