
`By: Clay Ferguson, May 14, 2023`

# Nostr Merkle+Modulo Balanced Query (MMBQ)

`Caveat: There will be a new version of this doc that uses MurmurHash instead of Modulo for generating buckets, but that's a trivial alteration of the algo.`

The following is an explanation of two ways to deal with retrieval of Nostr Events from multiple relays, designed so that all information is retrieved from all relays, while ensuring no duplicate data is transferred from any of them.

In other words, we want to avoid getting the same data from multiple relays, in cases where all relays are in sync relative to the results of a given query.

I'm not claiming to have invented the concept of using Merkle Trees for syncing; I'm just attempting to lay out the simplest and most precise way I can see this technique of hashing a set of hashes (which I loosely equate to the term Merkle) being applied to Nostr Relays.

No one in the Nostr community is going to want to jump to a big Merkle tree for all data, *but* applying the optimization I describe below seems like a great way to distribute load (load balance) across Relays *and* check for "in sync" or "not in sync" across relays too.

https://quanta.wiki/mobile/api/bin/6461649f07725530e5c83cc2?nodeId=646154a307725530e5c8394e

## Note 3: DB Sharding vs. Nostr Relay Sharding

The MMBQ concept is similar to "Database Sharding".

* https://aws.amazon.com/what-is/database-sharding/

Sharding at a deeper (non-protocol) level could also be done just by having relays that "internally" (without the outside world knowing it) cooperate by participating in a network of other relays, all sharing, for example, one `single distributed MongoDB database where the "Sharding" is done at the DB level`. This is true Database Sharding, not 'protocol-level sharding'.

So, multiple relays can be 'cooperating' in this way too, but you'd *still* need the kind of 'hash bucket' strategy (or other sharding strategy) described in this doc (MMBQ) if you wanted to ensure each Nostr Relay sends back a unique part of the result set.

#### Nostr Sharded Relay Group

It should be obvious that we could theoretically invent `"Sharded Relays" for Nostr` where a given set of independent relays functions as a whole, and where each relay `only accepts NostrIDs that hash into a particular MurmurHash bucket`. This is similar to a "Distributed Hash Table", but I think is different.

This hash-bucket-based sharding would be what I'd call `"Sharding at the Protocol Level"`. If Relays were sharded in this way, then client apps would have an even higher probability (or expectation) that the MMBQ algo described in this doc will indeed perform well at its goal: retrieving from multiple relays simultaneously with a guarantee that no duplicate Event data is sent back from any of the relays in the pool.

The only challenge is that clients need to know which relays are in a Sharded Set, so that the records they get are `"Shard Complete"`. In other words, if you query from a sharded relay you need to know that, and query from all relays in the "Sharded Set".

Note the key difference: in `DB Sharding` the individual databases don't store records they're not responsible for, only a subset of all records. `Nostr Relays don't normally work this way`, however. Nostr Relays hold everything, or let's just say they "aren't sharded".
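To make the Sharded Relay Group idea concrete, a relay's write policy inside such a group could be as simple as the sketch below. Everything here (`getBucket`, the bucket assignment) is an assumption for illustration; no existing relay works this way:

```ts
// Hypothetical write policy for a relay inside a "Sharded Relay Group".
// Assumption: every relay in the group shares the same getBucket function
// (e.g. MurmurHash-based) and knows which bucket it has been assigned.
function shouldStoreEvent(
    eventId: string,                              // the NostrId (hex string)
    myBucket: number,                             // this relay's assigned bucket
    bucketCount: number,                          // total relays/buckets in the group
    getBucket: (id: string, n: number) => number  // shared hash-to-bucket function
): boolean {
    // Accept the Event only if it hashes into this relay's own bucket.
    return getBucket(eventId, bucketCount) === myBucket;
}
```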

## Pseudo Code

In case this approach seems complicated, note that the basic code for adding this algorithm to any relay would be nothing more complex than this:

```
// For each Event produced by the relay's normal query:
const bucket = getBucket(event.hash);

if (bucket === targetBucket) {
    // This relay is responsible for this bucket, so return the full Event.
    results.push(event);
} else {
    // Otherwise just accumulate the Event's hash into that bucket's string.
    merkleStr[bucket] += "-" + event.hash;
}

// ...then, once all Events have been processed:
const merkleHashes: string[] = merkleStr.map(v => sha256(v));
```

The above is a very simplified way of generating the Merkle (hash of hashes), but there are other ways, like using a "Digest": adding each event.hash to the digest for a given bucket and then finalizing the digest at the end.
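Here's a minimal sketch of that digest-based variant, assuming a Node.js `crypto` environment and a `getBucket` like the one above. One caveat worth noting: for the resulting hashes to match across relays, each relay would have to feed Events into the digests in the same canonical order, e.g. sorted by event id.

```ts
import { createHash } from "crypto";

// Sketch of the digest-based variant. NostrEvent is a stand-in type; only
// the hash (id) field matters here.
interface NostrEvent { hash: string }

function processQueryResults(
    events: NostrEvent[],  // must be iterated in a canonical order on every relay
    targetBucket: number,
    bucketCount: number,
    getBucket: (hash: string, n: number) => number
): { results: NostrEvent[]; merkleHashes: string[] } {
    const results: NostrEvent[] = [];
    // One incremental SHA-256 digest per bucket, instead of building big strings.
    const digests = Array.from({ length: bucketCount }, () => createHash("sha256"));

    for (const event of events) {
        const bucket = getBucket(event.hash, bucketCount);
        if (bucket === targetBucket) {
            results.push(event);                // full Events for our own bucket
        } else {
            digests[bucket].update(event.hash); // stream the hash into the digest
        }
    }

    // Finalize each digest into a hex string (the targetBucket entry is unused).
    return { results, merkleHashes: digests.map(d => d.digest("hex")) };
}
```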

## Addendum

### Note 1:

Using 'modulo' was actually not correct. We just need a hashing function to map any NostrId into N buckets, where the number of buckets is the number of Relays. Regardless of this mistake, the above algo will still work.

* https://stats.stackexchange.com/questions/26344/how-to-uniformly-project-a-hash-to-a-fixed-number-of-buckets

Candidate replacement for modulo, to map an ID into a bucket:

* https://en.wikipedia.org/wiki/MurmurHash

Candidate implementation (getBucket):

* https://gist.github.com/raycmorgan/588423#file-murmurhash2-js-L68
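As a stand-in until the MurmurHash version is settled, here's a minimal `getBucket` sketch. The only assumption is that a NostrId is a 64-character hex SHA-256 string (so a fixed slice of it is already uniformly distributed); the MurmurHash implementation linked above would simply replace the prefix step:

```ts
// Minimal getBucket sketch. Assumption: NostrIds are 64-char hex SHA-256
// strings. A MurmurHash-based version (the gist linked above) would replace
// the prefix extraction; the reduction into N buckets stays the same.
function getBucket(nostrId: string, bucketCount: number): number {
    const prefix = parseInt(nostrId.slice(0, 8), 16); // first 32 bits of the id
    return prefix % bucketCount;
}
```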

## Clarification/Example

Here's more on how the multi-relay query would work. Say you want to do a query on multiple relays. You would arbitrarily pick three relays R1, R2, R3, and randomly assign each one a bucket B1, B2, B3. Then your query to each relay would simply need to designate which bucket you want that relay to be, and how many total buckets you have.

So as any given relay goes through the results of its own query, all it needs to know is that it's hashing into 1 of N buckets (where it knows what N is) and which bucket it's functioning as. If it's functioning as the first bucket, it sends bucket 1 records back; if the relay was told it's doing bucket 2 of N, it sends back the Events that hash into bucket 2, and so on.

So for a 3-Relay system (N=3) we have:

* Relay 1 sends back:
  - All Bucket 1 Events (as the query results array)
  - Merkleized Hash of all Bucket 2 Events (as a string)
  - Merkleized Hash of all Bucket 3 Events (as a string)
* Relay 2 sends back:
  - All Bucket 2 Events (as the query results array)
  - Merkleized Hash of all Bucket 1 Events (as a string)
  - Merkleized Hash of all Bucket 3 Events (as a string)
* Relay 3 sends back:
  - All Bucket 3 Events (as the query results array)
  - Merkleized Hash of all Bucket 1 Events (as a string)
  - Merkleized Hash of all Bucket 2 Events (as a string)

So again, it's arbitrary how many Relays you query from, and it's arbitrary which relay you randomly make 1, which one 2, and which one 3. The only important thing is that each relay returns just its own bucket of results, plus the Merkleized hash for every other bucket.

In reality the actual NIP would probably need to say all the Merkleized Hashes can just go in a single array of strings, but it's clearer to enumerate them the way I did above.
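As a purely hypothetical sketch of that response shape (nothing like it exists in any NIP yet), the relay acting as bucket 1 of 3 might return:

```json
{
  "events": ["...all Bucket 1 Events..."],
  "merkle-hashes": [null, "<hash of Bucket 2 hashes>", "<hash of Bucket 3 hashes>"]
}
```

Here the `null` marks the relay's own bucket, which comes back as full Events rather than a hash.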

So the only additional query criteria would be something like these bucket props:

```json
{
  ...
  "bucket-target": 1,
  "bucket-count": 3
}
```

This would tell the relay: "Query as if you're bucket 1 of 3", i.e. send back only bucket 1 Events, plus Merkle hashes for buckets 2 and 3.
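On the client side, the fan-out for this example could be as simple as the following sketch. Note that `queryRelay` and `RelayResponse` are hypothetical stand-ins (not any existing Nostr library API); the `"bucket-target"`/`"bucket-count"` fields are the proposed props from above:

```ts
// Hypothetical client-side fan-out for an MMBQ query across N relays.
// queryRelay and RelayResponse are stand-ins for whatever transport the
// client already uses; no existing library defines them.
interface RelayResponse {
    events: unknown[];       // the relay's own bucket, as full Events
    merkleHashes: string[];  // Merkleized hashes for all the other buckets
}

declare function queryRelay(url: string, filter: object): Promise<RelayResponse>;

async function mmbqQuery(relayUrls: string[], filter: object): Promise<unknown[]> {
    const n = relayUrls.length;
    const responses = await Promise.all(
        relayUrls.map((url, i) =>
            // Relay i is (arbitrarily) assigned bucket i+1 of n.
            queryRelay(url, { ...filter, "bucket-target": i + 1, "bucket-count": n })
        )
    );
    // When all relays are in sync, the buckets are disjoint, so the union of
    // the event arrays is the complete result set with zero duplicates.
    // Comparing each relay's merkleHashes against the others' would detect
    // any relay that is out of sync for a given bucket.
    return responses.flatMap(r => r.events);
}
```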
