What's the best relational database model for Nostr? Just event and tag tables or tags as JSON objects inside the events table? Or maybe best to use a native array column for tags?

Does the randomness of the hex keys (event id, pubkey) affect the indexing performance?


On https://github.com/fiatjaf/relayer I've done just an event table, with fancy Postgres indexes for the tags, but I have no idea about performance. I believe #[1] used a separate table for tags on nostr-rs-relay, but I am not sure. #[0]?
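A minimal sketch of that single-table approach, assuming Postgres: tags live in a jsonb column and a GIN index makes tag containment filters indexable. Table and column names here are illustrative, not relayer's actual schema.

```sql
-- Single-table layout: tags as jsonb, GIN-indexed for containment queries.
CREATE TABLE event (
  id         bytea PRIMARY KEY,   -- 32-byte event id
  pubkey     bytea NOT NULL,
  created_at bigint NOT NULL,
  kind       integer NOT NULL,
  tags       jsonb NOT NULL,      -- e.g. [["e", "<id>"], ["p", "<pubkey>"]]
  content    text NOT NULL,
  sig        bytea NOT NULL
);

CREATE INDEX event_tags_idx ON event USING gin (tags jsonb_path_ops);

-- A "#e" filter becomes a containment test the GIN index can serve:
-- SELECT * FROM event WHERE tags @> '[["e", "<hex event id>"]]';
```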

As far as I know, randomness of keys affects insert performance, not reads.

is that enough info?

Nostream uses a jsonb field for tags. But a tags table with correct indexes is more performant.
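A sketch of that normalized alternative, again with illustrative names (not nostream's actual schema): one row per tag, with a (name, value) index so "#e"/"#p" filters become plain index scans.

```sql
-- One row per tag; the (name, value) index serves "#e"/"#p" style filters.
CREATE TABLE tag (
  event_id bytea   NOT NULL REFERENCES event (id) ON DELETE CASCADE,
  position integer NOT NULL,      -- tag's position within the event
  name     text    NOT NULL,      -- "e", "p", "t", ...
  value    text    NOT NULL,      -- first tag argument
  PRIMARY KEY (event_id, position)
);

CREATE INDEX tag_name_value_idx ON tag (name, value);
```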

cc #[5]

I have about 20M tags in my DB. It's hungry on CPU because tag queries are usually the most common (threads, reactions, notifications, etc.); author lookups are much cheaper.
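For context, a sketch of the kind of query being described: with a tag table like the one above, fetching a thread's replies and reactions (a REQ filter like {"kinds":[1,7],"#e":[<id>]}) is a join plus a sort, which is where the CPU goes.

```sql
-- Thread lookup against the tag-table layout (illustrative):
SELECT e.*
FROM event e
JOIN tag t ON t.event_id = e.id
WHERE t.name = 'e'
  AND t.value = '<hex event id>'  -- root/parent note being queried
  AND e.kind IN (1, 7)            -- text notes and reactions
ORDER BY e.created_at DESC
LIMIT 100;
```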

That was my main concern... I think I will do a separate table to make queries slightly faster.

Correction: 36M

I’ve learned to generally avoid jsonb fields as much as possible. It never seems like a mistake until it does.

Wow thanks for the zap my friend 🫡

I don't know what is "best". For reference, gossip has event, event_tag (FK to event), event_relationship (relationship being reply, quote, reaction, deletion, etc.), event_flags (user-supplied, like 'viewed' or 'mute'), event_hashtag, event_relay (where it was seen), person, person_relay, settings, and local_settings (ones not synced to the network).
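A condensed sketch of two of those tables, with hypothetical column names (gossip's actual schema may differ):

```sql
-- Relationships derived from tags, stored once so threads and reactions
-- don't need tag parsing at query time (hypothetical columns):
CREATE TABLE event_relationship (
  event_id     bytea NOT NULL REFERENCES event (id),
  related_id   bytea NOT NULL,
  relationship text  NOT NULL,    -- 'reply', 'quote', 'reaction', 'deletion', ...
  PRIMARY KEY (event_id, related_id, relationship)
);

-- User-supplied, client-local flags (hypothetical columns):
CREATE TABLE event_flags (
  event_id bytea NOT NULL REFERENCES event (id),
  flag     text  NOT NULL,        -- 'viewed', 'mute', ...
  PRIMARY KEY (event_id, flag)
);
```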

It also reads from the database sparingly, preferring memory maps with a sort of write-through semantic (although inconsistently implemented at present), and only queries the database in fairly rare circumstances. I'm entirely unsure if using that memory caching is better than the database's own memory caching, or if it is entirely unnecessary and wasteful. Ideally a database would mmap and use kernel paging; however, we talk to it using SQL queries which must be planned and executed, which is not free. Preplanning all the queries in a global map, and not trying to do any memory caching, might be a better strategy, but I'm already far down this road.

Does gossip have a new version yet? The initial build was pretty rough, but it did load and it did run. Debian 11 here.

mongo is web scale

Randomness of keys should not affect index performance on read if they’re fixed length. But it most likely does on write.

If you'd like an introduction to why, I think "radix sort" is a good topic to start with.
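A quick way to see the write-side effect, as a sketch: monotonically increasing keys append to the rightmost B-tree leaf, while random fixed-length ids scatter inserts across the whole index, forcing page splits and cache misses. Reads by exact key cost about the same either way, since the tree depth is identical.

```sql
-- Illustrative insert benchmark (compare with \timing in psql):
CREATE TABLE seq_keys  (id bigint PRIMARY KEY);
CREATE TABLE rand_keys (id bytea  PRIMARY KEY);

-- Sequential keys: every insert lands on the rightmost leaf page.
INSERT INTO seq_keys
SELECT g FROM generate_series(1, 1000000) AS g;

-- Pseudo-random keys (hash of the counter): inserts scatter across
-- the index, splitting pages all over the tree.
INSERT INTO rand_keys
SELECT decode(md5(g::text), 'hex') FROM generate_series(1, 1000000) AS g;
```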

One table per kind and denormalized to the bones.
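As a sketch of what that could look like (illustrative names): each kind gets its own table shaped for that kind, e.g. kind-0 metadata flattened into real columns rather than a content blob.

```sql
-- Kind 0 (metadata) is replaceable, so the pubkey is the natural key:
CREATE TABLE event_kind0 (
  pubkey     bytea PRIMARY KEY,
  id         bytea  NOT NULL,
  created_at bigint NOT NULL,
  name       text,
  about      text,
  picture    text,
  sig        bytea NOT NULL
);

-- Kind 1 (text note):
CREATE TABLE event_kind1 (
  id         bytea PRIMARY KEY,
  pubkey     bytea  NOT NULL,
  created_at bigint NOT NULL,
  content    text   NOT NULL,
  sig        bytea  NOT NULL
);
```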

A drawback is that this can't store out-of-spec events.

It also requires searching all tables for the ID you have.

Bloom filter per kind?

Don't know, haven't built a relay, so I don't know the specific queries that are needed.

Is it possible to trace the requests on a relay? That would be really cool for a benchmark set.

As we grow, the next question is around partitioning data and how to better cache in front of the DB. We'll likely trial partitioning based on kind and created_at.
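A sketch of what that could look like with Postgres declarative partitioning (illustrative, assuming list partitions on kind with range sub-partitions on created_at):

```sql
CREATE TABLE event_part (
  id         bytea   NOT NULL,
  pubkey     bytea   NOT NULL,
  created_at bigint  NOT NULL,
  kind       integer NOT NULL,
  tags       jsonb   NOT NULL,
  content    text    NOT NULL,
  sig        bytea   NOT NULL
) PARTITION BY LIST (kind);

CREATE TABLE event_part_kind1 PARTITION OF event_part
  FOR VALUES IN (1)
  PARTITION BY RANGE (created_at);

-- One sub-partition per quarter (unix timestamps for 2023 Q1):
CREATE TABLE event_part_kind1_2023q1 PARTITION OF event_part_kind1
  FOR VALUES FROM (1672531200) TO (1680307200);
```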

I've split a fair chunk out, but have jsonb event data in the events table too. Size is mostly due to indexes (but also 13MM events); I should probably time-bound them. My workload is less that of a relay, however.

It will likely come down to what your DB workload looks like and what the best store for that case is: often storing the same data many times in different ways (e.g. Elasticsearch, caching, data warehouse, etc.).

Also testing Cassandra for query performance. Need more time.