What's the best relational database model for Nostr? Just event and tag tables or tags as JSON objects inside the events table? Or maybe best to use a native array column for tags?

Does the randomness of the hex keys (event id, pubkey) affect the indexing performance?


On https://github.com/fiatjaf/relayer I've done just an event table, with fancy Postgres indexes for the tags, but I have no idea about performance. I believe #[1] used a separate table for tags on nostr-rs-relay, but I am not sure. #[0]?
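A minimal sketch of that single-table approach, assuming Postgres: tags live in a jsonb column and a GIN index makes tag containment filters indexable. Table and column names here are illustrative, not relayer's actual schema.

```sql
-- Single-table layout: tags as jsonb, GIN-indexed for containment queries.
CREATE TABLE event (
  id         bytea PRIMARY KEY,   -- 32-byte event id
  pubkey     bytea NOT NULL,
  created_at bigint NOT NULL,
  kind       integer NOT NULL,
  tags       jsonb NOT NULL,      -- e.g. [["e", "<id>"], ["p", "<pubkey>"]]
  content    text NOT NULL,
  sig        bytea NOT NULL
);

CREATE INDEX event_tags_idx ON event USING gin (tags jsonb_path_ops);

-- A "#e" filter becomes a containment test the GIN index can serve:
-- SELECT * FROM event WHERE tags @> '[["e", "<hex event id>"]]';
```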

As far as I know, randomness of keys affects insert performance, not reads.

is that enough info?

Nostream uses a jsonb field for tags. But a tags table with correct indexes is more performant.
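A sketch of that normalized alternative, again with illustrative names (not nostream's actual schema): one row per tag, with a (name, value) index so "#e"/"#p" filters become plain index scans.

```sql
-- One row per tag; the (name, value) index serves "#e"/"#p" style filters.
CREATE TABLE tag (
  event_id bytea   NOT NULL REFERENCES event (id) ON DELETE CASCADE,
  position integer NOT NULL,      -- tag's position within the event
  name     text    NOT NULL,      -- "e", "p", "t", ...
  value    text    NOT NULL,      -- first tag argument
  PRIMARY KEY (event_id, position)
);

CREATE INDEX tag_name_value_idx ON tag (name, value);
```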

cc #[5]

I have about 20M tags in my DB. It's hungry on CPU because tag queries are usually the most common (threads, reactions, notifications, etc.); author lookups are much cheaper.
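For context, a sketch of the kind of query being described: with a tag table like the one above, fetching a thread's replies and reactions (a REQ filter like {"kinds":[1,7],"#e":[<id>]}) is a join plus a sort, which is where the CPU goes.

```sql
-- Thread lookup against the tag-table layout (illustrative):
SELECT e.*
FROM event e
JOIN tag t ON t.event_id = e.id
WHERE t.name = 'e'
  AND t.value = '<hex event id>'  -- root/parent note being queried
  AND e.kind IN (1, 7)            -- text notes and reactions
ORDER BY e.created_at DESC
LIMIT 100;
```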

That was my main concern... I think I will do a separate table to make queries slightly faster.

Correction: 36M

I’ve learned to generally avoid jsonb fields as much as possible. It never seems like a mistake until it does.

Wow thanks for the zap my friend 🫡

I don't know what is "best". For reference, gossip has event, event_tag (FK to event), event_relationship (relationship being reply, quote, reaction, deletion, etc.), event_flags (user-supplied, like 'viewed' or 'mute'), event_hashtag, event_relay (where it was seen), person, person_relay, settings, and local_settings (ones not synced to the network).
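A condensed sketch of two of those tables, with hypothetical column names (gossip's actual schema may differ):

```sql
-- Relationships derived from tags, stored once so threads and reactions
-- don't need tag parsing at query time (hypothetical columns):
CREATE TABLE event_relationship (
  event_id     bytea NOT NULL REFERENCES event (id),
  related_id   bytea NOT NULL,
  relationship text  NOT NULL,    -- 'reply', 'quote', 'reaction', 'deletion', ...
  PRIMARY KEY (event_id, related_id, relationship)
);

-- User-supplied, client-local flags (hypothetical columns):
CREATE TABLE event_flags (
  event_id bytea NOT NULL REFERENCES event (id),
  flag     text  NOT NULL,        -- 'viewed', 'mute', ...
  PRIMARY KEY (event_id, flag)
);
```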

It also reads from the database sparingly, preferring memory maps with a sort of write-through semantic (although inconsistently implemented at present), and only queries the database in fairly rare circumstances. I'm entirely unsure if using that memory caching is better than the database's own memory caching, or if it is entirely unnecessary and wasteful. Ideally a database would mmap and use kernel paging; however, we talk to it using SQL queries which must be planned and executed, which is not free. Preplanning all the queries in a global map, and not trying to do any memory caching, might be a better strategy, but I'm already far down this road.

Does gossip have a new version yet? The initial build was pretty rough, but it did load and it did run. Debian 11 here.

mongo is web scale

Randomness of keys should not affect index performance on read if they’re fixed length. But it most likely does on write.

If you'd like an introduction to why, I think "radix sort" is a good topic to start with.
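A quick way to see the write-side effect, as a sketch: monotonically increasing keys append to the rightmost B-tree leaf, while random fixed-length ids scatter inserts across the whole index, forcing page splits and cache misses. Reads by exact key cost about the same either way, since the tree depth is identical.

```sql
-- Illustrative insert benchmark (compare with \timing in psql):
CREATE TABLE seq_keys  (id bigint PRIMARY KEY);
CREATE TABLE rand_keys (id bytea  PRIMARY KEY);

-- Sequential keys: every insert lands on the rightmost leaf page.
INSERT INTO seq_keys
SELECT g FROM generate_series(1, 1000000) AS g;

-- Pseudo-random keys (hash of the counter): inserts scatter across
-- the index, splitting pages all over the tree.
INSERT INTO rand_keys
SELECT decode(md5(g::text), 'hex') FROM generate_series(1, 1000000) AS g;
```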

One table per kind and denormalized to the bones.
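As a sketch of what that could look like (illustrative names): each kind gets its own table shaped for that kind, e.g. kind-0 metadata flattened into real columns rather than a content blob.

```sql
-- Kind 0 (metadata) is replaceable, so the pubkey is the natural key:
CREATE TABLE event_kind0 (
  pubkey     bytea PRIMARY KEY,
  id         bytea  NOT NULL,
  created_at bigint NOT NULL,
  name       text,
  about      text,
  picture    text,
  sig        bytea NOT NULL
);

-- Kind 1 (text note):
CREATE TABLE event_kind1 (
  id         bytea PRIMARY KEY,
  pubkey     bytea  NOT NULL,
  created_at bigint NOT NULL,
  content    text   NOT NULL,
  sig        bytea  NOT NULL
);
```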

A drawback is that this can't store out-of-spec events.

It also requires searching all tables for the ID you have.

Bloom filter per kind?

Don't know, haven't built a relay, so I don't know the specific queries that are needed.

Is it possible to trace the requests on a relay? That would be really cool for a benchmark set.

As we grow, the next question is around partitioning data and how to better cache in front of the DB. We'll likely trial partitioning based on kind and created_at.
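A sketch of what that could look like with Postgres declarative partitioning (illustrative, assuming list partitions on kind with range sub-partitions on created_at):

```sql
CREATE TABLE event_part (
  id         bytea   NOT NULL,
  pubkey     bytea   NOT NULL,
  created_at bigint  NOT NULL,
  kind       integer NOT NULL,
  tags       jsonb   NOT NULL,
  content    text    NOT NULL,
  sig        bytea   NOT NULL
) PARTITION BY LIST (kind);

CREATE TABLE event_part_kind1 PARTITION OF event_part
  FOR VALUES IN (1)
  PARTITION BY RANGE (created_at);

-- One sub-partition per quarter (unix timestamps for 2023 Q1):
CREATE TABLE event_part_kind1_2023q1 PARTITION OF event_part_kind1
  FOR VALUES FROM (1672531200) TO (1680307200);
```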

I've split a fair chunk out, but have jsonb event data in the events table too. Size is mostly due to indexes (but also 13MM events); I should probably time-bound them. My workload is less that of a relay, however.

It will likely come down to what your DB workload looks like and what the best store for that case is: often storing the same data many times in different ways (e.g. Elasticsearch, caching, data warehouse, etc.).

Also testing Cassandra for query performance. Need more time.