maybe some ZK magic, idk. But I know the guy to ask, which is nostr:nprofile1qqst3axzay8sm4n8zg2n84acmt7hwwztpdg9r7p89e2f83v007f7zjcpzfmhxue69uhky6t5wd6xzcmt9eshquqpp4mhxue69uhkummn9ekx7mq00upqh

Discussion

Always looking for more ways to leverage ZK for Bitcoin and Nostr. What is the need, exactly?

There is NIP-45, which is used to count how many events match a certain filter.

So ideally the proof would show that X (the returned count) events with valid signatures actually matched the filter(s).
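For reference, a NIP-45 exchange is just another JSON frame on the relay websocket: the client sends a COUNT request with filters, the relay answers with a count. A rough sketch of the shape in Go (subscription id and filter values are placeholders; error handling trimmed):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// client -> relay: ["COUNT", <subscription id>, <filter>, ...]
	req, _ := json.Marshal([]any{
		"COUNT",
		"sub1", // placeholder subscription id
		map[string]any{
			"kinds":   []int{1},
			"authors": []string{"<hex pubkey>"}, // placeholder author
		},
	})
	fmt.Println(string(req))

	// relay -> client: ["COUNT", <subscription id>, {"count": <n>}]
	resp := []byte(`["COUNT","sub1",{"count":42}]`)
	var frame []json.RawMessage
	if err := json.Unmarshal(resp, &frame); err != nil {
		panic(err)
	}
	var result struct {
		Count int `json:"count"`
	}
	if err := json.Unmarshal(frame[2], &result); err != nil {
		panic(err)
	}
	fmt.Println("count:", result.Count)
}
```

The proof described above would then attest that the integer in that response really corresponds to X signature-valid events matching the filter.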

i must make a note to implement count on #orly - the database is designed to avoid decoding events before it is certain they are result candidates, so it can actually do this, at least for author/kind/timestamp filters, without decoding a single event stored in the database. probably i need to add a new index that covers the gap with tags, like pubkey/tag/timestamp, so the count function never has to decode any events.
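A rough sketch of that idea against badger (the key layout and names here are made up for illustration, not orly's actual code): if pubkey, kind and created_at are packed into the index key itself, a count becomes a key-only prefix scan and no stored value ever needs to be read or decoded.

```go
package countsketch

import (
	"encoding/binary"

	badger "github.com/dgraph-io/badger/v4"
)

// hypothetical index key layout (not orly's actual one):
// prefix byte | pubkey (32 bytes) | kind (2 bytes) | created_at (8 bytes)
const idxPubkeyKind byte = 0x02

func indexPrefix(pubkey [32]byte, kind uint16) []byte {
	k := make([]byte, 0, 1+32+2)
	k = append(k, idxPubkeyKind)
	k = append(k, pubkey[:]...)
	k = binary.BigEndian.AppendUint16(k, kind)
	return k
}

// CountByPubkeyKind walks index keys only; values are never fetched,
// so no stored event has to be decoded just to be counted.
func CountByPubkeyKind(db *badger.DB, pubkey [32]byte, kind uint16) (int, error) {
	prefix := indexPrefix(pubkey, kind)
	count := 0
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // key-only iteration
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			count++
		}
		return nil
	})
	return count, err
}
```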

my main objection to count had to do with the excessive work required that was basically the same as if you actually just fetch the event and do that processing locally. but with the necessary indexes that doesn't have to happen.

trading off the size of the database for the speed of search and count is what i'm talking about.

there definitely are algorithms that would run a lot faster if they could do counts

> my main objection to count had to do with the excessive work required that was basically the same as if you actually just fetch the event and do that processing locally

For a client these aren't even comparable. 100k events + signature checks vs 1 COUNT

yes but if the relay has to decode all 100k of those events, there's not really any gain. count searches need to be as efficient as possible because of extreme numbers like 100k.

you simply wouldn't even request that much if you wanted the whole thing so it's mad if the relay's database literally has to parse the whole damn thing, just for a headcount.

ah, okay, you were looking at it from the relay perspective. Ofc you are right.

In my own SQLite implementation I just use the SELECT COUNT, and it's quite efficient.
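Presumably something along these lines (the table and column names here are assumptions for illustration, not nastro's actual schema):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

// countEvents runs a single COUNT over a hypothetical events table
// with columns (id, pubkey, kind, created_at, ...).
func countEvents(db *sql.DB, pubkey string, kind int) (int, error) {
	var n int
	err := db.QueryRow(
		`SELECT COUNT(*) FROM events WHERE pubkey = ? AND kind = ?`,
		pubkey, kind,
	).Scan(&n)
	return n, err
}

func main() {
	db, err := sql.Open("sqlite3", "events.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	n, err := countEvents(db, "<hex pubkey>", 1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("count:", n)
}
```

With the right indices on (pubkey, kind), SQLite can answer this from the index alone, which is why it stays cheap.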

yeah, i built my DB from scratch using a key/value store, so such a search index has to be added. it has part of it already, but it doesn't cover the full filter query space, only pubkey/id/timestamp. it also needs tags and kinds.

the extra index eliminates the need to decode events by allowing queries to effectively search by those criteria and not sift through such inordinately large volumes of data.

i've chalked up a mental note to implement count properly in the near future, it's not a big task really.

ah got it. Yeah ofc in a key-value store you have to implement that from scratch.

Btw if you ever need a SQLite package for nostr events, I wrote this

https://github.com/pippellia-btc/nastro

It provides bare-bones tables, but in the constructor you are free to add other schemas or triggers, and you can even modify the default query builder to take advantage of new tables, indices and so on.

well, i'm so used to building badger based databases that i have a bunch of libraries for it, to manage building keys and generating indexes. it doesn't take me that much longer to make an embedded, dedicated database with some particular set of tables this way, compared to learning yet another new thing to work with antiquated sql query engines.

i mean, it's just my thing, building things like this, so yeah. i write the encoders and the database engines and the network schedulers. offloading stuff to inherently slower, less efficient, less scalable general-purpose database engines just doesn't make sense for me, and when you see how fast it returns results the reason is clear.

IMO, nostr is a unique enough type of network-connected database protocol, and so open-ended, that using generic general-purpose parts to build it is a big sacrifice in performance.

This is a fair take

If you are interested, you can open a PR to nastro adding your own badger based DB. I would actually really appreciate it.

The prerequisite is just to satisfy the store interface, expose the internals (so people can make queries directly if needed), and allow for customisation.

I think it would be quite beneficial to the ecosystem to move away from the fiatjaf eventstore packages, which are quite crappy

yeah, the event store interface is kinda wrong too. it should be sorting the results before sending them, rather than depending on the insertion time of the record matching the event's timestamp (this breaks when someone uploads a bunch of old events to a relay). i mean, at least, where sqlite or postgresql or others are used, the results are sorted in the engine, but the badger engine that fiatjaf made does not, and it runs WAY too many threads.
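For what it's worth, the fix is cheap: sort by created_at (newest first) before returning, instead of trusting store iteration order. A minimal sketch with a made-up event type:

```go
package main

import (
	"fmt"
	"sort"
)

// minimal event shape for illustration; real store types differ
type event struct {
	ID        string
	CreatedAt int64
}

// sortNewestFirst orders results by created_at descending, with ID as a
// tie-breaker, rather than relying on insertion order in the store.
func sortNewestFirst(evs []event) {
	sort.Slice(evs, func(i, j int) bool {
		if evs[i].CreatedAt != evs[j].CreatedAt {
			return evs[i].CreatedAt > evs[j].CreatedAt
		}
		return evs[i].ID < evs[j].ID
	})
}

func main() {
	evs := []event{{"c", 10}, {"a", 30}, {"b", 20}}
	sortNewestFirst(evs)
	fmt.Println(evs) // [{a 30} {b 20} {c 10}]
}
```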

create an issue on the repo and tag me as assigned to the issue of adding a badger based DB engine implementation. i'll get to it in the next month. it's not a difficult task, just looking at the interface, creating all the stub implementations of the methods and then putting the right bits into them based on the existing engine i have built. probably a day or so of work.

> i mean, at least, where sqlite or postgresql or others are used, the results are sorted in the engine, but the badger engine that fiatjaf made does not

oooof, not good.

> and it runs WAY too many threads

typical...

> create an issue on the repo and tag me as assigned to the issue of adding a badger based DB engine implementation

That's great, thank you sir, I'll do it ASAP!

just following up, the implementation is done and a PR is awaiting you

i added a few helpers and whatsits, mainly into the #orly tag code, that will also help with transitioning.

orly event and filter codecs use binary data where the field represents binary data. event IDs, pubkeys and signatures are all in that form already after they have been unmarshaled. i think it probably helps a little with memory and speed, since comparing one raw byte is faster than comparing the two hex characters that encode it.
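To illustrate the point (the value below is a placeholder, not orly's actual types): a hex-encoded pubkey is twice the size of the raw form, so comparisons on the decoded bytes touch half as much data per check.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"strings"
)

func main() {
	// a 32-byte pubkey is 64 characters when hex-encoded
	hexPubkey := strings.Repeat("ab", 32) // placeholder value

	raw, err := hex.DecodeString(hexPubkey)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(hexPubkey), "hex chars vs", len(raw), "raw bytes")

	// comparing the decoded form touches half as much data per check
	other, _ := hex.DecodeString(hexPubkey)
	fmt.Println("match:", bytes.Equal(raw, other))
}
```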

anyway, i hope it helps you make your analytics even more efficient. i have it in mind to look into it after a few people have put it into their apps. now that i'm gonna be in the credits, i want to know what the show looks like :)

Thanks mate, I'll look into it tomorrow.

btw, #ORLY now implements this, for anyone looking. the pubkey/kind/created_at index enables counting events without decoding them.

you'll see in the relayinfo that it claims support for the count nip
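For anyone who wants to check that programmatically: NIP-11 relay info is served over HTTP with an `Accept: application/nostr+json` header, and advertised NIPs show up in `supported_nips`. A rough sketch (the relay URL is a placeholder):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// NIP-11: GET the relay's HTTP URL with this Accept header
	req, err := http.NewRequest("GET", "https://relay.example.com", nil) // placeholder URL
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/nostr+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var info struct {
		SupportedNIPs []int `json:"supported_nips"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		panic(err)
	}
	for _, nip := range info.SupportedNIPs {
		if nip == 45 {
			fmt.Println("relay advertises COUNT (NIP-45) support")
		}
	}
}
```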

It’s definitely doable but can take quite some time and resources, so it might be a fit only for archiving old data. E.g. for 422k likes it would be several hours and up to 1 TB of RAM, which would cost $6-8 per hour. The cost can be justified if the events are quite sensitive, but def not for ordinary posts :) We do expect 10-100x perf improvements in the near future though, so it might become viable at some point

Check this sir nostr:nprofile1qqs0dqlgwq6l0t20gnstnr8mm9fhu9j9t2fv6wxwl3xtx8dh24l4auspzamhxue69uhhyetvv9ujuurjd9kkzmpwdejhgtcpzemhxue69uhkzat5dqhxummnw3erztnrdakj7us2nqv . Michael is from my team. We are cooking some cool shit that might unlock this in the future.

Looking forward to it guys.

Maybe it doesn't help, but I'll throw it out there. HyperLogLogs give an estimate of the cardinality of a set, and are one of the most efficient ways to reduce the memory/storage requirements when it comes to counting unique elements.

If you don't know them, they work by keeping track of the maximum number of consecutive 0s (or 1s, it doesn't matter) in the hashes of the elements.

example

"test1" --> hash --> 01001111 (4 consecutive "1"s)

"test2" --> hash --> 10101101 (1 consecutive "1")

Maybe there is a ZK version of an HLL, where I can prove that the maximum number of consecutive "1"s was X, which would allow me to estimate the cardinality of the set.