My query to hundreds of relays for thousands of pubkeys is still running after more than a day.. but I hope it will finish tonight
I downloaded gigabytes of data from millions of events .. let’s see what I find out
The 80% be like
"Thank me later"
Don't be so sure, eh.. tomorrow I'll sit down and take a proper look
Let us know how it goes! Is this everything within 3 degrees of separation from your pubkey?
Yes, only now I'm doing it properly, which is why it's taking longer. But I'm going to look at several things in there and share them as I go
For example, how many relays my notes are on.. which relays have the most events.. and then things about the people I follow.. who writes the most, what the distribution looks like, etc
If there's anything you're curious about, tell me and I'll see if I can pull it out
To start with, checking whether the Pareto rule holds: 20% of the userbase accounts for 80% of the notes
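A rough sketch of how that Pareto check could look in Python, assuming you already have one author pubkey per downloaded event. The function name `top_share` and the toy data are made up for illustration, not taken from the actual script:

```python
from collections import Counter

def top_share(author_per_event, top_fraction=0.2):
    """Fraction of all events written by the most active `top_fraction` of authors."""
    counts = Counter(author_per_event)
    if not counts:
        return 0.0
    # Event counts per author, most active first.
    ranked = sorted(counts.values(), reverse=True)
    # Number of authors in the top group (at least one).
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical toy data: author "a" wrote 16 notes, four others wrote 1 each.
toy_authors = ["a"] * 16 + ["b", "c", "d", "e"]
print(top_share(toy_authors))  # share of notes written by the top 20% of authors
```

If the result on the real data comes out near 0.8, the 80/20 rule roughly holds for that set of pubkeys.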
What sort of data lake are you using?
Also, curious how you're querying it all, Rust?
Just with my laptop, storing the data in a PostgreSQL database. I’m using Python, picking pieces from different repos that I found online. Happy to share more details
You going to make some cool metrics/dashboard or something? I imagine nostr is hard to map without something like this.
How are you doing relay discovery?
You just ping this endpoint and it returns all online relays: https://api.nostr.watch/v1/online
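Something like this could do the relay discovery step. It is only a sketch: it assumes the endpoint answers with a JSON array of relay URL strings, and the helper names `parse_relay_list` and `fetch_online_relays` are hypothetical:

```python
import json
import urllib.request

ONLINE_RELAYS_URL = "https://api.nostr.watch/v1/online"

def parse_relay_list(payload):
    """Return unique websocket URLs from a JSON array of strings (assumed response shape)."""
    relays = []
    for item in json.loads(payload):
        # Skip anything that is not a ws:// or wss:// URL.
        if isinstance(item, str) and item.startswith(("ws://", "wss://")):
            url = item.rstrip("/")
            if url not in relays:
                relays.append(url)
    return relays

def fetch_online_relays(url=ONLINE_RELAYS_URL, timeout=10):
    """Download the list of online relays and clean it up."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_relay_list(resp.read().decode("utf-8"))
```

Calling `fetch_online_relays()` needs network access; `parse_relay_list` can be tried offline on any saved response.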
I’ll do some stuff with that data .. not with a specific plan but I have some ideas
For now I collected all my follows' pubkeys, their follows and their follows’ follows, which totaled around 30k pubkeys
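For anyone curious, the three-level expansion is basically a breadth-first walk over follow lists. A minimal sketch, assuming the follow lists are already loaded into a dict (on nostr they would come from the "p" tags of each pubkey's kind 3 contact list event, per NIP-02). The name `expand_follows` and the toy graph are hypothetical:

```python
def expand_follows(root, follows, max_depth=3):
    """Collect every pubkey reachable from `root` within `max_depth` follow hops."""
    seen = {root}
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for pubkey in frontier:
            for followed in follows.get(pubkey, []):
                if followed not in seen:
                    seen.add(followed)
                    next_frontier.append(followed)
        frontier = next_frontier
    seen.discard(root)  # keep only the follows, not the starting pubkey
    return seen

# Hypothetical toy graph: "e" is four hops away and stays out.
toy_follows = {"me": ["a", "b"], "a": ["c"], "c": ["d"], "d": ["e"]}
print(sorted(expand_follows("me", toy_follows)))
```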
I finished querying online relays and I was able to extract 45 days of all events from those pubkeys from about 200 relays .. there were some relays I couldn’t connect to and some that didn’t have any data
I will now explore things like: how my posts propagate to relays I didn’t publish to.. or which relays have most of the activity, or which pubkeys among those 30k make up 80% of the activity .. etc etc
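A small sketch of how the propagation and relay activity questions could be answered, assuming the download kept one (event id, relay URL) pair for every copy of an event received. `relay_coverage` and the toy pairs are made up for illustration:

```python
from collections import Counter, defaultdict

def relay_coverage(sightings):
    """Map each event to the relays holding it, and count distinct events per relay."""
    relays_per_event = defaultdict(set)
    for event_id, relay in sightings:
        relays_per_event[event_id].add(relay)
    events_per_relay = Counter()
    for relays in relays_per_event.values():
        events_per_relay.update(relays)  # each event counts once per relay
    return relays_per_event, events_per_relay

# Hypothetical toy data: e1 was seen on two relays, one of them twice.
toy = [("e1", "wss://r1"), ("e1", "wss://r2"), ("e2", "wss://r1"), ("e1", "wss://r1")]
per_event, per_relay = relay_coverage(toy)
print(len(per_event["e1"]))      # number of relays that hold e1
print(per_relay.most_common(1))  # busiest relay by distinct events
```

Comparing `per_event[my_note_id]` with the relays you actually published to would show where a note propagated on its own.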
So if you have any ideas, I'm happy to consider including them
I may.. if I have time .. clean up that script and make it public so others can play around if they want to download about 5 GB of events :)
What experiment are you up to!
Just curiosity, to see how the info is distributed across relays.. how events are distributed across users and several other things .. I'm not sure yet.. I'm just sending a query for roughly 30 thousand pubkeys to more than 200 relays and asking them for everything they have from the last 45 days
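In NIP-01 terms, that request is a `REQ` message with an `authors` list and a `since` timestamp. A sketch of building those messages, assuming the pubkeys get split into chunks because many relays limit filter size. The chunk size of 500, the `sub-N` subscription ids and the function name `build_req_messages` are arbitrary choices, not something from the real script:

```python
import json
import time

def build_req_messages(pubkeys, days=45, chunk_size=500, now=None):
    """Build NIP-01 REQ messages asking for all events by `pubkeys` in the last `days` days."""
    now = int(time.time()) if now is None else now
    since = now - days * 86400  # unix timestamp, in seconds
    messages = []
    for start in range(0, len(pubkeys), chunk_size):
        event_filter = {"authors": pubkeys[start:start + chunk_size], "since": since}
        sub_id = f"sub-{start // chunk_size}"
        messages.append(json.dumps(["REQ", sub_id, event_filter]))
    return messages

# Hypothetical placeholder pubkeys, two per message.
for message in build_req_messages(["aa", "bb", "cc"], chunk_size=2):
    print(message)
```

Each message would then be sent over a websocket to every relay, reading `EVENT` responses until the relay answers `EOSE` for that subscription.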