I’m thinking I’ll work on some type of npub authenticity rating next. Don’t have it figured out yet, but planning to use some of these:

- How repetitive is their kind 1 content?

- Kind distribution

- Tag distribution (only replies vs only root posts, mentions, what is normal mix?)

- Age of account

- Age of oldest follow, age of newest follow and the delta

- Centralized NIP-05 provider w/ bad reputation

- Profile very similar to existing one (impersonators)

- Mute lists

- Kind 1984 reports (spam only)

- Link in every post? Known bad links?
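A minimal sketch of two of the signals above (repetitive kind 1 content and "link in every post"), assuming the notes for one npub have already been pulled out as plain strings. The function names and the 3-word shingle size are illustrative choices, not anything settled in this thread.

```python
import re
from itertools import combinations

URL_RE = re.compile(r"https?://\S+")

def shingles(text: str, n: int = 3) -> set:
    """Set of word n-grams in a note, lowercased."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def repetitiveness(notes: list[str], n: int = 3) -> float:
    """Mean pairwise Jaccard similarity across one author's notes."""
    sets = [shingles(t, n) for t in notes]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def link_ratio(notes: list[str]) -> float:
    """Fraction of notes that contain at least one http(s) link."""
    if not notes:
        return 0.0
    return sum(1 for t in notes if URL_RE.search(t)) / len(notes)
```

The pairwise comparison is quadratic in the number of notes, so for busy accounts it would probably make sense to score only a recent sample.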

Other ideas?

Discussion

Wow. That was pretty thorough.

You can also check the time delta on the replies. Bots usually reply too fast with a lot of text.

I started writing them down a few days ago. That’s a good one, speed of reply vs time of root post. Maybe even required words per minute to be able to make that reply!
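A rough sketch of the "required words per minute" idea, assuming both events carry the usual `created_at` unix timestamp in seconds. The function name is made up for illustration, and any cutoff for "too fast" would still need to be worked out from real data.

```python
def required_wpm(root_created_at: int, reply_created_at: int, reply_text: str) -> float:
    """Words per minute needed to type this reply in the time since the root post.

    This is a lower bound on typing speed: the elapsed time also covers
    noticing and reading the root post.
    """
    words = len(reply_text.split())
    if words == 0:
        return 0.0
    elapsed = reply_created_at - root_created_at
    if elapsed <= 0:
        # Reply timestamped at or before the root post: impossible for a human.
        return float("inf")
    return words * 60 / elapsed
```

For example, a 100-word reply posted 10 seconds after the root post works out to 600 wpm, far beyond what anyone types. One caveat: `created_at` is set by the client, so timestamps can be forged or skewed.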

As long as it doesn’t bias against new users too much.

I’m still trying to figure this out but that’s the goal. I’m trying to focus less on “popularity” metrics and more on “does this look like a normal user?”

- Replies or reactions from reputable or non reputable npubs

- Reputable Followers

I want to include these as well, but it starts to turn the score into a popularity rank rather than an authenticity rating, and I want to make sure new users can still be “authentic”.

Probably just have to be careful how we weight things.

Also! Bots don’t sleep. Humans do.

YES! That’s a great one. Some type of daily/weekly time off measurement. Once we figure out what’s normal that should be a great bot indicator.
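One way the daily time-off measurement could look, as a sketch: bucket an npub's event timestamps by hour of day and find the longest run of quiet hours, wrapping around midnight. The helper names and the `max_events` tolerance are assumptions, and what counts as a "normal" quiet run still has to come from real data.

```python
def hourly_counts(timestamps: list[int]) -> list[int]:
    """Count events per hour of day (UTC) from unix timestamps in seconds."""
    counts = [0] * 24
    for ts in timestamps:
        counts[(ts % 86400) // 3600] += 1
    return counts

def longest_quiet_run(counts: list[int], max_events: int = 0) -> int:
    """Longest circular run of consecutive hours with at most max_events events."""
    quiet = [c <= max_events for c in counts]
    if all(quiet):
        return 24
    best = run = 0
    # Walk the day twice so a run that wraps past midnight is counted whole.
    for q in quiet + quiet:
        run = run + 1 if q else 0
        best = max(best, run)
    return best
```

Because the run is circular, the result does not depend on the user's timezone (to the nearest hour). An account that posts in every hour of the day scores 0, while someone active from 08:00 to 22:59 scores 9.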

Sidetracking a little - what are your thoughts on a relay marketplace? It would list all relays, including yours, and categorise them as free, private, community-based, channel-based, etc. Each relay could have its features listed - I remember you had many different relays for different purposes as well.

In this case, users get to hop on to this marketplace and pick and choose the kind of "feel" they want to have on their nostr, based on the relays. That indirectly helps users understand Nostr and build a real Nostr experience.

For example:

- If they want a more FB-like, community-based feel, maybe they pick private relays + clients like Coracle.

- If they want a Twitter-like feel, perhaps free + private relays + clients like Primal and Damus.

- If they want an IG-like feel, maybe exclusive relays with influencers or a fashion focus, with clients like Nostrgram or Iris with the image tiles.

For a relay marketplace developer this could bring additional revenue as well as promote the relay concept. Eventually we may have many relay marketplaces, which is fine, but for now it may subconsciously help users build their Nostr world.

I'm just putting this thought out there, maybe to you and nostr:npub1nlk894teh248w2heuu0x8z6jjg2hyxkwdc8cxgrjtm9lnamlskcsghjm9c, in case you have more suggestions and thoughts, or if anybody wants to build it. I'm sure there are pros and cons to this idea. I don't know who the relay developers here are anymore tbh.

A relay marketplace sounds cool, perhaps it could go on a nostr informational site for new users (like nostr.how or similar).

If they’re interested, this could be a collaboration with Nostr.watch. They already have a great listing of relays and the ability to check latency to your location, etc.

Could be useful for spam fighting, but it looks like a lot of work. In the future, if the spammers get clever, more data would be needed, yes.

That’s what I’d like to use it for yes. Right now both of our simple rules are still working 😁

I think you can also add some gibberish detection (nonsense random words or random strings). I have seen some models and Spaces on Hugging Face for this. It can help detect unsophisticated spam bots that use random characters or random nonsense words.
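To make the gibberish idea concrete, here is a deliberately crude heuristic, not one of the Hugging Face models mentioned above: it flags words with an implausible vowel ratio or a long consonant run. The thresholds and function names are guesses for illustration, and it only looks at ASCII letters, so non-Latin text is simply not scored.

```python
import re

# Strip links and nostr: references first, since they look random by design.
REF_RE = re.compile(r"(https?://|nostr:)\S+")
WORD_RE = re.compile(r"[a-z]{4,}")
CONSONANT_RUN_RE = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")

def looks_random(word: str) -> bool:
    """True if a lowercase ASCII word has an implausible letter pattern."""
    vowel_ratio = sum(c in "aeiouy" for c in word) / len(word)
    if vowel_ratio < 0.2 or vowel_ratio > 0.8:
        return True
    return bool(CONSONANT_RUN_RE.search(word))

def gibberish_score(text: str) -> float:
    """Fraction of words (4+ ASCII letters) that look randomly generated."""
    cleaned = REF_RE.sub(" ", text).lower()
    words = WORD_RE.findall(cleaned)
    if not words:
        return 0.0
    return sum(looks_random(w) for w in words) / len(words)
```

This would only catch random-character spam. Random dictionary words would need something like a language model or word-pair statistics, which is where a trained classifier starts to make more sense.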

Good one, thank you. I looked into this a bit a few months back when some kid was experimenting with randomly generated spam, but never implemented anything.

It might be that we end up making this so complicated that it makes more sense to just label some data and train our own model instead of managing infinite rules. Fortunately, that’s one of the many things nostr:npub1qlkwmzmrhzpuak7c2g9akvcrh7wzkd7zc7fpefw9najwpau662nqealf5y can handle if we get there.

Yes, you could probably make it a separate, independent service (plugin) instead of rules in the strfry policy, especially if it takes more than one second to classify one event's content. Let the event come in first, process it in a queue, and delete it later if it is detected as high-probability gibberish spam.

Yes, I think you and nostr:npub1qlkwmzmrhzpuak7c2g9akvcrh7wzkd7zc7fpefw9najwpau662nqealf5y can manage that easily for those problems 🙂
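The accept-first, classify-later flow could be sketched like this, using only the Python standard library. `classify` and `delete` are hypothetical callbacks: the first stands in for whatever slow model scores the content, the second for whatever deletion mechanism the relay offers. The 0.9 threshold is likewise just a placeholder.

```python
import queue
from typing import Callable

def run_classifier_worker(
    events: queue.Queue,
    classify: Callable[[str], float],
    delete: Callable[[str], None],
    threshold: float = 0.9,
) -> None:
    """Score already-accepted events off the hot path; delete likely spam.

    Runs until it receives None as a shutdown signal.
    """
    while True:
        event = events.get()
        try:
            if event is None:
                return
            # classify() may be slow; that is fine, the relay is not waiting on it.
            if classify(event["content"]) >= threshold:
                delete(event["id"])
        finally:
            events.task_done()
```

The relay side would only need to push each accepted event onto the queue and start the worker in a thread or separate process, so a slow classifier never delays writes.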

Build an "originality" index, an "educational" index and a "positiveness" index. Originality should praise people with new types of views/content. Educational should praise people with not necessarily new views but with good delivery of such content. And positiveness should measure how grumpy posts are.

I would definitely subscribe to the grumpy feed.

Cool ideas. We can refine the topic classifier to help with these. Have a lot of work to do on that to improve accuracy and reduce cost.

What if, instead of building it yourself, you offered a marketplace of algos that data scientists out there could create and run on your local db? They could then export their results as a paid DVM and pay you out of those profits.

Cool in theory. In practice I don’t think anyone has ever even made a visualization in our elasticsearch so I’m skeptical how much participation I should expect 😂

If you build it they will come etc etc though so I’m open to it.

I use your elastic search :)

I just don't think you advertise it enough.

Also, they need to see the connection with money.

Imagine how many students could do a project in this if it was easy. Maybe an integration with Jupyter?

Anyway, I think there is an opportunity for relays to sell access to data to other devs that can't be bothered building a relay themselves.

I like that you think big. Honestly even with just elasticsearch I bet a savvy user could put together advanced queries (outside of kibana) that do a lot of cool things.

Maybe as a starting point I can come up with a simple way to expose the ES Search API directly (perhaps for a small fee) so that anyone can build more advanced queries than our search API/kibana/NIP-50.
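As a rough idea of what direct ES access would allow beyond plain text search, here is a sketch of a request body in standard Elasticsearch query DSL: it finds kind 1 notes containing a phrase since a given time and groups them by author. The field names (`content`, `kind`, `created_at`, `pubkey`) mirror nostr event fields, but the actual index mapping is an assumption, as is the helper name.

```python
import json

def build_phrase_by_author_query(phrase: str, since: int, top_n: int = 10) -> dict:
    """Build an Elasticsearch search body: who posted this phrase most often?"""
    return {
        "size": 0,  # only the aggregation is wanted, not the matching notes
        "query": {
            "bool": {
                "must": [{"match_phrase": {"content": phrase}}],
                "filter": [
                    {"term": {"kind": 1}},
                    {"range": {"created_at": {"gte": since}}},
                ],
            }
        },
        "aggs": {
            # Assumes pubkey is indexed as a keyword field.
            "top_authors": {"terms": {"field": "pubkey", "size": top_n}}
        },
    }

if __name__ == "__main__":
    print(json.dumps(build_phrase_by_author_query("free sats", 1700000000), indent=2))
```

A body like this would be sent to the index's `_search` endpoint, which is the kind of thing a repetitive-content check could be built on.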

Might need to beef up that machine though :)

Is this the search API on wine, or is there more?

There is a Kibana instance here as well for the more visually inclined: https://search.nostr.wine

Thank you. This is awesome 🤙