Replying to DefiantDandelion

Ok here is my idea:

Within the client allow these rules to be modifiable in settings.

Then, for each note, score it across these dimensions, which are proxies for notes that contain valuable information: +1 when the note passes a rule, 0 when it fails. Here x is a variable chosen by the user, with sane defaults chosen by you.

1. Number of unique character symbols over x

2. Message length, counting only non-whitespace characters, over x

3. Number of Likes over x

4. Subscribed (is the note from a npub that is on a predefined list that the user is interested in)

5. Zaps total over x

6. Count of unique words over x

7. Contains a hashtag from a predefined list of interest

8. Contains a link

9. Contains the strings .jpg, .png, or .gif

10. Number of zaps over x

11. Number of comments on the note

Then allow the user to set a point threshold (min 0, max 11): notes at or above the threshold are shown, the rest are hidden.
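A minimal sketch of the rule scoring in Python. The field names (`content`, `pubkey`, `likes`, `zap_total`, etc.) and the default thresholds are my assumptions, not anything standardized; a real client would map these onto its own note/event model and settings:

```python
import re

# Hypothetical default thresholds; in the client these would be the
# user-modifiable settings described above.
DEFAULTS = {
    "min_unique_chars": 20,
    "min_length": 50,       # non-whitespace characters
    "min_likes": 3,
    "min_zap_total": 1000,  # sats
    "min_unique_words": 10,
    "min_zap_count": 2,
}

def score_note(note, cfg=DEFAULTS, subscribed=frozenset(), hashtags=frozenset()):
    """Score a note dict 0-11: +1 for each rule it passes."""
    content = note["content"]
    words = set(re.findall(r"\w+", content.lower()))
    rules = [
        len(set(content)) > cfg["min_unique_chars"],              # 1. unique characters
        len(re.sub(r"\s", "", content)) > cfg["min_length"],      # 2. non-whitespace length
        note.get("likes", 0) > cfg["min_likes"],                  # 3. likes
        note.get("pubkey") in subscribed,                         # 4. subscribed npub
        note.get("zap_total", 0) > cfg["min_zap_total"],          # 5. zap total
        len(words) > cfg["min_unique_words"],                     # 6. unique words
        bool(hashtags & {w[1:].lower() for w in content.split()
                         if w.startswith("#")}),                  # 7. hashtags of interest
        "http://" in content or "https://" in content,            # 8. contains a link
        any(ext in content for ext in (".jpg", ".png", ".gif")),  # 9. image strings
        note.get("zap_count", 0) > cfg["min_zap_count"],          # 10. zap count
        note.get("comments", 0) > 0,                              # 11. has comments
    ]
    return sum(rules)

def visible(note, threshold=4, **kw):
    """Apply the user's point threshold (0-11)."""
    return score_note(note, **kw) >= threshold
```

Rules 1, 2, 6-9 only need the note text; rules 3-5, 10, 11 need extra fetches, which is where the data and computational cost differs.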

Once you have a client that can do this, the next iteration would be to put a learning algorithm (xgboost, or something simpler) in the client and train it on how these rules correlate with what the user likes, zaps, or comments on.

This enables custom weights for the rules instead of just +1 for each pass.

Then, just like above, you score every note with the algorithm and allow the user to set a threshold.
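To make the weight-learning idea concrete, here is a sketch using plain logistic regression as the "or simpler" stand-in for xgboost. The feature vectors are the 0/1 outcomes of the rules for past notes, and the labels are whether the user engaged (liked/zapped/commented); all names here are illustrative:

```python
import math

def train_weights(rule_vectors, engaged, epochs=200, lr=0.1):
    """Learn per-rule weights with logistic regression (a much simpler
    stand-in for xgboost).  rule_vectors: lists of 0/1 rule outcomes for
    past notes; engaged: 1 if the user liked/zapped/commented, else 0."""
    n = len(rule_vectors[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(rule_vectors, engaged):
            p = 1 / (1 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
            err = p - y
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b

def weighted_score(x, w, b):
    """Score in (0, 1) -- weighted by learned importance, not a flat +1."""
    return 1 / (1 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
```

Rules that predict engagement end up with large positive weights; rules that don't end up near zero, so the user's threshold now acts on a learned score.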

Some UI ideas: let the user see the scores on the notes, and preview the feed so they can toggle things on and off and see how it changes the feed.

Allow a reset to sane defaults.

Allow the toggling of all settings so the user is in charge of their algorithm.

Happy to discuss more if interested.

This is great, thank you. I'll try to create the filter mostly based on scanning the post content, because fetching all the likes/zaps can become a bit intensive with many posts.

Yeah, I figured they would have different data and computational penalties. Even some of the scanning rules may be too much. For example, one additional rule I left out (or this could be a standalone alternative) is similarity to other notes: take the Levenshtein distance between notes after removing punctuation and whitespace and uppercasing all letters. The distance matrix gives you each note's uniqueness, measured against all other notes; then show only the most unique notes from the time window. I figured that would be too heavy on the CPU, but perhaps not?

You could also do unique-word comparisons with a similar similarity matrix, which might be faster.

Strip each note down to its set of unique words. Then compare the sets in a matrix of all notes against all other notes: count the words two sets have in common, and divide that count by the number of unique words (or some other normalizer).
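That word-set comparison can be sketched as Jaccard similarity; dividing the common-word count by the size of the combined word set is one plausible reading of the normalization above:

```python
import re

def word_set(text):
    """Strip a note down to its set of unique (lowercased) words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Words in common divided by total unique words across both sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def similarity_matrix(notes):
    """All-pairs similarity; set intersection/union is much cheaper
    than per-character edit distance on the same notes."""
    sets = [word_set(n) for n in notes]
    return [[jaccard(si, sj) for sj in sets] for si in sets]
```

Each pairwise comparison is linear in the word counts rather than quadratic in the character counts, which is why this should be faster than the Levenshtein version.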

There are a lot of directions you could go in.