Makes sense!

Next annoying question:

Which topics do new users / all users want available to select? How many is too many?

I know you run a "new user" questionnaire - how many responses have you been getting, and do you think they're close to representative of the userbase now / in the future?

Discussion

There are lots of topics (millions?) in the replies. Most people want a mix, from very general things like "science" when they don't know much about a field all the way to specific things like "sha256" when they work in it. But we can make it work with whatever the bot outputs.

Maybe the bot can add the 5 most representative labels for each post?
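To make the "5 most representative labels" idea concrete, here is a minimal sketch. It assumes the bot already produces a score per label for each post; the `top_labels` function, the `min_score` cutoff, and the example scores are all hypothetical.

```python
import heapq

def top_labels(scores, k=5, min_score=0.0):
    """Return up to k labels with the highest scores, best first.

    `scores` maps label -> relevance score (an assumed bot output).
    Labels at or below `min_score` are dropped before ranking.
    """
    eligible = [(label, s) for label, s in scores.items() if s > min_score]
    best = heapq.nlargest(k, eligible, key=lambda item: item[1])
    return [label for label, _ in best]

# Made-up scores for a single post, for illustration only.
scores = {
    "science": 0.91, "cryptography": 0.84, "sha256": 0.80,
    "bitcoin": 0.42, "programming": 0.40, "cooking": 0.03, "sports": 0.0,
}
print(top_labels(scores))
```

Capping at five keeps the label set small enough to browse while still mixing general and specific terms.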

Offering many very specific labels: a vector database like Weaviate, or a similarity-search library like FAISS, coupled with an LLM can do it and do it well, but man... We are talking datacenter-level resources here.

Not many organisations can offer that, and I'm not sure I want them curating my feed.

Offering a couple of dozen general categories, though, is something we could run in the client or on a modest VPS, maybe with a BM25 full-text search for specific terms (which will be slow).
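For reference, a rough sketch of BM25 scoring over notes the client already has. It uses the common Okapi BM25 formula with the usual default parameters `k1=1.5` and `b=0.75`; the whitespace tokenizer and the sample notes are assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; a real client would want something better.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up notes, for illustration only.
notes = [
    "sha256 is a hash function",
    "i like science and cooking",
    "hashing with sha256 in python",
]
scores = bm25_scores("sha256", notes)
```

This version rescans every note on every query, which is where the "will be slow" comes from; an inverted index would help, at the cost of more storage in the client.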

General categories we can do with a BoW filter fed forward into a modest CNN.
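A toy sketch of the bag-of-words half of that idea. To stay dependency-free it swaps the CNN for a single hand-set linear layer, so it is a stand-in and not the model proposed above; the vocabulary, categories, and weights are all hypothetical.

```python
from collections import Counter

# Hypothetical vocabulary; a real one would be learned from a corpus.
VOCAB = ["hash", "sha256", "experiment", "recipe", "oven"]

# Hypothetical hand-set weights: one row per category, one column per VOCAB word.
# A trained CNN (or any trained classifier) would replace this table.
WEIGHTS = {
    "cryptography": [1.0, 1.5, 0.0, 0.0, 0.0],
    "science":      [0.2, 0.1, 1.2, 0.0, 0.0],
    "cooking":      [0.0, 0.0, 0.0, 1.3, 1.1],
}

def bag_of_words(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in VOCAB]

def classify(text):
    """Return the best-scoring category, or None if no vocabulary word matched."""
    vec = bag_of_words(text)
    scores = {cat: sum(w * x for w, x in zip(row, vec)) for cat, row in WEIGHTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(classify("sha256 hash of the file"))
print(classify("preheat the oven for this recipe"))
```

The fixed-size count vector is what makes this cheap enough for a client or a modest VPS: the input size depends on the vocabulary, not on the length of the note.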

Specific categories need a serious LLM and a vector database of context, or else accuracy will be hilarious.

We can run multiple bots, each using a different pubkey. People will decide to follow whatever works best for them. We could have multiple algorithms running in parallel.
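On the client side, "follow whatever works best" could be as simple as filtering label events by the labeler's pubkey. A minimal sketch, assuming a simplified event shape (`pubkey`, `note_id`, `labels`) that is hypothetical and not taken from any spec:

```python
def labels_for_note(note_id, label_events, followed_pubkeys):
    """Collect labels for one note, keeping only labelers the user follows."""
    labels = set()
    for event in label_events:
        if event["note_id"] == note_id and event["pubkey"] in followed_pubkeys:
            labels.update(event["labels"])
    return sorted(labels)

# Made-up events from three hypothetical labeling bots.
label_events = [
    {"pubkey": "bot_a", "note_id": "n1", "labels": ["science", "sha256"]},
    {"pubkey": "bot_b", "note_id": "n1", "labels": ["cryptography"]},
    {"pubkey": "bot_c", "note_id": "n1", "labels": ["spam"]},
    {"pubkey": "bot_a", "note_id": "n2", "labels": ["cooking"]},
]
print(labels_for_note("n1", label_events, {"bot_a", "bot_b"}))
```

Each algorithm stays independent, and switching between them is just a change to the followed set.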

We could. There will be a limited number of actors able to finance such a service, however.

If I may make a suggestion, we could run the model in the client, using notes already downloaded.

A default topic model could be downloaded on first run or bundled with the app (~50 MB).

Menu of topic models somewhere in settings:

- L.I.V's Mad Science Topic Model

- Leserin's Overthinking Everyday Topic Model

- Onyx's I Know What Boys Like Topic Model

Etc.

Building a model is less of a commitment than hosting one, and the processing is offloaded to the client instead of being paid for by angel funding or whatever.

You can also put the labels behind a private relay that only paying customers can access.

That works, too.