Offering many very specific labels: a vector database like Weaviate or FAISS coupled with an LLM can do it, and do it well, but man... we are talking datacenter-level resources here.

Not many organisations can offer that, and I'm not sure I want them curating my feed.

Offering a couple of dozen general categories, though, is something we could run in the client or on a modest VPS, maybe with a BM25 full-text search for specific terms (which will be slow).
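For anyone curious what the BM25 part involves, here is a minimal sketch of Okapi BM25 over notes already held in memory. It assumes a naive lowercase/whitespace tokenizer and the common defaults k1 = 1.5, b = 0.75; the class and method names are illustrative, not from any particular client or library.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace only.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory list of notes (illustrative sketch)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.doc_lens = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency: how many notes contain each term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # IDF with +1 inside the log so it never goes negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf = self.tfs[i]
        # Length normalisation: longer notes are penalised relative to the average.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        # Linear scan over every note for every query.
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored = [pair for pair in scored if pair[1] > 0]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


notes = [
    "bitcoin lightning node setup",
    "sourdough bread recipe",
    "running a lightning node on a raspberry pi",
]
index = BM25(notes)
print([i for i, _ in index.search("lightning node")])
```

The `search` method scores every note on every query, which is one reason this gets slow on a big local cache; an inverted index would avoid the full scan at the cost of more code and memory.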

Discussion

General categories we can do with a bag-of-words (BoW) filter fed forward into a modest CNN.
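To show the bag-of-words stage concretely, here is a small sketch that turns notes into word counts and classifies them into a handful of general topics. To keep it dependency-free, it swaps in a multinomial naive Bayes classifier where the comment above proposes a CNN; the training examples and topic names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text: str) -> Counter:
    # Bag-of-words: word order is discarded, only per-word counts are kept.
    return Counter(text.lower().split())


class NaiveBayesTopics:
    """Multinomial naive Bayes over BoW counts (stand-in for the CNN stage)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesTopics":
        for text, label in examples:
            bow = bag_of_words(text)
            self.word_counts[label].update(bow)
            self.doc_counts[label] += 1
            self.vocab.update(bow)
        return self

    def predict(self, text: str) -> str:
        bow = bag_of_words(text)
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior for the topic, plus log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word, c in bow.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (counts[word] + self.alpha) / (total_words + self.alpha * v)
                score += c * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data for two general topics.
model = NaiveBayesTopics().fit([
    ("bitcoin price lightning sats", "bitcoin"),
    ("lightning channel sats zap", "bitcoin"),
    ("sourdough bread flour oven", "cooking"),
    ("pasta sauce garlic oven", "cooking"),
])
print(model.predict("zap me some sats"))
```

A CNN would learn richer features than independent word counts, but the input side is the same: each note becomes a fixed-size count vector over a vocabulary, which is cheap enough to compute on a phone.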

Specific categories need a serious LLM and a vector database of context, or else accuracy will be hilarious.
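The core lookup a vector database performs is nearest-neighbour search over embeddings. Below is a brute-force cosine-similarity sketch of that step. It assumes each note and each label already has an embedding; the 3-dimensional vectors and label names are invented, whereas real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_labels(note_vec: list[float], label_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    # Exact brute-force search: every query is compared against every label vector.
    ranked = sorted(label_vecs, key=lambda name: cosine(note_vec, label_vecs[name]), reverse=True)
    return ranked[:k]


# Made-up embeddings for a few very specific labels.
label_vecs = {
    "nostr-dev": [0.9, 0.1, 0.0],
    "bitcoin-mining": [0.1, 0.9, 0.1],
    "baking": [0.0, 0.1, 0.9],
}
print(nearest_labels([0.8, 0.3, 0.0], label_vecs))
```

The scan costs one similarity per label per note, so with many fine-grained labels and a firehose of notes, plus the model needed to embed every note, the resource bill climbs quickly; dedicated vector databases use approximate indexes to cut the scan down.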

We can run multiple bots, each using a different pubkey. People will decide to follow whatever works best for them. We could have multiple algorithms running in parallel.
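On the client side, "follow whichever bot works best" could look like the sketch below: only label events signed by labeler pubkeys the user follows are applied to notes. The event shape is loosely modeled on NIP-32 label events (kind 1985, with `l` tags for labels and `e` tags for the labeled note), but the pubkeys and note ids are placeholders and the parsing is deliberately simplified.

```python
from collections import defaultdict

LABEL_KIND = 1985  # NIP-32 label event kind


def labels_by_note(events: list[dict], followed_labelers: set[str]) -> dict[str, set[str]]:
    """Map note id -> labels, using only label events from followed labeler pubkeys."""
    result: dict[str, set[str]] = defaultdict(set)
    for ev in events:
        # Skip non-label events and labels from bots the user does not follow.
        if ev.get("kind") != LABEL_KIND or ev.get("pubkey") not in followed_labelers:
            continue
        tags = ev.get("tags", [])
        note_ids = [t[1] for t in tags if len(t) > 1 and t[0] == "e"]
        labels = [t[1] for t in tags if len(t) > 1 and t[0] == "l"]
        for note_id in note_ids:
            result[note_id].update(labels)
    return dict(result)


# Placeholder events from two hypothetical labeler bots and one ordinary note.
events = [
    {"kind": 1985, "pubkey": "bot_a", "tags": [["L", "topics"], ["l", "bitcoin", "topics"], ["e", "note1"]]},
    {"kind": 1985, "pubkey": "bot_b", "tags": [["L", "topics"], ["l", "cooking", "topics"], ["e", "note1"]]},
    {"kind": 1, "pubkey": "alice", "tags": []},
]
print(labels_by_note(events, {"bot_a"}))
```

Switching algorithms is then just changing the set of followed labeler pubkeys; several bots can label the same notes in parallel without coordinating.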

We could. There will be a limited number of actors able to finance such a service, however.

If I may make a suggestion, we could run the model in the client, using notes already downloaded.

A default topic model could be downloaded on first run or bundled with the app (~50 MB).

Menu of topic models somewhere in settings:

- L.I.V's Mad Science Topic Model

- Leserin's Overthinking Everyday Topic Model

- Onyx's I Know What Boys Like Topic Model

Etc.

Building a model is less of a commitment than hosting one, and the processing is offloaded to the client instead of relying on angel funding or whatever.

You can also put the labels behind a private relay only paying customers can access.

That works, too.