Amethyst、あぁアメジスト、アメジスト ("Amethyst, ah, Amethyst, Amethyst")
Discussion
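↑ This post was finally classified as Japanese. The classifier probably looks at the character ratio: a post does not come out as lang=ja unless it has more Japanese characters than alphanumeric ones.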
#[2] How does nostrich/universe decide which text to collect as Japanese with lang=ja? In my brief testing, a post seems to be included when it has more Japanese characters than alphanumeric characters.
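The guess above can be written down as a small check. This is only a sketch of the hypothesis (more Japanese characters than ASCII alphanumerics), not how the relay actually classifies text; the function names and the choice of Unicode ranges are assumptions made for illustration.

```python
def is_japanese_char(ch: str) -> bool:
    """True for hiragana, katakana, and common CJK ideographs (assumed ranges)."""
    code = ord(ch)
    return (
        0x3040 <= code <= 0x309F      # hiragana
        or 0x30A0 <= code <= 0x30FF   # katakana
        or 0x4E00 <= code <= 0x9FFF   # CJK unified ideographs
    )


def count_chars(text: str) -> tuple[int, int]:
    """Return (japanese_count, ascii_alphanumeric_count) for the text."""
    japanese = sum(1 for ch in text if is_japanese_char(ch))
    alnum = sum(1 for ch in text if ch.isascii() and ch.isalnum())
    return japanese, alnum


def looks_japanese(text: str) -> bool:
    """Hypothetical rule: Japanese characters must outnumber alphanumerics."""
    japanese, alnum = count_chars(text)
    return japanese > alnum


if __name__ == "__main__":
    for sample in ["Amethyst、あぁアメジスト、アメジスト", "Amethyst アメジスト"]:
        print(sample, count_chars(sample), looks_japanese(sample))
```

Punctuation such as 、 is counted in neither group here, which is one of several places where a real classifier could behave differently.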
It uses FastText for the classification, so I think the result depends on the trained model.
Ah, I see. I'll switch the relay to relay-jp.nostr.wirednet.jp for the bot. Thank you.
Classification works better on long articles; short content needs to be pre-processed first.
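What the pre-processing involves is not spelled out here. As one possible reading, the sketch below strips things that carry no language signal in a short post (URLs and `#[n]` mention placeholders) before the text is handed to a classifier. The `preprocess` function and both patterns are assumptions, not the relay's actual pipeline.

```python
import re

# Assumed noise patterns for short Nostr posts.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"#\[\d+\]")


def preprocess(text: str) -> str:
    """Remove URLs and #[n] mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("#[2] こんにちは https://example.com/a  テスト"))
```

Removing these tokens matters most for short posts, where a single URL can contain more alphanumeric characters than the rest of the message.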