Hmm, it's not being detected as Japanese by universe. Why is that?

Discussion

Amethyst、あぁAmethyst、Amethyst

↑ This one isn't detected as Japanese either.

Amethyst、あぁAmethyst、あめじすと

Amethyst、あぁアメジスト、アメジスト

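↑ This one was finally detected as Japanese. It is probably looking at the ratio of characters: unless Japanese characters outnumber alphanumeric ones, the note doesn't show up under lang=ja.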
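The guess above can be written down as a minimal sketch. This is only the hypothesized "more Japanese characters than alphanumerics" rule, not what universe actually runs; the function names and the choice of Unicode ranges (hiragana, katakana, CJK unified ideographs) are assumptions made for illustration.

```python
def is_japanese_char(ch: str) -> bool:
    """Rough approximation: hiragana, katakana, or a CJK unified ideograph."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x309F      # hiragana
        or 0x30A0 <= cp <= 0x30FF   # katakana
        or 0x4E00 <= cp <= 0x9FFF   # CJK unified ideographs (kanji)
    )


def is_ascii_alnum(ch: str) -> bool:
    """ASCII letters and digits only; punctuation such as '、' is ignored."""
    return ch.isascii() and ch.isalnum()


def count_chars(text: str) -> tuple[int, int]:
    """Return (japanese_count, alphanumeric_count) for the given text."""
    ja = sum(1 for ch in text if is_japanese_char(ch))
    alnum = sum(1 for ch in text if is_ascii_alnum(ch))
    return ja, alnum


def looks_japanese(text: str) -> bool:
    """Hypothesized rule: Japanese characters must outnumber alphanumerics."""
    ja, alnum = count_chars(text)
    return ja > alnum


for note in (
    "Amethyst、あぁAmethyst、Amethyst",
    "Amethyst、あぁAmethyst、あめじすと",
    "Amethyst、あぁアメジスト、アメジスト",
):
    print(looks_japanese(note), count_chars(note), note)
```

On the three test notes this gives counts of (2, 24), (7, 16), and (12, 8), so only the last one passes, which matches what was observed. That agreement is consistent with the guess but does not confirm it.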

#[2] How does nostrich/universe decide which text to collect as Japanese under lang=ja? From my brief testing, a note seems to be included only when the number of Japanese characters is greater than the number of alphanumeric characters.

It uses fastText for the classification, so I think the result depends on the trained model.

Ah, I see. I'll switch the relay to relay-jp.nostr.wirednet.jp for the bot. Thank you.

Classification works better on long articles; short content needs to be pre-processed first.
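The thread does not say what that pre-processing involves. As one possible reading, the sketch below strips tokens that carry no language signal (URLs and indexed mentions such as `#[2]`) before a short note is handed to a classifier. The function name `preprocess` and the specific patterns are assumptions, not something universe is known to do.

```python
import re

# Assumed noise patterns for short notes (illustrative, not exhaustive).
URL_RE = re.compile(r"https?://\S+")
INDEXED_MENTION_RE = re.compile(r"#\[\d+\]")   # e.g. "#[2]"
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Remove URLs and indexed mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = INDEXED_MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(preprocess("#[2] こんにちは https://example.com/a?b=1 世界"))
```

One limitation: the URL pattern runs until the next whitespace, so Japanese text written directly after a URL with no space would be removed along with it.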