I spent a non-trivial amount of time trying, unsuccessfully, to solve it in Damus by extending the hashtag parsing code to pull in the icu4c library for multilocale Unicode parsing. The API was too confusing, so I stopped. Happy to have someone else try; it’s possible my approach isn’t the best.
Discussion
Hey nostr:npub1yaul8k059377u9lsu67de7y637w4jtgeuwcmh5n7788l6xnlnrgs3tvjmf
The following solution may work. I am not an expert in iOS app development (in fact, I've never done it), so please take what I say with a pinch of salt :D
From my preliminary analysis of the code on GitHub, the call flow in Damus goes like this 👇
```
damus.c :: parse_hashtags ==>
cursor.h :: consume_until_boundary ==>
cursor.h :: is_boundary
```
I believe the fix is to change how the `is_boundary` function is implemented in Damus.
Looking at how other clients implement hashtags, their regular-expression-based matching doesn't check for alphanumerics at all; instead it just excludes a set of prohibited characters (a blacklist).
Amethyst's regex is this:
```kotlin
"#([^\\s!@#\$%^&*()=+./,\\[{\\]};:'\"?><]+)(.*)"
```
Snort's regex is this:
```js
/(#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+)/
```
Both of these regexes simply blacklist the characters that can't appear in a hashtag, instead of whitelisting alphanumerics.
If we can come up with a C function that checks for blacklisted characters and returns a boolean, we could use it to implement international hashtags in Damus too.
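As a rough illustration, here is a minimal sketch of what such a blacklist check could look like. The function name and the exact character set (copied from the regexes above) are my own assumptions, not code from the Damus repo:
```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper (not actual Damus code): returns true if a byte
// should end a hashtag, mirroring the Amethyst/Snort blacklists above.
// Bytes >= 0x80 belong to multi-byte UTF-8 sequences and are never
// treated as boundaries, so non-Latin scripts pass through.
static bool is_hashtag_boundary_char(unsigned char c) {
	static const char blacklist[] = "!@#$%^&*()=+./,[{]};:'\"?><";

	if (c == '\0' || c == ' ' || c == '\t' || c == '\n' || c == '\r')
		return true;

	if (c >= 0x80)
		return false; /* part of a multi-byte UTF-8 character */

	return strchr(blacklist, (char)c) != NULL;
}
```
In principle, `consume_until_boundary` could delegate to a check like this instead of the current `is_boundary` logic, so hashtag consumption only stops at whitespace or one of the prohibited punctuation characters.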
Not sure if my theory is correct.
Maybe Apple’s Natural Language framework could help here. It can tokenize text in many languages and determine word boundaries.
I had a conversation with #[11] a while ago about this problem, and he asserts that Swift parsing code is slow on large notes, which is why it's done in C. nostr:note1r30jgsyepxu7he7zcxr703s7tdpvgeksurl7taldp0r5n5q2v8tql5x7kq