[bugfix] Extend parser to handle more non-Latin hashtags (#3700)

* Allow marks after NFC normalization

Includes regression test for the Tamil example from #3618

* Disallow just numbers + marks + underscore as hashtag
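The two bullets above describe a character rule for hashtags. Below is a minimal sketch of that rule under stated assumptions; it is not GoToSocial's code. Both helper names are hypothetical. The input is assumed to be NFC-normalized already, for example with `norm.NFC.String` from `golang.org/x/text/unicode/norm`. "Disallow just numbers + marks + underscore" is read here as "require at least one letter", which is also an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// isPermittedInHashtagSketch is a hypothetical helper: after NFC
// normalization it accepts letters, numbers, combining marks, and '_'.
// Accepting marks keeps scripts such as Tamil, whose vowel signs and
// virama remain separate mark code points after NFC.
func isPermittedInHashtagSketch(r rune) bool {
	return unicode.IsLetter(r) ||
		unicode.IsNumber(r) ||
		unicode.IsMark(r) ||
		r == '_'
}

// validHashtagSketch is a hypothetical check on a tag without its
// leading '#', assumed to be NFC-normalized already. Every rune must be
// permitted, and at least one letter must be present, so tags made only
// of numbers, marks, and underscores are rejected.
func validHashtagSketch(tag string) bool {
	hasLetter := false
	for _, r := range tag {
		if !isPermittedInHashtagSketch(r) {
			return false
		}
		if unicode.IsLetter(r) {
			hasLetter = true
		}
	}
	return hasLetter
}

func main() {
	for _, tag := range []string{"தமிழ்", "gotosocial", "123_", "1\u0301", "no-dash"} {
		fmt.Printf("%q valid=%v\n", tag, validHashtagSketch(tag))
	}
}
```

In this sketch `தமிழ்` passes because its vowel sign (U+0BBF) and virama (U+0BCD) count as marks. `123_` and `1` followed by a combining acute accent fail because neither contains a letter.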
Vyr Cossont 2025-01-31 02:42:55 -08:00 committed by GitHub
commit b9e0689359
GPG key ID: B5690EEEBB952194
5 changed files with 48 additions and 37 deletions


@@ -177,7 +177,7 @@ func (p *hashtagParser) Parse(
// Ignore initial '#'.
continue
-case !isPlausiblyInHashtag(r) &&
+case !isPermittedInHashtag(r) &&
!isHashtagBoundary(r):
// Weird non-boundary character
// in the hashtag. Don't trust it.