In C#, \\p{Han} matches both Chinese characters and Japanese hiragana and katakana. I want to distinguish between them, so what do I do? Turn each char into unicode then detect whether the character is in the range?
//For chinese chars
public bool IsChinese(string text)
{
return text.Any(c => c >= 0x20000 && c <= 0xFA2D);
}
//For japanese chars
private static IEnumerable<char> GetCharsInRange(string text, int min, int max)
{
return text.Where(e => e >= min && e <= max);
}
Usage:
var romaji = GetCharsInRange(searchKeyword, 0x0020, 0x007E);
var hiragana = GetCharsInRange(searchKeyword, 0x3040, 0x309F);
var katakana = GetCharsInRange(searchKeyword, 0x30A0, 0x30FF);
var kanji = GetCharsInRange(searchKeyword, 0x4E00, 0x9FBF);
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.