
How do I use regex in C# to match Chinese characters without also matching Japanese kana?

In C#, \\p{Han} matches both Chinese characters and Japanese hiragana and katakana. I want to distinguish between them, so what should I do? Should I convert each char to its Unicode code point and then check whether it falls within a given range?

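// For Chinese characters: tests the CJK Unified Ideographs block (U+4E00 to U+9FFF).
// A char is a single UTF-16 code unit, so code points at or above 0x10000 cannot be
// compared this way. Japanese kanji live in the same block. Requires System.Linq.
public bool IsChinese(string text)
{
    return text.Any(c => c >= 0x4E00 && c <= 0x9FFF);
}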

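// For Japanese characters: returns the chars of text whose value lies in [min, max].
// Requires System.Collections.Generic and System.Linq.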
private static IEnumerable<char> GetCharsInRange(string text, int min, int max)
{
    return text.Where(e => e >= min && e <= max);
}

Usage:

var romaji = GetCharsInRange(searchKeyword, 0x0020, 0x007E);
var hiragana = GetCharsInRange(searchKeyword, 0x3040, 0x309F);
var katakana = GetCharsInRange(searchKeyword, 0x30A0, 0x30FF);
var kanji = GetCharsInRange(searchKeyword, 0x4E00, 0x9FFF); // full CJK Unified Ideographs block
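Since the question asks about regex specifically, the same ranges can probably be expressed with the named Unicode blocks that .NET regular expressions support (\p{IsCJKUnifiedIdeographs}, \p{IsHiragana}, \p{IsKatakana}). The following is only a minimal sketch under that assumption; the ScriptPatterns class and its method names are hypothetical and not part of the original answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the class and method names are assumptions.
public static class ScriptPatterns
{
    // Named block U+4E00..U+9FFF: Han ideographs, shared by Chinese and Japanese kanji.
    private static readonly Regex Ideograph =
        new Regex(@"\p{IsCJKUnifiedIdeographs}");

    // Named blocks U+3040..U+309F (hiragana) and U+30A0..U+30FF (katakana).
    private static readonly Regex Kana =
        new Regex(@"[\p{IsHiragana}\p{IsKatakana}]");

    // True if the text contains at least one ideograph from the basic block.
    public static bool ContainsIdeograph(string text) => Ideograph.IsMatch(text);

    // True if the text contains at least one hiragana or katakana character.
    public static bool ContainsKana(string text) => Kana.IsMatch(text);

    // Heuristic only: ideographs are present and no kana is present.
    public static bool LooksChinese(string text) =>
        ContainsIdeograph(text) && !ContainsKana(text);
}
```

With this sketch, ScriptPatterns.LooksChinese("汉字") is true, while ScriptPatterns.LooksChinese("ひらがな漢字") is false because the string contains hiragana. A Japanese word written entirely in kanji, such as "東京", would still pass, since the ideographs themselves are shared between the two languages; the absence of kana is the only signal used here.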


 