
C#: How to check whether the string a user entered in a TextBox is Chinese?

How can I check whether the string a user enters in a TextBox is Chinese? Can anyone guide me?

You probably need a simple statistical method. Count the characters whose code points fall within the Unicode ranges used for Chinese characters, and count the characters that fall outside them. Base your decision on which group is larger.

Note that this will not work for people who type romanized Chinese (for example, pinyin). For that case, you could apply a dictionary-count method: see how many of the words match an English word list. If most of the words do not match, you could assume that the text is not English.
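A minimal sketch of the dictionary-count idea follows. The class name `EnglishWordRatio` and the tiny `SampleEnglishWords` set are hypothetical placeholders; a real check would load a full English word list, and a low score only suggests "not English", not "romanized Chinese".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnglishWordRatio
{
    // Hypothetical, tiny stand-in for a real English word list (assumption).
    private static readonly HashSet<string> SampleEnglishWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "the", "is", "a", "i", "you", "how", "are", "to",
            "this", "hello", "what", "name", "my"
        };

    // Returns the fraction of words in the text that appear in the word list.
    // Empty or whitespace-only input scores 0.
    public static double Score(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return 0.0;

        // Split on whitespace and a few common punctuation marks.
        string[] words = text.Split(
            new[] { ' ', '\t', '\r', '\n', ',', '.', '?', '!' },
            StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) return 0.0;

        int matches = words.Count(w => SampleEnglishWords.Contains(w));
        return (double)matches / words.Length;
    }
}
```

With this sample list, `EnglishWordRatio.Score("how are you")` is 1.0 and `EnglishWordRatio.Score("ni hao ma")` is 0.0; where to put the cutoff between the two is a judgment call.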

If the input contains Unicode characters in the range U+4E00–U+9FFF, then it contains Chinese characters, so the language is probably Chinese, Japanese, or Korean.

To guess whether it is Chinese, you might check whether some of the most frequent characters in the Chinese language occur in the input (see for example http://www.zein.se/patrick/3000char.html ). Alternatively, check whether Hiragana (3040–309F), Katakana (30A0–30FF), or Hangul (Jamo 1100–11FF; the precomposed syllables that users normally type are AC00–D7AF) characters occur in the input. These scripts are used only for Japanese and Korean, so if they occur in the input, the text is not Chinese even though it contains Chinese characters.
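The exclusion check could be sketched like this. The class and method names are made up for illustration, and the Hangul syllables range (U+AC00–U+D7AF) is included in addition to the Jamo range, since that is where most typed Korean text falls. Japanese text written entirely in kanji would still be reported as Chinese, so treat the result as a guess.

```csharp
using System;
using System.Linq;

public static class CjkScriptGuess
{
    // Common Han ideographs in the BMP (CJK Unified Ideographs, U+4E00-U+9FFF).
    public static bool IsHan(char c) => c >= '\u4E00' && c <= '\u9FFF';

    // Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) are adjacent blocks.
    public static bool IsKana(char c) => c >= '\u3040' && c <= '\u30FF';

    // Hangul Jamo (U+1100-U+11FF) or precomposed Hangul syllables (U+AC00-U+D7AF).
    public static bool IsHangul(char c) =>
        (c >= '\u1100' && c <= '\u11FF') || (c >= '\uAC00' && c <= '\uD7AF');

    // Kana or Hangul anywhere in the text rules out Chinese;
    // otherwise Han characters make Chinese the most likely guess.
    public static string Guess(string text)
    {
        if (string.IsNullOrEmpty(text)) return "Unknown";
        if (text.Any(IsKana)) return "Japanese";
        if (text.Any(IsHangul)) return "Korean";
        if (text.Any(IsHan)) return "Chinese (probably)";
        return "Unknown";
    }
}
```

In a WinForms handler you might call `CjkScriptGuess.Guess(myTextBox.Text)` and compare the result against `"Chinese (probably)"`.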

You can easily check whether the code points used are Han ideographs. Those ranges are defined in the Unicode Character Database.

// Warning: this code only works for common Han ideographs inside the BMP.
// Surrogate code points will need special care, and additional ranges within
// the BMP contain rare, historic, and uncommon characters.
const double hannessThreshold = 0.25d;
const char lowestHanCodepoint = '\u4E00';
const char highestHanCodepoint = '\u9FFF';
string text = myTextBox.Text;
int hanCharacterCount = 0;
foreach (char c in text)
    if (lowestHanCodepoint <= c && c <= highestHanCodepoint)
        hanCharacterCount++;
// Guard against an empty TextBox so the score is 0 instead of NaN (0/0).
double hannessScore = text.Length == 0 ? 0.0 : (double)hanCharacterCount / text.Length;
if (hannessScore >= hannessThreshold)
    MessageBox.Show("You are typing in Chinese, Japanese, or Korean!");

However, this is not enough to determine whether the text is specifically Chinese. Unicode unifies the ideographs used for Chinese, Japanese, and Korean, so a linguistic analysis of some kind would be necessary to distinguish them.
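As for the surrogate caveat in the code above, here is one possible way to count by code point rather than by UTF-16 `char`, using `char.IsSurrogatePair` and `char.ConvertToUtf32`. The `HanRatio` class is a hypothetical name, and the range list (Unified Ideographs, Extension A, Extension B) is deliberately not exhaustive; like the original, it still cannot separate Chinese from Japanese or Korean.

```csharp
using System;

public static class HanRatio
{
    // A non-exhaustive set of Han ideograph ranges (assumption: these three
    // are enough for the sketch; later extensions are omitted).
    public static bool IsHanCodePoint(int cp) =>
        (cp >= 0x4E00 && cp <= 0x9FFF) ||    // CJK Unified Ideographs
        (cp >= 0x3400 && cp <= 0x4DBF) ||    // CJK Unified Ideographs Extension A
        (cp >= 0x20000 && cp <= 0x2A6DF);    // CJK Unified Ideographs Extension B

    // Fraction of code points (not UTF-16 chars) that are Han ideographs.
    public static double Score(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0.0;

        int total = 0, han = 0;
        for (int i = 0; i < text.Length; i++)
        {
            int cp;
            if (char.IsSurrogatePair(text, i))
            {
                // Combine the high/low surrogate pair into one code point.
                cp = char.ConvertToUtf32(text, i);
                i++; // skip the low surrogate
            }
            else
            {
                cp = text[i]; // BMP character (or a lone surrogate, counted as non-Han)
            }

            total++;
            if (IsHanCodePoint(cp)) han++;
        }
        return (double)han / total;
    }
}
```

The result can be compared against the same kind of threshold as `hannessThreshold`, for example `HanRatio.Score(myTextBox.Text) >= 0.25`.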

More help could be provided if you told us why you want to do this. Perhaps some other approach would be better.

My guess would be to check the character set being used: if Chinese characters are being entered, I would assume the text is Chinese. However, it's a pretty hazy thing to check. What if the Chinese words are written with the Western alphabet? I'm not sure how else you'd check something like that.


 