
Comparing Japanese Characters in C#

I am checking a Japanese string for spaces and replacing them with "_". This is what I am doing:

string input1="abc  dfg";
string input2="尾え れ";
if(input1.Contains(" "))
{
  Console.WriteLine(input1.Replace(" ","_"));
}
Console.WriteLine("------------------");
if(input2.Contains(" "))
{
  Console.WriteLine(input2.Replace(" ","_"));
}

Here is the output of this code:

abc__dfg
------------------

It replaces spaces with "_" in the plain English string, but not in the Japanese string.

Because the space-like character in your input2 is not actually a space. It is not even ASCII; just check its character code:

Console.WriteLine(Convert.ToInt32('\u3000')); // output: 12288 (the character in input2)
Console.WriteLine(Convert.ToInt32(' '));      // output: 32 (an ordinary space)

string input1 = "abc  dfg";
string input2 = "尾え れ";      // ordinary space (U+0020)
string input3 = "尾え\u3000れ"; // ideographic space (U+3000), not an ordinary space
if (input1.Contains(" "))
{
    Console.WriteLine(input1.Replace(" ", "_"));
}
Console.WriteLine("------------------");
if (input2.Contains(" "))
{
    Console.WriteLine(input2.Replace(" ", "_"));
}
Console.WriteLine("------------------");
// input3 contains U+3000, so Contains(" ") is false and nothing is printed
if (input3.Contains(" "))
{
    Console.WriteLine(input3.Replace(" ", "_"));
}
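For a self-contained check of the same point, the sketch below compares the two characters directly. It assumes, as the U+3000 finding later in this thread indicates, that the character in input2 is U+3000; the variable names are illustrative only.

```csharp
using System;
using System.Globalization;

// An ordinary space and the space-like character from input2
// (assumed here to be U+3000 IDEOGRAPHIC SPACE).
char asciiSpace = ' ';
char ideographicSpace = '\u3000';

// Different numeric codes...
Console.WriteLine((int)asciiSpace);       // 32
Console.WriteLine((int)ideographicSpace); // 12288

// ...even though Unicode classifies both as space separators.
Console.WriteLine(char.GetUnicodeCategory(asciiSpace));       // SpaceSeparator
Console.WriteLine(char.GetUnicodeCategory(ideographicSpace)); // SpaceSeparator

// string.Contains(string) compares ordinally, so " " is not found.
string japanese = "尾え\u3000れ";
Console.WriteLine(japanese.Contains(" ")); // False
```

Both characters look like "a space", but Contains and Replace only match the exact character you pass in.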

@Ronan Thibaudau's original explanation:

Because it's not a space; it's a different character. Copy what you call a "space" from the input2 string, paste it into the input2.Replace call, and it will work. It's simply not the same character as the space you typed (even when I try to select it here on Stack Overflow, it's twice as wide as the spaces in your input1, so it can't be the same character).
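Pasting an invisible character into source code works, but it is easy to lose track of. A minimal alternative sketch, assuming the character is U+3000 as identified elsewhere in this thread, is to name it with an escape sequence; the constant name `IdeographicSpace` is hypothetical.

```csharp
using System;

// Name the character with an escape sequence instead of pasting it,
// so the intent is visible in the code.
const string IdeographicSpace = "\u3000"; // U+3000 IDEOGRAPHIC SPACE (hypothetical name)

string input2 = "尾え\u3000れ"; // assumed to match the question's input2

// Replace both the ordinary space and the ideographic space.
string result = input2.Replace(" ", "_").Replace(IdeographicSpace, "_");
Console.WriteLine(result); // result is 尾え_れ (display depends on the console encoding)
```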

If you don't want to worry about character codes, or about copy-pasting characters you can't anticipate, just test for any whitespace character with char.IsWhiteSpace:

// Requires: using System.Linq;
string input1 = "abc  dfg";
string input2 = "尾え\u3000れ"; // contains U+3000
// char.IsWhiteSpace returns true for U+3000 as well as for U+0020
if (input1.Any(char.IsWhiteSpace))
{
    Console.WriteLine(new string(input1.Select(x => char.IsWhiteSpace(x) ? '_' : x).ToArray()));
}
Console.WriteLine("------------------");
if (input2.Any(char.IsWhiteSpace))
{
    Console.WriteLine(new string(input2.Select(x => char.IsWhiteSpace(x) ? '_' : x).ToArray()));
}
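The same idea can be wrapped in a small reusable function. This is a sketch, not part of the original answer: `ReplaceWhiteSpace` is a hypothetical helper that builds the result with a StringBuilder instead of a LINQ projection.

```csharp
using System;
using System.Text;

// Hypothetical helper: replaces every character for which char.IsWhiteSpace
// returns true (U+0020, tabs, line breaks, U+3000, ...) with `replacement`.
static string ReplaceWhiteSpace(string input, char replacement)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        sb.Append(char.IsWhiteSpace(c) ? replacement : c);
    }
    return sb.ToString();
}

Console.WriteLine(ReplaceWhiteSpace("abc  dfg", '_'));     // abc__dfg
Console.WriteLine(ReplaceWhiteSpace("尾え\u3000れ", '_')); // 尾え_れ
```

Iterating over `char` values is safe here because every character that char.IsWhiteSpace accepts lies in the Basic Multilingual Plane, so surrogate pairs are never split into replacements.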

Most likely your console font and/or the (default) console code page does not support Japanese characters.

Try

     Console.WriteLine(Console.OutputEncoding.EncodingName);
     Console.WriteLine(Console.OutputEncoding.CodePage);
     Console.WriteLine(input2);
     Debug.Write(input2);

for comparison. Select a font and code page that support Japanese characters, e.g.:

Console.OutputEncoding = Encoding.UTF8;

To change the default code page of your console, see this answer: Unicode characters in Windows command line - how?
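One way to check from code whether an encoding can represent the string at all is a round trip through that encoding. This is a diagnostic sketch under the assumption that the encoding uses its default replacement fallback (as Encoding.UTF8 and Encoding.ASCII do); `CanRoundTrip` is a hypothetical name, and it does not tell you whether the console font can draw the glyphs.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic: true if `encoding` can represent `text` unchanged.
// With a replacement fallback, characters the encoding cannot represent are
// substituted (typically with '?'), so the decoded string differs.
static bool CanRoundTrip(Encoding encoding, string text)
{
    byte[] bytes = encoding.GetBytes(text);
    return encoding.GetString(bytes) == text;
}

string input2 = "尾え\u3000れ";

Console.WriteLine(CanRoundTrip(Encoding.UTF8, input2));  // True
Console.WriteLine(CanRoundTrip(Encoding.ASCII, input2)); // False

// The encoding the console is currently using (varies by machine and settings).
Console.WriteLine(CanRoundTrip(Console.OutputEncoding, input2));
```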

Regarding the string itself: copy/paste your string 尾え　れ into this site: Unicode code converter. The Unicode code points are U+5C3E U+3048 U+3000 U+308C.

U+3000 is an Ideographic Space, not the "normal" space U+0020.
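The same listing can be produced in C# without an external site. This sketch, with a hypothetical `ToCodePoints` helper, prints each UTF-16 code unit as U+XXXX; it assumes the string contains only Basic Multilingual Plane characters, since characters outside it would show up as two surrogate values.

```csharp
using System;
using System.Linq;

// Hypothetical helper: formats each UTF-16 code unit as U+XXXX.
static string ToCodePoints(string s) =>
    string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

Console.WriteLine(ToCodePoints("尾え\u3000れ")); // U+5C3E U+3048 U+3000 U+308C
Console.WriteLine(ToCodePoints("尾え れ"));      // U+5C3E U+3048 U+0020 U+308C
```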

Use this piece of code for the second string and it will work. I tested it and it returns the correct output.

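    // \s matches Unicode whitespace in .NET, including U+3000 (ideographic space).
    // Contains(string.Empty) would always be true, so test for whitespace instead.
    if (System.Text.RegularExpressions.Regex.IsMatch(input2, @"\s"))
    {
        // Note: \s+ replaces a whole run of whitespace with a single "_".
        string cleanedString = System.Text.RegularExpressions.Regex.Replace(input2, @"\s+", "_");
        Console.WriteLine(cleanedString);
    }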
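One behavioral difference from the character-by-character approaches is worth seeing side by side: `\s+` collapses consecutive whitespace, while `\s` replaces each character. A small sketch, assuming default regex options (with RegexOptions.ECMAScript, `\s` is limited to ASCII-style whitespace and would no longer match U+3000):

```csharp
using System;
using System.Text.RegularExpressions;

string input1 = "abc  dfg";     // two ordinary spaces
string input2 = "尾え\u3000れ"; // one ideographic space

// \s : one "_" per whitespace character
Console.WriteLine(Regex.Replace(input1, @"\s", "_"));  // abc__dfg
// \s+ : a whole run of whitespace becomes a single "_"
Console.WriteLine(Regex.Replace(input1, @"\s+", "_")); // abc_dfg
Console.WriteLine(Regex.Replace(input2, @"\s+", "_")); // 尾え_れ
```

Which pattern is right depends on whether "abc  dfg" should become "abc__dfg" (as in the question's expected output) or "abc_dfg".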

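Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.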

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM