Comparing Japanese Characters in C#
I am checking a Japanese string for spaces and replacing them with "_". This is what I am doing:
string input1="abc dfg";
string input2="尾え れ";
if(input1.Contains(" "))
{
Console.WriteLine(input1.Replace(" ","_"));
}
Console.WriteLine("------------------");
if(input2.Contains(" "))
{
Console.WriteLine(input2.Replace(" ","_"));
}
Here is the output of this code:
abc_dfg
------------------
It replaces spaces with "_" in the simple English string, but not in the Japanese string.
Because the character that looks like a space in your input2 is not really a space. Just check its character code (the value is a Unicode code point, outside the ASCII range):
Console.WriteLine(Convert.ToInt32('\u3000')); // output: 12288 (ideographic space, U+3000)
Console.WriteLine(Convert.ToInt32(' ')); // output: 32
string input1 = "abc dfg";
string input2 = "尾え れ"; // a space
string input3 = "尾え\u3000れ"; // not a space: ideographic space (U+3000)
if (input1.Contains(" "))
{
Console.WriteLine(input1.Replace(" ", "_"));
}
Console.WriteLine("------------------");
if (input2.Contains(" "))
{
Console.WriteLine(input2.Replace(" ", "_"));
}
Console.WriteLine("------------------");
if (input3.Contains(" "))
{
Console.WriteLine(input3.Replace(" ", "_"));
}
@Ronan Thibaudau's original explanation:
Because it's not a space. It's not the same character. Copy what you call a "space" from the input2 string and paste it into the input2.Replace call and it will work. It's just not the same character as the space you typed (when I select it here on Stack Overflow, it is twice as wide as the spaces in your input1, so it can't be the same character).
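As a minimal sketch of the explanation above: instead of pasting the invisible character, it can be written as an escape sequence. This assumes the character in input2 is U+3000 (ideographic space); the class and method names are illustrative and not from the original post.

```csharp
using System;

public static class IdeographicSpaceDemo
{
    // Replaces both the ASCII space (U+0020) and the ideographic space (U+3000).
    public static string ReplaceSpaces(string input)
    {
        return input.Replace(' ', '_').Replace('\u3000', '_');
    }

    public static void Main()
    {
        // Assumption: the separator between え and れ is U+3000.
        string input2 = "尾え\u3000れ";

        Console.WriteLine(input2.Contains(" "));      // False: no U+0020 in the string
        Console.WriteLine(input2.Contains("\u3000")); // True
        Console.WriteLine(ReplaceSpaces(input2));     // value is 尾え_れ
    }
}
```

Whether the Japanese characters display correctly still depends on the console font and output encoding.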
If you don't want to deal with character codes or copy-paste characters you can't anticipate, just do something like this:
//using System.Linq;
string input1 = "abc dfg";
string input2 = "尾え れ";
if (input1.Any(char.IsWhiteSpace))
{
Console.WriteLine(new string(input1.Select(x=> char.IsWhiteSpace(x) ? '_' : x).ToArray()));
}
Console.WriteLine("------------------");
if (input2.Any(char.IsWhiteSpace))
{
Console.WriteLine(new string(input2.Select(x => char.IsWhiteSpace(x) ? '_' : x).ToArray()));
}
Most likely your console font and/or the (default) code page does not support Japanese characters.
Try
Console.WriteLine(Console.OutputEncoding.EncodingName);
Console.WriteLine(Console.OutputEncoding.CodePage);
Console.WriteLine(input2);
Debug.Write(input2);
for comparison. Select a font and code page that support Japanese characters, e.g.
Console.OutputEncoding = Encoding.UTF8;
To change the default code page of your console, see this answer: Unicode characters in Windows command line - how?
Regarding the string itself: copy and paste your string 尾え　れ into this site: Unicode code converter. The Unicode code points are U+5C3E U+3048 U+3000 U+308C.
U+3000 is an Ideographic Space, not the "normal" space U+0020.
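The same check can be done in code rather than with an online converter. The following is only a sketch; the class and method names are made up for illustration, and it prints UTF-16 code units, which equal the code points for these characters (all are in the Basic Multilingual Plane).

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Formats every UTF-16 code unit of the string as U+XXXX.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Assumption: the separator in the question's string is U+3000.
        Console.WriteLine(Describe("尾え\u3000れ"));     // U+5C3E U+3048 U+3000 U+308C
        Console.WriteLine(char.IsWhiteSpace('\u3000')); // True
    }
}
```

Because char.IsWhiteSpace returns true for U+3000, whitespace-based checks catch it even though a comparison with " " does not.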
Use this piece of code for the second string and it will work. Tested, and it returns the correct output.
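if (System.Text.RegularExpressions.Regex.IsMatch(input2, @"\s")) // true only if input2 contains whitespace (Contains(string.Empty) is always true)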
{
string cleanedString = System.Text.RegularExpressions.Regex.Replace(input2, @"\s+", "_");
Console.WriteLine(cleanedString);
}
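One detail worth noting about the regex approach: the pattern \s+ replaces a whole run of whitespace with a single "_", whereas \s replaces each whitespace character separately, which matches what the Replace-based answers do. A small sketch of the difference, with illustrative names that are not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexWhitespaceDemo
{
    // One underscore per whitespace character.
    public static string ReplaceEach(string input)
    {
        return Regex.Replace(input, @"\s", "_");
    }

    // One underscore per run of consecutive whitespace characters.
    public static string ReplaceRuns(string input)
    {
        return Regex.Replace(input, @"\s+", "_");
    }

    public static void Main()
    {
        // Hypothetical input: U+3000 followed directly by U+0020.
        string s = "尾え\u3000 れ";
        Console.WriteLine(ReplaceEach(s)); // value is 尾え__れ
        Console.WriteLine(ReplaceRuns(s)); // value is 尾え_れ
    }
}
```

In .NET, \s matches Unicode separator characters by default, so U+3000 is covered without extra options.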