How do I remove invisible Unicode characters from the beginning and the end of a string?
How do I reliably remove invisible characters from the beginning and the end of a string? In my case, my string starts with a Left-to-Right Embedding [LRE] character. However, string.Trim() doesn't remove it, as you can see below:
var myString = "\u202atest";
myString.Trim();
// Prints:
// "test"
myString.Trim().ToCharArray();
// Prints:
// {char[5]}
// [0]: 8234 ''
// [1]: 116 't'
// [2]: 101 'e'
// [3]: 115 's'
// [4]: 116 't'
Is there a function in the .NET Framework API that would trim all such characters? I assume there are more than this one, and I would like to avoid having to specify each one manually.
"Invisible" is ill-defined. A Unicode-compliant solution: characters in the general categories matched by the regex class
[\p{Control}\p{Format}\p{Nonspacing_Mark}\p{Enclosing_Mark}\p{Line_Separator}\p{Paragraph_Separator}]
have no display width. Replace them with nothing.
$ length "\x{202a}test" =~ s/[\p{Cc}\p{Cf}\p{Mn}\p{Me}\p{Zl}\p{Zp}]//r
4
In C#:
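using System.Text.RegularExpressions;
public static class StringExtensions
{
    // Removes every character in categories Cc, Cf, Mn, Me, Zl and Zp, anywhere in the string.
    public static string RemoveCharactersWithoutDisplayWidth(this string str)
    {
        var regex = new Regex(@"[\p{Cc}\p{Cf}\p{Mn}\p{Me}\p{Zl}\p{Zp}]");
        return regex.Replace(str, "");
    }
}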
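The method above removes matching characters everywhere in the string, while the question asks only about the beginning and the end. A minimal sketch of an anchored variant follows. The names `InvisibleTrimExtensions` and `TrimInvisible` are hypothetical, not part of the .NET API. As an assumption of this sketch, `\p{Mn}` and `\p{Me}` are left out, because a trailing combining mark usually belongs to a visible character (for example a decomposed accent), and `\s` is added so ordinary whitespace is trimmed as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvisibleTrimExtensions
{
    // Assumption: "invisible" here means control, format, line/paragraph
    // separator characters, plus ordinary whitespace (\s).
    private const string Invisible = @"[\p{Cc}\p{Cf}\p{Zl}\p{Zp}\s]";

    // \A...+ matches a leading run and ...+\z a trailing run;
    // characters between the two runs are never touched.
    private static readonly Regex Edges =
        new Regex(@"\A" + Invisible + "+|" + Invisible + @"+\z");

    // Hypothetical helper: trims invisible characters from both ends only.
    public static string TrimInvisible(this string str)
    {
        return Edges.Replace(str, "");
    }
}
```

With this sketch, `"\u202atest".TrimInvisible()` returns `"test"`, and a zero-width space in the middle of a string is left in place.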
You can try:
myString = myString.Trim('\u202a');
If you have more characters like this to trim, you can define them as an array.
char[] trimChars = {'\u202a','\u202b'}; // add more characters as needed
myString = myString.Trim(trimChars);
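To avoid listing every character by hand, the same trimming can be driven by Unicode category instead of a fixed array. The following is only a sketch: `CategoryTrim`, `IsInvisible` and `TrimByCategory` are hypothetical names, and the choice of categories (Control, Format, LineSeparator, ParagraphSeparator, plus whitespace) is an assumption about what "invisible" should mean. It inspects individual UTF-16 code units, so format characters outside the Basic Multilingual Plane are not handled.

```csharp
using System;
using System.Globalization;

public static class CategoryTrim
{
    // Assumption: whitespace and the categories Cc, Cf, Zl, Zp count as invisible.
    private static bool IsInvisible(char c)
    {
        if (char.IsWhiteSpace(c)) return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.Control:
            case UnicodeCategory.Format:
            case UnicodeCategory.LineSeparator:
            case UnicodeCategory.ParagraphSeparator:
                return true;
            default:
                return false;
        }
    }

    // Hypothetical helper: walks inward from both ends and keeps the middle.
    public static string TrimByCategory(string s)
    {
        int start = 0;
        int end = s.Length - 1;
        while (start <= end && IsInvisible(s[start])) start++;
        while (end >= start && IsInvisible(s[end])) end--;
        return s.Substring(start, end - start + 1);
    }
}
```

Because only the two ends are scanned, characters inside the string are preserved even if they fall into one of these categories.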
You can try to analyze the bytes:
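var s = "\u202atest";
string s2 = s; // fall back to the original string if no LRE prefix is found
// Copy the UTF-16 code units of the string into a byte array.
byte[] bytes = new byte[s.Length * sizeof(char)];
Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, bytes.Length);
// U+202A is stored as the bytes 0x2A 0x20 on little-endian platforms.
if (bytes.Length >= 2 && bytes[0] == 0x2a && bytes[1] == 0x20)
{
    // Skip the first two bytes and rebuild the string from the rest.
    char[] c = new char[(bytes.Length - 2) / sizeof(char)];
    Buffer.BlockCopy(bytes, 2, c, 0, bytes.Length - 2);
    s2 = new string(c);
}
var c2 = s2.ToCharArray();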
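The byte comparison above depends on the platform's byte order. If the goal is only to drop one known leading character, comparing the first char directly is simpler and independent of endianness. A minimal sketch, where `LreStripper` and `StripLeadingLre` are hypothetical names:

```csharp
using System;

public static class LreStripper
{
    // U+202A LEFT-TO-RIGHT EMBEDDING
    private const char Lre = '\u202a';

    // Hypothetical helper: returns the string without a single leading LRE,
    // or the string unchanged if it does not start with one.
    public static string StripLeadingLre(string s)
    {
        return s.Length > 0 && s[0] == Lre ? s.Substring(1) : s;
    }
}
```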