
How do I remove invisible Unicode characters from the beginning and the end of a string?

How can I reliably remove invisible characters from the beginning and the end of a string? In my case, the string starts with a Left-to-Right Embedding (LRE) character, U+202A. However, string.Trim() doesn't remove it, as you can see below:

var myString = "\u202atest";
myString.Trim();
// Prints:
// "‪test"
myString.Trim().ToCharArray();
// Prints:
// {char[5]}
//     [0]: 8234 '‪'
//     [1]: 116 't'
//     [2]: 101 'e'
//     [3]: 115 's'
//     [4]: 116 't'

Is there a function in the .NET Framework API that would trim all such characters? I assume there are more than this one, and I would like to avoid having to specify each one manually.

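"Invisible" is ill-defined. A Unicode-compliant solution: characters in the general categories [\p{Control}\p{Format}\p{Nonspacing_Mark}\p{Enclosing_Mark}\p{Line_Separator}\p{Paragraph_Separator}] have no display width. Replace them with nothing. Note that this removes them everywhere in the string, not only at the ends. The examples below use the equivalent two-letter abbreviations (Cc, Cf, Mn, Me, Zl, Zp). In Perl: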

$ length "\x{202a}test" =~ s/[\p{Cc}\p{Cf}\p{Mn}\p{Me}\p{Zl}\p{Zp}]//r
4

In C#:

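using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Removes every character without display width, anywhere in the string.
    public static string RemoveCharactersWithoutDisplayWidth(this string str)
    {
        return Regex.Replace(str, @"[\p{Cc}\p{Cf}\p{Mn}\p{Me}\p{Zl}\p{Zp}]", "");
    }
}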
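The question asks only about the beginning and the end of the string, whereas the method above strips matching characters everywhere, including combining marks (Mn) inside the text. A minimal sketch of a trim-only variant, assuming the same character class is what you want; the names InvisibleTrimExtensions, TrimCharactersWithoutDisplayWidth and Program are hypothetical and not part of the .NET API:

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvisibleTrimExtensions
{
    // Same character class as above, anchored to the start (\A) or the end (\z)
    // of the input so that characters in the middle are left alone.
    private static readonly Regex EdgePattern = new Regex(
        @"\A[\p{Cc}\p{Cf}\p{Mn}\p{Me}\p{Zl}\p{Zp}]+|[\p{Cc}\p{Cf}\p{Mn}\p{Me}\p{Zl}\p{Zp}]+\z");

    // Hypothetical helper name, chosen for illustration only.
    public static string TrimCharactersWithoutDisplayWidth(this string str)
    {
        return EdgePattern.Replace(str, "");
    }
}

public static class Program
{
    public static void Main()
    {
        // U+202A (LRE) at the start, U+202C (PDF) at the end; both are category Cf.
        var trimmed = "\u202atest\u202c".TrimCharactersWithoutDisplayWidth();
        Console.WriteLine(trimmed.Length); // prints 4
    }
}
```

This does not trim ordinary white space such as spaces; chaining a call to Trim() would cover that case too.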

You can try:

myString = myString.Trim('\u202a');

If you have more such characters to trim, you can define them as an array:

char[] trimChars = { '\u202a', '\u202b' }; // add more characters as needed
myString = myString.Trim(trimChars);
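If listing every character by hand is not practical, one possible alternative is to trim by Unicode general category instead of by explicit character. The sketch below assumes the categories Control, Format, NonSpacingMark, EnclosingMark, LineSeparator and ParagraphSeparator, plus ordinary white space, are the ones to drop; CategoryTrimExtensions, IsTrimmable and TrimInvisible are hypothetical names:

```csharp
using System;
using System.Globalization;

public static class CategoryTrimExtensions
{
    // True for white space and for the categories Cc, Cf, Mn, Me, Zl and Zp.
    // The choice of categories is an assumption; adjust it to your data.
    private static bool IsTrimmable(char c)
    {
        if (char.IsWhiteSpace(c)) return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.Control:
            case UnicodeCategory.Format:
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.EnclosingMark:
            case UnicodeCategory.LineSeparator:
            case UnicodeCategory.ParagraphSeparator:
                return true;
            default:
                return false;
        }
    }

    // Hypothetical helper: trims matching characters from both ends only.
    public static string TrimInvisible(this string str)
    {
        int start = 0;
        int end = str.Length;
        while (start < end && IsTrimmable(str[start])) start++;
        while (end > start && IsTrimmable(str[end - 1])) end--;
        return str.Substring(start, end - start);
    }
}
```

For example, " \u202atest\u202c ".TrimInvisible() returns "test". The check works on individual UTF-16 code units, so invisible characters outside the Basic Multilingual Plane, which are stored as surrogate pairs, are not trimmed by this sketch.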

You can try to analyze the bytes:

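var s = "\u202atest";
string s2 = s; // fall back to the original string if no LRE prefix is found
// Copy the UTF-16 code units of the string into a byte array.
byte[] bytes = new byte[s.Length * sizeof(char)];
Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, bytes.Length);
// U+202A is stored as 0x2A 0x20 on little-endian platforms.
if (bytes.Length >= 2 && bytes[0] == 0x2a && bytes[1] == 0x20)
{
    // Skip the first two bytes (one char) and rebuild the string.
    char[] c = new char[(bytes.Length - 2) / sizeof(char)];
    Buffer.BlockCopy(bytes, 2, c, 0, bytes.Length - 2);
    s2 = new string(c);
}
var c2 = s2.ToCharArray();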
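The byte comparison above depends on the byte order of the platform. The same check can probably be expressed on characters directly, which avoids that dependency. A minimal sketch, with LreDemo and StripLeadingLre as hypothetical names; the ordinal comparison is used because culture-sensitive comparisons can ignore characters such as U+202A:

```csharp
using System;

public static class LreDemo
{
    // Hypothetical helper: drops one leading U+202A (LRE), if present.
    public static string StripLeadingLre(string s)
    {
        return s.StartsWith("\u202a", StringComparison.Ordinal) ? s.Substring(1) : s;
    }

    public static void Main()
    {
        Console.WriteLine(StripLeadingLre("\u202atest").Length); // prints 4
        Console.WriteLine(StripLeadingLre("test").Length);       // prints 4
    }
}
```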
