I found this question, but the solution there also removes all valid UTF-8 characters (it returns a blank string, even though the input contains valid UTF-8 characters plus control characters). From what I have read about UTF-8, there is no specific range for control characters, and each character set has its own control characters.
How can I modify the above solution to remove only control characters?
This is how I roll:
Regex.Replace(evilWeirdoText, @"[\u0000-\u001F]", string.Empty)
This strips out the first 32 characters (\u0000 through \u001F), all of which are control characters. The next hex value up from \u001F is \u0020, AKA the space. Everything before the space is line feed, null, and similar nonsense.
If you don't believe me about the characters: http://donsnotes.com/tech/charsets/ascii.html
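The range above stops at \u001F, so DEL (\u007F) and the C1 controls (\u0080 through \u009F) would pass through. A minimal sketch of one alternative, assuming the Unicode category escape \p{Cc} (the "Control" general category) is what you want to remove. The class and method names here are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ControlCharRegexSketch
{
    // \p{Cc} matches the Unicode "Control" general category:
    // U+0000-U+001F and U+007F-U+009F. Letters such as é or ï are not in it.
    private static readonly Regex ControlChars = new Regex(@"\p{Cc}");

    // Hypothetical helper name; returns null for null input.
    public static string Strip(string input)
    {
        return input == null ? null : ControlChars.Replace(input, string.Empty);
    }

    public static void Main()
    {
        // Contains NUL, DEL, NEL (U+0085), CR and LF mixed with accented letters.
        string sample = "caf\u00E9\u0000 \u007Fna\u00EFve\u0085\r\n";
        string stripped = Strip(sample);
        Console.WriteLine(stripped == "caf\u00E9 na\u00EFve"); // True
    }
}
```

Non-ASCII text survives because only the control category is matched, not a byte range.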
I think the following code will work for you:
    using System.Text;

    public static string RemoveControlCharacters(string inString)
    {
        if (inString == null) return null;

        // The result is never longer than the input, so pre-size the builder.
        StringBuilder newString = new StringBuilder(inString.Length);
        foreach (char ch in inString)
        {
            // char.IsControl is true for U+0000-U+001F and U+007F-U+009F.
            if (!char.IsControl(ch))
            {
                newString.Append(ch);
            }
        }
        return newString.ToString();
    }
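Note that char.IsControl also reports tab, carriage return, and line feed as control characters, so the method above removes them too. If those should survive, a variant along these lines might work. This is only a sketch: the allow-list and the names ControlCharFilterSketch and RemoveControlCharactersExcept are assumptions for illustration, not part of the answer above.

```csharp
using System;
using System.Linq;

public static class ControlCharFilterSketch
{
    // Hypothetical allow-list: control characters that should be kept.
    private static readonly char[] Keep = { '\t', '\n', '\r' };

    public static string RemoveControlCharactersExcept(string input)
    {
        if (input == null) return null;

        // Keep a character if it is not a control character,
        // or if it is one of the explicitly allowed ones.
        char[] kept = input
            .Where(c => !char.IsControl(c) || Array.IndexOf(Keep, c) >= 0)
            .ToArray();
        return new string(kept);
    }

    public static void Main()
    {
        // BEL (U+0007) and NUL are dropped; CR, LF and tab are kept.
        string cleaned = RemoveControlCharactersExcept("line1\r\nline2\u0007\u0000\tend");
        Console.WriteLine(cleaned == "line1\r\nline2\tend"); // True
    }
}
```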
If you plan to use the string as a query string, you should consider using Uri.EscapeUriString() or Uri.EscapeDataString() before sending it out. Note: you might still need to strip out anything that char.IsControl() matches first.
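A rough sketch of that order of operations, assuming the value is a single query-string parameter: strip control characters first, then percent-encode with Uri.EscapeDataString(). The names QueryValueSketch and ToQueryValue are invented for this example. Uri.EscapeDataString() is used here because recent .NET versions mark Uri.EscapeUriString() as obsolete.

```csharp
using System;
using System.Linq;

public static class QueryValueSketch
{
    // Hypothetical helper: clean a raw value, then escape it for use
    // as one query-string parameter value.
    public static string ToQueryValue(string raw)
    {
        if (raw == null) return null;

        // Step 1: drop control characters (same test as char.IsControl above).
        string cleaned = new string(raw.Where(c => !char.IsControl(c)).ToArray());

        // Step 2: percent-encode reserved characters such as space and '&'.
        return Uri.EscapeDataString(cleaned);
    }

    public static void Main()
    {
        Console.WriteLine(ToQueryValue("a b\u0000&c")); // a%20b%26c
    }
}
```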