Remove unwanted Unicode characters from a string
I have looked at quite a number of related SO posts on this. I have a malformed string containing Unicode characters that I want to strip away:
string testString = "\0\u0001\0\0\0����\u0001\0\0\0\0\0\0\0\u0011\u0001\0\0\0\u0004\0\0\0\u0006\u0002\0\0\0\u0005The\u0006\u0003\0\0\0\u0017boy\u0006\u0004\0\0\0\tKicked\u0006\u0005\0\0\0\u0013the Ball\v";
I would like the following output:
The boy kicked the Ball
How can I achieve this?
I have tried the following, without much success:
testString = Regex.Replace(testString, @"[\ -\\ -\\Ā-\]", "");
or
testString = Regex.Replace(testString, @"[^\\t\\r\\n -~]", "");
using System.Text;

public string ReturnCleanASCII(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if ((int)c >= 127) // skip DEL (127) too, plus every non-ASCII character
            continue;
        if ((int)c < 32)   // I bet you don't want control characters
            continue;
        if (c == '%')
            continue;
        if (c == '?')
            continue;
        sb.Append(c);
    }
    return sb.ToString();
}
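Plain stripping leaves the words run together when applied to the question's string. This happens with the second regex from the question, and for this input with `ReturnCleanASCII` as well. The reason is that the words are separated only by control characters such as `\u0006`, `\u0017` and `\t`.

The sketch below shows this, and then shows a hypothetical variant, `CleanAsciiWithSeparators`. That name is made up for this example and is not from any library. The variant turns each run of unwanted characters into a single space. The sketch assumes the four `�` shown on the page are U+FFFD replacement characters, and it uses C# 9 top-level statements.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// The question's input; the four "�" are assumed to be U+FFFD.
string testString = "\0\u0001\0\0\0\uFFFD\uFFFD\uFFFD\uFFFD\u0001\0\0\0\0\0\0\0\u0011\u0001\0\0\0\u0004\0\0\0\u0006\u0002\0\0\0\u0005The\u0006\u0003\0\0\0\u0017boy\u0006\u0004\0\0\0\tKicked\u0006\u0005\0\0\0\u0013the Ball\v";

// Second attempt from the question, unchanged: it removes everything outside
// printable ASCII, including the control characters that separate the words.
string stripped = Regex.Replace(testString, @"[^\\t\\r\\n -~]", "");
Console.WriteLine(stripped); // TheboyKickedthe Ball

// Hypothetical variant of ReturnCleanASCII: a run of unwanted characters
// becomes one space, but only between kept characters (never leading).
static string CleanAsciiWithSeparators(string s)
{
    var sb = new StringBuilder(s.Length);
    bool pendingSeparator = false;
    foreach (char c in s)
    {
        if (c < 32 || c >= 127) // not printable ASCII
        {
            pendingSeparator = true;
            continue;
        }
        if (pendingSeparator && sb.Length > 0)
            sb.Append(' ');
        pendingSeparator = false;
        sb.Append(c);
    }
    return sb.ToString();
}

string spaced = CleanAsciiWithSeparators(testString);
Console.WriteLine(spaced); // The boy Kicked the Ball
```

"Kicked" stays capitalised because that is how it appears in the input.

Unlike `ReturnCleanASCII`, this variant keeps `%` and `?`. Add those checks back if they should be dropped too.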
Try this:
using System.Text.RegularExpressions;

string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);
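This pattern only removes characters above U+007F. ASCII control characters such as `\0`, `\u0006` and `\t` fall inside the kept range U+0000 to U+007F.

The sketch below checks this. It assumes the `�` in the question's string are U+FFFD. On that string, the pattern removes the U+FFFD run but leaves the control bytes in place.

```csharp
using System;
using System.Text.RegularExpressions;

// "s\u00F8me string" is "søme string" with ø (U+00F8) written as an escape.
string s = Regex.Replace("s\u00F8me string", @"[^\u0000-\u007F]+", string.Empty);
Console.WriteLine(s); // sme string

// The question's input; the four "�" are assumed to be U+FFFD.
string testString = "\0\u0001\0\0\0\uFFFD\uFFFD\uFFFD\uFFFD\u0001\0\0\0\0\0\0\0\u0011\u0001\0\0\0\u0004\0\0\0\u0006\u0002\0\0\0\u0005The\u0006\u0003\0\0\0\u0017boy\u0006\u0004\0\0\0\tKicked\u0006\u0005\0\0\0\u0013the Ball\v";

// Only the four U+FFFD characters are removed; the control characters stay.
string cleaned = Regex.Replace(testString, @"[^\u0000-\u007F]+", string.Empty);
Console.WriteLine(cleaned.Length == testString.Length - 4); // True
```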
Hope it helps.
Rather than trying to remove the Unicode characters, why not extract all the printable ASCII characters instead:
using System.Linq;
using System.Text.RegularExpressions;

var str = string.Join(" ", new Regex("[ -~]+").Matches(testString).Cast<Match>().Select(m => m.Value));
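Here is a complete console sketch of the extraction approach, assuming the `�` in the question's string are U+FFFD. It collects every run of printable ASCII (space through `~`) and joins the runs with single spaces.

`Cast<Match>()` is used because on .NET Framework `MatchCollection` is only a non-generic collection, so calling `Select` on it directly does not compile there.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The question's input; the four "�" are assumed to be U+FFFD.
string testString = "\0\u0001\0\0\0\uFFFD\uFFFD\uFFFD\uFFFD\u0001\0\0\0\0\0\0\0\u0011\u0001\0\0\0\u0004\0\0\0\u0006\u0002\0\0\0\u0005The\u0006\u0003\0\0\0\u0017boy\u0006\u0004\0\0\0\tKicked\u0006\u0005\0\0\0\u0013the Ball\v";

// Each match is one run of printable ASCII; the control characters between
// the runs act as word boundaries and are replaced by a single space.
string str = string.Join(" ",
    Regex.Matches(testString, "[ -~]+").Cast<Match>().Select(m => m.Value));

Console.WriteLine(str); // The boy Kicked the Ball
```

A space inside a run, as in "the Ball", is part of the match and is kept as-is.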
I use this regex to filter out bad characters in file names:
Regex.Replace(directory, @"[^a-zA-Z0-9\\:_\- ]", "") // keeps letters, digits, '\', ':', '_', '-' and spaces
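The whitelist pattern is written here as a verbatim string. In a regular C# string literal, `\-` is not a valid escape sequence and does not compile.

The path below is a made-up example, assumed only for illustration. The character class contains no `.`, so dots, including the one before a file extension, are removed as well. Add `.` to the class if it should be kept.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a Windows-style path with characters to filter out.
string directory = @"C:\Temp\my|file?.txt";

// The regex sees [^a-zA-Z0-9\\:_\- ]: it keeps letters, digits, backslash,
// colon, underscore, hyphen and space, and removes everything else.
string safe = Regex.Replace(directory, @"[^a-zA-Z0-9\\:_\- ]", "");
Console.WriteLine(safe); // C:\Temp\myfiletxt
```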