Removing special characters from a string using Unicode

Question

I found that the most popular answer to this question is:

Regex.Replace(value, "[^a-zA-Z0-9]+", " ", RegexOptions.Compiled);

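However, if a user types a non-English name at checkout, this method treats the letters of that name as special characters and removes them.

Because my website is multilingual, I need an approach that works for most users.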

Answer 1

Make it Unicode-aware:

var res = Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ");

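If you intend to keep only regular ASCII digits, use [0-9] in place of \p{N}.

The regex matches one or more characters other than Unicode letters (\p{L}), combining marks such as diacritics (\p{M}), and numbers (\p{N}), and replaces each such run with a single space.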
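
As a rough illustration of the pattern described above, here is a minimal sketch. The helper names KeepLettersMarksDigits and KeepLettersMarksAsciiDigits and the sample strings are made up for this example, and it assumes C# 9 or later so that top-level statements are available.

```csharp
using System;
using System.Text.RegularExpressions;

// Keep Unicode letters, combining marks and numbers;
// replace every other run of characters with one space.
string KeepLettersMarksDigits(string value) =>
    Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ");

// Variant that keeps only the ASCII digits 0-9 instead of all Unicode numbers.
string KeepLettersMarksAsciiDigits(string value) =>
    Regex.Replace(value, @"[^\p{L}\p{M}0-9]+", " ");

// Accented letters survive; '-', ''' and '#' become spaces.
Console.WriteLine(KeepLettersMarksDigits("José-María O'Neill #42"));
// prints: José María O Neill 42

// \u0663\u0664 are Arabic-Indic digits: \p{N} keeps them, [0-9] does not.
Console.WriteLine(KeepLettersMarksDigits("Zoë \u0663\u0664 & 56"));
Console.WriteLine(KeepLettersMarksAsciiDigits("Zoë \u0663\u0664 & 56"));
// the second call prints: Zoë 56
```

Note that a removed run at the start or end of the input leaves a leading or trailing space, so calling Trim() on the result may be worthwhile.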

You could also consider var res = Regex.Replace(value, @"\W+", " "), but it will keep _ because the underscore is a "word" character.
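
To make the underscore behaviour concrete, here is a small sketch. The helper names and the sample string are invented for illustration, and the [\W_]+ variant is one possible workaround rather than something the answer itself proposes. It again assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// \W matches anything that is not a "word" character, so '_' is left alone.
string ReplaceNonWord(string value) =>
    Regex.Replace(value, @"\W+", " ");

// Adding '_' to the character class removes underscores as well.
string ReplaceNonWordAndUnderscore(string value) =>
    Regex.Replace(value, @"[\W_]+", " ");

Console.WriteLine(ReplaceNonWord("user_name@exämple"));
// prints: user_name exämple

Console.WriteLine(ReplaceNonWordAndUnderscore("user_name@exämple"));
// prints: user name exämple
```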

Answer 2

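I found that the best way to achieve this, and one that works with all languages, is to create a string containing all the banned characters. Take a look at this code: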

    // Requires: using System.Text.RegularExpressions; and, for MessageBox, using System.Windows.Forms;
    string input = @"heya's #FFFFF , CUL8R M8 how are you?'"; // This is the input string
    // Character class of banned characters; add every character you don't want displayed here.
    string regex = @"[!""#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]";

    Match m;

    // Regex.Match never returns null, so loop on Success; otherwise the loop would never end.
    while ((m = Regex.Match(input, regex)).Success)
    {
        input = input.Remove(m.Index, m.Length);
    }
    // If the string has two, three or four spaces together, change them to one.
    input = input.Replace("  ", " ").Replace("  ", " ");
    MessageBox.Show(input);

Hope it works!

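PS: As others have posted here, there are other ways to do this. Personally I prefer this one, even though it is more code. Choose whichever you think best fits your needs.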
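
For comparison, the same banned-character class can probably be applied without an explicit loop. The sketch below is not the answer author's code: RemoveBanned and BannedPattern are hypothetical names, and unlike the original it collapses space runs of any length. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Same banned-character class as in the answer; extend it as needed.
const string BannedPattern = @"[!""#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]";

string RemoveBanned(string input)
{
    // Delete every banned character in one pass.
    string stripped = Regex.Replace(input, BannedPattern, "");
    // Collapse any run of two or more spaces into a single space.
    return Regex.Replace(stripped, " {2,}", " ");
}

Console.WriteLine(RemoveBanned(@"heya's #FFFFF , CUL8R M8 how are you?'"));
// prints: heyas FFFFF CUL8R M8 how are you

// Letters outside ASCII are untouched because only listed characters are removed.
Console.WriteLine(RemoveBanned("Zoë, 李小龍!"));
// prints: Zoë 李小龍
```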


Answer 1
5, accepted, 2016-01-08 23:19:25

Answer 2
0, 2016-01-08 23:18:42
