C# regex remove special characters but leave alphanumerics

Question

return Regex.Replace(source, "[^a-zA-Z0-9% @$\"!#%&'()*+,./:;<>=?-]", string.Empty);

How can I also allow characters such as ČĆŽPŠĐ, or German characters with umlauts, ...

Answer 1

You could use Unicode category character classes.

A simplified version of your code: new Regex(@"[^\p{L}0-9 ]").Replace("this is a test ČĆŽPŠĐ ä 244 $%^&*", String.Empty);

This yields: this is a test ČĆŽPŠĐ ä 244

\p{L} here denotes the Unicode category of letters, which covers letters across different languages.
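As a minimal sketch of the same idea wrapped in a reusable helper (the class name LetterFilter and the method name Clean are hypothetical, not from the answer), using a verbatim string so the backslash needs no extra escaping:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not part of the original answer.
public static class LetterFilter
{
    // Matches anything that is NOT a Unicode letter, an ASCII digit, or a space.
    private static readonly Regex Disallowed = new Regex(@"[^\p{L}0-9 ]");

    // Returns the input with every disallowed character removed.
    public static string Clean(string source)
    {
        return Disallowed.Replace(source, string.Empty);
    }
}
```

Calling LetterFilter.Clean("this is a test ČĆŽPŠĐ ä 244 $%^&*") should keep the letters, digits and spaces (including the space after 244) and drop the symbols. Note that \p{L} does not match combining marks, so a decomposed ä (a followed by U+0308) would lose its diaeresis; adding \p{M} to the class is one way to keep it.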

Answer 2

Try this:

Regex.Replace(source, "[^a-zA-Z0-9% @$\"!#%&'()*+,./:;<>=?\u0100-\u01FF-]", string.Empty);

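Note: U+0100 to U+01FF covers Latin Extended-A and part of Latin Extended-B, which includes Č, Ć, Ž, Š and Đ. The German letters ä, ö, ü and ß lie in Latin-1 Supplement (within U+00C0 to U+00FF), so that range would need to be added as well.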
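A sketch of the explicit-range approach, assuming both the German letters and the Latin Extended-A letters are wanted. The class name RangeFilter is hypothetical, and the choice of ranges is an assumption: the Latin-1 letters are split into three ranges to skip the multiplication sign (U+00D7) and division sign (U+00F7), and Latin Extended-A is taken as U+0100 to U+017F.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the ranges below are an assumption, not the answer's.
public static class RangeFilter
{
    // Verbatim string: \uXXXX is interpreted by the regex engine, and "" is a literal quote.
    // Allowed: ASCII letters and digits, the punctuation from the question,
    // Latin-1 letters (U+00C0-U+00FF minus U+00D7 and U+00F7), and Latin Extended-A.
    private const string Pattern =
        @"[^a-zA-Z0-9% @$""!#&'()*+,./:;<>=?\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F-]";

    // Returns the input with every character outside the allowed set removed.
    public static string Clean(string source)
    {
        return Regex.Replace(source, Pattern, string.Empty);
    }
}
```

Explicit ranges keep tight control over what is accepted, at the cost of having to list every block by hand; a category such as \p{L} is less precise but covers all scripts.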

Answer 3

'Č', 'Ć', 'Ž', 'Š' and 'Đ' are all part of the Unicode category 'Letter, Uppercase'. You can use \p{..} to match against characters from a specific Unicode category, where .. is the (short) category name.

For example, \p{Lu} matches all characters in the 'Letter, Uppercase' category, and \p{Ll} matches all characters in the 'Letter, Lowercase' category.

So just replace a-z with \p{Ll} and A-Z with \p{Lu} (make sure you escape the \ as \\ in a regular string literal, or use a verbatim string literal for your expression).

See http://msdn.microsoft.com/en-us/library/20bw873z%28v=vs.110%29.aspx#CategoryOrBlock for more information, and http://msdn.microsoft.com/en-us/library/20bw873z%28v=vs.110%29.aspx#SupportedUnicodeGeneralCategories for all supported categories.
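A minimal sketch of that substitution applied to the pattern from the question, written as a verbatim string (the class name CategoryFilter is hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the name is not part of the original answer.
public static class CategoryFilter
{
    // The question's character class with a-z replaced by \p{Ll} and A-Z by \p{Lu}.
    // In a verbatim string the backslash is literal and a double quote is written as "".
    private const string Pattern = @"[^\p{Ll}\p{Lu}0-9% @$""!#&'()*+,./:;<>=?-]";

    // Returns the input with every character outside the allowed set removed.
    public static string Clean(string source)
    {
        return Regex.Replace(source, Pattern, string.Empty);
    }
}
```

Unlike \p{L}, the pair \p{Ll} and \p{Lu} leaves out titlecase, modifier and other letters (categories Lt, Lm and Lo), which may or may not matter for the input at hand.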

3 answers

solution1
2 2014-12-31 12:42:07

solution2
2 2014-12-31 12:44:58

solution3
1 2014-12-31 12:44:13
