
C# regex remove special characters but leave alphanumerics

return Regex.Replace(source, "[^a-zA-Z0-9% @$\"!#%&'()*+,./:;<>=?-]", string.Empty);

How can I also allow characters such as ČĆŽPŠĐ, or German characters with umlauts, ...

You could use Unicode character classes, as shown here.

A simplified version of your code: new Regex("[^\\p{L}0-9 ]").Replace("this is a test ČĆŽPŠĐ ä 244 $%^&*", String.Empty);

This yields "this is a test ČĆŽPŠĐ ä 244 " (the space before the removed symbols is kept).

\p{L} (written \\p{L} inside a regular C# string literal) denotes the Unicode category "Letter", which matches letters from any language.
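The same idea can be wrapped in a small helper. This is only a minimal sketch: the class name LetterFilter and the method name Clean are made up for illustration, and the pattern is written as a verbatim string (@"...") so the backslash does not need doubling. It uses 0-9 rather than \d because in .NET \d also matches non-ASCII decimal digits unless RegexOptions.ECMAScript is set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterFilter
{
    // Matches anything that is NOT a Unicode letter, an ASCII digit, or a space.
    // Verbatim string: \p{L} reaches the regex engine unchanged.
    private static readonly Regex NotAllowed = new Regex(@"[^\p{L}0-9 ]");

    // Hypothetical helper: removes every character matched by NotAllowed.
    public static string Clean(string source)
    {
        return NotAllowed.Replace(source, string.Empty);
    }

    public static void Main()
    {
        // The symbols $%^&* are removed; letters, digits and spaces stay.
        Console.WriteLine(Clean("this is a test ČĆŽPŠĐ ä 244 $%^&*"));
    }
}
```

Building the Regex once in a static field avoids re-parsing the pattern on every call, which may matter if the helper runs in a loop.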

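Just try this:

Regex.Replace(source, "[^a-zA-Z0-9% @$\"!#%&'()*+,./:;<>=?\u00C0-\u01FF-]", string.Empty);

U+0100 -> U+01FF covers Latin Extended-A (which contains Č, Ć, Ž, Š and Đ) and part of Latin Extended-B. The German ä, ö, ü and ß are in Latin-1 Supplement, so the range above starts at U+00C0 to include them; note that U+00C0 -> U+00FF also contains the symbols × (U+00D7) and ÷ (U+00F7).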
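When working with explicit ranges like this, it can help to print the code points of the characters you want to keep and compare them against the range boundaries. The sketch below does only that; the class name CodePoints and the method Describe are hypothetical names chosen for this example.

```csharp
using System;

public static class CodePoints
{
    // Formats a UTF-16 code unit as U+XXXX (four uppercase hex digits).
    public static string Describe(char c)
    {
        return "U+" + ((int)c).ToString("X4");
    }

    public static void Main()
    {
        // Print each character next to its code point.
        foreach (char c in "ČĆŽŠĐäöüß")
        {
            Console.WriteLine(c + " " + Describe(c));
        }
    }
}
```

For these characters, Č, Ć, Ž, Š and Đ come out between U+0100 and U+017F, while ä, ö, ü and ß come out between U+00C0 and U+00FF, so a range has to cover both blocks to keep all of them.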

'Č', 'Ć', 'Ž', 'Š' and 'Đ' are all part of the Unicode category 'Letter, Uppercase'. You can use \p{..} to match characters from a specific Unicode category, where .. is the (short) category name.

For example, \p{Lu} matches all characters in the 'Letter, Uppercase' category, and \p{Ll} matches all characters in the 'Letter, Lowercase' category.

So just replace a-z with \p{Ll} and A-Z with \p{Lu} (just make sure you escape the \ as \\, or use a verbatim string literal for your expression).
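Applied to the pattern from the question, that substitution might look like the sketch below. The class name WhitelistFilter and the method Clean are hypothetical; the pattern is a verbatim string, so the double quote inside the character class is written as "" and the backslashes are not doubled. The duplicate % from the original pattern is dropped, which does not change what it matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitelistFilter
{
    // Same whitelist as in the question, with a-z / A-Z swapped for
    // \p{Ll} (lowercase letters) and \p{Lu} (uppercase letters).
    // The trailing '-' is a literal hyphen because it is last in the class.
    private static readonly Regex NotAllowed =
        new Regex(@"[^\p{Ll}\p{Lu}0-9% @$""!#&'()*+,./:;<>=?-]");

    // Hypothetical helper: strips every character outside the whitelist.
    public static string Clean(string source)
    {
        return NotAllowed.Replace(source, string.Empty);
    }

    public static void Main()
    {
        // '^', '~' and '|' are not whitelisted and are removed.
        Console.WriteLine(Clean("Čokolada^ & Äpfel~: 100%|"));
    }
}
```

Note that \p{Ll} and \p{Lu} together leave out letters without case (the Lo, Lm and Lt categories); if those should be kept too, \p{L} covers all letter categories at once.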

See http://msdn.microsoft.com/en-us/library/20bw873z%28v=vs.110%29.aspx#CategoryOrBlock for more information, and http://msdn.microsoft.com/en-us/library/20bw873z%28v=vs.110%29.aspx#SupportedUnicodeGeneralCategories for all supported categories.
