return Regex.Replace(source, "[^a-zA-Z0-9% @$\"!#%&'()*+,./:;<>=?-]", string.Empty);
How can I also allow characters such as ČĆŽPŠĐ, or German characters with umlauts, ...?
You could use Unicode character classes.
A simplified version of your code: new Regex("[^\\p{L}0-9 ]").Replace("this is a test ČĆŽPŠĐ ä 244 $%^&*", String.Empty);
This yields: this is a test ČĆŽPŠĐ ä 244 (with a trailing space).
Here \p{L} denotes the Unicode category that matches letters from any language.
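A minimal, self-contained sketch of the \p{L} approach, assuming a plain .NET console program; the class LetterFilter and method KeepLettersDigitsSpaces are hypothetical names. A verbatim string (@"...") is used so the backslash in \p{L} needs no extra escaping.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterFilter
{
    // Matches any character that is NOT a Unicode letter (\p{L}), an ASCII digit 0-9, or a space.
    private static readonly Regex Disallowed = new Regex(@"[^\p{L}0-9 ]");

    // Hypothetical helper: removes every disallowed character from the input.
    public static string KeepLettersDigitsSpaces(string source) =>
        Disallowed.Replace(source, string.Empty);

    public static void Main()
    {
        string result = KeepLettersDigitsSpaces("this is a test ČĆŽPŠĐ ä 244 $%^&*");
        // result == "this is a test ČĆŽPŠĐ ä 244 " (the space before "$" is kept)
        Console.WriteLine(result);
    }
}
```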
Just try this:
Regex.Replace(source, "[^a-zA-Z0-9% @$\"!#%&'()*+,./:;<>=?\u0100-\u01FF-]", string.Empty);
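The range U+0100 to U+01FF covers Latin Extended-A (U+0100 to U+017F), which contains Č, Ć, Ž, Š and Đ, plus the first part of Latin Extended-B. It does not contain the German umlauts or ß (for example, ä is U+00E4 and ß is U+00DF); those are in the Latin-1 Supplement block (U+0080 to U+00FF), so that range would need to be allowed as well.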
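To check which characters the \u0100-\u01FF range in the pattern above actually covers, here is a small sketch that prints each character's UTF-16 code unit and whether it falls in that range. The class CodePoints and method InLatinExtendedRange are hypothetical names; it prints code points rather than the characters themselves to avoid console-encoding issues.

```csharp
using System;

public static class CodePoints
{
    // True if the character lies in the range U+0100..U+01FF used by the pattern in this answer.
    public static bool InLatinExtendedRange(char c) => c >= '\u0100' && c <= '\u01FF';

    public static void Main()
    {
        foreach (char c in "ČĆŽŠĐäöüß")
        {
            // Print the code unit as 4 hex digits and whether the range includes it.
            Console.WriteLine($"U+{(int)c:X4} in range: {InLatinExtendedRange(c)}");
        }
    }
}
```

The Croatian letters report True, while ä, ö, ü and ß report False, which is why the Latin-1 Supplement range also has to be allowed for German text.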
'Č', 'Ć', 'Ž', 'Š' and 'Đ' are all part of the Unicode category 'Letter, Uppercase'. You can use \p{..} to match characters from a specific Unicode category, where .. is the (short) category name.
For example, \p{Lu} matches all characters in the 'Letter, Uppercase' category, and \p{Ll} matches all characters in the 'Letter, Lowercase' category.
So just replace a-z with \p{Ll} and A-Z with \p{Lu} (just make sure you escape the backslash as \\ in a regular C# string literal, or use a verbatim string literal for your expression).
See http://msdn.microsoft.com/en-us/library/20bw873z%28v=vs.110%29.aspx#CategoryOrBlock for more information, and http://msdn.microsoft.com/en-us/library/20bw873z%28v=vs.110%29.aspx#SupportedUnicodeGeneralCategories for all supported categories.
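A minimal sketch applying this advice to the pattern from the question, assuming .NET's System.Text.RegularExpressions; the class Sanitizer and method Clean are hypothetical names. It uses a verbatim string, in which the backslashes are literal and the double quote inside the character class must be written as "".

```csharp
using System;
using System.Text.RegularExpressions;

public static class Sanitizer
{
    // The question's pattern with a-z replaced by \p{Ll} and A-Z replaced by \p{Lu}.
    // The trailing '-' before ']' is a literal hyphen, as in the original pattern.
    private static readonly Regex Disallowed =
        new Regex(@"[^\p{Ll}\p{Lu}0-9% @$""!#%&'()*+,./:;<>=?-]");

    // Hypothetical helper: removes every character outside the allowed set.
    public static string Clean(string source) => Disallowed.Replace(source, string.Empty);

    public static void Main()
    {
        // '^', '~' and '€' are not in the allowed set, so they are removed.
        Console.WriteLine(Clean("Čćžšđ äöüß ÄÖÜ 42% ^~€"));
    }
}
```

If titlecase, modifier or other letters (for example scripts without case) should also pass, \p{L} covers all of those in one class, whereas \p{Ll}\p{Lu} only covers cased lowercase and uppercase letters.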