I want to match any special characters that are not numbers or letters (that people use to write words). I want to include underscore because underscore is neither a number nor a letter that is used in words. But I do not want to include space.
In short, I want to match every line below except the last two.
12345_678
12345*678
12345-678
12345&678
12345-678
12345あ678
12345 678
I could not use [^a-zA-Z0-9] because it does not account for non-Latin letters such as Japanese, and \d+(\W|_)\d+ also matched the unwanted space. What would be the best regular expression for this?
Use the following to also ignore Japanese characters:
[^a-zA-Z\d\s\u3040-ゟ゠-ヿ一-龯]
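As a rough illustration of the explicit-range approach, here is a minimal C# sketch. The class and method names are hypothetical. The Hiragana, Katakana, and kanji ranges are written as \u escapes, and the Hiragana lower bound U+3040 is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitRangeDemo
{
    // Anything that is not a Latin letter, a digit, whitespace,
    // Hiragana (U+3040-U+309F, lower bound assumed), Katakana (U+30A0-U+30FF),
    // or a kanji in U+4E00-U+9FAF counts as "special".
    private static readonly Regex Special =
        new Regex(@"[^a-zA-Z\d\s\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]");

    // Hypothetical helper: true when the input contains a special character.
    public static bool HasSpecial(string input) => Special.IsMatch(input);

    public static void Main()
    {
        foreach (var s in new[] { "12345_678", "12345*678", "12345あ678", "12345 678" })
        {
            Console.WriteLine($"{s} -> {HasSpecial(s)}");
        }
    }
}
```

Note that this only covers the scripts listed in the character class. Letters from other alphabets would still be treated as special.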
The following regex will match any character that is neither an alphanumeric character (including characters of different alphabets such as those used in Japan or Korea) nor a space.
([^\w ]|_)
Note the alternation explicitly matching the underscore character, which is necessary since the underscore is part of the \w character class and thus would not be matched by [^\w ] alone. (Also note that the pattern contains a space character after \w.)
If all white-space characters (such as the tab character), and not just plain spaces, should be excluded from the match, then the following slightly modified pattern might be more appropriate:
([^\w\s]|_)
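To see how this pattern behaves on the sample lines from the question, here is a small C# sketch, assuming the .NET regex engine (where \w is Unicode-aware by default). The class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Either a character that is neither a word character nor white-space,
    // or an underscore (which \w would otherwise swallow).
    private static readonly Regex Special = new Regex(@"[^\w\s]|_");

    // Hypothetical helper: returns the first special character,
    // or an empty string when the input contains none.
    public static string FirstSpecial(string input)
    {
        Match m = Special.Match(input);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        string[] samples =
        {
            "12345_678", "12345*678", "12345-678",
            "12345&678", "12345あ678", "12345 678"
        };
        foreach (var s in samples)
        {
            Console.WriteLine($"{s} -> '{FirstSpecial(s)}'");
        }
    }
}
```

If the digits on both sides matter, the same group can be embedded in a larger pattern such as \d+([^\w\s]|_)\d+.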
You may want to look at Unicode Character Categories. It seems that you need to match Symbols and Punctuation.
var regexPattern = @"[\p{S}\p{P}]";
Symbols include +, =, <, $, ^, ¦, § etc.
Punctuation includes _, -, —, (, {, ", », !, ?, #, * etc.
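The category-based pattern can be tried out with a short C# sketch like the one below. The class and method names are hypothetical. CategoryOf relies on CharUnicodeInfo.GetUnicodeCategory to show which category a given character falls into, which helps when it is unclear whether a character counts as a symbol or as punctuation.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeCategoryDemo
{
    // \p{S} covers all symbol categories, \p{P} all punctuation categories.
    private static readonly Regex SymbolOrPunctuation = new Regex(@"[\p{S}\p{P}]");

    // Hypothetical helper: true when the input contains a symbol or punctuation mark.
    public static bool HasSymbolOrPunctuation(string input) =>
        SymbolOrPunctuation.IsMatch(input);

    // Hypothetical helper: reports the Unicode general category of one character.
    public static UnicodeCategory CategoryOf(char c) =>
        CharUnicodeInfo.GetUnicodeCategory(c);

    public static void Main()
    {
        foreach (char c in "_-*&+$あ ")
        {
            Console.WriteLine($"'{c}' is {CategoryOf(c)}, matched: {HasSymbolOrPunctuation(c.ToString())}");
        }
    }
}
```

Because letters and separators belong to other categories, neither Japanese characters nor spaces are matched by this pattern.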