
Match any special characters (including underscore, but not space) that are not letters

I want to match any special character that is neither a digit nor a letter used to write words. The underscore should be included, because it is neither a digit nor a letter used in words. The space should not be included.

In short, I want to match every line below except the last two.

12345_678
12345*678
12345-678
12345&678
12345-678
12345あ678
12345 678

I cannot use [^a-zA-Z0-9] because it also matches non-Latin letters, such as Japanese characters. \d+(\W|_)\d+ also matches the unwanted space. What would be the best regular expression for this?

Use the following to ignore Japanese letters as well:

[^a-zA-Z\d\s぀-ゟ゠-ヿ一-龯]
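As a rough illustration, the class above can be tried against the sample lines from the question. This is only a sketch: the class and method names (`ExplicitRangeDemo`, `ContainsSpecial`) are made up for the example, and the Japanese ranges are written as `\u` escapes (U+3040-U+309F, U+30A0-U+30FF, U+4E00-U+9FAF), which is assumed to be equivalent to the literal characters in the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitRangeDemo
{
    // Anything that is not a Latin letter, a digit, white space,
    // Hiragana, Katakana, or a Kanji in U+4E00-U+9FAF counts as "special".
    private static readonly Regex Special =
        new Regex(@"[^a-zA-Z\d\s\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]");

    // Hypothetical helper: true when the input contains a special character.
    public static bool ContainsSpecial(string input)
    {
        return Special.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples =
        {
            "12345_678", "12345*678", "12345-678",
            "12345&678", "12345あ678", "12345 678"
        };

        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + ContainsSpecial(s));
        }
    }
}
```

With these samples the first four lines are reported as containing a special character and the last two are not. Letters from scripts that are not listed in the class, such as Hangul, would still be treated as special.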

The following regex will match any character that is neither an alphanumeric character (including characters of different alphabets such as those used in Japan or Korea) nor a space.

([^\w ]|_)

Note the alternation that explicitly matches the underscore. It is necessary because the underscore belongs to the \w character class and would therefore not be matched by [^\w ] alone. (Also note that the pattern contains a literal space character after \w.)

If other white-space characters (such as the tab character) should be excluded from the match as well, not just plain spaces, the following slightly modified pattern may be more appropriate:

([^\w\s]|_)


(An example of the latter pattern in action, including Hiragana and Hangul characters, is available on regexstorm.net.)
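A minimal sketch of how the second pattern could be combined with the digit groups from the question is shown below. The anchored pattern `^\d+([^\w\s]|_)\d+$` and the names `WordClassDemo` and `SeparatorOf` are assumptions made for this example, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Digits, then one character that is neither a word character nor
    // white space (or is an underscore), then digits again.
    private static readonly Regex Line =
        new Regex(@"^\d+([^\w\s]|_)\d+$");

    // Hypothetical helper: returns the separator, or an empty string
    // when the line does not match.
    public static string SeparatorOf(string input)
    {
        Match m = Line.Match(input);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string[] samples =
        {
            "12345_678", "12345*678", "12345-678",
            "12345&678", "12345あ678", "12345한678", "12345 678"
        };

        foreach (string s in samples)
        {
            string sep = SeparatorOf(s);
            Console.WriteLine(s + " -> " + (sep.Length > 0 ? "'" + sep + "'" : "no match"));
        }
    }
}
```

Because .NET's \w covers letters from any script, the Hiragana and Hangul samples are rejected without listing their ranges, and the space is rejected by \s.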

You may want to look at Unicode character categories. It seems that you need to match Symbols and Punctuation.

var regexPattern = @"[\p{S}\p{P}]";

Symbols (\p{S}) include +, =, <, $, ^, ¦ etc.

Punctuation (\p{P}) includes _, -, —, (, {, ", », !, ?, #, * etc.
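The category-based pattern can be checked against the question's samples in the same way. The sketch below also prints the category that .NET reports for a few characters through `char.GetUnicodeCategory`, which is a quick way to see whether a given character falls under Symbols or Punctuation. The names `CategoryDemo` and `ContainsSpecial` are illustrative only.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CategoryDemo
{
    // Any character in a Symbol (S*) or Punctuation (P*) category.
    private static readonly Regex Special = new Regex(@"[\p{S}\p{P}]");

    // Hypothetical helper: true when the input contains a symbol or
    // punctuation character.
    public static bool ContainsSpecial(string input)
    {
        return Special.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples =
        {
            "12345_678", "12345*678", "12345-678",
            "12345&678", "12345あ678", "12345 678"
        };

        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + ContainsSpecial(s));
        }

        // Show which Unicode category each separator belongs to.
        foreach (char c in "_*-&+あ ")
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);
            Console.WriteLine("'" + c + "' is " + category);
        }
    }
}
```

Letters (such as あ) and space separators belong to neither group, so the last two sample lines are not matched, while the underscore is matched because it is connector punctuation.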
