First of all, I want to remove all punctuation from a String. I wrote the following code:
Pattern pattern = Pattern.compile("\\p{Punct}");
Matcher matcher = pattern.matcher("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~（hello）");
if (matcher.find())
    System.out.println(matcher.replaceAll(""));
After the replacement I got the output: （hello）
So the pattern matches one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~, which agrees with the official docs: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
But I also want to remove "（" (Fullwidth Left Parenthesis, U+FF08) and "）" (Fullwidth Right Parenthesis, U+FF09), so I changed my code to this:
Pattern pattern = Pattern.compile("(?U)\\p{Punct}");
Matcher matcher = pattern.matcher("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~（）");
if (matcher.find())
    System.out.println(matcher.replaceAll(""));
After the replacement I got the output: $+<=>^`|~
The matcher does match "（" (Fullwidth Left Parenthesis, U+FF08) and "）" (Fullwidth Right Parenthesis, U+FF09), but it misses $+<=>^`|~
I am confused about why this happens. Can anyone help? Thanks in advance!
Unicode (that is, when you use (?U)) and POSIX (when you don't use (?U)) disagree on what counts as punctuation.
When you don't use (?U), \p{Punct} matches the POSIX punctuation character class, which is just
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
When you use (?U), \p{Punct} matches the Unicode Punctuation category, which does not include some of the characters in the above list, namely:
$+<=>^`|~
For example, the Unicode category for $ is "Symbol, Currency", or Sc.
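To check this yourself, here is a minimal sketch that asks Character.getType for the general category of each character in the POSIX list and keeps the ones that are not in any Unicode punctuation category. The class name PunctCategories and the helpers isUnicodePunctuation and posixOnly are made up for this illustration; only Character.getType, the Character category constants and Pattern.matches come from the JDK.

```java
import java.util.regex.Pattern;

class PunctCategories {
    // The POSIX punctuation list quoted from the Pattern documentation.
    static final String POSIX_PUNCT = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    // True if the code point is in one of the Unicode P* general categories.
    static boolean isUnicodePunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // Characters that POSIX treats as punctuation but Unicode classifies as symbols.
    static String posixOnly() {
        StringBuilder sb = new StringBuilder();
        for (char c : POSIX_PUNCT.toCharArray()) {
            if (!isUnicodePunctuation(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(posixOnly());
        // Same character, two answers depending on the mode:
        System.out.println(Pattern.matches("\\p{Punct}", "$"));      // POSIX class
        System.out.println(Pattern.matches("(?U)\\p{Punct}", "$"));  // Unicode category
        System.out.println(Character.getType('$') == Character.CURRENCY_SYMBOL);
    }
}
```

The first line printed should be the same nine characters that survived the replacement in the question.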
If you want to match $+<=>^`|~ plus all Unicode punctuation, you can put both in a character class. You can also use the Unicode category "P" directly, rather than turning on Unicode mode with (?U).
Pattern pattern = Pattern.compile("[\\p{P}$+<=>^`|~]");
Matcher matcher = pattern.matcher("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~（）");
// you don't need to call find() first
System.out.println(matcher.replaceAll(""));
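For completeness, the same idea as a self-contained class you can compile and run. This is only a sketch: the class name StripPunctuation and the helper strip are names I picked for the example, and the fullwidth parentheses are written as \uFF08 and \uFF09 escapes so the source file encoding doesn't matter.

```java
import java.util.regex.Pattern;

class StripPunctuation {
    // Unicode category P plus the nine ASCII symbols that POSIX counts as punctuation.
    // Compiled once and reused, since Pattern instances are immutable.
    private static final Pattern PUNCT = Pattern.compile("[\\p{P}$+<=>^`|~]");

    // Removes every character matched by PUNCT from the input.
    static String strip(String input) {
        return PUNCT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\uFF08hello\uFF09";
        System.out.println(strip(input)); // prints "hello"
    }
}
```

If you only need this once, input.replaceAll("[\\p{P}$+<=>^`|~]", "") does the same thing without an explicit Pattern.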