I have this code:
String s=" //wont won't won't ";
String[] w = s.split("[\\s+\\/,\\.!_\\-?;:]++");
I don't want the ' to be removed from won't, as it is part of the word, but in //wont I do want the // to be removed. Help would be appreciated.
So my question is: how do I use a regex in Java so that a punctuation character such as ' is not removed when it is part of a word like "won't", while still splitting on everything in this pattern:
"[\\s+\\/,\\.!_\\-?;:]++"
You can use:
String[] w = s.split("[\\s+/,.!_\\-?;:]+|\\B'|'\\B");
See the regex demo. Details:
[\\s+/,.!_\\-?;:]+ - one or more whitespaces or any of these characters: + / , . ! _ - ? ; :
| - or
\\B' - a ' that is at the start of string or immediately preceded with a non-word char
| - or
'\\B - a ' that is at the end of string or immediately followed with a non-word char.

See the Java demo:
// needs: import java.util.Arrays;
String s = " //wont won't won't ";
String[] w = s.split("[\\s+/,.!_\\-?;:]+|\\B'|'\\B");
System.out.println(Arrays.toString(w));
// => [, wont, won't, won't]
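The effect of the two \B alternatives can be checked in isolation. The following is only a minimal sketch: the class name ApostropheBoundaryDemo and the helper stripOuterApostrophes are made up for illustration and are not part of the original answer. It deletes exactly the apostrophes that \B'|'\B matches, so you can see which ones survive. Note that a trailing possessive apostrophe (as in dogs') also counts as "followed by a non-word char" and is removed.

```java
import java.util.regex.Pattern;

public class ApostropheBoundaryDemo {
    // The two apostrophe alternatives from the answer: a ' that has no
    // word character on at least one of its sides.
    private static final Pattern OUTER_APOSTROPHE = Pattern.compile("\\B'|'\\B");

    // Hypothetical helper: removes only the apostrophes the pattern matches.
    static String stripOuterApostrophes(String text) {
        return OUTER_APOSTROPHE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Word characters on both sides: the apostrophe is kept.
        System.out.println(stripOuterApostrophes("won't"));      // won't
        // Apostrophes at the start and end of the string: both removed.
        System.out.println(stripOuterApostrophes("'quoted'"));   // quoted
        // Apostrophe followed by a space: removed.
        System.out.println(stripOuterApostrophes("dogs' toys")); // dogs toys
    }
}
```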
You can get rid of the empty entry at the start if you first remove all separator matches at the start of the string:
String regex = "[\\s+/,.!_\\-?;:]+|\\B'|'\\B";
String[] w2 = s.replaceFirst("^(?:"+regex+")+", "").split(regex);
System.out.println(Arrays.toString(w2));
// => [wont, won't, won't]
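One more case may be worth handling. When a matched apostrophe sits directly next to another separator (for example a closing quote followed by a space), the two matches are adjacent and split() returns an empty string between them, which stripping the start of the string does not cover. A possible way around it, sketched here with a hypothetical class SplitWords and helper words that are not from the original answer, is to filter out empty tokens after splitting:

```java
import java.util.Arrays;

public class SplitWords {
    // Regex copied unchanged from the answer.
    private static final String SEPARATORS = "[\\s+/,.!_\\-?;:]+|\\B'|'\\B";

    // Hypothetical helper: split on the separators, then drop every empty
    // token, whether it appears at the start or between adjacent matches.
    static String[] words(String text) {
        return Arrays.stream(text.split(SEPARATORS))
                     .filter(token -> !token.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words(" //wont won't won't ")));
        // => [wont, won't, won't]
        System.out.println(Arrays.toString(words("'quoted' won't")));
        // => [quoted, won't]
    }
}
```

This trades the extra replaceFirst call for a stream pass over the tokens; either approach works for the sample string from the question.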