I have a question similar to "How to split a string, but also keep the delimiters?". How would I split a String using a regex, keeping some types of delimiters but not others? Specifically, I want to keep the non-whitespace delimiters, but not the whitespace delimiters.
To make this concrete:
"a;b c" | ["a", ";", "b", "c"]
"a; ; bb c ;d" | ["a", ";", ";", "bb", "c", ";", "d"]
Can this be done cleanly with a regex, and if so how?
Right now I'm working around this by splitting on the character to keep, and then again on the other one. I can stick with this approach if the regex cannot do so, or cannot do so cleanly:
Arrays.stream(input.split("((?<=;)|(?=;))"))
    .flatMap(s -> Arrays.stream(s.split("\\s+")))
    .filter(s -> !s.isEmpty())
    .toArray(String[]::new); // In practice, I would generally use .collect(Collectors.toList()) instead
You can do it this way:
System.out.println(String.join("-", "a; ; b c ;d".split("(?!\\G) *(?=;)|(?<=;) *| +")));
details:
(?!\\G) # not contiguous to a previous match and not at the start of the string
[ ]* # optional spaces
(?=;) # followed by a ;
| # OR
(?<=;) # preceded by a ;
[ ]* # optional spaces
| # OR
[ ]+ # one or more spaces
Feel free to change the literal space to \\s. To avoid an empty item at the beginning of the resulting array when the string starts with whitespace, you need to trim the string first.
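As a small illustration of the trimming advice, the sketch below runs the same pattern on an input with leading whitespace, with and without trim(). The class name TrimBeforeSplit, the helper methods, and the sample input are made up for this example.

```java
import java.util.Arrays;

public class TrimBeforeSplit {
    // Pattern from the answer above, written as a Java string literal.
    static final String REGEX = "(?!\\G) *(?=;)|(?<=;) *| +";

    // Splits the input as-is.
    static String[] splitRaw(String s) {
        return s.split(REGEX);
    }

    // Trims leading/trailing whitespace before splitting.
    static String[] splitTrimmed(String s) {
        return s.trim().split(REGEX);
    }

    public static void main(String[] args) {
        String input = " a; b";
        // The leading space is matched at index 0, so an empty first item appears.
        System.out.println(Arrays.toString(splitRaw(input)));     // [, a, ;, b]
        System.out.println(Arrays.toString(splitTrimmed(input))); // [a, ;, b]
    }
}
```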
Obviously, without the constraint of splitting, @alphabravo's way is the simplest.
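The referenced answer is not reproduced here. Assuming it means matching the tokens directly instead of splitting on the delimiters, a minimal sketch could look like the following. The pattern ;|[^;\s]+ and the names MatchTokens and tokenize are assumptions for illustration, not taken from that answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTokens {
    // A token is either a single semicolon, or a run of characters
    // that are neither semicolons nor whitespace.
    private static final Pattern TOKEN = Pattern.compile(";|[^;\\s]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // Collect every match in order; whitespace is never matched, so it is dropped.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a;b c"));        // [a, ;, b, c]
        System.out.println(tokenize("a; ; bb c ;d")); // [a, ;, ;, bb, c, ;, d]
    }
}
```

Because whitespace is simply skipped, leading or trailing whitespace needs no special handling with this approach.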
I found a regex that works:
(\\s+)|((?<=;)(?=\\S)|(?<=\\S)(?=;))
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        System.out.println(Arrays.toString("a; ; b c ;d"
                .split("(\\s+)|((?<=;)(?=\\S)|(?<=\\S)(?=;))")));
    }
}
Will print out:
[a, ;, ;, b, c, ;, d]
You want to split on whitespace, or between a word character and a non-word character:
str.split("\\s+|(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
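One thing to watch for with this pattern: when whitespace is directly followed by a word character, the zero-width boundary right after the whitespace also matches, so the result can contain empty strings. The sketch below shows this and filters them out, the same way the workaround in the question does. The class and method names are illustrative only.

```java
import java.util.Arrays;

public class WordBoundarySplit {
    // Pattern from the answer above, as a Java string literal.
    static final String REGEX = "\\s+|(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)";

    // Plain split; may contain empty strings.
    static String[] splitRaw(String s) {
        return s.split(REGEX);
    }

    // Split, then drop the empty strings.
    static String[] splitFiltered(String s) {
        return Arrays.stream(s.split(REGEX))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRaw("a b")));               // [a, , b]
        System.out.println(Arrays.toString(splitFiltered("a; ; b c ;d")));  // [a, ;, ;, b, c, ;, d]
    }
}
```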
After realizing Java doesn't support adding captured split characters to the split array elements, I thought I'd try a split solution without that capability.
Basically, there are only 4 permutations involving whitespace and the semicolon.
Finally, there is just the whitespace.
Here is the regex.
Raw: \s+(?=;)|(?<=;)\s+|(?<!\s)(?=;)|(?<=;)(?!\s)|\s+
Stringed: "\\s+(?=;)|(?<=;)\\s+|(?<!\\s)(?=;)|(?<=;)(?!\\s)|\\s+"
And the expanded regex with the permutations explained.
Good luck!
\s+ # Required, suck up wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
\s+ # Required, suck up wsp after ;
| # or,
(?<! \s ) # No wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
(?! \s ) # No wsp after ;
| # or,
\s+ # Required wsp
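To check the five alternatives against the two inputs from the question, here is a small sketch that passes the regex, written as a Java string literal, to String.split. The class name PermutationSplit and the helper method are made up for this example.

```java
import java.util.Arrays;

public class PermutationSplit {
    // The five alternatives explained above, escaped for a Java string literal.
    static final String REGEX =
            "\\s+(?=;)|(?<=;)\\s+|(?<!\\s)(?=;)|(?<=;)(?!\\s)|\\s+";

    static String[] split(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a;b c")));        // [a, ;, b, c]
        System.out.println(Arrays.toString(split("a; ; bb c ;d"))); // [a, ;, ;, bb, c, ;, d]
    }
}
```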
Edit
To stop a split on whitespace at the beginning of the string (BOS), use this regex.
Raw: \s+(?=;)|(?<=;)\s+|(?<!\s)(?=;)|(?<=;)(?!\s)|(?<!^)(?<!\s)\s+
Stringed: "\\s+(?=;)|(?<=;)\\s+|(?<!\\s)(?=;)|(?<=;)(?!\\s)|(?<!^)(?<!\\s)\\s+"
Explained:
\s+ # Required, suck up wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
\s+ # Required, suck up wsp after ;
| # or,
(?<! \s ) # No wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
(?! \s ) # No wsp after ;
| # or,
(?<! ^ ) # No split of wsp at BOS
(?<! \s )
\s+ # Required wsp
Borrowing @CasimiretHippolyte's \G trick, you may want to split on:
\\s+|(?!\\G)()
Note: no delimiters are specified.
A variant that avoids splitting on leading spaces:
(?m)(?<!^|\\s)(\\s+|)(?!$)