I have a problem with a regex that matches an uppercase letter possibly followed by a lowercase letter. I want to split after any such match, but I just can't seem to get it to work.
To make it more general: I want to split before and after any match of the regex.
Example string "TeSTString"
Wanted result -> [Te, S, T, St, ring]
I have tried everything I can think of, but I keep getting tripped up by look-ahead and look-behind.
First I tried [A-Z][a-z]?, and that matches perfectly, but the split removes the matched text...
result -> [ring]
After this I tried a positive look-ahead, (?=([A-Z][a-z]?)), giving me something close...
result -> [Te, S, T, String]
and a look-behind, (<=?([A-Z][a-z]?)), giving nothing at all...
result -> [TeSTString]
I even tried reversing the look-behind, (<=?([a-z]?[A-Z])), in a desperate attempt, but this was fairly unsuccessful.
Can anyone give a good pointer in the right direction before I lose my mind?
Here's one convoluted pattern that will match the expected result.
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String test = "TeSTStringOne";
        System.out.println(
            Arrays.toString(
                test.split(
                    "(?<=[a-z])(?=[A-Z])"     // preceded by lowercase AND followed by uppercase
                    + "|(?<=[A-Z])(?=[A-Z])"  // or: preceded AND followed by uppercase
                    + "|(?<=[A-Z][a-z])"      // or: preceded by uppercase AND lowercase
                )
            )
        );
    }
}
Output
[Te, S, T, St, ring, On, e]
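For the more general goal of splitting before and after any match, a different technique that avoids look-arounds is to walk the matches with a Matcher and keep both the matches and the text between them. This is only a sketch of that alternative, not part of the pattern above; the class name SplitAroundMatches and the helper splitAround are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitAroundMatches {

    // Hypothetical helper: returns every match as its own piece,
    // plus any non-empty text found between (or after) the matches.
    public static List<String> splitAround(String input, String regex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = 0; // end index of the previous match
        while (m.find()) {
            if (m.start() > last) {
                parts.add(input.substring(last, m.start())); // text before this match
            }
            if (m.end() > m.start()) {
                parts.add(m.group()); // the match itself (empty matches are skipped)
            }
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // trailing text after the last match
        }
        return parts;
    }

    public static void main(String[] args) {
        // Uses the plain pattern from the question, with no look-arounds.
        System.out.println(splitAround("TeSTString", "[A-Z][a-z]?"));
        // prints [Te, S, T, St, ring]
    }
}
```

The trade-off is a few more lines of code in exchange for being able to reuse the simple consuming pattern unchanged.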
Note
Replace [a-z] with \\p{Ll} and [A-Z] with \\p{Lu} to use with accented letters.
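As a rough illustration of that substitution, the sketch below applies the same three alternatives with the Unicode categories in place of the ASCII ranges. The class name UnicodeCaseSplit and the sample input are made up for the example; the accented letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.util.Arrays;

public class UnicodeCaseSplit {

    // Same three alternatives as the ASCII pattern, with Unicode
    // lowercase (Ll) and uppercase (Lu) letter categories instead.
    public static final String SPLIT_REGEX =
        "(?<=\\p{Ll})(?=\\p{Lu})"      // preceded by lowercase AND followed by uppercase
        + "|(?<=\\p{Lu})(?=\\p{Lu})"   // or: preceded AND followed by uppercase
        + "|(?<=\\p{Lu}\\p{Ll})";      // or: preceded by uppercase AND lowercase

    public static String[] split(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // U+00C9 is an uppercase E with acute accent, U+00E9 the lowercase one.
        String test = "\u00C9cOLE\u00C9t\u00E9";
        // Yields six pieces: E-acute + "c", "O", "L", "E", E-acute + "t", e-acute.
        System.out.println(Arrays.toString(split(test)));
    }
}
```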
Try with:
(?<=[A-Z][a-z])|(?=(?<!^)[A-Z])
(?<=[A-Z][a-z]) = positive lookbehind for an uppercase letter followed by a lowercase letter
(?=(?<!^)[A-Z]) = positive lookahead for an uppercase letter, unless it sits at the very beginning of the string (so there is no split at position 0)
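A small sketch to try this pattern with String.split on the example from the question; the class name LookaroundSplitDemo and the method splitCamel are made up for illustration.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {

    // Split after "uppercase + lowercase", or before any uppercase
    // letter that is not at the start of the string.
    public static final String SPLIT_REGEX = "(?<=[A-Z][a-z])|(?=(?<!^)[A-Z])";

    public static String[] splitCamel(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitCamel("TeSTString")));
        // prints [Te, S, T, St, ring]
    }
}
```

Both alternatives are zero-width, so split never consumes any characters and every letter ends up in one of the pieces.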