
Java string split regular expression retain delimiter

Given an input string such as

"abbbcaababbbcaaabbca"

I want to split such a string into an array of the groups "bca", "ab", "a" and "b".

So the above example would return

"ab", "b", "bca", "ab", "ab", "b", "bca", "a", "ab", "bca".

I have a 29-line piece of code with nested loops that accomplishes this task (it returns an ArrayList). However, it would be nice to get this done with a one-line regular expression.

Can this task be accomplished using the following method?

stringVar.split("regEX") 

Not a one-liner, but you can do it using Matcher.find with a loop:

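import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

ArrayList<String> result = new ArrayList<String>();
String s = "abbbcaababbbcaaabbca";
// Longer alternatives come first: alternation takes the first branch that matches.
Matcher m = Pattern.compile("bca|ab|a|b").matcher(s);
while (m.find())
    result.add(m.group());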
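Since the question asks for something close to a one-liner, the same Matcher-based idea can be sketched as a single stream expression. This is only a sketch under the assumption that Java 9 or later is available (Matcher.results() does not exist in earlier versions); the class name GroupTokenizer and the method name tokenize are made up for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical wrapper class, not part of the original answer.
final class GroupTokenizer {
    // Alternatives are tried left to right, so "bca" must precede "b"
    // and "ab" must precede "a".
    private static final Pattern GROUPS = Pattern.compile("bca|ab|a|b");

    // Assumes Java 9+, where Matcher.results() returns a Stream<MatchResult>.
    static List<String> tokenize(String s) {
        return GROUPS.matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abbbcaababbbcaaabbca"));
        // prints [ab, b, bca, ab, ab, b, bca, a, ab, bca]
    }
}
```

Like the loop version, this silently skips any character that none of the alternatives match.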


It can be accomplished using lookaround assertions, but @falsetru's answer (Matcher.find in a loop) is preferable to splitting.

import java.util.Arrays;

String[] ss = "abbbcaababbbcaaabbca".split("(?<=bca|ab)|(?<=a(?=ab))|(?<=b(?=bca))");
System.out.println(Arrays.toString(ss)); //=> [ab, b, bca, ab, ab, b, bca, a, ab, bca]

If the string contains letters only, you could shorten this using a backreference.

String[] ss = "abbbcaababbbcaaabbca".split("(?<=bca|ab)|(?<=(.)(?=\\1))");

It looks like you are trying to split between identical characters. In that case you can use

stringVar.split("(?<=(\\w))(?=\\1)") 

but it will result in ab, b, bca, abab, b, bca, a, ab, bca, which means that abab will not be split.

If you want, you can manually add a case so that the string is also split after ab or bca:

stringVar.split("(?<=(\\w))(?=\\1)|(?<=ab|bca)") 

which will now return ab, b, bca, ab, ab, b, bca, a, ab, bca
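The split-based and Matcher-based approaches agree on the sample string, but they are not equivalent in general. The sketch below compares them side by side. The class and method names are hypothetical, and the input "ba" is an assumed edge case that the question does not say can occur: the split pattern finds no boundary in it, while the alternation pattern matches "b" and "a" separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness, not part of the original answers.
final class SplitVersusFind {
    // Splits between identical characters, and after "ab" or "bca".
    static List<String> bySplit(String s) {
        return Arrays.asList(s.split("(?<=(\\w))(?=\\1)|(?<=ab|bca)"));
    }

    // Collects successive matches of the ordered alternation.
    static List<String> byFind(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("bca|ab|a|b").matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "abbbcaababbbcaaabbca";
        System.out.println(bySplit(sample).equals(byFind(sample))); // prints true
        System.out.println(bySplit("ba")); // prints [ba]
        System.out.println(byFind("ba"));  // prints [b, a]
    }
}
```

If inputs outside the pattern of the sample are possible, the Matcher version is the one that directly encodes the four groups being asked for.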
