Java string split regular expression retain delimiter
Given an input string such as
"abbbcaababbbcaaabbca"
I want to split such a string into an array of the groups "bca", "ab", "a", and "b".
So the above example would return
"ab", "b", "bca", "ab", "ab", "b", "bca", "a", "ab", "bca".
I have a 29-line piece of code with nested loops that accomplishes this task (it returns an ArrayList). However, it would be nice to get this done with a one-line regular expression.
Can this task be accomplished using the following method?
stringVar.split("regEX")
Not a one-liner, but you can do it using Matcher.find with a loop:
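import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Alternatives are tried left to right, so "bca" and "ab" are listed before "b" and "a".
ArrayList<String> result = new ArrayList<String>();
String s = "abbbcaababbbcaaabbca";
Matcher m = Pattern.compile("bca|ab|a|b").matcher(s);
while (m.find()) {
    result.add(m.group());
}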
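For anyone who wants to run the Matcher.find approach as is, here is a minimal self-contained sketch. The class name GroupTokenizer and the helper tokenize are hypothetical names chosen for illustration, not part of the original answer. Note that find() silently skips any character that is not part of a match, so this assumes the input consists only of the letters a, b, and c arranged in the expected groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupTokenizer {
    // Java tries alternatives left to right, so the longer
    // groups "bca" and "ab" must come before "a" and "b".
    private static final Pattern GROUPS = Pattern.compile("bca|ab|a|b");

    // Hypothetical helper: collects each successive match as one token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = GROUPS.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ab, b, bca, ab, ab, b, bca, a, ab, bca]
        System.out.println(tokenize("abbbcaababbbcaaabbca"));
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call, which matters only if the method is invoked often.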
It can be accomplished using lookaround assertions, but @falsetru's answer is preferable to splitting.
String[] ss = "abbbcaababbbcaaabbca".split("(?<=bca|ab)|(?<=a(?=ab))|(?<=b(?=bca))");
System.out.println(Arrays.toString(ss)); //=> [ab, b, bca, ab, ab, b, bca, a, ab, bca]
If the string contains letters only, you could shorten this using a backreference.
String[] ss = "abbbcaababbbcaaabbca".split("(?<=bca|ab)|(?<=(.)(?=\\1))");
It looks like you are trying to split between identical characters. In that case you can use
stringVar.split("(?<=(\\w))(?=\\1)")
but it will result in ab, b, bca, abab, b, bca, a, ab, bca, which means that abab will not be split.
If you want, you can manually add a case so that the string is also split after ab or bca:
stringVar.split("(?<=(\\w))(?=\\1)|(?<=ab|bca)")
which will now return ab, b, bca, ab, ab, b, bca, a, ab, bca
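To check the combined split pattern end to end, the following sketch wraps it in a small class. The names SplitCheck and splitGroups are hypothetical and used only for illustration. The first alternative splits between two identical word characters, and the second splits after every ab or bca. The second alternative also matches at the very end of the sample string, but split discards trailing empty strings, so no empty element appears in the result.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitCheck {
    // (?<=(\w))(?=\1) : position between two identical word characters
    // (?<=ab|bca)     : position right after "ab" or "bca"
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=(\\w))(?=\\1)|(?<=ab|bca)");

    // Hypothetical helper: splits the input at every zero-width boundary.
    static String[] splitGroups(String input) {
        return BOUNDARY.split(input);
    }

    public static void main(String[] args) {
        // prints [ab, b, bca, ab, ab, b, bca, a, ab, bca]
        System.out.println(Arrays.toString(splitGroups("abbbcaababbbcaaabbca")));
    }
}
```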