
Split string array in Java using regex

I'm trying to split this string:

aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)

so that it looks like this array:

[ a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8) ]

Here are the rules: it accepts the letters a to g. A letter can stand alone, but if parentheses follow it, the token has to include them and their content. The content of the parentheses must be a numeric value.

This is what I tried:

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
String[] a = content.split("[a-g]|[a-g]\\([0-9]*\\)");
for (String s : a) {
    System.out.println(s);
}

And here's the output:

(2)

(52)

(4)
(2)

(14)
(6)
(8)h(4)5(6)

Thanks.

It is easier to match these substrings:

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
List<String> res = new ArrayList<>();
Matcher matcher = pattern.matcher(content);
while (matcher.find()){
    res.add(matcher.group(0)); 
} 
System.out.println(res);

Output:

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8)]

See the Java demo and a regex demo.

Pattern details

  • [a-g] - a letter from a to g
  • (?:\\(\\d+\\))? - an optional non-capturing group matching 1 or 0 occurrences of
    • \\( - a ( char
    • \\d+ - 1+ digits
    • \\) - a ) char.
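
To see how the optional group behaves on edge cases, here is a minimal, self-contained sketch that runs the same pattern over a few short inputs. The class name PatternDemo, the helper findAll and the sample inputs are made up for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDemo {
    // Same pattern as above: a letter a-g, optionally followed by "(digits)".
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\(\\d+\\))?");

    // Hypothetical helper: collects every match of TOKEN found in the input.
    public static List<String> findAll(String input) {
        List<String> res = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            res.add(matcher.group());
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ab(12)"));   // [a, b(12)]
        System.out.println(findAll("a()b"));     // [a, b]  (empty parentheses need at least one digit)
        System.out.println(findAll("h(4)5(6)")); // []      (no letter from a to g)
    }
}
```

Because \\d+ requires at least one digit, a() yields just a, and anything that does not start with a letter from a to g is skipped entirely.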

If you want to use the split method only, here is an approach you could follow too:

import java.util.Arrays;

public class Test 
{
   public static void main(String[] args)
   {
        String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
        String[] a = content.replaceAll("[a-g](\\([0-9]*\\))?|[a-g]", "$0:").split(":");
        // $0 is the string which matched the regex

        System.out.println(Arrays.toString(a));

   }

}

Regex: [a-g](\\([0-9]*\\))?|[a-g] matches the strings you want (i.e. a, b, a(5) and so on).

Using this regex, I first replace each of those strings with a version that has a colon (:) appended. Then I split the string on the colon using the split method.

Output of the above code is:

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8), h(4)5(6)]

NOTE: This approach only works with a delimiter that is known not to be present in the input string. For example, I chose a colon because I assumed it won't be part of the input string.
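
If choosing a safe delimiter is a concern, one alternative (not from the answer above, just a sketch) is to split on a zero-width lookahead, so that nothing has to be inserted into the string first. The class name LookaheadSplit and the method splitTokens are hypothetical. As with the colon approach, invalid trailing text is not dropped; here it stays attached to the preceding token.

```java
import java.util.Arrays;

public class LookaheadSplit {
    // Split before each letter a-g. The lookahead (?=...) consumes no characters,
    // so no delimiter has to be inserted and nothing is removed from the input.
    public static String[] splitTokens(String content) {
        return content.split("(?=[a-g])");
    }

    public static void main(String[] args) {
        String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
        // On Java 8 or later this prints:
        // [a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8)h(4)5(6)]
        System.out.println(Arrays.toString(splitTokens(content)));
    }
}
```

The last element, g(8)h(4)5(6), shows the limitation: split cannot discard the part that breaks the rules, so a validation step would still be needed.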

Split is the wrong approach for this, as it is hard to eliminate wrong entries.

Just match whatever is valid and process the resulting array of matches:

[a-g](?:\(\d+\))?
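
In Java, one way to collect those matches could look like the sketch below. It assumes Java 9 or later, where Matcher.results() returns a stream of matches; the class name MatchAll and the method tokens are made up for this example.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MatchAll {
    // The regex from above, written as a Java string literal.
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\(\\d+\\))?");

    // Requires Java 9+: Matcher.results() yields one MatchResult per match.
    public static List<String> tokens(String input) {
        return TOKEN.matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8)]
        System.out.println(tokens("aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)"));
    }
}
```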

Debuggex Demo

You can try the following regex: [a-g](\\(.*?\\))?

  • [a-g] : a letter from a to g, required
  • (\\(.*?\\))? : any amount of characters between ( and ), matching as few characters as possible

You can view the expected output here.
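
One caveat worth checking: because .*? accepts any characters, this regex also matches parentheses with non-numeric content, which the rules in the question exclude. The sketch below compares it with a \\d+ variant; the class name LazyVsDigits, the helper findAll and the sample input a(x)b(7) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsDigits {
    // Hypothetical helper: returns all matches of the given regex in the input.
    public static List<String> findAll(String regex, String input) {
        List<String> res = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            res.add(matcher.group());
        }
        return res;
    }

    public static void main(String[] args) {
        String input = "a(x)b(7)";
        // Lazy "anything" inside the parentheses: non-numeric content is accepted.
        System.out.println(findAll("[a-g](\\(.*?\\))?", input));    // [a(x), b(7)]
        // Digits only inside the parentheses: "(x)" is not part of the token.
        System.out.println(findAll("[a-g](?:\\(\\d+\\))?", input)); // [a, b(7)]
    }
}
```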

This answer is based upon Pattern, an example:

String input = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";

Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
Matcher matcher = pattern.matcher(input);
List<String> tokens = new ArrayList<>();
while (matcher.find()) {
    tokens.add(matcher.group());
}

tokens.forEach(System.out::println);

Resulting output:

a
b
a(2)
b
b(52)
g
c(4)
d(2)
f
e(14)
f(6)
g(8)

Edit: Using [a-g](?:\\((.*?)\\))? you can also easily extract the inner value of a bracket:

while (matcher.find()) {
    tokens.add(matcher.group());
    tokens.add(matcher.group(1)); // the inner value or null if no () are present 
}
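
A complete version of that idea might look like the sketch below. Note one deliberate change on my part: it captures \\d+ instead of .*? so that only numeric content is picked up, in line with the rules in the question. The class name InnerValues, the method describe and the "none" placeholder are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnerValues {
    // Capturing group 1 holds the digits inside the parentheses, if there are any.
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\((\\d+)\\))?");

    // Returns one entry per token, e.g. "a(2) -> 2", or "b -> none" without parentheses.
    public static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            String inner = matcher.group(1); // null when the optional group did not match
            out.add(matcher.group() + " -> " + (inner == null ? "none" : inner));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [b -> none, a(2) -> 2, e(14) -> 14]
        System.out.println(describe("ba(2)e(14)"));
    }
}
```

Checking group(1) for null before using it avoids putting null entries into the result list.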
