
Split string array in Java using regex

I'm trying to split this string:

aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)

so that it becomes this array:

[ a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8) ]

Here are the rules: only the letters a to g are accepted. A letter can stand alone, but if parentheses follow it, the token has to include them and their content. The content of the parentheses must be a numeric value.

This is what I tried:

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
String[] a = content.split("[a-g]|[a-g]\\([0-9]*\\)");
for (String s : a) {
    System.out.println(s);
}

And here's the output:

(2)

(52)

(4) (2)

(14) (6) (8)h(4)5(6)

Thanks.

It is easier to match these substrings:

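import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
List<String> res = new ArrayList<>();
Matcher matcher = pattern.matcher(content);
// Collect every match instead of splitting around delimiters
while (matcher.find()) {
    res.add(matcher.group(0));
}
System.out.println(res);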

Output:

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8)]

See the Java demo and the regex demo.

Pattern details

  • [a-g] - a letter from a to g
  • (?:\\(\\d+\\))? - an optional non-capturing group matching 1 or 0 occurrences of
    • \\( - a ( char
    • \\d+ - 1+ digits
    • \\) - a ) char.
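
To make the pattern details concrete, here is a minimal sketch that checks whole strings against the same pattern. The class name PatternCheck and the helper isValidToken are made up for illustration; only Pattern and Matcher.matches() come from the JDK.

```java
import java.util.regex.Pattern;

public class PatternCheck {
    // Same pattern as in the answer above.
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\(\\d+\\))?");

    // Hypothetical helper: true only if the WHOLE string is one valid token.
    static boolean isValidToken(String s) {
        return TOKEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"a", "a(2)", "b(52)", "h(4)", "a()", "a(x)", "(2)"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidToken(s));
        }
    }
}
```

The first three samples should be accepted. The rest should be rejected: h is outside a-g, \\d+ needs at least one digit, and a token has to start with a letter.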

If you want to use the split method only, here is another approach you could follow:

import java.util.Arrays;

public class Test 
{
   public static void main(String[] args)
   {
        String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
        String[] a = content.replaceAll("[a-g](\\([0-9]*\\))?|[a-g]", "$0:").split(":");
        // $0 is the string which matched the regex

        System.out.println(Arrays.toString(a));

   }

}

Regex: [a-g](\\([0-9]*\\))?|[a-g] matches the strings you want to match (i.e. a, b, a(5) and so on).

Using this regex I first replace those strings with their appended versions (appended with :). Later, I split the string using the split method.

The output of the above code is:

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8), h(4)5(6)]

NOTE: This approach would only work with a delimiter that is known to not be present in the input string. For example, I chose a colon because I assumed it won't be a part of the input string.
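
To illustrate that note, here is a rough sketch of the same replace-then-split idea with a guard in front of it. The class DelimiterGuard and the helper splitTokens are hypothetical names; the guard simply refuses input that already contains the chosen delimiter, which is one possible way to handle it, not the only one.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterGuard {
    // Hypothetical helper: append the delimiter after every token, then split on it.
    static String[] splitTokens(String content, String delimiter) {
        if (content.contains(delimiter)) {
            throw new IllegalArgumentException("Input already contains the delimiter: " + delimiter);
        }
        // quoteReplacement keeps '$' and '\' in the delimiter from being read as group references.
        String marked = content.replaceAll("[a-g](\\([0-9]*\\))?",
                "$0" + Matcher.quoteReplacement(delimiter));
        // Pattern.quote makes split treat the delimiter literally, not as a regex.
        return marked.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens("aba(2)bb(52)", ":")));
        // prints [a, b, a(2), b, b(52)]

        // Without the guard, a colon in the input produces a spurious empty entry:
        String unguarded = "a:b(2)".replaceAll("[a-g](\\([0-9]*\\))?", "$0:");
        System.out.println(Arrays.toString(unguarded.split(":")));
        // prints [a, , b(2)]
    }
}
```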

Split is the wrong approach for this, as it is hard to eliminate wrong entries.

Just match whatever is valid and process the resulting list of matches:

[a-g](?:\(\d+\))?

Regular expression visualization

Debuggex Demo
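
As a sketch of what "match and process" could look like in code, assuming Java 9 or later (Matcher.results() does not exist before that); the class name StreamMatches and the method tokens are made up for the example:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamMatches {
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\(\\d+\\))?");

    // Collects every valid token; anything that does not match is simply skipped.
    static List<String> tokens(String content) {
        return TOKEN.matcher(content)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokens("aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)"));
        // prints [a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8)]
    }
}
```

The trailing h(4)5(6) never shows up in the result because no part of it matches the pattern, so there is nothing to filter out afterwards.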

You can try the following regex: [a-g](\\(.*?\\))?

  • [a-g] : a required letter from a to g
  • (\\(.*?\\))? : an optional group matching any characters between ( and ), as few as possible

You can view the expected output here.
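
One thing to be aware of with .*? is that it accepts any content inside the parentheses, not only digits. A small sketch comparing it with the \\d+ variant on a made-up input (the class LazyVsDigits and the helper findAll are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsDigits {
    // Hypothetical helper: returns every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a(x)b(2)"; // made-up input with non-numeric content

        // Lazy "anything" version keeps the non-numeric (x):
        System.out.println(findAll("[a-g](\\(.*?\\))?", input));    // prints [a(x), b(2)]

        // Digits-only version falls back to the bare letter:
        System.out.println(findAll("[a-g](?:\\(\\d+\\))?", input)); // prints [a, b(2)]
    }
}
```

If the content really must be numeric, as the question states, the \\d+ form is the safer choice.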

This answer is based on Pattern. An example:

String input = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";

Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
Matcher matcher = pattern.matcher(input);
List<String> tokens = new ArrayList<>();
while (matcher.find()) {
    tokens.add(matcher.group());
}

tokens.forEach(System.out::println);

Resulting output:

a
b
a(2)
b
b(52)
g
c(4)
d(2)
f
e(14)
f(6)
g(8)

Edit: Using [a-g](?:\\((.*?)\\))? you can also easily extract the inner value of a bracket:

while (matcher.find()) {
    tokens.add(matcher.group());
    tokens.add(matcher.group(1)); // the inner value or null if no () are present 
}
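
A possible self-contained version of that idea, as a sketch only: it captures the letter and the number in two separate groups and uses \\d+ instead of .*? so that the value can be parsed as an integer. The class TokenValues, the method parse and the choice of Map.Entry as the result type are assumptions made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenValues {
    // Group 1: the letter. Group 2: the digits, or null when there are no parentheses.
    private static final Pattern TOKEN = Pattern.compile("([a-g])(?:\\((\\d+)\\))?");

    static List<Map.Entry<String, Integer>> parse(String content) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(content);
        while (m.find()) {
            String digits = m.group(2);
            // Integer.valueOf throws NumberFormatException if the number does not fit in an int.
            Integer value = (digits == null) ? null : Integer.valueOf(digits);
            out.add(new SimpleEntry<>(m.group(1), value));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("aba(2)bb(52)"));
        // prints [a=null, b=null, a=2, b=null, b=52]
    }
}
```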


 