
How to match string within parentheses (nested) in Java?

I would like to match a string within parentheses like:

(i, j, k(1))
^^^^^^^^^^^^

The string can itself contain nested, balanced parentheses. How can I match it with a regular expression in Java, without writing a parser? This is only a small part of my project. Thanks!

Edit:

I want to search a block of text for something like u(i, j, k), u(i, j, k(1)), or just u(<anything within the paired parens>), and replace these with __u%array(i, j, k) and __u%array(i, j, k(1)) for my Fortran translation application.

As I said, contrary to popular belief (don't believe everything people say) matching nested brackets is possible with regex.

The downside is that you can only match up to a fixed level of nesting, and for every additional level you wish to support, the regex gets bigger and bigger.

But don't take my word for it. Let me show you. The regex:

\([^()]*\)

matches one level. For up to two levels, you'd need:

\(([^()]*|\([^()]*\))*\)

And so on. To add another level, replace the innermost (second) [^()]* part with ([^()]*|\([^()]*\))*. As I said, it will get bigger and bigger.
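The substitution described above can be applied mechanically. The following is only a sketch: the class name NestedParenRegex and the methods build and translate are hypothetical names chosen for illustration, not part of the original answer. It generates the same regex shape for a chosen number of levels and applies the __u%array replacement with it.

```java
import java.util.regex.Pattern;

// Hypothetical helper (names are illustrative only).
final class NestedParenRegex {

    // Builds a regex matching balanced parentheses nested up to `levels` deep.
    // build(1) is \([^()]*\), build(2) is \(([^()]*|\([^()]*\))*\), and so on.
    static String build(int levels) {
        if (levels < 1) {
            throw new IllegalArgumentException("levels must be >= 1");
        }
        String regex = "\\([^()]*\\)"; // one level
        for (int i = 1; i < levels; i++) {
            // Content is either paren-free text or one group of the previous depth.
            regex = "\\(([^()]*|" + regex + ")*\\)";
        }
        return regex;
    }

    // Prefixes every identifier-plus-parens call with "__" and inserts "%array".
    static String translate(String code, int levels) {
        String regex = "(\\w+)(" + build(levels) + ")";
        return code.replaceAll(regex, "__$1%array$2");
    }

    public static void main(String[] args) {
        // Three levels are needed for k(m(2)) inside u(...).
        System.out.println(translate("u(i, j, k(m(2)))", 3));
        // prints: __u%array(i, j, k(m(2)))
    }
}
```

Because each level wraps the previous one in another capturing group, the group count grows with the depth, but $1 and $2 still refer to the identifier and the whole parenthesized part.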

Your problem:

For your case, two levels may be enough. So the Java code for it would be:

String fortranCode = "code code u(i, j, k) code code code code u(i, j, k(1)) code code code u(i, j, k(m(2))) should match this last 'u', but it doesnt.";
String regex = "(\\w+)(\\(([^()]*|\\([^()]*\\))*\\))"; // (\w+)(\(([^()]*|\([^()]*\))*\))
System.out.println(fortranCode.replaceAll(regex, "__$1%array$2"));

Input:

code code u(i, j, k) code code code code u(i, j, k(1)) code code code u(i, j, k(m(2))) should match this last 'u', but it doesnt.

Output:

code code __u%array(i, j, k) code code code code __u%array(i, j, k(1)) code code code u(i, j, __k%array(m(2))) should match this last 'u', but it doesnt.

Bottom line:

In the general case, parsers will do a better job - that's why people get so pissy about it. But for simple applications, regexes can pretty much be enough.

Note: Some regex flavors support recursion through the (?R) construct (Java doesn't; PCRE, as used by PHP, and Perl do), which lets you match an arbitrary number of nesting levels. With them, you could write: \(([^()]|(?R))*\).
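Since Java has no recursion construct, one way to handle arbitrary depth there is to leave the regex aside for that step and count parentheses by hand. This is a swapped-in technique, not something the answer itself proposes, and the class and method names below are hypothetical. A minimal sketch:

```java
// Hypothetical helper (names are illustrative only).
final class BalancedParens {

    // Returns the index of the ')' that closes the '(' at openIndex,
    // or -1 if the parentheses are not balanced from that point on.
    static int findClosing(String s, int openIndex) {
        if (openIndex < 0 || openIndex >= s.length() || s.charAt(openIndex) != '(') {
            throw new IllegalArgumentException("openIndex must point at '('");
        }
        int depth = 0;
        for (int i = openIndex; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;            // entering one more level
            } else if (c == ')' && --depth == 0) {
                return i;           // back at the level we started from
            }
        }
        return -1;                  // ran out of input before closing
    }

    public static void main(String[] args) {
        String call = "u(i, j, k(m(2)))";
        int close = findClosing(call, call.indexOf('('));
        System.out.println(call.substring(0, close + 1));
        // prints: u(i, j, k(m(2)))
    }
}
```

A regex such as \w+\( could still be used to locate each candidate opening parenthesis, with the counter taking over from there.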

Separate your job. Have the regex be:

([a-z]+)\((.*)\)

The first group will contain the identifier, the second the parameters. Then proceed as follows:

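import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1: identifier; group 2: everything between the outermost parens
private static final Pattern PATTERN = Pattern.compile("([a-z]+)\\((.*)\\)");

// ...

final Matcher m = PATTERN.matcher(input);

if (!m.matches()) {
    // No match! Deal with it.
    return; // or throw, as appropriate
}

// If match, then:

final String identifier = m.group(1);
final String params = m.group(2);

// Test whether the parameters contain a nested paren
final boolean hasNestedParen = params.indexOf('(') != -1;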

Replace [a-z]+ with whatever an identifier can be in Fortran.
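To see what the two groups contain, here is a small self-contained sketch of the same approach. The class name CallSplitter and the method split are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern above (names are illustrative only).
final class CallSplitter {

    private static final Pattern CALL = Pattern.compile("([a-z]+)\\((.*)\\)");

    // Returns {identifier, params}, or null when the whole input is not of
    // the form identifier(...).
    static String[] split(String input) {
        Matcher m = CALL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("u(i, j, k(1))");
        System.out.println(parts[0] + " | " + parts[1]);
        // prints: u | i, j, k(1)
    }
}
```

Note that matches() requires the entire input to be one call, and the greedy .* runs to the last closing parenthesis, so an input such as u(i) + v(j) yields "i) + v(j" as the parameters. The input therefore has to be isolated to a single call beforehand.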

Please check this answer, as it covers basically what you are trying to do (in short, it's not really possible with regexps):

Regular Expression to match outer brackets
