Regex getting wrong output in Java

I have a string that looks something like:

" 'a 'b '(dfg (1 2)) '(3 4) (ad) d "

What I am trying to do is match the terms so that I get this output:

'a, 'b, '(dfg (1 2)), '(3 4), (ad), d

I am currently using:

"'\(.*\)|\(\.*\)|'\w+|\w+"

But I've run into a problem with this. For example, if I write

'(abc) (df)

it will return

'(abc) (df)

instead of

'(abc), (df)

So my question is: is there a way to solve this with regex, or do I have to solve it another way?

The answer is no.

The language you are trying to parse is not regular; it's context-free. So you are not able to parse it with regex.

If you're interested, here is the grammar:

 S->SS|e;
 S->'(A);
 A-> AA|(A)|w+;

It's not regular because you can't build a finite-state machine (FSM) to recognize it, and that holds as long as bracket structures can be nested to arbitrary depth.
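To see this concretely in Java (an illustrative sketch; the class name GreedyDemo is hypothetical): the question's pattern fails because `.*` is greedy, so `'\(.*\)` runs on to the last `)` on the line. Making it reluctant (`.*?`) fixes `'(abc) (df)`, but then it stops at the first `)` of a nested list, so no choice of quantifier handles arbitrary nesting. Note that the pattern's second alternative `\(\.*\)` matches only literal dots as written; the reluctant variant below assumes `\(.*?\)` was intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // The pattern from the question, verbatim (backslashes doubled for Java).
    static final String GREEDY = "'\\(.*\\)|\\(\\.*\\)|'\\w+|\\w+";
    // Same idea with reluctant quantifiers; assumes \(\.*\) was meant as \(.*\).
    static final String RELUCTANT = "'\\(.*?\\)|\\(.*?\\)|'\\w+|\\w+";

    // Returns every successive match of regex in input.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy .* backtracks only as far as the LAST ')', swallowing "(df)".
        System.out.println(matches(GREEDY, "'(abc) (df)"));     // ['(abc) (df)]
        // Reluctant .*? stops at the FIRST ')': fine for flat lists...
        System.out.println(matches(RELUCTANT, "'(abc) (df)"));  // ['(abc), (df)]
        // ...but cuts a nested list short and drops the final ')'.
        System.out.println(matches(RELUCTANT, "'(dfg (1 2))")); // ['(dfg (1 2)]
    }
}
```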

Well, whatever. Let's answer the question "How?". Traverse the string from the first character. Once you find an apostrophe ('), start counting brackets: an opening bracket counts +1, a closing bracket counts -1. When a closing bracket brings the counter back to zero, insert a comma after that bracket. Problem solved:

 'a 'b '(d f g (1 2)) '(3 4) (a d) d
        |      |   ||
        |      |   |+-- counter = 0 on closing bracket, insert comma
        |      |   +--- counter = 1
        |      +------- counter = 2
        +-------------- start counting, counter = 1

etc.
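A minimal Java sketch of this counting approach (class and method names are made up for illustration; only `(` and `)` are counted, and unbalanced input raises an IllegalArgumentException). Instead of inserting commas in place, it collects each top-level term and joins them with ", ". A term ends either at whitespace while the counter is zero or at a closing bracket that brings the counter back to zero.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketSplitter {
    // Splits a line into top-level terms using a bracket counter:
    // '(' counts +1, ')' counts -1, and a term ends either at whitespace
    // while the counter is 0 or at a ')' that brings the counter back to 0.
    static List<String> splitTerms(String input) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : input.toCharArray()) {
            if (depth == 0 && Character.isWhitespace(c)) {
                flush(current, terms); // separator between top-level terms
                continue;
            }
            current.append(c);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unbalanced ')' in: " + input);
                }
                if (depth == 0) {
                    flush(current, terms); // closing bracket with counter 0 ends the term
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced '(' in: " + input);
        }
        flush(current, terms);
        return terms;
    }

    // Moves the buffered term (if any) into the result list.
    private static void flush(StringBuilder current, List<String> terms) {
        if (current.length() > 0) {
            terms.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> terms = splitTerms(" 'a 'b '(dfg (1 2)) '(3 4) (ad) d ");
        System.out.println(String.join(", ", terms)); // 'a, 'b, '(dfg (1 2)), '(3 4), (ad), d
    }
}
```

Because the counter is the only state the scan needs, this handles any nesting depth, which is exactly what a regular expression cannot do.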

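If you are using PCRE or a similar engine that supports recursion (Java's java.util.regex does not), you can use an expression like the following: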

'?(?:\w+|(\((?:[^()]+|(?1))*\)))
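Java's java.util.regex has no equivalent of `(?1)`. If the maximum nesting depth is known in advance, one workaround (a bounded variant of the expression above, not the same thing) is to unroll the recursion by hand. The sketch below assumes at most one level of brackets inside a list, which is enough for the example input; the class name BoundedNestingRegex is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedNestingRegex {
    // No (?1) recursion in java.util.regex, so nesting is spelled out:
    // an optional quote, then either a word or a bracket group whose
    // contents may contain ONE further level of (non-nested) brackets.
    static final Pattern TERM = Pattern.compile(
        "'?(?:\\w+|\\((?:[^()]|\\([^()]*\\))*\\))");

    // Returns every term matched by TERM, in order.
    static List<String> terms(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> t = terms(" 'a 'b '(dfg (1 2)) '(3 4) (ad) d ");
        System.out.println(String.join(", ", t)); // 'a, 'b, '(dfg (1 2)), '(3 4), (ad), d
    }
}
```

Each additional nesting level needs another hand-written layer, and input nested deeper than that is silently split into the wrong pieces, so the counting approach is the safer choice when depth is unbounded.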

If I understand correctly, you want to add a comma before every space, except in parentheses. Is that right?

If so, there might be a way to do it in regex using lookaheads and lookbehinds, but it's going to get messy fast. It's better to split up all the terms first and then add commas just after the ones you want.
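A sketch of that split-first idea in Java, assuming terms are separated by whitespace and brackets are balanced (the class and method names are illustrative): split on whitespace regardless of brackets, then walk the pieces with a running bracket balance and put a comma only after a piece that brings the balance back to zero. No lookarounds are needed.

```java
public class SplitThenComma {
    // Splits on whitespace first (ignoring brackets entirely), then rejoins the
    // pieces, putting ", " only after a piece that leaves the running bracket
    // balance at zero, i.e. after a complete top-level term.
    static String addCommas(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] pieces = trimmed.split("\\s+");
        StringBuilder out = new StringBuilder();
        int balance = 0;
        for (int i = 0; i < pieces.length; i++) {
            String piece = pieces[i];
            out.append(piece);
            for (char c : piece.toCharArray()) {
                if (c == '(') {
                    balance++;
                } else if (c == ')') {
                    balance--;
                }
            }
            if (i < pieces.length - 1) {
                // Inside brackets, restore a single space instead of a comma.
                out.append(balance == 0 ? ", " : " ");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addCommas(" 'a 'b '(dfg (1 2)) '(3 4) (ad) d "));
        // 'a, 'b, '(dfg (1 2)), '(3 4), (ad), d
    }
}
```

One side effect of splitting first: runs of whitespace inside brackets collapse to a single space.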
