
regular expression for key=(value) syntax

I am writing a Java program that uses regular expressions, but I am struggling because I am fairly new to regex.

KEY_EXPRESSION = "[a-zA-z0-9]+";
VALUE_EXPRESSION = "[a-zA-Z0-9\\*\\+,%_\\-!@#\\$\\^=<>\\.\\?';:\\|~`&\\{\\}\\[\\]/ ]*";
CHUNK_EXPRESSION = "(" + KEY_EXPRESSION + ")\\((" + VALUE_EXPRESSION + ")\\)";

The target syntax is key(value)+key(value)+key(value). The key is alphanumeric, and the value may be any combination of characters.

This has worked so far. However, I have a problem with '(' and ')' inside a value: if a value contains '(' or ')', the matched value swallows everything that follows.

For example, number(abc(kk)123)+status(open) returns key: number, value: abc(kk)123)+status(open
It is supposed to yield two key-value pairs.

Can you suggest how to improve the expression above?

This is not possible with regular expressions in general, sorry. If you need to count opening and closing parentheses, regular expressions are, in general, not powerful enough. The language you are trying to parse is not a regular language.

Of course, there may be ways around that limitation. We cannot tell, given how little context you have provided.

Get the matched groups at index 1 and 2:

([a-zA-Z0-9]+)\((.*?)\)(?=\+|$)

Here is online demo

The above regex pattern looks for )+ as the delimiter between key(value) pairs; the last pair is closed by a ) at the end of the input.

Note: The above regex pattern will not work if a value contains )+, for example number(abc(kk)+123+4+4)+status(open)


Sample code:

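import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "number(abc(kk)123)+status(open)";
// Group 1 is the key, group 2 is the value; the lookahead requires '+' or end of input after ')'.
Pattern p = Pattern.compile("([a-zA-Z0-9]+)\\((.*?)\\)(?=\\+|$)");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group(1) + ":" + m.group(2));
}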

output:

number:abc(kk)123
status:open
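
To make the limitation noted above concrete, here is a small sketch that runs the same pattern against both the working input and the input containing )+ inside a value. The class name LookaheadLimit and the helper pairs are made up for this illustration; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadLimit {
    // Same pattern as in the sample code above.
    static final Pattern CHUNK =
            Pattern.compile("([a-zA-Z0-9]+)\\((.*?)\\)(?=\\+|$)");

    // Hypothetical helper: collects every match as "key:value".
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Works: the only ")+" is the real delimiter.
        System.out.println(pairs("number(abc(kk)123)+status(open)"));
        // prints [number:abc(kk)123, status:open]

        // Breaks: the inner ")+" is mistaken for the delimiter,
        // so the first value is cut short.
        System.out.println(pairs("number(abc(kk)+123+4+4)+status(open)"));
        // prints [number:abc(kk, status:open]
    }
}
```

The second call still reports two pairs, so the truncation is easy to miss unless the values themselves are checked.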

Someone posted an answer with a working regex: ([a-zA-z0-9]+)\\((.*?)\\)(?=\\+|$). It works great. I tested it on an online regex tester, and when I came back the post was gone. Is it the right solution? I am wondering why the answer was deleted.

See this golfed regex:

([^\\W_]+)\\((.*?)\\)(?![^+])

  • You can use the shorthand character class [^\\W_] instead of [a-zA-Z0-9].
  • You can use a negative lookahead assertion (?![^+]) to match without an alternation; it succeeds only when the next character is + or the input has ended.

However, this is not a practical solution, as )+ within inner elements will break it: number(abc(kk)+5+123+4+4)+status(open)

This is a case where Java, whose regex implementation doesn't support recursion, is at a disadvantage. As I mentioned in this thread, the practical approach would be to use a workaround (copy-paste regex) or to build your own finite state machine to parse it.
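
As a rough idea of what a hand-written parser could look like, here is a minimal sketch that tracks parenthesis depth with a counter instead of using a regex. It assumes parentheses inside a value are balanced, that pairs are separated by a single +, and that keys are unique (a later duplicate would overwrite an earlier one). The class name KeyValueParser and its parse method are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParser {
    // Parses key(value)+key(value)..., allowing balanced parentheses in values.
    // Assumption: keys are unique; LinkedHashMap keeps them in input order.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int i = 0;
        while (i < input.length()) {
            int open = input.indexOf('(', i);
            if (open <= i) { // no '(' found, or the key is empty
                throw new IllegalArgumentException("expected key( at index " + i);
            }
            String key = input.substring(i, open);
            if (!key.matches("[a-zA-Z0-9]+")) {
                throw new IllegalArgumentException("invalid key: " + key);
            }
            // Scan forward until the parenthesis opened at 'open' is closed.
            int depth = 1;
            int j = open + 1;
            while (j < input.length() && depth > 0) {
                char c = input.charAt(j);
                if (c == '(') depth++;
                else if (c == ')') depth--;
                j++;
            }
            if (depth != 0) {
                throw new IllegalArgumentException("unbalanced parentheses after " + key);
            }
            // j is now one past the closing ')'.
            pairs.put(key, input.substring(open + 1, j - 1));
            if (j < input.length()) {
                if (input.charAt(j) != '+') {
                    throw new IllegalArgumentException("expected '+' at index " + j);
                }
                j++; // skip the separator
            }
            i = j;
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("number(abc(kk)+5+123+4+4)+status(open)"));
        // prints {number=abc(kk)+5+123+4+4, status=open}
    }
}
```

Because the separator is only looked for once the depth returns to zero, a )+ inside a value no longer ends the pair early. A value with unbalanced parentheses is rejected rather than guessed at.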

Also, you have a typographical error in your original regex. [a-zA-z0-9]+ contains the range A-z; you meant to type A-Z. As written, A-z also matches the ASCII characters that sit between Z and a: [ \ ] ^ _ and `.
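
A quick way to see the effect of the range typo is to test a key containing an underscore against both character classes. The class name RangeTypoDemo and the helper isKey are made up for this sketch.

```java
class RangeTypoDemo {
    // Hypothetical helper: true if the whole candidate matches the regex.
    static boolean isKey(String regex, String candidate) {
        return candidate.matches(regex);
    }

    public static void main(String[] args) {
        // The A-z range spans ASCII 65..122, which includes '_' (95).
        System.out.println(isKey("[a-zA-z0-9]+", "a_b")); // prints true
        // With the intended A-Z range the underscore is rejected.
        System.out.println(isKey("[a-zA-Z0-9]+", "a_b")); // prints false
    }
}
```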

I'll assume that you're able to add a + at the end of your input, i.e. number(abc(kk)123)+status(open)+

If that is possible, you can make it work with:

KEY_EXPRESSION = "[a-zA-z0-9]+";
VALUE_EXPRESSION = "[a-zA-Z0-9\\*\\+,%_\\-!@#\\$\\^=<>\\.\\?';:\\|~`&\\{\\}\\[\\]\\(\\)/ ]*?";
CHUNK_EXPRESSION = "(" + KEY_EXPRESSION + ")\\((" + VALUE_EXPRESSION + ")\\)\\+";

The changes on line 2 add ( and ) (escaped) to the character class and replace * with *?.

The ? turns off greedy matching, so the shortest possible match is tried first (the reluctant quantifier).

On line 3, a literal + (escaped as \\+) is added at the end of the pattern to help separate the key(value) fields.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.