
Regular expression for key=(value) syntax

I am currently writing a Java program that uses a regular expression, but I am struggling because I am pretty new to regex.

KEY_EXPRESSION = "[a-zA-z0-9]+";
VALUE_EXPRESSION = "[a-zA-Z0-9\\*\\+,%_\\-!@#\\$\\^=<>\\.\\?';:\\|~`&\\{\\}\\[\\]/ ]*";
CHUNK_EXPRESSION = "(" + KEY_EXPRESSION + ")\\((" + VALUE_EXPRESSION + ")\\)";

The target syntax is key(value)+key(value)+key(value). The key is alphanumeric and the value may be any combination of characters.

This has been okay so far. However, I have a problem with '(' and ')' in the value. If I place '(' or ')' in the value, the value includes all the rest of the input.

E.g. number(abc(kk)123)+status(open) returns key:number, value:abc(kk)123)+status(open
It is supposed to be two key-value pairs.

Can you suggest how to improve the expression above?

Not possible with regular expressions at all, sorry. If you need to count opening and closing parentheses, regular expressions are, in general, not good enough. The language you are trying to parse is not a regular language.

Of course, there may be ways around that limitation. We cannot tell with as little context as you have given us.

Get the matched groups at index 1 and 2:

([a-zA-Z0-9]+)\((.*?)\)(?=\+|$)

Here is an online demo.

The above regex pattern looks for ) followed by + (or by the end of the input) as the delimiter between key(value) pairs.

Note: The above regex pattern will not work if a value contains )+, for example number(abc(kk)+123+4+4)+status(open)


Sample code:

String str = "number(abc(kk)123)+status(open)";
Pattern p = Pattern.compile("([a-zA-Z0-9]+)\\((.*?)\\)(?=\\+|$)");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group(1) + ":" + m.group(2));
}

Output:

number:abc(kk)123
status:open
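
To make the limitation from the note concrete, here is a small sketch that runs the same pattern on both inputs. The class name LookaheadLimitDemo and the helper pairs are made up for illustration; only the pattern comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadLimitDemo {
    // Same pattern as in the sample code above.
    static final Pattern CHUNK =
            Pattern.compile("([a-zA-Z0-9]+)\\((.*?)\\)(?=\\+|$)");

    // Collects every match as "key:value" (illustrative helper).
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Value without ")+" inside: both pairs are found.
        System.out.println(pairs("number(abc(kk)123)+status(open)"));
        // prints [number:abc(kk)123, status:open]

        // Value containing ")+": the first value is cut at the inner ")+".
        System.out.println(pairs("number(abc(kk)+123+4+4)+status(open)"));
        // prints [number:abc(kk, status:open]
    }
}
```

In the second case the reluctant .*? stops at the first ) that is followed by +, which is the inner one, so the rest of the value is lost.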

Someone posted an answer with a working regex: ([a-zA-z0-9]+)\\((.*?)\\)(?=\\+|$) - This works great. When I tested it on an online regex tester site and came back, the post was gone. Is it the right solution? I am wondering why the answer was deleted.

See this golfed regex:

  ([^\\W_]+)\\((.*?)\\)(?![^+])

  • You can use the shorthand character class [^\\W_] instead of [a-zA-Z0-9].
  • You can use a negative lookahead assertion (?![^+]) to match without backtracking. It succeeds only when the next character is + or the input has ended.
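
As a quick sanity check, the sketch below runs the golfed pattern next to the explicit form ([a-zA-Z0-9]+)\((.*?)\)(?=\+|$) on the example input. The class name GolfedRegexDemo and the helper pairs are hypothetical, and the comparison covers only this one input; it is not a proof that the two patterns agree everywhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GolfedRegexDemo {
    // Explicit character class and positive lookahead.
    static final Pattern EXPLICIT =
            Pattern.compile("([a-zA-Z0-9]+)\\((.*?)\\)(?=\\+|$)");
    // Golfed form: [^\W_] for the key, (?![^+]) for "next is '+' or end".
    static final Pattern GOLFED =
            Pattern.compile("([^\\W_]+)\\((.*?)\\)(?![^+])");

    // Collects every match of p as "key:value" (illustrative helper).
    static List<String> pairs(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "number(abc(kk)123)+status(open)";
        System.out.println(pairs(EXPLICIT, input)); // [number:abc(kk)123, status:open]
        System.out.println(pairs(GOLFED, input));   // [number:abc(kk)123, status:open]
    }
}
```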

However, this is not a practical solution, as )+ within inner elements will break it: number(abc(kk)+5+123+4+4)+status(open)

This is a case where Java, whose regex implementation does not support recursion, is at a disadvantage. As I mentioned in this thread, the practical approach would be to use a workaround (copy-paste regex), or build your own finite state machine to parse it.
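
One way the hand-written route might look is a small scanner that tracks parenthesis depth. Strictly speaking, the depth counter makes it more than a pure finite state machine. This is only a sketch under assumptions not stated in the question: parentheses inside a value are balanced, pairs are separated by a single +, and keys are ASCII letters and digits. The names KeyValueScanner, parse and isAsciiAlnum are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class KeyValueScanner {
    // Parses key(value)+key(value)... and returns each pair as {key, value}.
    // Assumes parentheses inside a value are balanced; throws otherwise.
    static List<String[]> parse(String input) {
        List<String[]> pairs = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Read the key: ASCII letters and digits up to the opening '('.
            int keyStart = i;
            while (i < input.length() && isAsciiAlnum(input.charAt(i))) {
                i++;
            }
            if (i == keyStart || i >= input.length() || input.charAt(i) != '(') {
                throw new IllegalArgumentException("expected key( at index " + keyStart);
            }
            String key = input.substring(keyStart, i);
            i++; // skip the opening '('

            // Read the value: track nesting depth until the matching ')'.
            int valueStart = i;
            int depth = 1;
            while (i < input.length() && depth > 0) {
                char c = input.charAt(i);
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                }
                i++;
            }
            if (depth != 0) {
                throw new IllegalArgumentException("unbalanced parentheses in value of " + key);
            }
            // i is now one past the closing ')', so the value ends at i - 1.
            pairs.add(new String[] {key, input.substring(valueStart, i - 1)});

            // Pairs are separated by a single '+'.
            if (i < input.length()) {
                if (input.charAt(i) != '+') {
                    throw new IllegalArgumentException("expected '+' at index " + i);
                }
                i++;
            }
        }
        return pairs;
    }

    static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        for (String[] kv : parse("number(abc(kk)+5+123+4+4)+status(open)")) {
            System.out.println(kv[0] + ":" + kv[1]);
        }
    }
}
```

Because the value ends at the parenthesis that brings the depth back to zero, a )+ inside a nested value no longer splits the pair. Values with unbalanced parentheses are rejected rather than guessed at.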

Also, you have a typographical error in your original regex. [a-zA-z0-9]+ has the range "A-z". You meant to type "A-Z".
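
The effect of the typo is easy to see in isolation: in ASCII, the characters [ \ ] ^ _ ` sit between Z and a, so the range A-z accepts them as well. A minimal sketch (the class and field names are illustrative):

```java
import java.util.regex.Pattern;

class RangeTypoDemo {
    static final Pattern TYPO = Pattern.compile("[a-zA-z0-9]+");
    static final Pattern FIXED = Pattern.compile("[a-zA-Z0-9]+");

    public static void main(String[] args) {
        // '_' (0x5F) lies between 'Z' (0x5A) and 'a' (0x61), so A-z accepts it.
        System.out.println(TYPO.matcher("my_key").matches());  // true
        System.out.println(FIXED.matcher("my_key").matches()); // false
        // Brackets and the caret fall into the same gap.
        System.out.println(TYPO.matcher("[^]").matches());     // true
    }
}
```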

I'll make a small assumption: that you're able to add a + at the end of your input, i.e. number(abc(kk)123)+status(open)+

If that is possible, you can make it work with:

KEY_EXPRESSION = "[a-zA-Z0-9]+";
VALUE_EXPRESSION = "[a-zA-Z0-9\\*\\+,%_\\-!@#\\$\\^=<>\\.\\?';:\\|~`&\\{\\}\\[\\]\\(\\)/ ]*?";
CHUNK_EXPRESSION = "(" + KEY_EXPRESSION + ")\\((" + VALUE_EXPRESSION + ")\\)\\+";

The changes on line 2 are adding ( and ) with escaping, and replacing * with *?

The ? turns off greedy matching, so the quantifier tries to keep the shortest match (reluctant quantifier).

On line 3, a + is added at the end of the mask to help separate the key(value) fields. It is escaped as \\+ so that it matches a literal plus sign; an unescaped + there would be read as a quantifier on the closing parenthesis.
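
A quick way to check this approach is to run the chunk expression over the example input. The sketch below assumes that the trailing + in the chunk expression is meant as a literal plus (written \\+ in the Java string) and that the key range is A-Z. The class name TrailingPlusDemo and the helper pairs are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingPlusDemo {
    static final String KEY_EXPRESSION = "[a-zA-Z0-9]+";
    // Value class now includes ( and ), with a reluctant quantifier.
    static final String VALUE_EXPRESSION =
            "[a-zA-Z0-9\\*\\+,%_\\-!@#\\$\\^=<>\\.\\?';:\\|~`&\\{\\}\\[\\]\\(\\)/ ]*?";
    // Each chunk must end with ")" followed by a literal "+".
    static final Pattern CHUNK = Pattern.compile(
            "(" + KEY_EXPRESSION + ")\\((" + VALUE_EXPRESSION + ")\\)\\+");

    // Collects every match as "key:value" (illustrative helper).
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Note the extra '+' appended to the input.
        System.out.println(pairs("number(abc(kk)123)+status(open)+"));
        // prints [number:abc(kk)123, status:open]
    }
}
```

As with the lookahead-based patterns, a value that itself contains )+ would still end the match early.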
