
How can I use regex to match strings if the regex has nested groups?

There are some strings:

111/aaa
111/aaa|222/bbb

They follow this pattern:

(.*)/(.*)(|(.*)/(.*))?

I tried to use it to match a string and extract the values:

var rrr = """(.*)/(.*)(|(.*)/(.*))?""".r

"123/aaa|444/bbb" match {
    case rrr(pid,pname, cid,cname) => println(s"$pid, $pname, $cid, $cname")
    case _ => println("not matched ?!")
}

But it prints:

not matched ?!

And I want to get:

123, aaa, 444, bbb

How can I fix it?


UPDATE

Thanks to the answers from @BartKiers and @Barmar, I found that my regex had several mistakes, and finally arrived at this solution:

var rrr = """(.*?)/(.*?)([|](.*?)/(.*?))?""".r

"123/aaa|444/bbb" match {
    case rrr(pid,pname, _, cid,cname) => println(s"$pid, $pname, $cid, $cname")
    case _ => println("not matched ?!")
}

It works, but you can see there is a _ which is not actually useful. Is there any way to redefine the regex so that I can just write rrr(pid,pname,cid,cname) to match it?

.* can lead to a lot of backtracking, because .* first matches the complete string and then gives back one character at a time until the / that follows it can match, which happens at the last / in the string.

As a result, it won't capture the values in the groups you expect.

You should use the lazy form .*? instead.

Your regex should be:

^(.*?)/(.*?)(?:\|(.*?)/(.*?))?$

There wouldn't be any performance difference for small strings, but it would capture the values in the right groups.
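To make the difference in captures concrete, here is a minimal sketch that runs a greedy and a lazy variant of the pattern over the sample string from the question. The object name GreedyVsLazy and the helper groups are made up for illustration; only the regexes come from this thread.

```scala
import scala.util.matching.Regex

object GreedyVsLazy {
  // Same structure, differing only in greedy (.*) versus lazy (.*?) quantifiers.
  val greedy: Regex = """^(.*)/(.*)(?:\|(.*)/(.*))?$""".r
  val lazyRe: Regex = """^(.*?)/(.*?)(?:\|(.*?)/(.*?))?$""".r

  // Hypothetical helper: lists the capture groups of the first match.
  // A group that did not take part in the match shows up as null.
  def groups(re: Regex, s: String): List[String] =
    re.findFirstMatchIn(s).map(_.subgroups).getOrElse(Nil)

  def main(args: Array[String]): Unit = {
    val input = "123/aaa|444/bbb"
    println(groups(greedy, input)) // List(123/aaa|444, bbb, null, null)
    println(groups(lazyRe, input)) // List(123, aaa, 444, bbb)
  }
}
```

The greedy version still matches the whole string, but its first group swallows everything up to the last /, so the optional part is never used.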

Notice the ?: in the regex. It makes (?:\|(.*?)/(.*?))? a non-capturing group, so the result has only 4 capture groups.
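Applied to the Scala code from the question, this could look roughly like the sketch below. The names PairParser, Pairs and describe are hypothetical. It assumes you also want to accept the single-pair form such as 111/aaa, in which case the two optional groups are bound to null, so they are wrapped in Option here.

```scala
object PairParser {
  // (?:...) groups without capturing, so exactly four groups reach the extractor.
  val Pairs = """^(.*?)/(.*?)(?:\|(.*?)/(.*?))?$""".r

  // Hypothetical helper: formats the captured values for display.
  def describe(s: String): String = s match {
    // When the optional part is absent, cid and cname are bound to null.
    case Pairs(pid, pname, cid, cname) =>
      val child = Option(cid).map(c => s", $c, $cname").getOrElse("")
      s"$pid, $pname$child"
    case _ => "not matched ?!"
  }

  def main(args: Array[String]): Unit = {
    println(describe("123/aaa|444/bbb")) // 123, aaa, 444, bbb
    println(describe("111/aaa"))         // 111, aaa
  }
}
```

With only four capturing groups, the placeholder _ from the UPDATE in the question is no longer needed.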

Try escaping the |, which is the logical OR (alternation) operator in regex:

var rrr = """(.*)/(.*)(\|(.*)/(.*))?""".r
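As a quick check of what this escaped pattern does on its own, the sketch below counts its capturing groups and prints what it captures for the sample input. The names EscapedPipeCheck, groupCount and extract are made up for illustration. The pattern is copied unchanged, so it still has five capturing groups and greedy quantifiers.

```scala
object EscapedPipeCheck {
  // The escaped pattern from above, unchanged.
  val rrr = """(.*)/(.*)(\|(.*)/(.*))?""".r

  // Number of capturing groups the extractor will expect.
  def groupCount: Int = rrr.pattern.matcher("").groupCount

  // Hypothetical helper: returns the four values of interest, skipping group 3.
  def extract(s: String): Option[List[String]] = s match {
    case rrr(pid, pname, _, cid, cname) => Some(List(pid, pname, cid, cname))
    case _                              => None
  }

  def main(args: Array[String]): Unit = {
    println(groupCount)                 // 5
    println(extract("123/aaa|444/bbb")) // Some(List(123/aaa|444, bbb, null, null))
  }
}
```

So escaping makes the pipe literal, but a four-variable pattern still cannot match five groups, and the greedy .* still pulls too much into the first group. Lazy quantifiers and a non-capturing group address those two points.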
