Multiple regex pattern matches in a single string in Groovy

I have a test string like this:

08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts

I want to use a regex to match "ABCD" and "35" in this string:

def regexString = ~ /(\s\d{1,5}[^\d\]\-\:\,\.])|([A-Z]{4}\:)/
............
while (matcher.find()) {
    acct = matcher.group(1)
    grpName = matcher.group(2)
    println("group : " + grpName + " acct : " + acct)
}

My current output is:

group : ABCD: acct : null
group : null acct :  35 

But I expected something like this:

group : ABCD: acct : 35

Is there any way to match all the patterns in the string before it enters the while() loop, or a better way to implement this?

I believe your issue is with the 'or' ('|') in your regex. Each call to find() matches only one of the two alternatives, so the group that belongs to the other alternative is null: one match fills group 2 with "ABCD:" and the next fills group 1 with " 35 ". You need a regex that matches both parts in a single match. You can reverse the groups so they appear in the same order as in the string:

/([A-Z]{4})\:.*\s(\d{1,5})[^\d\]\-\:\,\.]/

Also notice the change in parentheses so you don't capture more than you need. Currently you are capturing the ':' after the group name, and an extra space before the acct number along with the character that follows it. This assumes "ABCD" will always come before "35".
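To see why the original pattern yields a null in every iteration, here is a minimal sketch that records both groups for each match. It uses the pattern and log line from the question; the `line` and `seen` names are introduced here only for illustration.

```groovy
def line = '08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts'

// Pattern from the question: two alternatives joined by '|'.
def alternation = ~/(\s\d{1,5}[^\d\]\-\:\,\.])|([A-Z]{4}\:)/

def matcher = alternation.matcher(line)
def seen = []
while (matcher.find()) {
    // Each match comes from exactly one alternative,
    // so the other alternative's group is null.
    seen << [matcher.group(1), matcher.group(2)]
}

seen.each { println "group(1)=${it[0]} group(2)=${it[1]}" }
```

The loop runs twice, once per alternative, which is why the two values never show up in the same iteration.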

There is also a lot more you can do, assuming that all your strings are formatted very similarly.

For example, if there is always a space after the acct number, you could simplify it to:

/([A-Z]{4})\:.*\s(\d{1,5})\s/

There's probably a lot more you could do to make sure you're always capturing the correct things, but I'd have to see or know more about the dataset to do so.

Then of course you have to switch the order of the groups in your code:

while (matcher.find()) {
    grpName = matcher.group(1)
    acct = matcher.group(2)
    println("group : " + grpName + " acct : " + acct)
}
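Putting the pieces of this answer together, a self-contained sketch might look like the following. It assumes the log line from the question and tries both the corrected pattern and the simplified variant; the `extract` closure and the `results` list are hypothetical helpers, not part of the original code.

```groovy
def line = '08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts'

// Corrected pattern: group name first, then the account count, in one match.
def corrected = ~/([A-Z]{4})\:.*\s(\d{1,5})[^\d\]\-\:\,\.]/
// Simplified variant, assuming whitespace always follows the number.
def simplified = ~/([A-Z]{4})\:.*\s(\d{1,5})\s/

// Hypothetical helper: returns both captures from the first match, or null.
def extract = { pattern, text ->
    def m = pattern.matcher(text)
    if (m.find()) {
        return [grpName: m.group(1), acct: m.group(2)]
    }
    return null
}

def results = [corrected, simplified].collect { extract(it, line) }
results.each { println "group : ${it?.grpName} acct : ${it?.acct}" }
// prints "group : ABCD acct : 35" for each pattern
```

Because `.*` is greedy, both patterns pick the last number on the line that fits, which is worth keeping in mind if a line can contain several numbers after the group name.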

You may use

String s = "08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts"
def res = s =~ /\b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b/
if (res.find()) {
    println "${res[0][1]}, ${res[0][2]}"
} else {
    println "not found"
}

See the Groovy demo.

The regex, \b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b, matches a substring starting with a whole word consisting of 4 uppercase ASCII letters (captured into Group 1), followed by : and 0+ chars other than [, ] and digits, and then matches and captures into Group 2 a whole number consisting of 1 to 5 digits.

See the regex demo.

In the code, the =~ operator creates a java.util.regex.Matcher that looks for partial matches (i.e. it searches for the pattern anywhere inside the string). Indexing the res variable gives access to each match: res[0][0] holds the whole first match, res[0][1] holds the Group 1 value, and res[0][2] holds the Group 2 value.
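As a small illustration of that indexing, the sketch below pulls the first match out of the matcher as a list and prints each element. The `parts` variable is a name introduced here for illustration; the string and pattern are the ones from the answer.

```groovy
def s = '08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts'
def res = s =~ /\b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b/

// When the pattern has capturing groups, res[0] is a list:
// index 0 is the whole match, then one entry per group.
def parts = res.find() ? res[0] : null

if (parts != null) {
    println "whole match : ${parts[0]}"
    println "group 1     : ${parts[1]}"
    println "group 2     : ${parts[2]}"
} else {
    println "not found"
}
```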
