
Multiple regex pattern matches in a single string (Groovy)

I have a test string like this:

08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts

I want to use a regex to match "ABCD" and "35" in this string:

    def regexString = ~/(\s\d{1,5}[^\d\]\-\:\,\.])|([A-Z]{4}\:)/
    ............
    while (matcher.find()) {
        acct = matcher.group(1)
        grpName = matcher.group(2)
        println("group : " + grpName + " acct : " + acct)
    }

My current output is:

group : ABCD: acct : null
group : null acct :  35 

But I expected something like this:

group : ABCD: acct : 35

Is there a way to match all the patterns in the string in a single pass of the while() loop, or a better way to implement this?

I believe the issue is the alternation ('|') in your regex. Each call to find() matches only one side of the '|', so the first match fills group 2 and leaves group 1 null, and the second match does the opposite. You need a regex that matches both parts in a single match. Because "ABCD" appears before "35", you can reorder the groups so they match in sequence:

/([A-Z]{4})\:.*\s(\d{1,5})[^\d\]\-\:\,\.]/

Also notice the change in parentheses so that you don't capture more than you need: currently you are capturing the ':' after the group name and the whitespace around the acct number. This assumes that "ABCD" always comes before "35".
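To make the difference concrete, here is a minimal sketch that runs both patterns against the sample line from the question and collects what each find() captures. The variable names (line, altResults, seqResults) are only illustrative and do not come from the original code.

```groovy
def line = '08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts'

// Original pattern: two alternatives, so each find() fills only one of the two groups.
def alternation = line =~ /(\s\d{1,5}[^\d\]\-\:\,\.])|([A-Z]{4}\:)/
def altResults = []
while (alternation.find()) {
    altResults << [alternation.group(1), alternation.group(2)]
}

// Sequential pattern: both groups are filled by one and the same match.
def sequential = line =~ /([A-Z]{4})\:.*\s(\d{1,5})[^\d\]\-\:\,\.]/
def seqResults = []
while (sequential.find()) {
    seqResults << [sequential.group(1), sequential.group(2)]
}

println "alternation: ${altResults.size()} matches"
println "sequential:  ${seqResults.size()} match, group=${seqResults[0][0]}, acct=${seqResults[0][1]}"
```

With the alternation there are two matches, each with one null group; with the sequential pattern there is a single match that carries both values, without the trailing ':' or the surrounding whitespace.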

There is also a lot more you can do assuming that all your strings are formatted very similarly:

For example, if there is always a space after the acct number you could simplify it to:

/([A-Z]{4})\:.*\s(\d{1,5})\s/

There's probably a lot more you could do to make sure you're always capturing the correct things, but I'd have to see or know more about the dataset to do so.

Then of course you have to switch the order of the groups in your code:

    while (matcher.find()) {
        grpName = matcher.group(1)
        acct = matcher.group(2)
        println("group : " + grpName + " acct : " + acct)
    }
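Putting the pieces together, the following is a self-contained sketch of this approach using the simplified pattern. The helper name parseLine and the map keys group and acct are hypothetical, and converting the account count to an integer is an added assumption rather than something the question asked for.

```groovy
// Hypothetical helper: returns [group: ..., acct: ...], or null when the line does not match.
def parseLine(String line) {
    def m = line =~ /([A-Z]{4})\:.*\s(\d{1,5})\s/
    if (!m.find()) {
        return null
    }
    // group(1) is the four-letter name, group(2) the digits between two whitespace characters
    [group: m.group(1), acct: m.group(2).toInteger()]
}

def line = '08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts'
def parsed = parseLine(line)
println "group : ${parsed.group} acct : ${parsed.acct}"
```

Returning null for non-matching lines keeps the caller responsible for deciding whether a line without the expected shape is an error or can simply be skipped.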

You may use

String s = "08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts"
def res = s =~ /\b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b/
if (res.find()) {
    println "${res[0][1]}, ${res[0][2]}"
} else {
    println "not found"
}

See the Groovy demo .

The regex, \b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b, matches a string starting with a whole word consisting of 4 uppercase ASCII letters (captured into Group 1), then followed by : and 0+ chars other than [, ] and digits, and then matches and captures into Group 2 a whole number consisting of 1 to 5 digits.

See the regex demo.

In the code, the =~ operator creates a java.util.regex.Matcher that looks for partial matches (i.e. it searches for the pattern anywhere inside the string). Indexing the matcher gives the individual matches: res[0] is the first match, with the whole match in res[0][0], Group 1 in res[0][1] and Group 2 in res[0][2].
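As a small illustration of the indexing described above, the sketch below prints each element of res[0] and contrasts =~ with Groovy's ==~ operator, which requires the whole string to match. The variable names whole, groupName, acct and fullMatch are illustrative only.

```groovy
def s = '08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts'

// =~ builds a Matcher that searches for the pattern anywhere in the string.
def res = s =~ /\b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b/

def whole = res[0][0]      // the entire matched substring
def groupName = res[0][1]  // capture group 1
def acct = res[0][2]       // capture group 2

// ==~ returns a boolean and only succeeds when the pattern covers the entire string.
def fullMatch = s ==~ /\b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b/

println "whole='${whole}' group=${groupName} acct=${acct}"
println "full-string match: ${fullMatch}"
```

Here the whole match is only the "ABCD: loaded 35" portion of the line, which is why ==~ reports false while =~ still finds the values.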
