
Repeating regular expression

I have a log file that I want to parse. I need to extract the values between the square brackets, after the "OK:", using a regex. The problem is that I do not know how many times the pattern occurs, and I cannot say how long each code is. So I can only rely on the fact that each value is surrounded by "[OK:" and "]".

So far I have tried this pattern:

String ok_pattern = "(.*itId=<)(.{1,10})(>.*)(\\[OK:)(.{4,27})(].*)";
Pattern p_ok = Pattern.compile(ok_pattern);

String testString = "RANDOMTEXT itId=<1232> Code < [OK:AZ1000105]  [OK:10000006] [OK:F1000000007] > RANDOMTEXT";

Matcher m = p_ok.matcher(testString);
if(m.find()) {
    System.out.println(m.group(5));
}

But this only works when there is exactly one "[OK:...]". I tried adding a "*" or a "+" after the 5th group, but without success. How do I make the pattern repeat and still capture all results?

My goal is to extract the item ID and the alphanumeric code after each "OK:" using a regex. In this example I want to get "1232" (the item ID) and "AZ1000105", "10000006", "F1000000007".

I am thankful for any help!

Your basic setup is correct, but the pattern is more complicated than it needs to be. Try the following regex:

(?<=\[OK:)[^\]]+|(?<=itId=<)[^>]+

This uses a lookbehind that asserts only that what precedes is [OK:. It then matches, without needing a capture group, one or more characters that are not a closing square bracket, which is the content you are trying to find. The right-hand side of the alternation matches the itId value in the same way. Because each call to find() returns a single match, loop with while (m.find()) instead of if (m.find()) to collect all of them.

// Needs java.util.regex.Pattern and java.util.regex.Matcher
String ok_pattern = "(?<=\\[OK:)[^\\]]+|(?<=itId=<)[^>]+";
Pattern p_ok = Pattern.compile(ok_pattern);
String testString = "RANDOMTEXT itId=<1232> Code < [OK:AZ1000105]  [OK:10000006] [OK:F1000000007] > RANDOMTEXT";

Matcher m = p_ok.matcher(testString);
while (m.find()) {
    System.out.println(m.group(0));
}

Output:

1232
AZ1000105
10000006
F1000000007
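
One thing the lookbehind pattern above does not tell you is which branch produced a given match, so the item ID and the OK codes arrive in one undifferentiated stream. A minimal sketch of one way around that, assuming Java 7+ named groups. The class name OkCodeExtractor, the method names, and the group names id and code are made up for illustration and are not from the original answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OkCodeExtractor {
    // Either branch may match; the named group that is non-null tells us which one did.
    // "id" and "code" are illustrative names.
    private static final Pattern ID_OR_CODE =
            Pattern.compile("itId=<(?<id>[^>]+)>|\\[OK:(?<code>[^\\]]+)\\]");

    /** Returns the first itId value in the line, or null if there is none. */
    static String itemId(String line) {
        Matcher m = ID_OR_CODE.matcher(line);
        while (m.find()) {
            if (m.group("id") != null) {
                return m.group("id");
            }
        }
        return null;
    }

    /** Returns every value found between "[OK:" and "]", in order of appearance. */
    static List<String> okCodes(String line) {
        List<String> codes = new ArrayList<>();
        Matcher m = ID_OR_CODE.matcher(line);
        while (m.find()) {
            if (m.group("code") != null) {
                codes.add(m.group("code"));
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        String testString = "RANDOMTEXT itId=<1232> Code < [OK:AZ1000105]  [OK:10000006] [OK:F1000000007] > RANDOMTEXT";
        System.out.println("itId: " + itemId(testString));   // itId: 1232
        System.out.println("codes: " + okCodes(testString)); // codes: [AZ1000105, 10000006, F1000000007]
    }
}
```

Unlike the lookbehind version, this consumes the surrounding delimiters as part of each match, which is harmless here because the delimiters never overlap.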

If you want to capture the digits in itId=<1232> first, followed by the values after each OK: in that order, you can use the \G anchor, which asserts the position at the end of the previous match.

Match the itId digits in the first capturing group and the value of OK: in the second capturing group:

itId=<(\d+)> Code < |\G(?!^)\[OK:([A-Z0-9]+)\]\s*

In Java:

String ok_pattern = "itId=<(\\d+)> Code < |\\G(?!^)\\[OK:([A-Z0-9]+)\\]\\s*";

Explanation

  • itId=<(\\d+)> Code < : match the literal prefix and capture 1+ digits in group 1
  • | : or
  • \\G(?!^) : assert the position at the end of the previous match, but not at the start of the input
  • \\[OK:([A-Z0-9]+)\\]\\s* : match [OK:, capture your value in group 2, then match ] followed by 0+ whitespace characters


Note that if you want to match more than ([A-Z0-9]+), you could also use a negated character class, ([^]]+), which matches any character except a closing square bracket.

For example, since each match fills only one of the two groups, check which group is non-null:

String ok_pattern = "itId=<(\\d+)> Code < |\\G(?!^)\\[OK:([^]]+)\\]\\s*";
Pattern p_ok = Pattern.compile(ok_pattern);
String testString = "RANDOMTEXT itId=<1232> Code < [OK:AZ1000105]  [OK:10000006] [OK:F1000000007] > RANDOMTEXT";
Matcher m = p_ok.matcher(testString);

while(m.find()) {
    if (null != m.group(1)) {
        System.out.println("itId: " + m.group(1));  
    }
    if (null != m.group(2)) {
        System.out.println("Ok code: " + m.group(2));   
    }   
}
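
The practical difference of the \G approach is that an [OK:...] value only matches when it directly continues a chain started by an itId prefix, so stray codes elsewhere in the line are skipped. A rough sketch of how that could be used to group codes per item ID. The class name OkCodesByItem, the parse method, and the sample log line are assumptions for illustration, and the character class is written as [^\\]] rather than [^]] purely for readability:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OkCodesByItem {
    // Group 1: digits of an itId prefix. Group 2: an OK value that directly
    // continues the previous match (enforced by \G, excluding the input start).
    private static final Pattern CHAINED = Pattern.compile(
            "itId=<(\\d+)> Code < |\\G(?!^)\\[OK:([^\\]]+)\\]\\s*");

    /** Maps each itId to the OK codes that immediately follow its prefix. */
    static Map<String, List<String>> parse(String log) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null;
        Matcher m = CHAINED.matcher(log);
        while (m.find()) {
            if (m.group(1) != null) {
                // A new itId prefix starts a new chain.
                current = result.computeIfAbsent(m.group(1), k -> new ArrayList<>());
            } else if (m.group(2) != null && current != null) {
                // Defensive null check; group 2 can only match after a prefix match.
                current.add(m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: one stray code before any itId, then two items.
        String log = "[OK:STRAY] itId=<1232> Code < [OK:AZ1000105]  [OK:10000006] > x itId=<77> Code < [OK:F1000000007] >";
        System.out.println(parse(log));
        // {1232=[AZ1000105, 10000006], 77=[F1000000007]}
    }
}
```

The stray [OK:STRAY] at the start is ignored because \G(?!^) refuses to match at the beginning of the input, and no earlier match ends there.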
