
Regular Expression and Capture Groups

I have a question about Java regular expressions and capture groups. My goal is to parse a log file and extract the relevant fields into QRadar. I am not writing Java code as such, but QRadar uses Java regular expressions to parse log files, and since this is a regular expression problem I am posting it here in the hope of getting some pointers or a solution.

Here is my question:

I am trying to parse a log file in CEF (Common Event Format). Here are a couple of lines from the log file:

[blah, blah...] cs1=DataValue1 cs2=DataValue2

[blah, blah...] cs2=DataValue3 cs1=DataValue4

My goal is to extract the values of the cs1 and cs2 fields from the lines above, that is, to capture DataValue1, DataValue2, DataValue3 and DataValue4.

I came up with the following regular expressions to do this:

RegEx for the cs1 field: \\scs1\\=(.*?)\\s\\w+\\=

RegEx for the cs2 field: \\scs2\\=(.*?)\\s\\w+\\=

With these regular expressions and their capture groups I can capture the data values, but only in certain cases. As the log entries above show, the order of the cs1 and cs2 fields within an entry is not fixed. Sometimes cs1 appears before cs2 (in the middle of the entry), and sometimes cs1 is the last field of the entry. The same is true of cs2. My current regular expressions only work when the field is not the last field, because they require whitespace and another key= after the value.

For example, for the first log entry, [blah, blah...] cs1=DataValue1 cs2=DataValue2, my regular expressions correctly extract the value of the cs1 field, but they fail for the cs2 field because cs2 is at the end of the line.

Similarly, for the second log entry, [blah, blah...] cs2=DataValue3 cs1=DataValue4, my regular expressions correctly extract the value of the cs2 field, but they fail for the cs1 field because cs1 is at the end of the line.
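The failure can be reproduced outside QRadar with a minimal Java sketch. The class and method names here (OriginalPatternDemo, extractCs2) are made up for illustration; the pattern and the two sample lines are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OriginalPatternDemo {
    // Pattern from the question: after the value it requires whitespace
    // followed by another "key=" token.
    static final Pattern CS2 = Pattern.compile("\\scs2\\=(.*?)\\s\\w+\\=");

    // Returns the captured cs2 value, or null when the pattern does not match.
    static String extractCs2(String line) {
        Matcher m = CS2.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // cs2 is the last field: nothing follows the value, so there is no match.
        System.out.println(extractCs2("[blah, blah...] cs1=DataValue1 cs2=DataValue2"));
        // cs2 is followed by " cs1=", so the value is captured.
        System.out.println(extractCs2("[blah, blah...] cs2=DataValue3 cs1=DataValue4"));
    }
}
```

The first call returns null and the second returns DataValue3, which matches the behaviour described above.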

My question is: what should my regular expression be so that it extracts the field value correctly regardless of whether the field appears in the middle or at the end of the log entry?

Any help is appreciated.

Regards,

PS: In case anyone is interested, I also posted this question on the QRadar forum ( https://www.ibm.com/developerworks/community/forums/html/topic?id=f48bc2dc-2ccb-42df-b543-dc0522491fad ), but there have been no responses yet.

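If you don't know the order in which the cs1 and cs2 fields appear, use lookaheads to capture their values. Each lookahead scans from the start of the line independently, so the order of the fields does not matter, and \S+ captures the value without requiring another key= after it.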

^(?=.*?\scs1=(\S+))(?=.*\scs2=(\S+))

As a Java string literal, the regex would be:

^(?=.*?\\scs1=(\\S+))(?=.*\\scs2=(\\S+))


Group 1 contains the value of cs1 and group 2 contains the value of cs2.
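Here is a short sketch of how the lookahead pattern behaves on both sample lines in plain Java. The class and method names (LookaheadDemo, extract) are hypothetical, and the sketch assumes the values contain no whitespace, as in the sample lines, since \S+ stops at the first whitespace character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    // Two lookaheads anchored at the start of the line; each one searches
    // the whole line for its own field, so field order does not matter.
    static final Pattern FIELDS =
            Pattern.compile("^(?=.*?\\scs1=(\\S+))(?=.*\\scs2=(\\S+))");

    // Returns {cs1 value, cs2 value}, or null if either field is missing.
    static String[] extract(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] first = extract("[blah, blah...] cs1=DataValue1 cs2=DataValue2");
        System.out.println(first[0] + " " + first[1]);
        String[] second = extract("[blah, blah...] cs2=DataValue3 cs1=DataValue4");
        System.out.println(second[0] + " " + second[1]);
    }
}
```

Because both lookaheads must succeed, this pattern only matches lines that contain both fields. If a line may contain only one of them, a separate pattern per field along the same lines, such as \scs1=(\S+), may be a better fit.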
