Regular Expression and Capture Groups

I have a question regarding Java regular expressions and capture groups. My goal is to parse a log file and extract relevant fields into QRadar. I am not exactly writing Java code; however, since QRadar uses Java regular expressions to parse the log file, and since my question is a regular expression problem, I am posting it here in the hope of getting some pointers or a solution.

Here is my question:

I am trying to parse a log file in CEF (Common Event Format). Here are a couple of lines from the log file:

[blah, blah...] cs1=DataValue1 cs2=DataValue2

[blah, blah...] cs2=DataValue3 cs1=DataValue4

My goal is to extract the data values for the fields cs1 and cs2 from the above lines. That is, I want to capture the values DataValue1, DataValue2, DataValue3 and DataValue4.

I came up with the following regular expressions to accomplish this:

RegEx for cs1 field - \\scs1\\=(.*?)\\s\\w+\\=

RegEx for cs2 field - \\scs2\\=(.*?)\\s\\w+\\=

Using the above regular expressions and capture groups I am able to capture the data values, but only in certain cases. If you look at the log entries above, you will notice that the order of the fields cs1 and cs2 within a log entry is not fixed. Sometimes cs1 appears before cs2 (in the middle of the log entry), and at other times cs1 is the last field of the log entry. The same is true of cs2. My current regular expressions only work when the field is not the last field.

E.g., for the 1st log entry line [blah, blah...] cs1=DataValue1 cs2=DataValue2, my regular expressions correctly extract the value of the cs1 field, but they fail for the cs2 field, since cs2 is at the end of the line.

Similarly, for the 2nd log entry line [blah, blah...] cs2=DataValue3 cs1=DataValue4, my regular expressions correctly extract the value of the cs2 field, but they fail to extract the value of the cs1 field, since cs1 is at the end of the line.
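
The failure described above can be reproduced with a short Java sketch. The cause is that the pattern ends in \s\w+\=, which requires another key= token after the value, so a field at the end of the line can never match. The class and method names here (TrailingFieldDemo, extractCs2) are made up for illustration and are not part of QRadar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingFieldDemo {
    // The cs2 pattern from the question, written as a Java string literal.
    // The trailing \s\w+\= demands a following "key=" token after the value.
    static final Pattern CS2 = Pattern.compile("\\scs2\\=(.*?)\\s\\w+\\=");

    // Returns the captured value, or null when the pattern does not match.
    static String extractCs2(String line) {
        Matcher m = CS2.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // cs2 is the last field: nothing follows the value, so there is no match.
        System.out.println(extractCs2("[blah, blah...] cs1=DataValue1 cs2=DataValue2"));
        // cs2 is followed by cs1=, so the value is captured.
        System.out.println(extractCs2("[blah, blah...] cs2=DataValue3 cs1=DataValue4"));
    }
}
```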

My question is: what should my regular expression be so that it extracts the field value correctly, irrespective of whether the field appears in the middle or at the end of the log file entry?

Any help is appreciated.

Regards,

PS: In case anyone is interested, I posted this question on the QRadar forum as well ( https://www.ibm.com/developerworks/community/forums/html/topic?id=f48bc2dc-2ccb-42df-b543-dc0522491fad ), but no luck yet with any responses...

If you don't know the order in which the cs1 and cs2 fields appear, just use lookaheads to capture their values.

^(?=.*?\scs1=(\S+))(?=.*\scs2=(\S+))

The Java regex would be:

^(?=.*?\\scs1=(\\S+))(?=.*\\scs2=(\\S+))

Group index 1 contains the value of cs1, and group index 2 contains the value of cs2.
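
As a minimal sketch of how the lookahead pattern behaves in plain Java, the snippet below runs it against both sample lines from the question. The class and method names (LookaheadFieldDemo, extract) are hypothetical, and how QRadar itself wires capture groups to properties is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadFieldDemo {
    // The pattern above as a Java string literal. Both lookaheads start
    // scanning from the beginning of the line, so field order does not matter.
    static final Pattern FIELDS =
        Pattern.compile("^(?=.*?\\scs1=(\\S+))(?=.*\\scs2=(\\S+))");

    // Returns {cs1, cs2}, or null when either field is missing from the line.
    static String[] extract(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "[blah, blah...] cs1=DataValue1 cs2=DataValue2",
            "[blah, blah...] cs2=DataValue3 cs1=DataValue4"
        };
        for (String line : lines) {
            String[] v = extract(line);
            System.out.println("cs1=" + v[0] + ", cs2=" + v[1]);
        }
    }
}
```

Note that (\S+) stops at the first whitespace character, so this sketch assumes the field values contain no spaces, as in the sample lines. The pattern also matches only when both fields are present.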
