
XML Log file regex

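A legacy system I can't change spits out 5 GB of horrible XML logs a day and blows through my ingest license. Two classes of verbose errors occur more than 1000 times a minute, but every few minutes there is a genuinely interesting entry. I'd like to drastically shorten the repetitive entries with sed and leave the interesting ones intact.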

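So I need:
1. A regex that matches each of the two annoying log entries (e.g. ...'decimal'... and ...'DBNull'...) but not the occasional interesting one.
One regex per annoying error class is fine; I can make 2 sed passes.
2. A capture group holding the timestamp, so that I can replace the verbose XML entry with a terse version that still carries the correct timestamp and loses no fidelity.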

I've got as far as matching the entry and capturing the created date with this:

(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>)

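This is close, but there is a sort of reverse greediness going on: in this case 'decimal' gets matched together with the opening <Log of an earlier entry.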

Sample data

<Log type="ERROR" createdDate="11/09/2015 08:13:14" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:13" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:12" > 
 <![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef, ): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
  Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
Parameters:
 [RETURN_VALUE][ReturnValue] Value: [0]
 ---> System.InvalidCastException: Conversion from type 'DBNull' to type 'Long' is not valid.
 ]]></Log> 

 <Log type="ERROR" createdDate="11/09/2015 08:13:11" > 
 <![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef, ): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
  Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
  ]]></Log> 

 <Log type="ERROR" createdDate="11/09/2015 08:13:10" > 
 <![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>

<Log type="ERROR" createdDate="11/09/2015 08:13:09" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

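I'm not sure exactly what you are looking for, but here is an example of how to isolate a <Log...</Log> block and perform a replacement on it: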

sed '/^<Log /{:a;/<\/Log>/!{N;ba;};s/>.*\(decimal\|DBNull\).*</>\1</}' file.log

Details:

/^<Log / { # condition: a line that starts with "<Log "
    :a;    # define the label "a"
    /<\/Log>/! { # condition: if the line doesn't contain "</Log>"
        N;       # append the next line to the pattern space
        ba;      # go to the label "a"
    };
    s/>.*\(decimal\|DBNull\).*</>\1</ # replace the block
}

(I assume that <Log is always at the start of a line, unlike in the records at seconds 10 and 11, where the leading space is probably a typo.)
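If the leading space in some records is not a typo, the same loop can be adapted. The following is a minimal sketch rather than a definitive solution: it assumes GNU sed, `shorten_logs` is a hypothetical helper name, and the terse output format (opening tag, error keyword, closing tag) is only one possible choice. It tolerates whitespace before <Log and keeps the whole opening tag, including createdDate, in a capture group so the timestamp survives the replacement.

```shell
#!/bin/sh
# shorten_logs: hypothetical helper, reads log text on stdin.
# Assumes GNU sed; -E enables extended regular expressions.
shorten_logs() {
  sed -E '
    # A line that starts a <Log ...> entry, with optional leading whitespace.
    /^[[:space:]]*<Log /{
      :a
      # Until the closing tag is in the pattern space, append the next line.
      /<\/Log>/!{
        N
        ba
      }
      # \1 = opening tag (carries createdDate), \2 = error keyword,
      # \3 = closing tag. Entries without either keyword are left unchanged.
      s/^[[:space:]]*(<Log [^>]*>).*(decimal|DBNull).*(<\/Log>).*$/\1\2\3/
    }
  '
}

# Usage on one of the indented sample records (shortened):
printf '%s\n' \
  ' <Log type="ERROR" createdDate="11/09/2015 08:13:11" > ' \
  " <![CDATA[ Conversion from type 'DBNull' to type 'Long' is not valid." \
  '  ]]></Log> ' |
shorten_logs
# prints: <Log type="ERROR" createdDate="11/09/2015 08:13:11" >DBNull</Log>
```

On the real file this would be run as `shorten_logs < file.log`. Note that an interesting entry which happens to contain the word decimal or DBNull would also be shortened, so the keywords may need to be made more specific.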

