
XML Log file regex

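A legacy system that I cannot change spits out 5 GB of awful XML logs every day and eats up my ingestion license. Two classes of verbose error occur more than 1000 times per minute, while a genuinely interesting entry shows up only every few minutes. I want to shorten the repetitive entries drastically with sed and keep the interesting ones intact.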

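So I need:
1. A regex that matches each of the two annoying log entries (e.g. ...'decimal'... and ...'DBNull'...) but not the occasional interesting entry.
One regex per annoying error class is fine; I can do two sed passes.
2. A capture group containing the timestamp, so that I can replace the verbose XML entry with a concise version that still carries the correct timestamp and loses no fidelity.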

I have got as far as a pattern that matches an entry and captures the created date:

(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>)

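This is close, but there is a kind of reverse-greediness problem: 'decimal' ends up matched together with an opening <Log that does not belong to the same entry.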
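The pattern above uses Perl-style syntax (`(?:...)`, `\d`, `.*?`) that sed's regex dialects do not provide, so the following is a minimal sketch in Python rather than sed. A lazy `.*?` still starts at the leftmost `<Log` and keeps growing until it finds 'decimal', even across a `</Log>`. One possible fix is to forbid the match from stepping over a closing tag. The function name `shorten` and the exact shape of the one-line stub (including the hard-coded `type="ERROR"`) are assumptions made for illustration, not part of the original question.

```python
import re

# "(?:(?!</Log>).)*?" consumes a character only when no "</Log>" starts at
# that position, so a match can never run from one entry into the next.
NOISY_ENTRY = re.compile(
    r'<Log\b[^>]*?(createdDate="\d{2}/\d{2}/\d{4}.\d{2}:\d{2}:\d{2}")[^>]*>'
    r"(?:(?!</Log>).)*?\b(decimal|DBNull)\b(?:(?!</Log>).)*?</Log>",
    re.DOTALL,  # let "." match newlines, since entries span several lines
)


def shorten(text: str) -> str:
    """Replace each noisy entry with a one-line stub that keeps its timestamp.

    The stub format is hypothetical; type="ERROR" is hard-coded because every
    entry in the sample data has that type.
    """
    return NOISY_ENTRY.sub(r'<Log type="ERROR" \1>\2</Log>', text)


sample = (
    '<Log type="ERROR" createdDate="11/09/2015 08:13:10" >\n'
    " <![CDATA[ [231] An actual interesting log entry ]]></Log>\n"
    '<Log type="ERROR" createdDate="11/09/2015 08:13:09" >\n'
    " <![CDATA[ The value '' cannot be parsed as the type 'decimal'. ]]></Log>\n"
)

print(shorten(sample))
```

The interesting entry is left untouched because no 'decimal' or 'DBNull' appears before its own `</Log>`. Note that applying this to a whole file at once means holding the file in memory, which may be impractical at 5 GB.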

Sample data

<Log type="ERROR" createdDate="11/09/2015 08:13:14" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:13" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:12" > 
 <![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef, ): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
  Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
Parameters:
 [RETURN_VALUE][ReturnValue] Value: [0]
 ---> System.InvalidCastException: Conversion from type 'DBNull' to type 'Long' is not valid.
 ]]></Log> 

 <Log type="ERROR" createdDate="11/09/2015 08:13:11" > 
 <![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef, ): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
  Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
  ]]></Log> 

 <Log type="ERROR" createdDate="11/09/2015 08:13:10" > 
 <![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>

<Log type="ERROR" createdDate="11/09/2015 08:13:09" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

I am not sure exactly what you are looking for, but here is an example of how to isolate the <Log...</Log> blocks and perform a replacement:

sed '/^<Log /{:a;/<\/Log>/!{N;ba;};s/>.*\(decimal\|DBNull\).*</>\1</}' file.log

Details:

/^<Log / { # condition: a line that starts with "<Log "
    :a;    # define the label "a"
    /<\/Log>/! { # condition: if the line doesn't contain "</Log>"
        N;       # append the next line to the pattern space
        ba;      # go to the label "a"
    };
    s/>.*\(decimal\|DBNull\).*</>\1</ # replace the block
}

(I assume that <Log is always at the start of a line, unlike the records at seconds 10 and 11 in the sample, which is probably a typo.)
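If the leading whitespace in those records is not a typo, the same collect-then-substitute idea can be sketched in Python, reading line by line so that the whole file never has to sit in memory. This is an illustrative equivalent of the sed script, not a drop-in replacement: the function name `condense` and the names `OPEN` and `NOISE` are hypothetical, and the opening-tag test is deliberately loosened to allow leading whitespace.

```python
import re
from typing import Iterable, Iterator

OPEN = re.compile(r"^\s*<Log\s")  # opening tag, leading whitespace tolerated
# Same shape as the sed substitution: from the first ">" to the last "<".
NOISE = re.compile(r">.*(decimal|DBNull).*<", re.DOTALL)


def condense(lines: Iterable[str]) -> Iterator[str]:
    """Yield the log with noisy <Log>...</Log> blocks shortened."""
    block = []  # lines of the entry currently being collected
    for line in lines:
        if not block and not OPEN.match(line):
            yield line  # text outside any entry passes through unchanged
            continue
        block.append(line)
        if "</Log>" in line:  # entry complete: rewrite it and emit it
            yield NOISE.sub(r">\1<", "".join(block), count=1)
            block = []
    if block:  # unterminated entry at end of input
        yield "".join(block)


sample = [
    ' <Log type="ERROR" createdDate="11/09/2015 08:13:11" > \n',
    " <![CDATA[ Conversion from type 'DBNull' to type 'Long' is not valid.\n",
    "  ]]></Log> \n",
    "\n",
    '<Log type="ERROR" createdDate="11/09/2015 08:13:10" > \n',
    " <![CDATA[ [231] An actual interesting log entry ]]></Log>\n",
]

print("".join(condense(sample)), end="")
```

To process a real file, an open file object can be passed in place of `sample`, for example `sys.stdout.writelines(condense(open("file.log")))` after `import sys`.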
