Parsing log files

Question

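I'm trying to write a script to simplify searching a particular application's log files for specific information. I thought there might be a way to convert them into an XML tree, and I'm off to a decent start... but the problem is that the application's log files are an absolute mess, if you ask me.

Some entries are simple: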

2014/04/09 11:27:03 INFO  Some.code.function - Doing stuff

Ideally, I'd like to turn the above into something like this:

    <Message>
    <Date>2014/04/09</Date>
    <Time>11:27:03</Time>
    <Type>INFO</Type>
    <Source>Some.code.function</Source>
    <Sub>Doing stuff</Sub>
    </Message>

Other entries look like this, with additional information and line breaks:

2014/04/09 11:27:04 INFO  Some.code.function - Something happens

changes: 
this stuff happened

I'd like to turn this last block into something like the above, but with the additional information added in a details section:

    <Message>
    <Date>2014/04/09</Date>
    <Time>11:27:04</Time>
    <Type>INFO</Type>
    <Source>Some.code.function</Source>
    <Sub>Something happens</Sub>
    <details>changes: 
    this stuff happened</details>
    </Message>

And then there are other messages; errors come in this form:

2014/04/09 11:27:03 ERROR  Some.code.function - Something didn't work right
Log Entry: LONGARSEDGUID
Error Code: E3145
Application: Name
Details:
message information etc etc and more line breaks, this part of the message may add up to an unknown number of lines before the next entry

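I'd like to turn this last block into something like the previous example, but with XML nodes added for the log entry, error code, application and, again, the details, like so: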

    <Message>
    <Date>2014/04/09</Date>
    <Time>11:27:03</Time>
    <Type>ERROR</Type>
    <Source>Some.code.function</Source>
    <Sub>Something didn't work right</Sub>
    <Entry>LONGARSEDGUID</Entry>
    <Code>E3145</Code>
    <Application>Name</Application>
    <details>message information etc etc and more line breaks, this part of the message may add up to an unknown number of lines before the next entry</details>
    </Message>

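Now, I know that Select-String has a context option that would let me select a number of lines after the line I've filtered on. The problem is that this number isn't constant.

I'm thinking a regular expression could also select the block of text up to the next date string, but regular expressions are not my strong point, and I think there may be a better way, because the one constant is that new entries start with a date string.

The idea is to break these up into either XML or tables of some sort. From there, I'm hoping it will be a little easier to take the last message, or to filter out irrelevant or recurring messages.

I have a sample that I just tossed on pastebin after removing or replacing a few bits of information for privacy reasons: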

http://pastebin.com/raw.php?i=M9iShyT2

Answer 1

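One possible way to deal with such files is to process them line by line. Each log entry starts with a timestamp and ends when the next line starting with a timestamp appears, so you could do something like this: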

Get-Content 'C:\path\to\your.log' | % {
  if ($_ -match '^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}') {
    if ($logRecord) {
      # If a current log record exists, it is complete now, so it can be added
      # to your XML or whatever, e.g.:

      $logRecord -match '^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\S+) ...'

      $message = $xml.CreateElement('Message')

      $date = $xml.CreateElement('Date')
      $date.InnerText = $matches[1]
      $message.AppendChild($date)

      $time = $xml.CreateElement('Time')
      $time.InnerText = $matches[2]
      $message.AppendChild($time)

      $type = $xml.CreateElement('Type')
      $type.InnerText = $matches[3]
      $message.AppendChild($type)

      ...

      $xml.SelectSingleNode('...').AppendChild($message)
    }
    $logRecord = $_          # start new record
  } else {
    $logRecord += "`r`n$_"   # append to current record
  }
}
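The snippet above leaves two things open: the `$xml` document has to exist beforehand, and the final record is never written, because nothing follows it to trigger the flush. Below is a minimal, self-contained sketch of the same line-by-line idea with both gaps filled. The root element name `Log`, the helper name `Add-LogRecord`, the header pattern, and the inline sample lines are assumptions made for illustration; they are not part of the original answer.

```powershell
# Hypothetical sample input standing in for Get-Content 'C:\path\to\your.log'
$lines = @(
  '2014/04/09 11:27:03 INFO  Some.code.function - Doing stuff'
  '2014/04/09 11:27:04 INFO  Some.code.function - Something happens'
  'changes: '
  'this stuff happened'
)

$xml  = New-Object System.Xml.XmlDocument
$root = $xml.AppendChild($xml.CreateElement('Log'))   # 'Log' is an assumed root name

# Assumed header layout: date, time, type, source, " - ", message
$header = '^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\S+)\s+(\S+) - (.*)$'

# Hypothetical helper: turns one complete record (header + extra lines) into a <Message> node
function Add-LogRecord([string[]]$Record) {
  if (-not $Record) { return }                  # nothing collected yet
  if (-not ($Record[0] -match $header)) { return }
  $message = $xml.CreateElement('Message')
  $names = 'Date', 'Time', 'Type', 'Source', 'Sub'
  for ($i = 0; $i -lt $names.Count; $i++) {
    $node = $xml.CreateElement($names[$i])
    $node.InnerText = $Matches[$i + 1]          # capture groups 1..5
    [void]$message.AppendChild($node)
  }
  if ($Record.Count -gt 1) {                    # everything after the header becomes <details>
    $details = $xml.CreateElement('details')
    $details.InnerText = ($Record[1..($Record.Count - 1)] -join "`r`n")
    [void]$message.AppendChild($details)
  }
  [void]$root.AppendChild($message)
}

$record = @()
foreach ($line in $lines) {
  if ($line -match '^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}') {
    Add-LogRecord $record                       # the previous record is complete
    $record = @($line)                          # start a new record
  } else {
    $record += $line                            # append to the current record
  }
}
Add-LogRecord $record                           # flush the final record
$xml.OuterXml
```

Collecting each record as an array of lines, rather than one concatenated string, keeps the header line separate from the extra lines, so the header pattern only ever has to match a single line.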

Answer 2

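Sorry this is a bit late; I got tied up with work there for a bit (darn work, expecting me to be productive while on their dime). I ended up with something similar to Ansgar Wiechers' solution, but I format things into objects and collect those in an array. It doesn't handle the XML that you added later, but it gives you a nice array of objects to work with for the other records. I'll explain the main RegEx line here, and I'll comment in-line where it's practical.

'(^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+?] (\w+?) {1,2}(.+?) - (.+)$' is the RegEx that detects the start of a new record. I started to explain it, but there are probably better resources for learning RegEx than me explaining it to you. For a full breakdown and examples, see the RegEx101.com link.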
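As a quick illustration of what that pattern captures, here is a small sketch run against a single made-up line. The `[12]` field in the sample is an assumption inferred from the `\[\d+?]` part of the pattern (the short examples in the question do not show it), and the variable names are hypothetical.

```powershell
$pattern = '(^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+?] (\w+?) {1,2}(.+?) - (.+)$'

# Hypothetical line: the "[12]" field is assumed from the pattern, not taken from the question
$line = '2014/04/09 11:27:03 [12] INFO  Some.code.function - Doing stuff'

if ($line -match $pattern) {
  $stamp   = $Matches[1]   # date and time together
  $type    = $Matches[2]   # INFO, ERROR, ...
  $source  = $Matches[3]   # text before " - "
  $message = $Matches[4]   # text after " - "
  "$stamp | $type | $source | $message"
}

# The same line without the bracketed field does not count as a record start
$plain = '2014/04/09 11:27:03 INFO  Some.code.function - Doing stuff'
$plainMatches = $plain -match $pattern
```

If your log lines really have no bracketed number after the timestamp, the ` \[\d+?]` part would need to be dropped or made optional before this pattern detects anything.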

$Records=@() #Create empty array that we will populate with custom objects later
$Event = $Null #make sure nothing in $Event to give script a clean start
Get-Content 'C:\temp\test1.txt' | #Load file, and start looping through it line-by-line.
?{![string]::IsNullOrEmpty($_)}|% { #Filter out blank lines, and then perform the following on each line
  if ($_ -match '(^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+?] (\w+?) {1,2}(.+?) - (.+)$') { #New Record Detector line! If it finds this RegEx match, it means we're starting a new record.
    if ($Event) { #If there's already a record in progress, add it to the Array
      $Records+=$Event
    }
    $Event = New-Object PSObject -Property @{ #Create a custom PSObject object with these properties that we just got from that RegEx match
DateStamp = [datetime](get-date $Matches[1]) #We convert the date/time stamp into an actual DateTime object. That way sorting works better, and you can compare it to real dates if needed.
Type = $Matches[2]
Source = $Matches[3]
Message = $Matches[4]}

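OK, a small pause for the cause here. $Matches isn't something I defined, so why am I referencing it? When PowerShell gets a successful match from a RegEx with -match, it automatically stores the result in $Matches. So all the groups that we matched in parentheses become $Matches[1], $Matches[2], and so on. Strictly speaking it's a hashtable rather than an array, and there is a $Matches[0], but that holds the entire matched text, not just one of the groups. We now return you to your regularly scheduled script...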

  } else { #End of the 'New Record' section. If it's not a new record if does the following
    if($_ -match "^((?:[^ ^\[])(?:\w| |\.)+?):(.*)$"){

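A RegEx match again. It starts off by stating that this has to be the beginning of the string, with the caret character (^). Then it says (in a non-capturing group, noted by the (?:<stuff>) format, which for my purposes just means it won't show up in $Matches on its own) [^ \[]. That means the next character cannot be a space or an opening bracket (escaped with a \), just to speed things up and skip those lines for this check. If you have things in brackets [] and the first character is a caret, it means "don't match anything in these brackets". Note that the class as written in the code, [^ ^\[], has a second caret, which is taken literally, so it also excludes a leading ^.

I actually just changed the next part to include periods, and used \w instead of [a-zA-Z0-9] because it's essentially the same thing but shorter. \w is a "word character" in RegEx, and includes letters, numbers, and the underscore. I'm not sure why the underscore is considered part of a word, but I don't make the rules, I just play the game. I was using [a-zA-Z0-9], which matches anything between 'a' and 'z' (lowercase), anything between 'A' and 'Z' (uppercase), and anything between '0' and '9'. At the risk of including the underscore character, \w is just shorter and simpler.

Then comes the actual capturing part of this RegEx. It has two groups. The first is letters, numbers, underscores, spaces, and periods (escaped with a \ because '.' on its own matches any character). Then a colon. Then a second group that is everything else up to the end of the line.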
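To see that field pattern in isolation, here is a standalone sketch (separate from the script) that runs it against a few made-up lines. The sample strings and variable names are assumptions for illustration, and `[pscustomobject]` needs PowerShell 3.0 or later.

```powershell
$fieldPattern = '^((?:[^ ^\[])(?:\w| |\.)+?):(.*)$'

# Hypothetical sample lines: two field lines and two lines that should not match
$samples = 'Error Code: E3145', 'Details:', ' indented text: not a field', 'plain continuation text'

$results = foreach ($s in $samples) {
  if ($s -match $fieldPattern) {
    # Group 1 is the field name, group 2 is whatever follows the colon
    [pscustomobject]@{ Line = $s; IsField = $true; Field = $Matches[1]; Value = $Matches[2].Trim() }
  } else {
    [pscustomobject]@{ Line = $s; IsField = $false; Field = $null; Value = $null }
  }
}
$results | Format-Table -AutoSize
```

One consequence worth keeping in mind: any continuation line that happens to begin with word characters, spaces, or periods followed by a colon will also be treated as a new field.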

        $Field = $Matches[1] #Everything before the colon is the name of the field
        $Value = $Matches[2].Trim() #Everything after the colon is the data in that field
        $Event | Add-Member $Field $Value #Add the field to $Event as a NoteProperty, with a value of $Value. Those two are actually positional parameters for Add-Member, so we don't have to go and specify what kind of member, what the name is, and what the value is. Just Add-Member <[string]name> <value can be a string, array, yeti, whatever... it's not picky>
    } else { #If it didn't find the regex then this is just more data from the last field, so don't change the field, just append this line to it:
        $Event.$Field += if(![string]::IsNullOrEmpty($Event.$Field)){"`r`n$_"}else{$_}

That was a long explanation for a fairly short bit of code. All it really does is add data to the field! It uses an inverted (prefixed with !) If check to see whether the current field already has any data, or whether it is currently null or empty. If it is not empty, it adds a new line and then appends the current line. If the field doesn't have any data yet, it skips the new-line bit and just adds the data.

    }
  }
}
$Records+=$Event #Adds the last event to the array of records.

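Sorry, I'm not much help with the XML. But at least this gets you clean records.

Edit: OK, the code is annotated now, and hopefully everything is explained well enough. If something is still confusing, perhaps I can refer you to a site that explains it better than I can. I ran the above against your sample input on PasteBin.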
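For the XML side, one possible follow-up is to walk the collected objects and emit one element per property. The sketch below is only an illustration under assumptions: the two sample records are made up to resemble what the script collects, the root name `Log` is arbitrary, and property names are passed through `[System.Xml.XmlConvert]::EncodeLocalName` because names such as 'Error Code' contain characters that are not valid in element names.

```powershell
# Hypothetical records shaped like the objects the script above collects
$Records = @(
  [pscustomobject]@{ DateStamp = [datetime]'2014-04-09T11:27:03'; Type = 'INFO'; Source = 'Some.code.function'; Message = 'Doing stuff' }
  [pscustomobject]@{ DateStamp = [datetime]'2014-04-09T11:27:03'; Type = 'ERROR'; Source = 'Some.code.function'; Message = "Something didn't work right"; 'Error Code' = 'E3145' }
)

$xml  = New-Object System.Xml.XmlDocument
$root = $xml.AppendChild($xml.CreateElement('Log'))   # assumed root element name

foreach ($record in $Records) {
  $node = $xml.CreateElement('Message')
  foreach ($prop in $record.PSObject.Properties) {
    # Encode characters that are not allowed in element names (a space becomes _x0020_)
    $child = $xml.CreateElement([System.Xml.XmlConvert]::EncodeLocalName($prop.Name))
    # Write dates in sortable ISO 8601 form, everything else as plain text
    $child.InnerText = if ($prop.Value -is [datetime]) { $prop.Value.ToString('s') } else { [string]$prop.Value }
    [void]$node.AppendChild($child)
  }
  [void]$root.AppendChild($node)
}
$xml.OuterXml
```

Because the records stay ordinary objects until this last step, filtering (for example with Where-Object on Type) can happen before any XML is generated.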

Answer 1
1 2014-04-09 21:59:40

Answer 2
1 Accepted 2014-04-09 23:38:52
