PowerShell - Assistance needed with Regex with named capture groups
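Good evening,
I am trying to teach myself regex and have run into a problem I cannot solve. I have 3 days of logs that look like the ones below.
I capture the information into named capture groups and then have PowerShell add it to an array list.
The problem: I need to ignore everything between < and >; I don't need it.
Then I need to look ahead and see whether the action is Added, Deleted, or Updated, ignoring the Configuration part. If it is one of those 3, return the match. Then skip over BY USER and grab the username.
From a regex perspective, the end result should look like this: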
Date 09 Dec 2020
Time 12:59:28
ErrorID VPSa0217I
PrintQ PRINTQUEUE1
Action UPDATED
User op9p99
The log file contains records like these:
09 Dec 2020 12:59:28 VPSa0217I <CREQ0009 > PRINTQUEUE1 ADDED BY USER op9p99
09 Dec 2020 13:00:22 VPSa0219I <CREQ0011 > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99
09 Dec 2020 14:20:59 VPSa0217I <CREQ0014 > PRINTQUEUE1 DELETED BY USER op9p99
Tried:
#$Regex1 = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)\s(?<Time>(?!\s)\d+:\d+:\d+).(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])(?<Junk>.<.*?>.*?\s)(?<PrintQ>\w+)(?<Action>.\bADDED|DELETED|UPDATED\b)(?<Junk2>\s\w+\s\w+\s)(?<User>\w+)"
#$Regex2 = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)(?<Time>\s+\d{1,2}:\d{2}:\d{2})\s(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])(?<Junk>.<.*?>.*?\s)(?<PrintQ>\w+)(?<Action>\s\bADDED|DELETED|UPDATED\b)(?<Junk2>\s\w+\s\w+\s)(?<User>\w+)"
$regex3 = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)(?<Time>\s+\d{1,2}:\d{2}:\d{2})\s(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])(?<Junk>.<.*?>.*?\s)(?<PrintQ>\w+).(?<Action>ADDED|DELETED|UPDATED\b)(?<Junk2>\s\w+\s\w+\s)(?<User>\w+)"
Works:
$Datereg = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)"
$TimeReg = "(?<Time>\s+\d{1,2}:\d{2}:\d{2})\s"
$ErrorIDReg = "(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])"
$Junk1Reg = "(?<Junk>.<.*?>.*?\s)"
$PrintQreg = "(?<PrintQ>\w+)"
$ActionReg = "(?<Action>\s\w+)"
$Junk2Reg = "(?<Junk2>\s\w+\s\w+)"
$UserReg = "(?<User>\s\w+\s)"
$regex = $Datereg + $TimeReg + $ErrorIDReg + $Junk1Reg + $PrintQreg + $ActionReg + $Junk2Reg + $UserReg
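Thank you for your help.
Given that the tokens of interest are mostly whitespace-separated, I suggest a different approach, based primarily on -split, the string-splitting operator (used here in its unary form):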
Get-Content logfile.txt | ForEach-Object {
  # Split the line into tokens by whitespace.
  $tokens = -split $_
  # Get the action value.
  # Use the 4th token *from the end* (-4) to account for the fact that
  # some lines have an extra word - 'CONFIGURATION' - inserted before the
  # action value.
  $action = $tokens[-4]
  if ($action -in 'UPDATED', 'DELETED', 'ADDED') {
    # Construct and output an object from the tokens.
    [pscustomobject] @{
      Date    = $tokens[0..2] -join ' '
      Time    = $tokens[3]
      ErrorId = $tokens[4]
      PrintQ  = $tokens[7]
      Action  = $action
      User    = $tokens[-1] # user is always the last token
    }
  }
}
Note: PowerShell's operators are generally case-insensitive; if you need case-sensitive matching, prefix the operator name with c, e.g. -ceq and -cin.
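As a minimal sketch of the note above, the following compares the default and the c-prefixed form of the same operator. The variable names are illustrative only and do not come from the answer.

```powershell
# Illustrative names; only -in and -cin come from the note above.
$actions = 'UPDATED', 'DELETED', 'ADDED'

# Default operator: case is ignored, so 'updated' is found.
$insensitive = 'updated' -in $actions

# c-prefixed operator: case must match exactly, so 'updated' is not found.
$sensitive = 'updated' -cin $actions

"{0} {1}" -f $insensitive, $sensitive   # True False
```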
With your sample input, the Get-Content pipeline above outputs:
Date : 09 Dec 2020
Time : 12:59:28
ErrorId : VPSa0217I
PrintQ : PRINTQUEUE1
Action : ADDED
User : op9p99
Date : 09 Dec 2020
Time : 13:00:22
ErrorId : VPSa0219I
PrintQ : PRINTQUEUE1
Action : UPDATED
User : op9p99
Date : 09 Dec 2020
Time : 14:20:59
ErrorId : VPSa0217I
PrintQ : PRINTQUEUE1
Action : DELETED
User : op9p99
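To see why the fixed indices work for both line shapes, here is a small sketch that splits the longest sample line (the one with the extra CONFIGURATION word) and inspects the tokens. It assumes, as in the sample data, that there is a space before the closing > so that the bracketed part always splits into exactly two tokens; if that space were missing, index 7 would no longer be the print queue.

```powershell
# Sample line copied from the question (the variant with the extra word).
$line = '09 Dec 2020 13:00:22 VPSa0219I <CREQ0011 > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99'

# Unary -split breaks the string on runs of whitespace.
$tokens = -split $line

$tokens.Count   # 13 here; 12 for lines without CONFIGURATION
$tokens[7]      # PRINTQUEUE1 (counted from the start, unaffected by the extra word)
$tokens[-4]     # UPDATED     (counted from the end, so the extra word is skipped)
$tokens[-1]     # op9p99
```

Counting the action and the user from the end is what makes the extra word irrelevant, since everything after the action has a fixed shape (BY USER name).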
Try this set of regexes:
$log = @"
09 Dec 2020 12:59:28 VPSa0217I <CREQ0009 > PRINTQUEUE1 ADDED BY USER op9p99
09 Dec 2020 13:00:22 VPSa0219I <CREQ0011 > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99
09 Dec 2020 14:20:59 VPSa0217I <CREQ0014 > PRINTQUEUE1 DELETED BY USER op9p99
"@
$DateReg = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)"
$TimeReg = "(?<Time>\s+\d{1,2}:\d{2}:\d{2})\s"
$ErrorIDReg = "(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])\s"
$Junk1Reg = "(?<Junk><[^>]+>)\s"
$PrintQreg = "(?<PrintQ>\w+)\s(?!CONFIGURATION\s)"
$ActionReg = "(?<Action>\w+)\s"
$Junk2Reg = "(?<Junk2>\w+\s\w+)\s"
$UserReg = "(?<User>\w+)"
$regex = $Datereg + $TimeReg + $ErrorIDReg + $Junk1Reg + $PrintQreg + $ActionReg + $Junk2Reg + $UserReg
$log -split "`n" | Foreach-Object { if ($_ -match $regex) {"Matched line: $_"}}
Which outputs:
Matched line: 09 Dec 2020 12:59:28 VPSa0217I <CREQ0009 > PRINTQUEUE1 ADDED BY USER op9p99
Matched line: 09 Dec 2020 14:20:59 VPSa0217I <CREQ0014 > PRINTQUEUE1 DELETED BY USER op9p99
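The main adjustment is the use of a zero-width assertion called a negative lookahead to ensure that the text CONFIGURATION does not follow the PrintQ text. As the output shows, the CONFIGURATION UPDATED line is therefore not matched at all. I also adjusted your Junk1Reg to use <[^>]+>.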
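If the CONFIGURATION UPDATED line should be captured too, with only the word CONFIGURATION skipped (which is how the question reads), one possible variation is an optional non-capturing group instead of the negative lookahead. This is a sketch and not part of the answer above: the simplified sub-patterns (for example \w+ for ErrorID), the ^ anchor, and the variable names $lines, $pattern, and $results are assumptions made for illustration.

```powershell
# Sample lines copied from the question.
$lines = @(
    '09 Dec 2020 12:59:28 VPSa0217I <CREQ0009 > PRINTQUEUE1 ADDED BY USER op9p99'
    '09 Dec 2020 13:00:22 VPSa0219I <CREQ0011 > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99'
    '09 Dec 2020 14:20:59 VPSa0217I <CREQ0014 > PRINTQUEUE1 DELETED BY USER op9p99'
)

# (?:CONFIGURATION\s+)? consumes the extra word when present, without capturing it.
# <[^>]+> skips the bracketed part; BY USER is matched literally and not captured.
$pattern = '^(?<Date>\d{2}\s[A-Z][a-z]{2,8}\s\d{4})\s+' +
           '(?<Time>\d{1,2}:\d{2}:\d{2})\s+' +
           '(?<ErrorID>\w+)\s+' +
           '<[^>]+>\s+' +
           '(?<PrintQ>\w+)\s+' +
           '(?:CONFIGURATION\s+)?' +
           '(?<Action>ADDED|DELETED|UPDATED)\s+BY\s+USER\s+' +
           '(?<User>\w+)'

$results = foreach ($line in $lines) {
    if ($line -match $pattern) {
        # After a successful -match, $Matches holds the named groups.
        [pscustomobject]@{
            Date    = $Matches.Date
            Time    = $Matches.Time
            ErrorID = $Matches.ErrorID
            PrintQ  = $Matches.PrintQ
            Action  = $Matches.Action
            User    = $Matches.User
        }
    }
}

$results
```

Because the whitespace sits outside the named groups, the captured Time and User values carry no leading or trailing spaces.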