
PowerShell - Assistance needed with Regex with named capture groups

Good evening,

I am trying to teach myself regex and have run into an issue I can't figure out. I have 3 days' worth of logs that look similar to the sample below.

I am capturing the information into named capture groups and then adding it to an array list in PowerShell.

Problems: I need to ignore everything between < and > ; I don't need it.

Then I need to look ahead and see whether the action is ADDED, DELETED or UPDATED, ignoring the CONFIGURATION part, and return the match if it is one of those 3. Then skip BY USER and just grab the user name.

The final result should look like this from a regex perspective:

Date    09 Dec 2020
Time    12:59:28
ErrorID VPSa0217I
PrintQ  PRINTQUEUE1
Action  ADDED
User    op9p99

Logfile containing records like these:

09 Dec 2020 12:59:28 VPSa0217I <CREQ0009        > PRINTQUEUE1 ADDED BY USER op9p99
09 Dec 2020 13:00:22 VPSa0219I <CREQ0011        > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99
09 Dec 2020 14:20:59 VPSa0217I <CREQ0014        > PRINTQUEUE1 DELETED BY USER op9p99

Tried:

#$Regex1 = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)\s(?<Time>(?!\s)\d+:\d+:\d+).(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])(?<Junk>.<.*?>.*?\s)(?<PrintQ>\w+)(?<Action>.\bADDED|DELETED|UPDATED\b)(?<Junk2>\s\w+\s\w+\s)(?<User>\w+)"
    
#$Regex2 = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)(?<Time>\s+\d{1,2}:\d{2}:\d{2})\s(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])(?<Junk>.<.*?>.*?\s)(?<PrintQ>\w+)(?<Action>\s\bADDED|DELETED|UPDATED\b)(?<Junk2>\s\w+\s\w+\s)(?<User>\w+)"
    
$regex3 = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)(?<Time>\s+\d{1,2}:\d{2}:\d{2})\s(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])(?<Junk>.<.*?>.*?\s)(?<PrintQ>\w+).(?<Action>ADDED|DELETED|UPDATED\b)(?<Junk2>\s\w+\s\w+\s)(?<User>\w+)"

Works:

$Datereg = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)"
$TimeReg = "(?<Time>\s+\d{1,2}:\d{2}:\d{2})\s"
$ErrorIDReg = "(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])"
$Junk1Reg = "(?<Junk>.<.*?>.*?\s)"
$PrintQreg = "(?<PrintQ>\w+)"
$ActionReg = "(?<Action>\s\w+)"
$Junk2Reg = "(?<Junk2>\s\w+\s\w+)"
$UserReg = "(?<User>\s\w+\s)"

$regex = $Datereg + $TimeReg + $ErrorIDReg + $Junk1Reg + $PrintQreg + $ActionReg + $Junk2Reg + $UserReg

Thanks for the help.

Given that the tokens of interest are mostly space-separated, I suggest a different approach, based primarily on -split, the string splitting operator:

Get-Content logfile.txt | ForEach-Object { 

  # Split the line into tokens by whitespace.
  $tokens = -split $_

  # Get the action value.
  # Use the 4th token *from the end* (-4) to account for the fact that
  # some lines have an extra word - 'CONFIGURATION' - inserted before the
  # action value.
  $action = $tokens[-4]
 
  if ($action -in 'UPDATED', 'DELETED', 'ADDED') {
    # Construct and output an object from the tokens.
    [pscustomobject] @{ 
      Date = $tokens[0..2] -join ' '
      Time = $tokens[3]
      ErrorId = $tokens[4]
      PrintQ = $tokens[7]
      Action = $action
      User = $tokens[-1] # user is always the last token
    }
  }

}

Note: PowerShell's operators are generally case-insensitive; if you need case-sensitive matching, prefix the operator name with c, as in -ceq and -cin.
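
To illustrate the note on case sensitivity, here is a minimal sketch comparing -in with -cin. The variable names are only for illustration, and the lowercase 'updated' value is an assumed input that does not appear in the sample log.

```powershell
# Hypothetical lowercase value, used only to show the difference.
$action  = 'updated'
$allowed = 'UPDATED', 'DELETED', 'ADDED'

# -in ignores case, so the lowercase value is found in the list.
$insensitive = $action -in $allowed      # True

# -cin compares case-sensitively, so the lowercase value is not found.
$sensitive = $action -cin $allowed       # False

# An exact-case value passes both checks.
$exact = 'UPDATED' -cin $allowed         # True

$insensitive, $sensitive, $exact
```

If the log might contain mixed-case action words that should be rejected, -cin would be the stricter choice; otherwise -in as used above is sufficient.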

With your sample input, the above outputs:

Date    : 09 Dec 2020
Time    : 12:59:28
ErrorId : VPSa0217I
PrintQ  : PRINTQUEUE1
Action  : ADDED
User    : op9p99

Date    : 09 Dec 2020
Time    : 13:00:22
ErrorId : VPSa0219I
PrintQ  : PRINTQUEUE1
Action  : UPDATED
User    : op9p99

Date    : 09 Dec 2020
Time    : 14:20:59
ErrorId : VPSa0217I
PrintQ  : PRINTQUEUE1
Action  : DELETED
User    : op9p99
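
As a quick check on the indexing used in this approach, the following sketch splits the one sample line that contains the extra CONFIGURATION word and inspects the relevant positions. Note that the bracketed field splits into two tokens ('<CREQ0011' and '>'), which is why the print queue name sits at index 7.

```powershell
# The sample line with the extra CONFIGURATION word.
$line = '09 Dec 2020 13:00:22 VPSa0219I <CREQ0011        > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99'

# Unary -split breaks the string on runs of whitespace.
$tokens = -split $line

$tokens.Count   # 13 tokens (12 on lines without CONFIGURATION)
$tokens[7]      # PRINTQUEUE1
$tokens[-4]     # UPDATED (4th from the end, regardless of the extra word)
$tokens[-1]     # op9p99
```

Counting from the end keeps the action and user positions stable, under the assumption that every record ends with "<ACTION> BY USER <name>".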

Try this set of regexes:

$log = @"
09 Dec 2020 12:59:28 VPSa0217I <CREQ0009        > PRINTQUEUE1 ADDED BY USER op9p99
09 Dec 2020 13:00:22 VPSa0219I <CREQ0011        > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99
09 Dec 2020 14:20:59 VPSa0217I <CREQ0014        > PRINTQUEUE1 DELETED BY USER op9p99
"@

$DateReg = "(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3}\b)"
$TimeReg = "\s+(?<Time>\d{1,2}:\d{2}:\d{2})\s"   # whitespace kept outside the group so Time has no leading space
$ErrorIDReg = "(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])\s"
$Junk1Reg = "(?<Junk><[^>]+>)\s"
$PrintQreg = "(?<PrintQ>\w+)\s(?!CONFIGURATION\s)"
$ActionReg = "(?<Action>\w+)\s"
$Junk2Reg = "(?<Junk2>\w+\s\w+)\s"
$UserReg = "(?<User>\w+)"

$regex = $Datereg + $TimeReg + $ErrorIDReg + $Junk1Reg + $PrintQreg + $ActionReg + $Junk2Reg + $UserReg

$log -split "`n" | Foreach-Object { if ($_ -match $regex) {"Matched line: $_"}}

Which outputs:

Matched line: 09 Dec 2020 12:59:28 VPSa0217I <CREQ0009        > PRINTQUEUE1 ADDED BY USER op9p99
Matched line: 09 Dec 2020 14:20:59 VPSa0217I <CREQ0014        > PRINTQUEUE1 DELETED BY USER op9p99

The primary tweak was to use a zero-width assertion called a negative lookahead to make sure that the PrintQ text is not followed by the text CONFIGURATION, which is why the CONFIGURATION UPDATED line is absent from the output above. I also tweaked your Junk1Reg to use <[^>]+> .
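
If the CONFIGURATION UPDATED line should be matched as well, with only the word CONFIGURATION ignored as the question describes, one possible variation is to replace the negative lookahead with an optional non-capturing group. The following is a sketch under that assumption, not a tested drop-in for the real log files; the $pattern and $results names are illustrative, and the bracketed field is simply consumed without being captured.

```powershell
$log = @'
09 Dec 2020 12:59:28 VPSa0217I <CREQ0009        > PRINTQUEUE1 ADDED BY USER op9p99
09 Dec 2020 13:00:22 VPSa0219I <CREQ0011        > PRINTQUEUE1 CONFIGURATION UPDATED BY USER op9p99
09 Dec 2020 14:20:59 VPSa0217I <CREQ0014        > PRINTQUEUE1 DELETED BY USER op9p99
'@

# Built in pieces for readability; whitespace stays outside the named groups.
$pattern = '^(?<Date>\d{2}\s[ADFJMNOS][a-z]{2,8}\s[12][0-9]{3})\s+' +
           '(?<Time>\d{1,2}:\d{2}:\d{2})\s+' +
           '(?<ErrorID>[VPSa]{2,4}\d{4}[A-Z])\s+' +
           '<[^>]*>\s+' +                        # skip the <...> field
           '(?<PrintQ>\w+)\s+' +
           '(?:CONFIGURATION\s+)?' +             # optional word, not captured
           '(?<Action>ADDED|DELETED|UPDATED)\s+BY\s+USER\s+' +
           '(?<User>\w+)'

# Emit one object per matching line, reading the named groups from $Matches.
$results = foreach ($line in $log -split '\r?\n') {
  if ($line -match $pattern) {
    [pscustomobject]@{
      Date    = $Matches.Date
      Time    = $Matches.Time
      ErrorID = $Matches.ErrorID
      PrintQ  = $Matches.PrintQ
      Action  = $Matches.Action
      User    = $Matches.User
    }
  }
}

$results
```

With the three sample lines this should yield three objects, with Action values ADDED, UPDATED and DELETED. Lines whose action is anything else do not match and are skipped.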
