Using Select-String to Match Multiple Single-Line Patterns and Write to Output

I am trying to build a simple script that uses regex to match multiple patterns on a single line, for every line of an input file, and writes the result to an output file. But I'm hitting a wall:

Sample text:

BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC

Here's what I've got so far:

$table = [regex] "'.*'"
$discard = [regex] "\d* PHYSICAL"

Select-String -Pattern ($table, $discard) -AllMatches .\test.txt | foreach {
    $_.Matches.Value
} > output.txt

Output:

'KDDT111D.DIH0345S'

Desired output:

'KDDT111D.DIH0345S' 10 Physical

For some reason I am unable to get both patterns to write to output.txt. Ideally, once I get this working, I would like to use Export-Csv to get something a bit cleaner, like:

|KDDT111D|DIH0345S|10 Physical|

i think you will find the -match operator a bit more suited to this. [grin] using named matches against your sample stored in $InStuff, this ...

$InStuff -match ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

... gives the following set of matches ...

Name                           Value                                                                              
----                           -----                                                                              
Space                          KDDT111D                                                                           
SubSpace                       DIH0345S                                                                           
Discarded                      10 PHYSICAL                                                                        
0                              BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345...

the named matches can be addressed by $Matches.<the capture group name>.
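
As a rough illustration of addressing the named matches, the sketch below runs the same pattern against the sample line from the question and joins the three captures with a pipe. The variable names `$pattern`, `$space`, `$subSpace`, and `$discarded` are made up for this example. It assumes `$InStuff` holds a single string: with a single string on the left, -match returns a Boolean and fills `$Matches`, whereas with an array on the left it filters the array and does not fill `$Matches`.

```powershell
# Sample line copied from the question; $InStuff is the variable name used in the answer.
$InStuff = "BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"
$pattern = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

# On success, -match populates the automatic $Matches hashtable with the named groups.
if ($InStuff -match $pattern) {
    $space     = $Matches.Space
    $subSpace  = $Matches.SubSpace
    # Trim() removes the leading space captured because the log has two spaces after the colon.
    $discarded = $Matches.Discarded.Trim()
    "$space|$subSpace|$discarded"
}
```

With the sample line as shown, this should print KDDT111D|DIH0345S|10 PHYSICAL.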

You have run into a Select-String limitation: The .Matches property of the [Microsoft.PowerShell.Commands.MatchInfo] objects that Select-String emits for each input object (line) only ever contains the (potentially multiple) matches for the first regex passed to the -Pattern parameter. [1]

You can work around the problem by passing a single regex instead, by combining the input regexes via alternation (|):

Select-String -Pattern ($table, $discard -join '|') -AllMatches .\test.txt | 
  ForEach-Object { $_.Matches.Value } > output.txt

A simplified example:

# ('f.', '.z' -join '|') -> 'f.|.z'
'foo bar baz' | Select-String -AllMatches ('f.', '.z' -join '|') |
  ForEach-Object { $_.Matches.Value }

The above yields:

fo
az

proving that the matches for both regexes were reported.

Caveat re output ordering: Using alternation (|) causes the matches for a given input string to be reported in the order in which they're found in the input, not in the order in which the regexes were specified. That is, both -Pattern 'f.|.z' and -Pattern '.z|f.' above would have resulted in the same output order.
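
If the order in which the regexes were specified does matter, one possible approach (a sketch, not something the answer itself proposes) is to wrap each alternative in a named group and then check which group succeeded for each match. The group names `f` and `z` and the variables `$result` and `$ordered` are hypothetical labels chosen for this example.

```powershell
# Each alternative gets its own named group so a match can be traced back to its regex.
$result = 'foo bar baz' | Select-String -AllMatches '(?<f>f.)|(?<z>.z)'

# Emit the '.z' matches first, then the 'f.' matches, regardless of their position in the input.
$ordered = foreach ($name in 'z', 'f') {
    $result.Matches |
        Where-Object { $_.Groups[$name].Success } |
        ForEach-Object { $_.Value }
}
$ordered
```

Here the output should be az followed by fo, the reverse of the input order, because the loop walks the group names rather than the match positions.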


[1] The problem exists as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4 and is discussed in this GitHub issue.

Thanks to the contributors for the ideas and learning experience. I was able to get the desired output using a combination of both answers received.

I found that the -match operator only returned the first occurrence of the regex pattern match from the source file, so I needed to add a foreach loop in order to return matches for every line of the log file.

I also modified the regex to include only discard values greater than 0.

Sample Text:

BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  3499604 ROWS SELECTED FOR SPACE 'KDDT000D.KDAIND0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1 ROWS SELECTED FOR SPACE 'KDDT000D.KDCISR0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDICHT0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  2387375 ROWS SELECTED FOR SPACE 'KDDT000D.KDICMS0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1632821 ROWS SELECTED FOR SPACE 'KDDT000D.KDIPRV0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDLADD0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  24845 PHYSICAL (24845 LOGICAL) RECORDS DISCARDED TO SYSDISC

Example:

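# Capture the space, the subspace, and the PHYSICAL discard count; the trailing
# part of the pattern only matches when the LOGICAL count starts with 1-9.
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

# Date stamp (month_day_year) used in the output file name.
$timestamp = Get-Date -Format "MM_dd_yy"
$dir = "C:\Users\JonMonJovi"

# Test each line of every log file; $Matches is refreshed on each successful match.
Get-Content $dir\*.log.txt | Where-Object {
    $_ -match $regex
} | ForEach-Object {
    $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join "|"
} > C:\Users\JonMonJovi\Discarded\Discard_Log_$timestamp.txt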

Output:

KDDT000D|KDIADR0S| 11 PHYSICAL
KDDT000D|KDLADD0S| 24845 PHYSICAL

From here I am able to import the pipe-delimited .txt output file into Excel, fulfilling my requirements.
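
Since the question mentioned Export-Csv as the eventual goal, here is a minimal sketch of how the same captures could be turned into objects and rendered as pipe-delimited CSV. It is an assumption-laden variant, not the script above: `$lines`, `$pattern`, and `$rows` are names invented for the example, the regex is a simplified rewrite that captures only a non-zero numeric count, and ConvertTo-Csv is used instead of Export-Csv so the example needs no files.

```powershell
# Two lines copied from the sample text above: one with 0 discards, one with 11.
$lines = @(
    "BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC"
    "BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC"
)

# Assumed variant of the regex: [1-9]\d* accepts only discard counts greater than 0.
$pattern = "SPACE '(?<Space>[^.']+)\.(?<SubSpace>[^']+)'.*:\s+(?<Discarded>[1-9]\d*) PHYSICAL"

# Build one object per matching line; lines with 0 discards fail the match and are skipped.
$rows = foreach ($line in $lines) {
    if ($line -match $pattern) {
        [pscustomobject]@{
            Space     = $Matches.Space
            SubSpace  = $Matches.SubSpace
            Discarded = [int]$Matches.Discarded
        }
    }
}

# Render as pipe-delimited CSV text; Export-Csv takes the same two parameters plus -Path.
$rows | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
```

Emitting objects rather than pre-joined strings leaves the delimiter and header handling to the CSV cmdlets, which may make a later import into Excel a little less fragile.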
