
PowerShell: Select line preceding a match - Select-String -Context issue when using input string variable

I need to return the line preceding a match in a multi-line string variable.

It seems that when a string variable is used as the input, Select-String considers the entire string to have matched. As such, the Context properties fall "outside" either end of the string and are null.

Consider the example below:

$teststring = @"
line1
line2
line3
line4
line5
"@

Write-Host "Line Count:" ($teststring | Measure-Object -Line).Lines #verify PowerShell does regard input as a multi-line string (it does)

Select-String -Pattern "line3" -InputObject $teststring -AllMatches -Context 1,0 | % {
$_.Matches.Value #this prints the exact match
$_.Context #output shows all context properties to be empty 
$_.Context.PreContext[0] #this would ideally output first line before the match
$_.Context.PreContext[0] -eq $null #but instead is null
}

Am I misunderstanding something here?

What is the best way to return "line2" when matching for "line3"?

Thanks!

Edit: An additional requirement I neglected to state: I need the line above every matched line, for a string of indeterminate length. E.g., when searching the text below for "line3", I need to return "line2" and "line5".

line1
line2
line3
line4
line5
line3
line6

Select-String operates on arrays of input, treating each element as one line, so rather than a single multi-line string you must provide an array of lines for -Context and -AllMatches to work as intended:

$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@

$teststring -split '\r?\n' | Select-String -Pattern "line3" -AllMatches -Context 1,0 | % {
  "line before:  " + $_.Context.PreContext[0]
  "matched part: " + $_.Matches.Value  # Prints what the pattern matched
}

This yields:

line before:  line2
matched part: line3
line before:  line5
matched part: line3
  • $teststring -split '\r?\n' splits the multi-line string into an array of lines:

    • Note: Which line-break sequences your here-string uses (LF-only vs. CRLF) depends on the enclosing script file; regex \r?\n handles either style.
  • Note that it is crucial to use the pipeline to provide Select-String's input; if you used -InputObject, the array would be coerced back to a single string.
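
To make the difference concrete, here is a minimal sketch that feeds the same text to Select-String once as a single string and once as an array of lines. The variable names ($text, $single, $perLine, $before) are only illustrative, and the sample text is built with explicit `n line breaks so that the result doesn't depend on the script file's line endings.

```powershell
# Sample text with explicit LF line breaks (illustrative only).
$text = "line1`nline2`nline3`nline4`nline5`nline3`nline6"

# A single multi-line string is one input "line" to Select-String,
# so there is no preceding line for -Context to report.
$single = $text | Select-String -Pattern 'line3' -Context 1,0

# An array of lines gives Select-String one input object per line.
$perLine = $text -split '\r?\n' | Select-String -Pattern 'line3' -Context 1,0

# Collect the line before each match.
$before = @($perLine | ForEach-Object { $_.Context.PreContext[0] })

"matches (single string): " + @($single).Count
"matches (array of lines): " + @($perLine).Count
"lines before: " + ($before -join ', ')
```

Comparing $single.LineNumber with $perLine.LineNumber shows the same thing from another angle: the single string is counted as one input line, whereas the split input reports the actual line numbers of the matches.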


Select-String is convenient, but slow.
Especially for a single string already in memory, a solution using the .NET Framework's [Regex]::Matches() method will perform much better, though it is more complex.

Note that PowerShell's own -match and -replace operators are built on the same .NET class, but do not expose all of its functionality; -match, which does report capture groups in the automatic $Matches variable, is not an option here, because it only ever returns one match.

The following is essentially the same approach as in mjolinor's answer, but with several problems corrected[1].
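
A quick way to see the difference, as a rough sketch (the variable names $text, $found, $firstOnly and $all are just for illustration):

```powershell
# Sample text with explicit LF line breaks (illustrative only).
$text = "line1`nline2`nline3`nline4`nline5`nline3`nline6"

# -match returns a Boolean for a scalar input and records only the
# first match in the automatic $Matches variable.
$found = $text -match 'line3'
$firstOnly = $Matches[0]

# [regex]::Matches() returns every match in the string.
$all = [regex]::Matches($text, 'line3')

"-match found something: $found (first match: $firstOnly)"
"[regex]::Matches() count: " + $all.Count                    # 2
"match positions: " + (($all | ForEach-Object { $_.Index }) -join ', ')   # 12, 30
```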

# Note: The sample string is defined so that it contains LF-only (\n)
#       line breaks, merely to simplify the regex below for illustration.
#       If your script file uses LF-only line breaks, the
#       -replace '\r?\n', "`n" call isn't needed.
$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@ -replace '\r?\n', "`n" 

[Regex]::Matches($teststring, '(?:^|(.*)\n).*(line3)') | ForEach-Object { 
  "line before:  " + $_.Groups[1].Value
  "matched part: " + $_.Groups[2].Value
}
  • Regex (?:^|(.*)\n).*(line3) uses 2 capture groups ( (...) ) to capture both the (matching part of the) line to match and the line before it ( (?:...) is an auxiliary non-capturing group that is needed for precedence):

    • (?:^|(.*)\n) matches either the very start of the string ( ^ ) or ( | ) any, possibly empty, sequence of non-newline characters ( .* ) followed by a newline ( \n ); this ensures that the line to match is also found when there is no preceding line (i.e., if the line to match is the first one).
    • (line3) is the group defining the line to match; it is preceded by .* to match the behavior in the question, where pattern line3 is found even if it is only part of a line.
      • If you want only full lines to match, use the following regex instead:
        (?:^|(.*)\n)(line3)(?:\n|$)
  • [Regex]::Matches() finds all matches and returns them as a collection of System.Text.RegularExpressions.Match objects, which the ForEach-Object cmdlet call can then operate on to extract the capture-group matches ( $_.Groups[<n>].Value ).
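
As a small sketch of how the two regex variants differ, consider a sample text (made up for this illustration) in which line3 appears once as part of a longer line and once as a full line. $partial and $fullLine are hypothetical variable names that collect the line before each match.

```powershell
# Sample text (illustrative only): "line3 extra" merely contains the pattern,
# whereas the fourth line is exactly "line3".
$text = "line1`nline3 extra`nline2`nline3`nline4"

# Variant from above: the pattern may match anywhere within a line.
$partial = @([regex]::Matches($text, '(?:^|(.*)\n).*(line3)') |
  ForEach-Object { $_.Groups[1].Value })

# Full-line variant: the pattern must make up the entire line.
$fullLine = @([regex]::Matches($text, '(?:^|(.*)\n)(line3)(?:\n|$)') |
  ForEach-Object { $_.Groups[1].Value })

"partial-line matching, lines before: " + ($partial -join ', ')   # line1, line2
"full-line matching, lines before:    " + ($fullLine -join ', ')  # line2
```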


[1] As of this writing:
- There is no need to match twice: the enclosing if ($teststring -match $pattern) { ... } is unnecessary.
- Inline option (?m) is not needed, because . does not match newlines by default.
- (.+?) captures only nonempty lines (and ?, the non-greedy quantifier, is not needed).
- If the line of interest is the first line, i.e., if there is no line before it, it won't be matched.
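
The last point can be checked against the corrected regex shown earlier with a sample text (made up for illustration) whose first line already matches. In this sketch, $_.Groups[1].Success is used to tell "there is no preceding line" apart from "the preceding line is empty"; the property names of the output objects are arbitrary.

```powershell
# Sample text (illustrative only) in which the very first line matches.
$text = "line3`nline2`nline3"

$results = [regex]::Matches($text, '(?:^|(.*)\n).*(line3)') | ForEach-Object {
  [pscustomobject]@{
    HasLineBefore = $_.Groups[1].Success   # $false if the ^ alternative matched
    LineBefore    = $_.Groups[1].Value     # empty string if there is no line before
    Matched       = $_.Groups[2].Value
  }
}

$results
```

The first result reports HasLineBefore as False with an empty LineBefore, and the second one reports line2.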

You can use a multi-line regex, with the -match operator:

$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@

$pattern = 
@'
(?m)
(.+?)
line3
'@


if ($teststring -match $pattern)
  { [Regex]::Matches($teststring,$pattern) |
    foreach {$_.groups[1].value} }
