PowerShell: Replacing regex named groups with variables

Say I have a regular expression like the following, but I loaded it from a file into a variable $regex, so I have no idea at design time what its contents are. At runtime I can discover that it includes the "version1", "version2", "version3" and "version4" named groups:

"Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)"

...and I have these variables:

$version1 = "3"
$version2 = "2"
$version3 = "1"
$version4 = "0"

...and I come across the following string in a file:

Version 7,7,0,0

...which is stored in a variable $input, so that ($input -match $regex) evaluates to $true.

How can I replace the text captured by the named groups from $regex in the string $input with the values of $version1, $version2, $version3 and $version4, if I do not know the order in which the groups appear in $regex (I only know that $regex includes these named groups)?

I can't find any references describing the syntax for replacing a named group with the value of a variable by using the group name as an index to the match - is this even supported?
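For context, here is a minimal sketch of what the built-in operators do with named groups, assuming the sample pattern and line from above. The variable names $line, $captured and $unchanged are made up for this illustration ($line is used instead of $input because $input is also an automatic variable in PowerShell). Reading a capture by name works, but a ${name} token in the replacement string refers to the captured text, not to a PowerShell variable of the same name.

```powershell
# Sample pattern and line from the question (normally the pattern comes from a file).
$regex = 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'
$line  = 'Version 7,7,0,0'

if ($line -match $regex) {
    # After a successful -match, named captures can be read by name from $Matches.
    $captured = 'version1', 'version2', 'version3', 'version4' |
        ForEach-Object { $Matches[$_] }
}

# In a single-quoted replacement string, ${version1} is a regex substitution token:
# it inserts what the group captured, so this reproduces the original line.
$unchanged = $line -replace $regex, 'Version ${version1},${version2},${version3},${version4}'
```

Here $captured holds the four captured digits and $unchanged equals the original line, which is why -replace alone cannot swap in new values without the literal parts of the line being known in advance.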

EDIT: To clarify - the goal is to replace templated version strings in any kind of text file, where the version string in a given file requires replacement of a variable number of version fields (could be 2, 3, or all 4 fields). For example, the text in a file could look like any of these (but is not restricted to these):

#define SOME_MACRO(4, 1, 0, 0)

Version "1.2.3.4"

SomeStruct vs = { 99,99,99,99 }

Users can specify a file set and a regular expression to match the line containing the fields, with the original idea being that the individual fields would be captured by named groups. The utility has the individual version field values that should be substituted in the file, but it has to preserve the original format of the line that will contain the substitutions, and substitute only the requested fields.

EDIT-2: I think I can get the result I need with substring calculations based on the position and extent of each of the matches, but was hoping PowerShell's replace operation was going to save me some work.

EDIT-3: So, as Ansgar correctly and succinctly describes below, there isn't a way (using only the original input string, a regular expression about which you only know the named groups, and the resulting matches) to use the "-replace" operation (or other regex operations) to substitute the captures of the named groups while leaving the rest of the original string intact. For this problem, if anybody's curious, I ended up using the solution below. YMMV, other solutions are possible. Many thanks to Ansgar for his feedback and the options provided.

In the following code block:

  • $input is a line of text on which substitution is to be performed
  • $regex is a regular expression (of type [string]) read from a file, which has been verified to contain at least one of the supported named groups
  • $regexToGroupName is a hash table that maps a regex string to an array of group names, in the order returned by the regex object's GetGroupNames() method, which matches the left-to-right order in which the groups appear in the expression
  • $groupNameToVersionNumber is a hash table that maps a group name to a version number.
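One way the two hash tables could be built is sketched below. This is only an assumption about the setup, not the code actually used: $supportedNames and $groupNames are hypothetical names, and the pattern and version values are the samples from the question. Note that GetGroupNames() also returns numbered groups such as "0", so the result is filtered down to the supported names.

```powershell
# Sample pattern from the question (in the real utility it is read from a file).
$regex = 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'

# Hypothetical list of the group names the utility knows how to substitute.
$supportedNames = 'version1', 'version2', 'version3', 'version4'

# GetGroupNames() is an instance method on a [regex] object. It includes the
# numbered group "0" (the whole match), so keep only the supported named groups.
$groupNames = @(([regex]$regex).GetGroupNames() |
    Where-Object { $supportedNames -contains $_ })

# Map the pattern string to its ordered group names.
$regexToGroupName = @{ $regex = $groupNames }

# Map each group name to the value that should be substituted for it.
$groupNameToVersionNumber = @{
    version1 = '3'
    version2 = '2'
    version3 = '1'
    version4 = '0'
}
```

For the sample pattern, $groupNames ends up as version1, version2, version3, version4, which is the order the loop in the code block relies on.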

The only constraints on the named groups within $regex are (I think) that the named groups cannot be nested, and that each should match at most once within the input string.

# This will give us the index and extent of each substring
# that we will be replacing (the parts that we will not keep)
$matchResults = ([regex]$regex).match($input)

# This will hold substrings from $input that were not captured
# by any of the supported named groups, as well as the replacement
# version strings, properly ordered, but will omit substrings captured
# by the named groups
$lineParts = @()
$startingIndex = 0
foreach ($groupName in $regexToGroupName.$regex)
{
    # Excise the substring leading up to the match for this group...
    $lineParts = $lineParts + $input.Substring($startingIndex, $matchResults.groups[$groupName].Index - $startingIndex)

    # Instead of the matched substring, we'll use the substitution
    $lineParts = $lineParts + $groupNameToVersionNumber.$groupName

    # Set the starting index of the next substring that we will keep...
    $startingIndex = $matchResults.groups[$groupName].Index + $matchResults.groups[$groupName].Length
}

# Keep the end of the original string (if there's anything left)
$lineParts = $lineParts + $input.Substring($startingIndex, $input.Length - $startingIndex)

# Concatenate the kept substrings and the replacement values into the new line
$input = -join $lineParts
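The same idea can be packaged as a function. The following is a sketch under stated assumptions rather than a drop-in replacement: Set-NamedGroupValue and its parameter names are hypothetical, and instead of relying on a precomputed group order it sorts the successful captures by position and edits the line from right to left, so that earlier indexes stay valid.

```powershell
# Hypothetical helper: replaces the text captured by each named group in
# $Pattern with the value stored under the same name in $Values.
function Set-NamedGroupValue {
    param(
        [string]    $Line,
        [string]    $Pattern,
        [hashtable] $Values
    )

    $match = ([regex]$Pattern).Match($Line)
    if (-not $match.Success) { return $Line }

    # Collect position, length and replacement for every group that took part
    # in the match. Names that are not in the pattern yield an unsuccessful group.
    $edits = @(foreach ($name in $Values.Keys) {
        $group = $match.Groups[[string]$name]
        if ($group.Success) {
            [pscustomobject]@{
                Index  = $group.Index
                Length = $group.Length
                Value  = [string]$Values[$name]
            }
        }
    })

    # Apply the edits from right to left so earlier indexes are not shifted.
    foreach ($edit in ($edits | Sort-Object Index -Descending)) {
        $Line = $Line.Remove($edit.Index, $edit.Length).Insert($edit.Index, $edit.Value)
    }
    $Line
}

$values = @{ version1 = '3'; version2 = '2'; version3 = '1'; version4 = '0' }
$result = Set-NamedGroupValue -Line 'Version 7,7,0,0' -Values $values `
    -Pattern 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'
# A pattern that names only two of the fields leaves the other fields untouched.
$partial = Set-NamedGroupValue -Line '#define SOME_MACRO(4, 1, 0, 0)' -Values $values `
    -Pattern 'SOME_MACRO\((?<version1>\d+), (?<version2>\d+)'
```

With the sample data, $result is "Version 3,2,1,0" and $partial is "#define SOME_MACRO(3, 2, 0, 0)". The constraints noted above still apply: nested or repeated named groups are not handled.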

Regular expressions don't work that way, so you can't. Not directly, that is. What you can do (short of using a more appropriate regular expression that groups the parts you want to keep) is extract the version string and then, in a second step, replace that substring with the new version string:

$oldver = $input -replace $regexp, '$1,$2,$3,$4'
$newver = $input -replace $oldver, "$Version1,$Version2,$Version3,$Version4"

Edit:

If you don't even know the structure, you must extract that from the regular expression as well.

$version = @($version1, $version2, $version3, $version4)
$input -match $regexp
$oldver = $regexp
$newver = $regexp
for ($i = 1; $i -le 4; $i++) {
  $oldver = $oldver -replace "\(\?<version$i>\\d\)", $matches["version$i"]
  $newver = $newver -replace "\(\?<version$i>\\d\)", $version[$i-1]
}
$input -replace $oldver, $newver

Simple Solution

In the scenario where you just want to replace a version number found somewhere in your $input text, you could simply do this:

$input -replace '(Version\s+)\d+,\d+,\d+,\d+',"`$1$Version1,$Version2,$Version3,$Version4"

Using Named Captures in PowerShell

Regarding your question about named captures: the replacement string can refer to them by using curly brackets, i.e.

'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}.  '

Gives:

I have a pet dog.  I have a pet cat.  cher

Issue with multiple captures & solution

You can't replace multiple values in the same replace statement, since the same replacement string is applied to every match, i.e. if you did this:

 'dogcatcher' -replace '(?<pet>dog|cat)|(?<singer>cher)','I have a pet ${pet}.  I like ${singer}''s songs.  '

You'd get:

I have a pet dog.  I like 's songs.  I have a pet cat.  I like 's songs.  I have a pet .  I like cher's songs.  

...which is probably not what you're hoping for.

Rather, you'd have to do a separate replace per item:

'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}.  ' -replace '(?<singer>cher)', 'I like ${singer}''s songs.  ' 

...to get:

I have a pet dog.  I have a pet cat.  I like cher's songs.  

More Complex Solution

Bringing this back to your scenario, you're not actually using the captured values; rather, you're hoping to replace the positions they occupied with new values. For that, you'd simply want this:

$input = 'I''m running Programmer''s Notepad version 2.4.2.1440, and am a big fan.  I also have Chrome v    56.0.2924.87 (64-bit).' 

$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7

$v1Pattern = '(?<=\bv(?:ersion)?\s+)\d+(?=\.\d+\.\d+\.\d+)'
$v2Pattern = '(?<=\bv(?:ersion)?\s+\d+\.)\d+(?=\.\d+\.\d+)'
$v3Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.)\d+(?=\.\d+)'
$v4Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.\d+\.)\d+'

$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4

Which would give:

I'm running Programmer's Notepad version 1.3.5.7, and am a big fan.  I also have Chrome v    1.3.5.7 (64-bit).

NB: The above could be written as a one-liner, but I've broken it down to make it simpler to read.

This takes advantage of regex lookarounds: a way of checking the content before and after the string you're capturing, without including that content in the match. So when we select what to replace, we can say "match the number that appears after the word version" without saying "replace the word version".

More info on those here: http://www.regular-expressions.info/lookaround.html
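As a small illustration of the difference, here is a sketch on a made-up string (the variable names are only for this example). It compares a plain match, a lookbehind and a lookahead:

```powershell
$text = 'version 2.4'

# Plain pattern: the word "version" is part of the match, so it is replaced too.
$withoutLookaround = $text -replace 'version \d+', '9'

# Lookbehind: "version " must precede the digits but is not part of the match.
$withLookbehind = $text -replace '(?<=version )\d+', '9'

# Lookahead: only digits that are followed by ".digits" are matched.
$withLookahead = $text -replace '\d+(?=\.\d+)', '9'
```

The first replace yields "9.4", while the lookbehind and lookahead versions both yield "version 9.4", because only the digits themselves are consumed by the match.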

Your Example

Adapting the above to work for your example (i.e. where the versions may be separated by commas or dots, and there's no consistency to their format beyond being 4 sets of numbers):

$input = @'
#define SOME_MACRO(4, 1, 0, 0)

Version "1.2.3.4"

SomeStruct vs = { 99,99,99,99 }
'@

$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7

$v1Pattern = '(?<=\b)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v2Pattern = '(?<=\b\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v3Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\b)'
$v4Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+\b'

$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4

Gives:

#define SOME_MACRO(1, 3, 5, 7)

Version "1.3.5.7"

SomeStruct vs = { 1,3,5,7 }
