Regex - match a specific string (multiple times) and ignore comments

Question

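I'm no regex expert and need some help putting one together.

I'm using PowerShell and its [regex] type, which maps to the .NET System.Text.RegularExpressions.Regex class (the same one used from C#). The end goal is to read a TOML file (sample data is at the bottom, and also on regex101), in which I need to:

match some values (the values between "__")
ignore comments (a comment starts with "#")

To match the values and put them into a capture group, the following regex works: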

match the template value (values between "__" ):
__(?<tokenName>[\w\.]+)__

I also want to ignore comment lines, and came up with this:

Ignore lines that start with a comment (even if "#" is preceded by spaces or tabs):
^(?!\s*\t*#).*

The problems start when I put the two together:

^(?!\s*\t*#).*__(?<tokenName>[\w\.]+)__

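This expression has the following problems:

At most one match per line, and it is the last one (e.g. on the line with "Prop5 = ..." I get one match instead of two)
Comments at the end of a line are not taken into account (e.g. the line with "Prop4 = ..." has two matches instead of one)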
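
The behaviour of the combined pattern can be reproduced on single lines. The following is a minimal sketch, assuming the pattern is applied to one line at a time with the static [regex]::Matches method; Get-TokenNames is a hypothetical helper introduced only for this illustration. The greedy .* after ^ runs to the end of the line and backtracks to the last __...__ pair, and because ^ can only match at the start, nothing else on that line can match afterwards.

```powershell
# Hypothetical helper (not part of the original post): returns the tokenName captures.
function Get-TokenNames([string]$Text, [string]$Pattern) {
    @([regex]::Matches($Text, $Pattern) | ForEach-Object { $_.Groups['tokenName'].Value })
}

# The combined pattern from the question, unchanged.
$combined = '^(?!\s*\t*#).*__(?<tokenName>[\w\.]+)__'

$prop5 = '    Prop5 = ["__Data.Agent.Prop5a__","__Data.Agent.Prop5b__"]'
$prop4 = '    Prop4 = [__Data.Agent.Prop4__] #sample usage comment __Data.Agent.xxx__'

# Greedy .* leaves only the last token on each line for the capture group.
$prop5Tokens = @(Get-TokenNames -Text $prop5 -Pattern $combined)
$prop4Tokens = @(Get-TokenNames -Text $prop4 -Pattern $combined)

$prop5Tokens   # Data.Agent.Prop5b
$prop4Tokens   # Data.Agent.xxx
```

On the Prop4 line the single capture is the token inside the trailing comment, which is why a check for "#" anywhere earlier on the line is needed rather than a check at the line start only.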

I also tried:

add this at the end of the expression, it should stop the match on the first occurrence of the character
[^#]

add this at the beginning, which should check if the matched string has the given char before it and exclude it
(?<!^#)

Here is a sample of my data:

#templateFile
[Agent]
    Prop1 = "__Data.Agent.Prop1__"
    Prop2 = [__Data.Agent.Prop2__]
    #I'm a comment
    #Prop3 = "__NotUsed__"
    Prop4 = [__Data.Agent.Prop4__] #sample usage comment __Data.Agent.xxx__
    Prop5 = ["__Data.Agent.Prop5a__","__Data.Agent.Prop5b__"]

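I think a simpler solution would be to match the given string only if there is no "#" before it on the same line. Is that possible?

Edit:

The first expression proposed by @the-fourth-bird works perfectly; it only needs the multiline modifier to be specified. The final (runnable) result in PowerShell looks like this: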

[regex]$reg = "(?m)(?<!^.*#.*)__(?<tokenName>[\w.]+)__"

$text = '
#templateFile
[Agent]
    Prop1 = "__Data.Agent.Prop1__"
    Prop2 = [__Data.Agent.Prop2__]
    Prop5 = ["__Data.Agent.Prop5a__","__Data.Agent.Prop5b__"]
    #a comment
    #Prop3 = "__Data.Agent.Prop3__"
    Prop4 = [__Data.Agent.Prop4__] #sample usage comment __Data.Agent.xxx__
'

$reg.Matches($text) | Format-Table
#This returns
Groups         Success Name Captures Index Length Value
------         ------- ---- -------- ----- ------ -----
{0, tokenName}    True 0    {0}         31     20 __Data.Agent.Prop1__
{0, tokenName}    True 0    {0}         62     20 __Data.Agent.Prop2__
{0, tokenName}    True 0    {0}         94     21 __Data.Agent.Prop5a__
{0, tokenName}    True 0    {0}        118     21 __Data.Agent.Prop5b__
{0, tokenName}    True 0    {0}        194     20 __Data.Agent.Prop4__

Answer 1

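I think you can make use of unbounded repetition inside a lookbehind to check that whatever precedes the token on the same line does not contain a #, which also accounts for the comment on the Prop4 line: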

(?<!^.*#.*)__(?<tokenName>[\w.]+)__

.NET regex demo

If Prop4 should have 2 matches, you can use:

(?<!^[ \t]*#.*)__(?<tokenName>[\w.]+)__

.NET regex demo
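
To see how the two lookbehinds differ, here is a minimal sketch that runs both against the sample data from the question's edit. It assumes the lines are joined with "`n" and that the multiline flag is given inline as (?m); Get-TokenNames is a hypothetical helper, not something from the original post.

```powershell
# Hypothetical helper (not part of the original post): returns the tokenName captures.
function Get-TokenNames([string]$Text, [string]$Pattern) {
    @([regex]::Matches($Text, $Pattern) | ForEach-Object { $_.Groups['tokenName'].Value })
}

# Sample data from the question, joined with LF line endings.
$text = @(
    '#templateFile'
    '[Agent]'
    '    Prop1 = "__Data.Agent.Prop1__"'
    '    Prop2 = [__Data.Agent.Prop2__]'
    '    Prop5 = ["__Data.Agent.Prop5a__","__Data.Agent.Prop5b__"]'
    '    #a comment'
    '    #Prop3 = "__Data.Agent.Prop3__"'
    '    Prop4 = [__Data.Agent.Prop4__] #sample usage comment __Data.Agent.xxx__'
) -join "`n"

# Variant 1: reject a token if any # appears earlier on the same line.
$anyHash = '(?m)(?<!^.*#.*)__(?<tokenName>[\w.]+)__'
# Variant 2: reject a token only if the line itself starts with optional blanks and #.
$leadingHash = '(?m)(?<!^[ \t]*#.*)__(?<tokenName>[\w.]+)__'

$strict  = @(Get-TokenNames -Text $text -Pattern $anyHash)
$lenient = @(Get-TokenNames -Text $text -Pattern $leadingHash)

$strict -join ', '
# Data.Agent.Prop1, Data.Agent.Prop2, Data.Agent.Prop5a, Data.Agent.Prop5b, Data.Agent.Prop4
$lenient -join ', '
# Data.Agent.Prop1, Data.Agent.Prop2, Data.Agent.Prop5a, Data.Agent.Prop5b, Data.Agent.Prop4, Data.Agent.xxx
```

Both variants rely on .NET supporting variable-length lookbehind; regex engines that only allow fixed-width lookbehind would reject these patterns.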

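Both expressions need the multiline modifier to work correctly. It can be specified inline by adding (?m) at the start (or by passing it to a constructor that supports it):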

(?m)(?<!^.*#.*)__(?<tokenName>[\w.]+)__
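
As a rough illustration of the two ways to set the flag, the sketch below builds the same pattern with the inline (?m) and with RegexOptions.Multiline passed to the Regex constructor, and also shows what happens without the flag. It assumes PowerShell 5 or later (for the ::new() syntax) and LF-joined input; Get-Tokens is a hypothetical helper.

```powershell
$pattern = '(?<!^.*#.*)__(?<tokenName>[\w.]+)__'

$text = @(
    '    Prop1 = "__Data.Agent.Prop1__"'
    '    #Prop3 = "__Data.Agent.Prop3__"'
    '    Prop4 = [__Data.Agent.Prop4__] #sample usage comment __Data.Agent.xxx__'
) -join "`n"

# Same pattern, three configurations.
$inline    = [regex]::new('(?m)' + $pattern)
$viaOption = [regex]::new($pattern, [System.Text.RegularExpressions.RegexOptions]::Multiline)
$noFlag    = [regex]::new($pattern)

# Hypothetical helper (not part of the original post): returns the tokenName captures.
function Get-Tokens([regex]$Regex, [string]$Text) {
    @($Regex.Matches($Text) | ForEach-Object { $_.Groups['tokenName'].Value })
}

$inlineTokens = @(Get-Tokens -Regex $inline -Text $text)
$optionTokens = @(Get-Tokens -Regex $viaOption -Text $text)
$noFlagTokens = @(Get-Tokens -Regex $noFlag -Text $text)

$inlineTokens -join ', '   # Data.Agent.Prop1, Data.Agent.Prop4
$optionTokens -join ', '   # Data.Agent.Prop1, Data.Agent.Prop4
$noFlagTokens -join ', '   # Data.Agent.Prop1, Data.Agent.Prop3, Data.Agent.Prop4, Data.Agent.xxx
```

Without the flag, ^ matches only at the very start of the input and . cannot cross a newline, so the lookbehind can never succeed on the second and later lines and the commented tokens slip through.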

Solution 1
1 accepted 2020-01-22 17:51:25
