How to not match against text that contains specific text in the middle of a multiline pattern using regular expressions?

I'm trying to create a C# regular expression that detects when references in our .csproj files do not have <SpecificVersion> set to False. These are the cases that I need to handle:

1. <Reference Include="IQ.MyStuff1, Version=4.1.0.0, Culture=neutral, processorArchitecture=MSIL" />
2. <Reference Include="IQ.MyStuff2, Version=4.7.22.21777, Culture=neutral, processorArchitecture=MSIL">
    <HintPath>..\..\DebugDLLFiles\IQ.MyStuff2.dll</HintPath>
</Reference>
3. <Reference Include="IQ.MyStuff3, Version=4.1.0.0, Culture=neutral, processorArchitecture=MSIL">
    <HintPath>..\..\DebugDLLFiles\IQ.MyStuff3.dll</HintPath>
    <SpecificVersion>True</SpecificVersion>
</Reference>
4. <Reference Include="IQ.MyStuff4, Version=4.5.3.17401, Culture=neutral, processorArchitecture=MSIL">
    <SpecificVersion>True</SpecificVersion>
</Reference>

So basically, any file reference that doesn't explicitly have "<SpecificVersion>False</SpecificVersion>" in it.

Let's ignore the first case, because it doesn't have a body like the other three and can be treated differently. Here is what I have so far:

<Reference(\s|\n|\r)*?  # Match against '<Reference '.
Include=""IQ\..*?""     # Match against the entire Include attribute; We only care about IQ DLLs.
(\s|\n|\r)*?>           # Eat any whitespace and match against the closing tag character.
[What should go here?]
</Reference>            # Match against the closing tag.

I've tried numerous things in the [What should go here?] block, but can't get any of them to work perfectly. The closest I came was using the following for this block:

(?!                     # Do a negative look-ahead to NOT match against this Reference tag if it already has <SpecificVersion>False</SpecificVersion>.
    (.|\n|\r)*?         # Eat everything before the <SpecificVersion> tag, if it even exists.
    <SpecificVersion>(\s|\n|\r)*?False(\s|\n|\r)*?</SpecificVersion>    # Specify that we don't want to match if this tag already has <SpecificVersion>False</SpecificVersion>.
)
(.|\n|\r)*?             # Eat everything after the <SpecificVersion> tag, if it even existed.

This works for all cases except where there is a valid reference below any of the ones I want to match against, where a valid reference would look something like:

<Reference Include="IQ.MyStuff5, Version=4.5.3.17401, Culture=neutral, processorArchitecture=MSIL">
    <SpecificVersion>False</SpecificVersion>
</Reference>

It seems that the look-ahead I'm using doesn't stop at the </Reference> tag, but continues looking down the entire file to make sure no text below it has "<SpecificVersion>False</SpecificVersion>".

How can I make my look-ahead stop at the first "</Reference>" it encounters? If you have another way to solve my problem, I'm open to that too. Any suggestions are appreciated. Thanks.

Give up with Regex. It's doomed. Isn't it XML? Why not treat it as such?

The "don't parse HTML with regex" rule applies equally to XML.

If you want to give regex a try anyway, I'd suggest something like this:

<Reference[^>]*?>(?:.(?!</Reference>))*?<SpecificVersion>([^<]*?)</SpecificVersion>

It matches all Reference tags which have a SpecificVersion tag inside, i.e. it will completely ignore any Reference tag that doesn't have the SpecificVersion tag.

  • it looks for the Reference tag
  • matches everything that is not a closing Reference tag until it finds the SpecificVersion tag
  • then it captures the value inside the SpecificVersion tag

I tested it online in regexpal and it seems to work correctly in multiple cases.

EDIT:

  • use RegexOptions.Singleline to make the dot match new lines
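As a rough sketch of how the pattern above could be run from C#, the snippet below applies it with RegexOptions.Singleline and collects the captured SpecificVersion values. The class name SpecificVersionScan and the method CapturedValues are hypothetical names chosen for illustration; only the pattern itself comes from this answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpecificVersionScan
{
    // The pattern from the answer, unchanged. Singleline lets '.' match newlines,
    // so the body of a Reference element may span several lines.
    private static readonly Regex Pattern = new Regex(
        @"<Reference[^>]*?>(?:.(?!</Reference>))*?<SpecificVersion>([^<]*?)</SpecificVersion>",
        RegexOptions.Singleline);

    // Returns the text of every <SpecificVersion> found inside a Reference element.
    // (Hypothetical helper, not part of the original answer.)
    public static List<string> CapturedValues(string projectXml)
    {
        var values = new List<string>();
        foreach (Match m in Pattern.Matches(projectXml))
            values.Add(m.Groups[1].Value.Trim());
        return values;
    }
}
```

Any captured value other than "False" would then point at a reference that needs fixing. One caveat worth checking against your own files: because `[^>]*?>` also accepts a self-closing `<Reference ... />`, a match can begin one element earlier than intended when such a tag precedes a Reference that has a SpecificVersion child.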

If you want to match the case when the SpecificVersion tag is not present at all, try this alteration. It will try to match the option with the tag first, but if that fails it will still match the Reference tag:

<Reference[^>]*?>(?:.(?!</Reference>))*?(<SpecificVersion>([^<]*?)</SpecificVersion>)|<Reference[^>]*?>(?:.(?!</Reference>))*?(?:<SpecificVersion>([^<]*?)</SpecificVersion>)?

Let me know how you're getting on.
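For the literal question of making the negative look-ahead stop at the first </Reference>, one possible approach (a different technique from the patterns above) is to guard every character the look-ahead consumes with its own `(?!</Reference>)` check, so the scan cannot cross the closing tag. The sketch below assumes Include is the first attribute of the opening tag and that the element is not self-closing; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundedLookahead
{
    // IgnorePatternWhitespace allows the layout and '#' comments inside the pattern.
    // In a verbatim string, "" stands for a single double-quote character.
    private static readonly Regex BadReference = new Regex(@"
        <Reference\s+Include=""(IQ\.[^""]*)""\s*>   # opening tag of an IQ reference that has a body
        (?!                                         # reject the element if, before its closing tag,
            (?:(?!</Reference>).)*?                 #   some text that never crosses the closing tag
            <SpecificVersion>\s*False\s*</SpecificVersion>   # leads to SpecificVersion set to False
        )
        (?:(?!</Reference>).)*                      # consume the body, again stopping at the closing tag
        </Reference>                                # the closing tag itself",
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

    // Returns the Include value of each IQ reference that lacks
    // <SpecificVersion>False</SpecificVersion>. (Hypothetical helper.)
    public static List<string> FindBadReferences(string projectXml)
    {
        var includes = new List<string>();
        foreach (Match m in BadReference.Matches(projectXml))
            includes.Add(m.Groups[1].Value);
        return includes;
    }
}
```

Because both the look-ahead and the body scan are bounded the same way, a valid reference further down the file no longer hides the invalid ones above it. The XML-parsing route is still likely to be more robust against attribute order and formatting changes.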

So, following spender's advice, I looked into regex alternatives. I discovered LINQ to XML and it made solving my problem very easy. Here is the code I ended up using. It finds all references in a .csproj file to IQ DLL files and ensures that they all have a <SpecificVersion>False</SpecificVersion> element.

Just for some background info, the reason I need to do this is that our builds run fine on our local machines when Specific Version is set to True, but they break on our TFS build server unless it is set to False. I'm pretty sure the reason is that our TFS build updates the version number, so the version that each project is set to use becomes out of date. Anyways, here's the code :)

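// Requires: using System; using System.Linq; using System.Xml.Linq;
// This snippet runs inside a loop over the project files; filePath and
// badReferences (presumably a StringBuilder) are defined outside it.

// Let's parse us some XML!
XElement xmlFile = XElement.Load(filePath);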

// Grab all of the references to DLL files.
var iqReferences = xmlFile.Descendants().Where(e => e.Name.LocalName.Equals("Reference", StringComparison.InvariantCultureIgnoreCase));

// We only care about iQ DLL files.
iqReferences = iqReferences.Where(r => r.Attribute("Include") != null && r.Attribute("Include").Value.StartsWith("IQ.", StringComparison.InvariantCultureIgnoreCase));

// If this project file doesn't reference any iQ DLL files, move on to the next project file.
if (!iqReferences.Any())
    continue;

// Make sure they all have <SpecificVersion> set to False.
foreach (XElement reference in iqReferences)
{
    // If this Reference element already has a child SpecificVersion element whose value is false, skip this reference since it is good.
    if (reference.Elements().Where(e => e.Name.LocalName.Equals("SpecificVersion", StringComparison.InvariantCultureIgnoreCase))
        .Any(e => e.Value.Equals("False", StringComparison.InvariantCultureIgnoreCase)))
        continue;

    // Add this reference to the list of bad references.
    badReferences.AppendLine("\t" + reference.Attribute("Include").Value);

    // Fix the reference.
    reference.SetElementValue(reference.Name.Namespace + "SpecificVersion", "False");
}
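For anyone who wants to try the same idea outside of a larger loop, here is a minimal self-contained sketch of it. The names SpecificVersionFixer and FixReferences are hypothetical, and it takes an already-loaded XElement so it can be exercised without touching the file system; otherwise it follows the logic of the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SpecificVersionFixer
{
    // Sets <SpecificVersion>False</SpecificVersion> on every IQ.* reference that
    // lacks it and returns the Include values of the references that were changed.
    public static List<string> FixReferences(XElement project)
    {
        var changed = new List<string>();

        // Compare local names so the MSBuild default namespace does not matter.
        // A missing Include attribute casts to null, which is treated as "".
        var iqReferences = project.Descendants()
            .Where(e => e.Name.LocalName == "Reference")
            .Where(e => ((string)e.Attribute("Include") ?? "")
                .StartsWith("IQ.", StringComparison.OrdinalIgnoreCase))
            .ToList(); // materialize the query before modifying the tree

        foreach (XElement reference in iqReferences)
        {
            // Build the child name in the same namespace as its parent element.
            XName name = reference.Name.Namespace + "SpecificVersion";
            XElement existing = reference.Element(name);
            if (existing != null &&
                existing.Value.Trim().Equals("False", StringComparison.OrdinalIgnoreCase))
                continue; // already good

            changed.Add((string)reference.Attribute("Include"));
            // Updates the child if it exists, otherwise adds it.
            reference.SetElementValue(name, "False");
        }
        return changed;
    }
}
```

A caller would typically load the project with XElement.Load(filePath), call FixReferences, and then write the result back with project.Save(filePath) if the returned list is not empty. Unlike the regex attempts, this also covers the self-closing <Reference ... /> case.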
