How to not match against text that contains specific text in the middle of a multiline pattern using regular expressions?

I'm trying to create a C# regular expression that detects when references in our .csproj files do not have <SpecificVersion> set to False. These are the cases that I need to handle:

1. <Reference Include="IQ.MyStuff1, Version=4.1.0.0, Culture=neutral, processorArchitecture=MSIL" />
2. <Reference Include="IQ.MyStuff2, Version=4.7.22.21777, Culture=neutral, processorArchitecture=MSIL">
    <HintPath>..\..\DebugDLLFiles\IQ.MyStuff2.dll</HintPath>
</Reference>
3. <Reference Include="IQ.MyStuff3, Version=4.1.0.0, Culture=neutral, processorArchitecture=MSIL">
    <HintPath>..\..\DebugDLLFiles\IQ.MyStuff3.dll</HintPath>
    <SpecificVersion>True</SpecificVersion>
</Reference>
4. <Reference Include="IQ.MyStuff4, Version=4.5.3.17401, Culture=neutral, processorArchitecture=MSIL">
    <SpecificVersion>True</SpecificVersion>
</Reference>

So basically any file reference that doesn't explicitly have "<SpecificVersion>False</SpecificVersion>" in it.

So let's ignore the first case because it doesn't have a body like the other 3 and can be treated differently. So here is what I have so far:

<Reference(\s|\n|\r)*?  # Match against '<Reference '.
Include=""IQ\..*?""     # Match against the entire Include attribute; We only care about IQ DLLs.
(\s|\n|\r)*?>           # Eat any whitespace and match against the closing tag character.
[What should go here?]
</Reference>            # Match against the closing tag.

So I've tried numerous things in the [What should go here?] block, but can't seem to get any to work quite perfectly. The closest I came was using the following for this block:

(?!                     # Do a negative look-ahead to NOT match against this Reference tag if it already has <SpecificVersion>False</SpecificVersion>.
    (.|\n|\r)*?         # Eat everything before the <SpecificVersion> tag, if it even exists.
    <SpecificVersion>(\s|\n|\r)*?False(\s|\n|\r)*?</SpecificVersion>    # Specify that we don't want to match if this tag already has <SpecificVersion>False</SpecificVersion>.
)
(.|\n|\r)*?             # Eat everything after the <SpecificVersion> tag, if it even existed.

This works for all cases, except for where there is a valid reference below any of the ones I want to match against, where a valid reference would look something like:

<Reference Include="IQ.MyStuff5, Version=4.5.3.17401, Culture=neutral, processorArchitecture=MSIL">
    <SpecificVersion>False</SpecificVersion>
</Reference>

It seems that the look-ahead I'm using doesn't stop at the </Reference> tag, but continues looking down the entire file to make sure no text below it has "<SpecificVersion>False</SpecificVersion>".

How can I make my look-ahead stop at the first "</Reference>" it encounters? If you have another way to solve my problem, I'm open to that too. Any suggestions are appreciated. Thanks.

Give up with Regex. It's doomed. Isn't it XML? Why not treat it as such?

The "don't parse HTML with regex" rule applies equally to XML.

If you want to give regex a try anyway, I'd suggest something like this:

<Reference[^>]*?>(?:.(?!</Reference>))*?<SpecificVersion>([^<]*?)</SpecificVersion>

It matches all <Reference> tags which have the <SpecificVersion> tag inside, i.e. it will completely ignore any Reference tag that doesn't have the <SpecificVersion> tag.

  • it looks for the opening <Reference> tag
  • matches everything that is not a closing </Reference> tag until it finds the <SpecificVersion> tag
  • then it captures the value inside the <SpecificVersion> tag

I tested it online in regexpal and it seems to work correctly in multiple cases.

EDIT:

  • use RegexOptions.Singleline to make dot match new lines
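As a rough illustration of how the pattern and the Singleline option fit together in C#, here is a minimal sketch. The class and method names (SpecificVersionScan, CapturedValues) are made up for this example and are not from the original answer; only the pattern itself is.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SpecificVersionScan
{
    // The pattern from the answer, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"<Reference[^>]*?>(?:.(?!</Reference>))*?<SpecificVersion>([^<]*?)</SpecificVersion>",
        RegexOptions.Singleline); // Singleline makes '.' match newlines as well

    // Returns the text captured inside <SpecificVersion> for every match.
    public static List<string> CapturedValues(string projectXml)
    {
        var values = new List<string>();
        foreach (Match m in Pattern.Matches(projectXml))
            values.Add(m.Groups[1].Value.Trim());
        return values;
    }
}
```

One caveat worth knowing: a self-closing <Reference ... /> has no closing tag of its own, so with this pattern a match that starts there can run on into the following Reference element.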

If you want to match the case when the SpecificVersion tag is not present at all, try this alteration: it will first try to match the option with the <SpecificVersion> tag, but if that fails it will still match the <Reference> tag:

<Reference[^>]*?>(?:.(?!</Reference>))*?(<SpecificVersion>([^<]*?)</SpecificVersion>)|<Reference[^>]*?>(?:.(?!</Reference>))*?(?:<SpecificVersion>([^<]*?)</SpecificVersion>)?

Let me know how you're getting on.
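For the question as literally asked (a negative look-ahead that cannot look past the first </Reference>), one possible approach is to test the look-ahead before every single character of the body instead of once up front. The sketch below is not from either answer; the names ReferenceRegex and FindMissingFalse are hypothetical, and it assumes, like the question's own pattern, that Include is the first attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ReferenceRegex
{
    private static readonly Regex MissingFalse = new Regex(
        // Opening tag of an IQ reference; (?<!/) rejects self-closing <Reference ... />.
        @"<Reference\s+Include=""(IQ\.[^""]*)""[^>]*(?<!/)>" +
        // Consume the body one character at a time, but never step onto the
        // closing tag or onto a <SpecificVersion>False</SpecificVersion> element.
        @"(?:(?!</Reference>|<SpecificVersion>\s*False\s*</SpecificVersion>).)*" +
        // If the body contained the False element, the loop stopped there and this fails.
        @"</Reference>",
        RegexOptions.Singleline);

    // Returns the Include value of every IQ reference whose body lacks SpecificVersion=False.
    public static List<string> FindMissingFalse(string projectXml)
    {
        var includes = new List<string>();
        foreach (Match m in MissingFalse.Matches(projectXml))
            includes.Add(m.Groups[1].Value);
        return includes;
    }
}
```

Because the body loop can never cross a </Reference>, a valid reference further down the file no longer affects the match. Self-closing references (case 1 in the question) are deliberately skipped here and would need separate handling, and the comparison with "False" is case-sensitive unless RegexOptions.IgnoreCase is added.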

So following spender's advice I looked into regex alternatives. I discovered LINQ to XML and it made solving my problem very easy. Here is the code I ended up using. It finds all references in a .csproj file to IQ DLL files and ensures that they all have a <SpecificVersion>False</SpecificVersion> element. Just for some background info, the reason I need to do this is that our builds run fine on our local machines when Specific Version is set to True, but they break on our TFS build server unless it is set to False. I'm pretty sure the reason for this is that our TFS build updates the version number, so the version that each project is set to use is then out of date. Anyway, here's the code :)

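// Needs: using System; using System.Linq; using System.Xml.Linq;
// Note: this is the body of a loop over the project files, which is why it can use 'continue'.
// filePath and badReferences (used like a StringBuilder) are assumed to be declared outside of it.

// Let's parse us some XML!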
XElement xmlFile = XElement.Load(filePath);

// Grab all of the references to DLL files.
var iqReferences = xmlFile.Descendants().Where(e => e.Name.LocalName.Equals("Reference", StringComparison.InvariantCultureIgnoreCase));

// We only care about iQ DLL files.
iqReferences = iqReferences.Where(r => r.Attribute("Include") != null && r.Attribute("Include").Value.StartsWith("IQ.", StringComparison.InvariantCultureIgnoreCase));

// If this project file doesn't reference any iQ DLL files, move on to the next project file.
if (!iqReferences.Any())
    continue;

// Make sure they all have <SpecificVersion> set to False.
foreach (XElement reference in iqReferences)
{
    // If this Reference element already has a child SpecificVersion element whose value is false, skip this reference since it is good.
    if (reference.Elements().Where(e => e.Name.LocalName.Equals("SpecificVersion", StringComparison.InvariantCultureIgnoreCase))
        .Any(e => e.Value.Equals("False", StringComparison.InvariantCultureIgnoreCase)))
        continue;

    // Add this reference to the list of bad references.
    badReferences.AppendLine("\t" + reference.Attribute("Include").Value);

    // Fix the reference.
    reference.SetElementValue(reference.Name.Namespace + "SpecificVersion", "False");
}
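To try the same idea without a real project file or the surrounding loop, the logic can be wrapped in a small method that works on an in-memory XElement. This is only a sketch under assumptions: the class and method names (SpecificVersionFixer, Fix) are invented here, and it returns the Include values in a list instead of appending them to badReferences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical wrapper around the approach above, for illustration only.
public static class SpecificVersionFixer
{
    // Ensures every IQ.* reference has <SpecificVersion>False</SpecificVersion>
    // and returns the Include values of the references that had to be changed.
    public static List<string> Fix(XElement project)
    {
        var fixedReferences = new List<string>();

        var iqReferences = project.Descendants()
            .Where(e => e.Name.LocalName == "Reference")
            .Where(r => ((string)r.Attribute("Include") ?? "")
                .StartsWith("IQ.", StringComparison.OrdinalIgnoreCase))
            .ToList(); // materialize the query before the tree is modified

        foreach (XElement reference in iqReferences)
        {
            // Use the parent's namespace so the child lands in the same XML namespace.
            XName name = reference.Name.Namespace + "SpecificVersion";
            XElement specific = reference.Element(name);
            if (specific != null &&
                specific.Value.Trim().Equals("False", StringComparison.OrdinalIgnoreCase))
                continue; // already good

            // Updates the existing child, or adds one if it is missing.
            reference.SetElementValue(name, "False");
            fixedReferences.Add((string)reference.Attribute("Include"));
        }
        return fixedReferences;
    }
}
```

Since SetElementValue adds the child element when it does not exist, this also covers the self-closing reference from case 1 of the question. Calling Fix(XElement.Load(filePath)) and then saving the element would be one way to apply it to a file.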
