Remove all XML nodes where an attribute value is not one of several specific values from a string with regex

I would like to remove all XML nodes whose Name is not one of a given set of values:

<Property Name="Operation" Type="String" Access="ReadWrite" Value="ProduceFile" />
<Property Name="BackOfficeType" Type="String" Access="ReadWrite" Value="growBusiness Solutions" />
<Property Name="module" Type="String" Access="ReadWrite" Value="Document" />
<Property Name="vti_pluggableparserversion" Type="String" Access="ReadOnly" Value="16.0.0.20405" />
<Property Name="_Author" Type="String" Access="ReadWrite" Value="hfhf fghfgh" />
<Property Name="modifiedBy" Type="String" Access="ReadWrite" Value="fghfghfghfg" />
<Property Name="vti_parserversion" Type="String" Access="ReadOnly" Value="16.0.0.20405" />

How do I use a regex to remove all of the elements above whose Name is not Operation or module?

I was thinking of something like:

xml = Regex.Replace(xml, @"<Property Name=""(?!Operation |module)"".*?/>", "");

But this is not working.

I don't understand why, because " is not a special character in the C# Regex system, but removing the second quote and the space after Operation makes it work (shown here without the escaping that a C# string literal needs):

<Property Name="(?!Operation|module).*?/>

I'll update this answer if I figure out what's going on with that second quote.

EDIT: Well, I feel a fool for not noticing this myself. A friend of mine pointed out that Name="(?!Operation|module)" essentially says "only match on Name=""". A lookahead consumes no characters, so the closing quote has to come immediately after the opening one. If you add the following element to your sample data, you'll see that's what is happening:

<Property Name="" Type="String" Access="ReadOnly" Value="16.0.0.20405" />
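To see this in isolation, here is a minimal sketch that runs the question's pattern (with the stray space after Operation removed) against two elements. The class name EmptyNameDemo and the shortened sample elements are made up for illustration; only Regex.Replace and the pattern come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyNameDemo
{
    // The question's pattern without the space after "Operation".
    // The lookahead is immediately followed by a literal quote,
    // so only an empty Name="" can ever match.
    public const string Pattern = @"<Property Name=""(?!Operation|module)"".*?/>";

    public static string Apply(string xml) => Regex.Replace(xml, Pattern, "");

    public static void Main()
    {
        // Hypothetical, shortened sample data.
        string xml =
            "<Property Name=\"_Author\" Type=\"String\" />\n" +
            "<Property Name=\"\" Type=\"String\" />";

        // The _Author element survives; only the empty-name element is removed.
        Console.WriteLine(Apply(xml));
    }
}
```

The _Author element is left alone even though its name is neither Operation nor module, which is exactly the "not working" behaviour from the question.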

So adding another wildcard inside the quotes lets it match all the entries whose names do not start with "Operation" or "module":

<Property Name="(?!Operation|module).*".*?/>

However, this raises a new issue: if you have Name="Operation Awesome", the lookahead rejects it as well, so that element is kept instead of removed. The negative lookahead therefore has to be changed to exclude only the exact names, not property names that merely begin with those words. So how do we do that?

<Property Name=(?!"Operation"|"module").*?/>

This ensures it only keeps exact matches of "Operation" or "module". The only side effect now is that it will delete malformed XML such as <Property Name="Operation Type="string". You may consider this a drawback, but if you need to handle invalid XML, you should do that with a separate step before this one.
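As a rough comparison of the last two patterns, the sketch below applies both to the same input. The class name PropertyFilterDemo, the Apply helper, and the "Operation Awesome" element with Value="x" are hypothetical additions for illustration; the two patterns are the ones shown above, escaped for a C# verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PropertyFilterDemo
{
    // Prefix version: protects any name that starts with Operation or module.
    public const string PrefixPattern = @"<Property Name=""(?!Operation|module).*"".*?/>";

    // Exact version: the quotes are inside the lookahead,
    // so only the full names "Operation" and "module" are protected.
    public const string ExactPattern = @"<Property Name=(?!""Operation""|""module"").*?/>";

    public static string Apply(string xml, string pattern) => Regex.Replace(xml, pattern, "");

    public static void Main()
    {
        // One element per line; the "Operation Awesome" row is made up.
        string xml = string.Join("\n",
            "<Property Name=\"Operation\" Type=\"String\" Access=\"ReadWrite\" Value=\"ProduceFile\" />",
            "<Property Name=\"Operation Awesome\" Type=\"String\" Access=\"ReadWrite\" Value=\"x\" />",
            "<Property Name=\"module\" Type=\"String\" Access=\"ReadWrite\" Value=\"Document\" />",
            "<Property Name=\"_Author\" Type=\"String\" Access=\"ReadWrite\" Value=\"hfhf fghfgh\" />");

        // Keeps Operation, Operation Awesome and module; removes _Author.
        Console.WriteLine("Prefix pattern:");
        Console.WriteLine(Apply(xml, PrefixPattern));

        // Keeps only Operation and module.
        Console.WriteLine("Exact pattern:");
        Console.WriteLine(Apply(xml, ExactPattern));
    }
}
```

Regex.Replace removes only the matched text, so the line breaks of the removed elements stay behind as empty lines in the result.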
