Notepad++ deleting tags with specific text inside
I have a large XML file of products. I am trying to delete all products that are out of stock. The file is over 20 MB.
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
...
Is it possible to delete them with a regular expression in Notepad++, or should I use SimpleXML (PHP) or something similar?
My basic PHP code:
$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));
foreach ($xml->product->children() as $product) {
//finding out of stock products and deleting them
}
$xml->asXml('output/products.xml');
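Pattern matching with a regular expression is not ideal; if you have access to PHP, I would recommend using a proper XML parsing tool instead. That said, here is a solution that can be used in Notepad++.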
<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>
Replace with: nothing
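This regular expression will do the following:
match an entire product element, from its opening tag to its closing tag
require the product to contain a stock element
require that stock element to have the value no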
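In Notepad++: note that you should use Notepad++ version 6.1 or later, because the regular expression support in older versions had problems that have since been fixed.
Press Ctrl+H to open Find and Replace
Select the Regular expression search mode
Check ". matches newline", since each product spans several lines
Put the regular expression in the "Find what" field
Leave the "Replace with" field empty
Click Replace All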
Live demo
https://regex101.com/r/cW9nC5/1
Sample text
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
After replacement
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
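For anyone who wants to check the pattern outside Notepad++, here is a minimal PHP sketch that runs the same expression through preg_replace on the sample text above. The variable names ($tagTail, $inside, $pattern) are my own and not part of the answer; the s modifier is assumed to play the role of Notepad++'s ". matches newline" option.

```php
<?php
// Sample text from the post (no root element is needed for a plain text replace).
$input = <<<'XML'
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
XML;

// Rest of an opening tag after its name: optional attributes, then ">" or "/>".
$tagTail = <<<'RE'
\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?/?>
RE;

// Any run of characters that does not step over a closing </product tag.
$inside = '(?:(?!</product).)*';

// "~" is used as the delimiter so the slashes in the pattern need no escaping.
$pattern = '~<product' . $tagTail . $inside
         . '<stock' . $tagTail . 'no</stock>' . $inside . '</product>~s';

$output = preg_replace($pattern, '', $input);
if ($output === null) {
    // preg_replace returns null on failure, e.g. when a PCRE limit is hit on a large input.
    throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
}

echo $output;
```

On the sample text only the bla2 product is removed. On a file of 20 MB or more, it is worth keeping the null check, since PCRE limits may apply.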
NODE                     EXPLANATION
----------------------------------------------------------------------
<product                 '<product'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more times
                         (matching the least amount possible)):
----------------------------------------------------------------------
  [^>=]                  any character except: '>', '='
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  ='                     '=\''
----------------------------------------------------------------------
  [^']*                  any character except: ''' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
  '                      '\''
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  ="                     '="'
----------------------------------------------------------------------
  [^"]*                  any character except: '"' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
  "                      '"'
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  =                      '='
----------------------------------------------------------------------
  [^'"]                  any character except: ''', '"'
----------------------------------------------------------------------
  [^\s>]*                any character except: whitespace (\n,
                         \r, \t, \f, and " "), '>' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
)*?                      end of grouping
----------------------------------------------------------------------
\s?                      whitespace (\n, \r, \t, \f, and " ")
                         (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
\/?                      '/' (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
>                        '>'
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more times
                         (matching the most amount possible)):
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    </product            '</product'
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  .                      any character except \n
----------------------------------------------------------------------
)*                       end of grouping
----------------------------------------------------------------------
<stock                   '<stock'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more times
                         (matching the least amount possible)):
----------------------------------------------------------------------
  [^>=]                  any character except: '>', '='
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  ='                     '=\''
----------------------------------------------------------------------
  [^']*                  any character except: ''' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
  '                      '\''
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  ="                     '="'
----------------------------------------------------------------------
  [^"]*                  any character except: '"' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
  "                      '"'
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  =                      '='
----------------------------------------------------------------------
  [^'"]                  any character except: ''', '"'
----------------------------------------------------------------------
  [^\s>]*                any character except: whitespace (\n,
                         \r, \t, \f, and " "), '>' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
)*?                      end of grouping
----------------------------------------------------------------------
\s?                      whitespace (\n, \r, \t, \f, and " ")
                         (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
\/?                      '/' (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
>no</stock>              '>no</stock>'
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more times
                         (matching the most amount possible)):
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    </product            '</product'
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  .                      any character except \n
----------------------------------------------------------------------
)*                       end of grouping
----------------------------------------------------------------------
<                        '<'
----------------------------------------------------------------------
\/                       '/'
----------------------------------------------------------------------
product>                 'product>'
----------------------------------------------------------------------
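The (?:(?!</product).)* group in the table is what keeps a match inside a single product. As a rough illustration, the PHP sketch below compares it with a plain .* on a shortened, single-line version of the data; the shortened input and the names $naive and $guarded are assumptions made for the example only.

```php
<?php
// Two products on one line: the first is in stock, the second is not.
$xml = '<product><name>bla1</name><stock>yes</stock></product>'
     . '<product><name>bla2</name><stock>no</stock></product>';

// Naive version: a plain ".*" is free to run across </product> boundaries,
// so a match can start at the first product and end at the second.
$naive = '~<product>.*<stock>no</stock>.*</product>~s';

// Guarded version: each "." is consumed only if "</product" does not start at
// that position, so a match cannot extend past the end of the product it started in.
$guarded = '~<product>(?:(?!</product).)*<stock>no</stock>(?:(?!</product).)*</product>~s';

$afterNaive   = preg_replace($naive, '', $xml);
$afterGuarded = preg_replace($guarded, '', $xml);

var_dump($afterNaive);    // both products are gone
var_dump($afterGuarded);  // only the bla2 product is gone
```

The naive pattern deletes the in-stock product as well, which is the failure the look-ahead is there to prevent.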
I think it would be easier with Notepad++, for example:
Find: <product>\s+<name>.*?<\/name>\s+<price>.*?<\/price>\s+<stock>no<\/stock>\s+<description>.*?<\/description>\s+<\/product>
Replace: nothing
DEMO
https://regex101.com/r/fH0mM7/1
Note
Make sure the Regular expression option at the bottom of the dialog is selected
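One caveat with this shorter pattern: it appears to rely on "." not matching line breaks, which in Notepad++ means leaving ". matches newline" unchecked. The PHP sketch below, using the sample text from the question and variable names of my own choosing, compares the two settings; the s modifier stands in for the checked option.

```php
<?php
// Sample text from the question.
$input = <<<'XML'
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>
<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
XML;

// The pattern from this answer, without delimiters.
$body = '<product>\s+<name>.*?</name>\s+<price>.*?</price>\s+'
      . '<stock>no</stock>\s+<description>.*?</description>\s+</product>';

// "." stops at line breaks: every ".*?" stays on its own line.
$lineBound = preg_replace('~' . $body . '~', '', $input);

// "." also matches line breaks: a lazy ".*?" may stretch into the next product.
$dotAll = preg_replace('~' . $body . '~s', '', $input);

var_dump($lineBound);  // bla1 is kept, bla2 is removed
var_dump($dotAll);     // both products are removed
```

The pattern also assumes that every product lists name, price, stock and description in exactly this order, with whitespace between the tags.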
You can do this in PHP with the following code
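<?php
$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));

// Walk the products backwards so that removing one does not shift
// the indexes of the products that still have to be checked.
for ($i = count($xml->product) - 1; $i >= 0; --$i) {
    // Cast to string so the comparison is made against the element's text.
    if ((string) $xml->product[$i]->stock === 'no') {
        unset($xml->product[$i]);
    }
}

// Write the filtered document to the output file.
$xml->asXML('output/products.xml');
?>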
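As a possible variation on the loop above, the out-of-stock products can be selected with XPath and removed through the DOM API. This is only a sketch: the <products> root element and the inline sample are assumptions, since the question does not show the real root of the file.

```php
<?php
// Inline sample with a hypothetical <products> root element.
$source = <<<'XML'
<products>
  <product><name>bla1</name><price>50$</price><stock>yes</stock><description>bla</description></product>
  <product><name>bla2</name><price>60$</price><stock>no</stock><description>bla</description></product>
</products>
XML;

$xml = new SimpleXMLElement($source);

// xpath() returns a plain PHP array, so nodes can be removed while looping over it.
// normalize-space() ignores stray whitespace around the stock value.
foreach ($xml->xpath('//product[normalize-space(stock)="no"]') as $product) {
    // Switch to the DOM view of the same node and detach it from its parent.
    $node = dom_import_simplexml($product);
    $node->parentNode->removeChild($node);
}

echo $xml->asXML();
```

For the real file, the inline string would be replaced by the file contents, and asXML() can be given an output path as in the code above.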