Regex to match all XML tags that contain a certain attribute value
I have an XML file in which I want to use Perl to match all XML tags that have an attribute containing a certain string.
Sample XML:
<item attr="Car" />
<item attr="Apple_And_Pears.htm#123" />
<item attr="Paper" />
<item attr="Orange_And_Peach.htm#213" />
I want a regex that grabs all nodes that have an attribute containing ".htm":
<item attr="Orange_And_Peach.htm#213" />
<item attr="Apple_And_Pears.htm#123" />
With the following regex, I'm matching all tags rather than only the tags whose attribute contains .htm:
<item.*?attr="[^>]*>
Is there some sort of positive lookahead that only looks as far as a certain character?
Thanks
The appropriate Perl solution is not a regex. With Mojo::DOM (one of many options):
use strict;
use warnings;
use Mojo::DOM;
use File::Slurper 'read_text';
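# read_text decodes the file as UTF-8 by default
my $xml = read_text 'test.xml';
# xml(1) disables HTML semantics, so tag and attribute names are case-sensitive
my $dom = Mojo::DOM->new->xml(1)->parse($xml);
# CSS attribute selector: <item> elements whose attr value contains ".htm"
my $tags = $dom->find('item[attr*=".htm"]');
# find returns a Mojo::Collection (array-based); each element stringifies back to markup
print "$_\n" for @$tags;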
As Grinnz suggested, you should use an appropriate XML parser (check out this interesting post on Stack Overflow explaining why), but since you asked for it, here's a simple regex that uses a positive lookahead:
<item.*?attr=".*(?=\.htm).*
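The question's <item.*?attr="[^>]*> matches every tag because [^>]* accepts any run of characters up to > without ever requiring .htm. Below is a minimal Perl sketch of a tightened lookahead variant (not the answer's exact pattern), assuming the sample's layout: a double-quoted attr value and no > inside attribute values. The lookahead (?=[^"]*\.htm) scans only as far as the closing quote, which is the "lookahead until a certain character" the question asks about, and [^>]* keeps each match inside a single tag. The $xml and @matches names are illustrative.

```perl
use strict;
use warnings;

# Sample input from the question; in practice this would be read from a file.
my $xml = <<'XML';
<item attr="Car" />
<item attr="Apple_And_Pears.htm#123" />
<item attr="Paper" />
<item attr="Orange_And_Peach.htm#213" />
XML

# After the opening quote, (?=[^"]*\.htm) looks ahead up to the closing quote
# and succeeds only if ".htm" occurs inside the attribute value.
# [^>]*? and [^>]* cannot cross ">", so a match never spans two tags.
# In list context, m//g without capture groups returns every full match.
my @matches = $xml =~ /<item\b[^>]*?\battr="(?=[^"]*\.htm)[^"]*"[^>]*>/g;

print "$_\n" for @matches;
```

Run against the sample, this prints the Apple_And_Pears and Orange_And_Peach tags in document order.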
If you want to match tags with only one ".htm" in them (one tag per line), you can use negative lookaheads: the first keeps the leading part from consuming a ".htm", and the second rejects any further ".htm" after the one that was matched:
^(?:(?!\.htm).)*\.htm(?!.*\.htm).*$
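Here is a minimal Perl sketch of the "exactly one .htm" pattern, assuming one tag per line as in the sample. The /m modifier lets ^ and $ match at every line boundary, and because . does not match a newline without /s, each line is tested on its own. The fifth sample line, which contains .htm twice, is hypothetical and only there to show the rejection.

```perl
use strict;
use warnings;

# Question's sample plus one hypothetical line containing ".htm" twice.
my $xml = <<'XML';
<item attr="Car" />
<item attr="Apple_And_Pears.htm#123" />
<item attr="Paper" />
<item attr="Orange_And_Peach.htm#213" />
<item attr="a.htm#1 b.htm#2" />
XML

# (?:(?!\.htm).)* consumes characters only while ".htm" does not start here,
# so \.htm is forced to match the first occurrence on the line;
# (?!.*\.htm) then fails if another ".htm" follows later on the same line.
my @single = $xml =~ /^(?:(?!\.htm).)*\.htm(?!.*\.htm).*$/mg;

print "$_\n" for @single;
```

This prints the Apple_And_Pears and Orange_And_Peach lines; the hypothetical double-.htm line is skipped.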