

Regex to match all XML tags that contain a certain attribute value

I have an XML file in which I want to match, in Perl, all XML tags that have an attribute matching a certain string.

Sample XML:

<item attr="Car" />
<item attr="Apple_And_Pears.htm#123" />
<item attr="Paper" />
<item attr="Orange_And_Peach.htm#213" />

I want a regex that grabs all nodes that have an attribute containing ".htm":

<item attr="Orange_And_Peach.htm#213" />
<item attr="Apple_And_Pears.htm#123" />

With the following regex, I'm matching all tags rather than only the tags whose attribute contains ".htm":

<item.*?attr="[^>]*>

Is there some sort of positive lookahead that only looks ahead up to a certain character?

Thanks

The appropriate Perl solution is not a regex. With Mojo::DOM (one of many options):

use strict;
use warnings;
use Mojo::DOM;
use File::Slurper 'read_text';

my $xml = read_text 'test.xml';
my $dom = Mojo::DOM->new->xml(1)->parse($xml);
my $tags = $dom->find('item[attr*=".htm"]');
print "$_\n" for @$tags;

As Grinnz suggested, you should use an appropriate XML parser (check out this interesting post on Stack Overflow explaining why), but since you asked for it, here is a simple regex with a positive lookahead that you could use:

<item.*?attr=".*(?=\.htm).*
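As a minimal sketch of how a lookahead of this kind could be applied in Perl, the snippet below filters the sample lines from the question. The variable names are illustrative, and the pattern is a tightened variant rather than the answer's exact regex: it uses `[^"]*` instead of `.*` so the lookahead stops at the closing quote of the attribute value, which is an assumption about the intent.

```perl
use strict;
use warnings;

# Sample lines taken from the question.
my @lines = (
    '<item attr="Car" />',
    '<item attr="Apple_And_Pears.htm#123" />',
    '<item attr="Paper" />',
    '<item attr="Orange_And_Peach.htm#213" />',
);

# After attr=" the lookahead requires ".htm" to appear before the
# closing quote; [^"]* keeps the check inside the attribute value.
my $re = qr/<item\b[^>]*?\battr="(?=[^"]*\.htm)[^"]*"[^>]*>/;

# Keep only the lines whose attr value contains ".htm".
my @hits = grep { /$re/ } @lines;
print "$_\n" for @hits;
```

This prints only the two tags whose attribute contains ".htm". It remains a line-oriented shortcut: it does not handle single-quoted attributes, comments, or CDATA, which is why a parser is the safer choice.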

If you want to match only lines that contain exactly one ".htm", you can use negative lookaheads before and after it:

^(?:(?!\.htm).)*\.htm(?!.*\.htm).*$
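To illustrate what the "exactly one" pattern accepts and rejects, here is a small Perl sketch. The test lines are assumptions made for the demonstration; in particular, the line with two ".htm" occurrences is invented and does not come from the question.

```perl
use strict;
use warnings;

# No ".htm" may occur before the matched one (tempered dot),
# and none may occur after it (trailing negative lookahead).
my $once = qr/^(?:(?!\.htm).)*\.htm(?!.*\.htm).*$/;

# Hypothetical input: one, two, and zero occurrences of ".htm".
my @lines = (
    '<item attr="Apple_And_Pears.htm#123" />',
    '<item attr="a.htm" href="b.htm" />',
    '<item attr="Car" />',
);

# Keep only the lines with exactly one ".htm".
my @single = grep { $_ =~ $once } @lines;
print "$_\n" for @single;
```

Only the first line survives the filter. Because the pattern is anchored with `^` and `$`, it tests a whole line at a time, so it fits input that has one tag per line.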
