Using Perl with Regex, how can I remove a string within a string?

So I have several XML files containing persons with unique IDs, and each person has a favorite food (a person can appear in several XML files).

There are cases where the person with id=300 has the food tag right at the beginning of the person tag:

<person id="299">
    <food>
       <type> Hot Dog </type>
    </food>
</person>
<person id="300">
    <food>
       <type> Burger</type>
    </food>
</person>

Or there might be other tags before the food tag:

<person id="300">
    <year>
       <birth> 1990 </birth>
       <marriage> 2020 </marriage>
    </year>
    <food>
       <type> Vegan </type>
    </food>
</person>

I need to use a single Perl RegEx function to remove the food tags ONLY of the persons whose ID is 300, regardless of whether the food tag is at the beginning, middle, or end of the person tag.

I know that if it were for the whole person tag, I could use something like:

$fileContents =~ s/<person id=\"300\"[^<]+<\/person>//g;

But I must leave the person tag intact and remove only the food tag inside it. I can't remove all the food tags either, because I need to keep them for people with other IDs.

Could you help me please? I've been struggling a lot with this D:

You can't safely do that with a substitution.

And even a half-assed approach is more complicated than using an existing XML parser.

$_->unbindNode()
   for $doc->findnodes('//person[@id="300"]/food');

Full solution:

use XML::LibXML qw( );

# my $doc = XML::LibXML->new->parse_file(...);
#    or
# my $doc = XML::LibXML->new->parse_string(...);

$_->unbindNode()
   for $doc->findnodes('//person[@id="300"]/food');

# $doc->toFile(...)
#    or
# $doc->toString(...)

perl -i.bk -pe'BEGIN{undef$/}s|<person (.*?)>.*?</person>|$p=$&;$1=~/id="300"/?$p=~s,<food>.*?</food>,,sr:$p|esg' files*.xml

This removes <food>...</food> from persons with id="300" in one or more files matching files*.xml. The original files are kept, renamed with .bk appended to their file names. Run it only once if you need to keep the originals, because a second run overwrites the .bk backups with the already-modified files. Alternatively, change -i.bk into, for example, -i.bk$(date +%Y%m%d%H%M%S) so that each run gets its own backup suffix.
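
The one-liner above is dense, so here is a minimal sketch of the same idea as a small script that uses only core Perl. The helper name strip_food and the sample data are assumptions made for illustration, not part of the original answer. Like the one-liner, it assumes that <person> elements are not nested, that the id value is double-quoted, and that no <food> text appears inside comments or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every <food>...</food> block (plus the
# whitespace in front of it) from <person> elements whose id equals $id.
# The captures are copied into lexicals first, because the inner match
# and substitution overwrite $1, $2 and $3.
sub strip_food {
    my ($xml, $id) = @_;
    $xml =~ s{(<person\b[^>]*>)(.*?)(</person>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $body =~ s{\s*<food>.*?</food>}{}gs if $open =~ /\sid="\Q$id\E"/;
        $open . $body . $close;
    }gse;
    return $xml;
}

# Sample input modelled on the question (assumed data).
my $xml = <<'XML';
<person id="299">
    <food>
       <type> Hot Dog </type>
    </food>
</person>
<person id="300">
    <year>
       <birth> 1990 </birth>
    </year>
    <food>
       <type> Vegan </type>
    </food>
</person>
XML

print strip_food($xml, 300);
```

With this input, the person with id="299" keeps its <food> element, while the person with id="300" keeps only its <year> element. Also removing the whitespace in front of <food> is a design choice that differs from the one-liner, which leaves an empty indented line behind.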

Note: I think ikegami's answer is much better.

But sometimes one writes Perl for systems that do not allow extra modules, and XML::LibXML sadly isn't a core module. And sometimes half-assed XML is best and fastest handled with half-assed methods, for instance "XML" written by something beyond your control. Maybe it's missing a root node for the list of persons, as in the first example here (the list of <person> elements could be wrapped in <list> ... </list> to make it readable by XML::LibXML). Or the ' or " around attribute values may be missing, which XML::LibXML also couldn't read right away.
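
For the missing-root case, the wrapping idea mentioned above can be sketched as follows, assuming XML::LibXML is installed. The <list> wrapper name comes from the text above and is arbitrary, and the sample data is assumed. The sketch wraps the fragment in a temporary root, removes the <food> elements with the XPath from ikegami's answer, and serializes only the children of the wrapper so that the output has the same shape as the input.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input without a single root element, as in the question.
my $fragment = <<'XML';
<person id="299">
    <food>
       <type> Hot Dog </type>
    </food>
</person>
<person id="300">
    <food>
       <type> Burger</type>
    </food>
</person>
XML

# Wrap the fragment in a temporary root so the text becomes well-formed.
my $doc = XML::LibXML->new->parse_string("<list>$fragment</list>");

# Detach the <food> children of every person with id="300".
$_->unbindNode() for $doc->findnodes('//person[@id="300"]/food');

# Serialize only the children of the temporary root, dropping the wrapper.
my $result = join '', map { $_->toString } $doc->documentElement->childNodes;
print $result;
```

This only works if the input has no XML declaration (<?xml ...?>) of its own, because a declaration is not allowed inside the wrapper element. Unquoted attribute values would still need to be repaired before parsing.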
