
Python regular expression: Getting string with certain pattern

I need some help with a regular expression in Python. Unfortunately, I have many files that contain invalid XML: as shown below, some 'img' tags are not closed.

<a href="a_url" class="url" target="_blank">
   <img ng-src=" {{getImageUrl('homeHelp64.jpg')}}" alt="?" width="16" height="16">
      Home
</a>

In Python, I would like to find all 'img' tags that are not closed and replace each one with a self-closed tag like the one below (note the '/' before '>'):

<img ng-src=" {{getImageUrl('homeHelp64.jpg')}}" alt="?" width="16" height="16"/>

Using the following pattern I can get all instances of the img tag, but I need only the ones that are not closed.

pattern = '(img.*?)>'

I would appreciate help with defining the pattern, and with replacing the 'img' tag so that it is closed at the end.

If I understood the problem correctly, the regular expression below does what is required. You can test it at the following link. It requires the s flag (re.DOTALL in Python).

https://regex101.com/r/YctLzb/5/tests

\<(\w+)\s[^<>]+\>(?!.*\<\/\1\>)
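As a rough sketch of how this pattern could be used from Python: the s flag on regex101 corresponds to re.DOTALL, and group 1 holds the tag name. The names UNCLOSED_TAG, sample and unclosed are only illustrative, and the sample text is the snippet from the question.

```python
import re

# The pattern from the answer above; re.DOTALL is Python's equivalent of the "s" flag,
# so "." in the lookahead can cross line breaks while searching for a closing tag.
UNCLOSED_TAG = re.compile(r"\<(\w+)\s[^<>]+\>(?!.*\<\/\1\>)", re.DOTALL)

sample = """<a href="a_url" class="url" target="_blank">
   <img ng-src=" {{getImageUrl('homeHelp64.jpg')}}" alt="?" width="16" height="16">
      Home
</a>"""

# Collect the names of tags that have no matching closing tag later in the text.
unclosed = [m.group(1) for m in UNCLOSED_TAG.finditer(sample)]
print(unclosed)  # ['img']
```

Note that the lookahead only checks whether some closing tag with the same name appears anywhere later in the text, so it is a heuristic and not a real XML parse.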

Update: this version works without the s flag and also matches tags that have no attributes: https://regex101.com/r/6nwHQP/1/tests

\<(\w+)(?:\s[^<>]+)?\>(?!(?:.|\n)*\<\/\1\>)
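The question also asks how to do the replacement. One possible approach, sketched here under the assumption that every unclosed tag should simply become self-closing, is to pass a function to re.sub. The helper names self_close and close_unclosed_tags are made up for this example. The guard for "/>" is there because the pattern also matches a tag such as `<br />` that is already self-closed.

```python
import re

# The updated pattern from the answer; no flags are needed.
UNCLOSED_TAG = re.compile(r"\<(\w+)(?:\s[^<>]+)?\>(?!(?:.|\n)*\<\/\1\>)")

def self_close(match):
    """Turn a matched opening tag into a self-closing one."""
    tag = match.group(0)
    # Tags that already end in "/>" are matched too, so leave them unchanged.
    if tag.endswith("/>"):
        return tag
    # Drop the final ">" and append "/>".
    return tag[:-1] + "/>"

def close_unclosed_tags(text):
    """Rewrite every tag that has no later closing tag as self-closing."""
    return UNCLOSED_TAG.sub(self_close, text)

broken = '<a href="a_url"><img alt="?" width="16" height="16">Home</a><br />'
fixed = close_unclosed_tags(broken)
print(fixed)
# <a href="a_url"><img alt="?" width="16" height="16"/>Home</a><br />
```

Because already self-closed tags are left alone, running the function a second time on its own output should not change anything.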
