Python regular expressions: getting strings that match a specific pattern

Question

I need some help with regular expressions in Python. I have many files that, unfortunately, contain invalid XML: as shown below, some 'img' tags are not closed.

<a href="a_url" class="url" target="_blank">
   <img ng-src=" {{getImageUrl('homeHelp64.jpg')}}" alt="?" width="16" height="16">
      Home
</a>

In Python, I would like to find all such 'img' tags that are not closed and replace each one with a closed tag like the one below (a '/' before the '>'):

<img ng-src=" {{getImageUrl('homeHelp64.jpg')}}" alt="?" width="16" height="16"/>

Using the following pattern I can get all instances of the img tag, but I need to get only the ones that are not closed.

pattern = '(img.*?)>'

I would appreciate your help in defining the pattern, and in how to replace the 'img' tag so that it is closed at the end.

Answer 1

If I understood the problem correctly, I managed to write the required regular expression. You can test the expression at the link below. Required flag: s (re.DOTALL in Python).

https://regex101.com/r/YctLzb/5/tests

\<(\w+)\s[^<>]+\>(?!.*\<\/\1\>)
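As a minimal sketch of how this pattern might be used from Python: the "s" flag on regex101 corresponds to re.DOTALL, which lets `.` in the lookahead cross line breaks. The names `UNCLOSED` and `sample` are only illustrative, and the redundant backslashes before `<`, `>` and `/` are dropped, which should not change what the pattern matches.

```python
import re

# First pattern from the answer, compiled with re.DOTALL (the "s" flag)
# so that ".*" in the negative lookahead can span several lines.
UNCLOSED = re.compile(r'<(\w+)\s[^<>]+>(?!.*</\1>)', re.DOTALL)

# The snippet from the question.
sample = '''<a href="a_url" class="url" target="_blank">
   <img ng-src=" {{getImageUrl('homeHelp64.jpg')}}" alt="?" width="16" height="16">
      Home
</a>'''

# Group 1 holds the tag name of every opening tag that has
# no matching closing tag anywhere later in the text.
unclosed_names = [m.group(1) for m in UNCLOSED.finditer(sample)]
print(unclosed_names)  # ['img']
```

The 'a' tag is skipped because a closing `</a>` follows it, so only the 'img' tag is reported.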

Update: this version works without the s flag and also matches tags without attributes: https://regex101.com/r/6nwHQP/1/tests

\<(\w+)(?:\s[^<>]+)?\>(?!(?:.|\n)*\<\/\1\>)
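For the replacement part of the question, one possible approach (a sketch, not necessarily what the answerer had in mind) is to pass a function to `re.sub` that rewrites each matched tag so it ends in `/>`. The helper name `close_unclosed_tags` is hypothetical. The extra check for `/>` is an added assumption: the pattern can also match tags that are already self-closed, such as `<img src="x" />`, and those should be left alone.

```python
import re

# Second pattern from the answer (no flags needed); the redundant
# backslashes before <, > and / are omitted.
UNCLOSED_TAG = re.compile(r'<(\w+)(?:\s[^<>]+)?>(?!(?:.|\n)*</\1>)')


def close_unclosed_tags(markup: str) -> str:
    """Turn opening tags without a later closing tag into self-closed tags."""

    def _self_close(match: re.Match) -> str:
        tag = match.group(0)
        # Leave tags that are already self-closed untouched.
        if tag.endswith('/>'):
            return tag
        # Drop the final '>' and append '/>' instead.
        return tag[:-1] + '/>'

    return UNCLOSED_TAG.sub(_self_close, markup)


sample = '''<a href="a_url" class="url" target="_blank">
   <img ng-src=" {{getImageUrl('homeHelp64.jpg')}}" alt="?" width="16" height="16">
      Home
</a>'''

print(close_unclosed_tags(sample))
```

Note that the lookahead treats a tag as closed if a matching closing tag appears anywhere later in the text, so an unclosed 'img' followed much later by an unrelated `</img>` would be missed. To touch only 'img' tags, the `(\w+)` group could be replaced with `(img)`.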

Answer 1
-1 2020-06-09 16:04:39
