Parsing a div tag with a Python regular expression

Question

A question about Python regular expressions.

I would like to match a div block like this:

<div class="leftTail"><ul class="hotnews">any news stuff</ul></div>

I was thinking of a pattern like:

p = re.compile(r'<div\s+class=\"leftTail\">[^(div)]+</div>')

but it does not seem to work properly.

Another pattern:

p = re.compile(r'<div\s+class=\"leftTail\">[\W|\w]+</div>')

With this one I get much more than I expected: it matches everything up to the last closing div tag in the file.

Thanks for any help.

Answer 1

You might want to consider graduating to an actual HTML parser. I suggest you give Beautiful Soup a try. There are many crazy ways for HTML to be formatted, and regular expressions may not work correctly all the time, even if you write them correctly.
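As a rough illustration of the parser approach, here is a minimal sketch. Beautiful Soup is a third-party package, so this sketch swaps in the standard library's html.parser module instead; it is not Beautiful Soup's API. The class name DivTextCollector and the helper text_in_div are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text inside the first-level <div> with a given class.

    Hypothetical helper for illustration; nested <div> tags are handled
    by counting how deep we are inside the target div.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # 0 means "outside the target div"
        self.chunks = []    # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested div inside the target: go one level deeper.
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            # The opening tag of the target div itself.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def text_in_div(html, target_class="leftTail"):
    """Return the concatenated text found inside matching divs."""
    parser = DivTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = (
    '<div class="leftTail"><ul class="hotnews">any news stuff</ul></div>'
    '<div class="other">unrelated</div>'
)
print(text_in_div(sample))
```

Because the parser tracks opening and closing tags, a div nested inside the leftTail block does not end the capture early, which is the case a single regular expression struggles with.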

Answer 2

Don't use regular expressions to parse XML or HTML. You'll never be able to get it to work correctly for nested divs.
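To make the nested-div problem concrete, here is a small sketch. The sample markup (with an extra inner div) is an assumption made up for this example, not taken from the question. A non-greedy pattern stops at the first closing tag it finds, which belongs to the inner div rather than the outer one.

```python
import re

# Hypothetical input: the leftTail div contains another div.
nested_html = (
    '<div class="leftTail"><div>inner</div>'
    '<ul class="hotnews">news</ul></div>'
)

pattern = re.compile(r'<div\s+class="leftTail">.*?</div>', re.DOTALL)
match = pattern.search(nested_html)

# The match ends at the inner </div>, so the <ul> is cut off.
print(match.group(0))
```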

Answer 3

Try this:

p = re.compile(r'<div\s+class=\"leftTail\">.*?</div>')
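The difference from the pattern in the question is the non-greedy quantifier: `.*?` stops at the first `</div>`, while `[\W|\w]+` runs on to the last one. The sketch below compares the two on made-up sample input. The re.DOTALL flag is an addition that is not in the answer; without it, `.` does not match newlines, so a div spanning several lines would not match.

```python
import re

# Hypothetical page fragment with a second, unrelated div after the target.
page = (
    '<div class="leftTail"><ul class="hotnews">any news stuff</ul></div>\n'
    '<div class="other">unrelated</div>'
)

# Greedy pattern from the question: runs to the LAST </div> in the input.
greedy = re.compile(r'<div\s+class=\"leftTail\">[\W|\w]+</div>')

# Non-greedy pattern from this answer, plus re.DOTALL (an assumption here)
# so that "." can also cross line breaks.
lazy = re.compile(r'<div\s+class=\"leftTail\">.*?</div>', re.DOTALL)

greedy_match = greedy.search(page).group(0)
lazy_match = lazy.search(page).group(0)

print(len(greedy_match), len(lazy_match))
print(lazy_match)
```

This only works as long as the target div contains no nested div, for the reason given in Answer 2.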

3 answers

Answer 1
12 2009-10-09 00:36:04

Answer 2
4 2009-10-09 00:35:04

Answer 3
4 2009-10-09 00:41:39
