python regular expression to parse div tags

Question

I have a question about Python regular expressions.

I would like to match a div block like

<div class="leftTail"><ul class="hotnews">any news stuff</ul></div>

I was thinking of a pattern like

p = re.compile(r'<div\s+class=\"leftTail\">[^(div)]+</div>')

but it does not seem to work properly.

I also tried another pattern:

p = re.compile(r'<div\s+class=\"leftTail\">[\W|\w]+</div>')

With this one I get much more than I expected: it matches everything up to the last closing tag in the file.

Thanks for any help

Answer 1

You might want to consider graduating to an actual HTML parser. I suggest you give Beautiful Soup a try. There are many crazy ways for HTML to be formatted, and regular expressions may not work correctly all the time, even if you write them carefully.

Answer 2

Don't use regular expressions to parse XML or HTML. You'll never be able to get it to work correctly for nested divs.
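To illustrate the parser route, here is a minimal sketch that uses the standard library's html.parser instead of Beautiful Soup, so it needs no extra install. The class name DivExtractor and the sample markup are made up for this example; the idea is simply to count nested div tags so the right closing tag is found.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collect the inner markup of each <div> carrying target_class.

    Hypothetical helper for illustration; character references inside
    the div are decoded to plain text by the parser's defaults.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # div nesting depth inside the current target div
        self.parts = []   # pieces of markup collected for the current div
        self.blocks = []  # finished results, one string per matching div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1
            self.parts.append(self.get_starttag_text())
        elif tag == "div" and self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the nesting depth.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                # The matching close of the target div: store the result.
                self.blocks.append("".join(self.parts))
                self.parts = []
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


sample = (
    '<div class="a">x</div>'
    '<div class="leftTail"><ul class="hotnews"><div>inner</div>any news stuff</ul></div>'
    '<div>after</div>'
)
parser = DivExtractor("leftTail")
parser.feed(sample)
parser.close()
blocks = parser.blocks
print(blocks)
```

The nested inner div is kept inside the result because the depth counter only returns to zero at the closing tag that matches the leftTail div.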

Answer 3

Try this:

p = re.compile(r'<div\s+class=\"leftTail\">.*?</div>')
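A small sketch of how the non-greedy quantifier changes the match compared with a greedy one. The sample HTML is invented for the example, and the re.DOTALL flag is an addition that is not in the pattern above; without it, `.` does not match newlines, so a div spanning several lines would not match at all.

```python
import re

# Invented sample: the target div spans two lines and is followed by another div.
html = (
    '<div class="other">x</div>\n'
    '<div class="leftTail"><ul class="hotnews">any news\nstuff</ul></div>\n'
    '<div class="footer">y</div>'
)

# Greedy: .* runs to the LAST </div> in the input.
greedy = re.compile(r'<div\s+class="leftTail">.*</div>', re.DOTALL)
# Non-greedy: .*? stops at the FIRST </div> after the opening tag.
lazy = re.compile(r'<div\s+class="leftTail">.*?</div>', re.DOTALL)

greedy_match = greedy.search(html).group(0)
lazy_match = lazy.search(html).group(0)

# The non-greedy pattern still breaks on nested divs: it stops at the
# inner closing tag and cuts the block short.
nested = '<div class="leftTail"><div>inner</div>tail</div>'
nested_match = lazy.search(nested).group(0)

print(lazy_match)
print(nested_match)
```

So the non-greedy pattern works for the flat block in the question, but as soon as the leftTail div contains another div it returns a truncated match, which is the limitation the other answers warn about.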

3 answers

solution1
12 2009-10-09 00:36:04

solution2
4 2009-10-09 00:35:04

solution3
4 2009-10-09 00:41:39
