
Python regular expression to parse div tags

A question about Python regular expressions.

I would like to match a div block like

<div class="leftTail"><ul class="hotnews">any news stuff</ul></div>

I was thinking of a pattern like

p = re.compile(r'<div\s+class=\"leftTail\">[^(div)]+</div>')

but it does not seem to work properly.

Another pattern:

p = re.compile(r'<div\s+class=\"leftTail\">[\W|\w]+</div>')

I got much more than I expected: it matches everything up to the last tag in the file.

Thanks for any help

You might want to consider graduating to an actual HTML parser. I suggest you give Beautiful Soup a try. There are many crazy ways for HTML to be formatted, and the regular expressions may not work correctly all the time, even if you write them correctly.

Don't use regular expressions to parse XML or HTML. You'll never be able to get it to work correctly for nested divs.
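To illustrate why a parser copes with nested divs where a regex does not, here is a minimal sketch that tracks nesting depth. It uses the standard library's html.parser rather than Beautiful Soup, only so that it runs without installing anything. The names LeftTailExtractor and extract_left_tail are made up for this example, and the rebuilt inner HTML is approximate (character references come back unescaped).

```python
from html.parser import HTMLParser


class LeftTailExtractor(HTMLParser):
    """Collects the contents of every <div class="leftTail"> block."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # div nesting depth inside the target block (0 = outside)
        self.parts = []    # pieces of the block currently being collected
        self.blocks = []   # finished blocks

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1          # a nested div: go one level deeper
            self.parts.append(self.get_starttag_text())
        elif tag == "div" and "leftTail" in (dict(attrs).get("class") or "").split():
            self.depth = 1               # entered the target div

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the depth.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:          # this </div> closes the target div
                self.blocks.append("".join(self.parts))
                self.parts = []
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_left_tail(html):
    """Hypothetical helper: return the contents of each leftTail div."""
    parser = LeftTailExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


print(extract_left_tail('<div class="leftTail"><div>a</div>b</div><div>other</div>'))
# prints ['<div>a</div>b']
```

The depth counter is the part a regular expression cannot express: the block ends at the </div> that brings the count back to zero, not at the first or the last </div> in the text.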

Try this:

p = re.compile(r'<div\s+class=\"leftTail\">.*?</div>')
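A small comparison of the three patterns from this thread may help show what each one actually does. The sample strings are made up for the illustration. The re.DOTALL flag is my addition, on the assumption that the div may span several lines, since . does not match a newline by default.

```python
import re

html = (
    '<div class="leftTail"><ul class="hotnews">first</ul></div>\n'
    '<div class="other">unrelated</div>'
)

# [\W\w]+ is greedy: it runs to the end and backtracks to the LAST </div>.
greedy = re.compile(r'<div\s+class="leftTail">[\W\w]+</div>')
# .*? is lazy: it stops at the FIRST </div> it can reach.
lazy = re.compile(r'<div\s+class="leftTail">.*?</div>', re.DOTALL)
# [^(div)] is a character class: it excludes the single characters
# '(', 'd', 'i', 'v' and ')', not the word "div".
char_class = re.compile(r'<div\s+class="leftTail">[^(div)]+</div>')

greedy_match = greedy.search(html).group(0)
lazy_match = lazy.search(html).group(0)

print(greedy_match == html)   # True: both divs were swallowed
print(lazy_match)             # <div class="leftTail"><ul class="hotnews">first</ul></div>

# Any content containing d, i or v defeats the character-class pattern.
print(char_class.search('<div class="leftTail">video</div>'))   # None

# The lazy pattern still breaks on nested divs: it stops at the inner </div>.
nested = '<div class="leftTail"><div>inner</div>tail</div>'
nested_match = lazy.search(nested).group(0)
print(nested_match)           # <div class="leftTail"><div>inner</div>
```

So the non-greedy version is fine as long as the leftTail div never contains another div. Once it does, the match is cut short, which is the limitation the other answers warn about.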
