Code:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<div><p>p_string</p><div>div_string</div></div>')
for m in soup.div:
    print "extract(first loop): ", m.extract()
    print "current soup.div(first loop): ", soup.div  # it still contains another div block
print '___________________________________________________________'
# I have to do another for loop to purge the remaining div block, why?
for m in soup.div:
    print "extract(second loop): ", m.extract()
    print "current soup.div(second loop): ", soup.div  # removed
Result:
extract(first loop): <p>p_string</p>
current soup.div(first loop): <div><div>div_string</div></div>
___________________________________________________________
extract(second loop): <div>div_string</div>
current soup.div(second loop): <div></div>
Why didn't it extract all elements (p and div) in the first for loop?
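This is because you are calling extract() inside the loop. extract() removes a tag from the tree, so you are removing the tag's children while iterating over them. Once p is removed, the inner div shifts into the position the iterator has already passed, and the loop ends without visiting it. It is basically the same as iterating over a list and removing items from it inside the loop.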
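The list analogy can be reproduced without Beautiful Soup. The following is only a minimal sketch with plain Python lists; the names items, items_copy, removed_live and removed_snapshot are made up for illustration and do not come from the question.

```python
# Removing from a list while iterating over it skips elements,
# which mirrors what happens to the tag's children in the question.
items = ["p", "div"]
removed_live = []
for item in items:
    items.remove(item)          # the list shrinks under the iterator
    removed_live.append(item)
# "div" is never visited: it moved to index 0 after "p" was removed,
# and the iterator had already advanced past index 0.

# Iterating over a snapshot (a copy) of the list avoids the skip.
items_copy = ["p", "div"]
removed_snapshot = []
for item in list(items_copy):   # list(...) makes an independent copy
    items_copy.remove(item)
    removed_snapshot.append(item)
```

After running this, items still holds "div" while items_copy is empty, matching the one-element-per-loop behaviour seen in the question.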
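Instead, use .find_all(), which returns a separate list of the matching tags, so extracting them does not disturb the iteration: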
for m in soup.div.find_all():
    print m.extract()
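An alternative, if only the direct children should be removed, is to take a snapshot of them before extracting. This is a sketch rather than part of the original answer: it assumes the built-in 'html.parser' backend, and the names outer and extracted are illustrative. Note that .find_all() returns tags only and descends into nested tags, whereas .children yields direct children including text nodes.

```python
from bs4 import BeautifulSoup

# 'html.parser' is an assumption here; the question did not name a parser.
soup = BeautifulSoup('<div><p>p_string</p><div>div_string</div></div>', 'html.parser')
outer = soup.div

# list(...) copies the children first, so extract() cannot
# shift elements underneath the loop.
extracted = [child.extract() for child in list(outer.children)]

for tag in extracted:
    print(tag)
print(soup.div)
```

With this markup, both the p and the inner div are removed in a single pass, leaving the outer div empty.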