Python BeautifulSoup html5lib mix seems to be deleting every other item in for loop
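I'm new to Python, but so far I'm really enjoying the language.
I've been creating a bunch of complex HTML5 elements, and I'm using the html5lib module.
When I loop over the elements in a paragraph I can print them just fine, but when I try to use bs4's insert method I only get output for every other element, and I can't figure out why!
My Python: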
i = 0
for gallery_elem in gallery_header_next_sibling:
    if ( gallery_elem.name.lower() == 'img' ):
        if ( i == 0 ):
            new_gallery = soup.new_tag( "div" )
            new_gallery[ "class" ] = "gallery"
        new_gallery_elem = soup.new_tag( "figure" )
        if ( gallery_elem.has_attr( "alt" ) ):
            new_gallery_cap = soup.new_tag( "figcaption" )
            new_gallery_cap.string = gallery_elem[ "alt" ]
            new_gallery_elem.insert( 2, new_gallery_cap )
        if ( gallery_elem.has_attr( "title" ) ):
            new_gallery_attribution = soup.new_tag( "dl" )
            new_gallery_attribution_dt = soup.new_tag( "dt" )
            new_gallery_attribution_dt.string = "Image owner:"
            new_gallery_attribution_dd = soup.new_tag( "dd" )
            new_gallery_attribution_dd.string = gallery_elem[ "title" ]
            new_gallery_attribution.insert( 0, new_gallery_attribution_dt )
            new_gallery_attribution.insert( 1, new_gallery_attribution_dd )
            new_gallery_elem.insert( 1, new_gallery_attribution )
        new_gallery_elem.insert( 1, gallery_elem )
        i = i + 1
HTML:
<img alt="Caption One." src="img/orange.jpg" title="Attribution One."/>
<img alt="Caption Two." src="img/red.jpg" title="Attribution Two."/>
<img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/>
<img alt="Caption Four." src="img/brolly.jpg" title="Attribution Four."/>
<img alt="Caption Five." src="img/tomy.jpg" title="Attribution Five."/>
Output:
<figure><figcaption>Caption One.</figcaption><img alt="Caption One." src="img/orange.jpg" title="Attribution One."/><dl><dt>Image owner:</dt><dd>Attribution One.</dd></dl></figure>
<figure><figcaption>Caption Three.</figcaption><img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/><dl><dt>Image owner:</dt><dd>Attribution Three.</dd></dl></figure>
<figure><figcaption>Caption Five.</figcaption><img alt="Caption Five." src="img/tomy.jpg" title="Attribution Five."/><dl><dt>Image owner:</dt><dd>Attribution Five.</dd></dl></figure>
If I take out the following line, I get all five elements. Does anyone have any hints as to what I'm doing wrong?
new_gallery_elem.insert( 1, gallery_elem )
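A likely explanation, which the post itself does not state: inserting gallery_elem into the new figure moves it out of its original parent, so the collection being looped over shrinks mid-iteration and the loop steps past the next sibling. The sketch below reproduces the same pattern with a plain Python list. The names children and visited are made up for illustration, and it assumes that looping over a tag walks its list of children.

```python
# Plain-list analogy: `children` stands in for a tag's child list, and
# remove() stands in for an element being moved out of its parent.
children = ["one", "two", "three", "four", "five"]
visited = []

for child in children:
    visited.append(child)
    # Shrinking the list while the loop is still walking it shifts the
    # remaining items left, so the next item is never visited.
    children.remove(child)

print(visited)   # ['one', 'three', 'five']
print(children)  # ['two', 'four']
```

This matches the symptom above, where only images one, three and five end up in the output.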
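So, after some experimenting, I found that storing the elements I need in a list and then retrieving them from that list, rather than trying to edit the soup in place, solves my problem.
Once the objects have been created and stored, they can be added back into the parent element that was created and inserted into the soup earlier.
I hope this saves someone else from premature baldness.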
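A minimal sketch of that approach, not the exact code from the post: the images are collected into a list first, and only then moved into new figure elements. The wrapping p tag, the variable names, and the use of "html.parser" (chosen here only to avoid the html5lib dependency) are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup

html = (
    '<p>'
    '<img alt="Caption One." src="img/orange.jpg" title="Attribution One."/>'
    '<img alt="Caption Two." src="img/red.jpg" title="Attribution Two."/>'
    '<img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")
paragraph = soup.p  # hypothetical stand-in for gallery_header_next_sibling

# Snapshot the <img> tags before touching the tree, so that moving them
# cannot disturb the loop below.
images = list(paragraph.find_all("img"))

# Create the parent element and put it into the soup first.
gallery = soup.new_tag("div")
gallery["class"] = "gallery"
paragraph.insert_before(gallery)

for img in images:
    figure = soup.new_tag("figure")
    if img.has_attr("alt"):
        caption = soup.new_tag("figcaption")
        caption.string = img["alt"]
        figure.append(caption)
    figure.append(img)  # moves the <img> out of the paragraph
    if img.has_attr("title"):
        attribution = soup.new_tag("dl")
        dt = soup.new_tag("dt")
        dt.string = "Image owner:"
        dd = soup.new_tag("dd")
        dd.string = img["title"]
        attribution.append(dt)
        attribution.append(dd)
        figure.append(attribution)
    gallery.append(figure)

figure_sources = [fig.img["src"] for fig in gallery.find_all("figure")]
print(len(figure_sources))  # 3
```

Because the loop runs over the separate images list rather than over the paragraph itself, every image gets its own figure.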