How to use BeautifulSoup to get an attribute value when the attribute name is duplicated
I wrote the following Python code to parse HTML with BeautifulSoup:
from bs4 import BeautifulSoup

parsed_html = BeautifulSoup('<img id = \'defualtPagePic\' src="http://my.com/images/realTarget.jpg" alt="test" src="http://my.com/images/fakeTarget.jpg" alt="too bad" onError="this.src=\'http://my.com/images/veryBad.jpg\';" />', "html.parser")
# Look up the tag by its id, then print all attributes and the src value
print("a >> " + str(parsed_html.find(id="defualtPagePic").attrs))
print("b >> " + str(parsed_html.find(id="defualtPagePic")['src']))
This is the output:
a >> {'id': 'defualtPagePic', 'src': 'http://my.com/images/fakeTarget.jpg', 'alt': 'too bad', 'onerror': "this.src='http://my.com/images/veryBad.jpg';"}
b >> http://my.com/images/fakeTarget.jpg
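I want to get "realTarget.jpg", but I get "fakeTarget.jpg" instead. I think the reason is that BeautifulSoup always keeps the last value when an attribute name appears more than once.
Any suggestions for this situation?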
You can switch to the lxml parser, as shown below:
from bs4 import BeautifulSoup

html = '<img id = \'defualtPagePic\' src="http://my.com/images/realTarget.jpg" alt="test" src="http://my.com/images/fakeTarget.jpg" alt="too bad" onError="this.src=\'http://my.com/images/veryBad.jpg\';" />'
# With the lxml parser, the first value of a duplicated attribute is kept
soup = BeautifulSoup(html, "lxml")
print(soup.img['src'])
This will then display:
http://my.com/images/realTarget.jpg
If you do not already have it, lxml needs to be installed separately.
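If installing lxml is not an option, one possible alternative is to read the raw attribute list yourself with the standard library's html.parser module, which reports every attribute in source order, duplicates included. The sketch below is only an illustration under that assumption: the class name FirstAttrParser and its images list are hypothetical names, not part of BeautifulSoup or the original answer.

```python
from html.parser import HTMLParser


class FirstAttrParser(HTMLParser):
    """Hypothetical helper: records the first value of each attribute on <img> tags."""

    def __init__(self):
        super().__init__()
        self.images = []  # one dict of attributes per <img> tag found

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order, duplicates included
        if tag != "img":
            return
        first_values = {}
        for name, value in attrs:
            # setdefault keeps the first value seen and ignores later duplicates
            first_values.setdefault(name, value)
        self.images.append(first_values)


html = '<img id = \'defualtPagePic\' src="http://my.com/images/realTarget.jpg" alt="test" src="http://my.com/images/fakeTarget.jpg" alt="too bad" onError="this.src=\'http://my.com/images/veryBad.jpg\';" />'

parser = FirstAttrParser()
parser.feed(html)
parser.close()
print(parser.images[0]["src"])  # http://my.com/images/realTarget.jpg
```

This trades BeautifulSoup's convenient tree navigation for direct control over how duplicated attributes are resolved, so it is best suited to cases where only a few attributes are needed.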