BeautifulSoup: get elements that have a certain attribute, independent of its value
Imagine I have the following HTML:
<div id='0'>
stuff here
</div>
<div id='1'>
stuff here
</div>
<div id='2'>
stuff here
</div>
<div id='3'>
stuff here
</div>
Is there a simple way to extract all divs that have the attribute id, independent of its value, using BeautifulSoup? I realize it is trivial to do this with XPath, but it seems that there's no way to do an XPath search in BeautifulSoup.
Use id=True to match only elements that have the attribute set:
soup.find_all('div', id=True)
The inverse works too; you can exclude tags with the id attribute:
soup.find_all('div', id=False)
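The keyword form only works when the attribute name is usable as a Python keyword argument. For names such as data-role (a made-up attribute used here purely for illustration), the same True/False test can be passed through the attrs dictionary instead. A minimal sketch, assuming bs4 is installed and using the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; data-role is an illustrative attribute name.
html = '''
<div data-role="main">first</div>
<div>second</div>
<div data-role="aside">third</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# True matches any div that carries the attribute, whatever its value.
with_attr = soup.find_all('div', attrs={'data-role': True})

# False matches divs that do not carry the attribute at all.
without_attr = soup.find_all('div', attrs={'data-role': False})

print([d.get_text() for d in with_attr])
print([d.get_text() for d in without_attr])
```

The attrs dictionary and the keyword form can be mixed in one call if several attributes need to be tested together.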
To find tags with a given attribute you can also use CSS selectors:
soup.select('div[id]')
but, at least in BeautifulSoup releases before 4.7.0, this does not support the operators needed to search for the inverse, unfortunately.
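Newer releases of BeautifulSoup hand select() over to the Soup Sieve package, which understands the :not() pseudo-class, so there the inverse can be written as a selector as well. A sketch, assuming bs4 4.7.0 or later with soupsieve installed; on older versions this selector is expected to fail:

```python
from bs4 import BeautifulSoup

html = '''
<div id="id1">This has an id</div>
<div>This has none</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# :not([id]) selects div elements that lack an id attribute.
no_id = soup.select('div:not([id])')
print(no_id)
```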
Demo:
>>> from bs4 import BeautifulSoup
>>> sample = '''\
... <div id="id1">This has an id</div>
... <div>This has none</div>
... <div id="id2">This one has an id too</div>
... <div>But this one has no clue (or id)</div>
... '''
>>> soup = BeautifulSoup(sample, 'html.parser')
>>> soup.find_all('div', id=True)
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>]
>>> soup.find_all('div', id=False)
[<div>This has none</div>, <div>But this one has no clue (or id)</div>]
>>> soup.select('div[id]')
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>]
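Once the matching tags are found, the attribute values themselves can be read with subscript access, which is safe here because id=True guarantees the attribute exists. A small sketch along the lines of the demo above (html.parser is named explicitly as an assumption about the parser in use):

```python
from bs4 import BeautifulSoup

sample = '''
<div id="id1">This has an id</div>
<div>This has none</div>
<div id="id2">This one has an id too</div>
'''
soup = BeautifulSoup(sample, 'html.parser')

# Every tag returned by id=True has an id, so tag['id'] cannot raise KeyError.
ids = [tag['id'] for tag in soup.find_all('div', id=True)]
print(ids)
```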
BeautifulSoup4 supports commonly used CSS selectors.
>>> import bs4
>>>
>>> soup = bs4.BeautifulSoup('''
... <div id="0"> this </div>
... <div> not this </div>
... <div id="2"> this too </div>
... ''', 'html.parser')
>>> soup.select('div[id]')
[<div id="0"> this </div>, <div id="2"> this too </div>]
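If the condition is more involved than the presence of one attribute, find_all also accepts a function that receives each tag and returns True or False. The sketch below uses Tag.has_attr for the presence test; the helper name div_with_id and the extra span element are made up for illustration:

```python
from bs4 import BeautifulSoup

html = '''
<div id="0"> this </div>
<div> not this </div>
<span id="1"> not a div </span>
<div id="2"> this too </div>
'''
soup = BeautifulSoup(html, 'html.parser')

def div_with_id(tag):
    # Hypothetical helper: keep only div tags that carry an id attribute.
    return tag.name == 'div' and tag.has_attr('id')

matches = soup.find_all(div_with_id)
print([tag['id'] for tag in matches])
```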