Beautiful Soup: searching for a nested pattern?
soup.find_all will search a BeautifulSoup document for all occurrences of a single tag. Is there a way to search for particular patterns of nested tags?
For example, I would like to search for all occurrences of this pattern:
<div class="separator">
<a>
<img />
</a>
</div>
There are multiple ways to find the pattern, but the easiest one would be to use a CSS selector:
for img in soup.select('div.separator > a > img'):
    print(img)  # or img.parent.parent to get the "div"
Demo:
>>> from bs4 import BeautifulSoup
>>> data = """
... <div>
... <div class="separator">
... <a>
... <img src="test1"/>
... </a>
... </div>
...
... <div class="separator">
... <a>
... <img src="test2"/>
... </a>
... </div>
...
... <div>test3</div>
...
... <div>
... <a>test4</a>
... </div>
... </div>
... """
>>> soup = BeautifulSoup(data, 'html.parser')
>>>
>>> for img in soup.select('div.separator > a > img'):
...     print(img.get('src'))
...
test1
test2
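Since the answer mentions that there are multiple ways to do this, here is a rough sketch of one alternative that avoids CSS selectors and uses find_all with recursive=False to walk the nesting level by level. The helper name find_nested_imgs and the sample markup are made up for illustration; this is only one possible way to write it.

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the demo above (illustrative only).
html = """
<div>
  <div class="separator"><a><img src="test1"/></a></div>
  <div class="separator"><a><img src="test2"/></a></div>
  <div>test3</div>
  <div><a>test4</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def find_nested_imgs(soup):
    """Hypothetical helper: collect img tags that are direct children of an
    a tag that is itself a direct child of a div with class "separator"."""
    matches = []
    for div in soup.find_all("div", class_="separator"):
        # recursive=False restricts the search to direct children,
        # which is what the ">" combinator does in the CSS selector.
        for a in div.find_all("a", recursive=False):
            matches.extend(a.find_all("img", recursive=False))
    return matches


srcs = [img.get("src") for img in find_nested_imgs(soup)]
print(srcs)
```

This should pick out the same elements as select('div.separator > a > img'); the selector is shorter, while the loop makes it easier to add custom conditions at each level.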
I do understand that, strictly speaking, the selector is not an exact match for the pattern: it would also match if the div has more than just one a child, or if the a tag contains something else besides the img tag. If this is the case, the solution can be improved with additional checks (will edit the answer if needed).
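The additional checks mentioned above could look something like the following. This is a minimal sketch, not the answer author's code: the helper names significant_children and is_strict_match are hypothetical, and it assumes that whitespace-only text between tags should be ignored while any other text or extra element disqualifies the div.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Illustrative markup: only the first and last div match the pattern exactly.
html = """
<div class="separator"><a><img src="ok"/></a></div>
<div class="separator"><a><img src="extra-text"/>caption</a></div>
<div class="separator"><a><img src="two-links"/></a><a>more</a></div>
<div class="separator">
  <a>
    <img src="ok-whitespace"/>
  </a>
</div>
"""


def significant_children(tag):
    """Direct children of tag, ignoring whitespace-only text nodes."""
    return [
        child for child in tag.children
        if not (isinstance(child, NavigableString) and not child.strip())
    ]


def is_strict_match(div):
    """True if div holds exactly one a, which holds exactly one img."""
    div_children = significant_children(div)
    if len(div_children) != 1:
        return False
    a = div_children[0]
    if not (isinstance(a, Tag) and a.name == "a"):
        return False
    a_children = significant_children(a)
    return (
        len(a_children) == 1
        and isinstance(a_children[0], Tag)
        and a_children[0].name == "img"
    )


soup = BeautifulSoup(html, "html.parser")
strict = [
    div.a.img.get("src")
    for div in soup.find_all("div", class_="separator")
    if is_strict_match(div)
]
print(strict)
```

Here the plain selector would return all four img tags, while the strict check keeps only the divs whose sole content is a single a wrapping a single img.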