Select all divs except ones with certain classes in BeautifulSoup
As discussed in this question, one can easily get all divs with certain classes. But here I have a list of classes that I want to exclude, and I want to get all divs that don't have any of the classes in the list.
For example:
classToIgnore = ["class1", "class2", "class3"]
Now I want to get all divs that don't contain any of the classes listed above. How can I achieve that?
Alternate solution
soup.find_all('div', class_=lambda x: x not in classToIgnore)
Example
from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))
Output
[<div class="c3"></div>, <div class="c4"></div>]
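One caveat with the lambda above: as far as I understand, for the multi-valued class attribute BeautifulSoup calls the filter on each class separately, so a div such as class="c1 c3" can still match through c3, and a div with no class at all is passed None and matches too. A minimal sketch that checks the whole class list of each tag instead; the helper name has_no_ignored_class and the sample HTML are made up for illustration and are not part of the original answer:

```python
from bs4 import BeautifulSoup

html = """
<div class="c1"></div>
<div class="c1 c3"></div>
<div class="c3"></div>
<div></div>
"""

soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]


def has_no_ignored_class(tag):
    # Hypothetical helper: accept only <div> tags whose class list
    # shares no entry with classToIgnore.
    if tag.name != "div":
        return False
    classes = tag.get("class") or []  # no class attribute -> empty list
    return set(classes).isdisjoint(classToIgnore)


kept = soup.find_all(has_no_ignored_class)
print(kept)
# [<div class="c3"></div>, <div></div>]
```

Here the div with class="c1 c3" is dropped because one of its classes is in the list. Whether class-less divs should be kept is a judgment call; return False for an empty class list if you want to exclude them.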
If you are dealing with nested divs, try removing the unwanted inner divs with decompose() and then just call find_all('div'):
for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))
This may leave some extra whitespace behind, but you can easily strip that off later.
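To make the nested case concrete, here is a small self-contained sketch of the decompose approach. The sample HTML is invented for illustration; only the loop itself comes from the answer above.

```python
from bs4 import BeautifulSoup

# Made-up sample: an outer div that contains one unwanted and one wanted div.
html = '<div class="c3"><div class="c1">skip</div><div class="c4">keep</div></div>'

soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

# Remove every div carrying an ignored class from the tree entirely.
for div in soup.find_all("div", class_=lambda x: x in classToIgnore):
    div.decompose()

# What is left no longer contains the unwanted inner div.
remaining = soup.find_all("div")
print(remaining)
```

Note that decompose() modifies the tree in place, so parse the HTML again (or work on a copy) if the removed divs are still needed elsewhere.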