
Select all divs except ones with certain classes in BeautifulSoup

As discussed in this question, one can easily get all divs with certain classes. Here, however, I have a list of classes that I want to exclude, and I want to get all divs that don't have any of the classes in that list.

For example:

classToIgnore = ["class1", "class2", "class3"]

Now I want to get all divs that don't contain any of the classes in the list above. How can I achieve that?

Using a CSS selector, try this:

divs = soup.select("div:not(.class1, .class2, .class3)")
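A self-contained sketch of the CSS-selector approach that builds the :not(...) argument from the classToIgnore list instead of typing it out by hand. The sample HTML is made up for illustration, and the simple join assumes the class names are plain CSS identifiers that need no escaping.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML, for illustration only.
html = """
<div class="class1">a</div>
<div class="class2 other">b</div>
<div class="other">c</div>
<div>d</div>
"""
soup = BeautifulSoup(html, "html.parser")

classToIgnore = ["class1", "class2", "class3"]

# Build "div:not(.class1, .class2, .class3)" from the list.
# Assumes every class name is a plain identifier (no escaping is done).
selector = "div:not({})".format(", ".join("." + c for c in classToIgnore))

# A div is excluded if it has ANY of the listed classes; divs with no
# class attribute at all are kept.
divs = soup.select(selector)
print(selector)
print([d.get_text() for d in divs])
```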


Alternate solution:

soup.find_all('div', class_=lambda x: x not in classToIgnore)

Example:

from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))

Output:

[<div class="c3"></div>, <div class="c4"></div>]
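One caveat with the class_ lambda: class is a multi-valued attribute, so BeautifulSoup calls the function on each individual class of a tag and treats the tag as a match if any of those calls returns True. A div like <div class="c1 c3"> is therefore still returned even though it has c1. The sketch below instead passes find_all a function that receives the whole tag and checks its full class list; the HTML and the helper name keep_div are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML, including a div with two classes.
html = """
<div class="c1"></div>
<div class="c1 c3"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = {"c1", "c2"}  # a set, so isdisjoint() is available

def keep_div(tag):
    """Hypothetical helper: True for <div> tags sharing no class with classToIgnore."""
    # tag.get("class") is a list of classes, or None if the attribute is absent.
    classes = tag.get("class") or []
    return tag.name == "div" and classToIgnore.isdisjoint(classes)

divs = soup.find_all(keep_div)
print(divs)

# For contrast: the per-class lambda lets the "c1 c3" div through,
# because the call with "c3" returns True.
naive = soup.find_all("div", class_=lambda x: x not in classToIgnore)
```

If you start from the original list, convert it first with set(classToIgnore), since lists have no isdisjoint() method.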

If you are dealing with nested divs (for example, a div you want that contains a div with an unwanted class), try deleting the unwanted divs with decompose() first, and then just call find_all('div'):

for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))

This might leave some extra whitespace behind, but you can easily strip it off later.
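A small end-to-end run of the decompose() approach on hypothetical HTML in which a div you want contains a div you don't. decompose() removes the tag and everything inside it from the tree in place, so the original soup is modified; parse a fresh copy if you still need the full document. get_text(strip=True) is one way to deal with the leftover whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: a wanted outer div wrapping an unwanted inner div.
html = """
<div class="keep">
  outer text
  <div class="c1">unwanted inner</div>
</div>
<div class="c2">unwanted</div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

# Remove every div that carries an ignored class (this mutates `soup`).
for div in soup.find_all("div", class_=lambda x: x in classToIgnore):
    div.decompose()

remaining = soup.find_all("div")
# strip=True trims each text fragment and drops whitespace-only ones.
print([d.get_text(strip=True) for d in remaining])
```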
