Is there any strict findAll function in BeautifulSoup?

I am using Python 2.7 and BeautifulSoup.

Apologies if I am unable to explain exactly what I want.

There is an HTML page in which the data is embedded in a specific structure. I want to pull the data while ignoring the first block.

But the problem is that when I do:

self.tab = soup.findAll("div","listing-row") 

it also returns the first block, which is an unwanted HTML block:

("div","listing-row wide-featured-listing")

I am not using

soup.find("div","listing-row")

since I want every element on the page whose class is exactly "listing-row", not just the first one.

How can I ignore the elements whose class is "listing-row wide-featured-listing"?

Help or guidance in any form is appreciated. Thanks a lot!
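
A minimal reproduction of the behaviour described in the question, assuming a small hand-written HTML sample (the markup and the names `html` and `class_lists` are illustrative, not taken from the real page). BeautifulSoup treats `class` as a multi-valued attribute, so a search for "listing-row" matches any tag that has that name among its classes:

```python
from bs4 import BeautifulSoup

# Illustrative sample markup (an assumption, not the asker's real page).
html = """
<div>
    <div class="listing-row wide-featured-listing">featured</div>
    <div class="listing-row">result1</div>
    <div class="listing-row">result2</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Each tag stores its class attribute as a list of names, and the search
# matches if "listing-row" is any one of them.
class_lists = [div.get("class") for div in soup.find_all("div", "listing-row")]
print(class_lists)
```

The featured block is included because "listing-row" is one of its two classes.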

Alternatively, you can use a CSS attribute selector that matches only when the class attribute is exactly listing-row:

soup.select("div[class=listing-row]")

Demo:

>>> from bs4 import BeautifulSoup
>>> 
>>> data = """
... <div>
...     <div class="listing-row">result1</div>
...     <div class="listing-row wide-featured-listing">result2</div>
...     <div class="listing-row">result3</div>
... </div>
... """
>>> 
>>> soup = BeautifulSoup(data, "html.parser")
>>> print [row.text for row in soup.select("div[class=listing-row]")]
[u'result1', u'result3']

You could just filter out that element:

self.tab = [el for el in soup.find_all('div', class_='listing-row')
            if 'wide-featured-listing' not in el['class']]
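
A self-contained sketch of the filtering approach, assuming sample markup like that in the demo (the names `html`, `rows`, and `exact_rows` are illustrative). It also shows a stricter variant that keeps only tags whose class list is exactly `["listing-row"]`, which may be closer to a "strict" match if other extra classes can appear:

```python
from bs4 import BeautifulSoup

# Illustrative sample markup (an assumption, not the asker's real page).
html = """
<div>
    <div class="listing-row">result1</div>
    <div class="listing-row wide-featured-listing">result2</div>
    <div class="listing-row">result3</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
candidates = soup.find_all("div", class_="listing-row")

# Drop only the tags that carry the unwanted extra class.
rows = [el for el in candidates
        if "wide-featured-listing" not in el["class"]]

# Stricter variant: keep tags whose class list is exactly ["listing-row"].
exact_rows = [el for el in candidates if el["class"] == ["listing-row"]]

texts = [el.get_text() for el in rows]
exact_texts = [el.get_text() for el in exact_rows]
print(texts)
print(exact_texts)
```

The first filter excludes one known class; the second rejects any tag with additional classes, whatever they are called.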

You could use a custom function:

self.tab = soup.find_all(lambda e: e.name == 'div' and
                                   'listing-row' in e.get('class', []) and
                                   'wide-featured-listing' not in e['class'])
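
A runnable sketch of the custom-function approach, written with a named helper instead of a lambda. The helper name `is_plain_listing_row` and the sample markup are hypothetical. Using `tag.get("class", [])` avoids a `KeyError` on tags that have no class attribute, such as the outer `div` here:

```python
from bs4 import BeautifulSoup


def is_plain_listing_row(tag):
    """Hypothetical filter: a div with listing-row but not the featured class."""
    classes = tag.get("class", [])  # empty list when the tag has no class
    return (tag.name == "div"
            and "listing-row" in classes
            and "wide-featured-listing" not in classes)


# Illustrative sample markup (an assumption, not the asker's real page).
html = """
<div>
    <div class="listing-row">result1</div>
    <div class="listing-row wide-featured-listing">result2</div>
    <div class="listing-row">result3</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all calls the function once per tag and keeps those returning True.
plain_texts = [tag.get_text() for tag in soup.find_all(is_plain_listing_row)]
print(plain_texts)
```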
