Why do I keep getting more than one item in my list when retrieving information with BS4 and requests?

from bs4 import BeautifulSoup
import requests

Webpage = requests.get('https://www.brainyquote.com/quote_of_the_day')

soup = BeautifulSoup( Webpage.content, 'html.parser')

qoute = soup.find(class_='qotd-q-cntr')
words = [qoute.find('a').text for item in qoute]


print(words) 

When I print the variable words, the same quote appears three times in my list, but I only want it once. My output looks similar to this:

['qoute','qoute','qoute']

I would like it to look like this instead:

['qoute']

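This happens because you are scraping by class attribute, and the class you chose is the one shared by all the quotes on that page, as you can see when you inspect the website. Note also that the list comprehension loops over the children of the element returned by soup.find() but evaluates the same expression, qoute.find('a').text, on every pass, so the same text is added once per child.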

[Image: quotes_1]
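The repetition can be reproduced without touching the network. The sketch below uses a small made-up HTML snippet (an assumption, not the real page markup) whose container has three children, and shows that iterating over a BeautifulSoup Tag yields its direct children while the loop body keeps returning the same text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: one container, three children.
html = '<div class="qotd-q-cntr"><a>First quote</a><a>Author</a><span>extra</span></div>'

soup = BeautifulSoup(html, "html.parser")
container = soup.find(class_="qotd-q-cntr")

# Iterating over a Tag yields its direct children, so this loop runs three times.
# The expression never uses `item`, so the same text is collected on every pass.
repeated = [container.find("a").text for item in container]

# Asking for the text once gives a single-item list.
single = [container.find("a").text]

print(repeated)
print(single)
```

If the container really is the right element, dropping the loop as in the last assignment is probably the smallest change that yields a one-item list.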

Instead, search for something more specific, such as an h2 tag with class qotd-h2 and the inner text "Quote of the Day".

[Image: quotes_2]

Once you have that anchor element, you can traverse the DOM from it to reach the quote.

Example

from bs4 import BeautifulSoup
import requests

webpage = requests.get('https://www.brainyquote.com/quote_of_the_day')

soup = BeautifulSoup(webpage.content, 'html.parser')

# Find the "Quote of the Day" heading to use as an anchor
# (string= is the current name for the older text= argument)
anchor = soup.find('h2', class_='qotd-h2', string="Quote of the Day")
quote_div = anchor.parent.find(class_="clearfix")  # the div surrounding the quote
quote = quote_div.find(title="view quote")  # the tag holding the quote text

print(quote.text)
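Because the page layout can change, any of these find() calls may return None and the chained lookups would then raise an AttributeError. A minimal sketch of the same anchor-then-traverse idea with guards is shown below. The function name quote_of_the_day and the inline HTML are hypothetical, made up for illustration so the snippet runs offline, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure may differ and can change.
html = """
<div class="wrapper">
  <h2 class="qotd-h2">Quote of the Day</h2>
  <div class="clearfix"><a title="view quote">An example quote.</a></div>
</div>
<div class="wrapper">
  <h2 class="qotd-h2">Art Quote of the Day</h2>
  <div class="clearfix"><a title="view quote">Another quote.</a></div>
</div>
"""


def quote_of_the_day(markup):
    """Return the text under the 'Quote of the Day' heading, or None if not found."""
    soup = BeautifulSoup(markup, "html.parser")
    # string= matches the heading text exactly, so "Art Quote of the Day" is skipped.
    anchor = soup.find("h2", class_="qotd-h2", string="Quote of the Day")
    if anchor is None:
        return None
    quote_div = anchor.parent.find(class_="clearfix")
    if quote_div is None:
        return None
    quote = quote_div.find(title="view quote")
    return quote.get_text(strip=True) if quote else None


print(quote_of_the_day(html))
print(quote_of_the_day("<p>no quote here</p>"))
```

To use it against the live site, pass requests.get(...).content as the markup argument in place of the inline string.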
