Why do I keep getting more than one item in my list when retrieving information with BS4 and requests?
from bs4 import BeautifulSoup
import requests
Webpage = requests.get('https://www.brainyquote.com/quote_of_the_day')
soup = BeautifulSoup( Webpage.content, 'html.parser')
qoute = soup.find(class_='qotd-q-cntr')
words = [qoute.find('a').text for item in qoute]
print(words)
When I print the variable words, the same quote appears three times in my list, but I only want it once. My output looks similar to this:
['qoute','qoute','qoute']
I would like it to look like this instead:
['qoute']
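This is because you are scraping via the class attribute, and the class you gave, qotd-q-cntr, is the one used for all the quotes when you inspect that website, so it is not specific to the quote of the day. Note also that iterating over a tag in a list comprehension loops over its children, and qoute.find('a').text is evaluated once per child, so the same text is added to the list each time.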
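To see where the repeats come from, here is a minimal offline sketch. The HTML string is hypothetical markup standing in for the real page (the site's actual structure may differ), and the names container, repeated and single are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual HTML may differ.
html = """
<div class="qotd-q-cntr"><a title="view quote">Stay curious.</a><a title="view author">Someone</a><span>share</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find(class_="qotd-q-cntr")

# Iterating over a Tag walks its direct children (three in this sample),
# but the expression never uses `item`, so the same text is appended each time.
repeated = [container.find("a").text for item in container]

# Calling find() once, without the loop, gives the single value that was wanted.
single = [container.find("a").text]

print(repeated)
print(single)
```

The length of the first list depends only on how many children the container has, which is why the real page happened to give three copies.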
Instead, search for something more specific, such as an h2 tag with class qotd-h2 and inner text "Quote of the Day".
Once you have that element as an anchor point, you can traverse the DOM from it to reach the quote.
Example
from bs4 import BeautifulSoup
import requests
Webpage = requests.get('https://www.brainyquote.com/quote_of_the_day')
soup = BeautifulSoup( Webpage.content, 'html.parser')
# Find the "Quote of the Day" heading (string= replaces the deprecated text= argument)
anchor = soup.find('h2', class_='qotd-h2', string="Quote of the Day")
quoteDiv = anchor.parent.find(class_="clearfix")  # The div surrounding the quote
quote = quoteDiv.find(title="view quote")  # The tag holding the quote text
print(quote.text)
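If the page layout changes, any of the find() calls in the example above can return None and the script will fail with an AttributeError. Below is a rough sketch of a more defensive variant. The function name extract_quote and the sample HTML are hypothetical, and it assumes a BeautifulSoup version that accepts the string= argument (4.4.0 or later).

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_quote(html: str) -> Optional[str]:
    """Return the quote text, or None if an expected element is missing."""
    soup = BeautifulSoup(html, "html.parser")

    # Anchor on the heading, then walk to its parent and back down to the quote.
    heading = soup.find("h2", class_="qotd-h2", string="Quote of the Day")
    if heading is None:
        return None

    quote_div = heading.parent.find(class_="clearfix")
    if quote_div is None:
        return None

    quote = quote_div.find(title="view quote")
    if quote is None:
        return None
    return quote.get_text(strip=True)


# Hypothetical markup for illustration only; the real page may be structured differently.
sample = """
<div>
  <h2 class="qotd-h2">Quote of the Day</h2>
  <div class="clearfix"><a title="view quote">Stay curious.</a></div>
</div>
"""

result = extract_quote(sample)
missing = extract_quote("<p>no heading here</p>")
print(result)
print(missing)
```

With the live site you would pass Webpage.content (or Webpage.text) to the function instead of the sample string.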