Trouble scraping a specific 'span' class using BeautifulSoup

I am scraping https://ca.finance.yahoo.com/quote/AAPL and want to get the change in stock price (the text shown in green or red). I have been able to scrape the stock price but not the change value: both are inside the same 'div', but in 'span' elements with different classes.

Text I want: '-3.89 (-1.36%)' (the numbers will vary)

HTML from the website:

    <div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="29">
        <div class="D(ib) Va(m) Maw(65%) Ov(h)" data-reactid="30">
            <div class="D(ib) Mend(20px)" data-reactid="31">
                <span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="32">282.80</span>
                <span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($negativeColor)" data-reactid="33">-3.89 (-1.36%)</span>

What I've used to get the price (282.80; the price will vary):

    import requests
    from bs4 import BeautifulSoup

    stockLink = 'https://ca.finance.yahoo.com/quote/AAPL'
    stockPage = requests.get(stockLink)
    stockSoup = BeautifulSoup(stockPage.text, 'lxml')
    stockQuote = stockSoup.find('div', {'class': 'My(6px)Pos(r)smartphone_Mt(6px)'}).find('span').text
    print(stockQuote)
                 

I've tried many variations (changing the class name, changing the span name, and using data-reactid), but none of them seem to work; they all output an empty "[]".

Thank you very much.

It looks like the problem is in this line:

    stockQuote = stockSoup.find('div', {'class': 'My(6px)Pos(r)smartphone_Mt(6px)'}).find('span').text

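The class names should be separated by spaces, because each one is a separate class in the HTML. When the string you pass contains spaces, BeautifulSoup compares it with the whole class attribute value; with the spaces removed it matches neither a single class nor the full value, so find() returns None.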

The solution is to separate them exactly as they appear in the page's HTML. It would look like this (two spaces have been added to the class string):

    stockQuote = stockSoup.find('div', {'class': 'My(6px) Pos(r) smartphone_Mt(6px)'}).find('span').text
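To see why the spaces matter, here is a small sketch that runs both lookups against a trimmed copy of the HTML from the question rather than the live page (an assumption: Yahoo's markup may have changed since, and 'html.parser' stands in for 'lxml' so only BeautifulSoup is needed). It also shows that matching a single class name is enough to find the div.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the snippet from the question (the live page may differ).
HTML = """
<div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="29">
  <div class="D(ib) Va(m) Maw(65%) Ov(h)" data-reactid="30">
    <div class="D(ib) Mend(20px)" data-reactid="31">
      <span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="32">282.80</span>
      <span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($negativeColor)" data-reactid="33">-3.89 (-1.36%)</span>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Spaces removed: equals neither one class nor the full attribute value.
no_spaces = soup.find("div", {"class": "My(6px)Pos(r)smartphone_Mt(6px)"})

# Spaces kept: equals the full class attribute value, so the div is found.
with_spaces = soup.find("div", {"class": "My(6px) Pos(r) smartphone_Mt(6px)"})

# A single class name from the attribute also matches the same div.
one_class = soup.find("div", {"class": "smartphone_Mt(6px)"})

print(no_spaces)                      # None
print(with_spaces.find("span").text)  # 282.80
print(one_class is with_spaces)       # True
```

Note that the full-string form only matches when the classes are written in the same order with single spaces, so searching for one distinctive class is usually less brittle.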

However, this returns the price, the number before the red/green text. Because this div contains more than one span, you have to find all of them. This is how I did it:

    spans = stockSoup.find('div', {'class': 'My(6px) Pos(r) smartphone_Mt(6px)'}).find_all_next('span')
    stockQuote = spans[1].text  # the second span holds the change, e.g. '-3.89 (-1.36%)'

find_all_next (findAllNext in the older camelCase spelling) returns every matching span that comes after the start of the div in the document, not only the spans inside it, which is why it returns about 36 of them. The one you are looking for is the second. Then you get its text as before, and it should return the number you are looking for.
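The difference between find_all_next() and find_all() can be sketched on a trimmed copy of the question's HTML. As an assumption, one extra, hypothetical span is placed after the div to stand in for the rest of the page: find_all_next() keeps going past the end of the div, while find_all() on the div returns only its descendants, so the change is the second of exactly two spans.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's snippet, plus a hypothetical span after the
# div standing in for the rest of the page.
HTML = """
<div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="29">
  <div class="D(ib) Mend(20px)" data-reactid="31">
    <span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="32">282.80</span>
    <span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($negativeColor)" data-reactid="33">-3.89 (-1.36%)</span>
  </div>
</div>
<span>Market closed</span>
"""

soup = BeautifulSoup(HTML, "html.parser")
quote_div = soup.find("div", {"class": "My(6px) Pos(r) smartphone_Mt(6px)"})

# Every span after the div's opening tag, including ones outside the div.
after = quote_div.find_all_next("span")

# Only the spans nested inside the div.
inside = quote_div.find_all("span")

print(len(after), len(inside))  # 3 2
price, change = [span.text for span in inside]
print(price, change)            # 282.80 -3.89 (-1.36%)
```

Indexing into the div's own spans avoids depending on how much of the page happens to follow the div.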

Although web scraping is a useful tool, it may be worth looking into yfinance, an unofficial, community-maintained Python library for Yahoo Finance data.
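A rough sketch of the yfinance route, assuming the package is installed (pip install yfinance) and that the last two daily closes returned by Ticker.history() are a fair stand-in for the current price and the previous close; during trading hours the last row reflects the latest price, and the result may not match Yahoo's page exactly. Both helpers, format_change() and fetch_change(), are hypothetical names written for this example; format_change() reproduces the signed '-3.89 (-1.36%)' style from the question.

```python
def format_change(previous_close: float, price: float) -> str:
    """Format the move from previous_close to price as e.g. '-3.89 (-1.36%)'."""
    change = price - previous_close
    percent = change / previous_close * 100
    # '+' forces an explicit sign on both the absolute and percentage change.
    return f"{change:+.2f} ({percent:+.2f}%)"


def fetch_change(symbol: str) -> str:
    """Hypothetical helper: build the change string for `symbol` with yfinance.

    Needs network access and the third-party yfinance package.
    """
    import yfinance as yf  # imported here so format_change() works without it

    closes = yf.Ticker(symbol).history(period="5d")["Close"]
    # Second-to-last and last daily closes (assumes at least two rows came back).
    return format_change(float(closes.iloc[-2]), float(closes.iloc[-1]))


# Offline example: format_change(286.69, 282.80) returns '-3.89 (-1.36%)'
# Online example:  print(fetch_change("AAPL"))
```

Keeping the formatting separate from the download makes the arithmetic easy to check without touching the network.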

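Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.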

© 2020-2024 STACKOOM.COM