
Trouble scraping a specific 'span' class using BeautifulSoup

I am scraping https://ca.finance.yahoo.com/quote/AAPL and want to get the change in stock price (the text shown in green or red). I have been able to scrape the stock price but not the change value, since both are located in the same 'div' class but in different 'span' classes.

Text I want: '-3.89 (-1.36%)' - numbers will vary

HTML from website:

    <div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="29">
        <div class="D(ib) Va(m) Maw(65%) Ov(h)" data-reactid="30">
            <div class="D(ib) Mend(20px)" data-reactid="31">
                <span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="32">282.80</span>
                <span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($negativeColor)" data-reactid="33">-3.89 (-1.36%)</span>

What I've used to get the price (282.80; prices may vary):

import requests
from bs4 import BeautifulSoup

stockLink = 'https://ca.finance.yahoo.com/quote/AAPL'
stockPage = requests.get(stockLink)
stockSoup = BeautifulSoup(stockPage.text, 'lxml')
stockQuote = stockSoup.find('div', {'class': 'My(6px)Pos(r)smartphone_Mt(6px)'}).find('span').text
print(stockQuote)

I've tried many variations (changing the class name, changing the span name, and using data-reactid), but none seem to work; they all output an empty "[]".

Thank you very much.

It looks like the problem is in this line:

stockQuote = stockSoup.find('div', {'class': 'My(6px)Pos(r)smartphone_Mt(6px)'}).find('span').text

The class names should be separated by spaces, as each one is a different class in HTML.

The solution is to separate them exactly as they appear in the page HTML. It would look like this (two spaces have been added to the class string):

stockQuote = stockSoup.find('div', {'class': 'My(6px) Pos(r) smartphone_Mt(6px)'}).find('span').text
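
To see why the spaces matter, here is a minimal offline sketch. It assumes a one-line stand-in for the page markup (the `html` string is made up for illustration, not fetched from Yahoo) and uses the built-in 'html.parser' so lxml is not needed. It compares a few ways of writing the class value:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the outer div on the quote page.
html = '<div class="My(6px) Pos(r) smartphone_Mt(6px)"><span>282.80</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Class names run together: matches no single class and not the full attribute string.
no_spaces = soup.find("div", {"class": "My(6px)Pos(r)smartphone_Mt(6px)"})

# Exact attribute string, spaces included: matches.
exact = soup.find("div", {"class": "My(6px) Pos(r) smartphone_Mt(6px)"})

# Any one of the individual class names also matches.
single = soup.find("div", class_="smartphone_Mt(6px)")

# Same classes in a different order: the full-string comparison fails.
reordered = soup.find("div", {"class": "Pos(r) My(6px) smartphone_Mt(6px)"})

print(no_spaces is None, exact is not None, single is not None, reordered is None)
```

Matching on a single class name is often less brittle than the full string, because it keeps working if the site adds or reorders classes on that div.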

However, this returns the price, which is the number before the red/green text. Because the div contains multiple spans, you have to collect all of them and pick the one you want. This is how I did it:

stockQuote = stockSoup.find('div', {'class': 'My(6px) Pos(r) smartphone_Mt(6px)'}).findAllNext('span')
stockQuote = stockQuote[1].text

The findAllNext function (spelled find_all_next in current Beautiful Soup) returns every span that comes after the start of that div in the document, not only the spans inside it, which is why it returns about 36. The one you are looking for is the second one, at index 1. Then you get the text from it as before, and it should return the change value you are looking for.
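
As a rough illustration, the sketch below runs against a static copy of the HTML from the question instead of the live page. The closing tags, the trailing unrelated span, and the `extract_quote` helper are assumptions added for the example. It also shows `find_all`, which searches only inside the div, next to `find_all_next`:

```python
from bs4 import BeautifulSoup

# Static copy of the markup quoted in the question; the closing tags and the
# final unrelated span are assumptions added so the sample parses cleanly.
SAMPLE_HTML = """
<div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="29">
  <div class="D(ib) Va(m) Maw(65%) Ov(h)" data-reactid="30">
    <div class="D(ib) Mend(20px)" data-reactid="31">
      <span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="32">282.80</span>
      <span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($negativeColor)" data-reactid="33">-3.89 (-1.36%)</span>
    </div>
  </div>
</div>
<span>unrelated span later in the page</span>
"""


def extract_quote(html):
    """Hypothetical helper: return (price, change) from the quote header div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "My(6px) Pos(r) smartphone_Mt(6px)"})
    # find_all only looks at descendants of the div.
    spans = container.find_all("span")
    return spans[0].text, spans[1].text


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", {"class": "My(6px) Pos(r) smartphone_Mt(6px)"})
inside = container.find_all("span")          # spans within the div only
following = container.find_all_next("span")  # also spans after the div closes

price, change = extract_quote(SAMPLE_HTML)
print(price, change, len(inside), len(following))
```

On this sample both approaches put the change value at index 1. Using `find_all` keeps the result limited to the spans inside the div, so the list is shorter and does not pick up unrelated spans further down the page.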

Although web scraping is a useful tool, it may be worth looking into yfinance, a third-party Python library for retrieving Yahoo Finance data.
