简体   繁体   中英

Scraping historical data from Yahoo Finance with Python

as some of you probably know by now, it seems that Yahoo! Finance has discontinued its API for stock market data. While I am aware of the existence of the fix-yahoo-finance solution, I was trying to implement a more stable solution to my code by directly scraping historical data from Yahoo.

So here is what I have for the moment:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://finance.yahoo.com/quote/AAPL/history?period1=345423600&period2=1495922400&interval=1d&filter=history&frequency=1d")
soup = BeautifulSoup(page.content, 'html.parser')
soup
print(soup.prettify())

To get the data from Yahoo table I can do:

c=soup.find_all('tbody')
print(c)

My question is, how do I turn "c" into a nicer dataframe? Thanks!

I wrote this to get historical data from YF directly from the download csv link. It needs to make two requests, one to get the cookie and the crumb and another one to get the data. It returns a pandas dataframe

import re
from io import StringIO
from datetime import datetime, timedelta

import requests
import pandas as pd


class YahooFinanceHistory:
    timeout = 2
    crumb_link = 'https://finance.yahoo.com/quote/{0}/history?p={0}'
    crumble_regex = r'CrumbStore":{"crumb":"(.*?)"}'
    quote_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={dfrom}&period2={dto}&interval=1d&events=history&crumb={crumb}'

    def __init__(self, symbol, days_back=7):
        self.symbol = symbol
        self.session = requests.Session()
        self.dt = timedelta(days=days_back)

    def get_crumb(self):
        response = self.session.get(self.crumb_link.format(self.symbol), timeout=self.timeout)
        response.raise_for_status()
        match = re.search(self.crumble_regex, response.text)
        if not match:
            raise ValueError('Could not get crumb from Yahoo Finance')
        else:
            self.crumb = match.group(1)

    def get_quote(self):
        if not hasattr(self, 'crumb') or len(self.session.cookies) == 0:
            self.get_crumb()
        now = datetime.utcnow()
        dateto = int(now.timestamp())
        datefrom = int((now - self.dt).timestamp())
        url = self.quote_link.format(quote=self.symbol, dfrom=datefrom, dto=dateto, crumb=self.crumb)
        response = self.session.get(url)
        response.raise_for_status()
        return pd.read_csv(StringIO(response.text), parse_dates=['Date'])

You can use it like this:

df = YahooFinanceHistory('AAPL', days_back=30).get_quote()

Alternatively, use the python library yfinance and yahoofinancials .

import pandas as pd
import yfinance as yf
from yahoofinancials import YahooFinancials

Choose the company stock ticker, eg. 'AAPL'.

df = yf.download('AAPL')

This will store historical stock data to the dataframe df.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM