How to get the maximum historical price data from Yahoo Finance?

Question

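I want to get the maximum historical price data from Yahoo Finance with Scrapy.
Here is the URL for the max historical price data of one stock (the ticker in this URL is FNMA).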

https://query1.finance.yahoo.com/v7/finance/download/FNMA?period1=221115600&period2=1508472000&interval=1d&events=history&crumb=1qRuQKELxmM

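To write a web crawler for stock prices, there are two problems I can't solve.
1. How do I get the parameter period1?
You can get it manually on the web page by clicking Max.
How can I get the parameter with Python code?
Different stocks have different period1 values.

2. How do I generate the parameter crumb=1qRuQKELxmM automatically? Do different stocks have different crumb values?
Here is my spider for max historical stock data in the Scrapy framework.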

import scrapy

class TestSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains = ["finance.yahoo.com"]

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.timeout = 10

    def start_requests(self):
        stockName = ...  # list of tickers; the code that builds it is omitted
        for stock in stockName:
            period1 = ...  # how to fill it?
            crumb = ...  # how to fill it?
            per_stock_max_data = (
                "https://query1.finance.yahoo.com/v7/finance/download/"
                + stock + "?period1=" + str(period1) + "&period2=1508472000"
                + "&interval=1d&events=history&crumb=" + str(crumb)
            )
            yield scrapy.Request(per_stock_max_data, callback=self.parse)

    def parse(self, response):
        content = response.body
        target = response.url
        #do something

How do I fill in the blanks above in my web crawler framework?

Answer 1

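As I understand it, you want to download all available data for a specific ticker. To do that, you don't actually need to supply the period1 parameter: if you pass 0 in place of period1, the Yahoo API defaults to the oldest available date.

To download quotes the way you showed in your question, we unfortunately have to deal with cookies. I'll allow myself to offer a solution that doesn't use Scrapy and needs only the ticker itself: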

import re
import time
from io import StringIO

import pandas as pd
import requests


def get_yahoo_ticker_data(ticker):
    # Load the history page to obtain the 'B' cookie and the crumb embedded in its HTML.
    res = requests.get('https://finance.yahoo.com/quote/' + ticker + '/history')
    yahoo_cookie = res.cookies['B']
    yahoo_crumb = None
    pattern = re.compile(r'.*"CrumbStore":\{"crumb":"(?P<crumb>[^"]+)"\}')
    for line in res.text.splitlines():
        m = pattern.match(line)
        if m is not None:
            yahoo_crumb = m.groupdict()['crumb']

    # period1=0 asks for everything from the oldest available date up to now.
    current_date = int(time.time())
    url_kwargs = {'symbol': ticker, 'timestamp_end': current_date,
                  'crumb': yahoo_crumb}
    url_price = 'https://query1.finance.yahoo.com/v7/finance/download/' \
                '{symbol}?period1=0&period2={timestamp_end}&interval=1d&events=history' \
                '&crumb={crumb}'.format(**url_kwargs)

    # Send the cookie along with the crumb when requesting the CSV.
    response = requests.get(url_price, cookies={'B': yahoo_cookie})

    return pd.read_csv(StringIO(response.text), parse_dates=['Date'])

If you really do need the oldest date, you can use the code above and extract the first date from the response.

get_yahoo_ticker_data(ticker='AAPL')
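Extracting the oldest date from the response could be sketched roughly as follows. The CSV text is made-up sample data in the column layout the download endpoint is assumed to return, and `oldest_date` is a hypothetical helper that is not part of the answer's code. It uses only the standard library so that it runs without network access.

```python
import csv
from datetime import datetime, timezone
from io import StringIO

# Made-up rows standing in for response.text (assumed column layout).
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
1980-12-15,0.12,0.13,0.12,0.12,0.10,1000
1980-12-12,0.11,0.12,0.11,0.11,0.09,2000
1980-12-16,0.13,0.14,0.13,0.13,0.11,1500
"""


def oldest_date(csv_text):
    """Return the earliest value of the Date column as a naive datetime."""
    rows = csv.DictReader(StringIO(csv_text))
    # min() avoids relying on the rows being sorted by date.
    return min(datetime.strptime(row["Date"], "%Y-%m-%d") for row in rows)


first = oldest_date(SAMPLE_CSV)
# Interpret the date as midnight UTC to get a period1-style value.
period1 = int(first.replace(tzinfo=timezone.utc).timestamp())
print(first.date(), period1)
```

Treating the date as midnight UTC is a simplification chosen for this sketch; the timestamps the Yahoo page itself produces may be offset by the exchange's time zone.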

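I know web scraping is not an efficient option, but it is the only one we have, because Yahoo has discontinued all of its APIs. You may find some third-party solutions, but they all use scraping in their source code and add extra boilerplate that reduces overall performance.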

Answer 2

After installing pandas-datareader:

pip install pandas-datareader

You can request stock prices with the following code:

import pandas_datareader as pdr
from datetime import datetime

aapl = pdr.get_data_yahoo(symbols='AAPL', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(aapl['Adj Close'])

Answer 3

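If what you want is the entire history, you don't actually need to compute the maximum date; just use a reasonably old date (1900/01/01 in the example below). For example, suppose you are interested in FB's stock; then this should work: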

import scrapy
import time


class FinanceSpider(scrapy.Spider):
    name = "finance"
    allowed_domains = ["finance.yahoo.com"]
    start_urls = ['https://finance.yahoo.com/quote/FB']

    def parse(self, response):
        # re_first returns a str; round-trip through bytes to resolve escapes such as \u002F
        crumb = response.css('script').re_first(r'user":{"crumb":"(.*?)"').encode().decode('unicode_escape')
        url = ("https://query1.finance.yahoo.com/v7/finance/download/FB" +
               "?period1=-2208988800&period2=" + str(int(time.time())) + "&interval=1d&events=history&" +
               "crumb={}".format(crumb))
        return scrapy.Request(url, callback=self.parse_csv)

    def parse_csv(self, response):
        lines = response.text.strip().split('\n')
        print(lines[0])
        print(lines[1])
        print(lines[-1])
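The crumb extraction used in parse can be tried offline with plain `re`, as in the sketch below. The page fragment is invented for illustration, and the real page layout is an assumption that may differ; `extract_crumb` is a hypothetical helper name.

```python
import re

# Invented page fragment; the real Yahoo page layout may differ.
SAMPLE_HTML = (
    '<script>{"context":{"user":{"crumb":"abc\\u002Fdef",'
    '"firstName":null}}}</script>'
)

CRUMB_RE = re.compile(r'user":\{"crumb":"(.*?)"')


def extract_crumb(html):
    """Return the crumb with unicode escapes resolved, or None when absent."""
    match = CRUMB_RE.search(html)
    if match is None:
        return None
    # Crumbs are assumed to be ASCII, so this round trip only resolves escapes.
    return match.group(1).encode("ascii").decode("unicode_escape")


print(extract_crumb(SAMPLE_HTML))
```

Returning None instead of raising lets the caller decide whether a missing crumb should trigger a retry.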

Answer 4

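Both period1 and period2 are "seconds since the epoch" values; you can convert between Python datetimes and these values with datetime.datetime.fromtimestamp(dt) and int(dt.timestamp()). But as others have already mentioned, you don't need to specify exact numbers for these parameters: you can use zero for period1 and 2000000000 for period2 for all stocks.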
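A minimal sketch of that conversion, using timezone-aware datetimes so the result does not depend on the local time zone. The helper names `to_period` and `from_period` are made up for this example.

```python
from datetime import datetime, timezone


def to_period(dt):
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    return int(dt.timestamp())


def from_period(seconds):
    """Convert seconds since the epoch back to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


period2 = to_period(datetime(2017, 10, 20, 4, 0, tzinfo=timezone.utc))
print(period2)         # 1508472000, the period2 in the question's URL
print(from_period(0))  # 1970-01-01 00:00:00+00:00
```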
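Importantly, the same "crumb" value stays valid for downloading all stocks for some time (about 1 week). So rather than fetching a new crumb before every download request, cache it and refresh it only when you receive an "Unauthorized" response; your downloads will run about twice as fast. The easiest way to get the crumb value is to request the Yahoo home page ( https://finance.yahoo.com/ ) and look for the "user":{"crumb":" substring there.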
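The caching idea might look roughly like the sketch below. `CrumbCache` is a hypothetical class, and the two callables passed to it are stand-ins so the example runs without network access; in real use they would wrap the page request and the CSV download.

```python
class CrumbCache:
    """Reuse one crumb across downloads; refresh only after a 401 response."""

    def __init__(self, fetch_crumb, download):
        self._fetch_crumb = fetch_crumb  # callable: () -> crumb string
        self._download = download        # callable: (ticker, crumb) -> (status, body)
        self._crumb = None

    def get(self, ticker):
        if self._crumb is None:
            self._crumb = self._fetch_crumb()
        status, body = self._download(ticker, self._crumb)
        if status == 401:
            # Cached crumb was rejected: fetch a new one and retry once.
            self._crumb = self._fetch_crumb()
            status, body = self._download(ticker, self._crumb)
        return status, body


# Stand-in callables for the demonstration (not real Yahoo requests).
issued = []
expired = set()


def fake_fetch_crumb():
    issued.append("crumb%d" % (len(issued) + 1))
    return issued[-1]


def fake_download(ticker, crumb):
    if crumb in expired:
        return 401, ""
    return 200, "csv for " + ticker


cache = CrumbCache(fake_fetch_crumb, fake_download)
cache.get("AAPL")
cache.get("FB")           # reuses the first crumb
expired.add("crumb1")     # simulate the crumb expiring
status, body = cache.get("FNMA")
print(issued, status)
```

Three downloads trigger only two crumb fetches here, and the second fetch happens only because the first crumb was rejected.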

Answer 5

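If you just put 0 in place of period1, it should work, since the interval will then start at the beginning of time (timestamp 0).

You can pass a very large integer (for example 1900000000) as the second period, and it will simply include everything that happened before that date.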
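Put together, a URL with these two values could be assembled as in this sketch. The endpoint and parameter names are copied from the question's URL, the crumb is the sample value from the question, and `max_history_url` is a hypothetical helper; whether the endpoint still accepts such requests is not something this example verifies.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the URL in the question.
BASE = "https://query1.finance.yahoo.com/v7/finance/download/"


def max_history_url(ticker, crumb, period2=1900000000):
    """Build a download URL spanning timestamp 0 up to a far-future period2."""
    query = urlencode({
        "period1": 0,
        "period2": period2,
        "interval": "1d",
        "events": "history",
        "crumb": crumb,  # urlencode percent-escapes any special characters
    })
    return BASE + quote(ticker) + "?" + query


print(max_history_url("FNMA", "1qRuQKELxmM"))
```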

Answer 6

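Came across this thread and wanted to add another option: the Python yfinance package. Its API has a history method through which the period can be specified as 'max', so that all available data for the asset is returned. Example: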

import yfinance as yf

spy = yf.Ticker('SPY').history(
    period='max',
    interval='1d'
)

The yfinance package builds on the pandas package and returns spy as a DataFrame object:

                                 Open        High  ...  Stock Splits  Capital Gains
Date                                               ...                             
1993-01-29 00:00:00-05:00   25.236177   25.236177  ...           0.0            0.0
1993-02-01 00:00:00-05:00   25.236163   25.397589  ...           0.0            0.0
1993-02-02 00:00:00-05:00   25.379641   25.469322  ...           0.0            0.0
1993-02-03 00:00:00-05:00   25.487262   25.738368  ...           0.0            0.0
1993-02-04 00:00:00-05:00   25.810116   25.881861  ...           0.0            0.0
...                               ...         ...  ...           ...            ...
2023-01-24 00:00:00-05:00  398.880005  401.149994  ...           0.0            0.0
2023-01-25 00:00:00-05:00  395.950012  400.700012  ...           0.0            0.0
2023-01-26 00:00:00-05:00  403.130005  404.920013  ...           0.0            0.0
2023-01-27 00:00:00-05:00  403.660004  408.160004  ...           0.0            0.0
2023-01-30 00:00:00-05:00  402.799988  405.119995  ...           0.0            0.0

[7555 rows x 8 columns]

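At the time of writing (January 30, 2023), this represents the complete daily dataset for SPY available through Yahoo Finance. Alternatively, a value of None for the period parameter produces the same result. Notably, omitting the period value uses the default of "1mo".

Note: a look at the yfinance source code shows that a period of 'max' or None results in the value -2208994789 (1900 in Unix time) being used as the start date.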
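To see which calendar date such a negative epoch value corresponds to, a small conversion sketch can help. It adds a timedelta to the epoch rather than calling `datetime.fromtimestamp`, since the latter can reject negative input on some platforms; `epoch_seconds_to_utc` is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds_to_utc(seconds):
    """Convert possibly negative epoch seconds to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


print(epoch_seconds_to_utc(-2208994789))  # start value quoted from yfinance
print(epoch_seconds_to_utc(-2208988800))  # period1 used in Answer 3
```

The first value prints as 1899-12-31 22:20:11+00:00 and the second as 1900-01-01 00:00:00+00:00, so the yfinance start value lies a little under two hours before the start of 1900 in UTC.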

6 answers

Answer 1
9 2017-10-23 17:01:09

Answer 2
2 2017-10-20 11:56:51

Answer 3
0 2017-10-24 07:27:52

Answer 4
0 2017-10-26 09:34:18

Answer 5
0 2018-07-30 17:29:22

Answer 6
0 2023-01-31 00:50:44
