
How can I scrape Yahoo Finance by parsing through a text file containing a list of tickers?

I have a text file containing a list of company tickers.

My intention is to scrape Yahoo Finance to get financial data for each of these tickers.

I do not know how to tell Python to go over each line in my text file, assign it to a variable, scrape the corresponding Yahoo Finance page, store the retrieved data, and repeat the process for every ticker in the file.

I found some code online that scrapes data from Yahoo Finance. Then I tried to add a for loop so that each ticker is added to a symbol variable (see code).

import lxml.etree
import numpy as np
import pandas as pd
import requests
from lxml import html

def scrape_table(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    table = tree.xpath('//table')
    assert len(table) == 1

    df = pd.read_html(lxml.etree.tostring(table[0], method='html'))[0]

    df = df.set_index(0)
    df = df.dropna()
    df = df.transpose()
    df = df.replace('-', '0')

    # The first column should be a date
    df[df.columns[0]] = pd.to_datetime(df[df.columns[0]])
    cols = list(df.columns)
    cols[0] = 'Date'
    df = df.set_axis(cols, axis='columns')

    numeric_columns = list(df.columns)[1::]
    df[numeric_columns] = df[numeric_columns].astype(np.float64)

    return df

symbol = []

tf = open("SMI_tickers.txt")
for line in tf.readlines():
   symbol.append(print(line))

balance_sheet_url = ('https://finance.yahoo.com/quote/' + symbol +
                     '/balance-sheet?p=' + symbol)

df_balance_sheet = scrape_table(balance_sheet_url)
df_balance_sheet

The error message I got was:

TypeError: can only concatenate str (not "list") to str

My end goal is to import financial data for each of these companies so that I can perform some calculations on each stock.

You were trying to pass a list instead of a string. You need to iterate over all the symbols in your file, which you can do as follows. This code appends the results of the scraping function (which I have not tested) to a list, balance_sheets.

balance_sheets = []
with open("SMI_tickers.txt") as tf:
    for line in tf:
        ticker = line.strip()  # drop the trailing newline
        if not ticker:
            continue  # skip blank lines
        balance_sheet_url = ('https://finance.yahoo.com/quote/' + ticker +
                             '/balance-sheet?p=' + ticker)
        balance_sheets.append(scrape_table(balance_sheet_url))
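
If one ticker's page fails to load or parse (for example, when the assert inside scrape_table fires), the loop above stops at that ticker. Here is a minimal sketch of one way to keep going. It assumes a hypothetical helper, scrape_all, that receives the scraping function as a parameter; the names scrape_all, results, and failures are illustrative and not part of the original code.

```python
def scrape_all(tickers, scrape):
    """Call scrape(url) for each ticker and return (results, failures).

    `scrape` is any function that takes a URL and returns the parsed data,
    for example the scrape_table function from the question.
    """
    results = {}   # ticker -> whatever scrape() returned
    failures = {}  # ticker -> the exception that was raised
    for ticker in tickers:
        url = ('https://finance.yahoo.com/quote/' + ticker +
               '/balance-sheet?p=' + ticker)
        try:
            results[ticker] = scrape(url)
        except Exception as exc:  # broad on purpose: record it and move on
            failures[ticker] = exc
    return results, failures
```

It could be called as results, failures = scrape_all(tickers, scrape_table). Keying the results by ticker, rather than appending to a list, makes it easier to look up one company later.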

I'm assuming your SMI_tickers.txt has a list of ticker symbols, one symbol per line. The error occurs when you build your URL: you try to concatenate symbol, which is a list, not a string.
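
The error can be reproduced without any scraping. The sketch below uses two example tickers that were picked for illustration (the contents of the asker's file are not shown), and it captures the exception message instead of letting the script crash.

```python
base = 'https://finance.yahoo.com/quote/'
symbol = ['NESN.SW', 'ROG.SW']  # example tickers, assumed for illustration

# Concatenating a str with the whole list raises the TypeError.
try:
    url = base + symbol
except TypeError as exc:
    message = str(exc)

# Concatenating with one element at a time works.
urls = [base + s + '/balance-sheet?p=' + s for s in symbol]
```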

Change your code to the following:

symbol = []
with open("SMI_tickers.txt") as tf:
    for line in tf:
        symbol.append(line.strip())  # strip the trailing newline

for symb in symbol:
    balance_sheet_url = ('https://finance.yahoo.com/quote/' + symb +
                         '/balance-sheet?p=' + symb)

    df_balance_sheet = scrape_table(balance_sheet_url)
    print(df_balance_sheet)

You don't actually need to read the symbols into a list; you can use each symbol as soon as you read it from the file.

with open("SMI_tickers.txt") as tf:
    for line in tf:
        symb = line.strip()  # drop the trailing newline
        balance_sheet_url = ('https://finance.yahoo.com/quote/' + symb +
                             '/balance-sheet?p=' + symb)

        df_balance_sheet = scrape_table(balance_sheet_url)
        print(df_balance_sheet)

In both cases, df_balance_sheet holds the pandas DataFrame for the current ticker on each pass through the loop.
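
Both variants take the ticker from a line of the file. Lines read this way end with a newline character, which can otherwise end up inside the URL, so it is worth stripping before the ticker is used. Below is a small sketch that wraps the reading and the URL building in two helpers. The names read_tickers and balance_sheet_url are hypothetical, chosen for illustration.

```python
def read_tickers(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def balance_sheet_url(ticker):
    """Build a balance-sheet URL using the same pattern as the question."""
    return f'https://finance.yahoo.com/quote/{ticker}/balance-sheet?p={ticker}'
```

With these, the loop becomes for ticker in read_tickers("SMI_tickers.txt"), followed by a call to scrape_table(balance_sheet_url(ticker)).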

For scraping balance sheet info, among anything else you might like, check out the yahoo_fin package: http://theautomatic.net/yahoo_fin-documentation/ It makes things a wee bit easier.

import pandas as pd
from yahoo_fin.stock_info import get_balance_sheet

stock_list = ["AAPL", "IBM", "AMZN"]

all_data = {ticker: get_balance_sheet(ticker)
                 for ticker in stock_list}

aapl = pd.DataFrame(all_data['AAPL'])
aapl = aapl.set_index('Period Ending')

Hope this helps.
