How do I scrape Yahoo Finance by parsing a text file containing a list of tickers?

Question

I have a text file containing a list of company tickers.

My intention is to scrape Yahoo Finance to get financial data for each of these tickers.

I do not know how to tell Python to go over each line in my text file, assign it to a variable, scrape the corresponding Yahoo Finance page, retrieve and store its data, and repeat the process for however many tickers the text file contains.

I found some code online that scrapes data from Yahoo Finance. Then I tried to add a for loop so that each ticker is added to a symbol variable (see code).

import lxml.etree
import numpy as np
import pandas as pd
import requests
from lxml import html

def scrape_table(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    table = tree.xpath('//table')
    assert len(table) == 1

    df = pd.read_html(lxml.etree.tostring(table[0], method='html'))[0]

    df = df.set_index(0)
    df = df.dropna()
    df = df.transpose()
    df = df.replace('-', '0')

    # The first column should be a date
    df[df.columns[0]] = pd.to_datetime(df[df.columns[0]])
    cols = list(df.columns)
    cols[0] = 'Date'
    df = df.set_axis(cols, axis='columns', inplace=False)

    numeric_columns = list(df.columns)[1::]
    df[numeric_columns] = df[numeric_columns].astype(np.float64)

    return df

symbol = []

tf = open("SMI_tickers.txt")
for line in tf.readlines():
   symbol.append(print(line))

balance_sheet_url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol

df_balance_sheet = scrape_table(balance_sheet_url)
df_balance_sheet

The error message I got was:

TypeError: can only concatenate str (not "list") to str

My end goal is to import financial data for each of these companies so that I can perform some calculations on each stock.

Answer 1

You were trying to pass a list instead of a string. You need to iterate through all the symbols in your file, which you can do as follows. This code will append the results of the scraping function (which I have not tested) to a list named balance_sheets.

balance_sheets = []
with open("SMI_tickers.txt") as tf:
    for line in tf:
        ticker = line.strip()  # drop the trailing newline
        if not ticker:
            continue  # skip blank lines
        balance_sheet_url = 'https://finance.yahoo.com/quote/' + ticker + '/balance-sheet?p=' + ticker
        balance_sheets.append(scrape_table(balance_sheet_url))
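
One detail worth checking when building URLs from file lines: every line read from a text file still ends with its newline character, and that character ends up inside the URL unless it is stripped. The sketch below shows the difference using an in-memory stand-in for SMI_tickers.txt; the two ticker values are only illustrative and are not taken from the question.

```python
import io

# Stand-in for SMI_tickers.txt; the ticker values are illustrative only.
fake_file = io.StringIO("NESN.SW\nROG.SW\n\n")

raw_lines = fake_file.readlines()
# Each raw line still carries its newline: ['NESN.SW\n', 'ROG.SW\n', '\n']

# Strip whitespace and drop lines that are empty after stripping.
tickers = [line.strip() for line in raw_lines if line.strip()]
# tickers is now ['NESN.SW', 'ROG.SW']

base = 'https://finance.yahoo.com/quote/'
bad_url = base + raw_lines[0] + '/balance-sheet?p=' + raw_lines[0]   # contains '\n'
good_url = base + tickers[0] + '/balance-sheet?p=' + tickers[0]      # clean
```

Calling line.strip() before concatenating is usually enough to avoid this.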

Answer 2

I'm assuming your SMI_tickers.txt has a list of ticker symbols with one symbol per line. The error occurs when you build your URL: you try to concatenate symbol, which is a list, not a string.
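
To see the error in isolation, here is a small sketch that reproduces it outside the scraping code. The list contents are placeholders. It also shows a second issue in the question's loop: print() returns None, so symbol.append(print(line)) stores None rather than the ticker.

```python
symbol = ["AAPL", "IBM"]  # a list, as in the question (values are placeholders)

try:
    url = 'https://finance.yahoo.com/quote/' + symbol  # str + list is not allowed
except TypeError as exc:
    message = str(exc)  # the same kind of message the question reports

# Concatenating one element at a time works, because each element is a str.
urls = ['https://finance.yahoo.com/quote/' + s for s in symbol]

# Separately: print() returns None, so this appends None, not "AAPL".
printed = []
printed.append(print("AAPL"))
```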

Change your code to the following.

symbol = []
with open("SMI_tickers.txt") as tf:
    for line in tf:
        symbol.append(line.strip())  # strip the trailing newline
for symb in symbol:
    balance_sheet_url = ('https://finance.yahoo.com/quote/' + symb +
                         '/balance-sheet?p=' + symb)

    df_balance_sheet = scrape_table(balance_sheet_url)
    print(df_balance_sheet)

You don't actually need to read the symbols into a list; you can use each symbol as soon as you read it from the file.

with open("SMI_tickers.txt") as tf:
    for line in tf:
        symb = line.strip()  # strip the trailing newline
        balance_sheet_url = ('https://finance.yahoo.com/quote/' + symb +
                             '/balance-sheet?p=' + symb)

        df_balance_sheet = scrape_table(balance_sheet_url)
        print(df_balance_sheet)

In both cases, df_balance_sheet is the pandas DataFrame for the current ticker.
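
Because df_balance_sheet is reassigned on every pass through the loop, only the last ticker's data is left once the loop finishes. If the goal is to run calculations on every stock afterwards, one option is to keep each result in a dict keyed by ticker. The sketch below is only an illustration: build_balance_sheet_url and scrape_all are hypothetical helper names, and the scraping function is passed in as a parameter so that scrape_table from the question (or any replacement) can be used.

```python
def build_balance_sheet_url(ticker):
    """Build the balance-sheet URL in the same format used in the question."""
    return ('https://finance.yahoo.com/quote/' + ticker
            + '/balance-sheet?p=' + ticker)


def scrape_all(tickers, scrape):
    """Call scrape(url) for each ticker and return {ticker: result}."""
    results = {}
    for ticker in tickers:
        results[ticker] = scrape(build_balance_sheet_url(ticker))
    return results
```

With the question's function this would be called as scrape_all(symbol, scrape_table), after which results for a given ticker can be looked up by its symbol.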

Answer 3

For scraping balance sheet info, among anything else you might like, check out the yahoo_fin package (http://theautomatic.net/yahoo_fin-documentation/). It makes things a wee bit easier.

import pandas as pd
from yahoo_fin.stock_info import get_balance_sheet

stock_list = ["AAPL", "IBM", "AMZN"]

all_data = {ticker: get_balance_sheet(ticker)
                 for ticker in stock_list}

aapl = pd.DataFrame(all_data['AAPL'])
aapl = aapl.set_index('Period Ending')
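
The example above hard-codes the ticker list. To connect it to the text file from the question, a sketch along the following lines might work. load_balance_sheets is a hypothetical helper, not part of yahoo_fin; the fetch function is passed in as a parameter, so get_balance_sheet can be supplied by the caller. How yahoo_fin behaves for an unknown ticker is not something I have verified, so the sketch catches exceptions broadly and records which tickers failed.

```python
def load_balance_sheets(path, fetch):
    """Read tickers from path (one per line) and call fetch(ticker) for each.

    Returns (data, failed): a dict of successful results keyed by ticker,
    and a list of tickers for which fetch raised an exception.
    """
    data, failed = {}, []
    with open(path) as tf:
        for line in tf:
            ticker = line.strip()  # drop the trailing newline
            if not ticker:
                continue  # skip blank lines
            try:
                data[ticker] = fetch(ticker)
            except Exception:  # broad on purpose; narrow once failure modes are known
                failed.append(ticker)
    return data, failed
```

A call would then look like all_data, failed = load_balance_sheets("SMI_tickers.txt", get_balance_sheet).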

Hope this helps.

3 answers

Answer 1
0 2019-05-30 16:14:43

Answer 2
0 2019-05-30 16:20:19

Answer 3
0 2019-08-13 10:17:36
