
How do I use pd.read_html and loop through many different urls and store each set of dfs into a master list of dfs?

I was wondering how to pull tickers from an Excel file, load a bunch of websites, and run pd.read_html on each website in order to get a big list of dfs containing the tables of each page.

This is my list of tickers: https://docs.google.com/spreadsheets/d/16kdjtOlV1M_rDnM73lPi6ZcMvowQPmtjKu6bYTXK588/edit?usp=sharing

This is my current code:

from six.moves import urllib
import pandas as pd

df = pd.read_excel('C:/Users/Jacob/Downloads/CEF Tickers.xlsx', sheet_name='Sheet1')

tickers_list = df['Ticker'].tolist()

df_list = []

for ticker in tickers_list:
    df_list[ticker] = pd.read_html(f'https://www.cefconnect.com/fund/{ticker}', header=0)

print(df_list)

When I run that, I get:

TypeError: list indices must be integers or slices, not str

Thank you for your time.
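The error comes from indexing a plain Python list with a string key: `df_list` is a list, and `df_list[ticker]` only works with integer indices. A minimal sketch of the two usual fixes, using a placeholder ticker `'ADX'` and a dummy value instead of the real read_html result:

```python
df_list = []

# a list only accepts integer indices; a string key raises the TypeError above
try:
    df_list['ADX'] = ['table']
except TypeError as e:
    print(e)  # list indices must be integers or slices, not str

# fix 1: append each result, keeping the list in ticker order
df_list.append(['table'])

# fix 2: use a dict instead, so each result stays keyed by its ticker
df_dict = {}
df_dict['ADX'] = ['table']
```

The dict variant is usually more convenient here, since you can later look up the tables for any ticker by name instead of remembering its position.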

import pandas as pd

df = pd.read_excel('C:/Users/Jacob/Downloads/CEF Tickers.xlsx', sheet_name='Sheet1')

tickers_list = df['Ticker'].tolist()

df_list = []

# append() grows the list; assigning to df_list[i] on an empty list raises IndexError
for ticker in tickers_list:
    df_list.append(pd.read_html(f'https://www.cefconnect.com/fund/{ticker}', header=0))

print(df_list)

This is what I did.


import pandas as pd

df = pd.read_excel('C:/Users/Jacob/Downloads/CEF Tickers.xlsx', sheet_name='Sheet1')

tickers_list = df['Ticker'].tolist()
# one column per ticker; each cell will hold one DataFrame returned by read_html
data = pd.DataFrame(columns=tickers_list)

# note: this only works if every page returns the same number of tables;
# assigning lists of different lengths to the columns raises a ValueError
for ticker in tickers_list:
    data[ticker] = pd.read_html(f'https://www.cefconnect.com/fund/{ticker}', header=0)

print(data)
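A dict keyed by ticker avoids that length-mismatch risk, since each call to pd.read_html can return a different number of tables. A sketch, with a hypothetical fetch_tables standing in for the network call so the structure is clear:

```python
import pandas as pd

def fetch_tables(ticker):
    # hypothetical stand-in for
    # pd.read_html(f'https://www.cefconnect.com/fund/{ticker}', header=0),
    # which returns a list of DataFrames, one per <table> on the page
    return [pd.DataFrame({'Ticker': [ticker]})]

tickers_list = ['ADX', 'ASA']  # would come from the Excel file in the real script
all_tables = {ticker: fetch_tables(ticker) for ticker in tickers_list}

# each value is a list of DataFrames, however many tables each page had
print(sorted(all_tables))
```

To get the tables for one fund later, `all_tables['ADX']` returns that page's list of DataFrames directly.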
