How do I extract a specific column from a dataset imported from an HTML file with pandas?

Question

import requests
import os
import pandas as pd
from bs4 import BeautifulSoup

#Importing html
df = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))
print (df['Latest Data'])

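All the documentation I can find online says that extracting a specific column from a dataset requires putting the column header's name in square brackets, but when I try that, it raises a TypeError: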

>
    print (df['Latest Data'])
TypeError: list indices must be integers or slices, not str

In case you are curious what the dataset looks like when I print it without specifying a column:

     SpotGamma Proprietary Levels Latest Data  ...    NDX    QQQ
0                        Ref Price:        4465  ...  15283    372
1        SpotGamma Imp. 1 Day Move:      0.91%,  ...    NaN    NaN
2        SpotGamma Imp. 5 Day Move:       2.11%  ...    NaN    NaN
3           SpotGamma Gamma Index™:        0.48  ...   0.04  -0.08
4              Volatility Trigger™:        4415  ...  15075    373
5  SpotGamma Absolute Gamma Strike:        4450  ...  15500    370
6               Gamma Notional(MM):        $157  ...     $4  $-397

Answer 1

Note that

df = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))

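returns a list of DataFrames, not a single DataFrame. That is why indexing it with a string raises "list indices must be integers or slices, not str".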

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html ("Read HTML tables into a list of DataFrame objects.")

It is better to do

ldf = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))

and then

df = ldf[0]  # replace 0 with the number of the dataframe you want

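to get the first DataFrame. There may be more than one: check len(ldf) to see how many you got, and inspect each to find the one with the column you need. Once df is a single DataFrame, df['Latest Data'] works as the documentation describes.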
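
If the page contains several tables, guessing the index by hand can be avoided. The following is only a minimal sketch, not part of the original answer: the helper name find_table_with_column is hypothetical, and the list ldf is built by hand as a stand-in for what pd.read_html() would return, using a couple of values from the output shown in the question.

```python
import pandas as pd


def find_table_with_column(tables, column):
    """Return the first DataFrame in `tables` that has a column named `column`."""
    for table in tables:
        if column in table.columns:
            return table
    raise KeyError(f"no table has a column named {column!r}")


# Stand-in for the list returned by pd.read_html(); the data is hypothetical,
# modeled on the output in the question. A real run would parse the HTML file.
ldf = [
    pd.DataFrame({"Unrelated": ["a", "b"]}),
    pd.DataFrame(
        {
            "SpotGamma Proprietary Levels": ["Ref Price:", "Volatility Trigger™:"],
            "Latest Data": ["4465", "4415"],
        }
    ),
]

print(len(ldf))  # how many tables are in the list

df = find_table_with_column(ldf, "Latest Data")
latest = df["Latest Data"]  # a DataFrame, unlike a list, accepts a header name
print(latest)
```

With the real file, the same helper could be called on the list from the answer above, for example find_table_with_column(ldf, 'Latest Data'), assuming the header in the HTML table is spelled exactly that way.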

Solution 1
3 Accepted 2021-09-29 08:40:47