How do I extract a specific column from a dataset that pandas imported from an HTML file?

Question

import requests
import os
import pandas as pd
from bs4 import BeautifulSoup

#Importing html
df = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))
print (df['Latest Data'])

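All the documentation I can find online says that to extract a specific column from a dataset you put the column header's name in square brackets, but when I try that I get a TypeError: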

    print (df['Latest Data'])
TypeError: list indices must be integers or slices, not str

In case you are curious, this is what the dataset looks like when I print it without specifying a column:

     SpotGamma Proprietary Levels Latest Data  ...    NDX    QQQ
0                        Ref Price:        4465  ...  15283    372
1        SpotGamma Imp. 1 Day Move:      0.91%,  ...    NaN    NaN
2        SpotGamma Imp. 5 Day Move:       2.11%  ...    NaN    NaN
3           SpotGamma Gamma Index™:        0.48  ...   0.04  -0.08
4              Volatility Trigger™:        4415  ...  15075    373
5  SpotGamma Absolute Gamma Strike:        4450  ...  15500    370
6               Gamma Notional(MM):        $157  ...     $4  $-397

Answer 1

Note that

df = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))

returns a list of dataframes, not a single dataframe.

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html ("Read HTML tables into a list of DataFrame objects.")
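To make the cause of the TypeError concrete, here is a minimal sketch. It does not parse any HTML: the `tables` list is a hand-built stand-in for what `pd.read_html` returns, with two values copied from the output in the question, so the names and data are assumptions for illustration only.

```python
import pandas as pd

# Stand-in for the return value of pd.read_html(...): a plain Python list
# holding one DataFrame per table found in the page.
tables = [pd.DataFrame({"Latest Data": ["4465", "0.48"], "NDX": ["15283", "0.04"]})]

print(type(tables))     # the container is a list
print(type(tables[0]))  # each element is a DataFrame

try:
    tables["Latest Data"]  # same mistake as in the question: str index on a list
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Index the list with an integer first, then look up the column by name.
column = tables[0]["Latest Data"]
print(column.tolist())
```

The string lookup only works on the DataFrame itself, so the list has to be indexed with an integer before the column name is used.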

It is clearer to write

ldf = pd.read_html(os.path.expanduser("~/Documents/HTMLSpider/HTMLSpider_test/spotgamma.html"))

and then

df = ldf[0]  # replace 0 with the number of the dataframe you want

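to get the first dataframe. There may be more than one: check len(ldf) to see how many were parsed, and inspect them to find the one that has the columns you need.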
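If the page contains several tables, picking the right one by position can be fragile. The following is a rough sketch of one way to select it by column name instead. `pick_table` is a hypothetical helper, not part of pandas, and the `ldf` list is built by hand to stand in for the result of `pd.read_html` so the snippet runs without the original HTML file.

```python
import pandas as pd


def pick_table(tables, column):
    """Return the first DataFrame in `tables` that has a column named `column`."""
    for table in tables:
        if column in table.columns:
            return table
    raise KeyError(f"no table has a column named {column!r}")


# Hand-built stand-ins for what pd.read_html(...) might return for a page
# with two tables; the second uses values copied from the question's output.
ldf = [
    pd.DataFrame({"Unrelated": ["a", "b"]}),
    pd.DataFrame({
        "SpotGamma Proprietary Levels": ["Ref Price:", "SpotGamma Gamma Index™:"],
        "Latest Data": ["4465", "0.48"],
    }),
]

print(len(ldf))                      # how many tables were parsed
df = pick_table(ldf, "Latest Data")  # the table that holds the wanted column
latest = df["Latest Data"]           # a normal column lookup now works
print(latest.tolist())
```

pd.read_html also accepts a `match` argument (a string or regex) that restricts the returned list to tables containing matching text, which may be enough on its own when the wanted table has distinctive content.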

Answer 1
3, accepted, 2021-09-29 08:40:47
