Extracting just one column of data from an HTML table with Python?

Question

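I'm trying to pull some NBA statistics for a small project I'm working on, and I only need a few columns (running vertically, top to bottom) of data from an HTML table, such as the one here. For now I only want PTS, so how should I extract just that one column? I already know it is the third-from-last element of each data row, but I'm not sure how I should parse the data.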

Answer 1

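I suggest reading the whole HTML table and then selecting the columns you need. You may lose some speed, but you gain a lot of simplicity.

This is easy to do with pandas' read_html function: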

import io
import urllib.request

import pandas as pd

# Download the page (Python 3; the original answer used Python 2's urllib2).
url = 'http://www.basketball-reference.com/players/h/hardeja01/gamelog/2015/'
page1 = urllib.request.urlopen(url).read().decode('utf-8')

# Select the correct table by its attributes, in this case id="pgl_basic".
# read_html returns a list of tables, so take the first (and only)
# table with this id. Recent pandas versions expect literal HTML to be
# wrapped in a file-like object such as StringIO.
stat_table = pd.read_html(io.StringIO(page1), attrs={'id': 'pgl_basic'})[0]

# Select just the column we need.
point_column = stat_table['PTS']

print(point_column)

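If you are not yet familiar with pandas, you can read more here: http://pandas-docs.github.io/pandas-docs-travis/10min.html

For example, you may want to remove the header rows from the table or split the table into multiple tables.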
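
As a rough sketch of that kind of cleanup, the snippet below assumes the parsed table repeats its header row inside the body and that every cell arrives as a string. The DataFrame is a hypothetical stand-in with made-up values, not real game-log data, and the per-month split is just one possible way to break the table up.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by read_html: every
# cell is a string, and the header row is repeated inside the body.
stat_table = pd.DataFrame(
    [
        ['1', '2014-10-28', '32'],
        ['2', '2014-10-29', '18'],
        ['Rk', 'Date', 'PTS'],  # repeated header row
        ['3', '2014-11-01', '22'],
    ],
    columns=['Rk', 'Date', 'PTS'],
)

# Drop the rows whose PTS cell is just the column name again.
data_rows = stat_table[stat_table['PTS'] != 'PTS']

# Convert the remaining PTS cells from strings to integers.
points = pd.to_numeric(data_rows['PTS']).astype(int)

# Split the cleaned table into one table per month (the "YYYY-MM" prefix
# of the Date column is used as the grouping key).
tables_by_month = {
    month: group
    for month, group in data_rows.groupby(data_rows['Date'].str[:7])
}

print(points.tolist())
print(sorted(tables_by_month))
```

With a real table you would apply the same filter to the result of read_html; whether header rows actually repeat depends on the page, so it is worth inspecting the DataFrame first.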
