Extracting information from a table on a website using Python, lxml & XPath
After a lot of effort, I managed to extract some of the information I needed from a table on this website:
http://gbgfotboll.se/serier/?scr=table&ftid=57108
I extracted the date and team names from the "Kommande Matcher" table (the second table).
But now I am completely stuck trying to extract the following from the first table:
the first column, "Lag"
the second column, "S"
the 6th column, "GM-IM"
the last column, "P"
Any ideas? Thanks.
I just got it working:
from io import BytesIO
import urllib2 as net
from lxml import etree
import lxml.html

request = net.Request("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
response = net.urlopen(request)
data = response.read()

collected = []  # list of tuples: [(col1, col2, ...), (col1, col2, ...)]
dom = lxml.html.parse(BytesIO(data))

# all table rows
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval('//div[@id="content-primary"]/table[1]/tbody/tr')
for row in rows:
    columns = row.findall("td")
    collected.append((
        columns[0].find("a").text.encode("utf8"),  # Lag
        columns[1].text,                           # S
        columns[5].text,                           # GM-IM
        columns[7].text,                           # P - last column
    ))

for i in collected:
    print i
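The same row-walking logic can be checked offline against an inline HTML fragment, which is handy when the live page is unavailable or has changed. Note this is a Python 3 sketch, and the table markup below is invented sample data standing in for the real standings table:

```python
# Sketch: the same XPath / per-row extraction applied to an inline HTML
# fragment, so it runs without hitting gbgfotboll.se.
# The team names and numbers below are made up for demonstration.
import lxml.html

HTML = """
<div id="content-primary">
  <table class="clTblStandings">
    <tbody>
      <tr>
        <td><a href="#">Team A</a></td><td>5</td><td>3</td><td>1</td>
        <td>1</td><td>10-4</td><td>6</td><td>10</td>
      </tr>
      <tr>
        <td><a href="#">Team B</a></td><td>5</td><td>2</td><td>2</td>
        <td>1</td><td>8-6</td><td>2</td><td>8</td>
      </tr>
    </tbody>
  </table>
</div>
"""

dom = lxml.html.fromstring(HTML)
collected = []
for row in dom.xpath('//div[@id="content-primary"]/table[1]/tbody/tr'):
    columns = row.findall("td")
    collected.append((
        columns[0].find("a").text,  # Lag
        columns[1].text,            # S
        columns[5].text,            # GM-IM
        columns[7].text,            # P - last column
    ))

print(collected)
```

This keeps the column indices identical to the live-page version, so the slicing logic can be exercised in isolation.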
You can pass the URL directly to lxml.html.parse() instead of going through urllib2. Also, you can select the target table by its class attribute, like this:
# new version
from lxml import etree
import lxml.html

collected = []  # list of tuples: [(col1, col2, ...), (col1, col2, ...)]
dom = lxml.html.parse("http://gbgfotboll.se/serier/?scr=table&ftid=57108")

# all rows of the standings table, selected by class attribute
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval("""//div[@id="content-primary"]/table[
    contains(concat(" ", @class, " "), " clTblStandings ")]/tbody/tr""")
for row in rows:
    columns = row.findall("td")
    collected.append((
        columns[0].find("a").text.encode("utf8"),  # Lag
        columns[1].text,                           # S
        columns[5].text,                           # GM-IM
        columns[7].text,                           # P - last column
    ))

for i in collected:
    print i
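A note on the `contains(concat(" ", @class, " "), " clTblStandings ")` predicate: XPath 1.0 has no class-token matching, and a bare `contains(@class, "clTblStandings")` would also match any element whose class attribute merely contains that string as a substring. The concat trick pads the attribute with spaces so only a whole space-separated token matches. A small offline check (the markup here, including the decoy class name, is invented purely to demonstrate the difference):

```python
# Why the concat(" ", @class, " ") idiom: it matches a whole class token,
# while a bare contains(@class, ...) also matches substrings.
import lxml.html

HTML = """
<div>
  <table class="clTblStandings compact"><tbody><tr><td>real</td></tr></tbody></table>
  <table class="clTblStandingsArchive"><tbody><tr><td>decoy</td></tr></tbody></table>
</div>
"""
dom = lxml.html.fromstring(HTML)

# naive substring test: matches the decoy table as well
loose = dom.xpath('//table[contains(@class, "clTblStandings")]')

# token-safe test: matches only the intended table
strict = dom.xpath(
    '//table[contains(concat(" ", @class, " "), " clTblStandings ")]')

print(len(loose), len(strict))
```

If the page had ever carried another table with a class like the decoy above, the naive predicate would silently collect the wrong rows.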