XPath not extracting tables from HTML in Python

Question

I am trying to extract tables from an HTML document using the xpath module in Python. If I print the downloaded HTML, I see the full DOM as it should be. However, when I use xpath.get, it gives me a tbody section, but not the one I want, and certainly not the only one that should be there. Here is the script.

import requests
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)

I'm new to this, so idk if it matters, but the type of each element in 'data' is str.

Answer 1

Convert the data to a dict using json.loads().

Try this:

import requests
import json
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)
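# json.loads() takes a single string, not a list, so parse each element.
# It raises ValueError for any element that is not JSON text.
json_data = [json.loads(page) for page in data]
print type(json_data[0])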
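
A minimal sketch of what json.loads() does with a single string, in case it helps to check the suggestion on one element before running the whole script. The question describes the downloaded strings as HTML, and json.loads() raises ValueError on text that is not JSON, so the conversion may not apply to every element. The helper name parse_if_json and both sample strings are hypothetical stand-ins, not taken from the original script, and the sketch uses Python 3 syntax while the scripts above use Python 2 print statements.

```python
import json


def parse_if_json(text):
    """Return the parsed object if text is valid JSON, otherwise None."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Hypothetical stand-ins for two downloaded responses.
json_text = '{"mementos": {"list": [{"uri": "http://example.com/a"}]}}'
html_text = '<html><body><table><tr><td>1</td></tr></table></body></html>'

parsed = parse_if_json(json_text)
print(type(parsed).__name__)                   # dict
print(parsed['mementos']['list'][0]['uri'])    # http://example.com/a
print(parse_if_json(html_text))                # None: HTML is not JSON
```

If the helper returns None for an element of data, that element is not JSON, and it would need an HTML parser rather than json.loads().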

Answer 1 timestamp: 2016-01-12 07:15:51
