XPATH Not Extracting Tables From HTML Python

I am trying to extract tables from an HTML document using the xpath module from the webscraping package in Python. If I print the downloaded HTML, I see the full DOM as expected. However, when I use xpath.get, it gives me a tbody section, but not the one I want, and certainly not the only one that should be there. Here is the script.

import requests
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)

I'm new to this, so I don't know if it matters, but the type of each element in 'data' is str.
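One way to check how many tables a downloaded page really contains, independent of xpath.get, is to walk the HTML string with the standard library. The following is only a minimal sketch in Python 3 (the script above is Python 2). It does not use the webscraping package; TableCollector, extract_tables, and sample_html are hypothetical names and data made up for illustration, and nested tables are not split out separately.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every top-level <table> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per top-level table
        self._depth = 0    # how many <table> tags are currently open
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self._depth > 0:
                self._depth -= 1
        elif self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text is only kept while a cell is open.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html_text):
    """Return a list of tables; each table is a list of rows of cell strings."""
    collector = TableCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tables


# Made-up HTML standing in for one element of 'data'.
sample_html = """
<html><body>
<table><tbody><tr><th>Year</th><th>Count</th></tr><tr><td>2014</td><td>3</td></tr></tbody></table>
<table><tr><td>only cell</td></tr></table>
</body></html>
"""

tables = extract_tables(sample_html)
print(len(tables))
for table in tables:
    print(table)
```

If the number of tables found this way differs from what xpath.get returns for the same string, the difference probably lies in how the expression is evaluated rather than in the download.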

Convert the type of data to dict using json.loads().

Try this:

import requests
import json
from webscraping import download, xpath
D = download.Download()
url = 'http://labs.mementoweb.org/timemap/json/http://www.awebsiteimscraping.com'
r = requests.get(url)
data = []
mementos = r.json()['mementos']['list']
for memento in mementos:
    data.append(D.get(memento['uri']))
# print xpath.get(data[10], '//table')
print type(data[0])
# print data[10]
print len(data)
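# json.loads() expects a single string, not a list, so parse each item separately.
# This only succeeds if every downloaded document is JSON text.
json_data = [json.loads(item) for item in data]
print type(json_data[0])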
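Note that json.loads() only converts text that is itself JSON, and it raises a TypeError when given a list instead of a string. The sketch below (Python 3) illustrates the difference between a JSON response such as the timemap and an HTML page. parse_if_json, timemap_text, and page_text are hypothetical names and sample values, not part of the original script.

```python
import json


def parse_if_json(text):
    """Return the parsed object if text is valid JSON, otherwise None."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Made-up samples: a JSON timemap response and a downloaded HTML page.
timemap_text = '{"mementos": {"list": [{"uri": "http://example.org/snapshot"}]}}'
page_text = "<html><body><table><tr><td>1</td></tr></table></body></html>"

timemap = parse_if_json(timemap_text)
page = parse_if_json(page_text)

print(type(timemap))  # a dict, because the text was JSON
print(page)           # None, because HTML is not JSON
```

If the elements of data are HTML pages, as the printed DOM in the question suggests, they would fall into the second case and need an HTML parser rather than json.loads().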
