How do I use Python and lxml to parse a local HTML file?
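I'm working with a local HTML file in Python and trying to parse it with lxml. For some reason I can't get the file to load properly, and I'm not sure whether that is because I have no HTTP server set up on my local machine, because of how I'm using etree, or something else.
My reference for this code was: http://docs.python-guide.org/en/latest/scenarios/scrape/
This may be a related question: Requests: No connection adapters were found for, error in Python3
Here is my code: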
from lxml import html
import requests
page = requests.get('C:\Users\...\sites\site_1.html')
tree = html.fromstring(page.text)
test = tree.xpath('//html/body/form/div[3]/div[3]/div[2]/div[2]/div/div[2]/div[2]/p[1]/strong/text()')
print test
The traceback I get is:
C:\Python27\python.exe "C:/Users/.../extract_html/extract.py"
Traceback (most recent call last):
File "C:/Users/.../extract_html/extract.py", line 4, in <module>
page = requests.get('C:\Users\...\sites\site_1.html')
File "C:\Python27\lib\site-packages\requests\api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 567, in send
adapter = self.get_adapter(url=request.url)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 641, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'C:\Users\...\sites\site_1.html'
Process finished with exit code 1
You can see it has something to do with a "connection adapter", but I'm not sure what that means.
If the file is local, you shouldn't use requests; just open the file and read it in. requests expects to talk to a web server.
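from lxml import html
# Raw string so the backslashes in the Windows path are not treated as escapes.
with open(r'C:\Users\...site_1.html', "r") as f:
    page = f.read()
tree = html.fromstring(page)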
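As for what the "connection adapters" message means: requests chooses a transport adapter by matching the start of the URL against the prefixes registered on the session, which by default are http:// and https://. A Windows path starts with neither, so it raises InvalidSchema. The sketch below mimics that prefix check using only the standard library (Python 3). The names has_connection_adapter and read_html_text are hypothetical helpers, not part of requests or lxml, and the UTF-8 default is an assumption about the file.

```python
import urllib.request

# Prefixes a default requests session has adapters mounted for.
WEB_PREFIXES = ("http://", "https://")


def has_connection_adapter(location):
    """Return True if the location starts with a prefix requests could handle."""
    return location.lower().startswith(WEB_PREFIXES)


def read_html_text(location, encoding="utf-8"):
    """Read HTML from a web URL or, failing that, from a local file path."""
    if has_connection_adapter(location):
        # Only real web URLs go over the network.
        with urllib.request.urlopen(location) as response:
            return response.read().decode(encoding)
    # Anything else is treated as a path on disk.
    with open(location, "r", encoding=encoding) as f:
        return f.read()


# The path from the question fails the prefix check; the guide URL passes it.
print(has_connection_adapter(r"C:\Users\...\sites\site_1.html"))
print(has_connection_adapter("http://docs.python-guide.org/en/latest/scenarios/scrape/"))
```

The string returned by read_html_text can be passed to html.fromstring in the same way as page above.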
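There is a better way: use the parse function instead of fromstring. It takes the file path directly, so you don't have to open and read the file yourself: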
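from lxml import html
# Raw string: in Python 3, "\U" inside a normal string literal is a syntax error.
tree = html.parse(r"C:\Users\...site_1.html")
print(html.tostring(tree))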
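Note that html.parse returns an ElementTree that wraps the whole document, and calling getroot() on it gives the root html element. The sketch below runs an XPath query on a small throwaway file. The file contents, the shortened XPath, and the helper name strong_text are made up for illustration, since the real page structure isn't shown in the question.

```python
import os
import tempfile

from lxml import html


def strong_text(path):
    """Parse a local HTML file and return the text of every <strong> inside a <p>."""
    tree = html.parse(path)   # ElementTree wrapping the whole document
    root = tree.getroot()     # the root <html> element
    return root.xpath("//p/strong/text()")


# Build a tiny sample file so the example runs anywhere.
fd, sample_path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("<html><body><p><strong>hello</strong> world</p></body></html>")

result = strong_text(sample_path)
print(result)
os.remove(sample_path)
```

A relative XPath like this tends to survive layout changes better than a long absolute one such as the question's, though which expression is right depends on the actual page.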
You can also try Beautiful Soup:
from bs4 import BeautifulSoup
# "filepath" is a placeholder for the path to your HTML file.
with open("filepath", encoding="utf8") as f:
    soup = BeautifulSoup(f, "html.parser")
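Once the soup is loaded, pulling out the text is a short step. Here is a minimal sketch, assuming the target is a strong element inside a p, as in the question's XPath. The sample markup and the helper name first_strong_text are invented for this example, and html.parser is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup


def first_strong_text(markup):
    """Return the text of the first <strong> nested in a <p>, or None if absent."""
    soup = BeautifulSoup(markup, "html.parser")
    node = soup.select_one("p strong")
    return node.get_text() if node is not None else None


sample = "<html><body><p><strong>hello</strong> world</p></body></html>"
print(first_strong_text(sample))
```

BeautifulSoup accepts either a string or an open file object, so the same helper would work with the file handle f from the snippet above.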