Python lxml.html xpath doesn't return any element
I'm using requests with lxml to grab some content from my website, but sometimes it doesn't return the elements it should. I just tried it on a Wikipedia page, and about 20% of the time it doesn't work. Here is the code to reproduce the "bug":
import requests
import lxml.html

url = "https://en.wikipedia.org/w/index.php?title=Web_crawler&action=edit&section=2"
resp = requests.get(url)
print(resp.text[:500])  # the first 500 characters include the <title> tag
tree = lxml.html.fromstring(resp.text)
title = tree.xpath('//title')  # returns an empty list []
As you can see here, when I print the HTML returned by the requests library, I see the following:
<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Editing Web crawler (section) - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!0,"wgSeparatorTransformTable":["",""],"
...
You can see the <title> tag very clearly, but it looks like lxml can't find it with the XPath //title. When I print title, I get an empty list [].
This code works just fine for some other URLs, like this one: https://en.wikipedia.org/wiki/Web_crawler
Any thoughts?
Thanks to @jackFeeting's comment, I updated lxml and my code worked just fine.
pip3 install --upgrade lxml
This updated lxml from version 4.4.1 to 4.6.2.
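If you run into the same thing, it may help to confirm which lxml version is actually installed and to sanity-check the XPath against a local copy of the markup before blaming the network request. The snippet below is only a rough sketch: sample_html is a hypothetical, trimmed stand-in for the page shown in the question (not the real response), and the check does not explain why the older version returned an empty list.

```python
import lxml.html
from lxml import etree

# Version tuples exposed by lxml; useful to confirm the upgrade took effect
# in the interpreter that actually runs the script.
print("lxml:", etree.LXML_VERSION)
print("libxml2 (runtime):", etree.LIBXML_VERSION)

# Hypothetical stand-in for the page: a trimmed copy of the HTML from the question.
sample_html = """<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Editing Web crawler (section) - Wikipedia</title>
</head>
<body></body>
</html>"""

# Same parsing and XPath call as in the question, run on the local sample.
tree = lxml.html.fromstring(sample_html)
titles = [el.text for el in tree.xpath("//title")]
print(titles)
```

If titles comes back non-empty on the local sample but the live page still yields [], the difference is more likely in the fetched content or in the installed parser version than in the XPath itself.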