Python lxml.html xpath doesn't return any element

I'm using requests with lxml to grab some content from my website, but sometimes it doesn't return the elements it should. I just tried it on a Wikipedia page, and about 20% of the time it doesn't work. Here is the code to reproduce the "bug":

import requests
import lxml.html

url = "https://en.wikipedia.org/w/index.php?title=Web_crawler&action=edit&section=2"
resp = requests.get(url)
print(resp.text[:500])  # the <title> tag is visible in this output
tree = lxml.html.fromstring(resp.text)
title = tree.xpath('//title')  # returns an empty list []
print(title)

As you can see below, when I print the HTML returned by the requests library, I see the following:

<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Editing Web crawler (section) - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!0,"wgSeparatorTransformTable":["",""],"
...

You can see the <title> tag very clearly, but with the XPath //title, lxml doesn't seem to catch it. When I print title I get an empty list []. This code works just fine for some other URLs, such as https://en.wikipedia.org/wiki/Web_crawler. Any thoughts?

Thanks to @jackFeeting's comment, I updated lxml and my code worked just fine. Running pip3 install --upgrade lxml updated it from version 4.4.1 to 4.6.2.
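
Since the fix here was an lxml upgrade, it may help to confirm which versions are actually loaded and to test the XPath offline, without a network request. The following is only a minimal sketch: SAMPLE_HTML is a hypothetical stand-in built from the first lines of the HTML shown above (closed off so it forms a complete document), and extract_titles is an illustrative helper name, not part of lxml.

```python
import lxml.html
from lxml import etree

# Hypothetical sample: the first lines of the HTML shown in the question,
# closed off by hand so that it is a complete document.
SAMPLE_HTML = """<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Editing Web crawler (section) - Wikipedia</title>
</head>
<body></body>
</html>"""


def extract_titles(html):
    """Parse an HTML string and return the text of every <title> element."""
    tree = lxml.html.fromstring(html)
    return tree.xpath("//title/text()")


# Version tuples for lxml itself and for the libxml2 library it runs against.
print("lxml:", etree.LXML_VERSION)
print("libxml2:", etree.LIBXML_VERSION)
print(extract_titles(SAMPLE_HTML))
```

If the helper returns the title for this small sample but the live page still yields an empty list, the difference probably lies in the full response body rather than in the XPath expression itself.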

