Python lxml.html xpath returns no elements

Question

I'm using requests with lxml to grab some content from my website, but sometimes it doesn't return the elements it should. I just tried it on a Wikipedia page, and about 20% of the time it doesn't work. Here is the code to reproduce the "bug":

import requests
import lxml.html

url = "https://en.wikipedia.org/w/index.php?title=Web_crawler&action=edit&section=2"
resp = requests.get(url)
print(resp.text[:500])  # the first 500 characters include the <title> tag
tree = lxml.html.fromstring(resp.text)
title = tree.xpath('//title')  # returns an empty list []
print(title)

As you can see here, when I print the HTML returned by the requests library, I see the following:

<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Editing Web crawler (section) - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!0,"wgSeparatorTransformTable":["",""],"
...

You can see the <title> tag very clearly, but it looks like lxml can't find it with the XPath //title. When I print title, I get an empty list []. This code works just fine for some other URLs, like https://en.wikipedia.org/wiki/Web_crawler. Any thoughts?
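
To separate the parsing step from the network request, the same XPath can be run against a fixed string. The following is only a minimal sketch: SAMPLE_HTML is a trimmed copy of the output printed above, and extract_title is a hypothetical helper name, not part of lxml. Whether this trimmed sample triggers the problem on an affected lxml version has not been verified.

```python
import lxml.html

# Trimmed copy of the response head shown in the question (assumption:
# the <script> content and the rest of the page are left out).
SAMPLE_HTML = """<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Editing Web crawler (section) - Wikipedia</title>
</head>
<body></body>
</html>"""


def extract_title(html_text):
    """Return the stripped <title> text, or None if //title matches nothing."""
    tree = lxml.html.fromstring(html_text)
    matches = tree.xpath("//title")
    if not matches:
        return None
    return matches[0].text_content().strip()


print(extract_title(SAMPLE_HTML))
```

If the helper returns None for a response whose text visibly contains a <title> tag, the problem is in the parsing step rather than in the download.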

Answer 1

Thanks to @jackFeeting's comment, I updated lxml and my code worked just fine. Running pip3 install --upgrade lxml updated it from version 4.4.1 to 4.6.2.
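
To check which lxml version a given environment has, something like the sketch below may help. It assumes that 4.6.2, the version that worked here, is a reasonable threshold; the exact release that fixed the behavior is not known. version_tuple and lxml_is_at_least are hypothetical helper names, and the comparison only looks at the leading numeric parts of the version string.

```python
import re
from importlib import metadata


def version_tuple(text):
    """Turn a version string such as '4.6.2' into a tuple of ints: (4, 6, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def lxml_is_at_least(minimum="4.6.2"):
    """Return True if the installed lxml is at least `minimum` (assumed threshold)."""
    try:
        installed = metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return False  # lxml is not installed in this environment
    return version_tuple(installed) >= version_tuple(minimum)


print(lxml_is_at_least())
```

If this prints False, running pip3 install --upgrade lxml as described above is the first thing to try.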

Answer 1: 0 votes, accepted, 2021-02-23 14:49:08
