Scrapy + Python + XPath: XPath returns an empty list

I need to scrape the links to the images from this page: http://calendar.youtoocanrun.com/events/new-delhi-1/beat-that-run/

I wrote this XPath:

response.xpath('//li[@class="geodir-active-slide"]/img/@src').extract()

It returned an empty list. It should have returned the links to both the GIF and JPG files. Why?

The problem is not in your XPath expression, but in the assumption that the element you are looking for is present in the raw HTML that Scrapy downloads.

Scrapy doesn't run any JavaScript, so in many cases the response you get in Scrapy differs from what you see in the browser's developer tools.

If you open the same page using your browser's "view page source" option, you'll see that the element you're looking for is not there. This means the element is generated dynamically by JavaScript.
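The same check can be scripted rather than done by eye. The sketch below uses only the standard library and a made-up HTML snippet standing in for the raw page source; the markup and the `tags_with_class` helper are assumptions for illustration, not the real content of the page in question.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the tag names of elements that carry a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or valueless, hence the fallback.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.matches.append(tag)


def tags_with_class(html, class_name):
    """Return the tags in `html` whose class list contains `class_name`."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.matches


# Hypothetical raw HTML, as a server might send it before any script runs.
RAW_HTML = """
<div class="slider-container"></div>
<script src="/js/slider.js"></script>
"""

# An empty result means the XPath has nothing to match in the raw source.
print(tags_with_class(RAW_HTML, "geodir-active-slide"))
print(tags_with_class(RAW_HTML, "slider-container"))
```

Inside a Scrapy callback the string to inspect would be `response.text`. If the class name never appears there, no XPath built on it can return anything.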

There are a few ways to solve this, and I'd try them in this order:

  1. check the page's HTML and look for JavaScript code that contains the data you need;
  2. inspect the requests your browser makes in the Network panel of the developer tools and try to find the one that fetches that content;
  3. use a headless browser to render the page for you.
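As a rough illustration of the first option, the sketch below pulls a list of image paths out of an inline script using a regular expression and `json`. The page snippet, the `sliderImages` variable name and the `extract_js_array` helper are all hypothetical; the real page may embed its data differently, or not at all.

```python
import json
import re

# Hypothetical page source: the slider markup is absent, but an inline
# script carries the image paths. The variable name is made up.
PAGE_HTML = """
<div id="slider"></div>
<script>
var sliderImages = ["/uploads/run.gif", "/uploads/route.jpg"];
</script>
"""


def extract_js_array(html, var_name):
    """Return the JSON array assigned to `var_name` in an inline script.

    Returns an empty list when no such assignment is found.
    """
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;"
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return []
    # Only works when the array literal is also valid JSON
    # (double-quoted strings, no trailing commas).
    return json.loads(match.group(1))


print(extract_js_array(PAGE_HTML, "sliderImages"))
```

In a Scrapy callback you would pass `response.text` as the `html` argument. The second option usually ends up similar: if the Network panel shows the images arriving from a JSON endpoint, you can request that URL directly and decode the body with `json.loads(response.text)`.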
