Scrapy: XPath works in shell but not in code

Question

I'm trying to crawl a website (with the owner's authorization). My code returns what I want in scrapy shell, but I get nothing in my spider.

I also checked the previous questions similar to this one, without success. For example, the home page doesn't use JavaScript to load the elements I need.

import scrapy


class MySpider(scrapy.Spider):
    name = 'MySpider'

    start_urls = [  # WRONG URL, should be https://shop.app4health.it/ (problem solved!)
        'https://www.app4health.it/',
    ]

    def parse(self, response):
        self.logger.info('A response from %s just arrived!', response.url)
        print('PRE RISULTATI')

        # This works in scrapy shell, but returns an empty list in the spider
        risultati = response.selector.xpath('//*[@id="nav"]/ol/li[*]/a/@href').extract()
        # risultati = response.css('li a>href').extract()
        # risultati = response.xpath('//*[@id="nav"]/ol/li[1]/a').extract()
        print(risultati)

        # Request takes a single URL, so follow each extracted link in turn
        for next_page in risultati:
            print('NEXT PAGE')
            # dont_filter=True: otherwise the request is ignored as already done
            yield scrapy.Request(
                url=response.urljoin(next_page),
                callback=self.prodotti,
                dont_filter=True,
            )

    def prodotti(self, response):
        # Placeholder callback: it only logs the response and extracts nothing yet
        self.logger.info('A REEEESPONSEEEEEE from %s just arrived!', response.url)

The website I'm trying to crawl is https://shop.app4health.it/

The XPath expression I use is this one:

response.selector.xpath('//*[@id="nav"]/ol/li[*]/a/@href').extract()

I know there are some problems with the prodotti function etc., but that's not the point. I would like to understand why the XPath selector works in scrapy shell (I get exactly the links I need), but when I run it in my spider I always get an empty list.

If it helps: when I use CSS selectors in my spider, they work fine and find the elements, but I would like to use XPath (I will need it for the future development of my application).

Thanks for the help :)

EDIT: I tried printing the body of the first response (from start_urls) and it's correct, I get the page I want. The selectors I use (even the ones that have been suggested) all work fine in the shell, but return nothing in my code!

EDIT 2: Having gained more experience with Scrapy and web crawling, I realised that the HTML you see in your browser can differ from the HTML returned to a Scrapy request; in my experience, some websites simply respond with different markup. That's why a "correct" XPath/CSS query taken from the browser may return nothing when used in your Scrapy code. Always check that the body of your response is what you expected!
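
To illustrate the point above, here is a minimal sketch that runs without Scrapy or a network connection. It applies one query to two different HTML bodies. Both snippets are invented for illustration and are not the real markup of either site. The standard library's ElementTree stands in for Scrapy's selectors; it only supports a subset of XPath and needs well-formed markup, so the query is written in its dialect and the href is read with .get(). The names SHOP_PAGE, OTHER_PAGE, NAV_LINKS and extract_links are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real pages of either site.
SHOP_PAGE = """
<html><body>
  <div id="nav"><ol>
    <li><a href="/sonno">Sonno</a></li>
    <li><a href="/terapia">Terapia</a></li>
  </ol></div>
</body></html>
"""

OTHER_PAGE = """
<html><body>
  <div id="menu"><ul>
    <li><a href="/about">About</a></li>
  </ul></div>
</body></html>
"""

# ElementTree dialect of //*[@id="nav"]/ol/li/a
NAV_LINKS = ".//*[@id='nav']/ol/li/a"


def extract_links(html):
    """Return the href of every link matched by NAV_LINKS in html."""
    root = ET.fromstring(html.strip())
    return [a.get("href") for a in root.findall(NAV_LINKS)]


print(extract_links(SHOP_PAGE))   # the two shop links
print(extract_links(OTHER_PAGE))  # empty list: same query, different body
```

The query itself never changes; only the body it runs against does. An empty result is therefore a reason to inspect the response body before rewriting the selector.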

SOLVED: The XPath is correct. I wrote the wrong URL in start_urls!
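
A mistake like this one can be caught early by comparing the host of the URL being crawled with the host the selectors were written against. The sketch below is one possible check, using only the standard library; the helper name is_expected_host and the constant EXPECTED_HOST are hypothetical and are not part of Scrapy.

```python
from urllib.parse import urlparse

# Host the selectors were tested against in scrapy shell
EXPECTED_HOST = "shop.app4health.it"


def is_expected_host(url, expected_host=EXPECTED_HOST):
    """Return True when url points at the host we meant to crawl."""
    return urlparse(url).netloc.lower() == expected_host


print(is_expected_host("https://www.app4health.it/"))   # False: URL from start_urls
print(is_expected_host("https://shop.app4health.it/"))  # True: URL used in the shell
```

Inside a spider, the same function could be called with response.url at the top of parse, logging a warning when it returns False.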

Answer 1

    //nav[@id="mmenu"]//ul/li[contains(@class,"level0")]/a[contains(@class,"level-top")]/@href

Use this XPath. Also look at the page's "view source" before writing an XPath.

Answer 2

As an alternative to Desperado's answer, you can use CSS selectors, which are much simpler but more than enough for your use case:

$ scrapy shell "https://shop.app4health.it/"
In [1]: response.css('.level0 .level-top::attr(href)').extract()
Out[1]: 
['https://shop.app4health.it/sonno',
 'https://shop.app4health.it/monitoraggio-e-diagnostica',
 'https://shop.app4health.it/terapia',
 'https://shop.app4health.it/integratori-alimentari',
 'https://shop.app4health.it/fitness',
 'https://shop.app4health.it/benessere',
 'https://shop.app4health.it/ausili',
 'https://shop.app4health.it/prodotti-in-offerta',
 'https://shop.app4health.it/kit-regalo']

The scrapy shell command is perfect for debugging issues like this.

Answer 1
0 2018-04-25 07:16:30

Answer 2
0 accepted 2018-04-25 07:21:27