scrapy.Request doesn't call back my function

Question

I'm sorry if my question is too trivial, but I've been stuck on this since this morning. I'm new to Scrapy and I have already read the docs, but I haven't found an answer.

I wrote this spider. When I use parse_body as the callback in rules = (Rule(LinkExtractor(), callback='parse_body'),), it runs:

    tchatch = response.xpath('//div[@class="ProductPriceBox-item detail"]/div/a/@href').extract()
    print('\n TROUVE \n')
    print(tchatch)
    print('\n DONE \n')

But when I rename the function parse_body to just parse everywhere in my code, it only runs:

    print('\n EN FAIT, ICI : ', response.url, '\n')

It seems that the callbacks of my scrapy.Request requests are never called. I even added a lot of extra print calls to see whether my code was running the functions, but it prints nothing except the print shown above.

Any ideas, please?

# -*- coding: utf-8 -*-
import scrapy
import re
import numbers
from fnac.items import FnacItem
from urllib.request import urlopen
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from bs4 import BeautifulSoup

class Fnac(CrawlSpider):
    name = 'FnacCom'
    allowed_domains = ['fnac.com']
    start_urls = ['http://musique.fnac.com/a10484807/The-Cranberries-Something-else-CD-album']

    rules = (
        Rule(LinkExtractor(), callback='parse_body'),
    )

    def parse_body(self, response):
        item = FnacItem()

        nb_sales = response.xpath('//body//table[@summary="données détaillée du vendeur"]/tbody/tr/td/span/text()').re(r'([\d]*) ventes')
        country = response.xpath('//body//table[@summary="données détaillée du vendeur"]/tbody/tr/td/text()').re(r'([A-Z].*)')

        item['nb_sales'] = ''.join(nb_sales).strip()
        item['country'] = ''.join(country).strip()

        print(response.url)
        test_list = response.xpath('//a/@href')
        for test_list in response.xpath('.//div[@class="ProductPriceBox-item detail"]'):
            tchatch = response.xpath('//div[@class="ProductPriceBox-item detail"]/div/a/@href').extract()
            print('\n TROUVE \n')
            print(tchatch)
            print('\n DONE \n')

        yield scrapy.Request(response.url, callback=self.parse_iframe, meta={'item': item})

    def parse_iframe(self, response):
        f_item1 = response.meta['item']

        print('\n EN FAIT, ICI : ', response.url, '\n')
        soup = BeautifulSoup(urlopen(response.url), "lxml")
        iframexx = soup.find_all('iframe')
        if (len(iframexx) != 0):
            for iframe in iframexx:
                yield scrapy.Request(iframe.attrs['src'], callback=self.extract_or_loop, meta={'item': f_item1})
        else:
            yield scrapy.Request(response.url, callback=self.extract_or_loop, meta={'item': f_item1})

    def extract_or_loop(self, response):
        f_item2 = response.meta['item']

        print('\n PEUT ETRE ICI ? \n')
        address = response.xpath('//body//div/p/text()').re(r'.*Adresse \: (.*)\n?.*')
        email = response.xpath('//body//div/ul/li[contains(text(),"@")]/text()').extract()
        name = response.xpath('//body//div/p[@class="customer-policy-label"]/text()').re(r'Infos sur la boutique \: ([a-zA-Z0-9]*\s*)')
        phone = response.xpath('//body//div/p/text()').re(r'.*Tél \: ([\d]*)\n?.*')
        siret = response.xpath('//body//div/p/text()').re(r'.*Siret \: ([\d]*)\n?.*')
        vat = response.xpath('//body//div/text()').re(r'.*TVA \: (.*)')

        if (len(name) != 0):
            print('\n', name, '\n')
            f_item2['name'] = ''.join(name).strip()
            f_item2['address'] = ''.join(address).strip()
            f_item2['phone'] = ''.join(phone).strip()
            f_item2['email'] = ''.join(email).strip()
            f_item2['vat'] = ''.join(vat).strip()
            f_item2['siret'] = ''.join(siret).strip()
            yield f_item2
        else:
            for sel in response.xpath('//html/body'):
                list_urls = sel.xpath('//a/@href').extract()
                list_iframe = response.xpath('//div[@class="ProductPriceBox-item detail"]/div/a/@href').extract()
                if (len(list_iframe) != 0):
                    for list_iframe in list_urls:
                        print('\n', list_iframe, '\n')
                        print('\n GROS TCHATCH \n')
                        yield scrapy.Request(list_iframe, callback=self.parse_body)
                for url in list_urls:
                    yield scrapy.Request(response.urljoin(url), callback=self.parse_body)

Answer 1

In the Scrapy documentation for CrawlSpider, there is a warning:

Warning

When writing crawl spider rules, avoid using parse as callback, since the CrawlSpider uses the parse method itself to implement its logic. So if you override the parse method, the crawl spider will no longer work.

You can check this in the CrawlSpider section of the Scrapy documentation.
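
To make the warning concrete, here is a minimal sketch in plain Python. It is a toy model, not Scrapy's actual implementation: MiniCrawlSpider, GoodSpider and BrokenSpider are hypothetical names, a "rule" is reduced to the name of a callback method, and a "response" is just a string. It only illustrates why a subclass that defines its own parse loses the rule handling that the base class does inside parse.

```python
class MiniCrawlSpider:
    """Toy stand-in for a crawl spider (assumption: NOT Scrapy's real code)."""

    # In this sketch a rule is simply the name of the callback method to run.
    rules = ()

    def parse(self, response):
        # The base class uses parse() itself to apply the rules.
        results = []
        for callback_name in self.rules:
            callback = getattr(self, callback_name)
            results.extend(callback(response))
        return results


class GoodSpider(MiniCrawlSpider):
    # The rule callback has its own name, so the base parse() is untouched.
    rules = ("parse_body",)

    def parse_body(self, response):
        return ["parse_body saw " + response]


class BrokenSpider(MiniCrawlSpider):
    # Same spider after renaming parse_body to parse everywhere.
    rules = ("parse",)

    def parse(self, response):
        # This definition replaces the base parse(), so the loop over
        # self.rules above never runs for this class.
        return ["parse saw " + response]


good = GoodSpider().parse("page")
broken = BrokenSpider().parse("page")
print(good)    # the rule callback ran through the base class logic
print(broken)  # only the overriding method ran; rules were never applied
```

In the spider from the question, this corresponds to keeping callback='parse_body' in the Rule and not defining a method named parse on the CrawlSpider subclass.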

Solution 1
2 2017-07-13 08:55:53
