scrapy.Request appears unable to callback a url
I have modified my code to narrow down where the error is arising. I am using Scrapy, and in the first "def parse" I am trying to request a URL, and then in the next "def" I am trying to crawl that URL.
But I seem unable to make the scrapy.Request work; it won't crawl the URL.
import scrapy
#from urllib.parse import urljoin
from CharlesChurch.items import CharleschurchItem

class charleschurchSpider(scrapy.Spider):
    name = "charleschurch"
    allowed_domains = ["charleschurch.com"]
    start_urls = ["https://www.charleschurch.com/sitemap"]

    def parse(self, response):
        # for href in response.xpath('//*[@class="contacts-item"]/ul/li/a/@href'):
        #     url = urljoin('https://www.charleschurch.com/', href.extract())
        #     yield scrapy.Request(url, callback=self.parse_dir_contents)
        url = 'https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923'
        yield scrapy.Request(url, self.parse_dir_contents)

    def parse_dir_contents(self, response):
# def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = CharleschurchItem()
            item['name'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/span[1]/b/text()').extract()
            item['address'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/div/*[@itemprop="postalCode"]/text()').extract()
            plotnames = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/text()').extract()
            plotnames = [plotname.strip() for plotname in plotnames]
            plotids = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/@href').extract()
            plotids = [plotid.strip() for plotid in plotids]
            plotprices = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__price"]/text()').extract()
            plotprices = [plotprice.strip() for plotprice in plotprices]
            result = zip(plotnames, plotids, plotprices)
            for plotname, plotid, plotprice in result:
                item['plotname'] = plotname
                item['plotid'] = plotid
                item['plotprice'] = plotprice
                yield item
The error I get is:
2020-09-08 22:12:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-09-08 22:12:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.charleschurch.com/sitemap> (referer: None)
2020-09-08 22:12:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923> (referer: https://www.charleschurch.com/sitemap)
2020-09-08 22:12:08 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923> (referer: https://www.charleschurch.com/sitemap)
Traceback (most recent call last):
File "C:\Users\andre\Anaconda3\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
StopIteration: <200 https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
result = f(*args, **kw)
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 60, in process_spider_input
return scrape_func(response, request, spider)
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\core\scraper.py", line 152, in call_spider
warn_on_generator_with_return_value(spider, callback)
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\misc.py", line 218, in warn_on_generator_with_return_value
if is_generator_with_return_value(callable):
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\misc.py", line 203, in is_generator_with_return_value
tree = ast.parse(dedent(inspect.getsource(callable)))
File "C:\Users\andre\Anaconda3\lib\ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
def parse_dir_contents(self, response):
^
IndentationError: unexpected indent
So it seems the line yield scrapy.Request(url, self.parse_dir_contents) is not working, and I am not sure why.
From your logs:
def parse_dir_contents(self, response):
^
IndentationError: unexpected indent
You have an indentation error:
    def parse_dir_contents(self, response):
# def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = CharleschurchItem()
Remove the commented line, or indent it to the same level as the method body.
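To see why the spider itself never gets to run, note what the traceback shows: Scrapy's warn_on_generator_with_return_value() calls ast.parse(dedent(inspect.getsource(callback))), and textwrap.dedent() only strips whitespace that is common to every line. A comment indented less than the "def" line means there is no common prefix, so nothing is stripped and ast.parse() fails on the still-indented first line. A minimal stdlib-only sketch of that check (no Scrapy required, using hand-written source strings to stand in for inspect.getsource() output):

```python
import ast
import textwrap

# Scrapy's is_generator_with_return_value() does roughly:
#     ast.parse(textwrap.dedent(inspect.getsource(callback)))
# Simulated getsource() output for a method whose stray comment sits
# at column 0, i.e. less indented than the "def" line itself:
broken = (
    "    def parse_dir_contents(self, response):\n"
    "# def parse(self, response):\n"
    "        pass\n"
)

try:
    ast.parse(textwrap.dedent(broken))
    outcome = "parsed"
except IndentationError:
    # dedent() found no whitespace common to all lines, so line 1
    # keeps its 4-space indent and ast.parse() raises
    # "unexpected indent" -- exactly the error in the logs above.
    outcome = "IndentationError"

# With the comment indented to match the def line, dedent() strips
# the common 4 spaces and parsing succeeds:
fixed = (
    "    def parse_dir_contents(self, response):\n"
    "    # def parse(self, response):\n"
    "        pass\n"
)
ast.parse(textwrap.dedent(fixed))  # no exception

print(outcome)  # IndentationError
```

So once the stray comment is removed or re-indented, Scrapy's source inspection succeeds and the parse_dir_contents callback runs normally.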