Response 405 from Scrapy

Question

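I am trying to scrape author data from http://quotes.toscrape.com/, but unfortunately the author pages return 405 when I run the spider, whereas they return 200 in the browser or when fetched from the Scrapy shell.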

import datetime

import scrapy


class AuthorsSpider(scrapy.Spider):
    name = 'authors'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']
    custom_settings = {
        'CONCURRENT_REQUESTS': 50,
        'DOWNLOAD_DELAY': 0.1,
        'FEED_URI': f'output/authors_{datetime.datetime.today().strftime("%Y-%m-%d %H-%M-%S")}.csv',
        'FEED_FORMAT': 'csv',
        'FEED_EXPORTERS': {'csv': 'scrapy.exporters.CsvItemExporter'},
        'FEED_EXPORT_ENCODING': 'utf-8',
        'FEED_EXPORT_FIELDS': ('name','birth_date','birth_location','description',) 
    }

    def parse(self, response):
        for _ in response.xpath("//div[@class='quote']"):
            author_page = response.xpath("//a[text()='(about)']/@href").get()
            yield response.follow(author_page,
                                method="GET",
                                callback=self.parse_author)

        next_page = response.xpath("//li[@class='next']/a/@href").get()
        if next_page:
            yield response.follow(next_page, self.parse)


    def parse_author(self, response):
        yield {
            'name': response.xpath("//h3[@class='author-title']/text()").get(),
            'birth_date': response.xpath("//span[@class='author-born-date']/text()").get(),
            'birth_location': response.xpath("//span[@class='author-born-location']/text()").get(),
            'description': response.xpath("//div[@class='author-description']/text()").get()
        }

This is part of the output when I run scrapy crawl authors:

2023-01-02 10:53:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/10/> (referer: http://quotes.toscrape.com/page/9/)
2023-01-02 10:53:33 [scrapy.core.engine] DEBUG: Crawled (405) <NONE http://quotes.toscrape.com/author/Suzanne-Collins/> (referer: http://quotes.toscrape.com/page/7/)
2023-01-02 10:53:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 http://quotes.toscrape.com/author/Suzanne-Collins/>: HTTP status code is not handled or not allowed
2023-01-02 10:53:34 [scrapy.core.engine] DEBUG: Crawled (405) <NONE http://quotes.toscrape.com/author/W-C-Fields/> (referer: http://quotes.toscrape.com/page/8/)
2023-01-02 10:53:34 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <NONE http://quotes.toscrape.com/author/John-Lennon/> from <GET http://quotes.toscrape.com/author/John-Lennon>
2023-01-02 10:53:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 http://quotes.toscrape.com/author/W-C-Fields/>: HTTP status code is not handled or not allowed
2023-01-02 10:53:34 [scrapy.core.engine] DEBUG: Crawled (405) <NONE http://quotes.toscrape.com/author/Alfred-Tennyson/> (referer: http://quotes.toscrape.com/page/8/)
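One detail worth noting in the output above is that every 405 response belongs to a request whose method is logged as NONE rather than GET. The following is a minimal sketch, using only the standard library, of how such log lines could be tallied by method and status to make that pattern visible. The names LOG_LINES, CRAWLED and tally_crawled are made up for illustration, and the sample lines are copied from the output above.

```python
import re
from collections import Counter

# Sample lines copied from the crawl output in the question.
LOG_LINES = [
    "2023-01-02 10:53:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/10/> (referer: http://quotes.toscrape.com/page/9/)",
    "2023-01-02 10:53:33 [scrapy.core.engine] DEBUG: Crawled (405) <NONE http://quotes.toscrape.com/author/Suzanne-Collins/> (referer: http://quotes.toscrape.com/page/7/)",
    "2023-01-02 10:53:34 [scrapy.core.engine] DEBUG: Crawled (405) <NONE http://quotes.toscrape.com/author/W-C-Fields/> (referer: http://quotes.toscrape.com/page/8/)",
]

# Matches the "Crawled (<status>) <<METHOD> <url>>" part of an engine log line.
CRAWLED = re.compile(r"Crawled \((?P<status>\d{3})\) <(?P<method>\S+) (?P<url>[^>\s]+)>")


def tally_crawled(lines):
    """Count crawled responses grouped by (HTTP method, status code)."""
    counts = Counter()
    for line in lines:
        match = CRAWLED.search(line)
        if match:
            counts[(match["method"], int(match["status"]))] += 1
    return counts


counts = tally_crawled(LOG_LINES)
for (method, status), n in sorted(counts.items()):
    print(method, status, n)
```

This only summarizes what the log already says; it does not by itself explain why the method ends up as NONE.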

Answer 1

Basically, build the request for the author page explicitly with scrapy.Request() instead of response.follow(), and pass parse_author as its callback. If you want to hand the author page URL to parse_author, your code should look like this.

import datetime

import scrapy


class AuthorsSpider(scrapy.Spider):
    name = 'authors'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']
    custom_settings = {
        'CONCURRENT_REQUESTS': 50,
        'DOWNLOAD_DELAY': 0.1,
        'FEED_URI': f'output/authors_{datetime.datetime.today().strftime("%Y-%m-%d %H-%M-%S")}.csv',
        'FEED_FORMAT': 'csv',
        'FEED_EXPORTERS': {'csv': 'scrapy.exporters.CsvItemExporter'},
        'FEED_EXPORT_ENCODING': 'utf-8',
        'FEED_EXPORT_FIELDS': ('name','birth_date','birth_location','description',) 
    }

    def parse(self, response):
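        # Each quote block has its own "(about)" link, so query relative to the block.
        for quote in response.xpath("//div[@class='quote']"):
            author_page = quote.xpath(".//a[text()='(about)']/@href").get()
            # scrapy.Request() needs an absolute URL, so resolve the relative href first.
            yield scrapy.Request(response.urljoin(author_page),
                                 method="GET",
                                 callback=self.parse_author)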

        next_page = response.xpath("//li[@class='next']/a/@href").get()
        if next_page:
            yield response.follow(next_page, self.parse)


    def parse_author(self, response):
        yield {
            'name': response.xpath("//h3[@class='author-title']/text()").get(),
            'birth_date': response.xpath("//span[@class='author-born-date']/text()").get(),
            'birth_location': response.xpath("//span[@class='author-born-location']/text()").get(),
            'description': response.xpath("//div[@class='author-description']/text()").get()
        }
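One thing to keep in mind when switching to scrapy.Request(): it only accepts absolute URLs, while the "(about)" links on the page are relative hrefs. The sketch below uses the standard library's urljoin to illustrate the kind of resolution that response.urljoin() performs against the page URL. The href value is an assumption inferred from the redirect line in the log, and the variable names are made up for illustration.

```python
from urllib.parse import urljoin

# URL of the listing page the link was found on (taken from the log's referer).
page_url = "http://quotes.toscrape.com/page/8/"

# Assumed relative href of an "(about)" link, inferred from the redirect log line.
author_href = "/author/John-Lennon"

# Resolve the relative href against the page URL to get an absolute URL.
author_url = urljoin(page_url, author_href)
print(author_url)
```

Inside a spider callback, response.urljoin(author_href) would be the usual way to do this, since it already knows the URL of the current response.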

If you have any further questions, please reply to this answer. Happy learning!

Solution 1
0 2023-01-02 08:03:14
