Scraping a Japanese website with Scrapy, but the output file contains no data

Question

I am new to Scrapy. I want to scrape some data from a Japanese website, but when I run the spider below, no data shows up in the exported file. Can anyone help me?

Exporting to CSV gives an empty file, and the shell does not show any results either, just [].

Here is my code.

import scrapy

class suumotest(scrapy.Spider):

    name = "testsecond"

    start_urls = [
        'https://suumo.jp/jj/chintai/ichiran/FR301FC005/?tc=0401303&tc=0401304&ar=010&bs=040'
    ]

    def parse(self, response):
        # for following property link
        for href in response.css('.property_inner-title+a::attr(href)').extract():
            yield scrapy.Request(response.urljoin(href), callback=self.parse_info)



    # defining parser to extract data   
    def parse_info(self, response):
        def extract_with_css(query):
            return response.css(query).extract_first().strip()

        yield {
          'Title': extract_with_css('h1.section_title::text'),
          'Fee': extract_with_css('td.detailinfo-col--01 span.detailvalue-item-accent::text'),
          'Fee Descrition': extract_with_css('td.detailinfo-col--01 span.detailvalue-item-text::text'),
          'Prop Description': extract_with_css('td.detailinfo-col--03::text'),
          'Prop Address': extract_with_css('td.detailinfo-col--04::text'),
        }

Answer 1

The first CSS selector you use in the parse method is the problem here:

response.css('.property_inner-title+a::attr(href)').extract()

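The + is the mistake. In CSS, + is the adjacent sibling combinator, so this selector only matches an a element that immediately follows the .property_inner-title element as a sibling. Replace it with a space (the descendant combinator) so that it matches a elements inside .property_inner-title: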

response.css('.property_inner-title a::attr(href)').extract()

Another problem is in the extract_with_css() function you defined:

def parse_info(self, response):
    def extract_with_css(query):
        return response.css(query).extract_first().strip()

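The problem is that extract_first() returns None by default when nothing matches, and .strip() is a str method. Calling it on None raises an AttributeError.
To fix this, pass an empty string as the default value to extract_first: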

def parse_info(self, response):
    def extract_with_css(query):
        return response.css(query).extract_first('').strip()

Solution 1
2  Accepted  2017-01-21 23:22:23
