Scrapy yield comma separated csv file

I wrote a spider to extract material from a website, as shown below. It works and does generate a CSV file. However, when I open the file in Excel, the data does not appear to be comma-separated. How can I fix this so that the file is comma-separated?

import scrapy


class JPItem(scrapy.Item):
    question_title = scrapy.Field()
    question_content = scrapy.Field()
    question_link = scrapy.Field()
    best_answer = scrapy.Field()
    best_answer_link = scrapy.Field()


class JPSpider(scrapy.Spider):
    name = "jp"
    allowed_domains = ['detail.chiebukuro.yahoo.co.jp']

    # One URL per question ID in the range.
    start_urls = [
        'https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q' + str(x)
        for x in range(10000000000, 100000000000)
    ]

    def parse(self, response):
        item = JPItem()

        # extract_first() returns a single string (or None if nothing matches).
        item['question_title'] = response.css("div.mdPstd.mdPstdQstn.sttsRslvd.clrfx div.ttl h1::text").extract_first()
        # extract() returns a list; ''.join() collapses it into one string.
        item['question_content'] = ''.join(response.css("div.mdPstdQstn div.ptsQes p::text").extract())
        item['question_link'] = ''.join(response.css("div.mdPstdQstn p:not([class]) a::text").extract())
        item['best_answer'] = ''.join(response.css("div.mdPstdBA div.ptsQes p.queTxt::text").extract())
        item['best_answer_link'] = ''.join(response.css("div.mdPstdBA p:not([class]) a::text").extract())

        yield item

An item property that holds a list (which is what .extract() returns) is written into a single cell with its elements joined by commas, which is why the values look comma-separated in your file. However, the last four item properties you're dealing with won't be lists, because you're using ''.join() on them, and extract_first() already returns a single string. If you want each list element to populate its own cell when the CSV file is opened in Excel, you'll need to iterate through your lists and yield each element separately.
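The idea of yielding each list element separately can be sketched without Scrapy, using only the standard library. This is a minimal illustration, not the asker's spider: the function one_item_per_value, the field name answer_paragraph, and the sample strings are all hypothetical stand-ins for what .extract() might return.

```python
import csv
import io


def one_item_per_value(title, values):
    """Yield one dict per element instead of one dict holding the whole list."""
    for value in values:
        yield {"question_title": title, "answer_paragraph": value}


# Hypothetical sample data standing in for the list that .extract() returns.
title = "Sample question"
paragraphs = ["first paragraph", "second paragraph"]

# Joining the list keeps every element inside one field, hence one cell.
joined_row = {"question_title": title, "answer_paragraph": ",".join(paragraphs)}

# Yielding per element produces a separate row for each value.
rows = list(one_item_per_value(title, paragraphs))

# Write the per-element rows as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["question_title", "answer_paragraph"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In a spider, the same pattern would mean looping over the extracted list inside parse() and yielding a new item on each iteration, rather than assigning the whole list to one field.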

