
Simple way to change scrapy .getall() delimiter

I'm running a basic scrapy crawler and I can't find anything in the scrapy documentation that lets me change the delimiter used by .getall(). The default appears to be comma-separated, and I assume this could cause errors when the data is imported elsewhere.

Ideally, I want the exported CSV to be comma-separated, with the .getall() data pipe- or semicolon-separated. I would prefer to fix this efficiently within the scrapy script. For example, say the bit containing the .getall() is

def entry_parse(self, response):
    for entry in response.xpath("//tbody[@class='entry-grid-body infinite']//td[@class]"):
        yield {'entry_labels': entry.xpath(".//div[@class='entry-labels']/span/text()").getall()}

Ideally, it would be nice to be able to pass such an argument into getall() or something similar, but I can't find any documentation allowing that. Any ideas would be helpful. Thanks.

This is not really a scrapy problem. The .getall() method returns a list, and the repr of a list uses commas by default:

>>> repr(["a", "b"])
"['a', 'b']"

You can use json.dumps with its separators argument to change the delimiter before yielding the item:

import json

def entry_parse(self, response):
    for entry in response.xpath("//tbody[@class='entry-grid-body infinite']//td[@class]"):
        labels = entry.xpath(".//div[@class='entry-labels']/span/text()").getall()
        # separators=(item_separator, key_separator): "|" replaces the default ", "
        yield {'entry_labels': json.dumps(labels, separators=("|", ":"))}
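To see what this produces without running a spider, here is a minimal sketch on a hypothetical list standing in for the result of .getall() (the sample values are an assumption, not from the original question). It also shows plain str.join as an alternative, which is a different technique from the json.dumps approach above and may be preferable if the JSON brackets and quotes are unwanted in the CSV cell.

```python
import json

# Hypothetical sample of what .getall() might return for one entry.
labels = ["red", "green", "blue"]

# json.dumps keeps the JSON brackets and quotes; only the item separator changes.
as_json = json.dumps(labels, separators=("|", ":"))
print(as_json)  # ["red"|"green"|"blue"]

# str.join (an alternative, not the approach above) gives a bare delimited string.
as_joined = "|".join(labels)
print(as_joined)  # red|green|blue
```

Either string can be yielded as the 'entry_labels' value; because it is already a single string rather than a list, it should not be affected by any comma-joining of list values at export time.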
