Simple way to change scrapy .getall() delimiter
I'm running a basic Scrapy crawler and can't find anything in the Scrapy documentation that lets me change the delimiter of .getall(). The default appears to be comma separated, and I assume this could cause errors when the data is imported elsewhere.
Ideally, I want the exported CSV to be comma separated, but the .getall() data to be pipe or semicolon separated. I would prefer to fix this within the Scrapy script itself.
For example, say the part containing the .getall() is:
def entry_parse(self, response):
    for entry in response.xpath("//tbody[@class='entry-grid-body infinite']//td[@class]"):
        yield {'entry_labels': entry.xpath(".//div[@class='entry-labels']/span/text()").getall()}
Ideally, it would be nice to be able to pass such an argument into getall() or something similar, but I can't find any documentation allowing that. Any ideas would be helpful. Thanks.
This is not really a Scrapy problem. The .getall() method returns a list, and the repr of a list separates its items with commas by default:
>>> repr(["a", "b"])
"['a', 'b']"
You can use json.dumps with its separators argument to change the delimiter before yielding the item:
import json

def entry_parse(self, response):
    for entry in response.xpath("//tbody[@class='entry-grid-body infinite']//td[@class]"):
        yield {
            'entry_labels': json.dumps(
                entry.xpath(".//div[@class='entry-labels']/span/text()").getall(),
                separators=("|", ":"),
            )
        }
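To see what ends up in the field, here is a minimal sketch that uses a hard-coded list in place of the .getall() result. The sample labels and the helper name join_labels are made up for illustration and are not part of Scrapy. Note that json.dumps keeps the JSON brackets and quotes around the values, so a plain str.join may be closer to what the question asks for.

```python
import json

# Hypothetical stand-in for the list that .getall() would return.
labels = ["new", "featured", "sale"]

# json.dumps with custom separators changes the item delimiter,
# but the result is still JSON-like text with brackets and quotes.
as_json = json.dumps(labels, separators=("|", ":"))


def join_labels(values, delimiter="|"):
    """Join extracted strings with a delimiter (hypothetical helper)."""
    return delimiter.join(value.strip() for value in values)


# str.join produces only the values and the delimiter.
as_joined = join_labels(labels)

print(as_json)    # ["new"|"featured"|"sale"]
print(as_joined)  # new|featured|sale
```

In the spider, this would amount to yielding {'entry_labels': join_labels(entry.xpath(...).getall())}, assuming none of the label values contains the chosen delimiter.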