Arranging one item per column in a row of a CSV file in Scrapy (Python)
I have items scraped from a site, which I placed into JSON files like below:
{
"author": ["TIM ROCK"],
"book_name": ["Truk Lagoon, Pohnpei & Kosrae Dive Guide"],
"category": "Travel",
}
{
"author": ["JOY"],
"book_name": ["PARSER"],
"category": "Accomp",
}
I want to store them in a CSV file with one dictionary per row and one item per column, as below:
| author | book_name | category |
| TIM ROCK | Truk Lagoon ... | Travel |
| JOY | PARSER | Accomp |
I am getting the items of one dictionary in one row, but with all the columns combined into one.
My pipeline.py code is:
import csv

class Blurb2Pipeline(object):
    def __init__(self):
        self.brandCategoryCsv = csv.writer(open('blurb.csv', 'wb'))
        self.brandCategoryCsv.writerow(['book_name', 'author', 'category'])

    def process_item(self, item, spider):
        self.brandCategoryCsv.writerow([item['book_name'].encode('utf-8'),
                                        item['author'].encode('utf-8'),
                                        item['category'].encode('utf-8'),
                                        ])
        return item
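The pipeline above is Python 2 code (the file is opened in 'wb' mode and each field is encoded by hand). Note also that in the JSON shown, book_name and author hold lists, and a list has no .encode() method, so those values have to be turned into single strings before they can become single cells. A minimal Python 3 sketch of the same pipeline, assuming the items look like the JSON above; the path parameter and the ", " separator are illustrative choices, not part of the original code:

```python
import csv


class Blurb2Pipeline:
    """Sketch of a Scrapy item pipeline writing one field per CSV column.

    Scrapy calls open_spider/close_spider once per crawl and process_item
    once per scraped item; no Scrapy import is needed for that.
    """

    fieldnames = ["author", "book_name", "category"]

    def __init__(self, path="blurb.csv"):
        # `path` is an illustrative parameter; Scrapy itself calls Blurb2Pipeline()
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Python 3: text mode with newline="" (as the csv docs recommend)
        # replaces 'wb' plus manual .encode() calls
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(self.file, fieldnames=self.fieldnames)
        self.writer.writeheader()

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        row = {}
        for key in self.fieldnames:
            value = item.get(key, "")
            # list values such as ["TIM ROCK"] become one cell: "TIM ROCK"
            if isinstance(value, (list, tuple)):
                value = ", ".join(str(v) for v in value)
            row[key] = value
        self.writer.writerow(row)
        return item
```

As with any pipeline, it only runs once enabled in settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.Blurb2Pipeline": 300}, where myproject is a placeholder for the actual project module.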
The gist is that this is very simple with csv.DictWriter:
>>> inputs = [{
... "author": ["TIM ROCK"],
... "book_name": ["Truk Lagoon, Pohnpei & Kosrae Dive Guide"],
... "category": "Travel",
... },
... {
... "author": ["JOY"],
... "book_name": ["PARSER"],
... "category": "Accomp",
... }
... ]
>>>
>>> from csv import DictWriter
>>> from cStringIO import StringIO
>>>
>>> buf=StringIO()
>>> c=DictWriter(buf, fieldnames=['author', 'book_name', 'category'])
>>> c.writeheader()
>>> c.writerows(inputs)
>>> print buf.getvalue()
author,book_name,category
['TIM ROCK'],"['Truk Lagoon, Pohnpei & Kosrae Dive Guide']",Travel
['JOY'],['PARSER'],Accomp
It would be better to join those lists on something, but since elements can be either a list or a string, it's a bit tricky. Telling whether something is a string or some other iterable is one of the few cases in Python where direct type-checking makes good sense.
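>>> buf = StringIO()
>>> c = DictWriter(buf, fieldnames=['author', 'book_name', 'category'])
>>> c.writeheader()
>>> for row in inputs:
...     for k, v in row.iteritems():
...         if not isinstance(v, basestring):
...             try:
...                 row[k] = ', '.join(v)
...             except TypeError:
...                 pass
...     c.writerow(row)
...
>>> print buf.getvalue()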
author,book_name,category
TIM ROCK,"Truk Lagoon, Pohnpei & Kosrae Dive Guide",Travel
JOY,PARSER,Accomp
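On Python 3, basestring, iteritems() and cStringIO no longer exist. A minimal sketch of the same join-unless-it-is-a-string idea for Python 3; to_cell and flatten_row are hypothetical helper names, and unlike the loop above they build new dicts instead of modifying inputs in place:

```python
import csv
import io


def to_cell(value, sep=", "):
    """Turn one scraped field into a single CSV cell.

    Strings are returned unchanged (they are iterable, so they must be
    checked first); other iterables such as lists are joined with `sep`;
    anything that is not iterable is returned as-is for csv to convert.
    """
    if isinstance(value, str):
        return value
    try:
        return sep.join(str(v) for v in value)
    except TypeError:
        # e.g. an int or None: iter() fails, so keep the original value
        return value


def flatten_row(item):
    """Apply to_cell to every field of a dict-like item (returns a new dict)."""
    return {key: to_cell(value) for key, value in item.items()}


inputs = [
    {"author": ["TIM ROCK"],
     "book_name": ["Truk Lagoon, Pohnpei & Kosrae Dive Guide"],
     "category": "Travel"},
    {"author": ["JOY"], "book_name": ["PARSER"], "category": "Accomp"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["author", "book_name", "category"])
writer.writeheader()
writer.writerows(flatten_row(row) for row in inputs)
print(buf.getvalue())
```

Inside a Scrapy pipeline, the same flatten_row(item) call could be made in process_item right before writerow().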