Joining all rows of a CSV file that have the same 1st column value in Python
I have a CSV file that looks like this:
['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
Now I need a way to join all rows that have the same name in the first column into a single row, for example:
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
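I can think of a way to do it by sorting the CSV and then iterating over every row and column, comparing each value, but there should be a simpler way.
Any ideas?
You should use itertools.groupby: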
t = [
['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+'],
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', ''],
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
]
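from itertools import groupby
from operator import itemgetter

# groupby only groups *consecutive* rows with the same key, so sort by name first.
# operator.itemgetter(0) serves as the key for both sorting and grouping.
for name, rows in groupby(sorted(t, key=itemgetter(0)), key=itemgetter(0)):
    print(join_rows(rows))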
Obviously, you would implement the merging in a separate function, for example like this:
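def join_rows(rows):
    # Combine a group of rows column by column.
    def join_tuple(tup):
        # Return the first non-empty value in this column, or '' if all are empty.
        for x in tup:
            if x:
                return x
        return ''
    # zip(*rows) transposes the rows into columns.
    return [join_tuple(x) for x in zip(*rows)]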
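Putting the two pieces together, here is a minimal self-contained sketch (Python 3) of the groupby approach run on the sample rows from the question. The helper names first_non_empty, merge_group and merge_by_name are hypothetical, and the rule "the first non-empty value in a column wins" is an assumption that only matters when two rows with the same name both have a value in the same column.

```python
from itertools import groupby
from operator import itemgetter


def first_non_empty(values):
    # Return the first non-empty value in a column, or '' if all are empty.
    return next((v for v in values if v), '')


def merge_group(group):
    # zip(*group) transposes the rows of one group into columns.
    return [first_non_empty(column) for column in zip(*group)]


def merge_by_name(rows):
    # groupby only groups *adjacent* rows with equal keys, so sort by name first.
    # sorted() is stable, so rows with the same name keep their original order.
    by_name = itemgetter(0)
    return [merge_group(group)
            for _, group in groupby(sorted(rows, key=by_name), key=by_name)]


# The sample rows from the question (21 columns each).
rows = [
    ['Name1'] + [''] * 19 + ['+'],
    ['Name1'] + [''] * 5 + ['b'] + [''] * 14,
    ['Name2'] + [''] * 18 + ['a', ''],
    ['Name3'] + [''] * 4 + ['+'] + [''] * 15,
]

merged = merge_by_name(rows)
for row in merged:
    print(row)
```

Sorting by the name column only (rather than by whole rows) keeps each name's rows in file order, so "first" means "first in the file" when two rows disagree on a column.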
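def merge_rows(row1, row2):
    # Merge two rows with the same name. One possible rule (an assumption):
    # per column, keep row1's value if it is non-empty, otherwise row2's.
    merged_row = [a or b for a, b in zip(row1, row2)]
    return merged_row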
r1 = ['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
r2 = ['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
r3 = ['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
r4 = ['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
rows = [r1, r2, r3, r4]
data = {}
for row in rows:
    name = row[0]
    if name in data:
        data[name] = merge_rows(row, data[name])
    else:
        data[name] = row
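Now you have all the rows in data: each key of the dictionary is a name, and the corresponding value is the merged row for that name. You can now write this data to a CSV file.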
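A minimal sketch of the full round trip with the standard csv module, assuming the file has no header row and all rows have the same number of columns. The function name merge_csv is hypothetical, the merge rule (earlier row wins on conflict) is an assumption, and io.StringIO stands in for real files such as open('input.csv', newline='').

```python
import csv
import io


def merge_rows(row1, row2):
    # Assumed rule: per column, keep row1's value if non-empty, else row2's.
    return [a or b for a, b in zip(row1, row2)]


def merge_csv(infile, outfile):
    data = {}  # name -> merged row; dicts keep insertion order (Python 3.7+)
    for row in csv.reader(infile):
        name = row[0]
        data[name] = merge_rows(data[name], row) if name in data else row
    csv.writer(outfile).writerows(data.values())


# In-memory stand-ins for the input and output files (shortened sample rows).
source = io.StringIO(
    "Name1,,,+\n"
    "Name1,,b,\n"
    "Name2,a,,\n"
)
result = io.StringIO()
merge_csv(source, result)
print(result.getvalue())
```

Note that csv.writer terminates lines with "\r\n" by default; pass lineterminator="\n" if you need Unix line endings.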
You can also use defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for row in t:
...     d[row[0]].extend(row[1:])
...
>>> d['Name1']
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
Then do the column joining on these lists.
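The flat list above loses the row boundaries, so joining the columns would require splitting it back into row-sized chunks. A variant, sketched here as an assumption rather than the answer's exact method, appends each row's tail as its own list, after which the column join is a zip. The function name join_columns is hypothetical, the sample rows are a shortened version of the question's data, and "first non-empty value wins" is assumed.

```python
from collections import defaultdict


def join_columns(rows):
    # Group each row's non-name columns under its name, one list per row.
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[1:])

    merged = []
    for name, tails in groups.items():
        # zip(*tails) yields one tuple per column; keep its first non-empty value.
        columns = [next((v for v in col if v), '') for col in zip(*tails)]
        merged.append([name] + columns)
    return merged


t = [
    ['Name1', '', '+'],
    ['Name1', 'b', ''],
    ['Name2', '', 'a'],
]
print(join_columns(t))
```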