Joining all rows of a CSV file that have the same 1st column value in Python
I have a CSV file that goes something like this:
['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
Now, I need a way to join all of the rows that have the same 1st column name into one row, for instance:
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
I can think of a way to do this by sorting the CSV and then going through each row and column and comparing each value, but there should probably be an easier way to do it.
Any ideas?
You should use itertools.groupby:
from itertools import groupby

t = [
    ['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+'],
    ['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
    ['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', ''],
    ['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
]

# TODO: if you need to speed things up you can use operator.itemgetter
# for both sorting and grouping
# groupby only groups adjacent rows, hence the sort; join_rows is defined below
for name, rows in groupby(sorted(t), lambda x: x[0]):
    print(join_rows(rows))
It's obvious that you'd implement the merging in a separate function. For example, like this:
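def join_rows(rows):
    def join_tuple(tup):
        # return the first non-empty value in this column
        for x in tup:
            if x:
                return x
        else:
            # the loop finished without finding a value
            return ''
    # zip(*rows) turns the group's rows into columns
    return [join_tuple(x) for x in zip(*rows)]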
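The two snippets above can be put together into one script. This is only a sketch under a few assumptions: the names first_non_empty and join_groups are made up for illustration, the sample table is shortened to three columns, and operator.itemgetter is used as the key for both sorting and grouping, as the TODO comment suggests.

```python
from itertools import groupby
from operator import itemgetter


def first_non_empty(values):
    """Return the first non-empty value in `values`, or '' if all are empty."""
    return next((v for v in values if v), '')


def join_groups(table):
    """Merge all rows of `table` that share the same first column."""
    get_name = itemgetter(0)
    merged = []
    # groupby only groups adjacent rows, so sort by the same key first
    for _name, rows in groupby(sorted(table, key=get_name), key=get_name):
        # zip(*rows) walks the group column by column
        merged.append([first_non_empty(column) for column in zip(*rows)])
    return merged


# Shortened sample data (assumption: all rows have the same length)
sample = [
    ['Name1', '', '+'],
    ['Name2', 'a', ''],
    ['Name1', 'b', ''],
]
result = join_groups(sample)
print(result)
```

If two rows of the same name both have a value in the same column, this sketch keeps the one that comes first after sorting; whether that is the right rule depends on your data.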
def merge_rows(row1, row2):
    # merge two rows with the same name
    merged_row = ...
    return merged_row

r1 = ['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
r2 = ['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
r3 = ['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
r4 = ['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
rows = [r1, r2, r3, r4]

data = {}
for row in rows:
    name = row[0]
    if name in data:
        data[name] = merge_rows(row, data[name])
    else:
        data[name] = row
You now have all the rows in data, where each key of this dictionary is the name and the corresponding value is the merged row for that name. You can now write this data to a CSV file.
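The answer leaves the body of merge_rows open, so here is one possible way to fill it in, together with the CSV writing step. The merge rule (per column, take the first non-empty value) is an assumption, not something the answer specifies, and the rows are shortened to three columns. The output goes to an in-memory buffer so the sketch runs without touching the disk; with a real file you would pass a file object opened with newline='' to csv.writer instead.

```python
import csv
import io


def merge_rows(row1, row2):
    # Hypothetical merge rule: per column, keep row1's value unless it is empty.
    # Assumes both rows have the same length (zip stops at the shorter one).
    return [a or b for a, b in zip(row1, row2)]


rows = [
    ['Name1', '', '+'],
    ['Name1', 'b', ''],
    ['Name2', 'a', ''],
]

data = {}
for row in rows:
    name = row[0]
    if name in data:
        data[name] = merge_rows(row, data[name])
    else:
        data[name] = row

# Write the merged rows as CSV into an in-memory text buffer
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(data.values())
csv_text = buffer.getvalue()
print(csv_text)
```

Because the loop calls merge_rows(row, data[name]), a later row wins when both rows have a value in the same column. Swap the arguments if the earlier row should win.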
You can also use defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> _ = [d[i[0]].append(z) for i in t for z in i[1:]]
>>> d['Name1']
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
Then do your column joining.
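The answer does not show the column joining itself. Since the defaultdict stores each name's values as one flat list, one way to do it is to cut that list back into row-sized chunks and take the first non-empty value per column. The following is a sketch under assumptions: join_columns is a made-up helper, every row is assumed to have the same number of columns, and the table is shortened for readability.

```python
from collections import defaultdict

t = [
    ['Name1', '', '+'],
    ['Name1', 'b', ''],
    ['Name2', 'a', ''],
]

# Assumption: every row has the same number of value columns
width = len(t[0]) - 1

# Same structure as in the answer: one flat list of values per name
d = defaultdict(list)
for row in t:
    d[row[0]].extend(row[1:])


def join_columns(flat, width):
    """Split `flat` into rows of `width` values and merge them column-wise."""
    chunks = [flat[i:i + width] for i in range(0, len(flat), width)]
    # per column, keep the first non-empty value, or '' if there is none
    return [next((v for v in column if v), '') for column in zip(*chunks)]


joined = {name: [name] + join_columns(flat, width) for name, flat in d.items()}
print(joined)
```

Appending each row's values as its own list (d[row[0]].append(row[1:])) would avoid the chunking step, at the cost of differing from the structure shown in the answer.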