Join all rows of a CSV file that have the same first-column value in Python

Question

I have a CSV file that goes something like this:

['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

Now, I need a way to join all of the rows that have the same first-column name into one row, for instance:

['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

I can think of a way to do this by sorting the CSV and then going through each row and column and comparing each value, but there should probably be an easier way to do it.

Any ideas?
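The sort-then-scan idea from the question is not much code once a merge rule is fixed. A minimal Python 3 sketch, assuming every row has the same number of columns, that '' marks an empty cell, and that when two rows both fill a column the first one wins; merge_sorted_rows and the inline sample data are hypothetical:

```python
import csv
import io


def merge_sorted_rows(rows):
    """Merge rows sharing the same first column by sorting, then scanning once.

    Assumes '' marks an empty cell, all rows have the same length, and the
    first non-empty value seen for a column wins when rows disagree.
    """
    merged = []
    for row in sorted(rows, key=lambda r: r[0]):
        if merged and merged[-1][0] == row[0]:
            # Same name as the previous merged row: fill in its empty cells
            merged[-1] = [old or new for old, new in zip(merged[-1], row)]
        else:
            # First row for this name: start a new merged row (a copy)
            merged.append(list(row))
    return merged


# Hypothetical sample standing in for the real CSV file
sample = io.StringIO("Name1,,+\nName1,b,\nName2,,a\n")
print(merge_sorted_rows(list(csv.reader(sample))))
```

Sorting puts equal names next to each other, so a single pass that compares each row with the last merged row is enough. zip() stops at the shorter row, which is why equal row lengths are assumed.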

Answer 1

You should use itertools.groupby:

t = [ 
['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+'],
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', ''],
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''] 
]

from itertools import groupby

# TODO: if you need to speed things up you can use operator.itemgetter
# for both sorting and grouping
for name, rows in groupby(sorted(t), lambda x:x[0]):
    print(join_rows(rows))

You'd implement the merging in a separate function, for example like this:

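def join_rows(rows):
    def join_tuple(tup):
        # Return the first non-empty value in this column
        for x in tup:
            if x:
                return x
        else:
            # for/else: runs only when the loop ends without returning,
            # i.e. every value in the column was empty
            return ''
    # zip(*rows) transposes the rows so each tuple holds one column
    return [join_tuple(x) for x in zip(*rows)]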
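Put together and written for Python 3, the two pieces above might look like the sketch below. It follows the TODO by using operator.itemgetter(0) as both the sort key and the groupby key; merge_by_name is a hypothetical helper name and t is shortened to three columns:

```python
from itertools import groupby
from operator import itemgetter


def join_rows(rows):
    """Merge a group of rows column by column, keeping the first non-empty cell."""
    def join_tuple(column):
        for cell in column:
            if cell:
                return cell
        return ''
    # zip(*rows) transposes the group so each tuple holds one column
    return [join_tuple(column) for column in zip(*rows)]


def merge_by_name(table):
    """Sort by the first column, then merge each run of equal names."""
    first_column = itemgetter(0)
    groups = groupby(sorted(table, key=first_column), key=first_column)
    # Each group iterator is consumed by join_rows before groupby advances
    return [join_rows(rows) for _name, rows in groups]


t = [
    ['Name1', '', '+'],
    ['Name1', 'b', ''],
    ['Name2', 'a', ''],
]
for merged in merge_by_name(t):
    print(merged)
```

Sorting first matters: groupby only groups consecutive rows with equal keys, so an unsorted table would give one group per run of equal names rather than one per name.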

Answer 2

def merge_rows(row1, row2):
    # merge two rows with the same name
    merged_row = ...
    return merged_row

r1 = ['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
r2 = ['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
r3 = ['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
r4 = ['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
rows = [r1, r2, r3, r4]
data = {}
for row in rows:
    name = row[0]
    if name in data:
        data[name] = merge_rows(row, data[name])
    else:
        data[name] = row

You now have all the merged rows in data: each key of this dictionary is a name, and the corresponding value is the merged row for that name. You can now write this data to a CSV file.
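The answer leaves merge_rows open. One possible rule, assumed here rather than taken from the answer, is to keep the newer row's value and fall back to the stored row's value when it is empty. The sketch below fills that in and writes the dictionary back out with csv.writer; merge_into_dict and write_merged are hypothetical names, and an io.StringIO stands in for a real file:

```python
import csv
import io


def merge_rows(row1, row2):
    # Assumed merge rule (not specified in the answer): for each column keep
    # row1's value, falling back to row2's value when row1's cell is empty.
    return [a or b for a, b in zip(row1, row2)]


def merge_into_dict(rows):
    """Same loop as in the answer: one merged row per name."""
    data = {}
    for row in rows:
        name = row[0]
        if name in data:
            data[name] = merge_rows(row, data[name])
        else:
            data[name] = row
    return data


def write_merged(data, out):
    """Write one CSV line per name; dicts keep insertion order in Python 3.7+."""
    csv.writer(out).writerows(data.values())


buffer = io.StringIO()
write_merged(merge_into_dict([['Name1', '', '+'], ['Name1', 'b', ''], ['Name2', 'a', '']]), buffer)
print(buffer.getvalue())
```

To write a real file, open it with open('merged.csv', 'w', newline='') (the file name is a placeholder) and pass the file object to write_merged; newline='' is what the csv module documentation recommends for files used with csv.writer.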

Answer 3

You can also use defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> _ = [d[i[0]].append(z) for i in t for z in i[1:]]
>>> d['Name1']
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

Each list now holds the cells of every row with that name, one row after another, so you still need to do the column joining yourself.
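Because this flattens every row into one list per name, the column positions have to be recovered afterwards. A variation on the same defaultdict idea (an assumption, not the answer's code) stores whole rows instead, so the column join becomes a zip; merge_with_defaultdict is a hypothetical name, and the first non-empty cell in each column wins:

```python
from collections import defaultdict


def merge_with_defaultdict(table):
    # Store whole rows per name (instead of flattening the cells) so that
    # zip(*rows) can line the columns back up.
    groups = defaultdict(list)
    for row in table:
        groups[row[0]].append(row)
    # For each column keep the first non-empty cell, or '' if all are empty
    return {
        name: [next((cell for cell in column if cell), '') for column in zip(*rows)]
        for name, rows in groups.items()
    }


t = [['Name1', '', '+'], ['Name1', 'b', ''], ['Name2', 'a', '']]
print(merge_with_defaultdict(t)['Name1'])
```

Unlike the groupby approach, this needs no sorting, because the dictionary collects rows for a name wherever they appear in the file.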

3 answers

Answer 1
Score 3, accepted, 2012-06-14 11:43:33

Answer 2
Score 1, 2012-06-14 11:25:57

Answer 3
Score 0, 2012-06-14 12:38:43
