How to merge elements from multiple lists that share the same ID in Python?

Question

I have a text file with 670,000+ lines that I need to process. Each line has the format:

uid, a, b, c, d, x, y, x1, y1, t, 0,

I did some cleaning and converted each line to a list:

[uid,(x,y,t)]

My question is: how can I efficiently merge the (x,y,t) tuples from different lists that share a common uid?

For example, I have multiple lists:

[uid1,(x1,y1,t1)]
[uid1,(x2,y2,t2)]
[uid2,(x3,y3,t3)]
[uid3,(x4,y4,t4)]
[uid2,(x5,y5,t5)]
......

And I want to transform them into:

[uid1,(x1,y1,t1), (x2,y2,t2)]
[uid2,(x3,y3,t3), (x5,y5,t5)]
[uid3,(x4,y4,t4)]
......

Any help would be really appreciated.

Answer 1

You can use the groupby function from itertools. Assuming your original lists are in a variable called lists:

from itertools import groupby

# groupby only merges adjacent items, so sort by uid first
lists = sorted(lists, key=lambda x: x[0])
# Build one (uid, [tuples]) pair per uid
grouped_list = [(uid, [item[1] for item in group])
                for uid, group in groupby(lists, key=lambda x: x[0])]
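
As a rough illustration of the groupby approach, here is a small runnable sketch. The uids and numbers are made-up sample values, not data from the question, and the output is shaped as [uid, tuple, tuple, ...] to match what the question asks for. It sorts on the uid alone, which is an assumption about what ordering is wanted.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data in the [uid, (x, y, t)] shape from the question
lists = [
    ["uid1", (1, 2, 10)],
    ["uid1", (3, 4, 11)],
    ["uid2", (5, 6, 12)],
    ["uid3", (7, 8, 13)],
    ["uid2", (9, 0, 14)],
]

by_uid = itemgetter(0)

# sorted() is stable, so tuples keep their original relative order per uid
ordered = sorted(lists, key=by_uid)

# One output row per uid: [uid, tuple, tuple, ...]
merged = [
    [uid] + [item[1] for item in group]
    for uid, group in groupby(ordered, key=by_uid)
]

for row in merged:
    print(row)
```

Note that the sort makes this O(n log n); for a single pass over unsorted input, a dictionary-based grouping avoids the sort.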

Answer 2

Just use a defaultdict.

import collections

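def group_items(items):
    # Collect every tuple under its uid in a single pass
    grouped_dict = collections.defaultdict(list)
    for item in items:
        uid = item[0]
        t = item[1]
        grouped_dict[uid].append(t)

    # Flatten each group into [uid, tuple, tuple, ...]
    grouped_list = []
    for uid, tuples in grouped_dict.items():  # use .iteritems() on Python 2
        grouped_list.append([uid] + tuples)

    return grouped_list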

items is a list of your initial lists. grouped_list will be a list of the lists grouped by uid.
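
Since the input is a large text file, the same defaultdict grouping could also be applied while reading the lines, without building the intermediate [uid,(x,y,t)] lists first. The sketch below assumes the field positions implied by the line format in the question (uid at index 0, x at 5, y at 6, t at 9); the function name group_lines and the sample rows are hypothetical.

```python
import collections
import io


def group_lines(lines):
    """Group (x, y, t) tuples by uid, reading one raw line at a time."""
    grouped = collections.defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # Assumed layout: uid, a, b, c, d, x, y, x1, y1, t, 0,
        uid, x, y, t = fields[0], fields[5], fields[6], fields[9]
        grouped[uid].append((x, y, t))  # values are kept as strings
    return grouped


# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "u1, a, b, c, d, 1.0, 2.0, 0, 0, 100, 0,\n"
    "u2, a, b, c, d, 3.0, 4.0, 0, 0, 101, 0,\n"
    "u1, a, b, c, d, 5.0, 6.0, 0, 0, 102, 0,\n"
)
grouped = group_lines(sample)
print(dict(grouped))
```

With a real file, an open file object can be passed in place of the StringIO sample, since both yield one line per iteration.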

Answer 3

If your data is stored in a dataframe, you can use .groupby to group by 'uid'. If you first wrap each value (x,y,t) in a one-element tuple ((x,y,t),), you can then .sum them (i.e. concatenate them).

Here's an example:

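import pandas as pd

df = pd.DataFrame.from_records(
    [['a',(1,2,3)],
    ['b',(1,2,3)],
    ['a',(10,9,8)]], columns = ['uid', 'foo']
)

# Wrap each 'foo' value in a one-element tuple so that summing concatenates them
df.apply({'uid': lambda x: x, 'foo': lambda x: (x,)}).groupby('uid').sum()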

On my end, it produced:

uid foo
a   ((1, 2, 3), (10, 9, 8))
b   ((1, 2, 3),)

Answer 4

How about using defaultdict, like this:

L = [['uid1',(x1,y1,t1)],
        ['uid1',(x2,y2,t2)],
        ['uid2',(x3,y3,t3)],
        ['uid3',(x4,y4,t4)],
        ['uid2',(x5,y5,t5)]]


from collections import defaultdict

dd = defaultdict(list)

for i in L:
    dd[i[0]].append(i[1])

The output of print(dd):

defaultdict(list,
            {'uid1': [(x1, y1, t1), (x2, y2, t2)],
             'uid2': [(x3, y3, t3), (x5, y5, t5)],
             'uid3': [(x4, y4, t4)]})
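
To get from this dictionary to the [uid, tuple, tuple, ...] rows shown in the question, one possible last step is sketched below. The numeric values are made-up stand-ins for the symbolic x1, y1, t1, and so on.

```python
from collections import defaultdict

# Hypothetical sample values standing in for the symbolic names above
L = [
    ["uid1", (1, 2, 10)],
    ["uid1", (3, 4, 11)],
    ["uid2", (5, 6, 12)],
    ["uid3", (7, 8, 13)],
    ["uid2", (9, 0, 14)],
]

dd = defaultdict(list)
for uid, xyt in L:
    dd[uid].append(xyt)

# Dicts keep insertion order (Python 3.7+), so uids come out in first-seen order
merged = [[uid, *tuples] for uid, tuples in dd.items()]
print(merged)
```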

4 answers

Answer 1
1 2019-08-15 10:26:27

Answer 2
1 Accepted 2019-08-15 10:31:54

Answer 3
0 2019-08-15 10:32:12

Answer 4
0 2019-08-15 10:48:21
