How to merge elements from multiple lists with same ID in Python?

I have a text file with 670,000+ lines that I need to process. Each line has the format:

uid, a, b, c, d, x, y, x1, y1, t, 0,

I did some cleaning and converted each line to a list:

[uid,(x,y,t)]
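
For concreteness, the cleaning step might look like the sketch below. The column positions (x and y at indices 5 and 6, t at index 9) are read off the format line above and are an assumption, `parse_line` is a hypothetical helper name, and the sample values are made up. Fields are kept as strings.

```python
def parse_line(line):
    """Turn 'uid, a, b, c, d, x, y, x1, y1, t, 0,' into [uid, (x, y, t)]."""
    # Split on commas and strip whitespace; the trailing comma
    # produces an empty last field, which is simply ignored.
    fields = [field.strip() for field in line.split(",")]
    # Assumed positions: uid=0, x=5, y=6, t=9
    uid, x, y, t = fields[0], fields[5], fields[6], fields[9]
    return [uid, (x, y, t)]


sample = "u1, a, b, c, d, 10, 20, 11, 21, 5, 0,"
record = parse_line(sample)
print(record)
```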

My question is: how can I efficiently merge the (x,y,t) tuples from different lists that share a common uid?

For example, I have multiple lists:

[uid1,(x1,y1,t1)]
[uid1,(x2,y2,t2)]
[uid2,(x3,y3,t3)]
[uid3,(x4,y4,t4)]
[uid2,(x5,y5,t5)]
......

And I want to transform them into:

[uid1,(x1,y1,t1), (x2,y2,t2)]
[uid2,(x3,y3,t3), (x5,y5,t5)]
[uid3,(x4,y4,t4)]
......

Any help would be really appreciated.

You can use the groupby function from itertools. Assuming your original lists are stored in a variable called lists:

from itertools import groupby

# groupby only groups adjacent items, so sort by uid first
lists = sorted(lists, key=lambda x: x[0])
# Build one (uid, [tuples]) pair per uid
grouped_list = [(uid, [item[1] for item in group])
                for uid, group in groupby(lists, key=lambda x: x[0])]
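
To see why the sort matters, here is a small self-contained sketch with made-up sample data. It also flattens each group into the [uid, tuple, tuple, ...] shape asked for in the question. The names `lists_sorted`, `merged`, and `unsorted_keys` are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

lists = [
    ["uid1", (1, 2, 3)],
    ["uid2", (4, 5, 6)],
    ["uid1", (7, 8, 9)],
]

# Without sorting, groupby starts a new group whenever the key changes,
# so "uid1" shows up twice.
unsorted_keys = [uid for uid, _ in groupby(lists, key=itemgetter(0))]

# sorted() is stable, so tuples keep their original order within each uid.
lists_sorted = sorted(lists, key=itemgetter(0))

merged = [
    [uid] + [item[1] for item in group]
    for uid, group in groupby(lists_sorted, key=itemgetter(0))
]
print(unsorted_keys)
print(merged)
```

Note that the sort makes this approach O(n log n) in the number of lines, whereas a dictionary-based grouping needs only a single pass.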

Just use a defaultdict.

import collections

def group_items(items):
    # Map each uid to the list of tuples seen for it
    grouped_dict = collections.defaultdict(list)
    for item in items:
        uid = item[0]
        t = item[1]
        grouped_dict[uid].append(t)

    # Flatten each entry into [uid, tuple, tuple, ...]
    grouped_list = []
    for uid, tuples in grouped_dict.items():  # use .iteritems() on Python 2
        grouped_list.append([uid] + tuples)

    return grouped_list

items is a list of your initial lists. grouped_list will be a list of the lists grouped by uid.
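
Since the function only iterates over items once, it should also accept a generator, which avoids holding all 670,000+ intermediate lists in memory at the same time. The sketch below is a Python 3 variant under that assumption; `records` is a hypothetical stand-in for reading and cleaning the file line by line, and the values are made up.

```python
import collections


def group_items(items):
    # Single pass over any iterable of [uid, tuple] pairs; no sorting needed
    grouped_dict = collections.defaultdict(list)
    for uid, t in items:
        grouped_dict[uid].append(t)
    return [[uid] + tuples for uid, tuples in grouped_dict.items()]


def records():
    # Hypothetical stand-in for reading and cleaning the file line by line
    yield ["uid1", (1, 2, 3)]
    yield ["uid2", (4, 5, 6)]
    yield ["uid1", (7, 8, 9)]
    yield ["uid3", (0, 0, 0)]


grouped_list = group_items(records())
print(grouped_list)
```

On Python 3.7+ the uids come out in first-seen order, because dictionaries preserve insertion order.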

If your data is stored in a dataframe, you can use .groupby to group by 'uid', and if you wrap each value (x,t,v) in a one-element tuple ((x,t,v),), you can .sum them (i.e. concatenate them).

Here's an example:

import pandas as pd

df = pd.DataFrame.from_records(
    [['a', (1, 2, 3)],
     ['b', (1, 2, 3)],
     ['a', (10, 9, 8)]], columns=['uid', 'foo']
)

df.apply({'uid': lambda x: x, 'foo': lambda x: (x,)}).groupby('uid').sum()

On my end, it produced:

uid foo
a   ((1, 2, 3), (10, 9, 8))
b   ((1, 2, 3),)

How about using defaultdict, like this:

L = [['uid1', (x1, y1, t1)],
     ['uid1', (x2, y2, t2)],
     ['uid2', (x3, y3, t3)],
     ['uid3', (x4, y4, t4)],
     ['uid2', (x5, y5, t5)]]


from collections import defaultdict

dd = defaultdict(list)

for i in L:
    dd[i[0]].append(i[1])

The output of print(dd):

defaultdict(list,
            {'uid1': [(x1, y1, t1), (x2, y2, t2)],
             'uid2': [(x3, y3, t3), (x5, y5, t5)],
             'uid3': [(x4, y4, t4)]})
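
If you need the exact list shape from the question rather than a dictionary, a short follow-up step can flatten it. The sketch below uses made-up numeric tuples in place of the placeholder names; `merged` and `plain` are illustrative names.

```python
from collections import defaultdict

L = [
    ["uid1", (1, 2, 3)],
    ["uid1", (4, 5, 6)],
    ["uid2", (7, 8, 9)],
    ["uid3", (10, 11, 12)],
    ["uid2", (13, 14, 15)],
]

dd = defaultdict(list)
for uid, point in L:
    dd[uid].append(point)

# Flatten each entry into [uid, tuple, tuple, ...]
merged = [[uid, *points] for uid, points in dd.items()]

# Indexing a missing key on a defaultdict inserts an empty list as a
# side effect; converting to a plain dict avoids that for later lookups.
plain = dict(dd)
missing = plain.get("uid9", [])
print(merged)
```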

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original source. For any questions, contact yoyou2525@163.com.