
Comparing the first element of consecutive tuples in a list in Python

I have a list of tuples, each containing two elements. Several of the tuples share the same first element. I want to compare the first elements and, for each distinct first element, collect the corresponding second elements into one group. Here is my list:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

I would like to turn it into a list of tuples that looks something like this:

NewList=[(2,3,4,5),(6,7,8),(9,10)]

Is there an efficient way to do this?

You can use an OrderedDict to group the elements by the first subelement of each tuple:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

from collections import OrderedDict

od  = OrderedDict()

for a,b in myList:
    od.setdefault(a,[]).append(b)

print(list(od.values()))
[[2, 3, 4, 5], [6, 7, 8], [9, 10]]

If you really want tuples:

print(list(map(tuple,od.values())))
[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

If you do not care about the order in which the elements appear and just want the most efficient way to group, you could use a collections.defaultdict. (On Python 3.7 and later a defaultdict, like any dict, also preserves insertion order.)

from collections import defaultdict

od  = defaultdict(list)

for a,b in myList:
    od[a].append(b)

print(list(od.values()))
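
As a minimal sketch, the defaultdict grouping above could be wrapped in a small helper that also converts each group to a tuple. The function name group_seconds is hypothetical, not part of the original answer, and the output order shown relies on Python 3.7+ dict ordering:

```python
from collections import defaultdict

def group_seconds(pairs):
    """Group the second items of (key, value) pairs by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Convert each list to a tuple to match the shape asked for in the question.
    return [tuple(values) for values in groups.values()]

myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
print(group_seconds(myList))  # [(2, 3, 4, 5), (6, 7, 8), (9, 10)]
```

Unlike groupby, this does not need the input to be sorted: tuples with the same first element end up in the same group wherever they occur in the list.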

Lastly, if your data is sorted, as in your input example, you could simply use itertools.groupby to group by the first subelement of each tuple and extract the second element from the grouped tuples:

from itertools import groupby
from operator import itemgetter
print([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

Again, groupby will only work if your data is sorted by at least the first element, because it only groups consecutive items that share the same key.
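
To illustrate why the ordering matters, here is a small sketch with a made-up unsorted list (the variable names are only for illustration). On unsorted input groupby splits a key into several groups; sorting by the first element beforehand fixes that:

```python
from itertools import groupby
from operator import itemgetter

unsorted = [(1, 2), (2, 6), (1, 3), (2, 7)]

# groupby only merges *adjacent* items with equal keys,
# so keys 1 and 2 each show up as two separate groups here.
split = [tuple(t[1] for t in g) for _, g in groupby(unsorted, key=itemgetter(0))]
print(split)   # [(2,), (6,), (3,), (7,)]

# Sorting by the first element (sorted() is stable) brings equal keys together.
ordered = sorted(unsorted, key=itemgetter(0))
merged = [tuple(t[1] for t in g) for _, g in groupby(ordered, key=itemgetter(0))]
print(merged)  # [(2, 3), (6, 7)]
```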

Some timings on a reasonably sized list:

In [33]: myList = [(randint(1,10000),randint(1,10000)) for _ in range(100000)]

In [34]: myList.sort()

In [35]: timeit ([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])
10 loops, best of 3: 44.5 ms per loop

In [36]: %%timeit
od = defaultdict(list)
for a,b in myList:
    od[a].append(b)
   ....: 
10 loops, best of 3: 33.8 ms per loop

In [37]: %%timeit
dictionary = OrderedDict()
for x, y in myList:
    if x not in dictionary:
        dictionary[x] = [] # new empty list
    dictionary[x].append(y)
   ....: 
10 loops, best of 3: 63.3 ms per loop

In [38]: %%timeit   
od = OrderedDict()
for a,b in myList:
    od.setdefault(a,[]).append(b)
   ....: 
10 loops, best of 3: 80.3 ms per loop

If order matters and the data is sorted, go with groupby. The gap to the defaultdict approach narrows further if you also need to convert each of the defaultdict's lists to a tuple.

If the data is not sorted or you don't care about any order, you won't find a faster way to group than the defaultdict approach.
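
For anyone who wants to repeat the comparison outside IPython, the following is a rough sketch using the standard timeit module. The function names, the fixed seed, and the smaller list size are choices made for this example, not part of the original benchmark, and the numbers it prints will differ by machine and Python version. The equality check assumes Python 3.7+ dict ordering:

```python
import random
import timeit
from collections import OrderedDict, defaultdict
from itertools import groupby
from operator import itemgetter

random.seed(0)  # fixed seed so every run uses the same data
pairs = sorted((random.randint(1, 1000), random.randint(1, 1000))
               for _ in range(10000))

def with_groupby():
    return [tuple(t[1] for t in g) for _, g in groupby(pairs, key=itemgetter(0))]

def with_defaultdict():
    d = defaultdict(list)
    for a, b in pairs:
        d[a].append(b)
    return [tuple(v) for v in d.values()]

def with_setdefault():
    d = OrderedDict()
    for a, b in pairs:
        d.setdefault(a, []).append(b)
    return [tuple(v) for v in d.values()]

# On sorted input all three approaches should produce the same groups.
assert with_groupby() == with_defaultdict() == with_setdefault()

for fn in (with_groupby, with_defaultdict, with_setdefault):
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{fn.__name__}: {best / 10 * 1000:.2f} ms per call")
```

Here every variant converts its groups to tuples, so the comparison includes that conversion cost for the dict-based approaches as well.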

This feels like a task for a dictionary (if you don't know dictionaries yet, look them up on python.org). This is a very verbose example, so it's not what I'd write in everyday coding, but it's better to be verbose than unclear:

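import collections

dictionary = collections.OrderedDict()
for x, y in myList:
    # has_key() exists only in Python 2; on Python 3 write `if x not in dictionary:`
    if not dictionary.has_key(x):
        dictionary[x] = [] # new empty list
    # append y to that list
    dictionary[x].append(y)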

Having thought about this, the most efficient approach is probably this one-liner (assuming dictionary is an empty dict, i.e. dictionary = {} or dictionary = OrderedDict() as in Padraic's excellent answer):

for x,y in myList: dictionary.setdefault(x,[]).append(y)

I'm not saying this is the easiest approach to read, but I like it :)

EDIT: Ha! Benchmarking proved me wrong; the setdefault approach is slower than the if not dictionary.has_key(x): dictionary[x]=[] approach:

>>> timeit.timeit("for x,y in myList:\n    if not dictionary.has_key(x):\n        dictionary[x]=[]\n    dictionary[x].append(y)", "from collections import OrderedDict\nmyList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]\ndictionary=OrderedDict()")
2.2573769092559814
>>> timeit.timeit("for x,y in myList: dictionary.setdefault(x,[]).append(y)", "from collections import OrderedDict\nmyList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]\ndictionary=OrderedDict()")
3.3534231185913086

Of course, Padraic was still right: his defaultdict approach takes only 0.82s on my machine, so it's faster by a factor of about 3.

Also, as Padraic pointed out, dict.has_key(x) has been deprecated, and one should use x in dict instead; however, I couldn't measure a speed difference.
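
On Python 3, where has_key no longer exists, the verbose loop could be written with the in operator instead. A minimal sketch, reusing the list from the question:

```python
from collections import OrderedDict

myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

dictionary = OrderedDict()
for x, y in myList:
    if x not in dictionary:   # membership test replaces has_key(x)
        dictionary[x] = []    # new empty list for a first-seen key
    dictionary[x].append(y)

print(list(dictionary.values()))  # [[2, 3, 4, 5], [6, 7, 8], [9, 10]]
```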

The following should work:

import itertools

myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
print([tuple(x[1] for x in g) for k, g in itertools.groupby(myList, key=lambda x: x[0])])

Which displays:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]
