Comparing the first element of consecutive tuples in a list in Python

Question

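I have a list of tuples, each containing two elements. Several of the tuples share the same first element. I want to compare the first elements of these tuples and collect the corresponding second elements into a list. Here is my list: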

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

I want to build a list that looks like this:

NewList=[(2,3,4,5),(6,7,8),(9,10)]

I would like to know whether there is an efficient way to do this.

Answer 1

You can use an OrderedDict to group the elements by the first subelement of each tuple:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

from collections import OrderedDict

od  = OrderedDict()

for a,b in myList:
    od.setdefault(a,[]).append(b)

print(list(od.values()))
[[2, 3, 4, 5], [6, 7, 8], [9, 10]]

If you really want tuples:

print(list(map(tuple,od.values())))
[(2, 3, 4, 5), (6, 7, 8), (9, 10)]
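
As a side note, on Python 3.7 and later the built-in dict is guaranteed to keep insertion order, so a plain dict should give the same grouping as the OrderedDict above. A minimal sketch, assuming Python 3.7+ (the name `groups` is illustrative and not from the question):

```python
# Assumes Python 3.7+, where a plain dict preserves insertion order.
myList = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (2, 8), (3, 9), (3, 10)]

groups = {}  # illustrative name
for a, b in myList:
    # create the list for key `a` the first time it is seen, then append
    groups.setdefault(a, []).append(b)

NewList = [tuple(v) for v in groups.values()]
print(NewList)  # [(2, 3, 4, 5), (6, 7, 8), (9, 10)]
```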

If you don't care about the order in which the elements appear and just want the most efficient way to group, you can use collections.defaultdict:

from collections import defaultdict

od  = defaultdict(list)

for a,b in myList:
    od[a].append(b)

print(list(od.values()))

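Finally, if your data is sorted as in your input example, you can simply use itertools.groupby to group by the first subelement of each tuple and extract the second element from each grouped tuple: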

from itertools import groupby
from operator import itemgetter
print([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

Again, groupby will only work if your data is sorted by at least the first element.
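
To make the sorting requirement concrete, here is a small sketch with a made-up unsorted list (`unsorted_pairs` and `group_seconds` are hypothetical names, not from the question). groupby only merges consecutive items with equal keys, so the same key can come out as several fragments unless the data is sorted first:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: keys 1 and 2 each appear in two separate runs.
unsorted_pairs = [(1, 2), (2, 6), (1, 3), (2, 7)]

def group_seconds(pairs):
    """Collect second elements for each run of equal first elements."""
    return [tuple(t[1] for t in run) for _, run in groupby(pairs, key=itemgetter(0))]

fragmented = group_seconds(unsorted_pairs)
print(fragmented)  # [(2,), (6,), (3,), (7,)]

# sorted() is stable, so sorting on the first element keeps the
# original relative order of the second elements within each key.
grouped = group_seconds(sorted(unsorted_pairs, key=itemgetter(0)))
print(grouped)  # [(2, 3), (6, 7)]
```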

Some timings on a reasonably sized list:

In [33]: myList = [(randint(1,10000),randint(1,10000)) for _ in range(100000)]

In [34]: myList.sort()

In [35]: timeit ([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])
10 loops, best of 3: 44.5 ms per loop

In [36]: %%timeit
od = defaultdict(list)
for a,b in myList:
    od[a].append(b)
   ....: 
10 loops, best of 3: 33.8 ms per loop

In [37]: %%timeit
dictionary = OrderedDict()
for x, y in myList:
    if x not in dictionary:
        dictionary[x] = [] # new empty list
    dictionary[x].append(y)
   ....: 
10 loops, best of 3: 63.3 ms per loop

In [38]: %%timeit   
od = OrderedDict()
for a,b in myList:
    od.setdefault(a,[]).append(b)
   ....: 
10 loops, best of 3: 80.3 ms per loop

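If order matters and the data is sorted, go with groupby; the gap to the defaultdict approach gets even smaller if you also need to convert every list in the defaultdict to a tuple.

If the data is not sorted, or you don't care about order, you won't find a faster way to group than the defaultdict approach.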
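
For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The function names are made up for illustration, the repeat counts are arbitrary, and the numbers will differ by machine and Python version, so treat it as a starting point rather than a definitive benchmark:

```python
import timeit
from collections import OrderedDict, defaultdict
from itertools import groupby
from operator import itemgetter
from random import randint

# Same shape of data as the session above, sorted so that groupby is valid.
pairs = sorted((randint(1, 10000), randint(1, 10000)) for _ in range(100000))

def with_groupby(data):
    return [tuple(t[1] for t in v) for _, v in groupby(data, key=itemgetter(0))]

def with_defaultdict(data):
    d = defaultdict(list)
    for a, b in data:
        d[a].append(b)
    return [tuple(v) for v in d.values()]

def with_ordereddict(data):
    od = OrderedDict()
    for a, b in data:
        od.setdefault(a, []).append(b)
    return [tuple(v) for v in od.values()]

candidates = (with_groupby, with_defaultdict, with_ordereddict)

# On sorted input all three should agree (assumes Python 3.7+ dict ordering).
results = [f(pairs) for f in candidates]
assert results[0] == results[1] == results[2]

for f in candidates:
    # best of 3 repeats, 10 calls each, reported per call
    seconds = min(timeit.repeat(lambda: f(pairs), number=10, repeat=3)) / 10
    print(f"{f.__name__}: {seconds * 1000:.1f} ms per loop")
```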

Answer 2

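This feels like a task for a dictionary (if you don't know dictionaries yet, look them up on python.org). This is a very verbose example, so it's not what I'd write in everyday coding, but it's better to be verbose than unclear:

import collections

dictionary = collections.OrderedDict()
for x, y in myList:
    if x not in dictionary:  # dict.has_key(x) no longer exists in Python 3
        dictionary[x] = [] # new empty list
    # append y to that list
    dictionary[x].append(y)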

Answer 3

With that in mind, the most efficient way is probably this one-liner (assuming dictionary is an empty dict, i.e. dictionary = {} or dictionary = OrderedDict() as in Padraic's excellent answer):

for x,y in myList: dictionary.setdefault(x,[]).append(y)

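I'm not saying this is the most readable approach, but I like it :)

Edit: Ha! A benchmark proved me wrong; the setdefault approach is slower than the if not dictionary.has_key(x): dictionary[x]=[] approach: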

>>> timeit.timeit("for x,y in myList:\n    if not dictionary.has_key(x):\n        dictionary[x]=[]\n    dictionary[x].append(y)", "from collections import OrderedDict\nmyList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]\ndictionary=OrderedDict()")
2.2573769092559814
>>> timeit.timeit("for x,y in myList: dictionary.setdefault(x,[]).append(y)", "from collections import OrderedDict\nmyList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]\ndictionary=OrderedDict()")
3.3534231185913086

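Of course, Padraic is still right: his defaultdict approach takes only 0.82s on my machine, so it is faster by a factor of about 3.

Also, as Padraic pointed out: dict.has_key(x) has been deprecated, and x in dict should be used instead; however, I couldn't measure a speed difference.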
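
Since has_key no longer exists in Python 3, the benchmark above cannot be rerun there as written. The following is a sketch of how it might be ported, with the membership test written as x not in dictionary. The `number` value is an arbitrary choice of mine (lower than timeit's default of 1,000,000), so the absolute figures are not comparable with the ones quoted above:

```python
import timeit

setup = (
    "from collections import OrderedDict\n"
    "myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]\n"
    "dictionary=OrderedDict()"
)

# Python 3 form of the has_key() variant
in_stmt = (
    "for x,y in myList:\n"
    "    if x not in dictionary:\n"
    "        dictionary[x]=[]\n"
    "    dictionary[x].append(y)"
)
setdefault_stmt = "for x,y in myList: dictionary.setdefault(x,[]).append(y)"

# As in the original benchmark, setup runs once per timing, so the lists
# inside `dictionary` keep growing across the repetitions.
t_in = timeit.timeit(in_stmt, setup, number=100000)
t_setdefault = timeit.timeit(setdefault_stmt, setup, number=100000)

print(f"x not in dictionary: {t_in:.3f}s")
print(f"setdefault:          {t_setdefault:.3f}s")
```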

Answer 4

The following should work:

import itertools

myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
print([tuple(x[1] for x in g) for k, g in itertools.groupby(myList, key=lambda x: x[0])])

Which displays:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

4 answers

Answer 1
6 2015-09-19 10:53:42

Answer 2
4 2015-09-19 11:00:47

Answer 3
2 2015-09-19 11:08:54

Answer 4
1 2015-09-19 11:13:42
