
Best way to get the nth element of each tuple from a list of tuples in Python

I had some code that contained zip(*G)[0] (and elsewhere zip(*G)[1], with a different G). G is a list of tuples. What this does is return a tuple of the first (or generally, for zip(*G)[n], the nth, counting from zero) element of each tuple in G. For example,

>>> G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]
>>> zip(*G)[0]
(1, 'a', 'you')
>>> zip(*G)[1]
(2, 'b', 'and')

This is pretty clever and all, but the problem is that it doesn't work in Python 3, because there zip is an iterator. Furthermore, 2to3 isn't smart enough to fix it. So the obvious solution is to use list(zip(*G))[0], but that got me thinking: there is probably a more efficient way to do this. There is no need to create all the tuples that zip creates. I just need the nth element of each tuple in G.
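
For what it's worth, here is a minimal sketch of that Python 3 behavior (reusing the G from above; the exact wording of the error message may vary between versions):

>>> zip(*G)[0]
Traceback (most recent call last):
  ...
TypeError: 'zip' object is not subscriptable
>>> list(zip(*G))[0]
(1, 'a', 'you')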

Is there a more efficient, but equally compact, way to do this? I'm OK with anything from the standard library. In my use case, each tuple in G will be at least of length n, so there is no need to worry about zip stopping at the shortest tuple (i.e., zip(*G)[n] will always be defined).

If not, I guess I'll just stick with wrapping the zip in list().

(PS, I know this is unnecessary optimization. I'm just curious, is all.)

UPDATE:

In case anyone cares, I went with the t0, t1, t2 = zip(*G) option. First, this lets me give meaningful names to the data. My G actually consists of length-2 tuples (representing numerators and denominators). A list comprehension would only be marginally more readable than the zip, but this way is much better (and since in most cases the zip was the list I was iterating over in a list comprehension, this makes things flatter).

Second, as noted by @thewolf's and @Sven Marnach's excellent answers, this way is faster for smaller lists. My G is actually not large in most cases (and if it is large, then this definitely won't be the bottleneck of the code!).

But there were more ways to do this than I expected, including the new a, *b, c = G feature of Python 3 that I didn't even know about.
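
To make that concrete, here is a small sketch of the option I went with; the names and sample values are only illustrative, but the shape matches my real G of length-2 (numerator, denominator) tuples:

>>> G = [(1, 2), (3, 4), (5, 8)]
>>> numerators, denominators = zip(*G)
>>> numerators
(1, 3, 5)
>>> denominators
(2, 4, 8)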

You can use a list comprehension

[x[0] for x in G]

or operator.itemgetter() (on Python 3, map() returns an iterator, so wrap the call in list() if you need a list)

from operator import itemgetter
map(itemgetter(0), G)

or sequence unpacking

[x for x, y, z in G]

Edit: Here is my take on timing the different options, also in Python 3.2:

from operator import itemgetter
import timeit

G = list(zip(*[iter(range(30000))] * 3))

def f1():
    return [x[0] for x in G]
def f2():
    return list(map(itemgetter(0), G))
def f3():
    return [x for x, y, z in G]
def f4():
    return list(zip(*G))[0]
def f5():
    c0, *rest = zip(*G)
    return c0
def f6():
    c0, c1, c2 = zip(*G)
    return c0
def f7():
    return next(zip(*G))

for f in f1, f2, f3, f4, f5, f6, f7:
    print(f.__name__, timeit.timeit(f, number=1000))

Results on my machine:

f1 0.6753780841827393
f2 0.8274149894714355
f3 0.5576457977294922
f4 0.7980241775512695
f5 0.7952430248260498
f6 0.7965989112854004
f7 0.5748469829559326

Comments:

  1. I used a list with 10000 triples to measure the actual processing time and to make function call overhead, name lookups etc. negligible, which would otherwise seriously influence the results.

  2. The functions return a list or a tuple, whichever is more convenient for the particular solution.

  3. Compared to thewolf's answer, I removed the redundant call to tuple() from f4() (the result of the expression is already a tuple), and I added a function f7() which only works for extracting the first column.

As expected, the list comprehensions are fastest, together with the somewhat less general f7().

Another edit: Here are the results for ten columns instead of three, with the code adapted where appropriate:

f1 0.7429649829864502
f2 0.881648063659668
f3 1.234360933303833
f4 1.92038893699646
f5 1.9218590259552002
f6 1.9172680377960205
f7 0.6230220794677734

At least in Python 2.7, the fastest way is

t0, t1, t2 = zip(*G) for SMALLER lists, and [x[0] for x in G] in general

Here is the test:

from operator import itemgetter
import cmpthese  # third-party, Perl-style cmpthese benchmark helper (not in the standard library)

G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]

def f1():
    return tuple(x[0] for x in G)

def f2():
    return tuple(map(itemgetter(0), G))

def f3():
    return tuple(x for x, y, z in G)     

def f4():
    return tuple(list(zip(*G))[0])

def f5():
    t0,*the_rest=zip(*G)
    return t0

def f6():
    t0,t1,t2=zip(*G)
    return t0                

cmpthese.cmpthese([f1,f2,f3,f4,f5,f6],c=100000) 

Results:

    rate/sec     f4     f5     f1     f2     f3     f6
f4   494,220     -- -21.9% -24.1% -24.3% -26.6% -67.6%
f5   632,623  28.0%     --  -2.9%  -3.0%  -6.0% -58.6%
f1   651,190  31.8%   2.9%     --  -0.2%  -3.2% -57.3%
f2   652,457  32.0%   3.1%   0.2%     --  -3.0% -57.3%
f3   672,907  36.2%   6.4%   3.3%   3.1%     -- -55.9%
f6 1,526,645 208.9% 141.3% 134.4% 134.0% 126.9%     --

If you don't care whether the result is a list or a tuple, a list comprehension is faster.

Here is a more extended benchmark with variable list sizes:

from operator import itemgetter
import time
import timeit 
import matplotlib.pyplot as plt

def f1():
    return [x[0] for x in G]

def f1t():
    return tuple([x[0] for x in G])

def f2():
    return tuple([x for x in map(itemgetter(0), G)])

def f3():
    return tuple([x for x, y, z in G])    

def f4():
    return tuple(list(zip(*G))[0])

def f6():
    t0,t1,t2=zip(*G)
    return t0     

n=100    
r=(5,35)
results={f1:[],f1t:[],f2:[],f3:[],f4:[],f6:[]}    
for c in range(*r):
    G=[range(3) for i in range(c)] 
    for f in results.keys():
        t=timeit.timeit(f,number=n)
        results[f].append(float(n)/t)

for f,res in sorted(results.items(),key=itemgetter(1),reverse=True):
    if f.__name__ in ['f6','f1','f1t']:
        plt.plot(res, label=f.__name__,linewidth=2.5)
    else:    
        plt.plot(res, label=f.__name__,linewidth=.5)

plt.ylabel('rate/sec')
plt.xlabel('data size => {}'.format(r))  
plt.legend(loc='upper right')
plt.show()

Which produces this plot for smaller data sizes (5 to 35):

[plot: rate/sec vs. list size, 5 to 35]

And this plot for a larger range (25 to 250):

[plot: rate/sec vs. list size, 25 to 250]

You can see that f1, a list comprehension, is fastest. f6 and f1t trade places as the fastest way to return a tuple.

A very clever Python 3-only way is to use starred assignment, or extended iterable unpacking:

>>> G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]
>>> items_I_want,*the_rest=zip(*G)
>>> items_I_want
(1, 'a', 'you')
>>> the_rest
[(2, 'b', 'and'), (3, 'c', 'me')]

Since you are writing code for both, you could use explicit unpacking (which works on Python 2 and Python 3):

>>> z1,z2,z3=zip(*G)
>>> z1
(1, 'a', 'you')
>>> z2
(2, 'b', 'and')
>>> z3
(3, 'c', 'me')
