
Best way to get the nth element of each tuple from a list of tuples in Python

I had some code that contained zip(*G)[0] (and elsewhere, zip(*G)[1], with a different G). G is a list of tuples. This returns the first element of each tuple in G, collected into a tuple (in general, zip(*G)[n] returns the element at index n of each tuple, i.e. the (n+1)th). For example,

>>> G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]
>>> zip(*G)[0]
(1, 'a', 'you')
>>> zip(*G)[1]
(2, 'b', 'and')

This is pretty clever and all, but the problem is that it doesn't work in Python 3, because zip is an iterator there. Furthermore, 2to3 isn't smart enough to fix it. So the obvious solution is to use list(zip(*G))[0] , but that got me thinking: there is probably a more efficient way to do this. There is no need to create all the tuples that zip creates. I just need the n th element of each tuple in G.
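A small illustration of the Python 3 behaviour described above, using the same example G. The variable names `error` and `first_column` are only for illustration.

```python
G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]

# In Python 3, zip() returns an iterator, so indexing it directly fails.
try:
    zip(*G)[0]
except TypeError as exc:
    error = exc

# Materialising the iterator first restores the Python 2 behaviour,
# at the cost of building every column tuple.
first_column = list(zip(*G))[0]
print(first_column)
```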

Is there a more efficient, but equally compact way to do this? I'm OK with anything from the standard library. In my use case, each tuple in G will be at least length n, so there is no need to worry about the case of zip stopping at the smallest length tuple (i.e., zip(*G)[n] will always be defined).

If not, I guess I'll just stick with wrapping the zip in list() .

(PS, I know this is unnecessary optimization. I'm just curious is all)

UPDATE:

In case anyone cares, I went with the t0, t1, t2 = zip(*G) option. First, this lets me give meaningful names to the data. My G actually consists of length-2 tuples (representing numerators and denominators). A list comprehension would only be marginally more readable than the zip, but this way is much better (and since in most cases the zip result was a list I was iterating through in a list comprehension, this makes things flatter).

Second, as noted by @thewolf and @Sven Marnach's excellent answers, this way is faster for smaller lists. My G is actually not large in most cases (and if it is large, then this definitely won't be the bottleneck of the code!).

But there were more ways to do this than I expected, including the new a, *b, c = G feature of Python 3 I didn't even know about.
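The naming point in the update can be sketched roughly as follows. The list `fractions` and its values are made up for illustration; only the unpacking pattern comes from the text.

```python
# Hypothetical data: (numerator, denominator) pairs.
fractions = [(1, 2), (3, 4), (5, 6)]

# zip(*fractions) yields one tuple per position. With length-2 tuples
# there are exactly two of them, so two names are enough to unpack it.
numerators, denominators = zip(*fractions)

print(numerators)
print(denominators)
```

One caveat with this form: if the list is empty, zip yields nothing and the unpacking raises ValueError.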

You can use a list comprehension

[x[0] for x in G]

or operator.itemgetter()

from operator import itemgetter
list(map(itemgetter(0), G))  # list() is needed in Python 3, where map returns an iterator

or sequence unpacking

[x for x, y, z in G]
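For comparison, here are the three options applied to the example G from the question, extracting the column at index 1. The names `n`, `by_index`, `by_itemgetter` and `by_unpacking` are just for this sketch.

```python
from operator import itemgetter

G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]
n = 1  # index of the element to take from each tuple

# List comprehension with indexing: works for any n and any tuple length > n.
by_index = [x[n] for x in G]

# itemgetter builds a callable equivalent to lambda x: x[n].
by_itemgetter = list(map(itemgetter(n), G))

# Sequence unpacking: the position is fixed in the code, and every
# tuple must have exactly three elements.
by_unpacking = [y for x, y, z in G]

print(by_index, by_itemgetter, by_unpacking)
```

All three produce a list here, whereas the zip-based forms return a tuple.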

Edit : Here is my take on timing the different options, also in Python 3.2:

from operator import itemgetter
import timeit

G = list(zip(*[iter(range(30000))] * 3))

def f1():
    return [x[0] for x in G]
def f2():
    return list(map(itemgetter(0), G))
def f3():
    return [x for x, y, z in G]
def f4():
    return list(zip(*G))[0]
def f5():
    c0, *rest = zip(*G)
    return c0
def f6():
    c0, c1, c2 = zip(*G)
    return c0
def f7():
    return next(zip(*G))

for f in f1, f2, f3, f4, f5, f6, f7:
    print(f.__name__, timeit.timeit(f, number=1000))

Results on my machine:

f1 0.6753780841827393
f2 0.8274149894714355
f3 0.5576457977294922
f4 0.7980241775512695
f5 0.7952430248260498
f6 0.7965989112854004
f7 0.5748469829559326

Comments:

  1. I used a list with 10000 triples to measure the actual processing time, and make function call overhead, name lookups etc. negligible, which would otherwise seriously influence the results.

  2. The functions return a list or a tuple – whatever is more convenient for the particular solution.

  3. Compared to the wolf's answer , I removed the redundant call to tuple() from f4() (the result of the expression is a tuple already), and I added a function f7() which only works to extract the first column.

As expected, the list comprehensions are fastest, together with the somewhat less general f7() .
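Since f7() only extracts the first column, one possible generalisation is to skip ahead with itertools.islice. This is a sketch and was not part of the timings above; `nth_column` is a hypothetical helper name.

```python
from itertools import islice

G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]

def nth_column(rows, n):
    """Return the element at index n of each tuple in rows, as a tuple.

    Hypothetical helper, not one of the benchmarked functions.
    """
    # zip(*rows) yields the columns lazily. islice skips the first n of
    # them (they are still built, then discarded) and next() takes column n.
    return next(islice(zip(*rows), n, None))

print(nth_column(G, 0))
print(nth_column(G, 2))
```

Note that this raises StopIteration if n is not a valid index for the shortest tuple, and the work it does grows with n, so it is mainly attractive for the first few columns.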

Another edit : Here are the results for ten columns instead of three, with the code adapted where appropriate:

f1 0.7429649829864502
f2 0.881648063659668
f3 1.234360933303833
f4 1.92038893699646
f5 1.9218590259552002
f6 1.9172680377960205
f7 0.6230220794677734

In Python 2.7 at least, the fastest way is t0,t1,t2=zip(*G) for SMALLER lists, and [x[0] for x in G] in general.

Here is the test:

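from operator import itemgetter
import cmpthese  # benchmarking helper used below; not in the standard library, assumed to be importable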

G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]

def f1():
    return tuple(x[0] for x in G)

def f2():
    return tuple(map(itemgetter(0), G))

def f3():
    return tuple(x for x, y, z in G)

def f4():
    return tuple(list(zip(*G))[0])

def f5():
    t0, *the_rest = zip(*G)
    return t0

def f6():
    t0, t1, t2 = zip(*G)
    return t0

cmpthese.cmpthese([f1, f2, f3, f4, f5, f6], c=100000)

Results:

    rate/sec     f4     f5     f1     f2     f3     f6
f4   494,220     -- -21.9% -24.1% -24.3% -26.6% -67.6%
f5   632,623  28.0%     --  -2.9%  -3.0%  -6.0% -58.6%
f1   651,190  31.8%   2.9%     --  -0.2%  -3.2% -57.3%
f2   652,457  32.0%   3.1%   0.2%     --  -3.0% -57.3%
f3   672,907  36.2%   6.4%   3.3%   3.1%     -- -55.9%
f6 1,526,645 208.9% 141.3% 134.4% 134.0% 126.9%     --

If you don't care if the result is a list, a list comprehension is faster.

Here is a more extended benchmark with variable list sizes:

from operator import itemgetter
import timeit
import matplotlib.pyplot as plt

def f1():
    return [x[0] for x in G]

def f1t():
    return tuple([x[0] for x in G])

def f2():
    return tuple([x for x in map(itemgetter(0), G)])

def f3():
    return tuple([x for x, y, z in G])

def f4():
    return tuple(list(zip(*G))[0])

def f6():
    t0, t1, t2 = zip(*G)
    return t0

n = 100      # timeit repetitions per measurement
r = (5, 35)  # range of list sizes to test
results = {f1: [], f1t: [], f2: [], f3: [], f4: [], f6: []}
for c in range(*r):
    G = [range(3) for i in range(c)]  # c triples; the functions read this global
    for f in results.keys():
        t = timeit.timeit(f, number=n)
        results[f].append(float(n) / t)  # calls per second

# Emphasise f6, f1 and f1t with a heavier line.
for f, res in sorted(results.items(), key=itemgetter(1), reverse=True):
    if f.__name__ in ['f6', 'f1', 'f1t']:
        plt.plot(res, label=f.__name__, linewidth=2.5)
    else:
        plt.plot(res, label=f.__name__, linewidth=.5)

plt.ylabel('rate/sec')
plt.xlabel('data size => {}'.format(r))
plt.legend(loc='upper right')
plt.show()

Which produces this plot for smaller data sizes (5 to 35):

[Plot: rate/sec against data size, for sizes 5 to 35]

And this output for larger ranges (25 to 250):

[Plot: rate/sec against data size, for sizes 25 to 250]

You can see that f1, a list comprehension, is fastest. f6 and f1t trade places as the fastest way to return a tuple.

A very clever Python 3-only way is with starred assignment, also called extended iterable unpacking:

>>> G = [(1, 2, 3), ('a', 'b', 'c'), ('you', 'and', 'me')]
>>> items_I_want,*the_rest=zip(*G)
>>> items_I_want
(1, 'a', 'you')
>>> the_rest
[(2, 'b', 'and'), (3, 'c', 'me')]

Since you are writing code for both versions, you could use explicit unpacking (which works on Python 2 and Python 3):

>>> z1,z2,z3=zip(*G)
>>> z1
(1, 'a', 'you')
>>> z2
(2, 'b', 'and')
>>> z3
(3, 'c', 'me')


 