
Flattening a list when sublists are the same length

I've got a list such as [[1,2], [3,4], [5,6], [7,8], [9,10]]. I want to get [1,2,3,4,5,6,7,8,9,10].

This question gives some very good options for flattening lists in general. The answers given there work with variable length sublists. Here though, I know that every sublist has the same length (in particular length 2).

I'm wondering if it is possible to take advantage of the homogeneous sublist length to improve on the answers given in the question I linked to. In particular, is there anything that will do better at flattening this list than [item for sublist in l for item in sublist]?

edit: by 'better', I mean faster for a very long list.

edit:

One thing I did not mention: I do not care about the order of the flattened list (but I do care about multiplicity).

import timeit
import itertools

def f0():
    l = [[1, 2]] * 99
    [item for sublist in l for item in sublist]

def f1():
    l = [[1, 2]] * 99
    list(itertools.chain.from_iterable(l))

def f2():
    l = [[1, 2]] * 99
    z = list(map(list, zip(*l)))  # list() is needed in Python 3, where map is lazy
    z[0].extend(z[1])

print(timeit.timeit("f0()", setup="from __main__ import f0, f1, f2", number=10000))
print(timeit.timeit("f1()", setup="from __main__ import f0, f1, f2", number=10000))
print(timeit.timeit("f2()", setup="from __main__ import f0, f1, f2", number=10000))

yields the output

0.13874912262
0.103307008743
0.10813999176

Could my zip function be done faster?
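One way to exploit the fixed sublist length, sketched here as an idea rather than taken from any of the answers: preallocate the output list and fill the even and odd positions with extended-slice assignment. Since every sublist has length 2, zip(*l) yields exactly two tuples (all first elements, then all second elements), and this variant even preserves the original order.

```python
l = [[1, 2]] * 99

# Preallocate the flat list, then fill even and odd slots in one step.
# zip(*l) transposes the list of pairs into (firsts, seconds).
out = [None] * (2 * len(l))
out[0::2], out[1::2] = zip(*l)
```

Whether this beats the list comprehension would need to be measured on your Python version; it is only a candidate to benchmark, not a guaranteed win.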

A little timing suggests that the list comprehension is slightly faster than the itertools version for short lists (Hackaholic's answer suggests the reverse is true for long lists):

>>> import timeit
>>> timeit.timeit("[item for sublist in a for item in sublist]", 
                  setup="import itertools; a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]")
1.7200839519500732
>>> timeit.timeit("list(itertools.chain.from_iterable(a))", 
                  setup="import itertools; a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]")
2.0097079277038574

The key advantage of the iterator-based method comes when you can avoid building the whole list at all, iterating over chain.from_iterable's output rather than passing it to the list constructor.
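To illustrate that point with a small sketch: if you only need to consume the elements (here, summing them), you can loop over the chained iterator directly and never materialize the flat list.

```python
import itertools

a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# Consume the flattened stream lazily; no intermediate flat list is built.
total = 0
for item in itertools.chain.from_iterable(a):
    total += item
```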

If you are doing operations on arrays and performance is a key consideration, consider using numpy, which, although not part of the standard library, is much faster (once the data is already in an array):

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
>>> a
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])
>>> a.ravel()
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> timeit.timeit("a.ravel()",
                  setup="import numpy as np; a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])")
0.36390113830566406
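A related detail worth knowing (an aside, assuming numpy is installed): ravel() returns a view of the original buffer when the memory layout allows, while flatten() always makes a copy; tolist() converts back to a plain Python list.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])

r = a.ravel()    # a view onto a's buffer for this contiguous array
f = a.flatten()  # always a fresh copy

# ravel shares memory with the original here; flatten never does
assert np.shares_memory(a, r)
assert not np.shares_memory(a, f)

flat_list = r.tolist()  # back to a plain Python list
```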

import itertools
a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
list(itertools.chain.from_iterable(a))

output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Now compare the timings. For a small list:

>>> timeit.timeit("list(itertools.chain.from_iterable(a))",setup='import itertools;a = [[1,2], [3,4], [5,6], [7,8], [9,10]]') 
0.9853601455688477
>>> timeit.timeit("[ y for x in a for y in x]",setup='a = [[1,2], [3,4], [5,6], [7,8], [9,10]]') 
0.9124641418457031

For a large list, here is why iterators are preferred:

>>> timeit.timeit("list(itertools.chain.from_iterable(a))", setup='import itertools; a = list(zip(range(100), range(100)))', number=1000000)
8.213459014892578
>>> timeit.timeit("[y for x in a for y in x]", setup='a = list(zip(range(100), range(100)))', number=1000000)
12.833590984344482

For small lists the list comprehension is good, but for large lists you should prefer the iterator-based approach.
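One caveat when benchmarking iterator-producing functions in Python 3 (an aside, not from the answer above): zip and chain.from_iterable return single-use iterators, so a consumed iterator silently yields nothing the second time around. This is why the setup above materializes the zip result with list() before timing.

```python
import itertools

a = [[1, 2], [3, 4]]
it = itertools.chain.from_iterable(a)

first = list(it)   # consumes the iterator
second = list(it)  # already exhausted, so this is empty
```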
