
Best practice to grow a Python list

I have a very large list-of-lists table and I need to add more columns to it.

tbl = [range(200),range(200),range(200),...]
newCol = [val1, val2]

The way I see it, I can either do this:

for idx,val in enumerate(tbl):
    tbl[idx] = newCol + val

or

colRep = [newCol]*len(tbl)
mgr = itertools.izip(colRep,tbl)
newTbl = [ itertools.chain(*elem) for elem in mgr]

Is one really better than the other? Is there a better way of doing this?

For readability, a simple list comprehension would do:

In [28]: tbl = [range(2),range(3),range(4)]
In [29]: [newCol + list(elt) for elt in tbl]
Out[29]: 
[['val1', 'val2', 0, 1],
 ['val1', 'val2', 0, 1, 2],
 ['val1', 'val2', 0, 1, 2, 3]]

Note that in Python 3, range returns a range object, not a list. So to make the code Python 2- and Python 3-compatible, I changed newCol + elt to newCol + list(elt).
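
On Python 3, for example, the concatenation fails outright without that conversion. A quick illustration at the interactive prompt, assuming newCol = ['val1', 'val2']:

>>> newCol + range(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "range") to list
>>> newCol + list(range(3))
['val1', 'val2', 0, 1, 2]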

If you wish to modify tbl in-place, you could use

tbl[:] = [newCol + list(elt) for elt in tbl]

Note that before we can compare performance, we need to pin down what the desired result is, lest we end up comparing apples to oranges.

The for-loop modifies tbl in-place. Is the in-place behavior important?
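
It can matter when other names refer to the same list object. A rough sketch, where alias is just a hypothetical second reference:

alias = tbl                                   # another name for the same list object
tbl[:] = [newCol + list(elt) for elt in tbl]  # slice assignment mutates that object, so alias sees the new rows
# whereas
tbl = [newCol + list(elt) for elt in tbl]     # plain assignment rebinds the name; alias still holds the old rows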

The zip/chain code does not modify tbl in-place and instead produces a list of iterators:

In [47]: newTbl
Out[47]: 
[<itertools.chain at 0x7f5aeb0a6750>,
 <itertools.chain at 0x7f5aeb0a6410>,
 <itertools.chain at 0x7f5aeb0a6310>]

That could be what you want, but it would be unfair to compare the performance of these two pieces of code, because the iterators delay the work of actually enumerating their items. It would be like timing the difference between painting a house and contemplating painting a house.
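
The laziness is easy to see: building a chain object touches none of the row elements, and the real work only happens when something consumes it. A rough sketch:

row = range(10**6)
lazy = itertools.chain(newCol, row)   # returns immediately; nothing has been iterated yet
done = list(lazy)                     # only here are the million elements actually walked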

To make the comparison more fair, we could use list to consume the iterator:

newTbl = [ list(itertools.chain(*elem)) for elem in mgr]

To benchmark the performance of the various options, you could use timeit like this:

import timeit
import itertools

tbl = [range(2),range(3),range(4)]
newCol = ['val1', 'val2']

stmt = {
    'for_loop' : '''\
for idx,val in enumerate(tbl):
    tbl[idx] = newCol + val
''',
    'list_comp': '''tbl = [newCol + elt for elt in tbl]''',
    'inplace_list_comp': '''tbl[:] = [newCol + elt for elt in tbl]''',
    'zip_chain': '''
colRep = [newCol]*len(tbl)
mgr = itertools.izip(colRep,tbl)
newTbl = [ list(itertools.chain(*elem)) for elem in mgr]
'''

}
for s in ('for_loop', 'list_comp', 'inplace_list_comp', 'zip_chain'):
    t = timeit.timeit(
        stmt[s], 
        setup='from __main__ import newCol, itertools; tbl = [range(200)]*10**5',
        number=10)
    print('{:20}: {:0.2f}'.format(s, t))

yields

for_loop            : 1.12
list_comp           : 1.21
inplace_list_comp   : 1.26
zip_chain           : 4.40

So the for_loop may be marginally faster. Be sure to check this with tbl closer to your actual use case. timeit results may differ for a number of reasons, including hardware, OS, and software versions.
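
Also note that the benchmark above is written for Python 2 (itertools.izip, and range producing a list). A rough Python 3 equivalent, assuming the rows are materialized as real lists, could look like this:

import timeit
import itertools

newCol = ['val1', 'val2']

stmt = {
    'for_loop': '''\
for idx, val in enumerate(tbl):
    tbl[idx] = newCol + val
''',
    'list_comp': 'tbl = [newCol + elt for elt in tbl]',
    'inplace_list_comp': 'tbl[:] = [newCol + elt for elt in tbl]',
    'zip_chain': '''\
colRep = [newCol]*len(tbl)
mgr = zip(colRep, tbl)  # itertools.izip is gone in Python 3; the builtin zip is already lazy
newTbl = [list(itertools.chain(*elem)) for elem in mgr]
''',
}
for s in ('for_loop', 'list_comp', 'inplace_list_comp', 'zip_chain'):
    t = timeit.timeit(
        stmt[s],
        setup=('from __main__ import newCol, itertools; '
               'tbl = [list(range(200))]*10**5'),  # materialize rows, since range is lazy in Python 3
        number=10)
    print('{:20}: {:0.2f}'.format(s, t))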

Also be aware that this might be senseless pre-optimization if this little piece of code is not a significant bottleneck in your actual code. For example, if your actual code spends 1.21 seconds in this list comprehension and 1000 seconds elsewhere, a tenth of a second improvement here would be insignificant overall.
