
Best practice to grow a Python list

I have a very large table stored as a list of lists, and I need to add more columns to it.

tbl = [range(200),range(200),range(200),...]
newCol = [val1, val2]

As I see it, I can do this either:

for idx,val in enumerate(tbl):
    tbl[idx] = newCol + val

or

import itertools

colRep = [newCol]*len(tbl)
mgr = itertools.izip(colRep,tbl)  # izip exists only in Python 2
newTbl = [ itertools.chain(*elem) for elem in mgr]

Is one really better than the other? Is there a better way of doing this?
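Both snippets above are Python 2 code: itertools.izip was removed in Python 3, and range() there returns a lazy range object rather than a list. A minimal Python 3 sketch of the same two approaches, assuming the placeholder strings 'val1' and 'val2' for the new column values, could look like this:

```python
import itertools

newCol = ['val1', 'val2']  # placeholder column values (assumption)

# Approach 1: in-place for-loop. Each row is converted to a list because
# list + range is not allowed in Python 3.
tbl = [range(2), range(3), range(4)]
for idx, row in enumerate(tbl):
    tbl[idx] = newCol + list(row)
loop_result = tbl

# Approach 2: zip/chain. The built-in zip is already lazy in Python 3,
# and list() materializes each chained row.
tbl = [range(2), range(3), range(4)]
colRep = [newCol] * len(tbl)
newTbl = [list(itertools.chain(*pair)) for pair in zip(colRep, tbl)]

print(loop_result == newTbl)  # True
```

Both produce the same rows once the chained iterators are turned into lists.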

For readability, a simple list comprehension would do:

In [28]: tbl = [range(2),range(3),range(4)]
In [29]: [newCol + list(elt) for elt in tbl]
Out[29]: 
[['val1', 'val2', 0, 1],
 ['val1', 'val2', 0, 1, 2],
 ['val1', 'val2', 0, 1, 2, 3]]

Note that in Python 3, range returns a range object, not a list. So to make the code compatible with both Python 2 and Python 3, I changed newCol + elt to newCol + list(elt).
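A short sketch of why the list() call is needed in Python 3, again with placeholder column values. The second option, iterable unpacking inside a list display (PEP 448), is a swapped-in alternative that works only on Python 3.5+, so it gives up the Python 2 compatibility the answer is aiming for:

```python
newCol = ['val1', 'val2']  # placeholder column values (assumption)
row = range(3)

# In Python 3, list + range raises TypeError.
try:
    newCol + row
    concat_failed = False
except TypeError:
    concat_failed = True
print(concat_failed)  # True

# Option 1: convert explicitly, as in the answer (works on 2 and 3).
converted = newCol + list(row)
# Option 2 (Python 3.5+ only): unpack any iterable into a new list.
unpacked = [*newCol, *row]
print(converted == unpacked)  # True
```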

If you wish to modify tbl in-place, you could use

tbl[:] = [newCol + list(elt) for elt in tbl]
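The difference between rebinding the name and assigning to the slice only becomes visible when something else holds a reference to the same list object. A small sketch with placeholder values:

```python
newCol = ['val1', 'val2']  # placeholder column values (assumption)

# Rebinding: the comprehension builds a new list and the name now points at it.
rebound = [range(2), range(3)]
old_ref = rebound
rebound = [newCol + list(row) for row in rebound]
print(old_ref is rebound)   # False: old_ref still holds the old rows

# Slice assignment: the same list object has its contents replaced.
inplace = [range(2), range(3)]
same_ref = inplace
inplace[:] = [newCol + list(row) for row in inplace]
print(same_ref is inplace)  # True
print(same_ref[0])          # ['val1', 'val2', 0, 1]
```

Note that tbl[:] = ... still builds a complete new list first; it only changes which object ends up holding the result.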

Note that before we can compare performance, we need to pin down what the desired result is, lest we end up comparing apples to oranges.

The for-loop modifies tbl in-place. Is the in-place-ness important?

The zip/chain code does not modify tbl in-place; instead it produces a list of iterators:

In [47]: newTbl
Out[47]: 
[<itertools.chain at 0x7f5aeb0a6750>,
 <itertools.chain at 0x7f5aeb0a6410>,
 <itertools.chain at 0x7f5aeb0a6310>]

That could be what you want, but it would be unfair to compare the performance of these two pieces of code, because the iterators defer enumerating their items until something consumes them. It would be like timing the difference between painting a house and contemplating painting a house.
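A minimal sketch of that laziness, using itertools.chain directly on each row with placeholder values. Each chain object does no copying until it is consumed, and it can be consumed only once:

```python
import itertools

newCol = ['val1', 'val2']  # placeholder column values (assumption)
tbl = [range(2), range(3)]

# Each element is an itertools.chain object; no row data is copied yet.
lazyTbl = [itertools.chain(newCol, row) for row in tbl]
print(type(lazyTbl[0]).__name__)  # chain

first = list(lazyTbl[0])   # consumes the iterator
again = list(lazyTbl[0])   # the same iterator is now exhausted
print(first)  # ['val1', 'val2', 0, 1]
print(again)  # []
```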

To make the comparison fairer, we could use list to consume each iterator:

newTbl = [ list(itertools.chain(*elem)) for elem in mgr]
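To see the gap between "contemplating" and "painting" directly, one could time building the lazy chain objects against building fully materialized rows. The table size and repetition count below are arbitrary assumptions, and no particular numbers are claimed; the point is only that the lazy measurement leaves out the copying work:

```python
import itertools
import timeit

newCol = ['val1', 'val2']  # placeholder column values (assumption)
tbl = [list(range(200)) for _ in range(10**4)]  # arbitrary test size

def lazy():
    # Creates chain objects only; no row elements are copied.
    return [itertools.chain(newCol, row) for row in tbl]

def eager():
    # list() consumes each chain, actually building the new rows.
    return [list(itertools.chain(newCol, row)) for row in tbl]

t_lazy = timeit.timeit(lazy, number=5)
t_eager = timeit.timeit(eager, number=5)
print('lazy : {:0.3f}s'.format(t_lazy))
print('eager: {:0.3f}s'.format(t_eager))
```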

To benchmark the performance of the various options, you could use timeit like this:

# Python 2 code: itertools.izip and list-returning range() are Python 2 only.
import timeit
import itertools

tbl = [range(2),range(3),range(4)]
newCol = ['val1', 'val2']

stmt = {
    'for_loop' : '''\
for idx,val in enumerate(tbl):
    tbl[idx] = newCol + val
''',
    'list_comp': '''tbl = [newCol + elt for elt in tbl]''',
    'inplace_list_comp': '''tbl[:] = [newCol + elt for elt in tbl]''',
    'zip_chain': '''
colRep = [newCol]*len(tbl)
mgr = itertools.izip(colRep,tbl)
newTbl = [ list(itertools.chain(*elem)) for elem in mgr]
'''

}
for s in ('for_loop', 'list_comp', 'inplace_list_comp', 'zip_chain'):
    t = timeit.timeit(
        stmt[s], 
        setup='from __main__ import newCol, itertools; tbl = [range(200)]*10**5',
        number=10)
    print('{:20}: {:0.2f}'.format(s, t))

yields

for_loop            : 1.12
list_comp           : 1.21
inplace_list_comp   : 1.26
zip_chain           : 4.40

So the for_loop may be marginally faster. Be sure to check this with a tbl closer to your actual use case; timeit results may differ for a number of reasons, including hardware, OS, and software versions.
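Because the benchmark above relies on Python 2 behavior, here is a sketch of a Python 3 port. Assumptions: zip replaces itertools.izip, each row is built as an independent list up front so that newCol + elt concatenates list + list, the table is smaller (10**4 rows) to keep the run short, and the globals argument of timeit.timeit (Python 3.5+) replaces the from __main__ import. As in the original, setup runs once per timeit() call, so the first three statements work on rows that grow by two elements on each of the 10 repetitions, while zip_chain leaves tbl unchanged. No timings are claimed here.

```python
import itertools
import timeit

newCol = ['val1', 'val2']  # placeholder column values (assumption)

# Independent list rows (the original used [range(200)]*10**5).
SETUP = 'tbl = [list(range(200)) for _ in range(10**4)]'

stmts = {
    'for_loop': (
        'for idx, val in enumerate(tbl):\n'
        '    tbl[idx] = newCol + val\n'
    ),
    'list_comp': 'tbl = [newCol + elt for elt in tbl]',
    'inplace_list_comp': 'tbl[:] = [newCol + elt for elt in tbl]',
    'zip_chain': (
        'colRep = [newCol] * len(tbl)\n'
        'newTbl = [list(itertools.chain(*elem)) for elem in zip(colRep, tbl)]\n'
    ),
}

results = {}
for name, stmt in stmts.items():
    # globals= makes newCol and itertools visible inside the timed code.
    results[name] = timeit.timeit(
        stmt, setup=SETUP, number=10,
        globals={'newCol': newCol, 'itertools': itertools})
    print('{:20}: {:0.2f}'.format(name, results[name]))
```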

Also be aware that this might be pointless premature optimization if this little piece of code is not a significant bottleneck in your actual code. For example, if your actual code spends 1.21 seconds in this list comprehension and 1000 seconds elsewhere, a tenth-of-a-second improvement here would be insignificant overall.
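One way to check whether this step is a bottleneck before optimizing it is to profile the whole program. A minimal sketch using the standard library's cProfile and pstats modules; add_columns and main are hypothetical stand-ins for your real code:

```python
import cProfile
import pstats

def add_columns(tbl, newCol):
    # Hypothetical helper wrapping the list comprehension from the answer.
    return [newCol + list(row) for row in tbl]

def main():
    # Hypothetical stand-in for the rest of the program.
    tbl = [range(200) for _ in range(10**4)]
    return add_columns(tbl, ['val1', 'val2'])

profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Show the 5 entries with the largest cumulative time.
stats = pstats.Stats(profiler).sort_stats('cumulative')
stats.print_stats(5)
```

If add_columns accounts for only a tiny share of the cumulative time, optimizing it will not noticeably change the total.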
