
Performance differences when iterating over a list of lists in Python

When iterating over a list of lists in Python 2.7.3, I noticed performance differences when changing the order of the iteration:

I have a list of 200 lists of 500000 strings. I then iterate in the following ways:

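import time

# columns is the list of 200 lists of 500000 strings described above
numberOfRows = len(columns[0])
numberOfColumns = len(columns)

# Row-first: the inner loop switches to a different inner list on every access
t1 = time.clock()
for i in xrange(numberOfRows):
    for j in xrange(numberOfColumns):
        cell = columns[j][i]
print time.clock() - t1

# Column-first: the inner loop walks through one inner list at a time
t1 = time.clock()
for i in xrange(numberOfColumns):
    for j in xrange(numberOfRows):
        cell = columns[i][j]
print time.clock() - t1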

The program repeatedly produces output similar to this:

33.97
29.39

Now, I expected to have efficient random access on the lists. Where do these 4 seconds come from? Is it only caching?
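For readers on Python 3, where `xrange`, the `print` statement and `time.clock` are no longer available, the comparison above could be sketched roughly as follows. This is not the original program: the helper names `make_columns`, `time_row_first` and `time_column_first`, the use of `time.perf_counter`, and the much smaller list sizes are all assumptions made so the sketch runs quickly.

```python
import time


def make_columns(number_of_columns, number_of_rows):
    # Build independent inner lists of short strings (sizes are illustrative).
    return [[str(j) for j in range(number_of_rows)]
            for _ in range(number_of_columns)]


def time_row_first(columns):
    # Outer loop over rows: each access goes to a different inner list.
    number_of_rows = len(columns[0])
    number_of_columns = len(columns)
    start = time.perf_counter()
    for i in range(number_of_rows):
        for j in range(number_of_columns):
            cell = columns[j][i]
    return time.perf_counter() - start


def time_column_first(columns):
    # Outer loop over columns: one inner list is walked from start to end.
    number_of_rows = len(columns[0])
    number_of_columns = len(columns)
    start = time.perf_counter()
    for i in range(number_of_columns):
        for j in range(number_of_rows):
            cell = columns[i][j]
    return time.perf_counter() - start


if __name__ == "__main__":
    columns = make_columns(200, 5000)  # hypothetical, scaled-down sizes
    print("row-first:   ", time_row_first(columns))
    print("column-first:", time_column_first(columns))
```

The absolute numbers will differ from those quoted above, and whether a gap shows up at all may depend on the machine and on how large the lists are.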

I get something like

30.509407822896037
29.88344778700383

for

columns = [[0] * 500000 for x in range(200)]

If I replace the cell = ... lines with pass, I get

8.44722739915369
10.23647023463866

So it's definitely not an issue with creating the xrange objects or something similar.
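The `pass` baseline can be reproduced in isolation, which makes it easier to separate loop overhead from the cost of the list accesses. The following is a minimal Python 3 sketch; the function name `time_empty_loops` and the scaled-down sizes are assumptions, not part of the original measurement.

```python
import time


def time_empty_loops(outer, inner):
    # Time the bare nested loops with no list access (the `pass` variant).
    start = time.perf_counter()
    for i in range(outer):
        for j in range(inner):
            pass
    return time.perf_counter() - start


if __name__ == "__main__":
    number_of_columns, number_of_rows = 200, 5000  # hypothetical sizes
    print("rows outer:   ", time_empty_loops(number_of_rows, number_of_columns))
    print("columns outer:", time_empty_loops(number_of_columns, number_of_rows))
```

Subtracting these baselines from the full timings gives a rough estimate of how much time the `columns[...][...]` lookups themselves take in each order.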

It's the caching of the columns (by the computer, not by Python). If I use

columns = [[0] * 500000] * 200

I get

27.725353873145195
29.592749434295797

Here, the same column object is always used, so there is (almost) no difference in caching between the two iteration orders. What remains is (about) the same timing difference as in the pass variant.
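The difference between the two ways of building `columns` can be checked directly: `[[0] * n] * m` repeats a reference to one inner list, while the list comprehension creates a new inner list each time. A small sketch in Python 3, with hypothetical helper names (`make_shared_columns`, `make_distinct_columns`, `count_distinct_lists`):

```python
def make_shared_columns(number_of_columns, number_of_rows):
    # The outer list holds the SAME inner list object number_of_columns times.
    return [[0] * number_of_rows] * number_of_columns


def make_distinct_columns(number_of_columns, number_of_rows):
    # The comprehension builds a fresh inner list on every iteration.
    return [[0] * number_of_rows for _ in range(number_of_columns)]


def count_distinct_lists(columns):
    # Count how many different list objects the outer list refers to.
    return len({id(column) for column in columns})


if __name__ == "__main__":
    shared = make_shared_columns(200, 5)
    distinct = make_distinct_columns(200, 5)
    print(count_distinct_lists(shared))    # 1
    print(count_distinct_lists(distinct))  # 200

    shared[0][0] = 1
    print(shared[1][0])  # 1, because every "column" is the same object
```

Because the shared variant only ever touches one inner list, both iteration orders work on the same small region of memory, which fits the explanation given above.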


 