Python: performance differences when iterating a list of lists
When iterating over a list of lists in Python 2.7.3, I noticed performance differences when changing the order of the iteration:
I have a list of 200 lists, each holding 500000 strings. I then iterate in the following ways:
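import time

numberOfRows = len(columns[0])
numberOfColumns = len(columns)

# Rows in the outer loop: each access jumps to a different column list.
t1 = time.clock()
for i in xrange(numberOfRows):
    for j in xrange(numberOfColumns):
        cell = columns[j][i]
print time.clock() - t1

# Columns in the outer loop: each column list is walked from start to end.
t1 = time.clock()
for i in xrange(numberOfColumns):
    for j in xrange(numberOfRows):
        cell = columns[i][j]
print time.clock() - t1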
The program repeatedly produces output similar to this:
33.97
29.39
Now, I expected lists to offer efficient random access. Where do these 4 seconds come from? Is it only caching?
I get something like
30.509407822896037
29.88344778700383
for
columns = [[0] * 500000 for x in range(200)]
If I replace the cell = ... lines with pass, I get
8.44722739915369
10.23647023463866
So it's definitely not an issue with creating the xrange objects or anything like that.
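To separate loop overhead from element access, the pass variant can be reproduced with a small helper. This is a minimal sketch in Python 3, so range and time.perf_counter stand in for xrange and time.clock; the helper name time_nested_loops and the reduced sizes are assumptions chosen for illustration, and absolute timings will differ from the numbers quoted in this post.

```python
import time


def time_nested_loops(outer, inner):
    """Time an empty nested loop: outer * inner iterations, no list access."""
    start = time.perf_counter()
    for _ in range(outer):
        for _ in range(inner):
            pass
    return time.perf_counter() - start


# Assumed, reduced sizes so the sketch finishes quickly.
number_of_rows = 50000
number_of_columns = 20

# Same total iteration count, different nesting order.
rows_outer = time_nested_loops(number_of_rows, number_of_columns)
columns_outer = time_nested_loops(number_of_columns, number_of_rows)

print("rows outer:    %.4f s" % rows_outer)
print("columns outer: %.4f s" % columns_outer)
```

Since no list is touched here, any gap between the two numbers comes from the loops themselves rather than from memory access.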
It's the caching of the columns (done by the computer, not by Python). If I use
columns = [[0] * 500000] * 200
I get
27.725353873145195
29.592749434295797
Here, the same column object is always used, so there is (almost) no difference in caching between the two loop orders. Thus (about) the same timing difference shows up as in the pass variant.
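The difference between the two constructions can be checked directly. The sketch below is written for Python 3 and uses sizes much smaller than in the post (both are assumptions for illustration, as is the helper name time_access). It confirms that [[0] * n] * m repeats one inner list while the list comprehension builds distinct ones, then times both access orders on each layout. Timings depend on the machine, so no expected values are shown for them.

```python
import time


def time_access(columns, rows_outer):
    """Read every cell once and return the elapsed seconds.

    rows_outer=True  -> outer loop over rows, inner loop over columns
                        (each access goes to a different inner list).
    rows_outer=False -> outer loop over columns, inner loop over rows
                        (each inner list is walked from start to end).
    """
    number_of_columns = len(columns)
    number_of_rows = len(columns[0])
    start = time.perf_counter()
    if rows_outer:
        for i in range(number_of_rows):
            for j in range(number_of_columns):
                cell = columns[j][i]
    else:
        for i in range(number_of_columns):
            for j in range(number_of_rows):
                cell = columns[i][j]
    return time.perf_counter() - start


# Assumed, reduced sizes (the post uses 200 columns of 500000 entries).
n_columns, n_rows = 20, 50000

distinct = [[0] * n_rows for _ in range(n_columns)]  # 20 separate lists
shared = [[0] * n_rows] * n_columns                  # 20 references to one list

print(distinct[0] is distinct[1])  # False: separate list objects
print(shared[0] is shared[1])      # True: the same list object repeated

for name, columns in (("distinct", distinct), ("shared", shared)):
    print(name,
          "rows outer: %.4f s" % time_access(columns, True),
          "columns outer: %.4f s" % time_access(columns, False))
```

With the shared layout, both loop orders touch the same single inner list, so any remaining gap between them should be down to loop overhead rather than memory layout.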