
Python list vs. array: reason for the unexpected performance difference

I'm studying algorithms and data structures at the moment.

I thought I would run a quick timeit.timeit test that iterates through 2**30 random integers held in a list() and compare it with the same iteration over an array.array.

I was expecting the array to finish first, since performance is one of the few benefits I have seen claimed for a Python array in other posts. (I was initially under the wrong impression that the list was implemented as a linked list: thank you for the correction, Duncan.)

Surely an array should be at least as quick as a list?

import array
import os
import timeit

# 2**30 random byte values (0..255) as Python ints
l = list(os.urandom(2**30))
# the same values packed as C unsigned ints
a = array.array('I', l)

def test_list():
    for i in l:
        pass

def test_array():
    for i in a:
        pass

>>> timeit.timeit(test_array, number=5)
50.08525877200009
>>> timeit.timeit(test_list, number=5)
37.00491460799958

Here's my platform information: Python 3.6.5, [GCC 7.3.0] on linux x86_64 (Intel i5 4660)

First you initialise l to a list of 2**30 Python int values.

Second you initialise a from the list to create an array of 2**30 C integers.
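To make the two containers concrete, here is a small sketch of the same setup. It assumes a tiny size n in place of 2**30 so it runs instantly; n is my stand-in, not part of the question.

```python
import array
import os

n = 8  # hypothetical tiny size standing in for the 2**30 used in the question

# Iterating a bytes object yields Python ints in range(256).
l = list(os.urandom(n))

# array.array copies those values into one packed buffer of C unsigned ints.
a = array.array('I', l)

print(type(l[0]))               # the list holds ordinary Python int objects
print(a.typecode, a.itemsize)   # 'I' and the platform's unsigned int size in bytes
print(list(a) == l)             # same values, different storage
```

The list stores references to int objects, while the array stores only the raw values, which is what the rest of the explanation hinges on.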

test_list iterates over the list of Python int values. No Python objects are created or destroyed in the process; each object's reference count is simply incremented and then decremented.

test_array iterates over the array of C integers, creating a new Python int for each element and then destroying it again. That's why the array is slower: it creates and destroys 2**30 Python objects.
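One way to see the difference is to compare object identity on repeated reads. This is a sketch that relies on CPython behaviour: the values are deliberately chosen above 256, because CPython keeps a cache of small integers (an implementation detail) that would otherwise hand back shared objects from the array too. The names lst, arr, first and second are mine.

```python
import array

# Values above 256 fall outside CPython's small-int cache,
# so every conversion from the array has to build a fresh int object.
values = [100_000, 200_000, 300_000]
lst = list(values)
arr = array.array('I', values)

# Two reads from the list return the very same stored object.
list_same = lst[0] is lst[0]

# Two reads from the array return equal values held in two distinct objects.
first, second = arr[0], arr[0]
array_same = first is second

print(list_same)        # True on CPython
print(array_same)       # False on CPython
print(first == second)  # True: equal values, separate objects
```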

Internally, a Python list is just an array of pointers to the objects it contains, so iterating over a list is as simple and as fast as iterating over an array. The array type here uses less memory overall (or it would if you hadn't held on to the list), because C integers are much smaller than Python objects. However, each access into the array has to convert the C value into a Python object, and while object creation is heavily optimised, it still takes more time than getting another reference to an existing object.
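The memory and speed trade-off can be checked at a scale that fits in modest RAM. The sketch below assumes N = 10**6 elements rather than 2**30, and uses values above 256 so that each list element is a separate int object. The names N, values, arr and iterate are illustrative, and the timings will vary by machine and Python version.

```python
import array
import sys
import timeit

N = 10**6  # assumed size, far smaller than the 2**30 in the question

values = list(range(1000, 1000 + N))
arr = array.array('I', values)

# sys.getsizeof on a list counts only the pointer table,
# so the int objects it refers to are added separately.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = sys.getsizeof(arr)

def iterate(seq):
    # Same loop body as test_list / test_array in the question.
    for _ in seq:
        pass

list_time = timeit.timeit(lambda: iterate(values), number=5)
array_time = timeit.timeit(lambda: iterate(arr), number=5)

print(f"list:  {list_bytes:>12,} bytes, {list_time:.3f} s")
print(f"array: {array_bytes:>12,} bytes, {array_time:.3f} s")
```

The array should report a much smaller footprint. Whether it also iterates more slowly, and by how much, depends on the interpreter, so treat the timings as something to measure rather than assume.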
