
Best and/or fastest way to create lists in Python

In Python, as far as I know, there are at least 3 to 4 ways to create and initialize lists of a given size:

Simple loop with append:

my_list = []
for i in range(50):
    my_list.append(0)

Simple loop with +=:

my_list = []
for i in range(50):
    my_list += [0]

List comprehension:

my_list = [0 for i in range(50)]

List and integer multiplication:

my_list = [0] * 50

In these examples I don't think there would be any performance difference, given that the lists have only 50 elements, but what if I need a list of a million elements? Would using xrange make any improvement? Which is the preferred/fastest way to create and initialize lists in Python?

Let's run some time tests* with timeit.timeit:

>>> from timeit import timeit
>>>
>>> # Test 1
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list.append(0)
... """
>>> timeit(test)
22.384258893239178
>>>
>>> # Test 2
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list += [0]
... """
>>> timeit(test)
34.494779364416445
>>>
>>> # Test 3
>>> test = "my_list = [0 for i in xrange(50)]"
>>> timeit(test)
9.490926919482774
>>>
>>> # Test 4
>>> test = "my_list = [0] * 50"
>>> timeit(test)
1.5340533503559755
>>>

As you can see above, the last method is the fastest by far.


However, it should only be used with immutable items (such as integers). This is because it creates a list whose elements are all references to the same item.

Below is a demonstration:

>>> lst = [[]] * 3
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are the same
>>> id(lst[0])
28734408
>>> id(lst[1])
28734408
>>> id(lst[2])
28734408
>>>

This behavior is very often undesirable and can lead to bugs in the code.

If you have mutable items (such as lists), then you should use the still very fast list comprehension:

>>> lst = [[] for _ in xrange(3)]
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are different
>>> id(lst[0])
28796688
>>> id(lst[1])
28796648
>>> id(lst[2])
28736168
>>>
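To make the consequence of the shared references concrete, here is a minimal sketch contrasting the two constructions once an inner list is mutated. The names `grid` and `safe_grid` are illustrative and do not come from the answer above; the code is written for Python 3, so it uses `range` rather than `xrange`.

```python
# Multiplying a list that contains one inner list repeats the reference,
# so all three slots point at the same inner list object.
grid = [[]] * 3
grid[0].append(1)
print(grid)        # [[1], [1], [1]]

# A comprehension evaluates `[]` once per iteration, creating three lists.
safe_grid = [[] for _ in range(3)]
safe_grid[0].append(1)
print(safe_grid)   # [[1], [], []]

# Identity checks confirm the difference.
print(grid[0] is grid[1])            # True
print(safe_grid[0] is safe_grid[1])  # False
```

With immutable items such as `0` the sharing is harmless, because an element can only be replaced, never changed in place.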

*Note: In all of the tests, I replaced range with xrange. In Python 2, xrange produces its values lazily instead of building a full list first, so it should generally be faster than range there. In Python 3, range is already lazy and xrange no longer exists.
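For readers on Python 3, the same comparison could be sketched roughly as follows. This is not the code that produced the numbers above: the function names, `N`, and `REPEATS` are assumptions chosen for illustration, and because each method is wrapped in a function call, the absolute timings will not match the string-based tests.

```python
from timeit import timeit

N = 1000        # list length; raise it (e.g. to 1_000_000) to probe larger lists
REPEATS = 200   # how many times timeit calls each builder


def append_loop(n):
    result = []
    for _ in range(n):
        result.append(0)
    return result


def iadd_loop(n):
    result = []
    for _ in range(n):
        result += [0]
    return result


def comprehension(n):
    return [0 for _ in range(n)]


def multiplication(n):
    return [0] * n


builders = [append_loop, iadd_loop, comprehension, multiplication]

# timeit accepts a callable; `number` is how many calls are timed in total.
timings = {f.__name__: timeit(lambda f=f: f(N), number=REPEATS) for f in builders}

for name, seconds in timings.items():
    print(f"{name:15s} {seconds:.4f} s")
```

Raising `N` while lowering `REPEATS` keeps the total run time manageable when testing lists of a million elements.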

If you want to see the dependency on the length of the list n:

Pure Python

[Image: plot of run time versus list length n for the pure-Python methods]

I tested list lengths up to n=10000 and the behavior remains the same. So the integer multiplication method is the fastest by a clear margin.

NumPy

For lists with more than ~300 elements you should consider numpy.

[Image: plot of run time versus list length n, integer multiplication compared with a numpy array]

Benchmark code:

import time

import numpy as np
import pandas as pd


def timeit(f):
    """Return a wrapper that reports the total time of 100 calls to f."""
    def timed(*args, **kwargs):
        start = time.perf_counter()
        for _ in range(100):
            f(*args, **kwargs)
        end = time.perf_counter()
        return end - start
    return timed


@timeit
def append_loop(n):
    """Simple loop with append"""
    my_list = []
    for i in range(n):
        my_list.append(0)


@timeit
def add_loop(n):
    """Simple loop with +="""
    my_list = []
    for i in range(n):
        my_list += [0]


@timeit
def list_comprehension(n):
    """List comprehension"""
    my_list = [0 for i in range(n)]


@timeit
def integer_multiplication(n):
    """List and integer multiplication"""
    my_list = [0] * n


@timeit
def numpy_array(n):
    """Numpy array of zeros"""
    my_list = np.zeros(n)


# One row per list length n; each cell is the total time of 100 calls.
df = pd.DataFrame([(integer_multiplication(n), numpy_array(n)) for n in range(1000)],
                  columns=['Integer multiplication', 'Numpy array'])
df.plot()

Gist here.

There is one more method which, while it sounds weird, is handy in the right circumstances. If you need to produce the same list many times (initializing a matrix for roguelike pathfinding and related stuff, in my case), you can store a copy of the list in a tuple, then turn it back into a list when you need it. It is noticeably quicker than generating the list via comprehensions, and the tuple can hold nested data structures. Note, however, that both conversions are shallow: the inner lists are shared between the tuple and every list made from it, so in-place changes to a row persist, and only changes that replace whole elements of the outer list are reset.

#  In class definition
def __init__(self):
    self.l = [[1000 for x in range(1000)] for y in range(1000)]
    self.t = tuple(self.l)

def some_method(self):
    self.l = list(self.t)
    self._do_fancy_computation()
    #  self.l is changed by this method

#  Later in code:
for a in range(10):
    obj.some_method()

Voilà, on every iteration you have a fresh copy of the same list in no time!

Disclaimer:

I do not have the slightest idea why this is so quick or whether it works anywhere outside CPython 3.4.
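One thing worth checking before relying on this trick with nested lists: `tuple(...)` and `list(...)` both copy only the outer container. The sketch below uses illustrative names (`rows`, `template`, `working`, `independent`) that are not part of the answer above, and it brings in `copy.deepcopy` as one possible way to get fully independent rows. That is an assumption about what you may need, not part of the original method, and it will likely cost much of the speed advantage.

```python
import copy

rows = [[0 for _ in range(3)] for _ in range(2)]
template = tuple(rows)      # shallow: the tuple holds the same inner lists

working = list(template)    # new outer list, same inner lists
working[0][0] = 99          # mutate a row in place

print(template[0][0])       # 99: the change is visible through the template
print(working is not rows)  # True: the outer list itself is a new object

# Replacing a whole row only affects the new outer list.
working[1] = [7, 7, 7]
print(template[1])          # [0, 0, 0]

# deepcopy also copies the inner lists, so rows become independent.
independent = copy.deepcopy(list(template))
independent[0][1] = -1
print(template[0][1])       # 0
```

If the computation only replaces whole rows, or the stored elements are immutable, the shallow copy is enough.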

If you want to create an incrementing list, i.e. one where each element is 1 greater than the previous one, use the range function. In range, the start argument is included and the end argument is excluded, as shown below:

list(range(10,20))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

If you want to create a list in which each element is 2 greater than the previous one, use this:

list(range(10,20,2))
[10, 12, 14, 16, 18]

Here the third argument is the step size. You can give any start element, end element, and step size to create many lists quickly and easily.

Thank you!

Happy learning :)
