
Fastest way to convert an iterator to a list

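Having an iterator object, is there something faster, better, or more correct than a list comprehension to get a list of the objects returned by the iterator?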

user_list = [user for user in user_iterator]

list(your_iterator)

Since Python 3.5, you can use the * iterable unpacking operator:

user_list = [*your_iterator]

but the Pythonic way to do it is:

user_list = list(your_iterator)
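All three spellings produce the same list. The practical gotcha is that an iterator can be consumed only once, so converting the same iterator a second time yields an empty list. In this sketch, make_iterator is a hypothetical helper used only to get a fresh iterator for each conversion.

```python
def make_iterator():
    # Hypothetical helper: returns a fresh iterator, since an iterator
    # can only be traversed once.
    return iter(range(5))


by_constructor = list(make_iterator())
by_unpacking = [*make_iterator()]  # requires Python 3.5+
by_comprehension = [e for e in make_iterator()]

assert by_constructor == by_unpacking == by_comprehension == [0, 1, 2, 3, 4]

# Converting the *same* iterator twice: the second pass finds it exhausted.
it = make_iterator()
first = list(it)
second = list(it)
print(first, second)  # [0, 1, 2, 3, 4] []
```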

@Robino suggested adding some tests, which makes sense, so here is a simple benchmark of three possible ways (probably the most commonly used ones) to convert an iterator to a list:

  1. by type constructor

list(my_iterator)

  2. by unpacking

[*my_iterator]

  3. using a list comprehension

[e for e in my_iterator]

I used the simple_benchmark library:

from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def convert_by_type_constructor(size):
    list(iter(range(size)))

@b.add_function()
def convert_by_list_comprehension(size):
    [e for e in iter(range(size))]

@b.add_function()
def convert_by_unpacking(size):
    [*iter(range(size))]


@b.add_arguments('Convert an iterator to a list')
def argument_provider():
    # Sizes 2**2 .. 2**21; each item is (x-axis value, argument passed to the functions).
    for exp in range(2, 22):
        size = 2**exp
        yield size, size

r = b.run()
r.plot()

[Image: simple_benchmark plot comparing the three conversion methods]

As you can see, it is very hard to tell the difference between conversion by the constructor and conversion by unpacking, while conversion by list comprehension is the "slowest" approach.
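If you want a rough comparison without installing simple_benchmark, the standard-library timeit module is enough. This is a sketch, not the author's benchmark: SIZE = 10_000 and the number/repeat values are arbitrary assumptions, and it reports the best of several repetitions (the least noisy estimate, as the timeit documentation suggests) rather than a single average. No timings are claimed here; run it on your own machine.

```python
import timeit

SIZE = 10_000  # assumption: an arbitrary size chosen for illustration


def convert_by_type_constructor():
    return list(iter(range(SIZE)))


def convert_by_unpacking():
    return [*iter(range(SIZE))]


def convert_by_list_comprehension():
    return [e for e in iter(range(SIZE))]


FUNCS = [convert_by_type_constructor, convert_by_unpacking,
         convert_by_list_comprehension]


def best_time_ms(func, number=200, repeat=5):
    """Best-of-`repeat` average time per call, in milliseconds."""
    # timeit.repeat returns one total (in seconds) per repetition of
    # `number` calls; the minimum is the least noisy estimate.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) * 1000 / number


if __name__ == '__main__':
    for func in FUNCS:
        print(f'{func.__name__}: {best_time_ms(func):.4f} ms')
```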


I also tested across different Python versions (3.6, 3.7, 3.8, 3.9) using the following simple script:

import argparse
import timeit

parser = argparse.ArgumentParser(
    description='Test convert iterator to list')
parser.add_argument(
    '--size', type=int, required=True,
    help='The number of elements from iterator')

args = parser.parse_args()

size = args.size
repeat_number = 10000

# do not wait too much if the size is too big
if size > 10000:
    repeat_number = 100


def test_convert_by_type_constructor():
    list(iter(range(size)))


def test_convert_by_list_comprehension():
    [e for e in iter(range(size))]


def test_convert_by_unpacking():
    [*iter(range(size))]


def get_avg_time_in_ms(func):
    avg_time = timeit.timeit(func, number=repeat_number) * 1000 / repeat_number
    return round(avg_time, 6)


funcs = [test_convert_by_type_constructor,
         test_convert_by_unpacking, test_convert_by_list_comprehension]

print(*map(get_avg_time_in_ms, funcs))

The script (saved as perf_test_convert_iterator.py) is executed in a subprocess from a Jupyter Notebook (or another script): the size is passed as a command-line argument, and the results are read from standard output.

from subprocess import PIPE, run

import pandas

simple_data = {'constructor': [], 'unpacking': [], 'comprehension': [],
               'size': [], 'python version': []}


size_test = 100, 1000, 10_000, 100_000, 1_000_000
for version in ['3.6', '3.7', '3.8', '3.9']:
    print('test for python', version)
    for size in size_test:
        command = [f'python{version}', 'perf_test_convert_iterator.py', f'--size={size}']
        result = run(command, stdout=PIPE, stderr=PIPE, universal_newlines=True)
        constructor, unpacking, comprehension = result.stdout.split()

        simple_data['constructor'].append(float(constructor))
        simple_data['unpacking'].append(float(unpacking))
        simple_data['comprehension'].append(float(comprehension))
        simple_data['python version'].append(version)
        simple_data['size'].append(size)

df_ = pandas.DataFrame(simple_data)
df_

[Image: the resulting df_ table of timings]

You can get my full notebook from here.

In most cases in my tests, unpacking turned out to be faster, but the difference is so small that the results may change from one run to another. Again, the comprehension approach is the slowest; the other two methods are up to ~60% faster.
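One plausible reason the comprehension is slowest (not something the benchmarks above measure directly): list(it) and [*it] let C code drive the iteration, whereas the comprehension runs a bytecode loop that executes one LIST_APPEND instruction per element. The dis module can confirm the opcode difference. opnames is a hypothetical helper; it walks nested code objects because before Python 3.12 a comprehension was compiled into a separate code object.

```python
import dis
import types


def convert_by_type_constructor(it):
    return list(it)


def convert_by_list_comprehension(it):
    return [e for e in it]


def opnames(code):
    """Hypothetical helper: opcode names in `code` and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        # Before Python 3.12, the comprehension body lives in a nested
        # code object stored among the function's constants.
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


comp_ops = opnames(convert_by_list_comprehension.__code__)
ctor_ops = opnames(convert_by_type_constructor.__code__)

# The comprehension appends element by element in the interpreter loop...
print('LIST_APPEND' in comp_ops)  # True
# ...while list(it) delegates the whole iteration to C code in one call.
print('LIST_APPEND' in ctor_ops)  # False
```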
