
Get list of subset from infinite generator in Python

Summary: I'm trying to learn about itertools.islice.


I'm trying to find the best way to build a list from a subset of the values returned by an infinite generator function. For example, I might want a list of the 1000th through 2000th items from a generator.

This is my example generator:

def infinite_counter():
    i = 0
    while True:
        i += 2
        yield i

These are the indices into the generator's output at which I want the list to start and stop:

start = 1000
end = 2000

Method 1: list comprehension (fails)

[val for ind,val in enumerate(infinite_counter()) if start <= ind <= end ]

This quite obviously never returns, because the condition only filters values and never stops the iteration. It expands into this:

result = []
for ind, val in enumerate(infinite_counter()):
    if start <= ind <= end:
        result.append(val)
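To make the behaviour concrete, here is a minimal sketch that swaps the infinite generator for a finite stand-in and records how many items are pulled. The name counting_source and the small start/end values are assumptions made for illustration only; they are not part of the original question.

```python
def counting_source(limit, consumed):
    """Finite stand-in for infinite_counter() that records every item it yields."""
    i = 0
    for _ in range(limit):
        i += 2
        consumed.append(i)
        yield i

start, end = 10, 20
consumed = []

# Same shape as Method 1: the `if` clause filters, but it cannot stop the loop.
picked = [val for ind, val in enumerate(counting_source(1000, consumed))
          if start <= ind <= end]

print(len(picked))    # number of items that passed the filter
print(len(consumed))  # number of items actually pulled from the source
```

Only 11 values pass the filter, yet the whole source is consumed. With a truly infinite source, that consumption never ends.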

Method 2: list() (works)

list(next(iter([])) if ind > end else val for ind,val in enumerate(infinite_counter()) if ind >= start)

This works, but it really feels like a hack. It is also quite hard to follow; I had mistakenly thought it would be faster than Method 3.
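One caveat worth checking on a modern interpreter: since PEP 479 (on by default from Python 3.7), a StopIteration raised inside a generator, including a generator expression, is converted into a RuntimeError instead of silently ending the iteration. The sketch below, which assumes Python 3.7 or later and uses small start/end values chosen only for illustration, shows what Method 2 does under that rule.

```python
def infinite_counter():
    i = 0
    while True:
        i += 2
        yield i

start, end = 3, 6  # small illustrative bounds, not the values from the question

result = None
error = None
try:
    result = list(
        next(iter([])) if ind > end else val
        for ind, val in enumerate(infinite_counter())
        if ind >= start
    )
except RuntimeError as exc:
    # On Python 3.7+, the StopIteration from next(iter([])) surfaces here.
    error = exc

print(result, repr(error))
```

On older interpreters the trick terminated the list quietly, which is presumably why it worked when the question was written.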

Method 3: easy method (works)

my_list = []
for ind,val in enumerate(infinite_counter()):
    if ind >= start:
        my_list.append(val)
        if ind >= end:
            break

This is the first way I thought of doing this, before I chided myself for not being Pythonic. I was surprised that its timing was almost exactly the same as Method 2's.

Method 4: itertools.takewhile (works)

import itertools
[val for ind, val in itertools.takewhile(lambda tup: tup[0] <= end, enumerate(infinite_counter())) if ind >= start]

At first I thought takewhile didn't work, because I had written the lambda as "lambda ind,val:". However, takewhile passes the lambda a single tuple of the two values, so I just need to take the first element of the tuple as the index for the early exit. This is slower than Methods 2 and 3, and almost as slow as Method 5.
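The point about the predicate receiving one tuple can be seen in a small sketch. The letters string and the variable names are made up for illustration.

```python
from itertools import takewhile

letters = "abcdef"

# takewhile calls the predicate with ONE argument: each (index, value) tuple.
first_three = list(takewhile(lambda pair: pair[0] < 3, enumerate(letters)))
print(first_three)  # [(0, 'a'), (1, 'b'), (2, 'c')]

# A two-parameter lambda is called with that single tuple and fails.
try:
    list(takewhile(lambda ind, val: ind < 3, enumerate(letters)))
except TypeError:
    two_arg_failed = True
else:
    two_arg_failed = False

print(two_arg_failed)  # True
```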

Method 5: wrapping generator (works)

def top_ending_generator(end):
    for ind,val in enumerate(infinite_counter()):
        if ind > end:
            break
        yield ind,val

[val for ind, val in top_ending_generator(end) if ind >= start]

As expected, this is considerably slower than Methods 2 and 3.

Overall, I was surprised to see that Method 3's timing was very close to Method 2's. Method 3 is more code, but much easier for someone to follow. This is how I currently have it implemented.

Are there any other methods I should consider, or better solutions for this?

Edit:

Method 6: itertools.islice (the winner)

list(itertools.islice(infinite_counter(), start, end))

This is slightly faster than my initial itertools.islice solution, which used a list comprehension:

[val for val in itertools.islice(infinite_counter(), start, end)]

It's amazing what finding the right method does.
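One detail to keep in mind when comparing results across methods: islice treats its stop argument as exclusive, like ordinary slicing, whereas Methods 2 and 3 include the item at index end. A minimal sketch of the difference follows; the variable names half_open and inclusive are illustrative.

```python
from itertools import islice

def infinite_counter():
    i = 0
    while True:
        i += 2
        yield i

start, end = 1000, 2000

# Indices start .. end - 1 (stop is exclusive).
half_open = list(islice(infinite_counter(), start, end))

# Indices start .. end, matching the inclusive bounds of Methods 2 and 3.
inclusive = list(islice(infinite_counter(), start, end + 1))

print(len(half_open), half_open[0], half_open[-1])  # 1000 2002 4000
print(len(inclusive), inclusive[0], inclusive[-1])  # 1001 2002 4002
```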

For those keeping score, my timings were as follows:

Method 6 = unit time
Method 2 ~= 2.5 * unit time
Method 3 ~= 3 * unit time
Method 4 ~= 4.2 * unit time
Method 5 ~= 4 * unit time
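The post does not show how these timings were measured. As a rough sketch, assuming the standard timeit module and wrapping two of the methods in hypothetical functions method_3 and method_6, relative timings could be collected like this. Absolute ratios will vary by machine and Python version.

```python
import timeit
from itertools import islice

def infinite_counter():
    i = 0
    while True:
        i += 2
        yield i

start, end = 1000, 2000

def method_3():
    """Explicit loop with an early break (inclusive of index `end`)."""
    my_list = []
    for ind, val in enumerate(infinite_counter()):
        if ind >= start:
            my_list.append(val)
            if ind >= end:
                break
    return my_list

def method_6():
    """islice, with end + 1 so both methods return the same items."""
    return list(islice(infinite_counter(), start, end + 1))

# Take the best of several repeats to reduce noise.
baseline = min(timeit.repeat(method_6, number=100, repeat=3))
for name, func in [("Method 6", method_6), ("Method 3", method_3)]:
    best = min(timeit.repeat(func, number=100, repeat=3))
    print(f"{name} ~= {best / baseline:.2f} * unit time")
```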

from itertools import islice

list(islice(infinite_counter(), 1000, 2000))

Note that this

list(next(iter([])) if ind > end else val for ind,val in enumerate(infinite_counter()) if ind >= start)

transforms to this

def _secret():
    for ind, val in enumerate(infinite_counter()):
        if ind >= start:
            if ind > end:
                yield next(iter([]))

            else:
                yield val

list(_secret())

which is easily improved to

def _secret():
    for ind, val in enumerate(infinite_counter()):
        if ind < start:
            continue

        if ind > end:
            break

        yield val

list(_secret())

which looks fine to me.
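As a quick sanity check, the loop version can be parameterised and compared with islice. The helper name bounded_slice is hypothetical, introduced here only for illustration. Note the end + 1, since the loop includes the item at index end while islice excludes its stop index.

```python
from itertools import islice

def infinite_counter():
    i = 0
    while True:
        i += 2
        yield i

def bounded_slice(iterable, start, end):
    """Hypothetical helper: yield items whose index lies in [start, end]."""
    for ind, val in enumerate(iterable):
        if ind < start:
            continue
        if ind > end:
            break
        yield val

start, end = 1000, 2000
via_loop = list(bounded_slice(infinite_counter(), start, end))
via_islice = list(islice(infinite_counter(), start, end + 1))

print(via_loop == via_islice)  # True
```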
