How to get random.sample() from deque in Python 3?

I have a collections.deque() of tuples from which I want to draw random samples. In Python 2.7, I can use batch = random.sample(my_deque, batch_size).

But in Python 3.4 this raises TypeError: Population must be a sequence or set. For dicts, use list(d).

What's the best workaround, or the recommended way to sample efficiently from a deque in Python 3?

The obvious way: convert to a list.

batch = random.sample(list(my_deque), batch_size)

But you can avoid creating an entire list.

idx_batch = set(random.sample(range(len(my_deque)), batch_size))
batch = [val for i, val in enumerate(my_deque) if i in idx_batch]
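
One thing the index-filter approach does not make obvious is that the batch comes back in deque order rather than in random order. Below is a minimal sketch of the same idea wrapped in a helper; the name sample_by_filter and the final shuffle are my own additions for illustration, not part of the original answer.

```python
import random
from collections import deque


def sample_by_filter(population, k):
    """Draw k distinct items in one pass, without copying the population to a list."""
    # Pick k distinct positions first; a set gives O(1) membership tests.
    chosen = set(random.sample(range(len(population)), k))
    # One pass over the deque; items are collected in deque order.
    batch = [item for i, item in enumerate(population) if i in chosen]
    # Optional (an assumption about what the caller wants): randomize the order.
    random.shuffle(batch)
    return batch


my_deque = deque((i, i * i) for i in range(1000))
batch = sample_by_filter(my_deque, 32)
print(len(batch))
```

If the order of the batch does not matter, as is often the case for training batches, the shuffle can be dropped.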

PS (Edited)

Actually, random.sample should work fine with deques in Python >= 3.5, because the class has been updated to match the Sequence interface.

In [3]: deq = collections.deque(range(100))

In [4]: random.sample(deq, 10)
Out[4]: [12, 64, 84, 77, 99, 69, 1, 93, 82, 35]

Note: as Geoffrey Irving correctly stated in the comment below, you are better off converting the deque into a list first. A deque is implemented as a linked list (in CPython, a doubly linked list of fixed-size blocks), so index access is O(1) only near the two ends and O(n) in the middle. Sampling m random values by index can therefore take O(m*n) time, whereas converting to a list costs O(n) once and each lookup afterwards is O(1).
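
To see the indexing cost for yourself, here is a rough sketch that times reads at the end and in the middle of a deque against a list. The helper name time_index_access and the repeat count are arbitrary choices of mine, and the absolute numbers will differ from machine to machine; only the relative gap is of interest.

```python
from collections import deque
from timeit import timeit


def time_index_access(container, index, repeats=2000):
    """Return the total seconds spent on `repeats` reads of container[index]."""
    return timeit(lambda: container[index], number=repeats)


n = 1000000
deq = deque(range(n))
as_list = list(deq)

timings = {
    "deque end": time_index_access(deq, n - 1),
    "deque middle": time_index_access(deq, n // 2),
    "list middle": time_index_access(as_list, n // 2),
}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```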

sample() on a deque works fine in Python ≥3.5, and it's pretty fast.

In Python 3.4, you could use this instead, which runs about as fast:

sample_indices = sample(range(len(deq)), 50)
[deq[index] for index in sample_indices]

On my MacBook using Python 3.6.8, this solution is over 44 times faster than Eli Korvigo's solution. :)

I used a deque with 1 million items, and I sampled 50 items:

from random import sample
from collections import deque

deq = deque(maxlen=1000000)
for i in range(1000000):
    deq.append(i)

sample_indices = set(sample(range(len(deq)), 50))

%timeit [deq[i] for i in sample_indices]
1.68 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit sample(deq, 50)
1.94 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit sample(range(len(deq)), 50)
44.9 µs ± 549 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit [val for index, val in enumerate(deq) if index in sample_indices]
75.1 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

That said, as others have pointed out, a deque is not well suited for random access. If you want to implement a replay memory, you could instead use a rotating list like this:

class ReplayMemory:
    def __init__(self, max_size):
        self.buffer = [None] * max_size
        self.max_size = max_size
        self.index = 0
        self.size = 0

    def append(self, obj):
        self.buffer[self.index] = obj
        self.size = min(self.size + 1, self.max_size)
        self.index = (self.index + 1) % self.max_size

    def sample(self, batch_size):
        indices = sample(range(self.size), batch_size)
        return [self.buffer[index] for index in indices]

With a million items, sampling 50 items is blazingly fast:

%timeit mem.sample(50)
#58 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
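
For completeness, here is a small usage sketch showing how the rotating list overwrites its oldest entries once it is full. The class is repeated so that the snippet runs on its own; the tiny max_size of 5 and the integer items are just illustrative values.

```python
from random import sample


class ReplayMemory:
    def __init__(self, max_size):
        self.buffer = [None] * max_size
        self.max_size = max_size
        self.index = 0  # slot that the next append will write to
        self.size = 0   # number of slots filled so far

    def append(self, obj):
        self.buffer[self.index] = obj
        self.size = min(self.size + 1, self.max_size)
        self.index = (self.index + 1) % self.max_size  # wrap around when full

    def sample(self, batch_size):
        # Only sample from the filled part of the buffer.
        indices = sample(range(self.size), batch_size)
        return [self.buffer[index] for index in indices]


mem = ReplayMemory(max_size=5)
for step in range(8):
    mem.append(step)

# After 8 appends into 5 slots, the three oldest items (0, 1, 2) are gone.
print(sorted(mem.buffer))  # [3, 4, 5, 6, 7]
batch = mem.sample(3)
print(batch)
```

Note that the buffer is not stored in insertion order after it wraps around, which does not matter for uniform sampling.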
