Which is the more efficient way to choose a random pair of objects from a list of lists or tuples?

I have a list of 2D coordinates with this structure:

coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]

Here coo[0] is the first coordinate, stored as a tuple.

I would like to choose two different random coordinates. I can of course use this method:

import numpy as np

# Draw two independent random indices into coo (the upper bound is exclusive).
rndcoo1 = coo[np.random.randint(0, len(coo))]
rndcoo2 = coo[np.random.randint(0, len(coo))]
if rndcoo1 != rndcoo2:
    pass  # do something

But because I have to repeat this operation 1'000'000 times, I was wondering if there is a faster method to do that. np.random.choice() can't be used on a 2D array. Is there any alternative that I can use?

import random
result = random.sample(coo, 2)

will give you the expected output. And it is (probably) as fast as you can get with pure Python.
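As a small sketch of how this could be applied to the repeated case from the question (the names `n_draws` and `pairs` are made up here for illustration, and the index-based variant at the end is an aside, not part of the answer above):

```python
import random

import numpy as np

coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]

n_draws = 1000  # hypothetical count; the question uses 1'000'000

# random.sample picks without replacement, so the two picks always come
# from different positions in coo and no rejection test is needed.
pairs = [random.sample(coo, 2) for _ in range(n_draws)]

# Aside: np.random.choice does work on 1D input, so one option is to
# sample two distinct indices and look the coordinates up afterwards.
i, j = np.random.choice(len(coo), size=2, replace=False)
picked = (coo[i], coo[j])
```

Note that both variants guarantee distinct positions, so the two coordinates differ by value only when coo contains no duplicate entries, as is the case here.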

Listed in this post is a vectorized approach that produces the random choices for many iterations in one go, without a Python loop over those iterations. The idea uses np.argpartition and is inspired by this post.

Here's the implementation:

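import numpy as np

def get_items(coo, num_items = 2, num_iter = 10):
    # Per row, the positions of the num_items smallest random values are a random choice without replacement.
    idx = np.random.rand(num_iter,len(coo)).argpartition(num_items,axis=1)[:,:num_items]
    return np.asarray(coo)[idx]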

Please note that this returns a 3D array: the first dimension is the number of iterations, the second is the number of choices made at each iteration, and the last is the length of each tuple.
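To make the shape concrete, here is a minimal check. The function is restated so the block runs on its own, with the slice written in terms of num_items; the variable names `out`, `first` and `second` are only illustrative:

```python
import numpy as np

coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]

def get_items(coo, num_items=2, num_iter=10):
    # One row of random numbers per iteration; the positions of the
    # num_items smallest values in a row are distinct indices into coo.
    idx = np.random.rand(num_iter, len(coo)).argpartition(num_items, axis=1)[:, :num_items]
    return np.asarray(coo)[idx]

out = get_items(coo, num_items=2, num_iter=5)
print(out.shape)  # (5, 2, 2): iterations x choices x tuple length

# Unpack the pair drawn in the first iteration.
first, second = out[0]
```

Because argpartition is given num_items as the partition position, this sketch assumes num_items is smaller than len(coo).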

A sample run should give a clearer picture:

In [55]: coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]

In [56]: get_items(coo, 2, 5)
Out[56]: 
array([[[2, 0],
        [1, 1]],

       [[0, 0],
        [1, 1]],

       [[0, 2],
        [2, 0]],

       [[1, 1],
        [1, 0]],

       [[0, 2],
        [1, 1]]])

Runtime test comparing a loopy implementation with random.sample, as listed in @freakish's post, against the vectorized version:

In [52]: coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]

In [53]: %timeit [random.sample(coo, 2) for i in range(10000)]
10 loops, best of 3: 34.4 ms per loop

In [54]: %timeit get_items(coo, 2, 10000)
100 loops, best of 3: 2.81 ms per loop
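Outside IPython, a comparable measurement could be sketched with the standard timeit module. The repeat count `repeats` is an arbitrary choice for illustration, and the numbers will differ from the ones above depending on machine and library versions:

```python
import random
import timeit

import numpy as np

coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]

def get_items(coo, num_items=2, num_iter=10):
    # Vectorized choice without replacement, restated from the answer.
    idx = np.random.rand(num_iter, len(coo)).argpartition(num_items, axis=1)[:, :num_items]
    return np.asarray(coo)[idx]

n_iter = 10000
repeats = 10  # hypothetical number of timed runs

# Total wall time over `repeats` runs of each approach.
t_loop = timeit.timeit(lambda: [random.sample(coo, 2) for _ in range(n_iter)], number=repeats)
t_vec = timeit.timeit(lambda: get_items(coo, 2, n_iter), number=repeats)

print(f"random.sample loop: {t_loop / repeats * 1e3:.2f} ms per run")
print(f"get_items:          {t_vec / repeats * 1e3:.2f} ms per run")
```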

Is coo just an example, or are your coordinates actually equally spaced? If so, you can just sample M 2D coordinates like this:

import numpy

N = 100
M = 1000000
coo = numpy.random.randint(0, N, size=(M, 2))

Of course you can also shift and scale the distribution using addition and multiplication to account for different step sizes and offsets.
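For instance, a grid with a non-unit step and a non-zero origin might be handled as in this sketch. The values of `step` and `offset` are invented for illustration and are not taken from the question:

```python
import numpy as np

N = 100
M = 1000
step = 0.5     # hypothetical spacing between grid points
offset = 10.0  # hypothetical position of the first grid point

# Sample integer grid indices first, then map them to real coordinates.
grid_idx = np.random.randint(0, N, size=(M, 2))
coo = offset + step * grid_idx
```

The resulting coordinates lie between offset and offset + step * (N - 1) on both axes.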

If you run into memory limitations with large M, you can of course sample smaller sizes, or just one array of 2 values with size=2.
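One way to sample in smaller pieces is a generator that yields one batch at a time. This is only a sketch: the helper name `sample_in_chunks` and the chunk size are assumptions made for illustration.

```python
import numpy as np

def sample_in_chunks(N, M, chunk_size):
    """Yield arrays of random 2D coordinates until M rows have been produced."""
    remaining = M
    while remaining > 0:
        size = min(chunk_size, remaining)
        # Only one chunk of shape (size, 2) is held in memory at a time.
        yield np.random.randint(0, N, size=(size, 2))
        remaining -= size

total = 0
for chunk in sample_in_chunks(N=100, M=1_000_000, chunk_size=100_000):
    total += len(chunk)  # process the chunk here instead of storing it
```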
