
Finding a random occurrence in a numpy array when more than one choice

I wrote this code to return a random pick from the elements of an array that match a predicate condition:

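import numpy as np

N = 2  # example value, as in the walk-through below; any non-negative int works
sampl = np.random.randint(low=0, high=N+1, size=(10,))
xs = np.where(sampl == 1)  # a tuple holding one index array
ys = np.array([tuple(x) for x in xs], dtype=int)[0]
x = np.random.choice(ys)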

Ex: If I run the code with N=2 and look only for 1s in the array:

    sampl = np.random.randint(low=0, high=N+1, size=(10,))

--> sampl = [2 1 0 0 0 1 0 0 2 1]

    xs = np.where(sampl == 1)

--> [2 1 0 0 0 1 0 0 2 1]  # Positions 1, 5, 9 are of interest. 
       ^       ^       ^ 

    ys = np.array([tuple(x) for x in xs], dtype=int)[0]

--> ys = [1 5 9] # Put them in an array. 

    x = np.random.choice(ys)

--> x = 9 # Pick a random one and return it

It works, but it's not concise, and I ran into a few issues trying to make it more elegant.

  • numpy.where() returns a tuple when passed nothing but a condition. I tried passing x=sampl, but the runtime complains that the function doesn't take params (it does when I inspect the code).
  • Again, making a numpy array from a tuple forces me to return the first element. That's error-prone when testing for edge cases (such as no value found by the predicate).

Do you have any suggestions to improve this code? I want to stick to numpy/pandas as the arrays will become very big.

Probably the most elegant way I can think of is to randomly shuffle your array and then take the first occurrence. That should be pretty concise.

So something like:

np.random.shuffle(sampl)  # shuffles in place, so x indexes the shuffled array
x = np.ravel(np.where(sampl==1))[0]

or, as you suggested, without shuffling, which would look something like

x = np.random.choice(np.ravel(np.where(sampl==1)))

On second thought, I'd guess the choice method is considerably faster than shuffling.
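A minimal sketch of the choice-based variant, assuming a one-dimensional array. It swaps np.ravel(np.where(...)) for np.flatnonzero, which returns the matching indices directly as a flat array, so there is no tuple to unpack. The helper name random_match_index is hypothetical, introduced only for illustration.

```python
import numpy as np

def random_match_index(arr, value):
    """Return the index of a randomly chosen element equal to `value`.

    Hypothetical helper; assumes `arr` is one-dimensional.
    Raises ValueError when nothing matches, because np.random.choice
    rejects an empty array.
    """
    hits = np.flatnonzero(arr == value)  # flat array of matching indices
    return np.random.choice(hits)

sampl = np.array([2, 1, 0, 0, 0, 1, 0, 0, 2, 1])
x = random_match_index(sampl, 1)  # one of the positions 1, 5, 9
```

Because sampl is not shuffled, x is a position in the original array.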

The next issue is the edge cases. How to handle them depends on what you expect the default behavior to be. If you expect the condition to turn up at least one hit in most cases, then you should handle the no-hit case with an exception:

try:
    x = np.random.choice(np.ravel(np.where(sampl==1)))
except ValueError:  # raised by np.random.choice when there are no matches
    # TODO
    pass

I would strongly suggest doing this unless you rarely find a hit. But don't take my word for it... time it yourself.

The other option would be to put in a condition that explicitly checks that

np.size( np.where(sampl==1) ) > 0

before continuing. However, I would guess that this approach is slower than the try...except approach.
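One way to check that guess is with timeit. The sketch below is only a starting point: the function names pick_with_try and pick_with_check, the array size, and the number of calls are illustrative assumptions, and the outcome will depend on your data and NumPy version.

```python
import timeit

import numpy as np

# Values are drawn from 0..2, so 1 occurs often and 99 never does.
sampl = np.random.randint(low=0, high=3, size=100_000)

def pick_with_try(arr, value):
    """Let np.random.choice raise when nothing matches."""
    try:
        return np.random.choice(np.ravel(np.where(arr == value)))
    except ValueError:
        return None

def pick_with_check(arr, value):
    """Test for matches explicitly before choosing."""
    hits = np.ravel(np.where(arr == value))
    if np.size(hits) > 0:
        return np.random.choice(hits)
    return None

# Time both variants for a value that occurs (1) and one that never does (99).
for target in (1, 99):
    for fn in (pick_with_try, pick_with_check):
        seconds = timeit.timeit(lambda: fn(sampl, target), number=50)
        print(f"{fn.__name__}(value={target}): {seconds:.4f} s for 50 calls")
```

Both functions compute the index array once, so any difference comes from the exception handling versus the size check, not from a repeated np.where call.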
