
Elegant 2D numpy indexing with multiple conditions

I have an array of numbers that I wish to turn into dummy variables (i.e. arrays with 1 if a condition is met, 0 otherwise). However, the conditions can be numerous, and I was wondering if there is a more elegant solution than the one I'm using.

import numpy as np

arr = np.random.randint(0, 50, size=(100, 100))

# What I'm doing: OR the element-wise comparisons together, then set the matches to 1
dummy = np.zeros(arr.shape)
dummy[np.where(np.logical_or.reduce((arr == 10, arr == 15, arr == 16, arr == 17)))] = 1

In the example, every value that is 10, 15, 16, or 17 becomes a one, and everything else a zero. For some dummy variables I have 10+ conditions and the expression can get lengthy, so I'm looking for something cleaner. I tried something like this but got a ValueError.

dummy= [1 if x in [10, 15, 16, 17] else 0 for x in arr]
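
The ValueError most likely arises because iterating over a 2D array yields whole rows rather than scalars, so x in [...] compares a row against each integer and then asks for the truth value of the resulting boolean array. A small sketch reproducing this, where small is a made-up 2x2 stand-in for arr:

```python
import numpy as np

small = np.array([[10, 3], [15, 7]])  # hypothetical 2x2 stand-in for arr

# Iterating over a 2D array yields 1D rows, not individual numbers
first_row = next(iter(small))
print(first_row.shape)  # (2,)

try:
    # `x in [...]` evaluates `10 == x`, which is a boolean array,
    # and Python then needs a single True/False from it
    dummy = [1 if x in [10, 15, 16, 17] else 0 for x in small]
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```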

You can use np.select:

arr = np.select([arr == 10, arr == 15, arr == 16, arr == 17], [1, 1, 1, 1], 0)
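
With 10+ conditions, the condition and choice lists in the np.select call above could be generated from a list of values instead of typed out by hand. A minimal sketch, assuming the values are collected in a list (the names values, conditions and choices are illustrative, not from the original answer) and writing the result to dummy rather than overwriting arr:

```python
import numpy as np

arr = np.random.randint(0, 50, size=(100, 100))
values = [10, 15, 16, 17]  # hypothetical name for the values to flag

# One boolean mask per value, and a matching choice of 1 for each mask
conditions = [arr == v for v in values]
choices = [1] * len(conditions)

# Elements matching no condition get the default, 0
dummy = np.select(conditions, choices, default=0)
```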

Readability is somewhat in the eye of the beholder, but there are two ways to use the 1 if ... else 0 construct. The first is a double for loop over the elements of each row of the matrix:

dummy = [[1 if x in [10, 15, 16, 17] else 0 for x in row] for row in arr]

The output of this is not an np.array (matrix), though, but rather a list of lists. Another way to do it, which "hides" the double for loop, is to use np.vectorize:

dummy_func = np.vectorize(lambda x: 1 if x in [10, 15, 16, 17] else 0)
dummy = dummy_func(arr)

or as a one-liner:

dummy = np.vectorize(lambda x: 1 if x in [10, 15, 16, 17] else 0)(arr)

Of these I would probably go for the vectorized approach, since it keeps the data type as an np.array, which is most often the more reasonable choice. And even though I showed it as a possible one-liner, I still think it is better to define the function first and then apply it, on two separate lines.

It should be noted, though, that vectorize is basically just a double for loop, so execution is rather slow compared to other numpy functions. I wouldn't be surprised if there are other ways as well that can use numpy's built-in parallel computational behavior, but then it is again a trade-off between readability (intent) and speed.
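
One such alternative, not part of the answers above, might be np.isin, which tests membership element-wise without a Python-level loop. A minimal sketch, assuming the same kind of array and target values as in the question (the names targets and expected are illustrative, and the seed is only there for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
arr = rng.integers(0, 50, size=(100, 100))

targets = [10, 15, 16, 17]  # hypothetical name for the values to flag

# np.isin returns a boolean array with arr's shape; cast to int for 0/1 dummies
dummy = np.isin(arr, targets).astype(int)

# Cross-check against the np.logical_or.reduce approach from the question
expected = np.zeros(arr.shape)
expected[np.where(np.logical_or.reduce([arr == t for t in targets]))] = 1
assert np.array_equal(dummy, expected)
```

Adding another condition then only means appending a value to targets, which may read more cleanly than a growing list of comparisons.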
