Elegant 2D numpy indexing with multiple conditions
I have an array of numbers that I wish to turn into dummy variables (i.e., arrays with 1 if a condition is met, 0 otherwise). However, the conditions can be numerous, and I was wondering if there was a more elegant solution than what I'm using.
import numpy as np
arr = np.random.randint(0, 50, size=(100, 100))
# What I'm doing
dummy = np.zeros(arr.shape)
dummy[np.where(np.logical_or.reduce((arr == 10, arr == 15, arr == 16, arr == 17)))] = 1
In the example, every value that is 10, 15, 16, or 17 becomes a 1, and everything else becomes a 0. For some dummy variables I have 10+ conditions and the expression can get lengthy, so I'm looking for something cleaner. I tried something like this but got a ValueError:
dummy = [1 if x in [10, 15, 16, 17] else 0 for x in arr]
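Why that attempt fails: iterating over a 2D array yields its rows, so x is a whole 1D row rather than a number. The test x in [10, 15, 16, 17] then compares each list element with that row, which produces a boolean array, and Python asks for the single truth value of that array, which NumPy refuses when it has more than one element. A minimal reproduction, assuming a small made-up input; the helper name membership_error is hypothetical and only exists to capture the error:

```python
import numpy as np

def membership_error(arr, targets):
    """Return the ValueError raised by `row in targets`, or None if nothing is raised."""
    for row in arr:  # iterating a 2D array yields 1D rows, not scalars
        try:
            # list membership compares each target with the whole row -> boolean array,
            # and converting that array to a single bool raises ValueError
            row in targets
        except ValueError as err:
            return err
    return None

arr = np.array([[10, 3], [16, 20]])  # small example input, made up for illustration
err = membership_error(arr, [10, 15, 16, 17])
print(type(err).__name__, err)
```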
You can use np.select:
dummy = np.select([arr == 10, arr == 15, arr == 16, arr == 17], [1, 1, 1, 1], 0)
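With 10+ values, the condition and choice lists for np.select do not have to be typed out by hand. A sketch, assuming the target values live in a list called values (a name introduced here for illustration), that builds both lists with comprehensions and checks the result against the OR-mask approach from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 50, size=(100, 100))

values = [10, 15, 16, 17]  # hypothetical list of target values; extend as needed

# One condition per value, a choice of 1 for every condition, and 0 as the default.
condlist = [arr == v for v in values]
choicelist = [1] * len(values)
dummy = np.select(condlist, choicelist, default=0)

# Same result as OR-ing the conditions into one mask and converting it to 1/0.
expected = np.logical_or.reduce(condlist).astype(int)
print(np.array_equal(dummy, expected))
```

Because every choice is 1, np.select here amounts to an OR of the conditions; it earns its keep when different conditions should map to different codes.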
Readability is somewhat in the eye of the beholder, but here are two different ways to use the 1 if x in ... else 0 construct. The first is a double for loop over the elements of the rows of the matrix:
dummy = [[1 if x in [10, 15, 16, 17] else 0 for x in row] for row in arr]
The output of this is not an np.array (matrix), though, but rather a list of lists.
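If an array is needed after all, the nested lists can be passed to np.array. A small sketch on a made-up 3×3 input; the targets are held in a set here (a change from the list in the answer) so each membership test is a hash lookup rather than a scan of the list:

```python
import numpy as np

arr = np.array([[10, 1, 15],
                [16, 2, 17],
                [3, 10, 49]])  # small example input, made up for illustration

targets = {10, 15, 16, 17}  # set instead of list for fast membership tests

# Double loop: outer over rows, inner over the elements of each row.
nested = [[1 if x in targets else 0 for x in row] for row in arr]
print(type(nested), type(nested[0]))  # plain Python lists

dummy = np.array(nested)  # convert the list of lists back into a 2D array
print(dummy)
```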
Another way to do it, which "hides" the double for loop, is to use np.vectorize:
dummy_func = np.vectorize(lambda x: 1 if x in [10, 15, 16, 17] else 0)
dummy = dummy_func(arr)
or as a one-liner:
dummy = np.vectorize(lambda x: 1 if x in [10, 15, 16, 17] else 0)(arr)
Of these, I would probably go for the vectorized approach, since it keeps the data as an np.array, which is usually the more reasonable choice. And even though I showed it as a possible one-liner, I still think it is better to first define the function and then apply it on a separate line.
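Following that advice, a sketch that defines the vectorized function once and applies it on its own line. Passing otypes=[int] (a standard np.vectorize argument, not used in the answer) fixes the output type up front instead of letting np.vectorize infer it from the first element; the names TARGETS and is_target are made up for illustration:

```python
import numpy as np

TARGETS = frozenset({10, 15, 16, 17})  # hypothetical constant holding the target values

def is_target(x):
    """Return 1 if x is one of the target values, else 0."""
    return 1 if x in TARGETS else 0

# Define the vectorized function once...
dummy_func = np.vectorize(is_target, otypes=[int])

# ...then apply it on a separate line.
arr = np.array([[10, 1, 15], [16, 2, 17]])  # small example input
dummy = dummy_func(arr)
print(type(dummy), dummy.dtype, dummy.shape)
```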
It should be noted, though, that vectorize is basically just a double for loop in disguise, so execution is rather slow compared to other numpy functions. I wouldn't be surprised if there are other ways as well that can use numpy's built-in vectorized operations, but then it is again a trade-off between readability (intent) and speed.
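One such built-in route, not used in either answer above, is np.isin: it tests every element against a list of values in a single vectorized call and returns a boolean mask of the same shape, and .astype(int) turns that mask into the 1/0 dummy. A sketch comparing it with the np.vectorize version on a 100×100 input like the question's (timings depend on the machine, so none are quoted here):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 50, size=(100, 100))
values = [10, 15, 16, 17]

# Vectorized membership test: boolean mask shaped like arr, converted to 1/0.
dummy_isin = np.isin(arr, values).astype(int)

# Per-element Python function, as in the np.vectorize answer.
dummy_vec = np.vectorize(lambda x: 1 if x in values else 0)(arr)

print(np.array_equal(dummy_isin, dummy_vec))

# Rough timing comparison; absolute numbers vary by machine and NumPy version.
t_isin = timeit.timeit(lambda: np.isin(arr, values).astype(int), number=20)
t_vec = timeit.timeit(lambda: np.vectorize(lambda x: 1 if x in values else 0)(arr), number=20)
print(f"np.isin: {t_isin:.4f}s, np.vectorize: {t_vec:.4f}s (20 runs each)")
```

The list of values stays a single argument, so 10+ conditions do not lengthen the expression.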