
Fast value swapping in numpy array

This is something that should be pretty easy, but it seems to take an enormous amount of time for me: I have a numpy array with only two values (for example 0 and 255), and I want to invert the matrix so that all values swap (0 becomes 255 and vice versa). The matrices have about 2000³ entries, so this is serious work! I first tried the numpy.invert method, which did not do exactly what I expected. So I tried to do it myself by "storing" the values and then overwriting them:

for i in range(len(array)):
    # park the 255s on a placeholder value, then swap
    array[i][array[i] == 255] = 1
    array[i][array[i] == 0] = 255
    array[i][array[i] == 1] = 0

This behaves as expected, but takes a long time (I guess due to the for loop?). Would it be faster if I implemented it as a multithreaded calculation, where every thread "inverts" a smaller sub-array? Or is there a more convenient way of doing this?

In addition to @JanneKarila's and @EOL's excellent suggestions, it's worthwhile to show a more efficient approach to using a mask to do the swap.

Using a boolean mask is more generally useful if you have a more complex comparison than simply swapping two values, but your example uses it in a sub-optimal way.

Currently, you're making multiple temporary copies of the boolean "mask" array (e.g. array[i] == blah) in your example above and performing multiple assignments. You can avoid this by making the "mask" boolean array just once and then inverting it.

If you have enough RAM for a temporary copy (of bool dtype), try something like this:

mask = (data == 255)
data[mask] = 0
data[~mask] = 255

Alternately (and equivalently) you could use numpy.where:

data = numpy.where(data == 255, 0, 255)
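One thing to watch with numpy.where: when both replacement values are plain Python integers, the result typically comes back in NumPy's default integer dtype rather than uint8, which matters for an array this large. A minimal sketch of one way to keep the dtype, assuming the data is stored as uint8 (the small sample array is made up for illustration):

```python
import numpy as np

# Hypothetical sample data; the real array would be far larger.
data = np.array([[0, 255], [255, 0]], dtype=np.uint8)

# Passing typed scalars keeps the result in uint8 instead of a wider integer type.
swapped = np.where(data == 255, np.uint8(0), np.uint8(255))

print(swapped.dtype)
print(swapped)
```

Note that numpy.where still allocates a new array, so this is not an in-place operation.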

If you were using a loop to avoid making a full temporary copy and need to conserve RAM, adjust your loop to something more like this:

for i in range(len(array)):
    # build the boolean mask once per slice and reuse it
    mask = (array[i] == 255)
    array[i][mask] = 0
    array[i][~mask] = 255

All that having been said, either subtraction or XOR is the way to go in this case, especially if you perform the operation in-place!

To swap 0 and 255, you can use XOR if the data type is one of the integer types.

array ^= 255

You can simply do:

arr_inverted = 255-arr

This converts all the elements element-wise (255 gives 0 and 0 gives 255). More generally, if you only have two values a and b, the "inversion" is simply done with (a+b)-arr. This also works if the two values are not integers (like floats or complex numbers).
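As a small illustration of the (a+b)-arr idea with non-integer values, here is a sketch using two made-up floats; the values 0.25 and 1.5 are chosen only because they are exactly representable in binary floating point, so the comparison is exact:

```python
import numpy as np

# Hypothetical pair of values to swap.
a, b = 0.25, 1.5

arr = np.array([a, b, b, a])

# Every a becomes b and every b becomes a.
inverted = (a + b) - arr

print(inverted)
```

With arbitrary floats the result can differ from a or b by rounding error, so this trick is safest when exact equality is not required afterwards.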

As Jaime pointed out, if memory is a concern, numpy.subtract(255, arr, out=arr) swaps the values of arr in-place.
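A short sketch of the in-place subtraction on a tiny uint8 array (the sample values are made up); it also checks that the returned object is the same array that was passed as out, so no second full-size array is kept around:

```python
import numpy as np

arr = np.array([0, 255, 255, 0], dtype=np.uint8)

# The result is written straight into arr's existing buffer.
result = np.subtract(255, arr, out=arr)

print(result is arr)
print(arr)
```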

If you more generally have integers in your array, Janne Karila's in-place XOR solution has the advantage of being more concise than the in-place difference solution suggested above. It can be generalized as arr ^= (a^b), for swapping two integers a and b.
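The generalized XOR swap can be sketched as follows, with two arbitrary integers picked purely for illustration. It relies on the identities a ^ (a ^ b) == b and b ^ (a ^ b) == a:

```python
import numpy as np

# Hypothetical pair of integer values to swap.
a, b = 3, 10

arr = np.array([a, b, b, a], dtype=np.uint8)

# XOR with (a ^ b) maps a to b and b to a, modifying arr in place.
arr ^= (a ^ b)

print(arr)
```

Any element that is neither a nor b would be changed to some other value, so this assumes the array really contains only those two values.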

The execution times are similar between both methods (with a 200×200×200 array of uint8 integers, through IPython):

>>> arr = np.random.choice((0, 255), (200, 200, 200)).astype('uint8')
>>> %timeit np.bitwise_xor(255, arr, out=arr)
100 loops, best of 3: 7.65 ms per loop
>>> %timeit np.subtract(255, arr, out=arr)
100 loops, best of 3: 7.69 ms per loop

If your array is of type uint8, arr_inverted = ~arr takes the same time for swapping 0 and 255 (the ~ operator inverts all the bits), but it is less general, so it's not worth it (tested with a 200×200×200 array).

"I first tried the numpy.invert method, which is not exactly what I expected."

numpy.invert is exactly what you need. Can you describe what happened? Did you use an unsigned byte for storage rather than a signed datatype or an integer?

Unsigned byte + numpy.invert should do exactly what you want.
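To make the dtype dependence concrete, here is a small sketch comparing numpy.invert on an unsigned byte array and on a signed 64-bit array. Whether a signed dtype was actually the cause of the unexpected result is an assumption; the question does not say which dtype was used.

```python
import numpy as np

values = [0, 255]

# On uint8, flipping all 8 bits swaps 0 and 255.
as_uint8 = np.invert(np.array(values, dtype=np.uint8))

# On a signed 64-bit integer, flipping all bits gives -1 and -256 instead.
as_int64 = np.invert(np.array(values, dtype=np.int64))

print(as_uint8)
print(as_int64)
```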

[You should also see faster performance in numpy with unsigned bytes rather than longer or signed datatypes.]
