
Numpy: Remove neighboring repeated subarrays in a 2D array?

Alright, I'm new to Numpy and can't figure this one out, so I'm turning it over to the experts. I have a 2D array (N rows of 2 values each) like the one below, and I want to "sequentially uniquify" it. Sequence matters: if several identical rows sit next to each other, the repeats are redundant and should be dropped. The order of the values inside a row also matters, so [111,222] is considered different from [222,111]. Framed another way, I only want to keep the rows whose left or right neighbor (or top/bottom, as it looks below) is different from itself (marked by * in the example below).

[[[492 105]
  [492 105]
  [492 105]*
  [492 106]*
  [492 106]
  [492 106]
  [491 106]*
  [491 106]
  [491 105]*
  [491 105]
  [491 105]
  [492 105]*
  [492 105]
  [492 105]]]

I tried the numpy.unique function, but it didn't care that I had a 2D array and instead returned each unique number inside the sub-arrays as a flat list, which I don't want. It also sorted the values and changed the order of my original rows, which I don't want either.

With a simple for-loop I could easily have written out this logic, but I need it to run at Numpy speed. The closest I've gotten is a trutharray marking the spots where the left neighbour is different, which seems to work:

MYARRAY = numpy.matrix(  my2x2array  )
indexes = numpy.arange(len(MYARRAY))
trutharray = numpy.any(MYARRAY[indexes]!=MYARRAY[indexes-1], 1)

However, I'm not sure how to proceed and what to do with the trutharray. I tried passing the trutharray to the numpy.extract function, but that only returns a flat list of values instead of rows, and it doesn't even return all the elements it should; for my example it returned "[105 492 492 106]".

Any help? How can I proceed with my example and end up with unique sequential subarrays? Or are there any faster solutions for this problem? Numpy is very confusing to me at this stage :p

I guess something like this:

>>> from numpy import array, any, vstack
>>> a=array( [[492, 105],
...   [492, 105],
...   [492, 105],
...   [492, 106],
...   [492, 106],
...   [492, 106],
...   [491, 106],
...   [491, 106],
...   [491, 105],
...   [491, 105],
...   [491, 105],
...   [492, 105],
...   [492, 105],
...   [492, 105]]
... )
>>> g_idx=any(a[1:]!=a[:-1], axis=1)
>>> vstack((a[:-1][g_idx][0], a[1:][g_idx]))
array([[492, 105],
       [492, 106],
       [491, 106],
       [491, 105],
       [492, 105]])

The a[:-1][g_idx][0] term is necessary; otherwise the first row would be missing from the result.
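
The same idea can also be written as a single boolean mask over all rows, which avoids the vstack step and the special handling of the first row. The following is only a sketch, not part of the original answer: the function name drop_consecutive_duplicates is made up for illustration, and it assumes a 2D array with one record per row. Because the first row is always marked as kept, it also avoids indexing into an empty selection when every row is identical.

```python
import numpy as np

def drop_consecutive_duplicates(rows):
    """Keep the first row of every run of identical neighboring rows.

    Hypothetical helper name; assumes `rows` is 2D (one record per row).
    """
    rows = np.asarray(rows)
    if len(rows) == 0:
        return rows
    keep = np.empty(len(rows), dtype=bool)
    keep[0] = True  # the first row always starts a run
    # A row starts a new run when it differs from the row before it.
    keep[1:] = np.any(rows[1:] != rows[:-1], axis=1)
    return rows[keep]

a = np.array([[492, 105], [492, 105], [492, 105],
              [492, 106], [492, 106], [492, 106],
              [491, 106], [491, 106],
              [491, 105], [491, 105], [491, 105],
              [492, 105], [492, 105], [492, 105]])

result = drop_consecutive_duplicates(a)
print(result)
```

Boolean indexing with a one-dimensional mask selects whole rows, which is why this returns rows rather than the flat list that numpy.extract produced in the question.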

This may be marginally faster than the solution posted above: it avoids an unnecessary array creation by preallocating the mask, and it replaces the reduction over the last axis (checking whether two rows differ) with a single comparison per row, using some void-view voodoo. But unless I'm missing something, the posted solution should be close to optimal, and rather trivial; I have a hard time believing that this is really the bottleneck in your code.

import numpy as np

a = np.array([[492, 105],
  [492, 105],
  [492, 105],
  [492, 106],
  [492, 106],
  [492, 106],
  [491, 106],
  [491, 106],
  [491, 105],
  [491, 105],
  [491, 105],
  [492, 105],
  [492, 105],
  [492, 105]])


def voidview(arr):
    """view the last axis as a void object."""
    return np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1])

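q = voidview(a)
# Preallocate the mask; the builtin bool works across NumPy versions (np.bool is unavailable in some releases).
I = np.empty(len(q), bool)
# Keep a row when the next row differs from it, i.e. the last row of each run.
I[:-1] = q[1:]!=q[:-1]
# The final row always ends a run.
I[-1] = True
print(a[I])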
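
Note that the mask above marks the last row of each run, whereas the first answer effectively keeps the first row of each run; the selected rows are equal either way, only the positions differ. As a rough illustration (not from the original answer), the sketch below returns both sets of positions. The name run_boundaries is hypothetical, and it uses a plain np.any comparison rather than the void view to keep the example simple.

```python
import numpy as np

def run_boundaries(rows):
    """Return (first, last) index arrays for each run of identical neighboring rows.

    Hypothetical helper name; assumes `rows` is 2D (one record per row).
    """
    rows = np.asarray(rows)
    if len(rows) == 0:
        empty = np.empty(0, dtype=int)
        return empty, empty
    # changed[i] is True when row i+1 differs from row i.
    changed = np.any(rows[1:] != rows[:-1], axis=1)
    first = np.flatnonzero(np.concatenate(([True], changed)))
    last = np.flatnonzero(np.concatenate((changed, [True])))
    return first, last

a = np.array([[492, 105], [492, 105], [492, 105],
              [492, 106], [492, 106], [492, 106],
              [491, 106], [491, 106],
              [491, 105], [491, 105], [491, 105],
              [492, 105], [492, 105], [492, 105]])

first, last = run_boundaries(a)
print(first)  # positions where each run starts
print(last)   # positions where each run ends
```

The positions matter if you need to map the kept rows back to another array of the same length, for example timestamps recorded alongside each row.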


 