
Sorting a 2D numpy array by multiple axes

I have a 2D numpy array of shape (N,2) which holds N points (x and y coordinates). For example:

array([[3, 2],
       [6, 2],
       [3, 6],
       [3, 4],
       [5, 3]])

I'd like to sort it so that my points are ordered by x-coordinate, and then by y in cases where the x-coordinate is the same. So the array above should look like this:

array([[3, 2],
       [3, 4],
       [3, 6],
       [5, 3],
       [6, 2]])

If this were a normal Python list, I would simply define a comparator to do what I want, but as far as I can tell, numpy's sort function doesn't accept user-defined comparators. Any ideas?


EDIT: Thanks for the ideas! I set up a quick test case with 1000000 random integer points, and benchmarked the ones that I could run (sorry, can't upgrade numpy at the moment).

Mine:   4.078 secs 
mtrw:   7.046 secs
unutbu: 0.453 secs

Using lexsort:

import numpy as np    
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

ind = np.lexsort((a[:,1],a[:,0]))    

a[ind]
# array([[3, 2],
#       [3, 4],
#       [3, 6],
#       [5, 3],
#       [6, 2]])
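
One detail that is easy to miss: np.lexsort treats the last key in the sequence as the primary key, which is why the y column is passed first above. The sketch below reuses the same array to show this. The descending variant (negating a key) is an addition for illustration and assumes signed numeric data:

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# Keys are listed from least to most significant: the last key (x) is primary.
by_x_then_y = a[np.lexsort((a[:, 1], a[:, 0]))]

# Negating a key reverses its order: x ascending, y descending.
# (Illustrative trick, not part of the original answer.)
by_x_then_y_desc = a[np.lexsort((-a[:, 1], a[:, 0]))]

print(by_x_then_y.tolist())
print(by_x_then_y_desc.tolist())
```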

a.ravel() returns a view if a is C_CONTIGUOUS. If that is true, @ars's method, slightly modified by using ravel instead of flatten, yields a nice way to sort a in-place:

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
dt = [('col1', a.dtype),('col2', a.dtype)]
assert a.flags['C_CONTIGUOUS']
b = a.ravel().view(dt)
b.sort(order=['col1','col2'])

Since b is a view of a, sorting b sorts a as well:

print(a)
# [[3 2]
#  [3 4]
#  [3 6]
#  [5 3]
#  [6 2]]

The title says "sorting 2D arrays". Although the questioner uses an (N,2)-shaped array, it's possible to generalize unutbu's solution to work with any (N,M) array, as that's what people might actually be looking for.

One could transpose the array and use slice notation with a negative step to pass all the columns to lexsort in reversed order:

>>> import numpy as np
>>> a = np.random.randint(1, 6, (10, 3))
>>> a
array([[4, 2, 3],
       [4, 2, 5],
       [3, 5, 5],
       [1, 5, 5],
       [3, 2, 1],
       [5, 2, 2],
       [3, 2, 3],
       [4, 3, 4],
       [3, 4, 1],
       [5, 3, 4]])

>>> a[np.lexsort(np.transpose(a)[::-1])]
array([[1, 5, 5],
       [3, 2, 1],
       [3, 2, 3],
       [3, 4, 1],
       [3, 5, 5],
       [4, 2, 3],
       [4, 2, 5],
       [4, 3, 4],
       [5, 2, 2],
       [5, 3, 4]])
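
The same idea can be wrapped in a small function. This is only a sketch; sort_rows is a hypothetical helper name, not something numpy provides:

```python
import numpy as np

def sort_rows(arr):
    """Return a copy of a 2D array with its rows in lexicographic order.

    Hypothetical helper for illustration; not a numpy function.
    """
    arr = np.asarray(arr)
    # arr.T[::-1] yields the columns last-to-first, so column 0 ends up as
    # the last key, which lexsort treats as the primary one.
    return arr[np.lexsort(arr.T[::-1])]

a = np.array([[4, 2, 3], [4, 2, 5], [3, 5, 5], [1, 5, 5], [3, 2, 1]])
result = sort_rows(a)
print(result)
```

Because fancy indexing returns a new array, the input is left unchanged.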

The numpy_indexed package (disclaimer: I am its author) can be used to solve these kinds of processing-on-nd-array problems in an efficient, fully vectorized manner:

import numpy_indexed as npi
npi.sort(a)  # by default along axis=0, but configurable

I was struggling with the same thing and just got help and solved the problem. It works smoothly if your array has column names (a structured array), and I think this is a very simple way to sort using the same logic that Excel does:

array_name[array_name[['colname1','colname2']].argsort()]

Note the double brackets enclosing the sorting criteria. And of course, you can use more than 2 columns as sorting criteria.
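
For a self-contained illustration, here is a sketch on a small structured array. The field names 'x' and 'y' are made up for the example, and it uses the order parameter of np.argsort as an alternative spelling of the same multi-column sort:

```python
import numpy as np

# Hypothetical field names 'x' and 'y', chosen for illustration only.
pts = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)],
               dtype=[('x', 'i8'), ('y', 'i8')])

# order= lists the fields from most to least significant.
idx = np.argsort(pts, order=['x', 'y'])
sorted_pts = pts[idx]
print(sorted_pts)
```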

You can use np.sort_complex. This has the side effect of changing your data to floating point; I hope that's not a problem:

>>> a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
>>> atmp = np.sort_complex(a[:,0] + a[:,1]*1j)
>>> b = np.array([[np.real(x), np.imag(x)] for x in atmp])
>>> b
array([[ 3.,  2.],
       [ 3.,  4.],
       [ 3.,  6.],
       [ 5.,  3.],
       [ 6.,  2.]])
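
A possible variation on the above, sketched here, avoids the Python-level list comprehension by reading the .real and .imag attributes directly, and casts back to the original integer dtype. The cast assumes the values are small enough to be represented exactly as floats:

```python
import numpy as np

a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])

# sort_complex orders by real part first, then by imaginary part.
atmp = np.sort_complex(a[:, 0] + a[:, 1] * 1j)

# Rebuild the (N, 2) array without a Python loop, then restore the dtype.
# Assumption: the integers fit exactly in a float64.
b = np.column_stack((atmp.real, atmp.imag)).astype(a.dtype)
print(b)
```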

EDIT: removed bad answer.

Here's one way to do it using an intermediate structured array:

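import numpy as np

a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])

# flatten() returns a copy, so a itself is left unchanged
b = a.flatten()
# View each (x, y) pair as one record; a.dtype avoids hard-coding '<i4'
b = b.view([('x', a.dtype), ('y', a.dtype)])
# Sort the records by x, then by y
b.sort(order=['x', 'y'])
# View the records as plain integers again and restore the (N, 2) shape
b = b.view(a.dtype).reshape(a.shape)

print(b)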

which gives the desired output:

[[3 2]
 [3 4]
 [3 6]
 [5 3]
 [6 2]]

Not sure if this is quite the best way to go about it though.

I found one way to do it:

from numpy import array
a = array([(3,2),(6,2),(3,6),(3,4),(5,3)])
array(sorted(sorted(a,key=lambda e:e[1]),key=lambda e:e[0]))

It's pretty terrible to have to sort twice (and to use the plain Python sorted function instead of a faster numpy sort), but it does fit nicely on one line.
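
The same two-pass idea can be kept inside numpy. The sketch below assumes a numpy version whose argsort accepts kind='stable'; stability is what makes the second pass preserve the y order among rows with equal x:

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# Pass 1: order by the secondary key (y).
tmp = a[np.argsort(a[:, 1], kind='stable')]
# Pass 2: order by the primary key (x); a stable sort keeps the y order
# from pass 1 among rows that share the same x.
result = tmp[np.argsort(tmp[:, 0], kind='stable')]
print(result)
```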
