Sorting a 2D numpy array by multiple axes
I have a 2D numpy array of shape (N,2) which is holding N points (x and y coordinates). For example:
array([[3, 2],
[6, 2],
[3, 6],
[3, 4],
[5, 3]])
I'd like to sort it such that my points are ordered by x-coordinate, and then by y in cases where the x coordinate is the same. So the array above should look like this:
array([[3, 2],
[3, 4],
[3, 6],
[5, 3],
[6, 2]])
If this were a normal Python list, I would simply define a comparator to do what I want, but as far as I can tell, numpy's sort function doesn't accept user-defined comparators. Any ideas?
EDIT: Thanks for the ideas! I set up a quick test case with 1000000 random integer points, and benchmarked the ones that I could run (sorry, can't upgrade numpy at the moment).
Mine: 4.078 secs
mtrw: 7.046 secs
unutbu: 0.453 secs
import numpy as np
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
ind = np.lexsort((a[:,1],a[:,0]))
a[ind]
# array([[3, 2],
# [3, 4],
# [3, 6],
# [5, 3],
# [6, 2]])
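One detail worth spelling out: np.lexsort treats the last key in the tuple as the primary sort key, which is why the x column a[:,0] is passed last. A minimal sketch that checks the result against Python's own tuple ordering (the names by_lexsort and by_python are only for illustration):

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# Keys are listed from least to most significant: y first, x last.
ind = np.lexsort((a[:, 1], a[:, 0]))
by_lexsort = a[ind]

# Reference result: sort the rows as plain (x, y) tuples.
by_python = np.array(sorted(map(tuple, a.tolist())))

print(np.array_equal(by_lexsort, by_python))
```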
a.ravel() returns a view if a is C_CONTIGUOUS. If that is true, @ars's method, slightly modified by using ravel instead of flatten, yields a nice way to sort a in-place:
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
dt = [('col1', a.dtype),('col2', a.dtype)]
assert a.flags['C_CONTIGUOUS']
b = a.ravel().view(dt)
b.sort(order=['col1','col2'])
Since b is a view of a, sorting b sorts a as well:
print(a)
# [[3 2]
# [3 4]
# [3 6]
# [5 3]
# [6 2]]
The title says "sorting 2D arrays". Although the questioner uses an (N,2)-shaped array, it's possible to generalize unutbu's solution to work with any (N,M) array, as that's what people might actually be looking for.
One could transpose the array and use slice notation with negative step to pass all the columns to lexsort in reversed order:
>>> import numpy as np
>>> a = np.random.randint(1, 6, (10, 3))
>>> a
array([[4, 2, 3],
[4, 2, 5],
[3, 5, 5],
[1, 5, 5],
[3, 2, 1],
[5, 2, 2],
[3, 2, 3],
[4, 3, 4],
[3, 4, 1],
[5, 3, 4]])
>>> a[np.lexsort(np.transpose(a)[::-1])]
array([[1, 5, 5],
[3, 2, 1],
[3, 2, 3],
[3, 4, 1],
[3, 5, 5],
[4, 2, 3],
[4, 2, 5],
[4, 3, 4],
[5, 2, 2],
[5, 3, 4]])
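The same one-liner can be wrapped in a small function and checked against Python's list sorting. This is only a sketch: sort_rows is a hypothetical helper name, and the seeded generator is there just to make the check repeatable.

```python
import numpy as np

def sort_rows(arr):
    """Return a copy of a 2D array with rows in lexicographic order.

    sort_rows is a hypothetical helper name, not a NumPy function.
    """
    arr = np.asarray(arr)
    # arr.T[::-1] yields the columns last-to-first, so column 0 ends up
    # as the final (primary) key that lexsort sees.
    return arr[np.lexsort(arr.T[::-1])]

rng = np.random.default_rng(0)
a = rng.integers(1, 6, size=(10, 3))
result = sort_rows(a)

# Python compares lists element by element, i.e. lexicographically.
expected = np.array(sorted(a.tolist()))
print(np.array_equal(result, expected))
```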
The numpy_indexed package (disclaimer: I am its author) can be used to solve these kinds of processing-on-nd-array problems in an efficient, fully vectorized manner:
import numpy_indexed as npi
npi.sort(a) # by default along axis=0, but configurable
I was struggling with the same thing and just got help and solved the problem. It works smoothly if your array has column names (structured array), and I think this is a very simple way to sort using the same logic that Excel does:
array_name[array_name[['colname1','colname2']].argsort()]
Note the double brackets enclosing the sorting criteria. And of course, you can use more than 2 columns as sorting criteria.
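For reference, a self-contained sketch of the same idea. It builds a small structured array (the field names x and y are made up for the example) and uses the order argument of np.argsort, which should give the same ordering as indexing with a list of field names:

```python
import numpy as np

# Structured array with two named integer fields (names are illustrative).
pts = np.array(
    [(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)],
    dtype=[('x', 'i8'), ('y', 'i8')],
)

# Sort by x first, then by y where x ties.
idx = np.argsort(pts, order=['x', 'y'])
sorted_pts = pts[idx]
print(sorted_pts)
```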
You can use np.sort_complex. This has the side effect of changing your data to floating point; I hope that's not a problem:
>>> a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
>>> atmp = np.sort_complex(a[:,0] + a[:,1]*1j)
>>> b = np.array([[np.real(x), np.imag(x)] for x in atmp])
>>> b
array([[ 3., 2.],
[ 3., 4.],
[ 3., 6.],
[ 5., 3.],
[ 6., 2.]])
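If the float output is a problem, one possible variation is to stack the real and imaginary parts and cast back to the original dtype. This is only a sketch, and the round trip is exact only as long as the coordinates are representable as float64 (magnitudes below 2**53):

```python
import numpy as np

a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])

# sort_complex orders by real part first, then by imaginary part.
atmp = np.sort_complex(a[:, 0] + a[:, 1] * 1j)

# Stack real and imaginary parts as columns, then cast back to a's dtype.
b = np.column_stack((atmp.real, atmp.imag)).astype(a.dtype)
print(b)
```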
EDIT: removed bad answer.
Here's one way to do it using an intermediate structured array:
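from numpy import array
a = array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
# flatten() returns a copy, so a itself is left untouched
b = a.flatten()
# view each (x, y) pair as one record; a.dtype avoids hard-coding '<i4',
# which breaks when the default integer is 64-bit
b = b.view([('x', a.dtype), ('y', a.dtype)])
b.sort()  # sorts by field 'x', then by field 'y'
b = b.view(a.dtype).reshape(a.shape)
print(b)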
which gives the desired output:
[[3 2]
[3 4]
[3 6]
[5 3]
[6 2]]
Not sure if this is quite the best way to go about it though.
I found one way to do it:
from numpy import array
a = array([(3,2),(6,2),(3,6),(3,4),(5,3)])
array(sorted(sorted(a,key=lambda e:e[1]),key=lambda e:e[0]))
It's pretty terrible to have to sort twice (and use the plain python sorted function instead of a faster numpy sort), but it does fit nicely on one line.
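The two-pass approach works because Python's sorted is stable. A rough numpy equivalent, assuming a numpy version that accepts kind='stable' in argsort, could look like this (by_y and by_xy are illustrative names):

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# Pass 1: order by the secondary key (y).
by_y = a[np.argsort(a[:, 1], kind='stable')]
# Pass 2: a stable sort by the primary key (x) keeps the y order within ties.
by_xy = by_y[np.argsort(by_y[:, 0], kind='stable')]
print(by_xy)
```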