I am mainly interested in 2D arrays of shape Nx3 but the issue appears in arrays of shapes Nxm where m>1 as well. Specifically, I would like to sort an Nx3 array first based on its first column, then second, and finally third. So, assuming that we have array k
given as
array([[0.90625, 0.90625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.90625, 0.65625],
[0.15625, 0.90625, 0.40625],
[0.90625, 0.40625, 0.90625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.65625, 0.65625],
[0.15625, 0.65625, 0.40625],
[0.65625, 0.15625, 0.90625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.90625, 0.40625],
[0.65625, 0.40625, 0.40625],
[0.15625, 0.15625, 0.90625],
[0.40625, 0.40625, 0.40625],
[0.65625, 0.90625, 0.40625],
[0.90625, 0.15625, 0.40625]])
the desired (sorted) array should be
array([[0.15625, 0.15625, 0.90625],
[0.15625, 0.65625, 0.40625],
[0.15625, 0.90625, 0.40625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.40625, 0.40625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.65625, 0.65625],
[0.40625, 0.90625, 0.40625],
[0.40625, 0.90625, 0.65625],
[0.65625, 0.15625, 0.90625],
[0.65625, 0.40625, 0.40625],
[0.65625, 0.90625, 0.40625],
[0.90625, 0.15625, 0.40625],
[0.90625, 0.40625, 0.90625],
[0.90625, 0.90625, 0.15625]])
I thought I could achieve that by using np.lexsort
but it seems I am probably missing something and is not working as expected. So far, I've been doing the following
In [28]: k[np.lexsort((k[:,2], k[:,1], k[:,0]))]
Out[28]:
array([[0.15625, 0.65625, 0.40625],
[0.15625, 0.15625, 0.90625],
[0.15625, 0.90625, 0.40625],
[0.40625, 0.65625, 0.65625],
[0.40625, 0.90625, 0.40625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.40625],
[0.40625, 0.90625, 0.65625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.65625, 0.15625, 0.90625],
[0.65625, 0.90625, 0.40625],
[0.65625, 0.40625, 0.40625],
[0.90625, 0.40625, 0.90625],
[0.90625, 0.15625, 0.40625],
[0.90625, 0.90625, 0.15625]])
It seems that the first column is sorted properly but the others are not. A similar question was asked before but I believe the accepted answer (which is essentially what I am doing) does not work.
From what I understood after looking a little bit more into it, I think it has to do with the values of the array being floats.
EDIT
I found the answer to my problem. However, I'll add it as an "edit" rather than posting it as an answer because I believe this whole situation could possibly be avoided if I had mentioned a fine detail about matrix k
in my original post. Matrix k
is created from another matrix a
, where a
is essentially created by reading a matrix of floats with 16 decimals from a file. Now let's look at the workflow that led me to the solution.
In [6]: k=a[[1,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60]]
In [7]: k
Out[7]:
array([[0.15625, 0.15625, 0.40625],
[0.15625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.15625, 0.90625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.90625, 0.15625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625]])
In [8]: np.random.shuffle(k)
In [9]: k
Out[9]:
array([[0.15625, 0.90625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625],
[0.15625, 0.40625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.15625, 0.15625, 0.40625],
[0.90625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625]])
In [10]: k[np.lexsort((k[:,2],k[:,1],k[:,0]))]
Out[10]:
array([[0.15625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.15625, 0.90625, 0.15625],
[0.15625, 0.15625, 0.40625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.90625, 0.15625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625]])
In [11]: k=np.round(k, 5)
In [12]: k[np.lexsort((k[:,2],k[:,1],k[:,0]))]
Out[12]:
array([[0.15625, 0.15625, 0.40625],
[0.15625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.15625, 0.90625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.90625, 0.15625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625]])
In [13]: np.savetxt(sys.stdout, a[[1,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60]], fmt='%.18f')
0.156250000000000000 0.156250000000000000 0.406250000000000000
0.156249999999999972 0.406250000000000000 0.156250000000000028
0.156249999999999972 0.656250000000000000 0.156250000000000028
0.156249999999999972 0.906250000000000000 0.156250000000000028
0.406250000000000000 0.156249999999999972 0.156250000000000028
0.406250000000000000 0.406250000000000000 0.156250000000000028
0.406249999999999944 0.656250000000000000 0.156250000000000028
0.406249999999999944 0.906250000000000000 0.156250000000000028
0.656250000000000000 0.156249999999999972 0.156250000000000028
0.656250000000000000 0.406249999999999944 0.156250000000000028
0.656250000000000000 0.656250000000000000 0.156250000000000028
0.656250000000000000 0.906250000000000000 0.156250000000000056
0.906250000000000000 0.156249999999999972 0.156250000000000028
0.906250000000000000 0.406249999999999944 0.156250000000000028
0.906250000000000000 0.656250000000000000 0.156250000000000056
0.906250000000000000 0.906250000000000000 0.156250000000000056
As can be seen by the above, it was all a matter of rounding errors. Apparently, everything was seemingly fine when printed with a few decimals, but when the file was read and matrix a
was created, it was stored with inaccuracies after the 16th decimal place. Consequently, these inaccuracies were carried down to k
when it was defined from a
. Therefore, lexsort
was giving the correct result from the beginning considering the real number that was stored in the matrix. Everything worked fine when I rounded matrix k
.
Moral of the story: Always check the accuraccies of your values.
I think numpy
isn't flexible for this kind of operations, though I can't deny some kind of solution exists. I recommend you to use other packages such as pandas
or numpy_indexed
(assuming data
is your array):
pandas
import pandas as pd
df = pd.DataFrame(data)
sorted_data = np.array(df.sort_values(by=[0,1,2]))
numpy_indexed
import numpy_indexed as npi
npi.sort(data)
Sources
For more general cases of usages you might like to check this answer
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.