I have a two-dimensional list like this:
a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205], [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
These are actually coordinates in 2D Euclidean space. I want to sort the list so that nearby points appear in sequence, like the following:
sorted_a = [[45,205],[42,206],[46,205],[47,202],[48,202],[45,40],[46,41],[47,40],[48,40],[49,38]]
I have also tried
sorted_a = sorted(a, key=lambda x: (x[0],x[1]))
but it does not return the required result. Your help is appreciated. Thanks.
I'm not sure this is a sorting problem; it's more of a grouping one (or an optimization?).
Sorting requires some criterion for putting the [45,205] point before [42,206]. key
works if you can come up with one number that represents the desired order.
For example, calculate the distance from the origin:
A = np.array(a)
creates a numpy array:
In [346]: A
Out[346]:
array([[ 42, 206],
[ 45, 40],
[ 45, 205],
[ 46, 41],
[ 46, 205],
[ 47, 40],
[ 47, 202],
[ 48, 40],
[ 48, 202],
[ 49, 38]])
The squared distance from the origin (the polar radius, squared) is the sum of squares; the sqrt
isn't needed for ranking. Applying argsort
to this ranks the points by distance from the origin.
In [347]: np.sum(A**2,axis=1)
Out[347]: array([44200, 3625, 44050, 3797, 44141, 3809, 43013, 3904, 43108, 3845])
In [348]: r = np.sum(A**2,axis=1)
In [349]: idx = np.argsort(r)
In [350]: idx
Out[350]: array([1, 3, 5, 9, 7, 6, 8, 2, 4, 0], dtype=int32)
In [351]: A[idx,:]
Out[351]:
array([[ 45, 40],
[ 46, 41],
[ 47, 40],
[ 49, 38],
[ 48, 40],
[ 47, 202],
[ 48, 202],
[ 45, 205],
[ 46, 205],
[ 42, 206]])
The equivalent operation on the plain list uses a key function:
def foo(xy):
    x, y = xy
    return x**2 + y**2
In [356]: sorted(a, key=foo)
Out[356]:
[[45, 40],
[46, 41],
[47, 40],
[49, 38],
[48, 40],
[47, 202],
[48, 202],
[45, 205],
[46, 205],
[42, 206]]
In numpy it's fairly easy to come up with pairwise distances (even easier with one of the scipy tools). But what would you do with those? What defines an order based on such distances?
For example, using the kind of iteration that we are often asked to 'vectorize':
In [369]: D = np.zeros((10,10))
In [370]: for i in range(10):
     ...:     for j in range(i,10):
     ...:         D[i,j] = np.sqrt(sum((A[i,:]-A[j,:])**2))
     ...:         # D[i,j] = np.linalg.norm(A[i,:]-A[j,:])
In [372]: D.astype(int)
Out[372]:
array([[ 0, 166, 3, 165, 4, 166, 6, 166, 7, 168],
[ 0, 0, 165, 1, 165, 2, 162, 3, 162, 4],
[ 0, 0, 0, 164, 1, 165, 3, 165, 4, 167],
[ 0, 0, 0, 0, 164, 1, 161, 2, 161, 4],
[ 0, 0, 0, 0, 0, 165, 3, 165, 3, 167],
[ 0, 0, 0, 0, 0, 0, 162, 1, 162, 2],
[ 0, 0, 0, 0, 0, 0, 0, 162, 1, 164],
[ 0, 0, 0, 0, 0, 0, 0, 0, 162, 2],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 164],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
is the matrix of distances, truncated to integers for ease of display.
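The double loop above can be vectorized with broadcasting (`scipy.spatial.distance.cdist` would do the same job); a sketch, filling the full symmetric matrix rather than just the upper triangle:

```python
import numpy as np

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
A = np.array(a)

# (10,1,2) - (1,10,2) broadcasts to (10,10,2); sum the squared
# differences over the coordinate axis and take the square root
D = np.sqrt(((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1))
print(D.astype(int))
```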
numpy has a lexical sort, np.lexsort. We could use that to sort on the 2nd coordinate first, and then the 1st. That groups all those 200s together:
In [375]: np.lexsort(A.T)
Out[375]: array([9, 1, 5, 7, 3, 6, 8, 2, 4, 0], dtype=int32)
In [376]: A[_,:]
Out[376]:
array([[ 49, 38],
[ 45, 40],
[ 47, 40],
[ 48, 40],
[ 46, 41],
[ 47, 202],
[ 48, 202],
[ 45, 205],
[ 46, 205],
[ 42, 206]])
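The same ordering on the plain list comes from sorted with a reversed key tuple (lexsort treats its last key as the primary one, so the 2nd coordinate leads):

```python
a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]

# sort by the 2nd coordinate first, then by the 1st
lex_sorted = sorted(a, key=lambda p: (p[1], p[0]))
print(lex_sorted)
```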
The pairwise distances with that sorted array look like:
array([[ 0, 4, 2, 2, 4, 164, 164, 167, 167, 168],
[ 0, 0, 2, 3, 1, 162, 162, 165, 165, 166],
[ 0, 0, 0, 1, 1, 162, 162, 165, 165, 166],
[ 0, 0, 0, 0, 2, 162, 162, 165, 165, 166],
[ 0, 0, 0, 0, 0, 161, 161, 164, 164, 165],
[ 0, 0, 0, 0, 0, 0, 1, 3, 3, 6],
[ 0, 0, 0, 0, 0, 0, 0, 4, 3, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 1, 3],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Another way of thinking of this problem is as a search problem, for example seeking the order of points that minimizes the 'travel' distance, i.e. the sum of distances between successive points.
With the original a (A), the distances (computed with np.linalg.norm) between successive points are
In [407]: np.linalg.norm(A[1:]-A[:-1],axis=1)
Out[407]:
array([ 166.02710622, 165. , 164.00304875, 164. ,
165.00303028, 162. , 162.00308639, 162. ,
164.00304875])
and their sum:
In [408]: _.sum()
Out[408]: 1474.0393203904973
With the lexsort order, A1 = A[np.lexsort(A.T), :]:
In [410]: np.linalg.norm(A1[1:]-A1[:-1],axis=1)
Out[410]:
array([ 4.47213595, 2. , 1. , 2.23606798,
161.00310556, 1. , 4.24264069, 1. ,
4.12310563])
In [411]: _.sum()
Out[411]: 181.07705580534656
Clearly this has better clustering, driven mainly by the 2nd column values.
Your sorted_a improves this sum a bit:
In [414]: sortedA = np.array(sorted_a)
In [415]: np.linalg.norm(sortedA[1:]-sortedA[:-1],axis=1)
Out[415]:
array([ 3.16227766, 4.12310563, 3.16227766, 1. ,
162.0277754 , 1.41421356, 1.41421356, 1. ,
2.23606798])
In [416]: _.sum()
Out[416]: 179.53993144488973
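The three comparisons can be reproduced with a small helper (a sketch; travel is a name chosen here, and sorted_a is the target ordering from the question):

```python
import numpy as np

def travel(points):
    """Total Euclidean length of the path visiting the points in order."""
    P = np.asarray(points)
    return np.linalg.norm(P[1:] - P[:-1], axis=1).sum()

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]
sorted_a = [[45, 205], [42, 206], [46, 205], [47, 202], [48, 202],
            [45, 40], [46, 41], [47, 40], [48, 40], [49, 38]]
lex_a = sorted(a, key=lambda p: (p[1], p[0]))  # the lexsort order

print(round(travel(a), 2))         # 1474.04
print(round(travel(lex_a), 2))     # 181.08
print(round(travel(sorted_a), 2))  # 179.54
```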
A brute-force solution is to try all the permutations and pick the one that minimizes this sum.
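A sketch with itertools.permutations; since 10! is about 3.6 million orderings, which is slow in pure Python, the demo restricts itself to the first 6 points:

```python
from itertools import permutations
import math

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]

def travel(points):
    # sum of Euclidean distances between successive points
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# try every ordering of the first 6 points and keep the shortest path
subset = a[:6]
best = min(permutations(subset), key=travel)
print(list(best), round(travel(best), 2))
```

For all 10 points this becomes impractical; heuristics like a greedy nearest-neighbor walk, or proper TSP solvers, are the usual escape.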