Sort 2d numpy string/float array by column

I have the following numpy array (the number of rocket launches per country since 1957), and I would like to sort it in ascending order by number of launches.

   ['Australia', 6.0],
   ['Brazil', 3.0],
   ['China', 269.0],
   ['France', 303.0],
   ['India', 76.0],
   ['Iran', 14.0],
   ['Israel', 11.0],
   ['Japan', 126.0],
   ['Kazakhstan', 701.0],
   ['Kenya', 9.0],
   ['New Zealand', 13.0],
   ['North Korea', 5.0],
   ['Pacific Ocean', 36.0],
   ['Russian Federation', 1398.0],
   ['South Korea', 3.0],
   ['USA', 1351.0]

The problem is that np.sort(a, axis = 0) only sorts the values, so the countries are no longer linked to their counts; e.g. North Korea ends up having launched 269 rockets (which is probably more likely than 5).

Or, if I do np.sort(a, axis = 1), I get an error saying

TypeError: '<' not supported between instances of 'float' and 'str'

Any ideas would be very much appreciated!

import numpy as np

data = [
   ['Australia', 6.0], 
   ['Brazil', 3.0],
   ['China', 269.0],
   ['France', 303.0],
   ['India', 76.0],
   ['Iran', 14.0],
   ['Israel', 11.0],
   ['Japan', 126.0],
   ['Kazakhstan', 701.0],
   ['Kenya', 9.0],
   ['New Zealand', 13.0],
   ['North Korea', 5.0],
   ['Pacific Ocean', 36.0],
   ['Russian Federation', 1398.0],
   ['South Korea', 3.0],
   ['USA', 1351.0]
]

We can create a structured array and then sort it by keys:

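dtype = [
    ('name', '<U18'),    # unicode string, wide enough for 'Russian Federation'
    ('rockets', float)
]

# Structured arrays are built from tuples, one per row
data = np.array([tuple(x) for x in data], dtype=dtype)
# Sort whole records by the 'rockets' field
sorted_data = np.sort(data, order=['rockets'])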

print(sorted_data)
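
As a rough illustration of what the sorted structured array gives you, the sketch below uses a four-row subset of the data (the variable names `rows`, `launches`, `by_count` and `descending` are made up for this example). It assumes you also want ties broken by name, which is what listing a second field in `order` does, and shows that each field can be read back as a plain 1-D array.

```python
import numpy as np

# A small subset of the launch data, used only for illustration.
rows = [('China', 269.0), ('South Korea', 3.0), ('Brazil', 3.0), ('Kenya', 9.0)]
dtype = [('name', '<U18'), ('rockets', float)]
launches = np.array(rows, dtype=dtype)

# Sort by launch count; ties are broken by name because it is listed second.
by_count = np.sort(launches, order=['rockets', 'name'])

# Each field can be pulled out as an ordinary 1-D array.
names = by_count['name'].tolist()
counts = by_count['rockets'].tolist()
print(names)
print(counts)

# Reversing the sorted array gives descending order.
descending = by_count[::-1]
print(descending['name'].tolist())
```

Because every record is sorted as a unit, each country name stays attached to its own count.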

This is easy with Python list sorting:

In [208]: alist = [   ['Australia', 6.0],
     ...:    ['Brazil', 3.0],
     ...:    ['China', 269.0],
     ...:    ['France', 303.0],
     ...:    ['India', 76.0],
     ...:    ['Iran', 14.0],
     ...:    ['Israel', 11.0],
     ...:    ['Japan', 126.0],
     ...:    ['Kazakhstan', 701.0],
     ...:    ['Kenya', 9.0],
     ...:    ['New Zealand', 13.0],
     ...:    ['North Korea', 5.0],
     ...:    ['Pacific Ocean', 36.0],
     ...:    ['Russian Federation', 1398.0],
     ...:    ['South Korea', 3.0],
     ...:    ['USA', 1351.0]]
In [209]: newlist = sorted(alist, key=lambda x: x[1])
In [210]: newlist
Out[210]: 
[['Brazil', 3.0],
 ['South Korea', 3.0],
 ['North Korea', 5.0],
 ['Australia', 6.0],
 ['Kenya', 9.0],
 ['Israel', 11.0],
 ['New Zealand', 13.0],
 ['Iran', 14.0],
 ['Pacific Ocean', 36.0],
 ['India', 76.0],
 ['Japan', 126.0],
 ['China', 269.0],
 ['France', 303.0],
 ['Kazakhstan', 701.0],
 ['USA', 1351.0],
 ['Russian Federation', 1398.0]]
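
If you need descending order or a tie-break on the name, the same list approach extends naturally. This is only a sketch on a four-row subset; `launches` and the result names are illustrative, and `operator.itemgetter(1)` is used here as an equivalent of `lambda x: x[1]`.

```python
from operator import itemgetter

# A small subset of the launch data, used only for illustration.
launches = [['China', 269.0], ['South Korea', 3.0], ['Brazil', 3.0], ['Kenya', 9.0]]

# Same as key=lambda x: x[1]; sorted() is stable, so tied rows keep their input order.
ascending = sorted(launches, key=itemgetter(1))

# reverse=True sorts from most to fewest launches.
descending = sorted(launches, key=itemgetter(1), reverse=True)

# itemgetter(1, 0) builds a (count, name) key, so ties are ordered by name.
by_count_then_name = sorted(launches, key=itemgetter(1, 0))

print(ascending)
print(descending)
print(by_count_then_name)
```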

With an object dtype array (to preserve the string and float columns):

In [211]: arr = np.array(alist, object)
In [212]: arr
Out[212]: 
array([['Australia', 6.0],
       ['Brazil', 3.0],
       ['China', 269.0],
       ['France', 303.0],
       ...
       ['USA', 1351.0]], dtype=object)

Get a sorting index by just looking at the 2nd column:

In [213]: idx = np.argsort(arr[:,1])
In [214]: idx
Out[214]: array([ 1, 14, 11,  0,  9,  6, 10,  5, 12,  4,  7,  2,  3,  8, 15, 13])
In [215]: arr[idx]
Out[215]: 
array([['Brazil', 3.0],
       ['South Korea', 3.0],
       ['North Korea', 5.0],
       ['Australia', 6.0],
       ['Kenya', 9.0],
       ...
       ['Russian Federation', 1398.0]], dtype=object)
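
A variation on the indexing idea above, sketched on a four-row subset (the names `rows`, `counts`, `idx_desc` and so on are made up for this example). It assumes you want tied rows to keep their original order, which `kind='stable'` guarantees, and it converts the count column to a float array first so that descending order can be obtained by negating the values.

```python
import numpy as np

# A small subset of the launch data, used only for illustration.
rows = [['China', 269.0], ['South Korea', 3.0], ['Brazil', 3.0], ['Kenya', 9.0]]
arr = np.array(rows, dtype=object)

# Pull out the count column as a real float array.
counts = arr[:, 1].astype(float)

# A stable sort keeps rows with equal counts in their original order.
idx = np.argsort(counts, kind='stable')
ascending = arr[idx]

# Negating the counts yields a descending order index.
idx_desc = np.argsort(-counts, kind='stable')
descending = arr[idx_desc]

print(ascending[:, 0].tolist())
print(descending[:, 0].tolist())
```

Indexing the whole array with `idx` reorders entire rows, which is why the names follow their counts.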

The structured array approach in the other answer is fine too.
