How to handle mixed data types in numpy arrays

I'm stuck on this NumPy problem.

import numpy as np

country = ['India', 'USA']
gdp = [22, 33]

a = np.column_stack((country, gdp))

array([['India', '22'],
       ['USA', '33']], dtype='<U11')

I have an ndarray and I want to find the maximum of the second column. I tried the following:

print(a.max(axis=1)[1])
print(a[:,1].max())

It threw this error: TypeError: cannot perform reduce with flexible type

I tried converting the type:

datatype=([('country',np.str_,64),('gross',np.float32)])

new=np.array(a,dtype=datatype)

But I got the following error:

could not convert string to float: 'India'

The error is due to the string data in your array, which makes the dtype a Unicode string (U11 means a Unicode string of up to 11 characters). Because every element is stored as a string, a numerical reduction such as max cannot be applied to it directly. If you wish to store the data in numerical format, use structured arrays. However, if you only wish to compute the maximum of the numerical column, use:

print(a[:, 1].astype(int).max())
# 33

You may choose another numerical dtype, such as float in place of int, depending on the data in that column. Note that the np.int and np.float aliases were removed in NumPy 1.24, so use the built-in int and float (or explicit types such as np.int64 and np.float64) instead.
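
As a minimal sketch of the conversion approach described above, the snippet below converts the second column to integers, takes its maximum, and uses argmax to look up the country on the same row. The array is written out literally to match the one shown in the question, and the names gdp_values, max_gdp, and top are illustrative, not from the original post.

```python
import numpy as np

# Same contents as the string array shown in the question
a = np.array([['India', '22'],
              ['USA', '33']])

# Convert only the numeric column; the country column stays as strings
gdp_values = a[:, 1].astype(int)

max_gdp = gdp_values.max()       # largest value in the second column
top = a[gdp_values.argmax(), 0]  # country on the row holding that value

print(max_gdp, top)              # 33 USA
```

This keeps the original array untouched, which is convenient when the conversion is only needed for a single calculation.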

Consider using numpy structured arrays for mixed types. You will have no issues if you explicitly set the data type of each field.

This is often necessary, and certainly advisable, with numpy.

import numpy as np

country = ['India','USA','UK']
gdp = [22,33,4]

a = np.array(list(zip(country, gdp)),
             dtype=[('Country', '|S11'), ('Number', '<i8')])

res_asc = np.sort(a, order='Number')

# array([(b'UK', 4), (b'India', 22), (b'USA', 33)], 
#       dtype=[('Country', 'S11'), ('Number', '<i8')])

res_desc = np.sort(a, order='Number')[::-1]

# array([(b'USA', 33), (b'India', 22), (b'UK', 4)], 
#       dtype=[('Country', 'S11'), ('Number', '<i8')])
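
To return to the original question, the maximum of the numeric column can be read from a structured array by field name. The sketch below is one possible way to do that, not the answerer's own code. It assumes a 'U11' (Unicode) country field instead of '|S11' so that the names come back as str rather than bytes. It also shows one way to convert the question's 2D string array, by building one tuple per row first. The names best, strings, rows, and s are illustrative.

```python
import numpy as np

country = ['India', 'USA', 'UK']
gdp = [22, 33, 4]

# Assumption: 'U11' instead of '|S11' so country names are str, not bytes
a = np.array(list(zip(country, gdp)),
             dtype=[('Country', 'U11'), ('Number', '<i8')])

# Each field behaves like an ordinary 1-D array of its own dtype
print(a['Number'].max())         # 33

# Record holding the maximum, then its country
best = a[a['Number'].argmax()]
print(best['Country'])           # USA

# Converting a 2D string array: build one tuple per row, casting the number
strings = np.array([['India', '22'], ['USA', '33']])
rows = [(name, int(value)) for name, value in strings]
s = np.array(rows, dtype=[('Country', 'U11'), ('Number', '<i8')])
print(s['Number'].max())         # 33
```

Passing a list of tuples matters here: NumPy treats each tuple as one record, whereas a nested 2D array is interpreted element by element, which is why the attempt in the question tried to turn 'India' into a float.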
