How can I add two different types of data, string and int, to a numpy ndarray?
I used pandas.read_csv to read an Excel file. The file has two columns: one contains strings and the other integers.
data = pandas.read_csv('data.csv')
Then I printed the types of the values in the resulting numpy ndarray.
print(type(data.values[0, 0]))
print(type(data.values[0, 1]))
Result:
<class 'str'>
<class 'int'>
This suggested that a single numpy ndarray can hold two different data types.
However, when I try to put two different types of data into the same numpy ndarray myself:
arr = numpy.ndarray((1, 2))
arr[0][0] = 1
arr[0][1] = 'str'
The console showed me this message:
ValueError: could not convert string to float: 'str'
Can anyone tell me how to do this the way pandas does?
You can create numpy ndarrays with arbitrary C-style datatypes for each of the fields. The trick is to create the datatype for the array first, and then set that as the dtype for the array. The only annoying thing about this is that, since they are C-style types, the types have to be defined explicitly. If you have strings, that includes setting the number of characters each field can contain.
For example:
>>> import numpy as np
>>> person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
>>> person_dt
dtype([('Name', 'S25'), ('Age', 'u1')])
>>> persons = np.array([('alice', 35), ('bob', 39)], dtype=person_dt)
>>> persons
array([(b'alice', 35), (b'bob', 39)],
dtype=[('Name', 'S25'), ('Age', 'u1')])
Here I'm creating a numpy dtype. Each separate portion of an array is a field; I'm naming the fields Name and Age and assigning a type to each one. The Name field is a string of 25 bytes or fewer (stored in a fixed-width buffer padded with \0, much like a char array in C), and Age is an unsigned 8-bit integer, since our ages will of course fit in the range 0 to 255. Note that the b before each string just indicates that the value is a byte string.
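Two consequences of the fixed-width string field are worth seeing in code. The sketch below is an illustration, not part of the original example: the names short_dt, text_dt, clipped and people are made up here. It shows that input longer than the declared width is silently truncated, and that the 'U' type code stores Unicode text, so values come back as str rather than bytes.

```python
import numpy as np

# 'S5' holds at most 5 bytes per element; longer input is silently truncated.
short_dt = np.dtype([('Name', 'S5'), ('Age', np.uint8)])
clipped = np.array([('alexander', 35)], dtype=short_dt)
print(clipped['Name'][0])   # b'alexa'

# 'U25' stores up to 25 Unicode code points, so values come back as str.
text_dt = np.dtype([('Name', 'U25'), ('Age', np.uint8)])
people = np.array([('alice', 35), ('bob', 39)], dtype=text_dt)
print(people['Name'][1])    # bob
```

If the b prefix is a nuisance, declaring the field as 'U25' instead of 'S25' is probably the simplest way around it, at the cost of 4 bytes per character of storage.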
Then I simply create the array with the new dtype and pass in the values.
What's cool about this is that you can grab values by the field they belong to. For example, you can grab all the ages by indexing with the Age field, and the result will have the type I assigned to the ages:
>>> persons['Age']
array([35, 39], dtype=uint8)
So you can go further and index into these resulting arrays:
>>> persons['Name'][1]
b'bob'
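Because each field comes back as an ordinary array, it can also drive boolean indexing over the records. The following is a small sketch that rebuilds the persons array from the example; the variable names older, first and name are introduced here only for illustration.

```python
import numpy as np

person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
persons = np.array([('alice', 35), ('bob', 39)], dtype=person_dt)

# A comparison on one field gives a boolean mask over the records.
older = persons[persons['Age'] > 36]
print(older['Name'])            # [b'bob']

# A single record can be indexed by field name as well.
first = persons[0]
name = first['Name'].decode('ascii')   # bytes -> str
print(name, int(first['Age']))  # alice 35
```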
And you can still create and assign as you normally would:
>>> new_persons = np.zeros(5, dtype=person_dt)
>>> new_persons
array([(b'', 0), (b'', 0), (b'', 0), (b'', 0), (b'', 0)],
dtype=[('Name', 'S25'), ('Age', 'u1')])
>>> new_persons[0] = ('alice', 25)
>>> new_persons[1] = ('bob', 26)
>>> new_persons['Name'][2:5]
array([b'', b'', b''],
dtype='|S25')
>>> new_persons['Name'][2:5] = 'carol', 'david', 'eve'
>>> new_persons['Age'][2:5] = 27, 28, 29
>>> new_persons
array([(b'alice', 25), (b'bob', 26), (b'carol', 27), (b'david', 28), (b'eve', 29)],
dtype=[('Name', 'S25'), ('Age', 'u1')])
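As for the error in the question: numpy.ndarray((1, 2)) allocates a float64 array by default, which is why 'str' could not be stored in it. The mixed array that pandas returned most likely has dtype=object, where each element is a reference to an arbitrary Python object. A minimal sketch of that alternative, which is a different technique from the structured dtype shown above:

```python
import numpy as np

# With dtype=object each cell can hold any Python object.
arr = np.empty((1, 2), dtype=object)
arr[0, 0] = 1
arr[0, 1] = 'str'
print(arr)   # [[1 'str']]
print(type(arr[0, 0]), type(arr[0, 1]))
```

The trade-off is that an object array gives up the compact typed storage and most of the vectorized speed that a structured dtype keeps, so it is best suited to small or genuinely heterogeneous data.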
I attended a talk a little while ago all about creating and managing numpy dtypes, and it was great. The Jupyter notebook for the talk is online and you can access it here, which might shed a bit more light on all the different ways you can use them.