
How can I add two different types of data, string and int, into a NumPy ndarray?

I used pandas.read_csv to read a CSV file with two columns: one contains strings and the other contains integers.

import pandas

data = pandas.read_csv('data.csv')

Then I printed the types of the values in the first row of the underlying NumPy array:

# get_values() was deprecated and later removed from pandas; to_numpy() replaces it
print(type(data.to_numpy()[0, 0]))
print(type(data.to_numpy()[0, 1]))

Result:

<class 'str'>
<class 'int'>

That suggests there is a way to store two different data types in the same NumPy ndarray.
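For context: when a DataFrame's columns have different types, converting it to a NumPy array generally yields an array of dtype `object`, whose elements are references to ordinary Python objects. A minimal NumPy-only sketch of that idea, where the rows are made-up stand-ins for the CSV contents:

```python
import numpy as np

# Hypothetical rows standing in for the data read from data.csv.
rows = [['alice', 35], ['bob', 39]]

# dtype=object makes each element a reference to an arbitrary Python object,
# so a str and an int can sit side by side in the same 2-D array.
mixed = np.array(rows, dtype=object)

print(mixed.dtype)        # object
print(type(mixed[0, 0]))  # <class 'str'>
print(type(mixed[0, 1]))  # <class 'int'>

# Putting a string where an int was also works, because no conversion happens.
mixed[0, 1] = 'str'
```

The trade-off is that an object array stores pointers rather than fixed-size typed values, so it loses most of NumPy's vectorized speed and memory compactness.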

However, when I try to put two different types of data into the same NumPy ndarray myself:

import numpy

arr = numpy.ndarray((1, 2))
arr[0][0] = 1
arr[0][1] = 'str'

the console shows this error:

ValueError: could not convert string to float: 'str'
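The error comes from the array's dtype: `numpy.ndarray(shape)` with no `dtype` argument allocates an uninitialized `float64` array, so every assignment has to be convertible to a float. A short sketch reproducing this (the variable `error_message` is introduced only for illustration):

```python
import numpy as np

# No dtype given, so the array is float64 (its contents start uninitialized).
arr = np.ndarray((1, 2))
print(arr.dtype)  # float64

arr[0][0] = 1  # fine: stored as 1.0

try:
    arr[0][1] = 'str'  # NumPy must turn 'str' into a float and cannot
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

A numeric string such as `'2.5'` would be converted silently instead, which is why the failure only shows up for non-numeric text.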

Can anyone tell me how to do this the way pandas does?

You can create NumPy ndarrays with arbitrary C-style data types for each of the fields. The trick is to create the data type for the array first, and then set that as the dtype of the array. The only annoying thing about this is that, since they are C-style types, the types have to be defined explicitly; if you have strings, that includes setting the maximum number of characters each field can contain.
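To make the fixed-width point concrete, here is a small sketch (the names `short_dt` and `people` and the deliberately tiny `'S5'` width are made up for illustration). It builds the dtype first, passes it to `np.array`, and shows that a string longer than the declared width is truncated without any error:

```python
import numpy as np

# Build the dtype first: each field gets a name and an explicit C-style type.
# 'S5' is a fixed-width byte string holding at most 5 bytes.
short_dt = np.dtype([('Name', 'S5'), ('Age', np.uint8)])

# Then pass that dtype when creating the array.
people = np.array([('alice', 35), ('christopher', 41)], dtype=short_dt)

# 'christopher' does not fit in 5 bytes, so only the first 5 are kept.
print(people['Name'])

# Each record is packed: 5 bytes for Name + 1 byte for Age.
print(short_dt.itemsize)  # 6
```

Picking the width is therefore a real design decision: too small silently loses data, too large wastes memory in every record.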

For example:

>>> import numpy as np
>>> person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
>>> person_dt
dtype([('Name', 'S25'), ('Age', 'u1')])
>>> persons = np.array([('alice', 35), ('bob', 39)], dtype=person_dt)
>>> persons
array([(b'alice', 35), (b'bob', 39)],
      dtype=[('Name', 'S25'), ('Age', 'u1')])

Here I'm creating a NumPy dtype. Each element of the array is made up of separate portions called *fields*; I'm naming those fields Name and Age and assigning a type to each. So the Name field is a fixed-width byte string of at most 25 characters (shorter values are padded with null bytes, similar to a \0-terminated string in C), and Age is an unsigned 8-bit integer, since our ages will of course fit in the range 0 to 255. Note that the b before each string just indicates that the value is a byte string.
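If the `b` prefix is inconvenient, one option is NumPy's fixed-width Unicode type `'U'` in place of `'S'`, so that field values come back as `str`. A sketch under that assumption (`person_u_dt` and `persons_u` are hypothetical names); the cost is size, because each Unicode character is stored in 4 bytes:

```python
import numpy as np

# Same layout as person_dt, but 'U25' holds up to 25 Unicode characters.
person_u_dt = np.dtype([('Name', 'U25'), ('Age', np.uint8)])
persons_u = np.array([('alice', 35), ('bob', 39)], dtype=person_u_dt)

print(persons_u['Name'][1])  # bob  (a str, no b prefix)
print(person_u_dt.names)     # ('Name', 'Age')

# 25 characters * 4 bytes for Name, plus 1 byte for Age.
print(person_u_dt.itemsize)  # 101
```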

Then I simply create the array with the new dtype and pass in the values.
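Since the question's data starts out in a CSV file, one way to fill a structured array without going through pandas is `np.loadtxt` with the structured dtype. A minimal sketch, assuming a simple comma-separated file with one header row and no quoted fields; the inline CSV text is made up, and a `'U25'` field is used so names come back as `str`:

```python
import io

import numpy as np

person_dt = np.dtype([('Name', 'U25'), ('Age', np.uint8)])

# Stand-in for data.csv: a header line plus a string column and an int column.
csv_text = io.StringIO("Name,Age\nalice,35\nbob,39\n")

# loadtxt parses each row directly into one record of the structured dtype;
# skiprows=1 skips the header line.
persons = np.loadtxt(csv_text, delimiter=',', dtype=person_dt, skiprows=1)

print(persons['Name'])
print(persons['Age'])
```

For files with quoting or missing values, pandas remains the more robust reader.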

What's cool about this is that you can grab values by the field they belong to. For example, you can get all the ages by selecting the Age field, and the result has the type I assigned to the ages:

>>> persons['Age']
array([35, 39], dtype=uint8)
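The array returned by selecting a field is a view into the structured array, not a copy, so ordinary NumPy operations work on it and writes go straight back to the records. A short sketch (`ages` is a name introduced for illustration):

```python
import numpy as np

person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
persons = np.array([('alice', 35), ('bob', 39)], dtype=person_dt)

# Selecting a field gives an ordinary uint8 array backed by persons' memory.
ages = persons['Age']
print(ages.mean())  # 37.0

# Modifying the view in place updates the structured array itself.
ages += 1
print(persons['Age'])  # [36 40]
```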

You can go further and index into these resulting arrays:

>>> persons['Name'][1]
b'bob'
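The order of indexing can also be flipped: indexing by position first returns a single record, which can then be indexed by field name. Both routes reach the same value, as this sketch shows (`by_field`, `record`, and `by_record` are illustrative names):

```python
import numpy as np

person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
persons = np.array([('alice', 35), ('bob', 39)], dtype=person_dt)

# Field first, then position ...
by_field = persons['Name'][1]

# ... or position first (one record holding both fields), then field.
record = persons[1]
by_record = record['Name']

print(by_field == by_record)  # True
print(record['Age'])          # 39
```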

And you can still create arrays and assign to them as you normally would:

>>> new_persons = np.zeros(5, dtype=person_dt)
>>> new_persons
array([(b'', 0), (b'', 0), (b'', 0), (b'', 0), (b'', 0)],
      dtype=[('Name', 'S25'), ('Age', 'u1')])
>>> new_persons[0] = ('alice', 25)
>>> new_persons[1] = ('bob', 26)
>>> new_persons['Name'][2:5]
array([b'', b'', b''],
      dtype='|S25')
>>> new_persons['Name'][2:5] = 'carol', 'david', 'eve'
>>> new_persons['Age'][2:5] = 27, 28, 29
>>> new_persons
array([(b'alice', 25), (b'bob', 26), (b'carol', 27), (b'david', 28), (b'eve', 29)],
      dtype=[('Name', 'S25'), ('Age', 'u1')])
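Because each field behaves like a regular array, the usual NumPy tools also work on whole records. A sketch that rebuilds `new_persons` in its final state from the session above, then filters records with a boolean mask on one field and sorts them by a named field (`older` and `by_age` are hypothetical names):

```python
import numpy as np

person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
new_persons = np.array(
    [('alice', 25), ('bob', 26), ('carol', 27), ('david', 28), ('eve', 29)],
    dtype=person_dt,
)

# A condition on one field yields a boolean mask that selects whole records.
older = new_persons[new_persons['Age'] > 26]
print(older['Name'])  # [b'carol' b'david' b'eve']

# np.sort can order records by a named field; each row's fields stay together.
by_age = np.sort(new_persons[::-1], order='Age')
print(by_age['Name'][0])  # b'alice'
```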

A while ago I attended a talk all about creating and managing NumPy dtypes, and it was great. The Jupyter notebook for the talk is online and you can access it here; it might shed a bit more light on all the different ways you can use them.
