
What does dtype=object mean while creating a numpy array?

I was experimenting with numpy arrays and created a numpy array of strings:

ar1 = np.array(['avinash', 'jay'])

As I have read in the official guide, operations on a numpy array are propagated to its individual elements. So I did this:

ar1 * 2

But then I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-22-aaac6331c572> in <module>()
----> 1 ar1 * 2

TypeError: unsupported operand type(s) for *: 'numpy.ndarray' and 'int'

But when I used dtype=object

ar1 = np.array(['avinash', 'jay'], dtype=object)

while creating the array, I am able to do all operations.

Can anyone tell me why this is happening?

NumPy arrays are stored as contiguous blocks of memory. They usually have a single datatype (e.g. integers, floats or fixed-length strings) and the bits in memory are then interpreted as values of that datatype.
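
As a minimal sketch of what "fixed-length strings" means here, the snippet below inspects the question's array. It assumes a Python 3 environment, where a list of str becomes a fixed-width unicode dtype with 4 bytes per character; the replacement value assigned at the end is made up for illustration.

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])

# Every element gets the same fixed width: that of the longest string (7 characters).
print(ar1.dtype.kind)   # 'U' means fixed-width unicode
print(ar1.itemsize)     # 28 bytes per element (7 characters * 4 bytes)
print(ar1.nbytes)       # 56 bytes for the whole contiguous block

# Because the width is fixed, a longer string assigned later is silently truncated.
ar1[1] = 'a-much-longer-name'   # hypothetical value, for illustration only
print(ar1[1])           # 'a-much-'
```

The truncation at the end is a direct consequence of each element occupying a fixed number of bytes in the block.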

Creating an array with dtype=object is different. The memory taken by the array is now filled with pointers to Python objects which are stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
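
A small sketch of the pointer behaviour described above. The variable names `names`, `obj` and `mixed` are chosen here for illustration and are not from the question.

```python
import numpy as np

names = ['avinash', 'jay']
obj = np.array(names, dtype=object)

print(obj.dtype)             # object
print(obj.itemsize)          # size of one pointer, typically 8 on a 64-bit build

# The array stores references, so each element is the very same Python object
# that the original list refers to, not a copy of its characters.
print(obj[0] is names[0])    # True

# Since only references are stored, elements may have different Python types.
mixed = np.array(['avinash', 1, 2.5], dtype=object)
print([type(x).__name__ for x in mixed])   # ['str', 'int', 'float']
```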

Arithmetic operators such as * don't work with arrays such as ar1, which have a string_ datatype (there are special functions instead, see below). NumPy is just treating the bits in memory as characters, and the * operator doesn't make sense here. However, the line

np.array(['avinash','jay'], dtype=object) * 2

works because now the array is an array of (pointers to) Python strings. The * operator is well defined for these Python string objects. New Python strings are created in memory, and a new object array with references to the new strings is returned.
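
To make this concrete, here is a short sketch showing that the object-array multiplication gives the same result as applying Python's own str * int to each element. The names `obj`, `doubled` and `by_hand` are illustrative only.

```python
import numpy as np

obj = np.array(['avinash', 'jay'], dtype=object)

# NumPy hands the * operation to each Python object in turn.
doubled = obj * 2
print(doubled)          # ['avinashavinash' 'jayjay']
print(doubled.dtype)    # object: the result is again an array of references

# The same result, spelled out with plain Python string repetition.
by_hand = [s * 2 for s in obj]
print(doubled.tolist() == by_hand)   # True

# The original array is untouched; new strings were created.
print(obj.tolist())     # ['avinash', 'jay']
```

The convenience has a cost: each element goes through an ordinary Python-level operation, so object arrays do not get the speed of NumPy's native numeric loops.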


If you have an array with string_ or unicode_ dtype and want to repeat each string, you can use np.char.multiply:

In [52]: np.char.multiply(ar1, 2)
Out[52]: array(['avinashavinash', 'jayjay'], 
      dtype='<U14')

NumPy has many other vectorised string methods too.
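
A few of those methods, as a brief sketch on the question's array. The choice of `np.char.upper`, `np.char.str_len` and `np.char.add` is just a sample, and the '!' suffix is made up for the example.

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])

upper = np.char.upper(ar1)        # upper-case every element
lengths = np.char.str_len(ar1)    # length of every element
suffixed = np.char.add(ar1, '!')  # element-wise concatenation with a scalar string

print(upper)      # ['AVINASH' 'JAY']
print(lengths)    # [7 3]
print(suffixed)   # ['avinash!' 'jay!']
```

These work directly on fixed-width string arrays, so there is no need to switch to dtype=object just to get element-wise string operations.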
