
Inconsistent dtype inference for mixed floats and strings

np.array([5.3, 1.2, 76.1, 'Alice', 'Bob', 'Claire'])

I'd like to know why this gives a dtype of U32, while the code below gives a dtype of U6.

np.array(['Alice', 'Bob', 'Claire', 5.3, 1.2, 76.1])

NumPy tries to store data compactly, so it works out how much space each element needs. For strings that means a fixed width measured in characters (Unicode code points), not bits.

import numpy as np

# Same six values: floats first in a, strings first in b
a = np.array([5.3, 1.2, 76.1, 'Alice', 'Bob', 'Claire'])
b = np.array(['Alice', 'Bob', 'Claire', 5.3, 1.2, 76.1])
print(a.dtype, b.dtype)

>>> <U32 <U6

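In the first example NumPy sees 5.3 first. Because the array also contains strings, the float has to be stored as text, and NumPy's conversion rules give a float converted to a string a 32-codepoint data type (U32). For reference, a data type object describes the following aspects of the data: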

Type of the data (integer, float, Python object, etc.)

Size of the data (how many bytes are in, e.g., the integer)

Byte order of the data (little-endian or big-endian)

If the data type is a structured data type, an aggregate of other data types (e.g., describing an array item consisting of an integer and a float): what the names of the "fields" of the structure are, by which they can be accessed; what the data type of each field is; and which part of the memory block each field takes.

If the data type is a sub-array, what is its shape and data type.
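The properties listed above can be inspected directly on a dtype object. The following is a minimal sketch; the field names name and score are made up for illustration and do not come from the question.

```python
import numpy as np

# A fixed-width Unicode string dtype: 6 code points, 4 bytes each.
u6 = np.dtype('U6')
print(u6.kind, u6.itemsize)    # type category and size in bytes
print(u6.byteorder)            # byte-order character of the dtype

# A structured dtype with two illustrative (hypothetical) fields.
rec = np.dtype([('name', 'U6'), ('score', 'f8')])
print(rec.names)               # names of the fields
print(rec.fields['score'])     # (dtype of the field, byte offset in the record)

# A sub-array dtype: each element is itself a 2x3 block of float64.
sub = np.dtype(('f8', (2, 3)))
print(sub.shape, sub.base)     # shape and underlying data type
```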

When it sees the other strings in the array, they can fit within the 32-codepoint data-type and so it doesn't have to be changed.
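The 32-codepoint width appears to come from the float-to-string conversion itself rather than from the particular value 5.3. A small sketch that illustrates this, assuming the default float64 dtype and using an explicit cast instead of mixed-type inference:

```python
import numpy as np

floats = np.array([5.3, 1.2, 76.1])    # plain float64 array
as_text = floats.astype(str)           # cast every float to a Unicode string

print(as_text.dtype.kind)              # 'U' means fixed-width Unicode
print(as_text.dtype.itemsize // 4)     # width in code points (4 bytes each)
print(as_text[0])                      # the text stored for 5.3
```

The width is reserved for any float64 that might need to be written out, so short values such as 5.3 still land in a 32-character slot.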

Now consider the second example. NumPy sees the strings first and picks a data type wide enough for the longest one, 'Claire', which is six characters (code points, not bits). It then reaches 5.3, whose text form '5.3' also fits in a 6-codepoint data type, so no upgrade is required.

Similarly, np.array(['Alice', 'Bob', 'Claire', 5.3, 1.2, 76.1, 'Bobby', 2.3000000000001]) results in U15: the text form of 2.3000000000001 is 15 characters long, the data type chosen so far is too narrow to hold it, and NumPy widens the data type to fit.
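The widths quoted in this answer (6 and 15) match the longest text form among the elements. The sketch below only mimics that rule in plain Python; widest_text is a hypothetical helper written for this illustration, not a NumPy function, and it does not reproduce NumPy's internal inference.

```python
# Values from the strings-first example and its extended variant.
items_a = ['Alice', 'Bob', 'Claire', 5.3, 1.2, 76.1]
items_b = items_a + ['Bobby', 2.3000000000001]


def widest_text(items):
    """Hypothetical helper: length of the longest str() form among the items."""
    return max(len(str(x)) for x in items)


print(widest_text(items_a))   # longest entry is 'Claire'
print(widest_text(items_b))   # longest entry is the text of 2.3000000000001
```

If the resulting width matters, passing dtype explicitly to np.array (for example dtype='U32', or dtype=object to keep the original Python types) avoids relying on inference.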


