What is the default dtype for str-like input in NumPy?
I just wanted to confirm whether the default data type for strings is Unicode when creating an ndarray. I could not find any reference that states this clearly. Maybe it is too obvious and doesn't need stating.
When dtype is specified:
>>> import numpy as np
>>> g = np.array([['a', 'b'],['c', 'd']], dtype='S')
>>> g
array([[b'a', b'b'],
       [b'c', b'd']],
      dtype='|S1')
Without specifying the dtype:
>>> g = np.array([['a', 'b'],['c', 'd']])
>>> g
array([['a', 'b'],
       ['c', 'd']],
      dtype='<U1')
Also, what does the literal b indicate when dtype is specified? As per the documentation, it indicates bool, which doesn't seem to be the case here.
Can someone please clarify?
b'...' means it's a byte string, and the default dtype for arrays of strings depends on the kind of strings. Unicode strings (Python 3 strings are Unicode) get the dtype U, while Python 2 str or Python 3 bytes get the dtype S.
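As a quick check of the rule above, here is a minimal sketch (assuming Python 3 and a reasonably recent NumPy; the variable names are only illustrative) that builds one array from str input and one from bytes input and inspects the inferred dtype kind:

```python
import numpy as np

# Python 3 str input: NumPy infers a Unicode dtype (kind 'U').
u = np.array(['a', 'bc'])

# Python 3 bytes input: NumPy infers a byte-string dtype (kind 'S').
b = np.array([b'a', b'bc'])

print(u.dtype.kind, u.dtype.itemsize)  # kind 'U'; 2 characters * 4 bytes each
print(b.dtype.kind, b.dtype.itemsize)  # kind 'S'; 2 bytes
```

The length in the dtype (for example the 1 in U1) is taken from the longest element, so it changes with the input data.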
You can find the explanation of dtypes in the NumPy documentation:
Array-protocol type strings
The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:
- '?' boolean
- 'b' (signed) byte
- 'B' unsigned byte
- 'i' (signed) integer
- 'u' unsigned integer
- 'f' floating-point
- 'c' complex-floating point
- 'm' timedelta
- 'M' datetime
- 'O' (Python) objects
- 'S', 'a' zero-terminated bytes (not recommended)
- 'U' Unicode string
- 'V' raw data (void)
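The type strings listed above can be tried out directly with np.dtype. The following is a small sketch, not an exhaustive tour; it also hints at where the confusion about b and bool may come from: the one-letter dtype.kind code for booleans is 'b', while the type string 'b' means a signed byte, and neither is related to the b prefix in b'a', which is just Python's bytes literal syntax.

```python
import numpy as np

# Remaining characters give bytes per item...
int32 = np.dtype('i4')
print(int32.itemsize)      # 4 bytes

# ...except for Unicode, where they give the number of characters.
text3 = np.dtype('U3')
raw3 = np.dtype('S3')
print(text3.itemsize)      # 3 characters * 4 bytes = 12
print(raw3.itemsize)       # 3 bytes

# Type string '?' is boolean; its dtype.kind code is 'b'.
bool_kind = np.dtype('?').kind
# Type string 'b' is a signed byte (int8); its dtype.kind code is 'i'.
byte_kind = np.dtype('b').kind
print(bool_kind, byte_kind)
```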
However, in your first case you actually forced NumPy to convert it to bytes because you specified dtype='S'.
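To illustrate that forced conversion, here is a hedged sketch (assuming Python 3 and ASCII-only data; the names g_bytes and g_text are made up for this example) that converts str input to bytes with dtype='S' and back to Unicode with astype:

```python
import numpy as np

# str input plus dtype='S' makes NumPy encode each element to bytes.
g_bytes = np.array([['a', 'b'], ['c', 'd']], dtype='S')
print(g_bytes.dtype.kind)      # 'S'
print(g_bytes[0, 0] == b'a')   # elements compare equal to bytes objects

# Casting back to 'U' decodes the (ASCII) bytes into Unicode strings.
g_text = g_bytes.astype('U')
print(g_text.dtype.kind)       # 'U'
print(g_text[0, 0] == 'a')
```

The round trip is only safe for ASCII content here; non-ASCII text would need an explicit encoding step.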