
What is the default dtype for str-like input in NumPy?

I just wanted to confirm whether the default data type for strings is Unicode when creating an ndarray. I could not find any reference that states this clearly. Maybe it is too obvious to need stating.

When dtype is specified:

>>> import numpy as np
>>> g = np.array([['a', 'b'],['c', 'd']], dtype='S')
>>> g
array([[b'a', b'b'],
       [b'c', b'd']], 
      dtype='|S1')

Without specifying the dtype:

>>> g = np.array([['a', 'b'],['c', 'd']])
>>> g
array([['a', 'b'],
       ['c', 'd']], 
      dtype='<U1')

Also, what does the b prefix indicate when dtype is specified? According to the documentation, it indicates bool, which doesn't seem to be the case here.

Can someone please clarify?

The b'...' prefix means the element is a byte string (a Python bytes object), not a bool. The default dtype for an array of strings depends on the kind of string: Unicode strings (Python 3 strings are Unicode) get dtype U, while Python 2 str or Python 3 bytes get dtype S. You can find the explanation of dtypes in the NumPy documentation:

Array-protocol type strings

The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:

  • '?' boolean
  • 'b' (signed) byte
  • 'B' unsigned byte
  • 'i' (signed) integer
  • 'u' unsigned integer
  • 'f' floating-point
  • 'c' complex-floating point
  • 'm' timedelta
  • 'M' datetime
  • 'O' (Python) objects
  • 'S', 'a' zero-terminated bytes (not recommended)
  • 'U' Unicode string
  • 'V' raw data (void)
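To make the list concrete, here is a small sketch (assuming a recent NumPy, 1.x or 2.x) that resolves a few of these type strings and shows why the b'...' in the question is unrelated to bool. It also illustrates the "bytes vs. characters" rule for S and U: NumPy stores Unicode as UCS-4, so each character takes 4 bytes.

```python
import numpy as np

# Type-string codes from the list above, resolved to concrete dtypes.
print(np.dtype('?'))   # bool
print(np.dtype('b'))   # int8   ('b' is a signed byte, not bool)
print(np.dtype('B'))   # uint8

# For 'S' the number counts bytes per item; for 'U' it counts characters.
# Unicode is stored as UCS-4 (4 bytes per character), so 'U3' takes 12 bytes.
print(np.dtype('S3').itemsize)  # 3
print(np.dtype('U3').itemsize)  # 12

# Separately, the dtype.kind attribute uses 'b' to mean boolean, while the
# 'b' type string means int8; the two are easy to mix up.
print(np.dtype('?').kind)  # b
print(np.dtype('b').kind)  # i

# The b'...' shown in the question is just Python's repr of a bytes object:
# elements of an 'S' array are numpy.bytes_, a subclass of bytes.
g = np.array([['a', 'b'], ['c', 'd']], dtype='S')
print(type(g[0, 0]).__name__, isinstance(g[0, 0], bytes))  # bytes_ True
```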

However, in your first case you actually forced NumPy to convert the strings to bytes because you specified dtype='S'.
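A minimal sketch of the two cases from the question, plus the round trip back to Unicode. np.char.encode and np.char.decode are existing NumPy functions; the choice of UTF-8 for non-ASCII text is an assumption for illustration, since the implicit str to bytes conversion done by dtype='S' only handles ASCII.

```python
import numpy as np

words = [['a', 'b'], ['c', 'd']]

# No dtype: Python 3 str is Unicode, so NumPy infers a 'U' dtype (4 bytes/char).
default = np.array(words)
print(default.dtype.kind, default.dtype.itemsize)   # U 4

# dtype='S' asks NumPy to store bytes instead; ASCII text is encoded for you.
forced = np.array(words, dtype='S')
print(forced.dtype.kind, forced.dtype.itemsize)     # S 1

# Passing bytes objects directly yields 'S' without any dtype argument.
print(np.array([b'a', b'b']).dtype.kind)            # S

# Going back to Unicode: astype('U') decodes the ASCII bytes.
restored = forced.astype('U')
print((restored == default).all())                  # True

# The implicit str -> bytes conversion is ASCII-only, so this raises.
try:
    np.array(['café', 'naïve'], dtype='S')
except UnicodeEncodeError as exc:
    print('implicit conversion failed:', exc.reason)

# Encode explicitly instead (UTF-8 is an assumed choice of encoding).
accented = np.array(['café', 'naïve'])
utf8 = np.char.encode(accented, 'utf-8')
print(utf8.dtype.kind)                              # S
print(np.char.decode(utf8, 'utf-8').tolist())       # ['café', 'naïve']
```

In short, NumPy picks U or S from the Python objects you pass; dtype='S' overrides that and only works implicitly for ASCII text.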
