Converting bytes read from a file using numpy fromfile to unicode in Python 3

I am trying to read a string of bytes from a file using NumPy fromfile in Python 3. My goal is to convert the bytes to a normal Python 3 string. For example:

$ echo "1234" > t.txt

Now the file t.txt contains the four characters 1234 followed by the newline that echo appends (5 bytes in total). Then:

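import numpy as np

# '|S1': each element is a single raw byte
values=np.fromfile('t.txt',dtype='|S1',count=4)
print ("values={}".format(values))
# '|U1': each element is one UTF-32 character, i.e. 4 bytes wide
values=np.fromfile('t.txt',dtype='|U1',count=4)
print ("values={}".format(values))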

gives:

values=[b'1' b'2' b'3' b'4']
Traceback (most recent call last):
  File "./t.py", line 12, in <module>
    print ("values={}".format(values))
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/numeric.py", line 1715, in array_str
    return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/arrayprint.py", line 454, in array2string
    separator, prefix, formatter=formatter)
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/arrayprint.py", line 328, in _array2string
    _summaryEdgeItems, summary_insert)[:-1]
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/arrayprint.py", line 500, in _formatArray
    word = format_function(a[-1])
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
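The error follows from how NumPy stores 'U' strings: each character occupies 4 bytes (UTF-32), so with dtype='|U1' fromfile takes the bytes 1234 as one 4-byte element and tries to decode them as a single code point. The in-memory sketch below reproduces that interpretation with np.frombuffer instead of a file; it reads the bytes explicitly as little-endian to match the 'utf-32-le' codec named in the traceback.

```python
import numpy as np

# A '<U1' element holds one UTF-32 code unit, i.e. 4 bytes.
itemsize = np.dtype('<U1').itemsize
print(itemsize)                   # 4

# Reading the ASCII bytes b'1234' as one little-endian 32-bit integer
# gives the "code point" a '<U1' element would contain.
code_point = int(np.frombuffer(b'1234', dtype='<u4')[0])
print(hex(code_point))            # 0x34333231

# Valid Unicode code points end at 0x10FFFF, hence the UnicodeDecodeError.
print(code_point > 0x10FFFF)      # True
```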

I would like to obtain a normal Python 3 string such as values='1234'. How can this be done?

You could use astype to convert the bytes to str:

import numpy as np

values = np.fromfile('t.txt',dtype='|S1',count=4).astype('|U1')
print(values)
# ['1' '2' '3' '4']

print(values.view('|U4'))
# ['1234']

print(values.dtype)
# <U1
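The result of astype is still a NumPy array. If an ordinary Python str like '1234' is wanted, as the question asks, two options are sketched below: join the one-character elements, or skip the unicode conversion entirely and decode the raw bytes of the '|S1' array. The sketch writes the file to a temporary path (a stand-in for t.txt) and assumes its content is ASCII.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for t.txt, holding the bytes echo "1234" writes.
path = os.path.join(tempfile.mkdtemp(), "t.txt")
with open(path, "wb") as f:
    f.write(b"1234\n")

values = np.fromfile(path, dtype='|S1', count=4).astype('|U1')

# Option 1: concatenate the one-character elements into a Python str.
s1 = ''.join(values)

# Option 2: take the raw bytes of the '|S1' array and decode them.
raw = np.fromfile(path, dtype='|S1', count=4)
s2 = raw.tobytes().decode('ascii')

print(s1, type(s1))   # 1234 <class 'str'>
print(s2, type(s2))   # 1234 <class 'str'>
```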

I know the question explicitly asks for np.fromfile, but why not simply use the built-in file interface directly?

# the with block closes the file automatically
with open('t.txt', 'r') as f:
    values = f.read().rstrip('\n')  # drop the trailing newline added by echo
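If only a fixed number of bytes is wanted, mirroring count=4, the built-in interface can also open the file in binary mode and decode the bytes explicitly. A minimal sketch, assuming the content is ASCII; the temporary file again stands in for t.txt:

```python
import os
import tempfile

# Hypothetical stand-in for t.txt.
path = os.path.join(tempfile.mkdtemp(), "t.txt")
with open(path, "wb") as f:
    f.write(b"1234\n")

# 'rb' yields bytes; read(4) returns at most 4 bytes, like count=4.
with open(path, "rb") as f:
    data = f.read(4)

values = data.decode("ascii")    # bytes -> str
print(repr(data), repr(values))  # b'1234' '1234'
```

Decoding explicitly makes the encoding visible, whereas text mode ('r') relies on the platform's default encoding.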

Note: Python 3 strings are Unicode by default.

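Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.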

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM