Converting data read with numpy fromfile to Unicode in Python 3

Question

I am trying to read a sequence of bytes from a file with NumPy's fromfile in Python 3. My goal is to convert those bytes to an ordinary Python 3 string. For example:

$ echo "1234" > t.txt

The file t.txt now contains four bytes of text (plus the trailing newline that echo adds). Then:

import numpy as np

values=np.fromfile('t.txt',dtype='|S1',count=4)
print ("values={}".format(values))
values=np.fromfile('t.txt',dtype='|U1',count=4)
print ("values={}".format(values))

This produces:

values=[b'1' b'2' b'3' b'4']
Traceback (most recent call last):
  File "./t.py", line 12, in <module>
    print ("values={}".format(values))
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/numeric.py", line 1715, in array_str
    return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/arrayprint.py", line 454, in array2string
    separator, prefix, formatter=formatter)
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/arrayprint.py", line 328, in _array2string
    _summaryEdgeItems, summary_insert)[:-1]
  File "/home/hakon/.pyenv/versions/3.4.2/lib/python3.4/site-packages/numpy/core/arrayprint.py", line 500, in _formatArray
    word = format_function(a[-1])
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)

I would like to get an ordinary Python 3 string, such as values='1234'. How can I do that?

Answer 1

You can use astype to convert the bytes to str:

import numpy as np

values = np.fromfile('t.txt',dtype='|S1',count=4).astype('|U1')
print(values)
# ['1' '2' '3' '4']

print(values.view('|U4'))
# ['1234']

print(values.dtype)
# <U1
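The error in the question comes from the item size: '|S1' is one byte per item, while '|U1' is stored as one 4-byte UTF-32 code unit, so fromfile with '|U1' reinterprets four raw file bytes as a single code point. The following is a minimal sketch of that reasoning, followed by one way to get a single plain str from the array. It assumes the file is ASCII and recreates t.txt itself so that it runs on its own; the names raw, codepoint and text are only illustrative.

```python
import numpy as np

# Recreate the example file; assumed to match `echo "1234" > t.txt`.
with open('t.txt', 'wb') as f:
    f.write(b'1234\n')

# 'S1' is one byte per item, 'U1' is one UTF-32 code unit (4 bytes) per item.
print(np.dtype('S1').itemsize, np.dtype('U1').itemsize)
# 1 4

# Interpreting the first four file bytes as one little-endian code point
# gives a value above the Unicode maximum 0x10FFFF.
codepoint = int.from_bytes(b'1234', 'little')
print(hex(codepoint), codepoint > 0x10FFFF)
# 0x34333231 True

# Read single bytes, join them, then decode to a plain Python str.
raw = np.fromfile('t.txt', dtype='S1', count=4)
text = raw.tobytes().decode('ascii')
print(repr(text))
# '1234'
```

Unlike values.view('|U4'), which still returns a NumPy array, text here is a built-in str.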

Answer 2

I know the question explicitly asks for np.fromfile, but why not just use the built-in file interface?

# The with block closes the file even if reading raises an error.
with open('t.txt', 'r') as f:
    values = f.read().rstrip('\n')

Note: Python 3 strings are Unicode by default.
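When no encoding is given, the one that open uses depends on the platform's locale settings. Here is a minimal sketch that names the encoding explicitly, assuming the file is plain ASCII as in the question. The file is recreated so that the snippet runs on its own.

```python
# Recreate the example file; assumed to match `echo "1234" > t.txt`.
with open('t.txt', 'w', encoding='ascii') as f:
    f.write('1234\n')

# Text mode decodes the file's bytes to str using the encoding given here.
with open('t.txt', 'r', encoding='ascii') as f:
    values = f.read().rstrip('\n')

print(repr(values))
# '1234'
```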

Answer 1: score 2, accepted, 2014-10-15 14:49:40

Answer 2: score 1, 2014-10-15 15:15:22
