How to decode a unicode string read from a file in Python?

Question

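I have a file that contains a UTF-16 string. When I try to read it as unicode, " " (double quotes) get added and the string looks like "b'\\xff\\xfeA\\x00'". The built-in .decode function throws AttributeError: 'str' object has no attribute 'decode'. I have tried several options, but none of them worked.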
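A minimal sketch reproducing the symptom in Python 3, assuming the text in the file came from calling str() on a bytes object. The sample bytes b'\xff\xfeA\x00' (a UTF-16 BOM followed by "A") are taken from the question; the variable names are illustrative. Only bytes objects have a .decode() method, so calling it on the str read from the file fails.

```python
# Bytes for the UTF-16 string "A": little-endian BOM (FF FE) + one code unit.
raw = b'\xff\xfeA\x00'

# str() on bytes does not decode; it returns the repr, b'...' wrapper included.
text = str(raw)
print(text)  # b'\xff\xfeA\x00'

# In Python 3, str objects have no .decode(); only bytes objects do.
try:
    text.decode('utf-16')
except AttributeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# Decoding the original bytes works and strips the BOM.
print(raw.decode('utf-16'))  # A
```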

This is what the file I am reading looks like:

Answer 1

Try this:

my_str.encode().decode()  # my_str: the str read from the file

Answer 2

It looks like the file was created by writing a bytes literal to it, like this:

some_bytes = b'Hello world'
with open('myfile.txt', 'w') as f:
    f.write(str(some_bytes))

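This sidesteps the fact that writing bytes to a file opened in text mode raises an error, but at the cost that the file now contains "b'Hello world'" (note the "b" inside the quotes).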
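If a file has already been written this way, one possible recovery (a sketch, not part of the original answer) is to parse the b'...' text back into bytes with the standard library's ast.literal_eval and then decode it. The helper name read_stringified_bytes is hypothetical, and it assumes the whole file is a single bytes literal whose bytes are UTF-16.

```python
import ast
import os
import tempfile

def read_stringified_bytes(path, encoding='utf-16'):
    """Hypothetical helper: undo a file written with f.write(str(some_bytes)).

    Assumes the entire file is one bytes literal such as b'\\xff\\xfeA\\x00'.
    """
    # The repr of a bytes object is always pure ASCII.
    with open(path, encoding='ascii') as f:
        literal = f.read().strip()
    # literal_eval safely parses the b'...' text back into a bytes object.
    raw = ast.literal_eval(literal)
    if not isinstance(raw, bytes):
        raise ValueError('file does not contain a bytes literal')
    return raw.decode(encoding)

# Reproduce the broken file in a temporary directory, then recover the text.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, 'w', encoding='ascii') as f:
    f.write(str('Hello world'.encode('utf-16')))

print(read_stringified_bytes(path))  # Hello world
```

Using literal_eval rather than eval means only Python literals are accepted, so an unexpected file cannot execute arbitrary code.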

The solution is to decode the bytes to str before writing:

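some_bytes = 'Hello world'.encode('utf-16')  # UTF-16 bytes, as in the question
my_str = some_bytes.decode('utf-16')  # or whatever the encoding of the bytes might be
with open('myfile.txt', 'w') as f:
    f.write(my_str)  # writes the text itself, with no b'...' wrapper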
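A round-trip check of the decode-then-write approach, as a sketch assuming UTF-16 input bytes like the question's. It passes encoding= explicitly on both the write and the read, because the snippet above relies on the platform's default text encoding when writing; UTF-8 for the output file is an illustrative choice, and a temporary path replaces myfile.txt.

```python
import os
import tempfile

some_bytes = 'Hello world'.encode('utf-16')  # UTF-16 bytes with a BOM
my_str = some_bytes.decode('utf-16')          # decode to a real str first

path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
# Text mode writes a str; choose the file's encoding explicitly.
with open(path, 'w', encoding='utf-8') as f:
    f.write(my_str)

# Reading back with the same encoding yields the plain text.
with open(path, encoding='utf-8') as f:
    content = f.read()
print(content)  # Hello world
```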

Or open the file in binary mode and write the bytes directly:

some_bytes = b'Hello world'
with open('myfile.txt', 'wb') as f:
    f.write(some_bytes)
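A sketch of the binary-mode alternative, assuming UTF-16 bytes as in the question and a temporary path in place of myfile.txt: 'wb' stores the bytes unchanged, 'rb' returns the identical bytes, and decoding happens only once the encoding is known.

```python
import os
import tempfile

some_bytes = 'Hello world'.encode('utf-16')

path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
# 'wb' writes the bytes as-is: no repr, no b'...' wrapper.
with open(path, 'wb') as f:
    f.write(some_bytes)

# 'rb' returns exactly the same bytes...
with open(path, 'rb') as f:
    data = f.read()

print(data == some_bytes)     # True
# ...which can then be decoded with the right codec.
print(data.decode('utf-16'))  # Hello world
```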

Note that if you open the file in text mode, you will need to supply the correct encoding:

with open('myfile.txt', encoding='utf-16') as f:  # Be sure to use the correct encoding
    text = f.read()  # a str; the UTF-16 BOM is consumed by the codec
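To illustrate why the encoding matters, a sketch (temporary path, UTF-16 content written with a BOM) that reads the same file once with the matching codec and once with UTF-8. The BOM bytes 0xFF and 0xFE can never start a valid UTF-8 sequence, so the mismatched read raises UnicodeDecodeError instead of returning text.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, 'wb') as f:
    f.write('Hello world'.encode('utf-16'))  # BOM + UTF-16 code units

# Matching encoding: the BOM is consumed and a clean str comes back.
with open(path, encoding='utf-16') as f:
    text = f.read()
print(text)  # Hello world

# Mismatched encoding: the BOM bytes are invalid UTF-8.
wrong_encoding_failed = False
try:
    with open(path, encoding='utf-8') as f:
        f.read()
except UnicodeDecodeError as exc:
    wrong_encoding_failed = True
    print('UnicodeDecodeError:', exc.reason)
```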

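Consider running Python with the -b or -bb flag, which respectively raises a warning or an exception on attempts to stringify bytes, so you can detect them.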
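A sketch of the flags in action, launching a child interpreter via subprocess so the flags apply to it (the snippet string is illustrative). With -b, str() on bytes emits a BytesWarning on stderr but the program keeps running; with -bb, the warning becomes an error and the program exits with a non-zero status.

```python
import subprocess
import sys

# Code that stringifies bytes, like f.write(str(some_bytes)) in the answer.
snippet = "some_bytes = b'Hello world'\nprint(str(some_bytes))"

# -b: BytesWarning printed to stderr, execution continues.
warn = subprocess.run([sys.executable, '-b', '-c', snippet],
                      capture_output=True, text=True)
# -bb: BytesWarning raised as an exception, so the script fails.
error = subprocess.run([sys.executable, '-bb', '-c', snippet],
                       capture_output=True, text=True)

print(warn.returncode, 'BytesWarning' in warn.stderr)          # 0 True
print(error.returncode != 0, 'BytesWarning' in error.stderr)   # True True
```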

Problem description

2 solutions

Solution 1
0 2020-12-06 14:31:11

Solution 2
0 2020-12-06 15:35:21