Error encoding a 'UCS-2 Little Endian' file to 'utf8' using Python

Question

I'm trying to convert a UCS-2 Little Endian file to UTF-8 using Python, and I'm getting a weird error.

The code I'm using:

file=open("C:/AAS01.txt", 'r', encoding='utf8')
lines = file.readlines()
file.close()

And I'm getting the following error:

Traceback (most recent call last):
  File "C:/Users/PycharmProjects/test.py", line 18, in <module>
    main()
  File "C:/Users/PycharmProjects/test.py", line 7, in main
    lines = file.readlines()
  File "C:\Python34\lib\codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I tried using the codecs module as well, but that didn't work either. Any idea what I can do?

Answer 1

The encoding argument to open sets the encoding of the input, i.e. the file being read, not the encoding you want to convert to. Use encoding='utf_16_le'.
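A minimal sketch of the full conversion the question asks for, assuming the input file is UTF-16 little endian and may start with a byte order mark (as Answer 2 suggests). The function name convert_utf16le_to_utf8 and any paths you pass to it are hypothetical. One detail worth knowing: the utf_16_le codec does not strip a BOM, so the decoded text would start with '\ufeff'; the sketch removes it before writing UTF-8.

```python
# Convert a UTF-16 LE (UCS-2 LE) text file to UTF-8.
# convert_utf16le_to_utf8 is a hypothetical helper name, not a library API.

def convert_utf16le_to_utf8(src_path, dst_path):
    # 'encoding' here describes the file being READ: UTF-16 little endian.
    with open(src_path, 'r', encoding='utf_16_le') as src:
        text = src.read()
    # utf_16_le keeps a leading BOM as the character U+FEFF; drop it.
    if text.startswith('\ufeff'):
        text = text[1:]
    # 'encoding' here describes the file being WRITTEN: UTF-8.
    with open(dst_path, 'w', encoding='utf8') as dst:
        dst.write(text)
```

Usage would be something like convert_utf16le_to_utf8('C:/AAS01.txt', 'C:/AAS01_utf8.txt'), where the output path is only an example.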

Answer 2

If you're trying to read UCS-2, why are you telling Python it's UTF-8? The 0xff is most likely the first byte of a little-endian byte order mark (BOM):

>>> import codecs
>>> codecs.BOM_UTF16_LE
b'\xff\xfe'
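To make the BOM point concrete, here is a small sketch that reads a file's first bytes, compares them with the BOM constants from the codecs module, and picks a codec name. The 'utf-16' codec uses the BOM to detect byte order and strips it, so it avoids the leftover '\ufeff' you get with 'utf_16_le'. The helper name sniff_encoding and the fallback to 'utf-8' when no BOM is found are assumptions for illustration, not a standard API.

```python
import codecs

def sniff_encoding(path):
    """Guess a codec name from a leading BOM (hypothetical helper)."""
    with open(path, 'rb') as f:
        head = f.read(3)
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # 'utf-16' reads the BOM, picks the byte order, and strips the BOM.
        return 'utf-16'
    if head.startswith(codecs.BOM_UTF8):
        # 'utf-8-sig' strips a UTF-8 BOM if present.
        return 'utf-8-sig'
    # No BOM found: assume UTF-8 (this may be wrong for other encodings).
    return 'utf-8'
```

Applied to the question, that would look like open(path, encoding=sniff_encoding(path)) followed by readlines().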

UCS-2 is also deprecated, for the simple reason that Unicode outgrew it: UCS-2 can only represent code points up to U+FFFF. The typical replacement is UTF-16.
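The difference shows up with any character above U+FFFF. UCS-2 has no way to encode it, while UTF-16 uses a surrogate pair (4 bytes). For characters up to U+FFFF the two produce identical bytes, which is why a UCS-2 LE file can be read with a UTF-16 LE codec. The sample characters below are arbitrary choices.

```python
# A character in the Basic Multilingual Plane: 2 bytes, same as UCS-2.
bmp = '\u00e9'            # U+00E9, 'é'
bmp_bytes = bmp.encode('utf_16_le')

# A character above U+FFFF: not representable in UCS-2.
# UTF-16 encodes it as a surrogate pair (D83D DE00), stored little endian.
astral = '\U0001F600'
astral_bytes = astral.encode('utf_16_le')

print(len(bmp_bytes), bmp_bytes.hex())        # 2 e900
print(len(astral_bytes), astral_bytes.hex())  # 4 3dd800de
```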

More info in the linked question: Python 3: reading UCS-2 (BE) file

2 answers

Answer 1: score 5, posted 2017-07-29 20:44:12

Answer 2: score 3, accepted, posted 2017-07-29 20:42:57
