Decoding an ANSI-encoded file as UTF-8 raises an error

Question

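This is something I would like to understand. I was under the impression that UTF-8 is backward compatible, so that I could always decode a text file as UTF-8, even if it is an ANSI file. But that does not seem to be the case: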

In [1]: ansi_str = 'éµaØc'

In [2]: with open('test.txt', 'w', encoding='ansi') as f:
   ...:     f.write(ansi_str)
   ...:

In [3]: with open('test.txt', 'r', encoding='utf-8') as f:
   ...:     print(f.read())
   ...:
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-b0711b7b947e> in <module>
      1 with open('test.txt', 'r', encoding='utf-8') as f:
----> 2     print(f.read())
      3

c:\program files\python37\lib\codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

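So if my code expects UTF-8 and might encounter ANSI-encoded files, I need to handle UnicodeDecodeError. That is fine, but I would appreciate it if someone could shed some light on my initial misunderstanding.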
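For reference, the kind of handling described above could look like the sketch below. It is only an illustration: the function name `read_text` and the `fallback` parameter are made up for this example, and it assumes the "ANSI" files are really Windows-1252 (`cp1252`), which may not hold on every machine.

```python
def read_text(path, fallback='cp1252'):
    """Read a text file as UTF-8, falling back to a legacy code page.

    `fallback` is an assumption: 'cp1252' is a common "ANSI" code page
    on Western European Windows systems, but others exist.
    """
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return f.read()
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so retry with the legacy encoding.
        with open(path, 'r', encoding=fallback) as f:
            return f.read()
```

Note that the fallback is a guess, not detection: a file in some other single-byte code page will usually decode without an error here and simply produce the wrong characters.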

Thanks!

Answer 1

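UTF-8 is backward compatible with ASCII, not with ANSI. "ANSI" does not even describe one specific encoding; on Windows it usually means whatever the system's active code page is (for example Windows-1252). The accented characters you are testing with (é, µ, Ø) are outside the ASCII range, so unless you actually encode them as UTF-8, you cannot read them back as UTF-8.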
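To see the difference at the byte level, here is a small sketch. It assumes the "ANSI" code page is Windows-1252 (`cp1252`); the variable names are only for illustration.

```python
text = 'éµaØc'

# Assumption: "ANSI" means Windows-1252 here. Each character is one byte.
ansi_bytes = text.encode('cp1252')
# UTF-8 uses two bytes for each of é, µ and Ø, and one byte for a and c.
utf8_bytes = text.encode('utf-8')

print(ansi_bytes)  # b'\xe9\xb5a\xd8c'
print(utf8_bytes)  # b'\xc3\xa9\xc2\xb5a\xc3\x98c'

# Pure ASCII text produces identical bytes in both encodings,
# which is the only backward compatibility UTF-8 promises.
ascii_same = 'abc'.encode('cp1252') == 'abc'.encode('utf-8')
print(ascii_same)  # True

# Decoding the cp1252 bytes as UTF-8 fails: 0xE9 announces a three-byte
# sequence, but the third byte (0x61, 'a') is not a continuation byte.
try:
    ansi_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)
```

The bytes `a` and `c` are identical in both encodings; only the bytes above 0x7F differ, and those are exactly what the UTF-8 decoder rejects.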

Solution 1 (accepted, 2019-06-04 14:00:59)