Python: UnicodeDecodeError when reading a file

Question

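I am trying to write a script that strips unnecessary characters from a data .txt file. I was able to run the script successfully once, but every attempt since then fails with this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 8149: invalid start byte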

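import codecs
import sys

# Input file: first command-line argument, or test.txt by default
if len(sys.argv) < 2:
    startFile = "test.txt"
else:
    startFile = sys.argv[1]

finishFile = "newtest.txt"


def cleanFile():
    # Text mode with no explicit encoding: lines are decoded with the default codec (UTF-8 here)
    f = open(startFile, "r")
    #f = codecs.open("GNMFDB.TXT", "r", "utf-8")
    newFile = open(finishFile, "a")

    for line in f:
        line = line.replace("=", "")
        newFile.write(line)

    f.close()
    newFile.close()


def clearNewFile():
    # Truncate the output file so each run starts from an empty file
    newFile = open(finishFile, "w")
    newFile.close()


if __name__ == "__main__":
    #startFile = "test.txt"
    #finishFile = "newtest.txt"
    clearNewFile()
    cleanFile()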

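I know the problem has something to do with the bytes being decoded as UTF-8 into a string, or something along those lines. If I copy some lines from the original .txt file into a separate .txt file that I created in vim, the script runs successfully every time. I know codecs can be used in this situation, but when I tried it, it gave me a similar error (hence the commented-out line).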

Answer 1

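Have you tried decoding each line yourself as you read it, and encoding it again when you write it to newFile? The error is raised while the file is being read: open(startFile, "r") decodes the bytes as UTF-8, and byte 0xa2 is not a valid UTF-8 start byte. If you open both files in binary mode ("rb" and "ab"), you get raw bytes, so you can decode each line, work on it, and then encode it with utf-8 again:

for raw in f:
    line = raw.decode('utf-8')
    # your code goes here
    newFile.write(line.encode('utf-8'))

Another thing you can try is putting a try/except block around the decode call inside the for loop, to check whether the error happens on every line or only on a few. If it only happens on a few lines, you could simply drop them. Hope this helps.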
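
The try/except idea could be sketched roughly like this. It is only a sketch under assumptions, not something tested against your file: the name clean_file, its parameters, and the cp1252 fallback are made up for illustration. I guessed cp1252 because 0xa2 is the cent sign in cp1252 and Latin-1, but the real encoding of your file may be something else.

```python
def clean_file(src_path, dst_path, fallback="cp1252"):
    """Copy src_path to dst_path with "=" removed.

    Returns the 1-based numbers of the lines that were not valid UTF-8.
    The fallback encoding is an assumption; change it to match the real file.
    """
    bad_lines = []
    # Read raw bytes so that decoding happens per line, under our control.
    # newline="" keeps the original line endings unchanged on output.
    with open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for number, raw in enumerate(src, start=1):
            try:
                line = raw.decode("utf-8")
            except UnicodeDecodeError:
                # Remember the offending line, then decode it with the
                # assumed fallback instead of dropping it.
                bad_lines.append(number)
                line = raw.decode(fallback, errors="replace")
            dst.write(line.replace("=", ""))
    return bad_lines
```

Calling clean_file("test.txt", "newtest.txt") returns the list of line numbers that failed to decode, which tells you whether the problem affects the whole file or only a few lines. If the whole file turns out to be in one known encoding, passing that encoding to open() directly is simpler than decoding line by line.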

Solution 1
0 2020-06-17 16:08:57
