Python: encoding problem when reading from a file

Question

When I read the files like this, some of them fail:

list_of_files = glob.glob('./*.txt') # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding='cp1252')

Error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1260: character maps to <undefined>

When I switch to this:

list_of_files = glob.glob('./*.txt') # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding="utf-8")

Error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1459: invalid start byte

I have read that I should open the file as a binary file, but I'm not sure how to do this. Here is my function:

def readingAndAddToList():
    list_of_files = glob.glob('./*.txt') # create the list of files
    for file_name in list_of_files:
        # the with block closes the file even if reading raises an error
        with open(file_name, 'r', encoding="utf-8") as FI:
            stext = textProcessing(FI.read())
        # split returns a list of words delimited by sequences of whitespace (including tabs, newlines, etc, like re's \s)
        secondaryWord_list = stext.split()
        word_list.extend(secondaryWord_list) # Add words to main list
        print("Lungimea fisierului ",FI.name," este de", len(secondaryWord_list), "caractere")
        sortingAndNumberOfApparitions(secondaryWord_list)

Only the beginning of my function matters, because I get the error at the reading step.

Answer 1

If you are on Windows, open the file in Notepad and save it with the desired encoding. On Linux, do the same in a text editor. Your program should then run.
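
If there are too many files to re-save by hand, the same thing can be done from Python. The two errors in the question suggest the folder holds a mix of encodings: byte 0x92 is the right single quotation mark in cp1252, and 0x9d has no mapping in cp1252 but does occur as the final byte of some UTF-8 sequences, such as the right double quotation mark. Below is a minimal sketch, assuming the files are either UTF-8 or cp1252. The helper names read_text_with_fallback and resave_as_utf8 are made up for illustration and are not part of any library.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Read a file in binary mode and decode it with the first encoding that fits.

    Returns a (text, encoding_label) tuple. The default encoding list is an
    assumption based on the two errors shown in the question.
    """
    # Reading bytes never raises UnicodeDecodeError; only decoding can.
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway and mark undecodable bytes with U+FFFD.
    first = encodings[0]
    return raw.decode(first, errors="replace"), first + " (with replacement)"


def resave_as_utf8(path):
    """Rewrite the file in place as UTF-8 and return the encoding it was read with."""
    text, used = read_text_with_fallback(path)
    # Writing bytes (not text mode) leaves the original line endings untouched.
    Path(path).write_bytes(text.encode("utf-8"))
    return used
```

In the function from the question, the open(...) and FI.read() pair could then be replaced by something like stext = textProcessing(read_text_with_fallback(file_name)[0]). UTF-8 is tried first because cp1252 accepts almost any byte sequence and would silently turn UTF-8 text into mojibake.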

Solution 1
0 2019-03-19 14:03:25
