Unicode encoding when reading from text file

I hope you can help.

I'm trying to take a string and check whether or not it is in a text file called PasswordList. This is the code I have written to do this:

Password = input('Enter a password: ')    
with open('PasswordList.txt') as f:
    Found = False
    for line in f:
        if Password in line: 
            print(line)
            Found = True
    if not Found:
        print('Password is not in list')

If I enter something like the letter "e", it prints the lines that contain it until it reaches position 4853, where it raises an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 4853: ordinal not in range(128).

I guess it has to do with ASCII versus Unicode encoding: is Python trying to use the ascii codec to decode a non-ASCII character?

If I try

import sys
print(sys.getdefaultencoding())

Then I get "utf-8" as the default encoding.

I'm stuck, what can I do?

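When no encoding is given, Python 3's open() decodes the file with the locale's preferred encoding (locale.getpreferredencoding()) by default, not with sys.getdefaultencoding(), which would explain the 'ascii' codec in the traceback. Open the file with the io module and pass the encoding explicitly: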

import io
with io.open('PasswordList.txt', encoding='cp1252') as f:
    ...
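
To make the elided loop concrete, here is a minimal sketch of the search from the question with the encoding passed explicitly. The function name find_matching_lines is made up for illustration, and cp1252 is only an assumed default; use whatever encoding the file was actually written in.

```python
import io


def find_matching_lines(path, needle, encoding="cp1252"):
    """Return the lines of `path` that contain `needle`.

    `encoding` is an assumption about how the file was saved;
    cp1252 is only a guess and should be replaced if it is wrong.
    """
    matches = []
    # Passing encoding= stops Python from falling back to the locale default.
    with io.open(path, encoding=encoding) as f:
        for line in f:
            if needle in line:
                matches.append(line.rstrip("\n"))
    return matches
```

In the question's terms, you would call find_matching_lines('PasswordList.txt', Password) and print 'Password is not in list' when the result is empty.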

However, you do need to know what encoding the data is in. The file itself usually doesn't contain this information; you have to know how it was created.
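
If you have no idea how the file was created, one rough way to narrow it down is to read the raw bytes and see which candidate encodings decode them without an error. This is only a sketch: the helper name and the candidate list are assumptions, and a clean decode does not prove the encoding is right (some single-byte codecs accept almost any byte), so you still need to look at the decoded text.

```python
def encodings_that_decode(path, candidates=("utf-8", "cp1252", "cp437")):
    """Return the candidate encodings that decode the whole file without error.

    The candidate list is an illustrative guess, not an exhaustive set.
    """
    # Read bytes, so no decoding happens until we ask for it.
    with open(path, "rb") as f:
        raw = f.read()

    usable = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the file's bytes
        usable.append(name)
    return usable
```

For example, a lone 0x82 byte like the one in the traceback is not valid UTF-8, but both cp1252 and cp437 will decode it, each to a different character.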

To determine the encoding of a file created with Notepad, open the file in Notepad. Select File | Save as from the menu. Near the bottom of the dialog, the current encoding appears in a dropdown (screenshot attached).

Now you can try using codecs.open as suggested by wim.
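
For reference, a minimal sketch of the codecs.open variant, assuming the file turned out to be cp1252. The function name contains_password is hypothetical. Note that codecs.open does not translate line endings the way the built-in open() does, and recent Python versions steer users toward open() or io.open with an encoding argument instead.

```python
import codecs


def contains_password(path, password, encoding="cp1252"):
    """Return True if any line of `path` contains `password`.

    `encoding` is an assumption; replace it with the file's real encoding.
    """
    with codecs.open(path, "r", encoding=encoding) as f:
        return any(password in line for line in f)
```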
