
Unicode error in Python when printing a list

Edit: http://pastebin.com/W4iG3tjS - the file

I have a text file encoded in UTF-8 with some Cyrillic text in it. To load it, I use the following code:

import codecs
fopen = codecs.open('thefile', 'r', encoding='utf8')
fread = fopen.read()

Typing fread in the interpreter dumps the file on the screen as Unicode escape sequences. print fread displays it in readable form (ASCII, I guess).

I then try to split it and write it to an empty file with no encoding:

a = fread.split()
for l in a: 
    print>>dasFile, l

But I get the following error message: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-13: ordinal not in range(128)

Is there a way to dump fread.split() into a file? How can I get rid of this error?

Since you've opened and read the file via codecs.open(), it's been decoded to Unicode. So to output it you need to encode it again, presumably back to UTF-8.

for l in a:
    # split() stripped the whitespace, so add the newline that print used to supply
    dasFile.write(l.encode('utf-8') + '\n')
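
To make the difference concrete, here is a minimal sketch that encodes one Cyrillic word both ways. The sample word is an assumption for illustration and is not taken from the file in the question.

```python
# Illustrative sample only: the six letters of a Cyrillic word.
word = u'\u043f\u0440\u0438\u0432\u0435\u0442'

try:
    word.encode('ascii')        # the conversion that happens implicitly
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False            # Cyrillic letters have no ASCII representation

utf8_bytes = word.encode('utf-8')   # explicit encoding to UTF-8 succeeds
```

Each of these letters takes two bytes in UTF-8, so utf8_bytes is twice as long as word.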

print has to convert the Unicode string to bytes, and for a plain file it uses the default encoding, which is normally "ascii". That is why you see the error with print. But you can open a file and write directly to it.

a = fopen.readlines() # returns a list of lines already, with line endings intact
# do something with a
dasFile.writelines(a) # doesn't add line endings, expects them to be present already.

This assumes the lines in a are already encoded byte strings. Lines read through codecs.open() are Unicode, so they must be encoded first.
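
One way to satisfy that assumption is to encode each line just before it reaches writelines(). This is a minimal sketch, not a definitive implementation; the helper name write_lines_utf8, the temporary file, and the sample lines are all hypothetical.

```python
import os
import tempfile

def write_lines_utf8(lines, path):
    """Encode each Unicode line as UTF-8 and write it; no line endings are added."""
    with open(path, 'wb') as out:       # binary mode: the file receives bytes
        out.writelines(line.encode('utf-8') for line in lines)

# Hypothetical sample data: two short Cyrillic lines with their line endings.
sample = [u'\u0430\u0431\n', u'\u0432\n']
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
write_lines_utf8(sample, tmp_path)
with open(tmp_path, 'rb') as f:
    written = f.read()                  # raw bytes as stored on disk
os.remove(tmp_path)
```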

PS. You should also investigate the io module: io.open() takes an encoding argument and does the decoding and encoding for you.
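
As a rough sketch of the io approach, the function below reads a UTF-8 file, splits it on whitespace, and writes one token per line, letting io.open() handle the encoding on both sides. The name split_to_file, the temporary paths, and the sample text are hypothetical.

```python
import io
import os
import tempfile

def split_to_file(src_path, dst_path):
    """Write each whitespace-separated token of a UTF-8 file on its own line."""
    with io.open(src_path, 'r', encoding='utf-8') as src:
        words = src.read().split()
    with io.open(dst_path, 'w', encoding='utf-8') as dst:
        for word in words:
            dst.write(word + u'\n')     # text-mode io files take Unicode strings
    return words

# Hypothetical sample input: three Cyrillic words on two lines.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'in.txt')
dst = os.path.join(tmp_dir, 'out.txt')
with io.open(src, 'w', encoding='utf-8') as f:
    f.write(u'\u043e\u0434\u0438\u043d \u0434\u0432\u0430\n\u0442\u0440\u0438')
words = split_to_file(src, dst)
with io.open(dst, 'r', encoding='utf-8') as f:
    result = f.read()
```

Because both files are opened with an explicit encoding, no encode() or decode() calls appear in the loop.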
