Unicode error in Python when printing a list

Question

Edit: http://pastebin.com/W4iG3tjS - the file

I have a text file encoded in UTF-8 with some Cyrillic text in it. To load it, I use the following code:

import codecs
fopen = codecs.open('thefile', 'r', encoding='utf8')
fread = fopen.read()

fread on its own dumps the file on the screen as Unicode escape sequences. print fread displays it in readable form (ASCII, I guess).

I then try to split it and write it to an empty file with no encoding:

a = fread.split()
for l in a: 
    print>>dasFile, l

But I get the following error message: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-13: ordinal not in range(128)

Is there a way to dump fread.split() into a file? How can I get rid of this error?

Answer 1

Since you've opened and read the file via codecs.open(), it's been decoded to Unicode. So to output it you need to encode it again, presumably back to UTF-8.

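for l in a:
    # encode each unicode word to UTF-8 bytes; write() adds no newline (print did), so append one
    dasFile.write(l.encode('utf-8') + '\n')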
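
For reference, here is a minimal end-to-end sketch of the same idea. The function name split_to_file, the temporary paths, and the sample Cyrillic text are assumptions made for illustration, not part of the question. It reads with codecs.open(), splits on whitespace, and writes each word as UTF-8 bytes to a file opened in binary mode.

```python
import codecs
import os
import tempfile


def split_to_file(src_path, dst_path):
    """Read UTF-8 text, split on whitespace, write one word per line as UTF-8."""
    with codecs.open(src_path, 'r', encoding='utf8') as src:
        words = src.read().split()  # decoded text, not bytes
    with open(dst_path, 'wb') as dst:  # binary mode: we supply the bytes ourselves
        for word in words:
            dst.write(word.encode('utf-8') + b'\n')
    return words


# Hypothetical demo data: two Cyrillic words written as escapes
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'thefile')
dst_path = os.path.join(tmp_dir, 'out.txt')
with codecs.open(src_path, 'w', encoding='utf8') as f:
    f.write(u'\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440\n')

words = split_to_file(src_path, dst_path)
```

Opening the output in 'wb' keeps the encoding step explicit, so nothing falls back to an implicit ASCII conversion.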

Answer 2

When print writes a unicode object to a plain file, it uses the default encoding, which in Python 2 is normally "ascii". That is why you see the error with print. But you can open a file and write directly to it.

a = fopen.readlines() # returns a list of lines already, with line endings intact
# do something with a
dasFile.writelines(a) # doesn't add line endings, expects them to be present already.

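This assumes the lines in a are already encoded byte strings. Lines read through codecs.open() are unicode objects, so either encode each line before writing or open dasFile with codecs.open() and an encoding as well.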
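
The failure can be reproduced in isolation, which may help when debugging. This is only a sketch; the sample string is an assumption, not taken from the asker's file. Encoding Cyrillic text with the ASCII codec raises the same exception, while UTF-8 handles it.

```python
text = u'\u043f\u0440\u0438\u0432\u0435\u0442'  # assumed Cyrillic sample, 6 characters

try:
    text.encode('ascii')  # what an implicit default-encoding conversion amounts to
    failed = False
    message = ''
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

encoded = text.encode('utf-8')  # explicit encoding succeeds
```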

PS. You should also investigate the io module.
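
A rough sketch of what that could look like with io.open(), which takes an encoding argument for both reading and writing, so no manual encode() call is needed. The function name copy_words, the paths, and the sample text are assumptions for illustration.

```python
import io
import os
import tempfile


def copy_words(src_path, dst_path):
    """Split UTF-8 text on whitespace and write one word per line, also as UTF-8."""
    with io.open(src_path, 'r', encoding='utf-8') as src:
        words = src.read().split()
    with io.open(dst_path, 'w', encoding='utf-8') as dst:
        for word in words:
            dst.write(word + u'\n')  # the file object encodes on write
    return len(words)


# Hypothetical demo data
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'thefile')
dst_path = os.path.join(tmp_dir, 'out.txt')
with io.open(src_path, 'w', encoding='utf-8') as f:
    f.write(u'\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440\n')

count = copy_words(src_path, dst_path)
```

With this approach the program only ever handles decoded text, and the encoding is stated once per file.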

2 answers

solution1 4 ACCEPTED 2012-06-11 10:04:17

solution2 0 2012-06-11 10:04:31
